How to create a loop that turns columns into variables and returns shapiro.test at the end?

3

I have several .csv files with a large number of columns. I would like to streamline the work by creating a function that reads the columns and returns the result of the normality test (shapiro.test) for each of them.

    data <- read.csv2("C:/Users/z/Desktop/CSVFOREST_WB.csv")

    tnorm <- function(x) {
      for (a in x) {
        a = x[[1,]]
        return(shapiro.test(a))
      }
    }
    tnorm(data)

The code, of course, returns an error. What can I do?

    
asked by anonymous 03.09.2018 / 07:40

1 answer

6

R is not a very good language for explicit loops such as for and while. Depending on the number of iterations and their complexity, execution may be slow.

However, it has some functions that make life easier for anyone who needs to repeat the same calculation many times. Some of them belong to the *apply family, such as apply, sapply and lapply.
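As a rough illustration of the difference, the sketch below computes column means twice, first with an explicit for loop and then with sapply. The data frame df and its column names are made up for this example only.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Explicit loop: pre-allocate the result, then fill it column by column
means_loop <- numeric(ncol(df))
names(means_loop) <- names(df)
for (col in names(df)) {
  means_loop[col] <- mean(df[[col]])
}

# Same computation with sapply: one call, no manual bookkeeping
means_sapply <- sapply(df, mean)
```

Both objects hold the same named vector of means; the sapply version simply leaves the iteration and the collection of results to R.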

Take, for example, the data set below. It has 5 columns, each with 100 observations. Every column is drawn from a normal distribution with mean 0 and standard deviation 1:

n <- 100 # sample size
r <- 5   # number of samples (columns)

dados <- data.frame(matrix(rnorm(n*r, mean=0, sd=1), ncol=r))

If I want to test the normality of each column in this data set, I just run:

apply(dados, 2, shapiro.test)

where

  • dados: is my data set

  • 2: indicates that the function will be applied to every column of dados. Had I used 1, the function would be applied to the rows of dados

  • shapiro.test: is the function to apply to each column (the 2 in the item above) of dados
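To make the role of the second argument concrete, here is a minimal sketch on a tiny matrix, using sum in place of shapiro.test so the results are easy to check by hand. The matrix m is hypothetical and not part of the data above.

```r
# Small hypothetical matrix with 2 rows and 3 columns:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)  # margin 1: one result per row    -> 9 12
col_sums <- apply(m, 2, sum)  # margin 2: one result per column -> 3 7 11
```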

The result is the following:

$X1

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98757, p-value = 0.4773


$X2

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98678, p-value = 0.4228


$X3

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.95448, p-value = 0.001656


$X4

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98871, p-value = 0.5622


$X5

    Shapiro-Wilk normality test

data:  newX[, i]
W = 0.98234, p-value = 0.2015

Note that the Shapiro-Wilk test was applied to each column, giving the value of the statistic and the p-value associated with it.
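Because apply returns a list of full test objects, a compact table can be easier to read when a file has many columns. The following is only a sketch of one way to do that, not part of the approach above: the helper name normality_table is hypothetical, and it assumes that only numeric columns should be tested and that each of them has between 3 and 5000 non-missing values, as shapiro.test requires.

```r
# Hypothetical helper: run shapiro.test on every numeric column of a
# data frame and collect the W statistic and the p-value in one table.
normality_table <- function(df) {
  numeric_cols <- df[sapply(df, is.numeric)]   # keep numeric columns only
  tests <- lapply(numeric_cols, shapiro.test)  # one test object per column
  data.frame(
    column  = names(tests),
    W       = sapply(tests, function(t) unname(t$statistic)),
    p_value = sapply(tests, function(t) t$p.value),
    row.names = NULL
  )
}

# Simulated data with the same shape as the example above
dados <- data.frame(matrix(rnorm(100 * 5), ncol = 5))
result <- normality_table(dados)
```

With the data object read by read.csv2 in the question, the call would be normality_table(data), provided the assumptions above hold for that file.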

    
03.09.2018 / 12:15