R is not a very good language for using loops like for and while. Depending on the number of replications and their complexity, execution may be slow.
However, it has some functions that make the work easier for those who want to repeat the same calculation many times. Some of these functions are in the *apply family, such as apply, sapply and lapply.
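As a minimal sketch of how two members of this family differ, the example below applies mean to each element of a small list. The list values and the variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical input: three small numeric vectors stored in a list
values <- list(a = 1:3, b = 4:6, c = 7:9)

# lapply always returns a list, with one element per input element
means_list <- lapply(values, mean)

# sapply tries to simplify the result; here it gives a named numeric vector
means_vec <- sapply(values, mean)

print(means_list)
print(means_vec)
```

Both calls compute the same means; the only difference is the shape of what comes back.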
Take, for example, the data set below. It has 5 columns, each with 100 observations drawn from a normal distribution with mean 0 and standard deviation 1:
n <- 100 # sample size
r <- 5 # number of samples
dados <- data.frame(matrix(rnorm(n*r, mean=0, sd=1), ncol=r))
To test the normality of each column in this data set, just run
apply(dados, 2, shapiro.test)
where
- dados: is my data set
- 2: indicates that the function is applied to every column of dados. If I had put 1, the function would be applied to the rows of dados
- shapiro.test: is the function applied to each column of dados (because of the 2 in the item above)
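The effect of the second argument is easier to see on a tiny example. The sketch below uses a hypothetical 2 x 3 matrix m and the sum function, neither of which comes from the original data set.

```r
# Small hypothetical matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

row_sums <- apply(m, 1, sum)  # 1 = rows: one value per row (9 and 12)
col_sums <- apply(m, 2, sum)  # 2 = columns: one value per column (3, 7 and 11)

print(row_sums)
print(col_sums)
```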
The result is the following:
$X1
Shapiro-Wilk normality test
data: newX[, i]
W = 0.98757, p-value = 0.4773
$X2
Shapiro-Wilk normality test
data: newX[, i]
W = 0.98678, p-value = 0.4228
$X3
Shapiro-Wilk normality test
data: newX[, i]
W = 0.95448, p-value = 0.001656
$X4
Shapiro-Wilk normality test
data: newX[, i]
W = 0.98871, p-value = 0.5622
$X5
Shapiro-Wilk normality test
data: newX[, i]
W = 0.98234, p-value = 0.2015
Note that the Shapiro-Wilk test was applied to each column, and for each one we got the value of the test statistic and the p-value associated with it.
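If only the p-values are of interest, one possible approach is to pull them out with sapply instead of printing the full test output. This is a sketch under some assumptions: the seed, the 5% threshold and the name p_values are not part of the original example, and the data are regenerated here so the block runs on its own.

```r
set.seed(123)  # assumed seed, used only to make the sketch reproducible
n <- 100       # sample size
r <- 5         # number of samples
dados <- data.frame(matrix(rnorm(n * r, mean = 0, sd = 1), ncol = r))

# shapiro.test returns an object of class "htest";
# its p.value element holds the p-value of the test
p_values <- sapply(dados, function(column) shapiro.test(column)$p.value)
print(p_values)

# Names of the columns whose p-value is below an assumed 5% level
print(names(p_values)[p_values < 0.05])
```

Because sapply simplifies its result, p_values is a named numeric vector with one entry per column, which is easier to filter or plot than the list returned by apply.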