# Creating an array with variables with different correlations in R?

5

I need to generate data series with predefined correlations in R. I used a method I found here on SO and managed to create variables with the desired correlation. However, when I tried to automate the process to produce 1000 estimates for each of several correlations, the result was a 1000x5 matrix in which all values are identical. The code I'm using is as follows:

``````
set.seed(2423049)
corr = matrix(,1000,5)
for(k in 1:5){
  for (i in 1:5){
    for(j in 1:1000){
      rho = c(-0.7,-0.3,0,0.1,0.5) # correlations I need to use

      xstar=rnorm(1000,2,2) # x* with normal distribution N(2,2)

      a2=rnorm(1000,2,2) # parameter created to obtain w from the correlation rho with x*

      w = rho[k]*xstar+sqrt(1-rho[k]^2)*a2 # w computed from a defined correlation with x*

      corr[j,i]=cor(xstar,w) # matrix of correlations between x* and w
    }
  }
}
``````

The result of this process was a 1000x5 matrix in which every value was 0.5499732.

What am I doing wrong?

asked by anonymous 31.03.2014 / 23:03

3

Pedro,

First, you have an extra loop in your code. You are filling a 1000 by 5 matrix, but you loop over `k` (correlations), then `i` (columns), and then `j` (rows). For each `k` you run through all 5 columns and all 1000 rows, so every pass of `k` overwrites all the previous results. In the end, the matrix holds only the results for the last `k` (`rho=0.5`).
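The overwriting is easy to see in a scaled-down sketch. This is a hypothetical 2 by 2 case that stores the loop index `k` itself instead of a correlation, so the names and sizes here are only for illustration:

```r
# Scaled-down illustration: store the loop index k instead of a correlation
m <- matrix(NA, 2, 2)
for (k in 1:3) {
  for (i in 1:2) {
    for (j in 1:2) {
      m[j, i] <- k  # every pass of k rewrites all cells of m
    }
  }
}
m  # every cell holds 3, the last value of k
```

Nothing written for `k = 1` or `k = 2` survives, which is what happens to the first four correlations in your matrix.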

To avoid this problem, the loop should look something like this:

``````
corr = matrix(NA,1000,5) # same 1000x5 matrix as in the question
rho = c(-0.7,-0.3,0,0.1,0.5) # correlations I need to use
for (i in 1:5){
  for(j in 1:1000){
    xstar=rnorm(1000,0,1) # x* with normal distribution N(0,1)

    a2=rnorm(1000,0,1) # parameter created to obtain w from the correlation rho with x*

    w = rho[i]*xstar+sqrt(1-rho[i]^2)*a2 # w computed from a defined correlation with x*

    corr[j,i]=cor(xstar,w) # matrix of correlations between x* and w
  }
}
``````

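However, note that I changed the variables to standard normal (mean zero, standard deviation one), because the formula you are using, `w = rho[k]*xstar+sqrt(1-rho[k]^2)*a2`, is the construction for N(0,1) variables. The correlation itself equals `rho` whenever `xstar` and `a2` are independent with equal variances, but with a nonzero mean `w` in general no longer has the same mean as `xstar`.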
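A quick check of that formula, as a sketch: the helper name `gen_w`, the seed, and the sample size are illustrative choices, not part of the original code. With N(0,1) inputs, `w` should come out correlated with `xstar` at roughly `rho`, with mean near 0 and standard deviation near 1.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: build w from xstar and independent noise a2
gen_w <- function(rho, xstar, a2) {
  rho * xstar + sqrt(1 - rho^2) * a2
}

n <- 100000
xstar <- rnorm(n, 0, 1)
a2 <- rnorm(n, 0, 1)
w <- gen_w(0.5, xstar, a2)

cor(xstar, w)  # close to 0.5
mean(w)        # close to 0
sd(w)          # close to 1
```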

To generate several variables with arbitrary correlations, you can use the `MASS` package. For normal variables, its `mvrnorm` function does the job; it would look something like:

``````
rho = c(-0.7,-0.3,0,0.1,0.5)
library(MASS)

### defining a function to generate correlated variables
### rho is the correlation, mu is the vector of means, and var the vector of variances
sim.cor <- function(rho,mu=c(2,2), var=c(2,2), n=1000, sim=1000){
  correlacoes <- vector(length=sim)
  # covariance implied by rho and the two variances
  cov <- rho*sqrt(var[1])*sqrt(var[2])
  for (i in 1:sim){
    # 2x2 covariance matrix: variances on the diagonal, cov off the diagonal
    simulacao <- mvrnorm(n=n, mu=mu, Sigma=matrix(c(var[1],cov, cov, var[2]), ncol=2))
    correlacoes[i] <- cor(simulacao[,1], simulacao[,2])
  }
  correlacoes
}

### applying the function to each rho