Whenever possible, avoid using for loops in R. They are often slower than vectorized alternatives and can lead to silly mistakes. For example, a loop that starts like this
for(i in 199501:201703)
makes you iterate over 199501, 199502, ..., 199512, 199513, 199514 and so on, including values that are not valid months. Not a good idea.
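To make the problem concrete, here is a minimal sketch comparing the plain integer range with a sequence that contains only real months. The names ingenuo and meses are just illustrative, and the endpoints are the ones from the loop above.

```r
# Every integer between the two endpoints, valid month or not
ingenuo <- 199501:201703
length(ingenuo)   # 2203 values

# Only real months: build a monthly date sequence and format it as YYYYMM
meses <- as.integer(format(
  seq(as.Date("1995-01-01"), as.Date("2017-03-01"), by = "month"),
  "%Y%m"
))
length(meses)     # 267 values
```

Looping over meses, or better, over the dates that actually occur in the data, avoids the impossible months altogether.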
Another problem is trying to store something that has two dimensions (the result of subset(dados,data==i)) in a position reserved for a single value (dados[i]). This will not work. The ideal is to store these results in a list. Also, you were trying to save the new objects inside the original object, which is a recipe for the loop not to work.
Assuming your dataset is named dados and it has a column with dates named data, one way to solve this problem using for is as follows:
dadosLista <- list()
# One list element per distinct year-month, stored at that numeric position
for (i in unique(dados$data)) {
  dadosLista[[i]] <- subset(dados, data == i)
}
This has a small drawback: the first 199500 positions of the dadosLista list will be NULL (assuming the data start at 199501), and so will every position that does not correspond to a year and month present in the data, such as 199533. The advantage is that the command
dadosLista[[199803]]
will return the data for March 1998. You can remove the NULL elements by running
dadosLista <- Filter(Negate(is.null), dadosLista)
The problem with doing this is that the positions in the list no longer correspond to the years and months. There is no free lunch.
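As an illustration of this trade-off, here is a small sketch on made-up data. The toy data frame and the names porNumero, semNulos and porNome are assumptions for the example only. It also shows one possible middle ground, which is to index the list by name instead of by number.

```r
# Toy data (hypothetical): three distinct year-months
dados <- data.frame(data = c(199501, 199501, 199502, 199803), valor = 1:4)

# Indexing by number, as in the loop above
porNumero <- list()
for (i in unique(dados$data)) {
  porNumero[[i]] <- subset(dados, data == i)
}
length(porNumero)    # 199803 positions, almost all of them NULL

semNulos <- Filter(Negate(is.null), porNumero)
length(semNulos)     # 3
names(semNulos)      # NULL: the year-month labels are gone

# Indexing by name instead: no NULL padding and the labels are kept
porNome <- list()
for (i in unique(dados$data)) {
  porNome[[as.character(i)]] <- subset(dados, data == i)
}
names(porNome)       # "199501" "199502" "199803"
```

With names, porNome[["199803"]] plays the role that dadosLista[[199803]] had before, without the empty positions.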
However, there is a better solution. Assuming your dataset is named dados and it has a column with dates named data, do the following:
dadosLista <- split(dados, dados$data)
This will put your data in a list. Each separate data set can then be accessed with commands similar to
dadosLista[["199501"]]
(because the names start with a digit, the $ form needs backticks: dadosLista$`199501`). Thus, each position in the list is identified by a name, identical to the desired year and month, and not by a number. This makes the code more organized and cleaner and, I believe, faster than the version with a for loop.
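For a quick check of how this behaves, here is a minimal sketch of split() on made-up data. The toy data frame, the valor column and the names marco98, igual, linhas and totais are hypothetical. Once the data are in a named list, sapply() can process every month without an explicit loop.

```r
# Toy data (hypothetical)
dados <- data.frame(data = c(199501, 199501, 199502, 199803), valor = 1:4)

dadosLista <- split(dados, dados$data)
names(dadosLista)    # "199501" "199502" "199803"

# Element names start with a digit, so use [[ ]] with a string, or backticks
marco98 <- dadosLista[["199803"]]
igual <- identical(marco98, dadosLista$`199803`)   # both forms return the same data

# Apply a function to every month without writing a loop
linhas <- sapply(dadosLista, nrow)                        # rows per month
totais <- sapply(dadosLista, function(d) sum(d$valor))    # sum of valor per month
```

The results of sapply() keep the year-month names, so linhas[["199501"]] gives the number of rows for January 1995.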