Split a dataset with "for" in R

4

This is my first for loop in R and I am having trouble getting it to work. I have a dataset with a reference-date column covering several years, and I would like to split the dataset by reference date.

The variable "data" holds the reference date, running from January 1995 (199501) to March 2017 (201703).

I tried to split it as follows, without success:

for(i in 199501:201703){
dados[i]<-
subset(dados,data==i)
}

Do you know where I can find good material on for loops?

    
asked by anonymous 29.06.2017 / 22:53

1 answer

9

Whenever possible, avoid using for in R. It is often slower than the vectorized alternatives and can lead to silly mistakes. For example, a for that starts like this

for(i in 199501:201703)

will make the loop iterate over 199501, 199502, ..., 199512, 199513, 199514 and so on, including codes that do not correspond to any month. Not a good idea.
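A quick check makes the problem concrete. The sketch below counts how many integers the range contains and how many of them are valid year-month codes. The names todos, validos, meses and codigos are only illustrative, and the date-based alternative is one possible way to build the valid codes, not the only one.

```r
# Every integer between the two codes, valid month or not
todos <- 199501:201703
length(todos)                      # 2203

# Keep only codes whose last two digits are a month from 01 to 12
validos <- todos[todos %% 100 >= 1 & todos %% 100 <= 12]
length(validos)                    # 267

# Alternative: build the valid codes from a monthly sequence of dates
meses <- seq(as.Date("1995-01-01"), as.Date("2017-03-01"), by = "month")
codigos <- as.integer(format(meses, "%Y%m"))
identical(codigos, validos)        # TRUE
```

In other words, nearly 90% of the iterations in the original loop would look for a month that cannot exist.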

Another problem is trying to store something that has two dimensions (subset(dados, data == i)) in a position meant for a single element (dados[i]). This will not work. Ideally, these results should be stored in a list. Also, you were trying to save the new objects inside the original object, which is a recipe for a loop that does not work.

Assuming your dataset is named dados and it has a column with dates named data, one way to solve this problem using for is as follows:

dadosLista <- list()

for (i in unique(dados$data)){
  dadosLista[[i]] <- subset(dados, data==i)
}

This has a small drawback: the first 199500 positions of dadosLista will be NULL, and so will every position that has no corresponding year and month, such as 199533. The advantage is that the command

dadosLista[[199803]]

will return the data for March 1998. You can remove the NULL entries by running

dadosLista <- Filter(Negate(is.null), dadosLista)

The problem with doing this is that the list positions no longer match the year-month codes, so you lose that way of looking things up. There is no free lunch.
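If the loop is kept, one way around this trade-off is to index the list by name instead of by number. The sketch below uses a small made-up data frame (the valor column and all values are hypothetical) to show that name-based indexing creates no NULL padding and keeps the year-month labels.

```r
# Toy data; the column valor and all values are made up for illustration
dados <- data.frame(
  data  = c(199501, 199501, 199502, 199803),
  valor = c(10, 20, 30, 40)
)

dadosLista <- list()
for (i in unique(dados$data)) {
  # Indexing with as.character(i) creates a named element
  # instead of padding the list up to position i
  dadosLista[[as.character(i)]] <- subset(dados, data == i)
}

length(dadosLista)       # 3, one element per distinct date
names(dadosLista)        # "199501" "199502" "199803"
dadosLista[["199803"]]   # the single row for March 1998
```

The lookup now takes a string ("199803") rather than a number, which is the price of dropping the empty positions.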

However, there is a better solution. Assuming your dataset is named dados and it has a column with dates named data, do the following:

dadosLista <- split(dados, dados$data)

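This will put your data in a list. Each of the separate datasets can then be accessed with commands such as the one below. The backticks are required because the name starts with a digit; dadosLista[["199501"]] works as well.

dadosLista$`199501`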

Thus, each position in the list is identified by a name, identical to the desired year and month, rather than by a number. This makes the code cleaner and better organized and, I believe, faster than the version with for.
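A minimal sketch of the split() approach on made-up data (the valor column and its values are hypothetical) shows the resulting names and how each piece can be processed without writing a loop:

```r
# Toy data; the column valor and all values are made up for illustration
dados <- data.frame(
  data  = c(199501, 199501, 199502, 199803),
  valor = c(10, 20, 30, 40)
)

# One data frame per distinct value of dados$data
dadosLista <- split(dados, dados$data)

names(dadosLista)          # "199501" "199502" "199803"
dadosLista[["199501"]]     # the two rows for January 1995

# Summarise every piece with sapply instead of a for loop
totais <- sapply(dadosLista, function(d) sum(d$valor))
totais[["199501"]]         # 30
```

Only the dates that actually occur in the data become list elements, so there are no empty positions to filter out afterwards.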

    
30.06.2017 / 01:51