Split a data frame and save to separate directories


I have a data frame with 100 rows and two columns (name and quantity). The quantity column is an integer ranging from 1 to 4. How can I split the original data frame into four data frames according to column 2 (quantity)?

In other words, after the split I expect: data frame 01 with the 20 rows of quantity 1, data frame 02 with the 25 rows of quantity 2, data frame 03 with the 30 rows of quantity 3, and data frame 04 with the 25 rows of quantity 4. This is a fictitious example.

    
asked by anonymous 07.07.2017 / 21:36

3 answers

4

This is the ideal case for the split function, which divides your data.frame into groups according to the values of the Quantidade column:

tab_split <- split(tab, tab$Quantidade)
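To check the result, here is a small self-contained sketch. It rebuilds a data frame like the one in the question (the random names are an assumption; only the 20/25/30/25 row counts per quantity come from the question) and counts the rows in each piece:

```r
# Example data: 100 random names, quantities 1 to 4 in blocks of 20, 25, 30 and 25
tab <- data.frame(Nome = sample(LETTERS, 100, replace = TRUE),
                  Quantidade = rep(1:4, times = c(20, 25, 30, 25)))

# One data.frame per distinct value of Quantidade, stored in a named list
tab_split <- split(tab, tab$Quantidade)

# Number of rows in each piece
sapply(tab_split, nrow)
#  1  2  3  4 
# 20 25 30 25 
```

The list names ("1" to "4") come from the distinct values of Quantidade.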

The result of the above command is a list containing the four separate data.frames:

str(tab_split)
    List of 4
     $ 1:'data.frame':  20 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 22 2 12 5 25 22 24 15 20 19 ...
      ..$ Quantidade: num [1:20] 1 1 1 1 1 1 1 1 1 1 ...
     $ 2:'data.frame':  25 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 26 10 20 17 20 21 1 1 14 20 ...
      ..$ Quantidade: num [1:25] 2 2 2 2 2 2 2 2 2 2 ...
     $ 3:'data.frame':  30 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 24 21 1 19 24 13 6 22 25 15 ...
      ..$ Quantidade: num [1:30] 3 3 3 3 3 3 3 3 3 3 ...
     $ 4:'data.frame':  25 obs. of  2 variables:
      ..$ Nome      : Factor w/ 26 levels "A","B","C","D",..: 8 22 25 3 5 21 23 12 5 8 ...
      ..$ Quantidade: num [1:25] 4 4 4 4 4 4 4 4 4 4 ...

I recommend keeping the four data.frames in the list; it is easier and more organized to work with. But if you want them as separate data.frames in the global environment, give the list elements valid names and use list2env:

names(tab_split) <- paste0("df", seq_along(tab_split))
list2env(tab_split, envir = globalenv())
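The question title also asks about saving each part to its own directory. Below is a minimal sketch, assuming CSV output is acceptable. The folder names quantidade_1 to quantidade_4, the file name dados.csv and the use of tempdir() as the base folder are hypothetical choices for illustration:

```r
# Example data shaped like the question: quantities 1 to 4 in blocks of 20, 25, 30 and 25
tab <- data.frame(Nome = sample(LETTERS, 100, replace = TRUE),
                  Quantidade = rep(1:4, times = c(20, 25, 30, 25)))
tab_split <- split(tab, tab$Quantidade)

# Hypothetical base folder; replace tempdir() with your own path
base_dir <- file.path(tempdir(), "split_output")

for (q in names(tab_split)) {
  # One subfolder per quantity (hypothetical naming), e.g. quantidade_1
  out_dir <- file.path(base_dir, paste0("quantidade_", q))
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  # Write that piece as a CSV file, without row names
  write.csv(tab_split[[q]], file.path(out_dir, "dados.csv"), row.names = FALSE)
}

# List the files that were written
list.files(base_dir, recursive = TRUE)
```

Looping over names(tab_split) keeps the folder names tied to the actual quantity values rather than to positions in the list.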
    
09.07.2017 / 01:08
2
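# Example data: 100 random names with quantities 1 to 4 (20, 25, 30 and 25 rows)
tab <- data.frame("Nome" = sample(LETTERS, 100, replace = TRUE),
                  "Quantidade" = c(rep(1, 20), rep(2, 25), rep(3, 30), rep(4, 25)))
# Keep only the rows of each quantity in its own data frame
tab1 <- tab[which(tab$Quantidade == 1), ]
tab2 <- tab[which(tab$Quantidade == 2), ]
tab3 <- tab[which(tab$Quantidade == 3), ]
tab4 <- tab[which(tab$Quantidade == 4), ]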
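If the set of quantities is not known in advance, the repeated lines can be generalized with a loop. A sketch, rebuilding tab the same way; note that assign() writes the objects into the current environment, which is why many R users prefer to keep the pieces in a list instead:

```r
tab <- data.frame(Nome = sample(LETTERS, 100, replace = TRUE),
                  Quantidade = rep(1:4, times = c(20, 25, 30, 25)))

# Create tab1, tab2, ... for every quantity present, instead of one line per value
for (q in sort(unique(tab$Quantidade))) {
  assign(paste0("tab", q), tab[tab$Quantidade == q, ])
}

nrow(tab3)
# [1] 30
```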
    
07.07.2017 / 22:00
2

Two other ways to solve the problem. The first one uses the dplyr package:

library(dplyr)
tab01 <- tab %>%
  filter(Quantidade==1)
tab02 <- tab %>%
  filter(Quantidade==2)
tab03 <- tab %>%
  filter(Quantidade==3)
tab04 <- tab %>%
  filter(Quantidade==4)

The second uses the subset function:

tab01 <- subset(tab, Quantidade==1)
tab02 <- subset(tab, Quantidade==2)
tab03 <- subset(tab, Quantidade==3)
tab04 <- subset(tab, Quantidade==4)
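The four subset() calls can also be collected into a named list with lapply, so the code does not have to change if a new quantity appears. A sketch assuming the same kind of tab; the names tab01 to tab04 mirror the ones above:

```r
tab <- data.frame(Nome = sample(LETTERS, 100, replace = TRUE),
                  Quantidade = rep(1:4, times = c(20, 25, 30, 25)))

# Distinct quantities, in increasing order
quantities <- sort(unique(tab$Quantidade))

# One subset() call per quantity, collected in a list named tab01, tab02, ...
tab_list <- lapply(quantities, function(q) subset(tab, Quantidade == q))
names(tab_list) <- sprintf("tab%02d", quantities)

# Rows per piece
sapply(tab_list, nrow)
```

The result is equivalent to split(tab, tab$Quantidade) apart from the list names.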
    
08.07.2017 / 01:54