Fill column of a data frame with data from another date frame in R

6

I have the following df (data1):

ITEM    CLASSIFICACAO
123     AZUL
456     AMARELO
789 
234     VERDE
345     PRETO
456 
567 
678     ROSA

I need to fill in the blank lines of the CLASSIFICACAO column using another data frame:

ITEM    CLASSIFICACAO
789     LARANJA
456     MARROM
567     BRANCO
100     CASA
200     BOLA

How do I fill in the blank df lines (data_1)? Thankful

    
asked by anonymous 14.09.2018 / 22:02

3 answers

5

I would do this using the dplyr package. With dplyr you will combine simple operations until you achieve the result you want:

First, the databases:

dados1 <- data.frame(
  ITEM = c(123,456,789,234,345,456,567,678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"), stringsAsFactors = FALSE)
dados2 <- data.frame(
  ITEM = c(789, 456, 567, 100, 200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"), stringsAsFactors = F)

Now let's go:

dados1 %>% 
  filter(is.na(CLASSIFICACAO)) %>% # pegamos só as linhas vazias
  select(-CLASSIFICACAO) %>% # tiramos a variável classificação
  left_join(dados2, by = "ITEM") %>% # fazemos o join com o outro bd
  bind_rows(dados1 %>% filter(!is.na(CLASSIFICACAO))) # empilhamos os dois
    
15.09.2018 / 00:08
4

Ideally you should have made your database available (via the dput function), or a part of it at least.

With this example you passed, if these blank lines are NA , you can use the FillIn function of the DataCombine package.

dados1 <- data.frame(
  ITEM = c(123,456,789,234,345,456,567,678),
  CLASSIFICACAO = c("AZUL", "AMARELO", NA, "VERDE", "PRETO", NA, NA, "ROSA"))
dados2 <- data.frame(
  ITEM = c(789,456,567,100,200),
  CLASSIFICACAO = c("LARANJA", "MARROM", "BRANCO", "CASA", "BOLA"))

DataCombine::FillIn(dados1, dados2, Var1 = "CLASSIFICACAO", Var2 = "CLASSIFICACAO",
                    KeyVar = "ITEM")

  ITEM CLASSIFICACAO
1  123          AZUL
2  234         VERDE
3  345         PRETO
4  456       AMARELO
5  456        MARROM
6  567        BRANCO
7  678          ROSA
8  789       LARANJA
    
14.09.2018 / 22:31
3

Here is another example with a code that does not use dependencies (only with the base of the R).

In your example, 456 is the same code used for YELLOW and BROWN. I created another code (457 for BROWN) to avoid duplicates (but I do not know if that was your intention).

First, I define the rule to complete the missing data with a match of the NAs in df1 that may be present in df2. And then I apply the rule on the dataframe df1

df1 <- data.frame(ITEM = c(123,456,789,234,345,457,567,678),
             CLASS = c("AZUL","AMARELO",NA,"VERDE","PRETO",NA,NA,"ROSA"), 
             stringsAsFactors = FALSE)
df2 <- data.frame(ITEM = c(457, 567,100,200, 789),
              CLASS = c("MARROM","BRANCO","CASA","BOLA","LARANJA"), 
              stringsAsFactors = FALSE)

r <- match(df1[is.na(df1$CLASS), "ITEM"], df2$ITEM)

df1[is.na(df1$CLASS), "CLASS" ] <- df2[r, "CLASS"]

print(df1)

  ITEM   CLASS
1  123    AZUL
2  456 AMARELO
3  789 LARANJA
4  234   VERDE
5  345   PRETO
6  457  MARROM
7  567  BRANCO
8  678    ROSA
    
17.09.2018 / 20:41