Look up values in one data.frame and add in another (R)

10

I have 2 data.frames, the first is a data.frame that contains stock data and a column with a unique identifier (column "ISIN") , as shown below:

>
Teste=data.frame(matrix(runif(20), nrow=5, ncol=4))
Teste$ISIN <- c("A1","A2","A3","A4","A5")
colnames(Teste) <- c("AVG_VOLUME","AVG_RETURN","VOL","PRICE","ISIN")

          AVG_VOLUME AVG_RETURN    VOL         PRICE  ISIN
 Stock 1  0.7028197  0.09264265    0.002169411 100    A1
 Stock 2  0.7154557  0.03314615    0.004839466 100    A2
 Stock 3  0.4038030  0.04347487    0.003441471 100    A3
 Stock 4  0.5392530  0.06414982    0.004482528 100    A4
 Stock 5  0.8720084  0.09615865    0.008081017 100    A5

My second data.frame has an "ISIN" column but they are not unique, as shown below:

Teste2 <- data.frame(matrix(runif(10), nrow=5, ncol=2))
Teste2$ISIN <- c("A1","A1","A3","A2","A2")

  X1        X2          ISIN
1 0.0273074 0.6829592   A1
2 0.1928437 0.3768154   A1
3 0.9693224 0.3331828   A3
4 0.9434274 0.1549707   A2
5 0.6211476 0.3360101   A2

What I would like to do is fetch in the data.frame Test the value of the column "AVG_VOLUME" corresponding to the "ISIN" of data.frame Test2 and add this information as a new column in Test2, resulting in:

  X1        X2          ISIN  AVG_VOLUME
1 0.0273074 0.6829592   A1    0.7028197
2 0.1928437 0.3768154   A1    0.7028197
3 0.9693224 0.3331828   A3    0.4038030
4 0.9434274 0.1549707   A2    0.7154557
5 0.6211476 0.3360101   A2    0.7154557

What is the best way to get this result, type a PROCV in excel?

    
asked by anonymous 19.04.2016 / 21:30

2 answers

8

Using dplyr you can do this:

> library(dplyr)
> Teste3 <- left_join(Teste2, Teste %>% select(ISIN, AVG_VOLUME))
Joining by: "ISIN"
> Teste3
         X1        X2 ISIN AVG_VOLUME
1 0.5896936 0.1847698   A1  0.3473872
2 0.9254054 0.1172393   A1  0.3473872
3 0.3235616 0.3382006   A3  0.1330457
4 0.8382510 0.6392993   A2  0.9750695
5 0.3537203 0.7187851   A2  0.9750695

You could specify the argument by in the left_join function if the columns in the two data.frames did not have the same name:

Teste3 <- left_join(Teste2, Teste %>% select(ISIN, AVG_VOLUME), 
                    by = c("ISIN" = "ISIN"))

If you wanted to bring all the columns of the other bank, you could do the following:

> Teste3 <- left_join(Teste2, Teste, 
+                     by = c("ISIN" = "ISIN"))
> Teste3
         X1        X2 ISIN AVG_VOLUME AVG_RETURN       VOL     PRICE
1 0.5896936 0.1847698   A1  0.3473872  0.1309524 0.6389297 0.3433055
2 0.9254054 0.1172393   A1  0.3473872  0.1309524 0.6389297 0.3433055
3 0.3235616 0.3382006   A3  0.1330457  0.2889567 0.3713113 0.8803770
4 0.8382510 0.6392993   A2  0.9750695  0.1394496 0.5789795 0.7136534
5 0.3537203 0.7187851   A2  0.9750695  0.1394496 0.5789795 0.7136534

I think it's cool to leave the bechmark of solutions here as well. The solution using dplyr is about 2x faster as per the table below:

> microbenchmark::microbenchmark(
+   merge =  merge(Teste, Teste2, by = "ISIN"),
+   left_join = left_join(Teste2, Teste, by = "ISIN")  
+ )
Unit: microseconds
      expr     min       lq     mean  median      uq       max neval
     merge 738.565 773.4590 927.1096 792.249 816.673 13732.206   100
 left_join 287.354 307.5555 323.3633 316.747 327.374   671.206   100
    
19.04.2016 / 21:54
7

merge is the function you are looking for - it can join two data frames. Since you only want a column of the first date frame, you can filter it to just have it:

Teste=data.frame(matrix(runif(20), nrow=5, ncol=4))
Teste$ISIN <- c("A1","A2","A3","A4","A5")
colnames(Teste) <- c("AVG_VOLUME","AVG_RETURN","VOL","PRICE","ISIN")

Teste2 <- data.frame(matrix(runif(10), nrow=5, ncol=2))
Teste2$ISIN <- c("A1","A1","A3","A2","A2")

Teste3 <- merge(Teste[,c("ISIN", "AVG_VOLUME")], Teste2, by = "ISIN")
    
19.04.2016 / 21:39