Compare rows between two data frames?

3

Consider these two data frames:

q1 <- data.frame(COD = 1:5, CLAS=letters[1:5])
q2 <- data.frame(COD = c(25,1,31,3,2), CLAS=c(45,letters[1],100,letters[3],letters[10]))

I need to know which rows the two data frames have in common, where a row counts as common only if every column matches. Is there a function that returns the indexes of the matching rows, from q1 to q2 or vice versa?

Going from q1 to q2, rows 1 and 3 of q1 appear as rows 2 and 4 of q2. Equivalently, rows 2 and 4 of q2 are equal to rows 1 and 3 of q1.

How can I make this comparison?

    
asked by anonymous 08.08.2016 / 22:57

2 answers

2

I do not know whether you need the indexes. If a data.frame with the rows in common is enough, one option is inner_join from dplyr:

> q1 <- data.frame(COD = 1:5, CLAS = letters[1:5], stringsAsFactors = FALSE)
> q2 <- data.frame(COD = c(25, 1, 31, 3, 2), CLAS = c(45, letters[1], 100, letters[3], letters[10]), stringsAsFactors = FALSE)
> library(dplyr)
> inner_join(q1, q2)
Joining by: c("COD", "CLAS")
  COD CLAS
1   1    a
2   3    c

By default, inner_join uses all columns that have the same name in both data.frames. If you do not want to use all of them, pass the by argument. Read more at: help("join").
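As a minimal sketch of the by argument, the call below joins on COD only. The choice of COD as the single key is just an illustration, not something the question asks for. Because CLAS is no longer part of the key, dplyr keeps both versions of it with the suffixes .x and .y.

```r
library(dplyr)

q1 <- data.frame(COD = 1:5, CLAS = letters[1:5], stringsAsFactors = FALSE)
q2 <- data.frame(COD = c(25, 1, 31, 3, 2),
                 CLAS = c(45, letters[1], 100, letters[3], letters[10]),
                 stringsAsFactors = FALSE)

# Join on COD only (illustrative choice); CLAS is not compared,
# so it shows up twice in the result, as CLAS.x (q1) and CLAS.y (q2).
por_cod <- inner_join(q1, q2, by = "COD")
por_cod
```

Here the row with COD 2 is also returned, even though its CLAS differs between q1 and q2, so joining on a subset of columns no longer answers the "entire row must be the same" requirement.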

You can adapt this code to also record the row indexes of the matches in each data set, although the code is no longer as elegant:

> inner_join(
+   q1 %>% mutate(id_q1 = row_number()), 
+   q2 %>% mutate(id_q2 = row_number())
+   )
Joining by: c("COD", "CLAS")
  COD CLAS id_q1 id_q2
1   1    a     1     2
2   3    c     3     4
    
09.08.2016 / 02:27

3

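The code below works, and I expect it to stay fast even if your data frames are fairly large. It stacks q1 on top of q2, flags every row that repeats an earlier one, and subtracts the number of rows in q1, which gives the indexes in q2 of the rows that also exist in q1 (here 2 and 4). It assumes that neither data frame contains repeated rows of its own.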

q1 <- data.frame(COD = 1:5, CLAS=letters[1:5])
q2 <- data.frame(COD = c(25,1,31,3,2), CLAS=c(45,letters[1],100,letters[3],letters[10]))

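# Stack both data frames; the rows of q2 come after the rows of q1
q <- rbind(q1, q2)

# TRUE for every row that repeats an earlier row
duplicados <- duplicated(q)

# Subtract nrow(q1) to turn positions in q into row indexes of q2
which(duplicados) - nrow(q1)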
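If the indexes are needed on both sides, one possible variation (a sketch, not part of the code above) is to build a text key per row and use match. The helper row_key and the variables pos_in_q2, idx_q1 and idx_q2 are names made up for this example, and comparing rows as pasted text is an assumption that suits simple integer and character columns like these.

```r
q1 <- data.frame(COD = 1:5, CLAS = letters[1:5], stringsAsFactors = FALSE)
q2 <- data.frame(COD = c(25, 1, 31, 3, 2),
                 CLAS = c(45, letters[1], 100, letters[3], letters[10]),
                 stringsAsFactors = FALSE)

# Hypothetical helper: paste all columns of a row into a single string
row_key <- function(df) do.call(paste, c(unname(as.list(df)), sep = "\r"))

# For each row of q1, the position of the first identical row in q2 (NA if none)
pos_in_q2 <- match(row_key(q1), row_key(q2))

idx_q1 <- which(!is.na(pos_in_q2))  # rows of q1 that also exist in q2
idx_q2 <- pos_in_q2[idx_q1]         # where those rows sit in q2

idx_q1
idx_q2
```

Unlike the duplicated approach, this does not mistake a row repeated inside a single data frame for a match between the two, although match only reports the first matching position in q2.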
    
08.08.2016 / 23:21