Is it possible to pair values from two dataframes with different numbers of observations?

3

I have two data frames:

Sexo <- rep(1:2 , length.out = 51)
Estudo <- rep(1: 17, length.out = 51)
Salário <- c(runif(51, min=900, max=3000))

data1 <- data.frame(Sexo, Estudo, Salário)

data2 <-  data.frame(TaxaHomens = c(seq(0.1,0.99,length=17)), 
                TaxaMulheres = c(seq(0.2,0.99,length=17)), 
                Estudo = c(1:17))

The variable Sex at date1 is classified as 1 for men and 2 for women, Study corresponds to the person's years of studies and salary for the monthly earnings.

Is it possible to create a column called Rate on the first date frame so that the rates of the second date frame are organized according to Sex and Study?

For example, in the new data1 column I want to have Sex = 1 and Study = 3 a rate of 0.211250, for all observations of that type, and so on.

    
asked by anonymous 16.03.2018 / 01:44

3 answers

4

Using the dplyr and tidyr packages:

library(dplyr)
library(tidyr)
data2 <- data2 %>% 
  gather(Sexo, Taxa, TaxaHomens:TaxaMulheres) 
data2$Sexo <- ifelse(data2$Sexo == "TaxaHomens", 1, 2)
left_join(data1, data2, by = c("Sexo", "Estudo"))

What I did was transform your data2 so that it was only with a rate column (men and women in the same column).

    
16.03.2018 / 13:00
0

Maybe it has a more efficient way, but if you do not have many rows it will not be a problem:

for(i in 1: dim(data1)[1]) {
  data1$Taxa[i] <- data2[which(data2$Estudo == data1$Estudo[i]), data1$Sexo[i]]
}
    
16.03.2018 / 03:21
0

Using base R:

Sexo <- rep(1:2 , length.out = 51)
Estudo <- rep(1: 17, length.out = 51)
Salario <- c(runif(51, min=900, max=3000))

data1 <- data.frame(Sexo, Estudo, Salario)

data2 <-  data.frame(TaxaHomens = c(seq(0.1,0.99,length=17)), 
                TaxaMulheres = c(seq(0.2,0.99,length=17)), 
                Estudo = c(1:17))


data1$Fator = data1$Estudo
data1$Fator = factor(data1$Fator)
levels(data1$Fator) = data2$TaxaHomens
data1$Fator = as.numeric(as.character(data1$Fator))
head(data1)

  Sexo Estudo   Salario    Fator
1    1      1 2287.2813 0.100000
2    2      2 2845.6058 0.155625
3    1      3 2606.5139 0.211250
4    2      4  911.2628 0.266875
5    1      5 2658.2753 0.322500
6    2      6 2082.1462 0.378125

The key is to create a column of factors with the same values in the Study column and replace the levels with the values in the Men's Tax or Women's column. Important that this only works because in data2 there are no duplicate data.

    
20.03.2018 / 14:24