Columns with different line numbers, how to join?

4

I have some time series (monthly) that end in different months, for example, some end in March 2017, others in April 2017 and some until May 2017,

228 2016-12-01      1.12
229 2017-01-01      1.09
230 2017-02-01      0.87
231 2017-03-01      1.05
232 2017-04-01      0.79
233 2017-05-01      0.08



    data       v7827
206 2016-10-01 12.36
207 2016-11-01 12.37
208 2016-12-01 11.84
209 2017-01-01 11.11
210 2017-02-01 10.53
211 2017-03-01  9.85



  data          v3698
227 2016-11-01 3.3420
228 2016-12-01 3.3523
229 2017-01-01 3.1966
230 2017-02-01 3.1042
231 2017-03-01 3.1279
232 2017-04-01 3.1362

I need to merge them into a single data.frame, but using a cbind results in an error because the number of lines is different. How can I make a join between these tables to turn them into one in a simplified way? No need to be adding values on empty dates? The expected end result is this:

                            v7827   v3698
228 2016-12-01      1.12    11.84   3.3523
229 2017-01-01      1.09    11.11   3.1966
230 2017-02-01      0.87    10.53   3.1042
231 2017-03-01      1.05    9.85    3.1279
232 2017-04-01      0.79            3.1362
233 2017-05-01      0.08    
    
asked by anonymous 04.05.2017 / 20:58

3 answers

2

You can join several bases with only the base R. Since the merge function only accepts two data.frame 's when more than two have to be put in a list and the Reduce function is called merge .

With the example of the question data, I will assume that the bases are in .GlobalEnv and therefore I will first create a list with them.

df_list <- ls(pattern = "^df\d+$")  # Encontra os nomes das df's
df_list <- mget(df_list)             # Vai buscar os objetos com esses nomes

Now it's just a line of code.

Reduce(function(x, y) merge(x, y, all = TRUE), df_list)
#        data v1234 v7827  v3698
#1 2016-10-01    NA 12.36     NA
#2 2016-11-01    NA 12.37 3.3420
#3 2016-12-01  1.12 11.84 3.3523
#4 2017-01-01  1.09 11.11 3.1966
#5 2017-02-01  0.87 10.53 3.1042
#6 2017-03-01  1.05  9.85 3.1279
#7 2017-04-01  0.79    NA 3.1362
#8 2017-05-01  0.08    NA     NA

Data.

Note that in the question the first df has no column header, I used data and v followed by any number.

df1 <- read.table(text = "
    data           v1234
228 2016-12-01      1.12
229 2017-01-01      1.09
230 2017-02-01      0.87
231 2017-03-01      1.05
232 2017-04-01      0.79
233 2017-05-01      0.08
", header = TRUE)

df2 <- read.table(text = "
    data       v7827
206 2016-10-01 12.36
207 2016-11-01 12.37
208 2016-12-01 11.84
209 2017-01-01 11.11
210 2017-02-01 10.53
211 2017-03-01  9.85
", header = TRUE)

df3 <- read.table(text = "
  data          v3698
227 2016-11-01 3.3420
228 2016-12-01 3.3523
229 2017-01-01 3.1966
230 2017-02-01 3.1042
231 2017-03-01 3.1279
232 2017-04-01 3.1362
", header = TRUE)

df1$data <- as.Date(df1$data)
df2$data <- as.Date(df2$data)
df3$data <- as.Date(df3$data)
    
21.08.2018 / 09:48
4

I have simulated some bases to illustrate what you want, please let me know if there is any inconsistency:

The bases with 3 and 4 lines, where the equal values are a1 and a2 (in their case are the dates):

df1 <- data.frame(
  data=c("a1", "a2", "a3"),
  b=c("b1", "b2", "b3"),
  stringsAsFactors = F
)

df2 <- data.frame(
  data=c("a1", "a2", "a4", "a5"),
  c=c("c1", "c2", "c3", "c4"),
  stringsAsFactors = F
)

Now I extract the single dates and I make a left_join() of dplyr to add the columns:

datas <- as.data.frame(unique(c(df1$data, df2$data)), stringsAsFactors = F)
colnames(datas) <- "data"

df <- left_join(datas, df1) %>%
  left_join(., df2)
    
04.05.2017 / 21:14
1

Run:

x<-c(1:9)
y<-c(1:10)

library(qpcR)
qpcR:::cbind.na(x,y)

or

qpcR:::rbind.na(x,y) # para linhas
    
21.08.2018 / 04:50