Creating a dataframe based on two other dataframes using dplyr in R

4

These are my dataframes:

df<-    as.data.frame(matrix(rexp(200),, 25))
colnames(df)<-c("A","B","C","D","E","F","G","H","I","J",
        "K","L","M","N","O","P","Q","R","S","T",
        "U","V","X","Z","W")


df.new<-as.data.frame(matrix(rexp(200),,20))
colnames(df.new)<-c("A.B","A.B.new",
            "A.C","A.C.new",
            "A.F","A.F.new",
                "B.C","B.C.new",
                    "C.D","C.D.new",
                    "F.G","F.G.new",
                "H.I","H.I.new",
                    "H.K","H.K.new",
                    "L.M","L.M.new",
            "N.Q","N.Q.new")
df.new<-df.new[-c(9:10),]


finaldf<-cbind(df,df.new)
colnames(finaldf)<-c("A","B","C","D","E","F","G","H","I","J",
        "K","L","M","N","O","P","Q","R","S","T",
        "U","V","X","Z","W","A.B","A.B.new",
            "A.C","A.C.new",
            "A.F","A.F.new",
                "B.C","B.C.new",
                    "C.D","C.D.new",
                    "F.G","F.G.new",
                "H.I","H.I.new",
                    "H.K","H.K.new",
                    "L.M","L.M.new",
            "N.Q","N.Q.new")

I'd like to have the dataframe finaldf sort the columns like this:

colnames(finaldf)<-c("A","A.B","A.B.new",
            "A","A.C","A.C.new",
            "A","A.F","A.F.new",
            "B","B.C","B.C.new",
            "C","C.D","C.D.new",
            "F","F.G","F.G.new",
            "H","H.I","H.I.new",
            "H","H.K","H.K.new",
            "L","L.M","L.M.new",
            "N","N.Q","N.Q.new")

As my original dataframe is much larger, I may need a more robust code, which runs away from my ability because I'm new to R.

Notice that the idea is to simply grab columns from the df dataframe and play in the df.new dataframe. But these columns should fit in the order established by the df.new dataframe.

And I would like to do this using the dplyr package. Is it possible?

Edited:

Well, my original code comes with the names of the bovespa bolts:

nms.new<-c("ABEV3.BBAS3", "ABEV3.BBAS3.new", "ABEV3.BRAP4", "ABEV3.BRAP4.new", 
"ABEV3.BRKM5", "ABEV3.BRKM5.new", "ABEV3.CSAN3", "ABEV3.CSAN3.new", 
"ABEV3.CSNA3", "ABEV3.CSNA3.new", "ABEV3.CYRE3", "ABEV3.CYRE3.new", 
"ABEV3.DTEX3", "ABEV3.DTEX3.new", "ABEV3.ELPL4", "ABEV3.ELPL4.new", 
"ABEV3.EVEN3", "ABEV3.EVEN3.new", "ABEV3.FIBR3", "ABEV3.FIBR3.new", 
"ABEV3.GGBR4", "ABEV3.GGBR4.new", "ABEV3.GOAU4", "ABEV3.GOAU4.new", 
"ABEV3.HYPE3", "ABEV3.HYPE3.new", "ABEV3.JBSS3", "ABEV3.JBSS3.new","SANB11.BBAS3","SANB11.BBAS3.new")

I would need to separate what comes before and after the point and not 1 in 1. Sometimes it comes 5 letters before the point, other times it comes 6 letters.

    
asked by anonymous 23.09.2018 / 04:36

1 answer

3

Here it goes. The trick is to group by the first few characters of names(finaldf) and then process the list of groups.

First we choose the names that interest.

nms <- names(finaldf)
nms <- nms[sapply(nms, nchar) > 1]

Now group by the first 3 characters.

sp <- split(nms, substr(nms, 1, 3))
nms <- lapply(sp, function(s) c(substr(s[1], 1, 1), s))
nms <- unlist(nms)
result <- finaldf[nms]

The result result has the dimensions of the question.

dim(result)
#[1]  8 30

But be careful, since there can not be repeated names, there will be A , A.1 and A.2 . And the same thing for any others where there are repetitions, for example H.I and H.K .

names(result)
# [1] "A"       "A.B"     "A.B.new" "A.1"     "A.C"     "A.C.new"
# [7] "A.2"     "A.F"     "A.F.new" "B"       "B.C"     "B.C.new"
#[13] "C"       "C.D"     "C.D.new" "F"       "F.G"     "F.G.new"
#[19] "H"       "H.I"     "H.I.new" "H.1"     "H.K"     "H.K.new"
#[25] "L"       "L.M"     "L.M.new" "N"       "N.Q"     "N.Q.new"
    
23.09.2018 / 17:43