How to transform rows into columns, when the number of rows is variable in R

Question

How to transform rows into columns, when the number of rows is variable in R

Navigation

#1 by (1 votes)

2

I have a Data Frame like this:

Itneedstolooklikethis:

Thevariablethatdefinesthecolumncanbecalculatedaccordingtotheimagebelow

The sequence starts with every change of the id, in excel it would be easy but I want to learn how to do it in R.

r

asked by anonymous 14.04.2016 / 01:40

1 answer

Problem with query in php because of quotation marks Query mysql with subquery in NodeJs

score 1 · Answer 1

William, what you want to do is to pass the data.frame from long format to wide format.

Constructing the variable col according to the ID change, would look like this.

Reproducing your original data.frame:

df <- data.frame(ID = c(rep(313721, 3),
                        rep(313718, 2),
                        rep(313729, 6)),
                 COD_pr = c(205073,
                            176779,
                            191991,
                            198089,
                            201429,
                            167489,
                            119926,
                            170093,
                            170363,
                            123486,
                            158028))

Calculating the column according to ID change.

df$col <- unlist(sapply(rle(df$ID)$lengths, seq, from = 1))

Passing the data.frame to the wide format.

library(tidyr)
spread(df, col, COD_pr)
      ID      1      2      3      4      5      6
1 313718 198089 201429     NA     NA     NA     NA
2 313721 205073 176779 191991     NA     NA     NA
3 313729 167489 119926 170093 170363 123486 158028

library(reshape2)
dcast(df, ID~col, value.var = "COD_pr")
      ID      1      2      3      4      5      6
1 313718 198089 201429     NA     NA     NA     NA
2 313721 205073 176779 191991     NA     NA     NA
3 313729 167489 119926 170093 170363 123486 158028

Previous answer

However, in the example you gave, you would lack the variable "col", that is, we need to know how to define in which column each COD_pr variable comes in so we can distribute the values the way you want them.

Assuming this "col" variable exists, it is easy to do this with both tidyr and reshape2 . Here is an illustrative example, similar to your data:

set.seed(10)
df <- data.frame(ID = rep(c(1,2,3), 4),
                 col = c(rep("col1", 3), 
                         rep("col2", 2),
                         rep("col3", 1), 
                         rep("col4", 1),
                         rep("col5", 2),
                         rep("col6", 3)),
                 COD_pr = rnorm(12))


library(tidyr)
spread(df, col, COD_pr)
  ID        col1       col2      col3      col4      col5       col6
1  1  0.01874617 -0.5991677        NA -1.208076        NA -0.2564784
2  2 -0.18425254  0.2945451        NA        NA -0.363676  1.1017795
3  3 -1.37133055         NA 0.3897943        NA -1.626673  0.7557815

library(reshape2)
dcast(df, ID~col, value.var = "COD_pr")
  ID        col1       col2      col3      col4      col5       col6
1  1  0.01874617 -0.5991677        NA -1.208076        NA -0.2564784
2  2 -0.18425254  0.2945451        NA        NA -0.363676  1.1017795
3  3 -1.37133055         NA 0.3897943        NA -1.626673  0.7557815