Date frame manipulation

2

Hello,

I'm trying to manipulate a database with 300+ observations, but each observation has a different number of columns. The problem is that from a certain point, the columns start repeating themselves, but they have different values and I wanted to somehow put them together.

What I want essentially is, within an object, to save the information of each observation in a way that helps in the visualization.

For example:

            a   b   c   d    e  c   d   e   c     d    e
    Obs_1   AA  BB  CC  8.3  A  SSD 2.3 RN  S76   5    A
    Obs_2   SS  DA  LL  5    A  SDD         AD23  8.2  A

And I would like to transform into something like this:

         a   b    c      d   e
Obs_1   AA  BB  CC76    8.3  A
                SSD89   2.3  RN
                S76      5   A
Obs_2   SS  DA  LL       5   A
                SDD     
                AD23    8.2  A

Is it possible to do this in a non-manual way in R? Note also that there are values in the database that have no value, some values the R complete with NA, others it simply leaves blank.

    
asked by anonymous 28.03.2018 / 21:08

1 answer

2

See if this code helps. I'm testing with mtcars . You need to install packages dplyr , tidyr and janitor

mtcars
# renomeia as colunas para ficar parecido com seu exemplo
names(mtcars) <- c("a", "b", "c", "d", "b", "c", "d", "a", "b", "c", "d")

mtcars %>% 
  slice(c(1, 2))

# A tibble: 2 x 11
      a     b     c     d     b     c     d     a     b     c     d
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1   21.    6.  160.  110.    6.  160.  110.   21.    6.  160.  110.
2   21.    6.  160.  110.    6.  160.  110.   21.    6.  160.  110.


res <- mtcars %>% 
  # pega as primeiras linhas para ficar parecido com seu exemplo
  head(2) %>% 
  # transforma os nomes em unicos
  janitor::clean_names() %>% 
  # empilha todos os dados
  gather() %>% 
  # separa as colunas respectivas e os respectivos índices
  separate(key, c("letra", "numero"), sep = "_", fill = "right") %>% 
  # substitui NA por 1 no caso da primeira separação
  replace_na(list(numero = "1")) %>% 
  # completa as combinacoes de indices e letras que nao apareceram
  complete(letra, numero) %>% 
  # essa parte nao sei se precisa 
  group_by(letra, numero) %>% 
  summarise(res = first(value)) %>% 
  ungroup() %>% 
  # espalha as letras nas colunas
  spread(letra, res)


res

# A tibble: 3 x 5
  numero     a     b      c      d
  <chr>  <dbl> <dbl>  <dbl>  <dbl>
1 1        21.  6.00 160.   110.  
2 2         0.  3.90   2.62  16.5 
3 3        NA   1.00   4.00   4.00
    
30.03.2018 / 07:08