Add string in R

Question


I have the following data set:

cidade      a                 b         c
AGRONOMICA  CRESO             NA        NA
AGRONOMICA  NA                SICOOB    NA
ALFREDO     CREDIVERTENTES    NA        NA
ALMIRANTE   SICOPER           NA        NA
ALMIRANTE   NA                SICRED    NA
ALTO        SICOPER           NA        NA
ALTO        NA                SICOOB    NA
ALTO        NA                NA        SICRED

The idea is to collapse the data set by cidade so that it looks like this:

cidade      a                 b         c
AGRONOMICA  CRESO             SICOOB    NA
ALFREDO     CREDIVERTENTES    NA        NA
ALMIRANTE   SICOPER           SICRED    NA    
ALTO        SICOPER           SICOOB    SICRED

aggregate appears to require numeric values. How can I do this with these nominal (character) variables?

r

asked by anonymous 08.01.2018 / 21:29

3 answers

score 0

The purrrlyr package makes @Rui's excellent solution simpler:

library(dplyr)
library(purrrlyr)
library(zoo)

# Fill NAs within each cidade, then keep one row per cidade
dados %>% 
  group_by(cidade) %>% 
  dmap(na.locf) %>% 
  distinct(cidade, .keep_all = TRUE)

13.01.2018 / 16:13

score 0

Without needing anything other than dplyr:

library(dplyr)

dados %>%
  group_by(cidade) %>%
  summarise_all(function(x) {
    # Keep the first non-NA value of each column within the group
    res <- x[!is.na(x)]
    if (length(res) == 0) NA_character_ else res[1]
  })
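
For comparison, the same "first non-NA value per group" rule can also be sketched in base R. aggregate() accepts any summary function through FUN, not only numeric ones, so it works on character columns as well. This is only a sketch: first_non_na is a hypothetical helper name, and dados is rebuilt here so the snippet runs on its own. The data.frame interface is used rather than the formula interface, because the formula method drops rows containing NA by default.

```r
# Same data as in the question, rebuilt so the snippet is self-contained
dados <- data.frame(
  cidade = c("AGRONOMICA", "AGRONOMICA", "ALFREDO", "ALMIRANTE",
             "ALMIRANTE", "ALTO", "ALTO", "ALTO"),
  a = c("CRESO", NA, "CREDIVERTENTES", "SICOPER", NA, "SICOPER", NA, NA),
  b = c(NA, "SICOOB", NA, NA, "SICRED", NA, "SICOOB", NA),
  c = c(NA, NA, NA, NA, NA, NA, NA, "SICRED"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA value of x, or NA if there is none
first_non_na <- function(x) {
  res <- x[!is.na(x)]
  if (length(res) == 0) NA_character_ else res[1]
}

# Apply the helper to columns a, b and c within each cidade
res <- aggregate(dados[-1], by = list(cidade = dados$cidade), FUN = first_non_na)
res
```

Note that if a group had two different non-NA values in the same column, this rule would silently keep only the first one.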

15.01.2018 / 19:00


score 3 · Accepted Answer

I believe this code answers the question, but note one thing: in the desired result given in the question, the ALFREDO row has column b equal to SICRED, whereas in the input table it is NA. This code therefore keeps the NA value in the result.

res <- lapply(split(dados, dados$cidade), zoo::na.locf)
res <- lapply(res, zoo::na.locf, fromLast = TRUE)
res <- do.call(rbind, res)
res <- res[!duplicated(res), ]
row.names(res) <- NULL
res
#      cidade              a      b      c
#1 AGRONOMICA          CRESO SICOOB   <NA>
#2    ALFREDO CREDIVERTENTES   <NA>   <NA>
#3  ALMIRANTE        SICOPER SICRED   <NA>
#4       ALTO        SICOPER SICOOB SICRED

Explanation.
Step by step, the code above works as follows.

First, split divides the input data.frame by cidade.

Next, the na.locf function of the zoo package is applied to each sub-data.frame to carry the last non-NA value forward.

Then the same is done in the opposite direction (fromLast = TRUE), carrying the next non-NA value backward.

The sub-data.frames are then joined back together with do.call/rbind.

Only the non-duplicated rows are kept.

Finally, the result has row names; setting them to NULL renumbers the rows consecutively.
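
The steps above can also be sketched without the zoo dependency. This is not how na.locf is implemented; fill_down and fill_both are hypothetical helpers written only to illustrate the forward and backward fill, and dados is rebuilt here so the snippet runs on its own.

```r
dados <- data.frame(
  cidade = c("AGRONOMICA", "AGRONOMICA", "ALFREDO", "ALMIRANTE",
             "ALMIRANTE", "ALTO", "ALTO", "ALTO"),
  a = c("CRESO", NA, "CREDIVERTENTES", "SICOPER", NA, "SICOPER", NA, NA),
  b = c(NA, "SICOOB", NA, NA, "SICRED", NA, "SICOOB", NA),
  c = c(NA, NA, NA, NA, NA, NA, NA, "SICRED"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(x) {
  idx  <- cumsum(!is.na(x))                    # number of non-NA values seen so far
  vals <- c(x[NA_integer_], x[!is.na(x)])      # typed NA first, then the non-NA values
  vals[idx + 1]                                # idx 0 (leading NAs) stays NA
}

# Hypothetical helper: fill forward, then backward
fill_both <- function(x) rev(fill_down(rev(fill_down(x))))

# Split by cidade and fill every column inside each piece
parts <- lapply(split(dados, dados$cidade), function(df) {
  df[] <- lapply(df, fill_both)
  df
})

res <- do.call(rbind, parts)      # join the pieces back together
res <- res[!duplicated(res), ]    # keep only non-duplicated rows
row.names(res) <- NULL            # renumber the rows consecutively
res
```

After filling in both directions, all rows of a cidade are identical, which is why dropping duplicates leaves exactly one row per cidade.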

DATA.

dados <-
structure(list(cidade = c("AGRONOMICA", "AGRONOMICA", "ALFREDO", 
"ALMIRANTE", "ALMIRANTE", "ALTO", "ALTO", "ALTO"), a = c("CRESO", 
NA, "CREDIVERTENTES", "SICOPER", NA, "SICOPER", NA, NA), b = c(NA, 
"SICOOB", NA, NA, "SICRED", NA, "SICOOB", NA), c = c(NA, NA, 
NA, NA, NA, NA, NA, "SICRED")), .Names = c("cidade", "a", "b", 
"c"), class = "data.frame", row.names = c(NA, -8L))