Change chr to number in R

5

Hi all, I am trying to convert the data in columns 4 and 5 to numeric, but I get the error below. Any suggestions? Thank you in advance!

library(tidyverse)

dadosarrumados <- data_frame(Região = c("Brasil", "Norte", "Rondônia", "Acre", "Amazonas"),
  Total = c(102083, 6715, 711, 285, 1597),
  'Anos de estudo' = rep("menor que 4 anos", 5),
  Quantidade = c("5068.075", "348.574", "42.42", "18.042", "73.231"),
  Porcentagem = c("5", "5.2", "6", "6.3", "4.")
)

as.numeric(dadosarrumados[, c(4, 5)])
  

Error: (list) object cannot be coerced to type 'double'

    
asked by anonymous 24.12.2018 / 14:14

2 answers

5

I would do it like this:

library(tidyverse)
dadosarrumados %>% 
  mutate_at(vars(Quantidade, Porcentagem), parse_number)

# A tibble: 5 x 5
  Região    Total 'Anos de estudo' Quantidade Porcentagem
  <chr>     <dbl> <chr>                 <dbl>       <dbl>
1 Brasil   102083 menor que 4 anos     5068.          5  
2 Norte      6715 menor que 4 anos      349.          5.2
3 Rondônia    711 menor que 4 anos       42.4         6  
4 Acre        285 menor que 4 anos       18.0         6.3
5 Amazonas   1597 menor que 4 anos       73.2         4

The advantage of parse_number over as.numeric is that it offers more options. For example, you can specify the decimal mark and the thousands (grouping) mark through a locale:

> parse_number(c("1,10"), locale = locale(decimal_mark = ","))
[1] 1.1

> as.numeric("1,1")
[1] NA
Warning message:
NAs introduced by coercion
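
The same locale argument also takes a grouping mark, which matters for numbers written in the Brazilian style. A small sketch, assuming the input uses "." for thousands and "," for decimals (the value "1.234,56" is made up for illustration):

```r
library(readr)

# Illustrative value, not from the question's data:
# "." separates thousands and "," separates decimals.
x <- parse_number(
  "1.234,56",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

x  # 1234.56
```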

It also works in other contexts, because it ignores non-numeric characters around the number:

> parse_number("1%")
[1] 1
> as.numeric("1%")
[1] NA
Warning message:
NAs introduced by coercion

One possible problem in your case is that missing values were recorded with some unwanted character instead of being left empty, for example a lone "." or something similar. In that case you can use the na argument of parse_number, like this:

dadosarrumados %>% 
  mutate_at(vars(Quantidade, Porcentagem), ~parse_number(.x, na = c(".")))

Note: parse_number is a function from the readr package, which is part of the tidyverse.
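
As a side note, newer versions of dplyr (1.0.0 and later) offer across() as an alternative to mutate_at. A minimal sketch of the same conversion, assuming such a version is installed and using a shortened two-row copy of the data (the name dados is just for illustration):

```r
library(dplyr)
library(readr)
library(tibble)

# Shortened, illustrative copy of the two character columns
dados <- tibble(
  Quantidade  = c("5068.075", "348.574"),
  Porcentagem = c("5", "5.2")
)

# across() selects the columns and applies parse_number to each one
convertido <- dados %>%
  mutate(across(c(Quantidade, Porcentagem), parse_number))

# Both columns should now be reported as "numeric"
sapply(convertido, class)
```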

    
27.12.2018 / 01:45
4

The problem is that R treats dadosarrumados[, c(4, 5)] as a list, since a data frame is a list of columns:

is.list(dadosarrumados[, c(4, 5)])
[1] TRUE
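
To see the difference, compare this with extracting a single column using [[ ]], which returns a plain character vector that as.numeric accepts. A small sketch on a shortened base data frame (the name dados is only illustrative):

```r
# Shortened, illustrative copy of the two character columns
dados <- data.frame(
  Quantidade  = c("5068.075", "348.574"),
  Porcentagem = c("5", "5.2"),
  stringsAsFactors = FALSE
)

is.list(dados[, c(1, 2)])  # TRUE: still a data frame, i.e. a list of columns
is.character(dados[[1]])   # TRUE: [[ ]] extracts one column as a vector

# Converting one column at a time works without error
quantidade <- as.numeric(dados[[1]])
quantidade
```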

One way to solve this problem is to flatten the list with unlist and then convert the result to numeric:

as.numeric(unlist(dadosarrumados[, c(4, 5)]))
[1] 5068.075  348.574   42.420   18.042   73.231    5.000    5.200    6.000
[9]    6.300    4.000

But notice that we solved one problem and created another: we lost the two-column layout, because unlist turned the data into a single vector. We could reshape this vector back into a data frame, but I prefer another approach.

Use the apply function. It applies another function over the rows or the columns of a data frame. For example, running:

apply(dadosarrumados[, c(4, 5)], 2, as.numeric)
     Quantidade Porcentagem
[1,]   5068.075         5.0
[2,]    348.574         5.2
[3,]     42.420         6.0
[4,]     18.042         6.3
[5,]     73.231         4.0

Here I am telling R to apply (apply) the as.numeric function over the columns (the number 2) of the data frame dadosarrumados[, c(4, 5)]. If I had used 1 instead of 2 as the second argument of apply, as.numeric would have been applied over the rows, and we would not get the desired result.
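
The effect of the second argument can be checked on a toy data frame. In this sketch (the names m, por_coluna and por_linha and the values are made up), using 2 keeps the original shape, while using 1 returns the result transposed:

```r
# Toy data: 3 rows, 2 character columns
m <- data.frame(
  a = c("1", "2", "3"),
  b = c("4.5", "5.5", "6.5"),
  stringsAsFactors = FALSE
)

por_coluna <- apply(m, 2, as.numeric)  # as.numeric over each column
por_linha  <- apply(m, 1, as.numeric)  # as.numeric over each row

dim(por_coluna)  # 3 2: same layout as the input
dim(por_linha)   # 2 3: one column per original row
```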

One way to get the full data frame, with the columns converted to numeric, is to do this:

bind_cols(dadosarrumados[, 1:3],
          as_data_frame(apply(dadosarrumados[, c(4, 5)], 2, as.numeric)))
# A tibble: 5 x 5
  Região    Total 'Anos de estudo' Quantidade Porcentagem
  <chr>     <dbl> <chr>                 <dbl>       <dbl>
1 Brasil   102083 menor que 4 anos     5068.          5  
2 Norte      6715 menor que 4 anos      349.          5.2
3 Rondônia    711 menor que 4 anos       42.4         6  
4 Acre        285 menor que 4 anos       18.0         6.3
5 Amazonas   1597 menor que 4 anos       73.2         4

I'm using the bind_cols function to join two data frames: columns 1 to 3 of the original, and the result of the conversion we made above.
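
Another option, not used in the answer above, is to convert the columns in place with lapply, which keeps the data frame structure and avoids the intermediate matrix that apply creates. A minimal base R sketch on a shortened copy of the data (the names dados and cols are only for illustration):

```r
# Shortened, illustrative copy of the data
dados <- data.frame(
  Total       = c(102083, 6715),
  Quantidade  = c("5068.075", "348.574"),
  Porcentagem = c("5", "5.2"),
  stringsAsFactors = FALSE
)

cols <- c("Quantidade", "Porcentagem")

# lapply returns a list of numeric vectors, assigned back column by column
dados[cols] <- lapply(dados[cols], as.numeric)

str(dados)
```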

    
27.12.2018 / 00:46