Read file with non-ASCII characters [à = "<U+00E0>"]


I'm reading a file in R called roubo2.rds. It is an R-specific format, so I could not open it in Excel. I can import the data into a variable, but within the records the text shows non-ASCII characters as codes (Unicode? UTF-8?). I've searched to find out which encoding this is and also tried exporting to CSV, but it does not work. Can anyone shed some light? I need text that currently shows up with codes such as "<U+00E0>" to appear with the actual characters ("à").

The R code I'm using to read it is this:

dados <- readRDS("roubo2.rds")

The file can be downloaded here: link. I'm running RStudio on a Mac. sessionInfo() output below.

R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)
asked by anonymous 03.01.2017 / 19:55

1 answer


To export to .csv in the correct encoding, just add the fileEncoding argument to write.csv() (or to write.csv2(), which is used below and writes semicolon-separated values).

The code would look like this:

dados <- readRDS('roubo2.rds')

write.csv2(dados, 'roubo2.csv', fileEncoding = 'UTF-8')
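As a rough sketch of the export above, here is a round trip on a small made-up data frame. The column tipo and its values are invented for illustration and are not taken from roubo2.rds. It assumes the R session runs in a UTF-8 locale; in another locale the accented characters may still come out as codes.

```r
# Hypothetical stand-in for the data read from roubo2.rds
dados <- data.frame(
  tipo = c("Assalto \u00e0 m\u00e3o armada", "Furto"),
  stringsAsFactors = FALSE
)

# Write to a temporary file, declaring the encoding explicitly
arquivo <- tempfile(fileext = ".csv")
write.csv2(dados, arquivo, fileEncoding = "UTF-8", row.names = FALSE)

# Read it back with the same encoding and compare with the original
lido <- read.csv2(arquivo, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
print(lido)
print(identical(lido$tipo, dados$tipo))
```

If the comparison prints FALSE, checking Sys.getlocale() and Encoding(dados$tipo) is probably the next step.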

I also suggest converting the factor columns to character, since you are working with text. To do this, just use as.character(). Example:

dados$tipo <- as.character(dados$tipo)
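If there are several factor columns, something like the following might save converting them one by one. This is only a minimal sketch on a made-up data frame; tipo, bairro and n are hypothetical column names.

```r
# Hypothetical data frame with two factor columns and one integer column
dados <- data.frame(
  tipo   = factor(c("Furto", "Roubo")),
  bairro = factor(c("Centro", "Norte")),
  n      = c(1L, 2L)
)

# Find the factor columns and convert only those to character
eh_fator <- vapply(dados, is.factor, logical(1))
dados[eh_fator] <- lapply(dados[eh_fator], as.character)

str(dados)
```

Non-factor columns such as n are left as they were.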

When reading a .csv file, you can get the same result directly by passing the argument stringsAsFactors = FALSE to read.csv().
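To see the difference, here is a small sketch that uses the text argument of read.csv2() in place of a real file. The data are invented for illustration.

```r
# Semicolon-separated content, as write.csv2() would produce (made-up data)
csv_text <- "tipo;n\nFurto;1\nRoubo;2"

# Same input read twice, with and without conversion to factor
com_fatores <- read.csv2(text = csv_text, stringsAsFactors = TRUE)
sem_fatores <- read.csv2(text = csv_text, stringsAsFactors = FALSE)

print(class(com_fatores$tipo))
print(class(sem_fatores$tipo))
```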

To finish, it would be good to use version 3.2 of R, since the vast majority of packages are designed for this version.

04.01.2017 / 17:19