How to import data (.csv) into R while retaining the original format

4

I'm trying to import Excel data (already saved in .csv format) into R. The values in the file use a comma as the decimal mark, for example 8509,80 ...

To do the import, I am using the command:

variavel=read.table("dados.csv", header=T, dec=",") 

However, when I view the imported data, I see that R kept only the fractional part of each number (in this case, the value 8509,80 came in as just 80).

How can I import the data correctly, so that the value comes in as 8509.80?

    
asked by anonymous 04.07.2015 / 15:07

2 answers

2

You need to set the field separator. Your file looks like a European/Brazilian-style CSV, so the separator is probably ";".

variavel <- read.table("dados.csv", header = TRUE, dec = ",", sep = ";")

A shortcut is the read.csv2 function, which uses sep = ";" and dec = "," by default:

variavel <- read.csv2("dados.csv", header = TRUE)
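
As a quick way to check this without the original file, here is a minimal sketch that writes a tiny sample file and reads it back. The temporary file, the column names (item, valor) and the second row are made up for illustration; only the value 8509,80 comes from the question.

```r
# Write a small sample file in the European/Brazilian CSV style:
# ";" separates the fields and "," is the decimal mark.
# (File contents are hypothetical, for illustration only.)
arquivo <- tempfile(fileext = ".csv")
writeLines(c("item;valor", "a;8509,80", "b;12,5"), arquivo)

# read.csv2() assumes sep = ";" and dec = "," by default.
dados <- read.csv2(arquivo)

str(dados$valor)   # the column should be numeric
dados$valor[1]     # 8509.8
```

If the column comes back as character (or factor) instead of numeric, the separator or decimal mark most likely does not match the file.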
    
04.07.2015 / 22:28
2

The base functions for reading tables are sufficient for most cases. However, they are relatively slow, and there are faster alternatives for reading many and/or large files, which also bring a few other small advantages.

The readr package was created precisely to improve on the base functions in the following respects:

  • Arguments have names that are more consistent with each other (e.g. col_names and col_types rather than header and colClasses).

  • They are approximately 10x faster.

  • They show a progress bar if reading takes longer than a few seconds.

  • Strings are not transformed into factors by default.

  • Column names are not converted into syntactically "valid" R names, so the columns keep exactly their original names (even if they start with a digit, contain spaces, etc.).
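
To make the last point concrete, here is a small base R sketch of the renaming that readr avoids. The file contents and the header "2015 total" are invented for the example. In base R the renaming can be switched off with check.names = FALSE.

```r
# Sample file whose first header starts with a digit and contains a space.
# (Contents are hypothetical, for illustration only.)
arquivo <- tempfile(fileext = ".csv")
writeLines(c("2015 total;nome", "8509,80;abc"), arquivo)

# Default behaviour: headers are turned into syntactically valid R names.
nomes_padrao <- names(read.csv2(arquivo))
nomes_padrao      # "X2015.total" "nome"

# check.names = FALSE keeps the header text exactly as it is in the file.
nomes_originais <- names(read.csv2(arquivo, check.names = FALSE))
nomes_originais   # "2015 total" "nome"
```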


In this package the functions have names similar to those in base, with the dot replaced by an underscore (_). For example:

#base:
variavel <- read.table("dados.csv", header = TRUE, dec = ",", sep = ";")
variavel <- read.csv2("dados.csv", header = TRUE)

#readr
library(readr)
variavel <- read_csv2("dados.csv")

Similarly, there are the functions read_csv(), read_table(), read_delim(), read_tsv(), read_lines() and read_fwf().
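
For files that do not match the read_csv2() defaults exactly, read_delim() lets you state the delimiter and the decimal mark explicitly through locale(). The sketch below assumes readr is installed; the sample file and its column names are made up for illustration.

```r
library(readr)

# Hypothetical sample file: ";" as delimiter, "," as decimal mark.
arquivo <- tempfile(fileext = ".csv")
writeLines(c("item;valor", "a;8509,80", "b;12,5"), arquivo)

# Spell out what read_csv2() assumes by default.
dados <- read_delim(
  arquivo,
  delim = ";",
  locale = locale(decimal_mark = ",", grouping_mark = ".")
)

is.numeric(dados$valor)   # the column should have been parsed as a number
```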

Another alternative is the fread() function from the data.table package. fread() is even faster (about 2x) than the readr functions, and it tries to detect automatically the separator, whether there is a header row, and so on. Its arguments have the same names as those of the base functions, such as sep, header, and stringsAsFactors. In this example, it would look like this:

library(data.table)
variavel <- fread("dados.csv", sep = ";", dec = ",", header = TRUE)

Depending on the format of the data, sep and header can be omitted, but when in doubt it is safer to specify them explicitly.

Finally, it is important to note that these functions only make sense if reading performance is a problem, or if the package is already loaded anyway (as may be the case with data.table). Otherwise, there is no need to load a package to do something that can be done identically in base.
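
A self-contained sketch of the fread() call, assuming data.table is installed in a version whose fread() accepts the dec argument. It passes dec = "," explicitly because the values use a decimal comma. The sample file and its column names are made up for illustration.

```r
library(data.table)

# Hypothetical sample file: ";" as separator, "," as decimal mark.
arquivo <- tempfile(fileext = ".csv")
writeLines(c("item;valor", "a;8509,80", "b;12,5"), arquivo)

# State separator, decimal mark and header explicitly instead of
# relying on auto-detection.
dados <- fread(arquivo, sep = ";", dec = ",", header = TRUE)

is.numeric(dados$valor)   # the column should have been read as numeric
```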

    
05.07.2015 / 18:40