The base functions for reading tables are sufficient for most cases. However, they are relatively slow, and when there are many files and/or large files to read, there are faster alternatives that also offer other small advantages.
The readr package was created precisely to improve on the default functions, in the following respects:
- Arguments have names that are more consistent with each other (e.g. col_names and col_types rather than header and colClasses).
- They are approximately 10x faster.
- They show a progress bar if reading takes longer than a few seconds.
- Strings are not converted to factors by default (base R did this by default before version 4.0.0).
- Column names are not converted into "valid" R names, meaning the columns keep exactly the same names as in the original file (even if they start with a number, contain spaces, etc.).
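To make the points about argument names, strings and column names concrete, here is a minimal sketch. The sample file and its column names are hypothetical, invented only for illustration, and the readr package is assumed to be installed.

```r
library(readr)

# Hypothetical sample file with awkward column names, written to a temp path
path <- tempfile(fileext = ".csv")
writeLines(c("1st value,city name", "1.5,Lisboa", "2.5,Porto"), path)

# base: column names are converted to syntactically valid R names by default
base_df <- read.csv(path)
names(base_df)

# readr: names are kept as written in the file;
# col_types ("d" = double, "c" = character) plays the role of colClasses
readr_df <- read_csv(path, col_names = TRUE, col_types = "dc")
names(readr_df)
class(readr_df[["city name"]])
```

With base, the names come back altered (a prefix is added and spaces are replaced), while readr returns them unchanged and keeps the text column as character.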
In this package the functions have names similar to those in base, with the dot replaced by an underscore (_). For example:
# base:
variavel <- read.table("dados.csv", header = TRUE, dec = ",", sep = ";")
variavel <- read.csv2("dados.csv", header = TRUE)
# readr:
library(readr)
variavel <- read_csv2("dados.csv")
Similarly, there are the functions read_csv(), read_table(), read_delim(), read_tsv(), read_lines() and read_fwf().
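As a rough illustration of how these functions relate to each other, the sketch below reads the same semicolon-separated file with read_csv2() and with the more general read_delim(). The file contents and column names are made up for the example, and readr is assumed to be installed.

```r
library(readr)

# Hypothetical file using ";" as separator and "," as decimal mark
path <- tempfile(fileext = ".csv")
writeLines(c("id;valor", "1;3,14", "2;2,72"), path)

# read_csv2() assumes ";" as separator and "," as decimal mark
a <- read_csv2(path, col_types = "id")

# read_delim() spells out the same choices explicitly
b <- read_delim(path,
                delim = ";",
                locale = locale(decimal_mark = ","),
                col_types = "id")

# Both calls should parse the numeric column the same way
identical(a$valor, b$valor)
```

read_delim() is the one to reach for when the file uses a separator that none of the shortcut functions cover.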
Another alternative is the fread() function from the data.table package. fread() is even faster (about 2x) than the readr functions, and it tries to detect automatically the separator, whether there are column names, etc. The fread() function has arguments with the same names as the base functions, such as sep, header, and stringsAsFactors. For this example, it would look like this:
library(data.table)
variavel <- fread("dados.csv", sep = ";", dec = ",", header = TRUE)
Depending on the format of the data, sep and header can be omitted, but when in doubt, it is safer to specify them explicitly.
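The automatic detection can be checked on a small case. The following is only a sketch: the file and its columns are hypothetical, and data.table is assumed to be installed.

```r
library(data.table)

# Hypothetical semicolon-separated file with a header row
path <- tempfile(fileext = ".csv")
writeLines(c("id;nome;valor", "1;a;10", "2;b;20", "3;c;30"), path)

# Let fread() guess the separator and the presence of a header
auto <- fread(path)

# State both explicitly
explicit <- fread(path, sep = ";", header = TRUE)

# For a simple file like this one, both calls should agree
identical(names(auto), names(explicit))
all.equal(auto, explicit)
```

On messier files (for example, a header that looks like data) the guess can go wrong, which is why the explicit form is the safer default.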
Finally, it is important to note that it only makes sense to use these functions if reading performance is a problem, or if the package is already loaded anyway (as may be the case with data.table). Otherwise, there is no need to load a package to do something that can be done equally well in base.
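One way to judge whether reading performance is actually a problem is to time the alternatives on a file of realistic size. The sketch below uses synthetic data (the size and columns are arbitrary assumptions) and only times readr and data.table if they happen to be installed. The numbers will vary by machine and package version.

```r
# Hypothetical test file: 200,000 rows of synthetic data
path <- tempfile(fileext = ".csv")
n <- 2e5
write.csv(
  data.frame(id = seq_len(n), x = runif(n), g = sample(letters, n, TRUE)),
  path,
  row.names = FALSE
)

# Elapsed time (in seconds) for the base reader
timings <- c(base = system.time(read.csv(path))[["elapsed"]])

# Time the alternatives only when the packages are available
if (requireNamespace("readr", quietly = TRUE)) {
  timings["readr"] <- system.time(
    suppressMessages(readr::read_csv(path, progress = FALSE))
  )[["elapsed"]]
}
if (requireNamespace("data.table", quietly = TRUE)) {
  timings["data.table"] <- system.time(data.table::fread(path))[["elapsed"]]
}

print(timings)
```

If the base timing is already negligible for the files at hand, the extra dependency is probably not worth it.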