Here are some tips to solve your problem:
1 - Read the file using another function:
> microbenchmark(
+ base = read.csv(file = "df-write-csv.csv", header = T),
+ readr = readr::read_csv("df-write-csv.csv"),
+ data.table = data.table::fread("df-write-csv.csv"),
+ rio = rio::import("df-write-csv.csv", format = "csv")
+ )
Unit: microseconds
expr min lq mean median uq max neval
base 1836.230 1912.1815 2253.6071 1980.3995 2282.1675 4148.787 100
readr 823.960 881.3625 1072.4790 921.6605 1120.2365 3538.359 100
data.table 327.759 364.4810 442.5933 402.3295 458.7895 920.436 100
rio 312.317 351.2260 444.1087 382.9325 439.7960 2938.490 100
See that reading the file with the fread
function of the data.table
or the import
function of the rio
is 4x faster than with the native function of R.
Make sure you can actually read.
2 - Verify that you actually managed to filter your database. Save the subset result to an auxiliary object.
If the problem is this, try filtering using functions from other packages like dplyr
or data.table
.
When they are long operations, data.table
can be much faster.
> df <- data.frame(x = 1:100000, y = 1:100000, l = sample(letters, size = 100, replace = T))
> microbenchmark(
+ base = subset(df, l == "a"),
+ dplyr = dplyr::filter(df, l == "a"),
+ data.table = data.table(df)[l == "a",]
+ )
Unit: milliseconds
expr min lq mean median uq max neval
base 10.329514 12.467143 14.962479 13.976907 17.171858 24.3761 100
dplyr 7.331626 8.624356 10.063947 8.853807 11.140871 16.8939 100
data.table 2.986519 4.580536 6.774548 4.824227 5.957255 119.9709 100
3 - Use the function write_csv
of package readr
it is more or less 2x faster than the function write.csv
native of R.
microbenchmark(
base = write.csv(df, file = "df-write-csv.csv", row.names = F),
rio = rio::export(df, file = "df-rio.csv", format = "csv"),
readr = readr::write_csv(df, path = "df-readr.csv")
)
Unit: microseconds
expr min lq mean median uq max neval
base 713.564 1097.534 2025.377 1467.4980 2996.136 4168.352 100
rio 718.141 1156.998 2243.143 2011.5310 3106.479 7368.046 100
readr 366.306 594.629 1265.297 734.0445 1793.405 5852.142 100
Anyway, if you could read the 5GB file, it is very likely that you can also write it, since it is already in the RAM of your computer.