Is there a way to open a straight SQL table in a data.table without doing the SQL data.frame data.table path?

7

I want to open a straight SQL table in a data.table. When I query with dbGetQuery , what I get is a data.frame. I know I can later turn that data.frame into a data.table easily. But I'd like to skip this step - which on some occasions may not be computationally efficient.

Below a reproducible minimum code:

    library(sqldf)
    library(data.table)

    A <- data.frame(var1 = 1:10, var2=55:64)
    drv <- dbDriver("SQLite") 
    con <- dbConnect(drv, "basequalquer.db") 
    dbWriteTable(con, "TabelaA", A) 

On the command line below, I'm applying the command data.table on the result of dbGetQuery (which comes as data.frame). That's exactly what I would NOT want to do ...

    data.table(dbGetQuery(con, "Select * from TabelaA"))
        row_names var1 var2
     1:         1    1   55
     2:         2    2   56
     3:         3    3   57
     4:         4    4   58
     5:         5    5   59
     6:         6    6   60
     7:         7    7   61
     8:         8    8   62
     9:         9    9   63
    10:        10   10   64

Suggestions?

    
asked by anonymous 19.09.2014 / 13:27

1 answer

6

Giving a code hint that the secret is in the fetch function, which is an implementation made in the RSQLite package, for objects of type SQLiteResult (class S4), which implements the generic method of the same name defined in the DBI package. In your code, the dbGetQuery function uses fetch . So, first I'll reproduce your code by isolating the fetch function:

library(DBI)
library(RSQLite)

A <- data.frame(var1 = 1:10, var2=55:64)
drv <- dbDriver("SQLite") 
con <- dbConnect(drv, "basequalquer.db") 
dbWriteTable(con, "TabelaA", A) 
res <- dbSendQuery(con, "Select * from TabelaA")
fetch(res)

Looking at the RSQLite source code , we can see that fetch calls the sqliteFetch function, copied below, which in turn calls an algorithm in C, which does all the dirty work and returns NULL or a list .

sqliteFetch <- function(res, n = 0) {  
  check_valid(res)

  # Returns NULL, or a list
  rel <- .Call(rsqlite_query_fetch, res@Id, nrec = as.integer(n))
  if (is.null(rel)) return(data.frame())

  attr(rel, "row.names") <- .set_row_names(length(rel[[1]]))
  attr(rel, "class") <- "data.frame"
  rel
}

Note that the result rel becomes a data.frame after that. You could even try to turn this list into data.table , but this would not be very different from turning a data.frame into a data.table .

So, unless someone has already implemented a workaround, to solve your problem you would have to reimplement the function in C to return an object of type data.table (which is basically a data.frame with a pointer like attribute). Also, you would need to change the generic definition of fetch , which expects a data.frame ( see here ).

I particularly do not think it would be worth it, because the gain in efficiency would not be as significant, but the effort would be.

By the way, I suggest you consider using the fantastic dplyr package, which has functions for efficient database extraction, using "lazy evaluation" (basically you can use the mutate , filter select , etc. of the dplyr vocabulary, and the package magically transforms those operations into SQL before downloading the data, which can result in a huge gain in data transfer.) In addition, the package was made to be intuitive and easy to use. See here .

    
26.09.2014 / 13:49