Reading a txt file with read.table fails with "did not have 5 elements"

1

I'm trying to read the txt file with two columns below:

+-----------------------------------------------------------------------------+
|                      Category Information                        |    square|
| #|description                                                    |     miles|
|-----------------------------------------------------------------------------|
| 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
| 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
|15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
|19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
|21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
|25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
|33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
|-----------------------------------------------------------------------------|
|TOTAL                                                             | 24.934597|
+-----------------------------------------------------------------------------+

I'm using the following line of code:

rawdata<-read.table("1986.txt", sep = "|",skip = 5)

But it reads nothing and returns an error saying that a line did not have 5 elements.

    
asked by anonymous 13.12.2018 / 01:21

2 answers

3

Reproducing the problem:

tf <- tempfile()
write.table(
  "+-----------------------------------------------------------------------------+
   |                      Category Information                        |    square|
   | #|description                                                    |     miles|
   |-----------------------------------------------------------------------------|
   | 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
   | 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
   |15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
   |19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
   |21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
   |25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
   |33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
   |-----------------------------------------------------------------------------|
   |TOTAL                                                             | 24.934597|
   +-----------------------------------------------------------------------------+",
   tf, row.names = FALSE, col.names = FALSE, quote = FALSE)

rawdata <- read.table(tf, sep = "|", skip = 5)
  

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 7 did not have 5 elements

The file has some problems.

  • Some lines do not contain tabular information ( |---...---| ). When read.table() reaches such a line, it does not find the same 5 columns it found in the previous rows and throws the error.
  • The "#" character is present: by default, read.table() treats it as the start of a comment.
  • The last line, with the total, does not follow the pattern of the rest of the file (it has no | after "TOTAL").
  • In addition, the initial and final | do not add any information to the table and generate two useless columns when the data is read.
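
To see which lines break the pattern before reading anything, the fields on each line can be counted. The following is only a sketch: it uses a shortened stand-in for the file (fewer rows, shorter dot runs), and the names lines and n_fields are illustrative. comment.char = "" is set so that "#" is not treated as a comment while counting.

```r
# Shortened stand-in for the file in the question (an assumption for illustration).
lines <- c(
  "+---------------------------+",
  "|  Category Info  |   square|",
  "| #|description   |    miles|",
  "|---------------------------|",
  "| 3| . . . . . .  | 2.096540|",
  "| 4| . . . . . .  |14.719017|",
  "|---------------------------|",
  "|TOTAL            |24.934597|",
  "+---------------------------+"
)
tf <- tempfile(fileext = ".txt")
writeLines(lines, tf)

# Number of "|"-separated fields found on each line of the file.
n_fields <- count.fields(tf, sep = "|", comment.char = "")

# Side-by-side view: lines whose count differs from the data rows are the culprits.
data.frame(line = seq_along(lines), fields = n_fields, start = substr(lines, 1, 12))
```

The border, separator and TOTAL lines show a different count than the data rows, which is what makes read.table() stop.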

    I see at least two possible ways to read this data.

    r-base

    One way is to read the data with readLines(), remove those troublesome lines and then pass the "clean" data to read.table()

    txt <- readLines(tf)
    limpo <- txt[! grepl("----|TOTAL", txt)]
    rawdata <- read.table(text = limpo, sep = "|", skip = 1, comment.char = "")
    rawdata
    
      V1 V2                                                              V3         V4 V5
    1 NA  # description                                                          miles NA
    2 NA  3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.096540 NA
    3 NA  4  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   14.719017 NA
    4 NA 15  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4.763791 NA
    5 NA 19  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.002395 NA
    6 NA 21  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.780825 NA
    7 NA 25  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.087930 NA
    8 NA 33  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.484098 NA
    

    tidyverse

    In the tidyverse, the package for reading files is readr. Using it, we would have:

    library(tidyverse)
    rawdata2 <- read_delim(tf, "|", skip = 2, comment = "----")
    rawdata2 %>% filter(!is.na(` #`))
    
    # A tibble: 8 x 5
      `   ` ` #`                        `description                   ~ `     miles` X5   
      <chr> <chr>                       <chr>                            <chr>        <chr>
    1 "   " " 3"                        " . . . . . . . . . . . . . . .~ "  2.096540" NA   
    2 "   " " 4"                        " . . . . . . . . . . . . . . .~ " 14.719017" NA   
    3 "   " 15                          " . . . . . . . . . . . . . . .~ "  4.763791" NA   
    4 "   " 19                          " . . . . . . . . . . . . . . .~ "  0.002395" NA   
    5 "   " 21                          " . . . . . . . . . . . . . . .~ "  2.780825" NA   
    6 "   " 25                          " . . . . . . . . . . . . . . .~ "  0.087930" NA   
    7 "   " 33                          " . . . . . . . . . . . . . . .~ "  0.484098" NA   
    8 "   " "TOTAL                    ~ " 24.934597"                     NA           NA 
    

    Note that the two resulting tables are not the same: in the second case the row with the total is kept.
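
The base R result still carries the header row as data and the two empty border columns. A possible follow-up, given only as a sketch: it assumes a stand-in file with shortened dot runs, skips the header line as well (skip = 2 instead of skip = 1), and uses the column names category and sq_miles, which are not from the original file.

```r
# Stand-in for the file in the question (dot runs shortened; an assumption).
txt <- c(
  "+----------------------------------+",
  "|   Category Information |   square|",
  "| #|description          |    miles|",
  "|----------------------------------|",
  "| 3| . . . . . . . . . . | 2.096540|",
  "| 4| . . . . . . . . . . |14.719017|",
  "|15| . . . . . . . . . . | 4.763791|",
  "|19| . . . . . . . . . . | 0.002395|",
  "|21| . . . . . . . . . . | 2.780825|",
  "|25| . . . . . . . . . . | 0.087930|",
  "|33| . . . . . . . . . . | 0.484098|",
  "|----------------------------------|",
  "|TOTAL                   |24.934597|",
  "+----------------------------------+"
)

# Same filtering as above: drop border, separator and TOTAL lines.
limpo <- txt[!grepl("----|TOTAL", txt)]

# skip = 2 also skips the two header lines, so the columns are parsed as numbers.
rawdata <- read.table(text = limpo, sep = "|", skip = 2, comment.char = "")

# Keep only the category number (V2) and the area (V4); V1 and V5 come
# from the leading and trailing "|" and V3 holds only the dots.
rawdata <- rawdata[, c(2, 4)]
names(rawdata) <- c("category", "sq_miles")
str(rawdata)
```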

        
    13.12.2018 / 12:07
    2

    Knowing what's in the file, the following works. It's not general at all.

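    # comment.char = "+" makes read.table() ignore the bottom border line;
    # fill = TRUE pads the shorter separator and TOTAL lines with NA.
    dados <- read.table(file = "Artur.txt", 
                        sep = "|", comment.char = "+", 
                        skip = 4, fill = TRUE)
    
    # Drop the columns that are entirely NA (from the leading and trailing "|").
    dados <- dados[!sapply(dados, function(x) all(is.na(x)))]
    # Drop the "|-----|" separator row.
    dados <- dados[apply(dados, 1, function(x) !any(grepl("----", x))), ]
    # In the TOTAL row the value lands in V3; move it to V4.
    dados$V4[nrow(dados)] <- as.numeric(as.character(dados$V3[nrow(dados)]))
    # Drop the description column (V3).
    dados <- dados[-2]
    # as.character() handles both factor (R < 4.0) and character (R >= 4.0) columns.
    dados$V2 <- trimws(as.character(dados$V2))
    names(dados) <- c("number", "sq.miles")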
    
    dados
    #  number  sq.miles
    #1      3  2.096540
    #2      4 14.719017
    #3     15  4.763791
    #4     19  0.002395
    #5     21  2.780825
    #6     25  0.087930
    #7     33  0.484098
    #9  TOTAL 24.934597
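
Because the TOTAL row shares the number column with the category codes, that column stays character. A minimal sketch of separating the two and checking the total, assuming the result printed above is rebuilt by hand; the names categories and total_ok are illustrative only.

```r
# The result shown above, rebuilt by hand so the example is self-contained.
dados <- data.frame(
  number   = c("3", "4", "15", "19", "21", "25", "33", "TOTAL"),
  sq.miles = c(2.096540, 14.719017, 4.763791, 0.002395,
               2.780825, 0.087930, 0.484098, 24.934597),
  stringsAsFactors = FALSE
)

# Split the category rows from the TOTAL row.
is_total   <- dados$number == "TOTAL"
categories <- dados[!is_total, ]
categories$number <- as.integer(categories$number)

# The file's total is rounded, so compare with a small relative tolerance.
total_ok <- isTRUE(all.equal(sum(categories$sq.miles),
                             dados$sq.miles[is_total],
                             tolerance = 1e-6))
total_ok
```

Keeping the total out of the table lets the number column be numeric, and the comparison doubles as a check that no row was lost while cleaning.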
    
        
    13.12.2018 / 20:57