read txt file with less than 5 elements using read.table

Question

I'm trying to read the two-column txt file below:

+-----------------------------------------------------------------------------+
|                      Category Information                        |    square|
| #|description                                                    |     miles|
|-----------------------------------------------------------------------------|
| 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
| 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
|15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
|19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
|21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
|25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
|33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
|-----------------------------------------------------------------------------|
|TOTAL                                                             | 24.934597|
+-----------------------------------------------------------------------------+

I'm using the following line of code:

rawdata <- read.table("1986.txt", sep = "|", skip = 5)

But it reads nothing and fails with an error saying that a line did not have 5 elements.

r txt rstudio

asked by anonymous 13.12.2018 / 01:21

2 answers

2

Given what is in this particular file, the following works. It is not a general solution at all.

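dados <- read.table(file = "Artur.txt",
                    sep = "|",
                    comment.char = "+",  # the "+---+" border lines become comments
                    skip = 4,            # skip the border, the two header lines and the separator
                    fill = TRUE)         # pad short rows (separator, TOTAL) with NA

# Drop columns that are entirely NA (created by the leading and trailing "|")
dados <- dados[!sapply(dados, function(x) all(is.na(x)))]
# Drop the "|-----|" separator rows
dados <- dados[apply(dados, 1, function(x) !any(grepl("----", x))), ]
# In the TOTAL row the value was read into V3, so move it into V4
dados$V4[nrow(dados)] <- as.numeric(as.character(dados$V3[nrow(dados)]))
# Remove the dotted description column (V3)
dados <- dados[-2]
# Trim the "#" column; as.character() handles both factor and character input
dados$V2 <- trimws(as.character(dados$V2))
names(dados) <- c("number", "sq.miles")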

dados
#  number  sq.miles
#1      3  2.096540
#2      4 14.719017
#3     15  4.763791
#4     19  0.002395
#5     21  2.780825
#6     25  0.087930
#7     33  0.484098
#9  TOTAL 24.934597

13.12.2018 / 20:57


score 3 · Accepted Answer

Reproducing the problem:

tf <- tempfile()
write.table(
  "+-----------------------------------------------------------------------------+
   |                      Category Information                        |    square|
   | #|description                                                    |     miles|
   |-----------------------------------------------------------------------------|
   | 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.096540|
   | 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
   |15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  4.763791|
   |19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.002395|
   |21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  2.780825|
   |25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.087930|
   |33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . |  0.484098|
   |-----------------------------------------------------------------------------|
   |TOTAL                                                             | 24.934597|
   +-----------------------------------------------------------------------------+",
   tf, row.names = FALSE, col.names = FALSE)

rawdata <- read.table(tf, sep = "|", skip = 5)

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 7 did not have 5 elements

The file has a few problems.

Some lines do not contain tabular information (|---...---|). When R reaches one of these lines, it does not find the same 5 columns it found in the previous rows, and it throws the error.

The "#" character: by default, read.table() treats it as the start of a comment (comment.char = "#").

The last line, with the total, does not follow the pattern of the rest of the file: it has no | after "TOTAL" separating the # column from the description column.

In addition, the leading and trailing | characters add no information to the table and produce two useless columns (V1 and V5) when the data are read.
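These problems can be checked directly in R. The sketch below is self-contained: instead of the real file it uses a few made-up lines in the same layout (a data row, a separator row and the TOTAL row), counts the fields per line with base R's count.fields(), reads a small made-up line containing "#" with and without comment handling, and shows which columns come out entirely NA.

```r
# Made-up lines in the same layout as the file:
# a data row, a separator row and the TOTAL row
lines <- c("| 3| . . . . . |  2.096540|",
           "|-----------------------|",
           "|TOTAL            | 24.934597|")

# Fields per line when splitting on "|": the three rows disagree
con <- textConnection(lines)
n <- count.fields(con, sep = "|", quote = "", comment.char = "")
close(con)
n
# [1] 5 3 4

# With the default comment.char = "#", everything from "#" on is dropped
read.table(text = "a|#b", sep = "|")$V2
# With comment.char = "", the "#" is kept as data
read.table(text = "a|#b", sep = "|", comment.char = "")$V2

# The leading and trailing "|" produce columns that are entirely NA (V1 and V5)
df <- read.table(text = lines[1], sep = "|")
sapply(df, function(x) all(is.na(x)))
```

The data row has 5 fields, the separator 3 and the TOTAL row 4, which is exactly the mismatch behind the "did not have 5 elements" error.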

I see at least two possible ways to read these data.

base R

One way is to read the file with readLines(), remove the problematic lines, and then pass the "clean" text to read.table():

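txt <- readLines(tf)                       # read the raw lines
limpo <- txt[! grepl("----|TOTAL", txt)]   # drop borders, separators and the TOTAL line
# skip = 1 drops the first header line; comment.char = "" keeps the "#"
rawdata <- read.table(text = limpo, sep = "|", skip = 1, comment.char = "")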
rawdata

  V1 V2                                                              V3         V4 V5
1 NA  # description                                                          miles NA
2 NA  3  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.096540 NA
3 NA  4  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   14.719017 NA
4 NA 15  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    4.763791 NA
5 NA 19  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.002395 NA
6 NA 21  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    2.780825 NA
7 NA 25  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.087930 NA
8 NA 33  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .    0.484098 NA
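The result above still contains the second header row, the dotted column and the two empty columns. A possible next step (a sketch that goes beyond the original answer) is to skip both header lines and keep only the "#" and area columns. To keep it self-contained, the txt vector below re-creates the file's lines with shortened dotted fields instead of calling readLines(tf); the column names number and sq.miles are borrowed from the first answer.

```r
# Stand-in for readLines(tf): the file's lines with shortened dotted fields
txt <- c(
  "+---------------------------------------+",
  "|     Category Information     |  square|",
  "| #|description                |   miles|",
  "|---------------------------------------|",
  "| 3| . . . . . . . . . . . . . |  2.096540|",
  "| 4| . . . . . . . . . . . . . | 14.719017|",
  "|15| . . . . . . . . . . . . . |  4.763791|",
  "|19| . . . . . . . . . . . . . |  0.002395|",
  "|21| . . . . . . . . . . . . . |  2.780825|",
  "|25| . . . . . . . . . . . . . |  0.087930|",
  "|33| . . . . . . . . . . . . . |  0.484098|",
  "|---------------------------------------|",
  "|TOTAL                         | 24.934597|",
  "+---------------------------------------+"
)

# Drop borders, separators and the TOTAL line, as in the answer
limpo <- txt[!grepl("----|TOTAL", txt)]

# skip = 2 drops both header lines; strip.white trims the padded fields
rawdata <- read.table(text = limpo, sep = "|", skip = 2, strip.white = TRUE)

# V2 holds the "#" column and V4 the square miles; V1 and V5 are empty
dados <- data.frame(number = rawdata$V2, sq.miles = rawdata$V4)
dados
```

Skipping both header lines means every remaining line has the same 5 fields, so read.table() can convert the "#" and area columns to numbers directly.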

tidyverse

In the tidyverse, the package for reading files is readr. Using it, we would have:

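library(tidyverse)
# skip = 2 drops the top border and the first header line;
# text from "----" onward is ignored as a comment
rawdata2 <- read_delim(tf, "|", skip = 2, comment = "----")
# Backticks refer to the column named " #"; this drops the rows
# left over from the separator lines and the bottom border
rawdata2 %>% filter(!is.na(` #`))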

# A tibble: 8 x 5
  `   ` ` #`                        `description                   ~ `     miles` X5   
  <chr> <chr>                       <chr>                            <chr>        <chr>
1 "   " " 3"                        " . . . . . . . . . . . . . . .~ "  2.096540" NA   
2 "   " " 4"                        " . . . . . . . . . . . . . . .~ " 14.719017" NA   
3 "   " 15                          " . . . . . . . . . . . . . . .~ "  4.763791" NA   
4 "   " 19                          " . . . . . . . . . . . . . . .~ "  0.002395" NA   
5 "   " 21                          " . . . . . . . . . . . . . . .~ "  2.780825" NA   
6 "   " 25                          " . . . . . . . . . . . . . . .~ "  0.087930" NA   
7 "   " 33                          " . . . . . . . . . . . . . . .~ "  0.484098" NA   
8 "   " "TOTAL                    ~ " 24.934597"                     NA           NA

Note that the two tables are not the same: in the second case the row with the total is kept.
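If the total is also wanted with the base R approach, one option (not from the original answer) is to pull the number out of the TOTAL line with a regular expression before that line is filtered out. The two lines below are copied from the file's layout; in practice they would come from readLines(tf).

```r
# The last data row and the TOTAL line, as they appear in the file
txt <- c("|33| . . . . . . . . . . . . . |  0.484098|",
         "|TOTAL                         | 24.934597|")

# Pick the TOTAL line and capture the number between the last two "|"
total_line <- txt[grepl("TOTAL", txt)]
total <- as.numeric(sub(".*\\|\\s*([0-9.]+)\\|\\s*$", "\\1", total_line))
total
```

The trailing \\s* also tolerates extra spaces after the final "|", such as those a copy-pasted file may contain.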