Reproducing the problem:
tf <- tempfile()
writeLines(
"+-----------------------------------------------------------------------------+
| Category Information | square|
| #|description | miles|
|-----------------------------------------------------------------------------|
| 3| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 2.096540|
| 4| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 14.719017|
|15| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 4.763791|
|19| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.002395|
|21| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 2.780825|
|25| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.087930|
|33| . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . | 0.484098|
|-----------------------------------------------------------------------------|
|TOTAL | 24.934597|
+-----------------------------------------------------------------------------+",
tf)
rawdata <- read.table(tf, sep = "|", skip = 5)
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  line 7 did not have 5 elements
The file has several problems:
- Some lines contain no tabular information (|---...---|). When R reaches one of them, it does not find the same 5 columns it found in the previous rows, so it throws the error.
- The header line contains the character "#", which read.table() treats by default as the start of a comment (comment.char = "#"), so everything after it is ignored.
- The last line, with the total, does not follow the pattern of the rest of the file: it lacks the | that separates the "#" column from the description, so it splits into one field fewer than the data rows.
- In addition, the leading and trailing | on each line add no information to the table and produce two useless columns when the data are read.
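A quick way to confirm these problems is base R's count.fields(), which reports how many fields each line produces for a given separator. The sketch below uses a few lines abbreviated from the table above (dot leaders shortened); with comment.char = "" the separator and TOTAL lines still come out with fewer than 5 fields, and with the default comment.char the header line is cut off at "#".

```r
# A few lines abbreviated from the table above (dot leaders shortened)
lines <- c(
  "| #|description      | miles|",
  "| 3| . . . . . . . . | 2.096540|",
  "|------------------------------|",
  "|TOTAL               | 24.934597|"
)
tmp <- tempfile()
writeLines(lines, tmp)

# Default comment.char = "#": the header line is truncated at "#"
fields_default <- count.fields(tmp, sep = "|")

# comment.char = "": "#" is treated as ordinary text
fields_clean <- count.fields(tmp, sep = "|", comment.char = "")

fields_default
fields_clean
```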
I see at least two ways to read these data.
Base R
One way is to read the file with readLines(), remove the problematic lines, and then pass the "clean" text to read.table():
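txt <- readLines(tf)
# drop the separator lines (and the border) and the TOTAL line
limpo <- txt[! grepl("----|TOTAL", txt)]
# comment.char = "" stops "#" from being read as the start of a comment
rawdata <- read.table(text = limpo, sep = "|", skip = 1, comment.char = "")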
rawdata
V1 V2 V3 V4 V5
1 NA # description miles NA
2 NA 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.096540 NA
3 NA 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14.719017 NA
4 NA 15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.763791 NA
5 NA 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.002395 NA
6 NA 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.780825 NA
7 NA 25 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.087930 NA
8 NA 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0.484098 NA
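The result above still carries the two empty columns (V1 and V5) produced by the leading and trailing |, and the header line is read as a data row. A possible tidy-up, sketched here on the same data with shortened dot leaders, is to skip the header line as well, strip the padding around each field, and keep only the three useful columns. The column names category, description and miles are my own choice, not part of the original file.

```r
# Same table as above, dot leaders shortened
txt <- c(
  "+--------------------------------------+",
  "| Category Information        | square|",
  "| #|description               |  miles|",
  "|--------------------------------------|",
  "| 3| . . . . . . . . . . . . | 2.096540|",
  "| 4| . . . . . . . . . . . . | 14.719017|",
  "|15| . . . . . . . . . . . . | 4.763791|",
  "|19| . . . . . . . . . . . . | 0.002395|",
  "|21| . . . . . . . . . . . . | 2.780825|",
  "|25| . . . . . . . . . . . . | 0.087930|",
  "|33| . . . . . . . . . . . . | 0.484098|",
  "|--------------------------------------|",
  "|TOTAL                       | 24.934597|",
  "+--------------------------------------+"
)

# Drop the border/separator lines and the TOTAL line
limpo <- txt[!grepl("----|TOTAL", txt)]

# skip = 2 skips the "Category Information" and "#|description" lines;
# strip.white = TRUE removes the padding around each field
raw <- read.table(text = limpo, sep = "|", skip = 2,
                  comment.char = "", strip.white = TRUE)

# V1 and V5 come from the leading and trailing "|" and hold only NA
clean <- data.frame(
  category    = raw$V2,
  description = raw$V3,
  miles       = raw$V4
)
str(clean)
```

Reading with header = FALSE and naming the columns afterwards avoids depending on how read.table() would mangle the "#" and empty header names.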
tidyverse
In the tidyverse, the package for reading files is readr. Using it we would have:
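library(tidyverse)
# comment = "----" discards everything from "----" onwards, so the border and
# separator lines end up with NA in the ` #` column and are filtered out below
rawdata2 <- read_delim(tf, delim = "|", skip = 2, comment = "----")
rawdata2 %>% filter(!is.na(` #`))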
# A tibble: 8 x 5
` `   ` #`  `description ~` ` miles`   X5
<chr> <chr> <chr> <chr> <chr>
1 " " " 3" " . . . . . . . . . . . . . . .~ " 2.096540" NA
2 " " " 4" " . . . . . . . . . . . . . . .~ " 14.719017" NA
3 " " 15 " . . . . . . . . . . . . . . .~ " 4.763791" NA
4 " " 19 " . . . . . . . . . . . . . . .~ " 0.002395" NA
5 " " 21 " . . . . . . . . . . . . . . .~ " 2.780825" NA
6 " " 25 " . . . . . . . . . . . . . . .~ " 0.087930" NA
7 " " 33 " . . . . . . . . . . . . . . .~ " 0.484098" NA
8 " " "TOTAL ~ " 24.934597" NA NA
Note that the two tables are not identical: in the second approach the row with the total is kept.
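If the total also matters in the base R approach, one option (a sketch, not part of the approaches above) is to pull the number out of the TOTAL line before that line is discarded, and use it to check the sum of the rows that were read. The data rows are abbreviated from the table above.

```r
# Data rows and TOTAL line from the table above (dot leaders shortened)
txt <- c(
  "| 3| . . . . . | 2.096540|",
  "| 4| . . . . . | 14.719017|",
  "|15| . . . . . | 4.763791|",
  "|19| . . . . . | 0.002395|",
  "|21| . . . . . | 2.780825|",
  "|25| . . . . . | 0.087930|",
  "|33| . . . . . | 0.484098|",
  "|TOTAL         | 24.934597|"
)

# Extract the number that follows "TOTAL" before the line is dropped
total_line <- grep("TOTAL", txt, value = TRUE)
parts <- trimws(strsplit(total_line, "|", fixed = TRUE)[[1]])
total <- as.numeric(parts[nzchar(parts)][2])

# Read the remaining rows and compare their sum with the reported total
rows <- read.table(text = txt[!grepl("TOTAL", txt)], sep = "|",
                   comment.char = "", strip.white = TRUE)
miles <- rows$V4
abs(sum(miles) - total) < 1e-5  # the file's total is rounded to 6 decimals
```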