I need to convert the PDF data below into a data frame: link

While searching, I found How to Read PDF Data in the R . I had some problems installing the package, but I managed to get it working in RStudio in the end. The result was not satisfactory, though: in columns with 3 or more blank lines, the values jump to another column.

Using the tabulizer package, I extracted information from the first page only to test:

library(tabulizer)

url <- ''
d <- extract_tables(url, encoding = "UTF-8", pages = 1)

Then I turned the list into a data frame, converted every column to character (chr), named the variables, and removed the first row (which actually holds the variable names):

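library(dplyr)  # for %>% and the mutate_* helpers

# extract_tables() returns a list of matrices by default; take the first (only) table
d <- as.data.frame(d[[1]], stringsAsFactors = FALSE)
d <- d %>% mutate_all(as.character)       # every column as chr
names(d) <- as.character(unlist(d[1, ]))  # the first row holds the variable names
d <- d[-1, ]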
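
To see the header step in isolation, here is a minimal sketch on a made-up character matrix whose first row is the header, which is the shape extract_tables returns by default. The column names and values are invented for illustration and are not taken from the PDF.

```r
# Toy stand-in for one element of the list returned by extract_tables();
# the names and values below are invented.
m <- matrix(
  c("NOME", "VENCIMENTO",
    "Ana",  "1.234,56",
    "Beto", "-"),
  ncol = 2, byrow = TRUE
)

df <- as.data.frame(m, stringsAsFactors = FALSE)  # columns stay character
names(df) <- as.character(unlist(df[1, ]))        # promote the first row to names
df <- df[-1, , drop = FALSE]                      # drop the header row
rownames(df) <- NULL                              # renumber the rows from 1

str(df)
```

Wrapping the first row in as.character(unlist(...)) avoids surprises if the columns happen to be factors, where the row would otherwise yield integer codes instead of the labels.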

Then you need to clean up the values: remove the thousands separator ( . ), replace the decimal separator, which is , in the PDF, with . , and convert the columns to numeric:

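# Cells that contain only "-" are empty: turn them into NA
d <- d %>% 
  mutate_all(~ na_if(., "-"))
# Drop the thousands separator, then swap the decimal comma for a point
d <- d %>% 
  mutate_at(vars(VENCIMENTO:`TOTAL LÍQUIDO`), ~ gsub("\\.", "", .)) %>% 
  mutate_at(vars(VENCIMENTO:`TOTAL LÍQUIDO`), ~ as.numeric(gsub(",", ".", ., fixed = TRUE)))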
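
The same cleaning can be tried on a plain character vector, which makes it easier to check before applying it to the whole table. This is a sketch in base R only; parse_br_number is a hypothetical helper name, not a function from any package, and the sample values are invented.

```r
# Hypothetical helper (not part of any package): parse numbers written
# with "." as the thousands separator and "," as the decimal separator.
parse_br_number <- function(x) {
  x <- trimws(x)
  x[x %in% "-"] <- NA                   # a lone dash marks an empty cell
  x <- gsub(".", "", x, fixed = TRUE)   # drop the thousands separators
  x <- sub(",", ".", x, fixed = TRUE)   # decimal comma -> decimal point
  as.numeric(x)
}

res <- parse_br_number(c("1.234,56", "-", "12,5", "-3.000,00"))
res
```

Only cells that are exactly "-" become NA here, so a negative value such as "-3.000,00" is kept and parsed as -3000.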

If you remove the pages argument from the extract_tables call, it pulls every page of the PDF and returns the tables in a single list. To join them into a single table, I think d) will solve it.
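
One way to do that join, assuming every page yields a table with the same columns and repeats the header row, is base R's do.call(rbind, ...). This is a sketch on made-up data and not necessarily the call meant above; the pages list and its values are invented stand-ins for the real extract_tables result.

```r
# Toy stand-in for the list extract_tables() returns without `pages`:
# one character matrix per page, each repeating the header row.
pages <- list(
  matrix(c("NOME", "TOTAL", "Ana",  "1,00"), ncol = 2, byrow = TRUE),
  matrix(c("NOME", "TOTAL", "Beto", "2,00"), ncol = 2, byrow = TRUE)
)

header  <- pages[[1]][1, ]                                   # names from page 1
body    <- lapply(pages, function(m) m[-1, , drop = FALSE])  # strip each header
stacked <- do.call(rbind, body)                              # stack the pages

df <- as.data.frame(stacked, stringsAsFactors = FALSE)
names(df) <- header
df
```

If some page comes back with a different number of columns, rbind will stop with an error, which is a useful signal that the extraction for that page needs a closer look.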

