Text mining: Python or R? [closed]

Question

I'm trying to extract information from PDF files to populate a table without having to read each PDF. I just cannot find any reference on how to do this.

I need, for example, to find the authors and date of publication of this article:

link

I'd like suggestions for packages or functions in Python or R.

python r text

asked by anonymous 09.10.2018 / 18:50

1 answer


score 1 · Answer 1

PDF files can have special metadata fields to store this kind of data, such as author and date, but I opened the PDF you linked and these fields are not filled in.
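A quick way to check whether those metadata fields are filled is sketched below. It assumes the third-party `pypdf` package (its `PdfReader(...).metadata` object exposes `author` and `creation_date_raw`); the helper names `read_pdf_info` and `missing_fields` are hypothetical, chosen for this example.

```python
def read_pdf_info(path: str) -> dict:
    """Return the author and raw creation date stored in a PDF's metadata."""
    # Assumption: pypdf is installed. Imported lazily so missing_fields()
    # below can be used without it.
    from pypdf import PdfReader

    meta = PdfReader(path).metadata  # None when the PDF has no info dictionary
    if meta is None:
        return {"author": None, "date": None}
    return {"author": meta.author, "date": meta.creation_date_raw}


def missing_fields(info: dict) -> list:
    """List the fields that are absent or blank and must be parsed from the text."""
    return [key for key, value in info.items()
            if value is None or not str(value).strip()]
```

For example, `missing_fields(read_pdf_info("article.pdf"))` (with a hypothetical file name) returning `["author", "date"]` is the situation described above: nothing useful is in the metadata.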

So there's no magic: you'll need to parse the text and extract the data directly, since the PDF does not provide this data in an organized way.
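Getting at the raw text is the first step. The sketch below again assumes `pypdf` (`PdfReader.pages` and `page.extract_text()`); `pdf_text` and `text_to_lines` are hypothetical helper names. (In R, the `pdftools` package offers `pdf_text()` and `pdf_info()` for the same two steps.)

```python
def pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page (assumes pypdf is installed)."""
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can come back empty for scanned, image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def text_to_lines(text: str) -> list:
    """Split text into stripped, non-empty lines, ready for line-by-line searching."""
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Working with a list of clean lines makes the "look at the next line" style of heuristic easy to write.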

If you don't know the exact text to search for, you can list several candidate patterns and have your program try each one until it finds one that extracts the data.
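One way to "try every possibility" is an ordered list of regular expressions, most specific first, stopping at the first one that matches. The date formats in `DATE_PATTERNS` below are only illustrative assumptions; you would adjust them to the articles you actually process.

```python
import re
from typing import Iterable, Optional

# Hypothetical candidate patterns, tried in order from most to least specific.
DATE_PATTERNS = [
    r"\b(\d{1,2}/\d{1,2}/\d{4})\b",  # e.g. 09/10/2018
    r"\b(\d{4}-\d{2}-\d{2})\b",      # e.g. 2018-10-09
    r"\b((?:19|20)\d{2})\b",         # a bare year, as a last resort
]


def first_match(text: str, patterns: Iterable[str]) -> Optional[str]:
    """Try each pattern in turn and return the first captured group found."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return None
```

Ordering matters: a bare-year pattern placed first would match inside a full date and hide the more precise result.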

For example, in the linked PDF, you can compare each line with the name of the PDF file to find the full title, and treat the next line as the author.
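The title-then-author heuristic could look like the sketch below. It assumes the file name contains (part of) the title with words separated by underscores, hyphens or spaces; `author_after_title` is a hypothetical name, and the comparison ignores case and punctuation.

```python
import re
from pathlib import Path
from typing import List, Optional


def _normalize(s: str) -> str:
    """Lowercase and collapse punctuation/underscores into single spaces."""
    return re.sub(r"[\W_]+", " ", s).strip().lower()


def author_after_title(lines: List[str], pdf_filename: str) -> Optional[str]:
    """Find the line containing the file's name and return the line after it."""
    stem = _normalize(Path(pdf_filename).stem)
    if not stem:
        return None
    for i, line in enumerate(lines[:-1]):  # the last line has no successor
        if stem in _normalize(line):
            return lines[i + 1]
    return None
```

Using "contained in" rather than equality tolerates file names that hold only the start of a long title.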

Another option is to look for the acronym ISSN; if you find it, you can take the number, look it up on sites like link, and extract the data you want from the site rather than from the PDF.
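Finding the ISSN in the extracted text can be sketched as a regular expression plus the standard ISSN check digit (weights 8 down to 2 on the first seven digits, modulo 11, with 10 written as `X`), which filters out numbers that merely look like an ISSN. The online lookup itself is not shown, since the site in the answer is only given as a link; `find_issn` and `is_valid_issn` are hypothetical names.

```python
import re
from typing import Optional

# "ISSN", optional colon/spaces, then NNNN-NNNC (the hyphen is optional).
ISSN_RE = re.compile(r"ISSN[\s:]*(\d{4})-?(\d{3}[\dX])", re.IGNORECASE)


def is_valid_issn(issn: str) -> bool:
    """Check the ISSN check digit (mod 11, weights 8..2, 10 -> 'X')."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not digits[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return digits[7] == ("X" if check == 10 else str(check))


def find_issn(text: str) -> Optional[str]:
    """Return the first ISSN in the text whose check digit is valid."""
    for match in ISSN_RE.finditer(text):
        candidate = f"{match.group(1)}-{match.group(2).upper()}"
        if is_valid_issn(candidate):
            return candidate
    return None
```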