I wrote this code in Python 3 to read the metadata of a PDF:
>>> from pdfminer.pdfparser import PDFParser
>>> from pdfminer.pdfdocument import PDFDocument
>>> fp = open('EMC 1-2017 PL678716 =- PL 6787-2016.pdf', 'rb')
>>> parser = PDFParser(fp)
>>> doc = PDFDocument(parser)
>>> print(doc.info)
It prints the following:
[{'Title': b'\xfe\xff\x00C\x00O\x00M\x00I\x00S\x00S\x00\xc3\x00O', 'Author': b'Ivanete de Araujo Costa', 'Subject': b'EMD ADI - Emenda Aditiva', 'Creator': b'\xfe\xff\x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00\xae\x00 \x00W\x00o\x00r\x00d\x00 \x002\x000\x001\x000', 'CreationDate': b"D:20170314114321-07'00'", 'ModDate': b"D:20170314114321-07'00'", 'Producer': b'\xfe\xff\x00M\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t\x00\xae\x00 \x00W\x00o\x00r\x00d\x00 \x002\x000\x001\x000'}]
Does anyone know how to pull the individual fields out into separate variables? For example, in the case above I would like to get:
a = "Ivanete de Araujo Costa" (Author field)
b = "EMD ADI - Emenda Aditiva" (Subject field)
c = "D:20170314114321-07'00" (CreationDate field)
d = "D:20170314114321-07'00" (ModDate field)
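
For reference, here is a minimal sketch of what I am after. Judging by the printed output, doc.info looks like a list holding one dictionary, so the fields should be reachable by key from doc.info[0], with each value being a bytes object that still needs decoding. The helper decode_pdf_string is a hypothetical name of my own, not part of pdfminer, and falling back to Latin-1 for values without the \xfe\xff marker is an assumption. The sample dictionary is copied from the output above so the snippet runs without the PDF.

```python
def decode_pdf_string(value):
    """Turn a raw PDF info value into a str (hypothetical helper, not a pdfminer API)."""
    if not isinstance(value, bytes):
        return value
    if value.startswith(b'\xfe\xff'):
        # Leading UTF-16 big-endian byte order mark: decode the rest as UTF-16BE
        return value[2:].decode('utf-16-be')
    # Assumption: treat everything else as Latin-1, which never raises on any byte
    return value.decode('latin-1')

# Sample copied from the printed output; in the real script this would be doc.info[0]
info = {
    'Title': b'\xfe\xff\x00C\x00O\x00M\x00I\x00S\x00S\x00\xc3\x00O',
    'Author': b'Ivanete de Araujo Costa',
    'Subject': b'EMD ADI - Emenda Aditiva',
    'CreationDate': b"D:20170314114321-07'00'",
    'ModDate': b"D:20170314114321-07'00'",
}

# Decode every field once, then pick the ones that are needed
decoded = {key: decode_pdf_string(val) for key, val in info.items()}

a = decoded['Author']
b = decoded['Subject']
c = decoded['CreationDate']
d = decoded['ModDate']

print(a, b, c, d, sep='\n')
```

Using decoded.get('Author') instead of decoded['Author'] would avoid a KeyError for PDFs that leave a field out, since not every file fills in all of them.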