How can I convert DOC and DOCX to TXT with PHP?


I have a system where the files my client sends me are all DOC or DOCX files. However, they may want to download these documents in TXT format.

Is there any simple way to convert DOC or DOCX to TXT through PHP?

    
asked by anonymous 20.11.2015 / 12:05

1 answer


I was able to solve the problem. I did it as follows:

I open the Word document with the IOFactory class from the PHPWord library:

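    require 'vendor/autoload.php'; // Composer autoloader that provides PHPWord

    // The 'Word2007' reader parses .docx (Office Open XML) files
    $reader = \PhpOffice\PhpWord\IOFactory::createReader('Word2007');
    $phpword = $reader->load('arquivo.docx');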
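The 'Word2007' reader only understands .docx files. Since the question also covers .doc, one option is to pick the reader from the file extension. PHPWord also ships an 'MsDoc' reader for the older binary format, although its support is more limited. A minimal sketch, where the helper name phpWordReaderFor is hypothetical:

```php
<?php
// Hypothetical helper: chooses a PHPWord reader name from the file extension.
// 'Word2007' handles .docx; 'MsDoc' handles the legacy binary .doc format.
function phpWordReaderFor(string $path): string
{
    // Case-insensitive, so "REPORT.DOC" is treated like "report.doc"
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    switch ($ext) {
        case 'docx':
            return 'Word2007';
        case 'doc':
            return 'MsDoc';
        default:
            throw new InvalidArgumentException("Unsupported extension: .$ext");
    }
}

// Usage with PHPWord (assumes the library is installed via Composer):
// $reader  = \PhpOffice\PhpWord\IOFactory::createReader(phpWordReaderFor($path));
// $phpword = $reader->load($path);
```

Throwing on unknown extensions keeps a stray .txt or .pdf upload from reaching a reader that cannot parse it.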

Then I save the document as HTML to a temporary file:

    // tempnam() needs both a directory and a filename prefix
    $tempfile = tempnam(sys_get_temp_dir(), 'phpword');
    $phpword->save($tempfile, 'HTML');
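If something fails halfway through the conversion, the temporary HTML file is left on disk. A small sketch that always deletes it, using try/finally; the withTempFile helper is hypothetical and not part of PHPWord:

```php
<?php
// Hypothetical helper: creates a temporary file, passes its path to $work,
// and deletes the file afterwards even if $work throws.
function withTempFile(callable $work, string $prefix = 'phpword')
{
    $path = tempnam(sys_get_temp_dir(), $prefix);
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file');
    }
    try {
        return $work($path);
    } finally {
        // Runs on both normal return and exception
        if (file_exists($path)) {
            unlink($path);
        }
    }
}

// Usage sketch with PHPWord:
// $body = withTempFile(function (string $tmp) use ($phpword) {
//     $phpword->save($tmp, 'HTML');
//     // parse $tmp with DOMDocument and return the extracted text
// });
```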

I use the DOMDocument class to extract only the contents of the body tag:

    $dom = new DOMDocument('1.0', 'UTF-8');
    @$dom->load($tempfile); // The @ is intentional: it silences parser warnings ;)
    $body = $dom->getElementsByTagName('body')->item(0)->nodeValue;
    unlink($tempfile); // the temporary HTML file is no longer needed
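DOMDocument::load() parses the file as XML, so any HTML that is not well-formed XML produces warnings (hence the @). An alternative, assuming the input is treated as HTML, is DOMDocument::loadHTML() with libxml's internal error collection instead of @. For an element node, nodeValue already returns only its text, so no tags are left. For non-ASCII text, libxml may assume ISO-8859-1 unless the HTML declares its charset. The function name htmlBodyText is hypothetical:

```php
<?php
// Hypothetical helper: returns the text inside <body> of an HTML string.
function htmlBodyText(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    // For elements, nodeValue is the concatenated text content (no tags)
    return $body === null ? '' : $body->nodeValue;
}

// Usage with the temporary file from the answer:
// $body = htmlBodyText(file_get_contents($tempfile));
```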

Next I strip any HTML tags. So that the text displays correctly in Windows Notepad, I also replace "\n" with "\r\n":

    $txt = str_replace("\n", "\r\n", strip_tags($body));
    file_put_contents('arquivo.txt', $txt);
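Note that str_replace("\n", "\r\n", ...) turns an existing "\r\n" into "\r\r\n" if the text already contains Windows line endings. A sketch that normalizes any mix of endings in one pass (toWindowsNewlines is a hypothetical name):

```php
<?php
// Converts any mix of "\r\n", "\r" and "\n" line endings to "\r\n".
// "\r\n" is listed first in the alternation so it is matched as one unit
// and not expanded into "\r\r\n".
function toWindowsNewlines(string $text): string
{
    return preg_replace('/\r\n|\r|\n/', "\r\n", $text);
}

// Usage with the extracted text:
// file_put_contents('arquivo.txt', toWindowsNewlines(strip_tags($body)));
```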
    
27.11.2015 / 19:00