I managed to solve the problem. Here is how I did it:
I open the Word document through the IOFactory class of the PHPWord library.
$reader = \PhpOffice\PhpWord\IOFactory::createReader('Word2007');
$phpword = $reader->load('arquivo.docx');
Then I save it as HTML in a temporary file:
$tempfile = tempnam(sys_get_temp_dir(), 'phpword'); // tempnam() requires a prefix as its second argument
$phpword->save($tempfile, 'HTML');
I use the DOMDocument class to get only the contents of the body tag:
$dom = new DOMDocument('1.0', 'UTF-8');
@$dom->load($tempfile); // This @ is normal ;) it just suppresses the parser warnings
$body = $dom->getElementsByTagName('body')->item(0)->nodeValue;
Next I strip the HTML formatting. I also make the text display correctly in Windows Notepad by changing "\n" to "\r\n".
$txt = str_replace("\n", "\r\n", strip_tags($body));
file_put_contents('arquivo.txt', $txt);
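If it helps, the last two steps can be wrapped in a small helper. This is only a minimal sketch, not part of PHPWord: the function name bodyTextWithCrlf is made up for illustration, and it assumes the generated HTML is well-formed enough to be parsed as XML (which is what DOMDocument::load() above also assumes). It normalizes any existing line ending first, so text that already contains "\r\n" does not end up with "\r\r\n".

```php
<?php
// Hypothetical helper (the name is an assumption, not a PHPWord API):
// returns the plain text of <body> from a well-formed (X)HTML string,
// with every line ending converted to CRLF for Windows Notepad.
function bodyTextWithCrlf(string $html): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');

    // Suppress parser warnings, as in the snippet above.
    if (!@$dom->loadXML($html)) {
        throw new RuntimeException('Could not parse the HTML as XML.');
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        throw new RuntimeException('No <body> element found.');
    }

    // textContent is already plain text, so strip_tags() is not needed here.
    // Match "\r\n", a lone "\r" or a lone "\n" and rewrite each as "\r\n".
    return preg_replace('/\r\n|\r|\n/', "\r\n", $body->textContent);
}
```

With the temporary file from the earlier step, it could be used as file_put_contents('arquivo.txt', bodyTextWithCrlf(file_get_contents($tempfile))); followed by unlink($tempfile); to remove the temporary file.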