Alternative for converting Word to HTML

4

I maintain a site that does not have a CMS, and I often get requests from the client asking me to put text in a pre-determined format.

Generally these texts come in MS Word .doc or .docx format.

The problem is that they are long texts with various formatting, tables and spacing that need to be respected.

When I try to explain to the client that I cannot simply copy and paste the document into a webpage, the client does not understand and still demands a quick turnaround.

But the process, as many of you probably know, is laborious. I usually need a tool to convert from Word to HTML, but the results are awful and still leave me work to do: setting styles, fixing links and adjusting images.

My question then is: is there a friendlier way of receiving content from clients to create HTML pages when the site has no CMS?

Maybe some text editor that produces cleaner HTML than Word's "Save as HTML" option?

Does anyone have a similar problem?

    
asked by anonymous 11.03.2015 / 20:32

1 answer

2

There is no ready-made solution. Conversion tools will always generate dirty code, and the result is not always reliable, not least because Word itself generates dirty markup in its documents.

Since you have to do this often, you have two options: convince the client to use a CMS, or develop an XML processor that converts the .docx files they send into clean HTML for your site. .doc files are so much work that it is best to convert them to .docx first and then run them through the processor.

Here is an example of this type of script in PHP. The example is simplified, but it is a good starting point. And here you can find information about the structure of Office Open XML.
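
To give a rough idea of what such a processor can look like, here is a minimal sketch in Python (not the PHP script mentioned above). It relies on the fact that a .docx file is a ZIP archive whose main content lives in word/document.xml. The function names (docx_to_html and the helpers) are made up for this illustration, and the sketch only handles top-level paragraphs, bold, italic and simple tables. Links, images, lists, headings and nested content are ignored, so treat it as a starting point rather than a finished converter.

```python
import zipfile
import xml.etree.ElementTree as ET
from html import escape

# WordprocessingML namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def flag_on(props, name):
    """True if a toggle such as w:b or w:i is present and not switched off."""
    el = props.find(f"w:{name}", NS)
    return el is not None and el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def run_to_html(run):
    """Convert one run (w:r), keeping only bold and italic formatting."""
    text = escape("".join(t.text or "" for t in run.findall("w:t", NS)))
    props = run.find("w:rPr", NS)
    if text and props is not None:
        if flag_on(props, "i"):
            text = f"<em>{text}</em>"
        if flag_on(props, "b"):
            text = f"<strong>{text}</strong>"
    return text


def paragraph_inner(para):
    """Concatenate the runs that are direct children of a paragraph (w:p)."""
    return "".join(run_to_html(r) for r in para.findall("w:r", NS))


def table_to_html(tbl):
    """Convert a table (w:tbl) to a plain <table> with no styling."""
    rows = []
    for tr in tbl.findall("w:tr", NS):
        cells = []
        for tc in tr.findall("w:tc", NS):
            lines = [paragraph_inner(p) for p in tc.findall("w:p", NS)]
            cells.append("<td>" + "<br>".join(l for l in lines if l) + "</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


def document_xml_to_html(xml_bytes):
    """Walk the body of document.xml and emit clean HTML for each block."""
    body = ET.fromstring(xml_bytes).find("w:body", NS)
    parts = []
    for child in body:
        if child.tag == f"{{{W}}}p":
            inner = paragraph_inner(child)
            if inner.strip():  # skip empty spacing paragraphs
                parts.append(f"<p>{inner}</p>")
        elif child.tag == f"{{{W}}}tbl":
            parts.append(table_to_html(child))
    return "\n".join(parts)


def docx_to_html(path_or_file):
    """Open the .docx (a ZIP archive) and convert its main document part."""
    with zipfile.ZipFile(path_or_file) as archive:
        return document_xml_to_html(archive.read("word/document.xml"))
```

Calling docx_to_html("text.docx") (the file name is just an example) returns a string of HTML with no inline styles, which you can then wrap in your own page template and CSS.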

Have fun!

    
12.03.2015 / 02:31