I'm converting a DOCX file to PDF with the docx4j API, but I'm having trouble preserving the original text formatting after the conversion.
Dependencies:

    <!-- docx4j -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-ImportXHTML</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.capaxit.textimage</groupId>
        <artifactId>TextImageGen</artifactId>
        <version>2.0-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.jaxb-namespaceprefixmapper-interfaces</groupId>
        <artifactId>JAXBNamespacePrefixMapper</artifactId>
        <version>2.2.4</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>com.sun.xml.bind</groupId>
        <artifactId>jaxb-impl</artifactId>
        <version>2.2.11</version>
    </dependency>
    <dependency>
        <groupId>org.glassfish.jaxb</groupId>
        <artifactId>jaxb-runtime</artifactId>
        <version>2.2.11</version>
    </dependency>
    <dependency>
        <groupId>org.plutext</groupId>
        <artifactId>jaxb-xslfo</artifactId>
        <version>1.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>xhtmlrenderer</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.xmlgraphics</groupId>
        <artifactId>xmlgraphics-commons</artifactId>
        <version>2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avalon.framework</groupId>
        <artifactId>avalon-framework-api</artifactId>
        <version>4.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avalon.framework</groupId>
        <artifactId>avalon-framework-impl</artifactId>
        <version>4.3.1</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>

The method that performs the variable replacement and the conversion:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.HashMap;

    import javax.servlet.ServletContext;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.WebApplicationException;
    import javax.ws.rs.core.Context;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.Response.ResponseBuilder;
    import javax.ws.rs.core.StreamingOutput;

    import org.docx4j.Docx4J;
    import org.docx4j.convert.out.FOSettings;
    import org.docx4j.fonts.IdentityPlusMapper;
    import org.docx4j.fonts.Mapper;
    import org.docx4j.fonts.PhysicalFonts;
    import org.docx4j.model.fields.FieldUpdater;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

    // ... inside the JAX-RS resource class:

    @Path("fichaCaptacao")
    @GET
    @Produces({"application/pdf"})
    public Response fichaCaptacao(@Context ServletContext servletContext) throws Exception {
        // Exclude context init from timing
        org.docx4j.wml.ObjectFactory foo = org.docx4j.jaxb.Context.getWmlObjectFactory();

        String outputFile = "/home/desenvolvimento/qimob.git/qimob-web/src/main/webapp/resources/templates/contratos/OUT_VariableReplace.docx";

        // Font regex (optional)
        // Set a regex if you want to restrict font discovery to a defined subset of fonts.
        // This has to be done before calling createContent, since that discovers fonts.
        String regex = ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|Lucida Console|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|Symbol|Webdings|Wingdings|MS Sans Serif|MS Serif).*";
        PhysicalFonts.setRegex(regex);

        // Load the template; try-with-resources closes the input stream
        String docxPath = servletContext.getRealPath("/") + "/resources/templates/contratos/CONTRATO_LOCACAO_IMOVEL_RESIDENCIAL.docx";
        final WordprocessingMLPackage tmpPkg;
        try (InputStream docxInputStream = new FileInputStream(docxPath)) {
            tmpPkg = WordprocessingMLPackage.load(docxInputStream);
        }
        MainDocumentPart documentPart = tmpPkg.getMainDocumentPart();

        // Values for the ${...} placeholders in the template
        HashMap<String, String> mappings = new HashMap<>();
        mappings.put("contratante", "Omar Mota");
        mappings.put("naturalidade", "Goiás-GO");
        mappings.put("nacionalidade", "Brasileiro");
        documentPart.variableReplace(mappings);

        // Refresh the values of DOCPROPERTY fields
        FieldUpdater updater = new FieldUpdater(tmpPkg);
        updater.update(true);

        // Set up font mapper (optional)
        Mapper fontMapper = new IdentityPlusMapper();
        tmpPkg.setFontMapper(fontMapper);

        // FO exporter setup (required)
        // .. the FOSettings object
        final FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setWmlPackage(tmpPkg);

        // Document format:
        // The default FORenderer implementation, which uses Apache FOP, outputs
        // a PDF document if nothing is passed via foSettings.setApacheFopMime(...).
        foSettings.setApacheFopMime(FOSettings.MIME_PDF);
        // apacheFopMime can be any of the output formats defined in org.apache.fop.apps.MimeConstants,
        // e.g. org.apache.fop.apps.MimeConstants.MIME_FOP_IF, or
        // FOSettings.INTERNAL_FO_MIME if you want the FO document as the result.
        //foSettings.setApacheFopMime(FOSettings.INTERNAL_FO_MIME);

        // Earlier attempts, kept for reference:
        // Save it
        // if (true) {
        //     SaveToZipFile saver = new SaveToZipFile(tmpPkg);
        //     saver.save(outputFile);
        // } else {
        //     System.out.println(XmlUtils.marshaltoString(documentPart.getJaxbElement(), true, true));
        // }
        // PdfSettings pdfSettings = new PdfSettings();
        // OutputStream out = new FileOutputStream(new File("/home/desenvolvimento/Documents/conversao.pdf"));
        // PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(tmpPkg);
        // converter.output(out, pdfSettings);

        // JAX-RS calls write() only after this method has returned, so the
        // temp-file cleanup has to run inside write(), not before build().
        ResponseBuilder builder = Response.ok(new StreamingOutput() {
            @Override
            public void write(OutputStream output) throws IOException, WebApplicationException {
                try {
                    // Specify whether the export uses XSLT to create the FO:
                    // FLAG_EXPORT_PREFER_XSL is slower but more complete;
                    // FLAG_EXPORT_PREFER_NONXSL (visitor) is faster but not yet at feature parity.
                    Docx4J.toFO(foSettings, output, Docx4J.FLAG_EXPORT_PREFER_XSL);
                } catch (Docx4JException e) {
                    throw new WebApplicationException(e);
                } finally {
                    // Clean up, so any ObfuscatedFontPart temp files can be deleted
                    if (tmpPkg.getMainDocumentPart().getFontTablePart() != null) {
                        tmpPkg.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
                    }
                }
            }
        });
        return builder.build();
    }

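To check which fonts the pattern passed to `PhysicalFonts.setRegex` actually lets through, the regex can be tested on its own with plain `java.util.regex`. This is a minimal sketch under stated assumptions: the candidate paths are hypothetical examples, and it assumes docx4j tests the pattern against a font file path. Whatever string docx4j really matches it against, note that the pattern is case-sensitive, contains literal spaces, and has no alternative for Calibri, which is the font the log reports as unmapped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FontRegexCheck {
    // The regex from the question, copied verbatim.
    static final String REGEX = ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|"
            + "Lucida Console|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|"
            + "Symbol|Webdings|Wingdings|MS Sans Serif|MS Serif).*";

    // Pattern.matches requires a full match; the leading and trailing ".*"
    // turn it into a case-sensitive "contains one of these names" test.
    static boolean accepted(String candidate) {
        return Pattern.matches(REGEX, candidate);
    }

    public static void main(String[] args) {
        // Hypothetical font file paths, for illustration only.
        List<String> candidates = Arrays.asList(
                "/Library/Fonts/Times New Roman.ttf",
                "/usr/share/fonts/truetype/msttcorefonts/times.ttf",
                "/usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf",
                "C:/Windows/Fonts/calibri.ttf");
        for (String c : candidates) {
            System.out.println((accepted(c) ? "kept    " : "skipped ") + c);
        }
    }
}
```

Only the first path, which spells the family name with spaces and the same capitalisation, passes; lowercase file names, underscores, and anything Calibri-related are filtered out before the font mapper ever sees them.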
Log:

    15:24:27,217 INFO [org.docx4j.openpackaging.contenttype.ContentTypeManager] (default task-41) Detected WordProcessingML package
    15:24:27,217 INFO [org.docx4j.openpackaging.io3.Load3] (default task-41) Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    15:24:27,218 INFO [org.docx4j.openpackaging.io3.Load3] (default task-41) package read; elapsed time: 3 ms
    15:24:27,218 INFO [org.docx4j.openpackaging.parts.JaxbXmlPart] (default task-41) Lazily unmarshalling /word/document.xml
    15:24:27,224 INFO [org.docx4j.openpackaging.parts.DocPropsCorePart] (default task-41) unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
    15:24:27,224 INFO [org.docx4j.openpackaging.parts.DocPropsExtendedPart] (default task-41) unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
    15:24:27,225 INFO [org.docx4j.model.fields.FieldUpdater] (default task-41)
    Simple Fields in /word/document.xml
    =============
    Found 0 simple fields
    Complex Fields in /word/document.xml
    ==============
    Found 0 fields
    15:24:27,225 WARN [org.docx4j.fonts.IdentityPlusMapper] (default task-41) WARNING! SubstituterWindowsPlatformImpl works best on Windows. To get good results on other platforms, you'll probably need to have installed Windows fonts.
    15:24:27,227 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,227 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,236 INFO [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467236
    15:24:27,236 WARN [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Calibri
    15:24:27,236 WARN [org.docx4j.fonts.Mapper] (default task-41) Overwriting existing fontMapping: arial
    15:24:27,236 WARN [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Times New Roman
    15:24:27,244 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,244 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,252 INFO [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467252
    15:24:27,254 INFO [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,255 INFO [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr
    15:24:27,261 WARN [org.docx4j.openpackaging.parts.WordprocessingML.DocumentSettingsPart] (default task-41) No w:settings/w:compat element
    15:24:27,265 INFO [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,266 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,266 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,266 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,280 INFO [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,280 INFO [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,286 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,286 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,294 INFO [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467294
    15:24:27,294 INFO [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,294 INFO [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr
    15:24:27,296 INFO [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,296 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,296 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,296 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,299 INFO [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,299 INFO [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,303 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font.
    15:24:27,307 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font.
    15:24:27,310 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font.
    15:24:27,313 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font.
    15:24:27,315 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,315 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,317 WARN [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,317 WARN [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,322 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,327 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 4 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 3 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 2 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 1 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,331 INFO [org.docx4j.org.apache.xml.serializer.ToXMLStream] (default task-41) Using repackaged ToXMLStream
    15:24:27,331 INFO [org.docx4j.org.apache.xml.serializer.ToXMLStream] (default task-41) Using repackaged ToXMLStream
    15:24:27,340 INFO [org.docx4j.model.images.AbstractConversionImageHandler] (default task-41) Wrote @src='file:/tmp/6ccc1fe4-53c9-4661-b078-78c79a9a95d8image1.jpeg
    15:24:27,350 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font.
    15:24:27,481 INFO [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,481 WARN [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font.
    15:24:27,489 WARN [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,489 WARN [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,509 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
    15:24:27,509 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
    15:24:27,510 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Arial,normal,700" not found. Substituting with "Arial,normal,400".
    15:24:27,521 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,535 WARN [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:inline line 1 exceed the available area in the inline-progression direction by 23379 millipoints. (See position 3:11147)
    15:24:27,561 INFO [org.apache.fop.apps.FOUserAgent] (default task-41) Rendered page #1.

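The FOP overflow warnings in the log report lengths in millipoints (thousandths of a point). Converting them with 1 pt = 1/72 in and 1 in = 25.4 mm gives a sense of scale: the `fo:region-body` overflow of 42211 millipoints on pages 1 to 4 is about 42.2 pt (roughly 14.9 mm), and the `fo:inline` overflow of 23379 millipoints is about 23.4 pt (roughly 8.2 mm). A small stdlib-only sketch of that arithmetic:

```java
import java.util.Locale;

public class MillipointConversion {
    // Apache FOP reports lengths in millipoints: 1 pt = 1000 mpt.
    static double toPoints(long millipoints) {
        return millipoints / 1000.0;
    }

    // 1 pt = 1/72 inch and 1 inch = 25.4 mm.
    static double toMillimetres(long millipoints) {
        return toPoints(millipoints) * 25.4 / 72.0;
    }

    public static void main(String[] args) {
        // Overflow values copied from the FOP warnings in the log.
        long[] overflows = {42211L, 23379L};
        for (long mpt : overflows) {
            // Locale.ROOT keeps '.' as the decimal separator whatever the JVM locale is.
            System.out.println(String.format(Locale.ROOT, "%d mpt = %.3f pt = %.1f mm",
                    mpt, toPoints(mpt), toMillimetres(mpt)));
        }
    }
}
```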
The process files are available here.
The PDF is the generated output and the DOCX is the original file.
If anyone can help me with this, I'd be grateful!