Convert docx file to pdf without losing formatting?

I'm converting a docx file to PDF using the Docx4J API, but I'm having trouble preserving the original text formatting after the conversion.

Dependencies:

<!-- docx4j -->
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-ImportXHTML</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.6.1</version>
    </dependency>
    <dependency>
        <groupId>org.capaxit.textimage</groupId>
        <artifactId>TextImageGen</artifactId>
        <version>2.0-SNAPSHOT</version>
    </dependency>
    <dependency>
        <groupId>com.googlecode.jaxb-namespaceprefixmapper-interfaces</groupId>
        <artifactId>JAXBNamespacePrefixMapper</artifactId>
        <version>2.2.4</version>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>com.sun.xml.bind</groupId>
        <artifactId>jaxb-impl</artifactId>
        <version>2.2.11</version>
    </dependency>
    <dependency>
        <groupId>org.glassfish.jaxb</groupId>
        <artifactId>jaxb-runtime</artifactId>
        <version>2.2.11</version>
    </dependency>
    <dependency>
        <groupId>org.plutext</groupId>
        <artifactId>jaxb-xslfo</artifactId>
        <version>1.0.1</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-export-fo</artifactId>
        <version>3.3.0</version>
    </dependency>
    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>xhtmlrenderer</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.xmlgraphics</groupId>
        <artifactId>xmlgraphics-commons</artifactId>
        <version>2.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.avalon.framework</groupId>
        <artifactId>avalon-framework-api</artifactId>
        <version>4.3.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.avalon.framework</groupId>
        <artifactId>avalon-framework-impl</artifactId>
        <version>4.3.1</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>

Method that performs the variable replacement and the conversion:

    @Path("fichaCaptacao")
    @GET
    @Produces({"application/pdf"})
    public Response fichaCaptacao(@Context ServletContext servletContext) throws Exception {
        // Exclude context init from timing
        org.docx4j.wml.ObjectFactory foo = org.docx4j.jaxb.Context.getWmlObjectFactory();

        // Font regex (optional)
        // Set regex if you want to restrict to some defined subset of fonts
        // Here we have to do this before calling createContent,
        // since that discovers fonts
        String outputFile = "/home/desenvolvimento/qimob.git/qimob-web/src/main/webapp/resources/templates/contratos/OUT_VariableReplace.docx";
        // Set the regex if you want to define a group of fonts
        String regex = null;
        regex = ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|Lucida Console|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|Symbol|Webdings|Wingdings|MS Sans Serif|MS Serif).*";

        PhysicalFonts.setRegex(regex);

        String docInputStream = servletContext.getRealPath("/") + "/resources/templates/contratos/CONTRATO_LOCACAO_IMOVEL_RESIDENCIAL.docx";
        InputStream docxInputStream = new FileInputStream(docInputStream);

        WordprocessingMLPackage tmpPkg = null;

        tmpPkg = WordprocessingMLPackage.load(docxInputStream);

        MainDocumentPart documentPart = tmpPkg.getMainDocumentPart();

        HashMap<String, String> mappings = new HashMap<>();
        mappings.put("contratante", "Omar Mota");
        mappings.put("naturalidade", "Goiás-GO");
        mappings.put("nacionalidade", "Brasileiro");

        documentPart.variableReplace(mappings);
        // Refresh the values of DOCPROPERTY fields
        FieldUpdater updater = new FieldUpdater(tmpPkg);
        updater.update(true);

        // Set up font mapper (optional)
        Mapper fontMapper = new IdentityPlusMapper();
        tmpPkg.setFontMapper(fontMapper);

        // FO exporter setup (required)
        // .. the FOSettings object
        final FOSettings foSettings = Docx4J.createFOSettings();
        foSettings.setWmlPackage(tmpPkg);

        // Document format:
        // The default implementation of the FORenderer that uses Apache Fop will output
        // a PDF document if nothing is passed via
        foSettings.setApacheFopMime(FOSettings.MIME_PDF);
        // apacheFopMime can be any of the output formats defined in org.apache.fop.apps.MimeConstants eg org.apache.fop.apps.MimeConstants.MIME_FOP_IF or
        // FOSettings.INTERNAL_FO_MIME if you want the fo document as the result.
        //foSettings.setApacheFopMime(FOSettings.INTERNAL_FO_MIME);

        // Specify whether PDF export uses XSLT or not to create the FO
        // (XSLT takes longer, but is more complete).

//      // Save it
//      if (true) {
//          SaveToZipFile saver = new SaveToZipFile(tmpPkg);
//          saver.save(outputFile);
//      } else {
//          System.out.println(XmlUtils.marshaltoString(documentPart.getJaxbElement(), true,
//                  true));
//      }

//      PdfSettings pdfSettings = new PdfSettings();
//      OutputStream out = new FileOutputStream(new File("/home/desenvolvimento/Documents/conversao.pdf"));
//      PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(tmpPkg);
//      converter.output(out,pdfSettings);

        ResponseBuilder builder = Response.ok(
                new StreamingOutput() {
                    public void write(OutputStream output) throws IOException, WebApplicationException {
                        try {
                            Docx4J.toFO(foSettings, output, Docx4J.FLAG_EXPORT_PREFER_XSL);
                        } catch (Docx4JException e) {
                            throw new WebApplicationException(e);
                        }
                    }
                }
        );

        // Clean up, so any ObfuscatedFontPart temp files can be deleted
        if (tmpPkg.getMainDocumentPart().getFontTablePart() != null) {
            tmpPkg.getMainDocumentPart().getFontTablePart().deleteEmbeddedFontTempFiles();
        }
        // This would also do it, via finalize() methods
        updater = null;
        tmpPkg = null;

        return builder.build();
//      // Prefer the exporter, that uses a xsl transformation
//      // Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
//
//      // Prefer the exporter, that doesn't use a xsl transformation (= uses a visitor)
//      // .. faster, but not yet at feature parity
//      // Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_NONXSL);
//
//      System.out.println("Saved: " + outputfilepath);
//

    }
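
One thing that may be worth checking in the method above is the regex passed to PhysicalFonts.setRegex. The sketch below uses only the JDK. It assumes docx4j applies that regex to the location of each discovered font file rather than to the font family name; that is an assumption to verify against the docx4j source for the version in use. The class name FontRegexCheck and the font file locations are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FontRegexCheck {
    // The regex from the question, copied verbatim.
    static final String REGEX =
        ".*(Courier New|Arial|Times New Roman|Comic Sans|Georgia|Impact|Lucida Console"
        + "|Lucida Sans Unicode|Palatino Linotype|Tahoma|Trebuchet|Verdana|Symbol"
        + "|Webdings|Wingdings|MS Sans Serif|MS Serif).*";

    /** Maps each candidate location to whether the regex accepts it. */
    static Map<String, Boolean> check(List<String> fontLocations) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String location : fontLocations) {
            // String.matches requires the whole string to match, hence the .* on both ends.
            result.put(location, location.matches(REGEX));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical font file locations on a Linux server (assumption, not from the question).
        List<String> candidates = Arrays.asList(
            "file:/usr/share/fonts/truetype/msttcorefonts/Arial.ttf",
            "file:/usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf",
            "file:/home/user/.fonts/calibri.ttf");
        for (Map.Entry<String, Boolean> e : check(candidates).entrySet()) {
            System.out.println((e.getValue() ? "kept     " : "filtered ") + e.getKey());
        }
    }
}
```

If that assumption holds, a Calibri file would be filtered out even when it is installed, because the regex has no Calibri alternative, and a file named Times_New_Roman.ttf would not match the alternative written with spaces. Since the font regex is optional, leaving it unset (or adding the missing names) would be one way to test this.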

Log:

    15:24:27,217 INFO  [org.docx4j.openpackaging.contenttype.ContentTypeManager] (default task-41) Detected WordProcessingML package 
    15:24:27,217 INFO  [org.docx4j.openpackaging.io3.Load3] (default task-41) Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    15:24:27,218 INFO  [org.docx4j.openpackaging.io3.Load3] (default task-41) package read;  elapsed time: 3 ms
    15:24:27,218 INFO  [org.docx4j.openpackaging.parts.JaxbXmlPart] (default task-41) Lazily unmarshalling /word/document.xml
    15:24:27,224 INFO  [org.docx4j.openpackaging.parts.DocPropsCorePart] (default task-41) unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
    15:24:27,224 INFO  [org.docx4j.openpackaging.parts.DocPropsExtendedPart] (default task-41) unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
    15:24:27,225 INFO  [org.docx4j.model.fields.FieldUpdater] (default task-41) 

    Simple Fields in /word/document.xml
    ============= 
    Found 0 simple fields 

     Complex Fields in /word/document.xml
    ============== 
    Found 0 fields 

    15:24:27,225 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) WARNING! SubstituterWindowsPlatformImpl works best on Windows.  To get good results on other platforms, you'll probably  need to have installed Windows fonts.
    15:24:27,227 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,227 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,236 INFO  [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467236
    15:24:27,236 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Calibri
    15:24:27,236 WARN  [org.docx4j.fonts.Mapper] (default task-41) Overwriting existing fontMapping: arial
    15:24:27,236 WARN  [org.docx4j.fonts.IdentityPlusMapper] (default task-41) - - No physical font for: Times New Roman
    15:24:27,244 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,244 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,252 INFO  [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467252
    15:24:27,254 INFO  [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,255 INFO  [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr 
    15:24:27,261 WARN  [org.docx4j.openpackaging.parts.WordprocessingML.DocumentSettingsPart] (default task-41) No w:settings/w:compat element
    15:24:27,265 INFO  [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,266 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,266 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,266 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,280 INFO  [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,280 INFO  [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,286 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,286 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,294 INFO  [org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart] (default task-41) Writing temp embedded fonts 1463077467294
    15:24:27,294 INFO  [org.docx4j.convert.out.common.preprocess.FieldsCombiner] (default task-41) starting
    15:24:27,294 INFO  [org.docx4j.convert.out.common.preprocess.CoverPageSectPrMover] (default task-41) No need to move sectPr 
    15:24:27,296 INFO  [org.docx4j.model.structure.PageDimensions] (default task-41) No cols in this section; defaulting.
    15:24:27,296 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,296 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,296 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Calibri is not mapped!
    15:24:27,299 INFO  [org.docx4j.XmlUtils] (default task-41) Using org.apache.xalan.transformer.TransformerImpl
    15:24:27,299 INFO  [org.docx4j.convert.out.common.AbstractConversionContext] (default task-41) /pkg:package
    15:24:27,303 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,307 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,310 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,313 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,315 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,315 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,317 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,317 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,322 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 4 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 3 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 2 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,327 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:region-body on page 1 exceed its viewport by 42211 millipoints. (See position 1:449)
    15:24:27,331 INFO  [org.docx4j.org.apache.xml.serializer.ToXMLStream] (default task-41) Using repackaged ToXMLStream
    15:24:27,331 INFO  [org.docx4j.org.apache.xml.serializer.ToXMLStream] (default task-41) Using repackaged ToXMLStream
    15:24:27,340 INFO  [org.docx4j.model.images.AbstractConversionImageHandler] (default task-41) Wrote @src='file:/tmp/6ccc1fe4-53c9-4661-b078-78c79a9a95d8image1.jpeg
    15:24:27,350 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. 
    15:24:27,481 INFO  [org.docx4j.fonts.RunFontSelector] (default task-41) rPrDefault/rFonts referenced Calibri
    15:24:27,481 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. 
    15:24:27,489 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Calibri is not mapped to a physical font!
    15:24:27,489 WARN  [org.docx4j.fonts.fop.util.FopConfigUtil] (default task-41) Document font Times New Roman is not mapped to a physical font!
    15:24:27,509 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
    15:24:27,509 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
    15:24:27,510 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Arial,normal,700" not found. Substituting with "Arial,normal,400".
    15:24:27,521 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) Font "Calibri,normal,400" not found. Substituting with "any,normal,400".
    15:24:27,535 WARN  [org.apache.fop.apps.FOUserAgent] (default task-41) The contents of fo:inline line 1 exceed the available area in the inline-progression direction by 23379 millipoints. (See position 3:11147)
    15:24:27,561 INFO  [org.apache.fop.apps.FOUserAgent] (default task-41) Rendered page #1.
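
The warnings above repeat many times, so it can help to reduce the log to the distinct fonts docx4j could not map. This is a minimal sketch using only the JDK. The class name UnmappedFonts is hypothetical, and the pattern assumes the RunFontSelector message format shown in this log.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnmappedFonts {
    // Matches the RunFontSelector warning text as it appears in the log above.
    private static final Pattern UNMAPPED =
        Pattern.compile("Font '([^']+)' is not mapped to a physical font");

    /** Returns the distinct unmapped font names, in order of first appearance. */
    static Set<String> find(String log) {
        Set<String> fonts = new LinkedHashSet<>();
        Matcher m = UNMAPPED.matcher(log);
        while (m.find()) {
            fonts.add(m.group(1));
        }
        return fonts;
    }

    public static void main(String[] args) {
        // Three lines copied from the log in the question.
        String sample = String.join("\n",
            "15:24:27,227 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Calibri' is not mapped to a physical font. ",
            "15:24:27,303 WARN  [org.docx4j.fonts.RunFontSelector] (default task-41) Font 'Times New Roman' is not mapped to a physical font. ",
            "15:24:27,561 INFO  [org.apache.fop.apps.FOUserAgent] (default task-41) Rendered page #1.");
        System.out.println(find(sample)); // prints [Calibri, Times New Roman]
    }
}
```

Run against the full log in this question, it yields Calibri and Times New Roman: the two fonts the document uses that have no physical font on this machine. That is consistent with the FOP warning about substituting Calibri.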

The files used in this process are available here

The PDF file is the result and DOCX is the original file.

If anyone can help me with this, I would be grateful!

    
asked by anonymous 12.05.2016 / 20:35

0 answers