When exporting an HTML file to PDF with iTextSharp and XMLWorker, an error sometimes occurs saying that a certain tag is not closed. While searching, I found this post.
My application queries an SQL table that returns saved HTML forms, and some of them fail with that error when I try to convert them to PDF. Below is the code I use to export to PDF:
using System.IO;
using System.IO.Compression;
using System.Web.Mvc;        // ASP.NET MVC assumed (ActionResult, FileContentResult)
using iTextSharp.text;       // Document, PageSize
using iTextSharp.text.pdf;   // PdfWriter
using iTextSharp.tool.xml;   // XMLWorkerHelper

public ActionResult GetPdfFileZiped(ProcessamentoRegistros pProcessamentoRegistros)
{
    // The error occurs because the HTML structure is sometimes not well-formed.
    pProcessamentoRegistros.IdProcessamentoDiario = 1;
    pProcessamentoRegistros.IdRegistro = 1;
    pProcessamentoRegistros.IdServico = 2;
    ProcessamentoRegistros _processamento = _IRepositorio.ObterProcessamentoRegistros(pProcessamentoRegistros);

    // Render the stored HTML into an in-memory PDF (A4, landscape)
    var doc = new Document(PageSize.A4.Rotate());
    var stream = new MemoryStream();
    var pw = PdfWriter.GetInstance(doc, stream);
    var minhaStringHTML = _processamento.DocumentoHtml.Trim();
    doc.Open();
    using (var srHtml = new StringReader(minhaStringHTML))
    {
        XMLWorkerHelper.GetInstance().ParseXHtml(pw, doc, srHtml); // <-- THE ERROR OCCURS HERE
    }
    doc.Close();

    // Wrap the PDF bytes in a ZIP archive and return it as a download
    using (var compressedFileStream = new MemoryStream())
    {
        using (var zipArchive = new ZipArchive(compressedFileStream, ZipArchiveMode.Update, false))
        {
            var zipEntry = zipArchive.CreateEntry("MeuPDFZipado.pdf");
            using (var originalFileStream = new MemoryStream(stream.ToArray()))
            using (var zipEntryStream = zipEntry.Open())
            {
                originalFileStream.CopyTo(zipEntryStream);
            }
        }
        // The archive is disposed above, so its central directory is already written
        return new FileContentResult(compressedFileStream.ToArray(), "application/zip") { FileDownloadName = "Filename.zip" };
    }
}
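The zipping step is independent of the HTML problem and could be pulled out into a small helper so the action only deals with the PDF. A minimal sketch, assuming a hypothetical `PdfZipper.ZipSingleFile` helper (not part of any library); it uses `ZipArchiveMode.Create` instead of `Update`, since only a new entry is written:

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper: wraps one file's bytes in an in-memory ZIP archive.
public static class PdfZipper
{
    public static byte[] ZipSingleFile(byte[] fileBytes, string entryName)
    {
        using (var compressed = new MemoryStream())
        {
            // leaveOpen: true keeps 'compressed' usable after the archive is disposed;
            // disposing the archive is what writes the ZIP central directory.
            using (var archive = new ZipArchive(compressed, ZipArchiveMode.Create, leaveOpen: true))
            {
                var entry = archive.CreateEntry(entryName);
                using (var entryStream = entry.Open())
                {
                    entryStream.Write(fileBytes, 0, fileBytes.Length);
                }
            }
            return compressed.ToArray();
        }
    }
}
```

In the action, `PdfZipper.ZipSingleFile(stream.ToArray(), "MeuPDFZipado.pdf")` would then replace the nested `using` blocks.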
For example, the img tag below is not closed. I have no control over how this HTML is formatted, and the same error occurs with other tags:
<IMG border="0" src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif" width=180 height=44>
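ParseXHtml expects XHTML, and a tag like the one above is not valid XML. A quick way to find the offending records before they reach XMLWorker is to run each stored document through a strict XML reader. A minimal sketch using `System.Xml`; XMLWorker has its own parser, so this only approximates its behavior, and the `XhtmlCheck` class is hypothetical:

```csharp
using System.IO;
using System.Xml;

// Hypothetical diagnostic helper: reports whether markup parses as XML.
public static class XhtmlCheck
{
    public static bool IsWellFormed(string markup, out string error)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Ignore,          // tolerate a DOCTYPE if present
            ConformanceLevel = ConformanceLevel.Fragment   // allow snippets, not just full documents
        };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(markup), settings))
            {
                while (reader.Read()) { }   // reading to the end surfaces any syntax error
            }
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = $"line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
            return false;
        }
    }
}
```

Logging `error` for each failing record shows which tag and position break the conversion.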
Below is the full HTML:
<HTML><HEAD><META NAME="GENERATOR" Content="Microsoft Visual Studio 6.0">
<script language=javascript>
//function MudarPagina() {
// window.history.back();
//}
</script>
</HEAD>
<!--body bgcolor=white onBlur=MudarPagina();-->
<body bgcolor=white>
<FORM method="post" style="BACKGROUND-COLOR: white">
<!--FORM name="Imprimir" method="post" style="BACKGROUND-COLOR: white"-->
<br>
<table>
<tr>
<td align=center><a href="javascript:window.print();"><IMG src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/botimprimir.gif" border=0></a><a href="javascript:window.history.back();"><IMG src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/botvoltar.gif" border=0></a></td></tr><tr><td><table width="75%" CELLSPACING=0 CELLPADDING=10 border=1 align=center bordercolorlight="#FFFFFF" bordercolordark="#CCCCCC">
<tr>
<td>
<TABLE WIDTH=100% BORDER=0 CELLSPACING=0 CELLPADDING=0 style="color: black" class=txtcentral>
<tr>
<td align=left><IMG border="0" src="https://www.sifge.caixa.gov.br/Empresa/Crf/images/caixa.gif" width=180 height=44></td></tr><tr><td colspan=2> </td></tr><tr><td align=rigth><span style="font-size: 13pt" align=center><strong>Certificado de Regularidade do FGTS - CRF</strong></span></td>
</tr>
</table>
<TABLE WIDTH=100% BORDER=0 CELLSPACING=0 CELLPADDING=0 style="color: black" class=txtcentral>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<TD width=22%><font style=" font-family: Verdana;font-size:10pt"><strong>Inscrição:</strong></font></TD>
<TD ><font style=" font-family: Verdana;font-size:8pt">08439659/0001-50</font></TD>
</tr>
<tr>
<td width=22% valign=top nowrap><font style=" font-family: Verdana;font-size:10pt"><strong>Razão Social:</strong></font></TD>
<td><font style=" font-family: Verdana;font-size:8pt">CPFL ENERGIAS RENOVAVEIS S A</font></TD>
</tr>
<tr>
<td width=22% nowrap><font style=" font-family: Verdana;font-size:10pt"><strong>Nome Fantasia:</strong></font></TD>
<td ><font style=" font-family: Verdana;font-size:8pt">CPFL RENOVAVEIS</font></TD>
</tr>
<tr>
<td width=22% valign=top><font style=" font-family: Verdana;font-size:10pt"><strong>Endereço:</strong></font></TD>
<td ><font style=" font-family: Verdana;font-size:8pt">AV DOUTOR CARDOSO DE MELO 1184 ANDAR 7 / VILA OLIMPIA / SAO PAULO / SP / 4548-004</font></TD>
</tr>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<TD colspan=2 style="text-align: justify"><font style=" font-family: Verdana;font-size:10pt">A Caixa Econômica Federal, no uso da atribuição que lhe confere o Art. 7, da
Lei 8.036, de 11 de maio de 1990, certifica que, nesta data, a empresa acima identificada
encontra-se em situação regular perante o Fundo de Garantia do Tempo de Serviço - FGTS.
</font>
</TD>
</tr>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<td style="text-align: justify" colspan=2><font style=" font-family: Verdana;font-size:10pt">O presente Certificado não servirá de prova contra cobrança de quaisquer débitos referentes
a contribuições e/ou encargos devidos, decorrentes das obrigações com o FGTS.</font>
</td>
</tr>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<td colspan=2><font style=" font-family: Verdana;font-size:10pt"><strong>Validade: </strong>28/02/2017 a 29/03/2017</font></TD>
</tr>
<tr><td colspan=2> </td></tr>
<tr>
<td colspan=2><font style=" font-family: Verdana;font-size:10pt"><strong>Certificação Número: </strong>2017022805233090232330</font></TD></TR>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<TD colspan=2><font style=" font-family: Verdana;font-size:10pt">Informação obtida em 15/03/2017, às 17:14:51.</font></TD>
</tr>
<tr><td colspan=2> </td></tr>
<tr><td colspan=2> </td></tr>
<tr>
<TD style="text-align: justify" colspan=2><font style=" font-family: Verdana;font-size:10pt">A utilização deste Certificado
para os fins previstos em Lei está condicionada à verificação de
autenticidade no site da Caixa: <strong>www.caixa.gov.br</strong></font></TD>
</tr>
</TABLE>
</form>
</td></tr></table>
</td>
</tr>
</table>
<script language=javascript>
//window.print();
</script>
</BODY>
</HTML>
How can I get around this problem? Is it possible to parse the HTML and convert it to XHTML? Are there any other free alternatives for converting this HTML to PDF while preserving the tag styles?
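One direction for the HTML-to-XHTML question is a pre-pass before ParseXHtml. The robust option is a real HTML parser such as HtmlAgilityPack (its `HtmlDocument.OptionOutputAsXml` setting emits XML-style output). Without extra dependencies, a naive regex pre-pass can fix the most common problems in this sample. This is a sketch under strong assumptions: the `NaiveXhtmlFixer` class is hypothetical, and it does not repair mis-nested elements (such as the FORM closed inside a table cell above), value-less attributes like `nowrap`, `>` inside attribute values, or script content:

```csharp
using System.Text.RegularExpressions;

// Hypothetical, deliberately naive HTML -> XHTML pre-pass (regex-based, not a parser).
public static class NaiveXhtmlFixer
{
    // Element names in start and end tags, e.g. "<TD" or "</TD".
    private static readonly Regex TagName = new Regex(@"(</?)([A-Za-z][A-Za-z0-9]*)");

    // Start tags only (a letter right after '<'), so comments and end tags are skipped.
    private static readonly Regex StartTag = new Regex(@"<[A-Za-z][^>]*>");

    // name=value where the value is not quoted.
    private static readonly Regex UnquotedAttr = new Regex(@"(\s[\w:-]+)=([^\s""'>]+)");

    // HTML void elements, which never have an end tag and must be self-closed in XML.
    private static readonly Regex VoidTag = new Regex(
        @"<(area|base|br|col|hr|img|input|link|meta|param)\b([^>]*?)\s*/?>",
        RegexOptions.IgnoreCase);

    public static string Fix(string html)
    {
        // 1. Lower-case tag names so pairs like <TD> ... </td> match (XML is case-sensitive).
        string result = TagName.Replace(html, m => m.Groups[1].Value + m.Groups[2].Value.ToLowerInvariant());

        // 2. Quote unquoted attribute values, only inside start tags.
        result = StartTag.Replace(result, m => UnquotedAttr.Replace(m.Value, "$1=\"$2\""));

        // 3. Self-close void elements: <img ...> becomes <img ... />.
        return VoidTag.Replace(result, "<$1$2 />");
    }
}
```

The output of `NaiveXhtmlFixer.Fix(minhaStringHTML)` would be passed to the `StringReader` instead of the raw HTML; documents that still fail would need a real parser.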