Format a string after converting it from HTML


I wrote code that turns all of the HTML into a string; however, the result comes out like this:

<div class=\"page\">\r\n<div class=\"bloco\">\r\n   <table id=\"canhoto\">\r\n

I can already remove the \r\n characters, but now I need to figure out how to remove the backslashes. For example, in the div class I'd like to have class="page" instead of class=\"page\". I'd like to handle these somehow so that the markup ends up correct.
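A likely explanation, assuming the backslashes are seen while inspecting the string in the Visual Studio debugger (the question doesn't say where they appear): the debugger displays strings in C# literal notation, so a quote is shown as \" and a line break as \r\n even though no backslash is stored. The sketch below checks this by counting the backslash characters actually in the string. The names EscapeDemo, CountBackslashes and UnescapeQuotes are illustrative, and UnescapeQuotes only matters if the text really does contain a backslash before each quote (for example after JSON serialization, which is also an assumption).

```csharp
using System;

static class EscapeDemo
{
    // Counts the backslash characters actually stored in the string.
    public static int CountBackslashes(string s)
    {
        int count = 0;
        foreach (char c in s)
        {
            if (c == '\\') count++;
        }
        return count;
    }

    // Only needed if the text really contains \" sequences (hypothetical case):
    // replaces each backslash-quote pair with a plain quote.
    public static string UnescapeQuotes(string s) => s.Replace("\\\"", "\"");

    static void Main()
    {
        // In C# source, \" is how a quote is written inside a string literal;
        // the string itself holds only the quote character.
        string html = "<div class=\"page\">";
        Console.WriteLine(html);                   // <div class="page">
        Console.WriteLine(CountBackslashes(html)); // 0

        // A string that really does contain backslashes before the quotes.
        string escaped = "<div class=\\\"page\\\">";
        Console.WriteLine(escaped);                 // <div class=\"page\">
        Console.WriteLine(UnescapeQuotes(escaped)); // <div class="page">
    }
}
```

If CountBackslashes returns 0 for the real HTML, there is nothing to remove, and the string can be passed to the PDF code as-is.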

    string HTMLemString = RenderizaHtmlComoString("~/Views/Item/Item.cshtml", id);
    // Verbatim string (@"...") so that "\<" reaches the regex engine
    // instead of being rejected by the compiler as an unknown escape sequence
    var regex = new Regex(@"(\<script(.+?)\</script\>)|(\<style(.+?)\</style\>)|(<link[^>]*>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
    HTMLemString = regex.Replace(HTMLemString, "");
    // Remove the line breaks
    HTMLemString = HTMLemString.Replace("\r\n", "");

    string CSSdocumento = CSSemString();
    Byte[] bytes;

    using (var ms = new MemoryStream())
    {
        using (var doc = new Document())
        {
            using (var writer = PdfWriter.GetInstance(doc, ms))
            {
                doc.Open();
                var HTMLconversão = @HTMLemString;
                var CSSconversão = @CSSdocumento;

                // Convert the CSS and HTML strings to UTF-8 streams for XMLWorker
                using (var msCss = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(CSSconversão)))
                {
                    using (var msHtml = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(HTMLconversão)))
                    {
                        iTextSharp.tool.xml.XMLWorkerHelper.GetInstance().ParseXHtml(writer, doc, msHtml, msCss);
                    }
                }

                doc.Close();
            }
        }

        bytes = ms.ToArray();
    }

    var testFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "teste.pdf");
    System.IO.File.WriteAllBytes(testFile, bytes);

The first part of the code (the regex and the Replace calls) is where I treat the string, and the rest is the code where I generate the PDF.
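A minimal, self-contained sketch of the string-cleanup step from the question's code, runnable without iTextSharp. It reuses the question's regex as a verbatim string literal, then removes line breaks. It also shows that writing @ in front of a variable name, as in var HTMLconversão = @HTMLemString;, does not make the string "verbatim": that @ only marks a verbatim identifier and leaves the value unchanged. HtmlCleanup and Clean are illustrative names, and regex-based tag stripping is fragile on arbitrary HTML.

```csharp
using System;
using System.Text.RegularExpressions;

static class HtmlCleanup
{
    // Same pattern as in the question, written as a verbatim string literal (@"...").
    // In a regular literal, "\<" is an unrecognized escape sequence and does not compile.
    static readonly Regex Unwanted = new Regex(
        @"(\<script(.+?)\</script\>)|(\<style(.+?)\</style\>)|(<link[^>]*>)",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Removes <script>, <style> and <link> elements, then Windows and Unix line breaks.
    public static string Clean(string html)
    {
        string withoutTags = Unwanted.Replace(html, "");
        return withoutTags.Replace("\r\n", "").Replace("\n", "");
    }

    static void Main()
    {
        string html = "<div class=\"page\">\r\n<script>alert(1)</script>\r\n"
                    + "<link rel=\"stylesheet\" href=\"a.css\">\r\n</div>";
        string cleaned = Clean(html);
        Console.WriteLine(cleaned); // <div class="page"></div>

        // '@' before a variable name (a verbatim identifier) does not change the
        // value, so "var copy = @cleaned;" is the same as "var copy = cleaned;".
        var copy = @cleaned;
        Console.WriteLine(ReferenceEquals(copy, cleaned)); // True
    }
}
```

Removing every line break can join words that were separated only by a newline; replacing "\r\n" with a space instead would avoid that.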

    
asked by anonymous 15.03.2017 / 17:48

1 answer


From what I've seen, there seems to be a bug with this, and this answer has a solution:

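    Document document = new Document();
    try
    {
        // Verbatim string (@) so that "\m" is not read as an escape sequence
        PdfWriter.GetInstance(document, new FileStream(@"c:\my.pdf", FileMode.Create));
        document.Open();

        // Download the page's HTML
        WebClient wc = new WebClient();
        string htmlText = wc.DownloadString("http://localhost:59500/my.html");
        Response.Write(htmlText);

        // HTMLWorker parses the HTML into iTextSharp elements that can be added to the PDF
        List<IElement> htmlarraylist = HTMLWorker.ParseToList(new StringReader(htmlText), null);
        foreach (IElement element in htmlarraylist)
        {
            document.Add(element);
        }

        document.Close();
    }
    catch (Exception ex)
    {
        // Report the error instead of silently ignoring it
        Response.Write(ex.Message);
    }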
    
15.03.2017 / 18:02