C#: update XML based on another XML

4

Today I have the following XML structure:

<ROOT>
    <TES IDTES="4780" IDPES="17522" />
    <TES IDTES="6934" IDPES="12343" />
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
</ROOT>

To update this XML I have the following:

<ROOT>
    <TES DEL="S" IDTES="4780" IDPES="17522" />
    <TES DEL="S" IDTES="6934" IDPES="12343" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

It means that I have to delete 2 TES tags with their respective IDTES and add 2 more TES tags. Resulting in:

<ROOT>
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

I searched a bit for diff and merge between XML documents in C#, but what I found did not help much.

How can I do this with LINQ, without writing explicit loops?

    
asked by anonymous 16.04.2014 / 22:23

3 answers

2

Using LINQ with XDocument :

XDocument doc1 = XDocument.Parse(@"
<ROOT>
    <TES IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""4781"" IDPES=""17523"" />
    <TES IDTES=""6935"" IDPES=""12344"" />
</ROOT>");

XDocument doc2 = XDocument.Parse(@"
<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES DEL=""S"" IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
    <TES IDTES=""2020"" IDPES=""12344"" />
</ROOT>");

In this example I create the objects from literal strings; in practice you would open the XML files with Load():

XDocument doc1 = XDocument.Load("file.xml");

The idea would be to merge the 2 files while converting to a list of simpler objects:

var list = doc1.Element("ROOT").Elements().Select(m => new { 
        IDTES = (string)m.Attribute("IDTES"), 
        IDPES = (string)m.Attribute("IDPES"), 
        DEL = (string)m.Attribute("DEL") ?? "N" } // coalesce to "N" when the DEL attribute is missing (null)
    ).Union(doc2.Element("ROOT").Elements().Select(m => new { 
        IDTES = (string)m.Attribute("IDTES"), 
        IDPES = (string)m.Attribute("IDPES"), 
        DEL = (string)m.Attribute("DEL") ?? "N" }
    )
);

Filter this list with Where to get the lines to exclude and apply Except to produce the desired result:

var toDel = list.Where(m => m.DEL == "S").Select(m => new { m.IDTES, m.IDPES });
var result = list.Select(m => new { m.IDTES, m.IDPES }).Except(toDel);
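Union and Except work here without a custom comparer because C# anonymous types implement value-based Equals and GetHashCode. A minimal sketch of that behavior, where the class name and sample values are only illustrative:

```csharp
using System;
using System.Linq;

static class AnonymousEqualityDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        var a = new[]
        {
            new { IDTES = "4780", IDPES = "17522" },
            new { IDTES = "4781", IDPES = "17523" }
        };
        var b = new[] { new { IDTES = "4780", IDPES = "17522" } };

        // Two anonymous instances with the same property names, types and
        // values compare as equal, so Except drops the matching row.
        var remaining = a.Except(b).ToList();

        Console.WriteLine(remaining.Count);    // 1
        Console.WriteLine(remaining[0].IDTES); // 4781
    }
}
```

This is also why DEL is projected away before calling Except: with DEL still present, a row marked "S" would not be equal to the same row marked "N".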

Then just generate a new XDocument from the result:

var doc3 = new XDocument(new XElement("ROOT",
           from r in result
           select new XElement("TES",
               new XAttribute("IDTES", r.IDTES),
               new XAttribute("IDPES", r.IDPES)
           )
      )
);

And write to disk with Save() :

doc3.Save("file.xml");
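Putting the steps above together, a self-contained sketch might look like the following. The class name MergeDemo and the helper Merge are hypothetical names introduced for illustration, and the documents are parsed from strings instead of being loaded from disk.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class MergeDemo // hypothetical name, for illustration only
{
    // Hypothetical helper: returns a new document with the update applied.
    public static XDocument Merge(XDocument source, XDocument update)
    {
        // Concat + Distinct is equivalent to the Union used above.
        var list = source.Root.Elements("TES")
            .Concat(update.Root.Elements("TES"))
            .Select(m => new
            {
                IDTES = (string)m.Attribute("IDTES"),
                IDPES = (string)m.Attribute("IDPES"),
                DEL = (string)m.Attribute("DEL") ?? "N"
            })
            .Distinct()
            .ToList();

        // Rows flagged for deletion, reduced to their key attributes.
        var toDel = list.Where(m => m.DEL == "S")
                        .Select(m => new { m.IDTES, m.IDPES });
        var result = list.Select(m => new { m.IDTES, m.IDPES }).Except(toDel);

        return new XDocument(new XElement("ROOT",
            result.Select(r => new XElement("TES",
                new XAttribute("IDTES", r.IDTES),
                new XAttribute("IDPES", r.IDPES)))));
    }

    static void Main()
    {
        var source = XDocument.Parse(@"<ROOT>
    <TES IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""4781"" IDPES=""17523"" />
    <TES IDTES=""6935"" IDPES=""12344"" />
</ROOT>");
        var update = XDocument.Parse(@"<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES DEL=""S"" IDTES=""6934"" IDPES=""12343"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
    <TES IDTES=""2020"" IDPES=""12344"" />
</ROOT>");

        // Should print the four TES elements of the expected result.
        Console.WriteLine(Merge(source, update));
    }
}
```

Note that, like the steps above, this rebuilds each TES element from IDTES and IDPES only, so any other attributes or child content in the original files would be lost.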
    
17.04.2014 / 16:12
4

An alternative to using LINQ is to use XSLT transformation, which performs transformations on XML nodes using compiled templates. XSLT transforms use DOM and load XML into memory, but nodes are selected with XPath, which tends to be more efficient.

The downside is that XSLT is another language (and not as trivial as it first seems). I will describe what a solution to your problem with XSLT (which you can run from C#) might look like. If the structure of your real documents is similar to the example you provided, you may be able to use the code without any changes.

A brief summary of how XSLT works

The XSLT transformer receives a source document (well-formed XML) and an XSL document (XML in the XSLT language) and produces a result as text (it can be XML, plain text, an XML fragment, etc.). The XSL document can also read additional sources (files), which are loaded through a function used in XPath expressions (document('caminho-do-arquivo')). In your case, the file that contains the updates would be loaded this way. The transformer also accepts data passed as a parameter at execution time. This data is passed to an <xsl:param> element in the XSL document. You can run the transformer in several ways: there are online services, command-line tools (such as Saxon and Xalan), and APIs in C#, Java, PHP, Ruby, etc.

Solving your problem using C# and XSLT

I'll call the original file fonte.xml:

<ROOT>
    <TES IDTES="4780" IDPES="17522" />
    <TES IDTES="6934" IDPES="12343" />
    <TES IDTES="4781" IDPES="17523" />
    <TES IDTES="6935" IDPES="12344" />
</ROOT>  

And the file with the updates atualizacao.xml:

<ROOT>
    <TES DEL="S" IDTES="4780" IDPES="17522" />
    <TES DEL="S" IDTES="6934" IDPES="12343" />
    <TES IDTES="7777" IDPES="17523" />
    <TES IDTES="2020" IDPES="12344" />
</ROOT>

The XSLT document, which I'll call atualiza.xsl, does the transformation you need. If you run an XSL transformer and pass fonte.xml as input, atualizacao.xml as the parameter I called arquivo, and atualiza.xsl as the XSL file, it will generate this result:

<ROOT>
   <TES IDTES="4781" IDPES="17523"/>
   <TES IDTES="6935" IDPES="12344"/>
   <TES IDTES="7777" IDPES="17523"/>
   <TES IDTES="2020" IDPES="12344"/>
</ROOT>

The C# code to run the XSLT transformer is similar to the code below (I have not tested it, and I'm not a C# programmer, so there may be some inaccuracies):

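        using System.IO;
        using System.Xml;
        using System.Xml.Xsl;

        // enableDebug: true keeps debug information; pass false for production runs
        XslCompiledTransform transform = new XslCompiledTransform(true);

        // Value for the "arquivo" <xsl:param>: the update file loaded via document()
        XsltArgumentList par = new XsltArgumentList();
        par.AddParam("arquivo", "", "atualizacao.xml");

        // document() is disabled by default and must be enabled explicitly
        XsltSettings s = new XsltSettings();
        s.EnableDocumentFunction = true;

        transform.Load("atualiza.xsl", s, new XmlUrlResolver());

        using (StreamWriter stream = new StreamWriter("resultado.xml"))
        {
            transform.Transform("fonte.xml", par, stream);
        }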

The XSLT document is listed below:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output indent="yes"/>

    <xsl:param name="arquivo">atualizacao.xml</xsl:param>
    <xsl:variable name="doc" select="document($arquivo)" />

    <xsl:template match="ROOT">
        <xsl:copy>
            <xsl:apply-templates select="TES[not($doc/ROOT/TES/@IDTES=@IDTES and $doc/ROOT/TES/@IDPES=@IDPES and $doc/ROOT/TES/@DEL='S')]"/>
            <xsl:apply-templates select="$doc/ROOT/TES[not(@DEL = 'S')]"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="TES">
        <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>

The first element within <xsl:stylesheet> is

    <xsl:output indent="yes"/>

which produces an indented result. You can remove it if you wish. The following element:

    <xsl:param name="arquivo">atualizacao.xml</xsl:param>

receives the arquivo parameter you pass via C#. If you do not pass the parameter for some reason, it defaults to atualizacao.xml.

The next element

<xsl:variable name="doc" select="document($arquivo)" />

loads the document atualizacao.xml and, if it is found, assigns it to the constant doc (which you can use throughout the stylesheet as $doc).

The stylesheet contains two <xsl:template> templates where the transformations occur. The second template:

<xsl:template match="TES">
    <xsl:copy-of select="."/>
</xsl:template>

simply copies the entire node with its attributes and content. It is only called when a <TES> element is being processed (it does not restrict where this node is located, in the source file or the other one).

The first template matches the ROOT node. It will match the <ROOT> of fonte.xml and will be called automatically. The <xsl:copy> element copies this node (it will produce <ROOT>...</ROOT>). Within it there are two xsl:apply-templates calls that contain XPath expressions. They choose what will be placed within <ROOT>.

The first XPath:

TES[not($doc/ROOT/TES/@IDTES=@IDTES and $doc/ROOT/TES/@IDPES=@IDPES and $doc/ROOT/TES/@DEL='S')]

is relative to <ROOT> (it refers to the <ROOT> of document fonte.xml) and selects all <TES> elements except those whose @IDTES and @IDPES are equal to the corresponding attributes of a TES in document atualizacao.xml ($doc/ROOT/TES) that also has the DEL='S' attribute ($doc/ROOT/TES/@DEL='S'). This way it goes through all the elements and does not copy to the result tree the ones that should be removed.

The second XPath

$doc/ROOT/TES[not(@DEL = 'S')]

only acts on the document atualizacao.xml ($doc), copying to the result tree only the nodes that do not have the attribute DEL='S'.
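For readers more at home in C#, the intent of the two selections can be sketched in LINQ to XML. This is an illustration, not a translation of the stylesheet: the names SelectionDemo, Attr and Apply are hypothetical, and the sketch requires all three conditions to hold on the same update element. In XPath 1.0 the three comparisons in the first expression are each evaluated against the whole $doc/ROOT/TES node-set independently, which gives the same result for the sample data but is a looser test in general.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class SelectionDemo // hypothetical name, for illustration only
{
    static string Attr(XElement e, string name) => (string)e.Attribute(name);

    // Hypothetical helper mirroring the two xsl:apply-templates selections.
    public static XElement Apply(XElement sourceRoot, XElement updateRoot)
    {
        var updates = updateRoot.Elements("TES").ToList();

        // First selection: source TES elements with no matching DEL="S" entry.
        var kept = sourceRoot.Elements("TES").Where(t => !updates.Any(u =>
            Attr(u, "DEL") == "S" &&
            Attr(u, "IDTES") == Attr(t, "IDTES") &&
            Attr(u, "IDPES") == Attr(t, "IDPES")));

        // Second selection: update TES elements that are not deletions.
        var added = updates.Where(u => Attr(u, "DEL") != "S");

        // Elements that already have a parent are cloned when added,
        // which plays the role of xsl:copy-of here.
        return new XElement("ROOT", kept.Concat(added));
    }

    static void Main()
    {
        var source = XElement.Parse(@"<ROOT>
    <TES IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""4781"" IDPES=""17523"" />
</ROOT>");
        var update = XElement.Parse(@"<ROOT>
    <TES DEL=""S"" IDTES=""4780"" IDPES=""17522"" />
    <TES IDTES=""7777"" IDPES=""17523"" />
</ROOT>");

        Console.WriteLine(Apply(source, update));
    }
}
```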

Information on C # XSLT transformation classes:

More information about XSLT

  • The XSLT Specification contains everything, but version 2.0 is still poorly supported.
  • I wrote an XSLT 1.0 tutorial in Portuguese in 1998 and updated it in 2007. It is outdated again, but it is useful if you are interested in understanding XSLT better.
  • There is also a fiddle environment for XSLT: link where you can test your code (there are some limitations.)
17.04.2014 / 03:37
1

After some testing, I got the following results:

For a 53 MB base XML and a 45 KB update XML:

  • The solution with XslCompiledTransform took 5 minutes to generate the new file
  • The solution with XDocument took 13 seconds to generate the new file

For a 45 KB base XML and a 53 MB update XML:

  • The solution with XslCompiledTransform took 16 minutes to generate the new file
  • The solution with XDocument took 13 seconds to generate the new file

For both XML files at 53 MB:

  • The solution with XslCompiledTransform ran for more than 1 hour before I canceled it
  • The solution with XDocument took 20 seconds to generate the new file

Because of this, I changed the accepted answer to Iuri's, since that solution is what made the project viable in my case.
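The answer does not say how the timings were taken. One simple way to take comparable measurements is System.Diagnostics.Stopwatch, sketched below. The names TimingDemo and Measure are hypothetical, and the workload is only a placeholder to be replaced by the XDocument or XslCompiledTransform code being compared.

```csharp
using System;
using System.Diagnostics;
using System.Xml.Linq;

static class TimingDemo // hypothetical name, for illustration only
{
    // Runs the action once and returns the wall-clock time it took.
    public static TimeSpan Measure(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        // Placeholder workload: swap in the merge or transform under test.
        var elapsed = Measure(() =>
            XDocument.Parse("<ROOT><TES IDTES=\"1\" IDPES=\"2\" /></ROOT>"));

        Console.WriteLine($"Elapsed: {elapsed.TotalMilliseconds} ms");
    }
}
```

A single run includes JIT compilation and file-system caching effects, so repeating each measurement a few times gives a fairer comparison.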

    
22.04.2014 / 19:39