Altova Mailing List Archives


Re: [xsl] comparing nodesets to each other

From: "Kai Hackemesser" <kaha@------>
To:
Date: 4/11/2005 7:47:00 PM
Hello, Aron,

I'll try to be more exact in my definition:
- two 'relation' nodes are different if they have the same value in
relation/Attribute[@Name='FindNumber']/Value but the combined text value
of their children differs.
- a 'relation' node must be listed, too, if the other list has no
corresponding 'relation' node with the same
relation/Attribute[@Name='FindNumber']/Value.
- I need to know in which list a node is changed/added/removed.
- The whole list of changes needs to be sorted by
Attribute[@Name='FindNumber']/Value.
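
To make the rules concrete, here is a minimal sketch in Python rather than XSLT. It is only an illustration under stated assumptions: the helper names (find_number, text_value, index_relations, compare) are made up, both inputs are assumed to be well-formed with the 'relation' elements directly under the root, and the FindNumber values are assumed to be zero-padded so that a plain string sort gives the wanted order. "removed" means the FindNumber exists only in the first list, "added" means it exists only in the second.

```python
import xml.etree.ElementTree as ET

# Path from a 'relation' element to its identifying value.
FIND_PATH = "Attribute[@Name='FindNumber']/Value"


def find_number(relation):
    """Return the FindNumber value of a 'relation' element (hypothetical helper)."""
    value = relation.find(FIND_PATH)
    return (value.text or "").strip() if value is not None else ""


def text_value(node):
    """Join all descendant text, skipping indentation-only whitespace.

    The '|' separator keeps 'ab' + 'c' distinct from 'a' + 'bc'.
    """
    return "|".join(t.strip() for t in node.itertext() if t.strip())


def index_relations(xml_text):
    """Map FindNumber -> text value for each 'relation' under the root."""
    root = ET.fromstring(xml_text)
    return {find_number(r): text_value(r) for r in root.findall("relation")}


def compare(xml_a, xml_b):
    """Return (FindNumber, status) pairs, sorted by FindNumber."""
    a, b = index_relations(xml_a), index_relations(xml_b)
    report = []
    for key in sorted(set(a) | set(b)):  # string sort: assumes zero-padded values
        if key not in b:
            report.append((key, "removed"))   # only in list a
        elif key not in a:
            report.append((key, "added"))     # only in list b
        elif a[key] != b[key]:
            report.append((key, "changed"))   # in both, text differs
    return report
```

Only text content is compared here, so a change that affects nothing but an XML attribute such as Type would go unnoticed; that matches the "text value" rule but may be too coarse for other uses.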

Regards, Kai

> Kai,
> 
> IMO the general problem of finding the differences between any 2 XML
> documents is, shall we say, challenging.  Something that helps such an
> operation is being extremely precise about what constitutes a difference,
> and being able to formulate precedence rules in comparison operations.
> An earlier respondent illustrated the need for this with an example that
> "added" a node in the second document.  It's very likely *you* have a good
> idea of what you're after, but in these types of problems you'll get the
> most help if you can express your "rules for comparison" in [formal]
> written form.
> 
> Consider the following documents:
> 
> doc1.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2"/>
> </doc>
> 
> doc2.xml
> =======
> <doc>
> <chapter n="1"/>
> <chapter n="2">
>   <para n="1"/>
> </chapter>
> </doc>
> 
> What *exactly* would you like in your final output?  Do you want to see
> only the node <para n="1"/>?  Do you want to see <para n="1"/> and all its
> parent nodes?  You see where this is going?  It helps to be precise.
> 
> Also, while writing a "general" differencing algorithm would be
> worthwhile, it's probably not simple.  To start you'll have better luck if
> you constrain your problem, as it relates to your domain.  One way to do
> this is by identifying a least granular level for your purposes--perhaps a
> node or "level" below which identifying differences is superfluous.  In
> the example above, you could say:
> 
> --chapter nodes are compared by their "n" attribute
> --if there are any differences between 2 <chapter> nodes or any of their
> descendants, the entire <chapter> node is considered "changed", and that
> of doc2.xml is output
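
The two rules above can be sketched in a few lines. This is a Python illustration rather than XSLT, and the function names (signature, changed_chapters) are invented for the example. It assumes the <chapter> elements sit directly under the root, and it treats a chapter that exists only in doc2.xml as "changed"; chapters that exist only in doc1.xml are not reported.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Hashable summary of an element and all of its descendants.

    Attribute order and indentation-only whitespace do not affect the result.
    """
    return (
        node.tag,
        tuple(sorted(node.attrib.items())),
        (node.text or "").strip(),
        tuple((signature(child), (child.tail or "").strip()) for child in node),
    )


def changed_chapters(doc1_text, doc2_text):
    """Return the <chapter> elements of doc2 that differ from doc1, matched by @n."""
    old = {c.get("n"): signature(c)
           for c in ET.fromstring(doc1_text).findall("chapter")}
    return [c for c in ET.fromstring(doc2_text).findall("chapter")
            if old.get(c.get("n")) != signature(c)]


doc1 = '<doc><chapter n="1"/><chapter n="2"/></doc>'
doc2 = '<doc><chapter n="1"/><chapter n="2"><para n="1"/></chapter></doc>'
for chapter in changed_chapters(doc1, doc2):
    # Output the whole chapter from doc2, as the second rule asks.
    print(ET.tostring(chapter, encoding="unicode"))
```

Comparing a whole-subtree signature is what makes <chapter> the least granular level: nothing below it is reported separately.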
> 
> I've done this type of "constrained" comparison with success.
> 
> Here's another approach to consider: preprocess each xml document to a
> "standard" format, then use a textual diff tool.  The idea here is that
> you apply an XSL transform to doc1.xml so that <chapter> nodes are
> sequential, their descendants are ordered in a specific way, etc.  Do the
> same with doc2.xml.  Then use a diff tool ( eg: beyondcompare, from
> http://www.scootersoftware.com/ ) to check differences.  Note, this method
> is sensitive to line breaks, so it's not trivial to implement.
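
As a rough illustration of the normalize-then-diff idea, the sketch below uses Python's standard library in place of an XSL transform and an external diff tool, so it is a swapped-in technique, not the workflow described above. The names normalize and diff_docs are invented. It assumes Python 3.8 or later (for xml.etree.ElementTree.canonicalize), that the elements to compare are direct children of the root, and that text content contains no internal line breaks. Each child is written on exactly one line, which sidesteps the line-break problem for this simple case.

```python
import difflib
import xml.etree.ElementTree as ET


def normalize(xml_text, sort_key):
    """Return one canonical line per child of the root, in sorted order."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in sorted(root, key=sort_key):
        raw = ET.tostring(child, encoding="unicode")
        # C14N sorts attributes and expands empty tags; strip_text drops
        # whitespace around text, so indentation does not matter.
        lines.append(ET.canonicalize(raw, strip_text=True).strip())
    return lines


def diff_docs(text1, text2, sort_key=lambda e: e.get("n", "")):
    """Unified diff of the two normalized documents, as a list of lines.

    The default key sorts on the "n" attribute as a string (an assumption).
    """
    return list(difflib.unified_diff(
        normalize(text1, sort_key), normalize(text2, sort_key),
        "doc1.xml", "doc2.xml", lineterm=""))


doc1 = '<doc>\n<chapter n="2"/>\n<chapter n="1"/>\n</doc>'
doc2 = '<doc>\n<chapter n="1"/>\n<chapter n="2">\n  <para n="1"/>\n</chapter>\n</doc>'
print("\n".join(diff_docs(doc1, doc2)))
```

Because the children are sorted before serializing, a pure reordering of <chapter> elements produces an empty diff; only real content differences show up as "-" and "+" lines.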
> 
> Regards
> 
> --A
> 
> 
> 
> >From: "Kai Hackemesser" <kaha@xxxxxx>
> >Reply-To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >To: xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> >Subject: Re: [xsl] comparing nodesets to each other
> >Date: Mon, 11 Apr 2005 18:18:47 +0200 (MEST)
> >
> >Hello, David,
> >
> >Thanks for the response. The errors you mentioned already have happened,
> >that's why I'm currently clueless how to solve it.
> >
> >I'll try to show the structure of the recipe (simplified):
> >
> ><object>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0005]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part1]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0010]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part2]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0015]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part3]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> ></object>
> >
> >needs to be compared against a similar structure:
> ><object>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0005]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part1]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> >   <relation>
> >     <Attribute Type="string" Name="FindNumber">
> >       <Value><![CDATA[0015]]></Value>
> >     </Attribute>
> >     <Attribute Type="float" Name="...
> >     <object>
> >       <Attribute Type="string" Name="PartNumber">
> >         <Value><![CDATA[Part3b]]></Value>
> >       </Attribute>
> >     </object>
> >   </relation>
> ></object>
> >
> >(There is more than one Attribute node per object or relation node)
> >
> >So I need to extract all differences like attribute change, missing
> >nodes, altered nodes, added nodes. To identify a node I use the FindNumber
> >Attribute node of each relation node. A missing node is one where the
> >corresponding FindNumber Attribute value is missing in nodelist 'b'. An
> >added node is one where the corresponding FindNumber Attribute value is
> >missing in nodelist 'a'. An altered node means the FindNumber Attribute
> >value is there in both nodelists, but the Attribute nodes or the
> >object/Attribute nodes are different. I think a simple text compare would
> >be enough for the test of alteration.
> >
> >Regards,
> >Kai
> >
> 
