Altova Mailing List Archives


Re: Fw: [xsl] Question on duplicate node elimination

From: Hermann Stamm-Wilbrandt <STAMMW@--.---.--->
To: xsl-list@-----.------------.---
Date: 8/30/2010 6:54:00 PM
This is a question on "pointers" in XSLT.


The sample ancestor3.xml [1] demonstrates the node-sets "//*" and
"ids($nodes/ancestor::*)"; these exclude the root node.

ancestor4.xml [2] demonstrates "ids($nodes/ancestor::node())" and
the node-set "/|//*", both of which include the root node.


These are the modified key declarations needed by dupelim4.xsl [3]:
  <xsl:key name="nodes-by-genid" match="/" use="generate-id()"/>
  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>

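With these declarations the root node is covered in addition to the
element, text, comment and processing-instruction nodes matched by
node(). (Strictly speaking, the pattern node() does not match attribute
or namespace nodes, so not all seven node types of the XPath data
model [4] are covered.)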


Having an id-node-set of the form
  <id>some_id_1</id>
  <id>some_id_2</id>
  ...
  <id>some_id_k</id>

as in [3] makes it possible to (efficiently) "address" the represented
nodes in the XML tree via the key() function.
Every node-set can be represented by such an id-node-set.
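To make the addressing idea concrete, here is a small Python model rather than XSLT; it is a sketch under stated assumptions, not how any XSLT processor implements key(). It assumes a toy tree shaped like ancestor3.xml and a hypothetical generate-id() that returns "id" plus the node's document-order position (real processors produce arbitrary, run-dependent ids). The dict stands in for `<xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>`, and the hypothetical `key_lookup` mimics key() applied to an id-node-set: the result is a node-set, so each node appears once and in document order, however often its id occurs.

```python
class Node:
    """Toy stand-in for an element node (hypothetical model, not a real XML API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def document_order(node):
    """Pre-order traversal, i.e. XPath document order for element-only trees."""
    yield node
    for child in node.children:
        yield from document_order(child)


def assign_ids(root):
    """Assign hypothetical ids and return the index (id -> node),
    modelling <xsl:key match="node()" use="generate-id()"/>."""
    index = {}
    for pos, node in enumerate(document_order(root)):
        node.pos = pos
        node.gid = f"id{pos}"
        index[node.gid] = node
    return index


def key_lookup(index, ids):
    """Model of key('nodes-by-genid', $idnodes): unique nodes, document order."""
    found = {index[i] for i in ids if i in index}
    return sorted(found, key=lambda n: n.pos)


# Tree shaped like ancestor3.xml: a/(b/(c, c), b/(c, c))
a = Node("a")
for _ in range(2):
    b = Node("b", a)
    Node("c", b)
    Node("c", b)
index = assign_ids(a)

ids_of_c = [n.gid for n in document_order(a) if n.name == "c"]
# Duplicated and reversed input still yields each node once, in document order.
print([n.gid for n in key_lookup(index, ids_of_c + ids_of_c[::-1])])
# ['id2', 'id3', 'id5', 'id6']
```

Because the lookup behaves like a set followed by a document-order sort, feeding it ids with repeats is harmless; that is the property the key()-based duplicate elimination relies on.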

Result tree fragments of id-node-sets can be converted to id-node-sets
by the exslt:node-set() function as in [3].
This allows for iteratively generating new id-node-sets.


I did a quick search for "XSLT pointer" and only found hits for pointers
in C implementations of XSLT processors or for "XPointer".


Can representing the current node by
  <id><xsl:value-of select="generate-id()"/></id>

in conjunction with "bulk" conversion to the corresponding (real) node-set by
  "key('nodes-by-genid',exslt:node-set($nodes)/id)"

for an id-node-set $nodes be considered an XSLT "pointer" representation
of the current node, as in C?


[1] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor4.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim4.xsl
[4] http://www.w3.org/TR/xpath/#data-model


Mit besten Gruessen / Best wishes,

Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



From:       Hermann Stamm-Wilbrandt/Germany/IBM@IBMDE
To:         xsl-list@l...
Date:       08/27/2010 03:00 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



Michael,

> ... Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.
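Michael's string-based suggestion can be sketched as follows. This is a Python model, not XSLT: it deliberately restricts itself to operations that XSLT 1.0 also offers as string functions (normalize-space(), substring-before(), substring-after(), contains(), concat()) and uses recursion instead of a loop, as an XSLT 1.0 named template would have to. The function name `dedup_id_string` is hypothetical, and it keeps ids in order of first appearance, not in document order.

```python
def dedup_id_string(ids, seen=""):
    """Drop repeated generate-id values from a space-separated string.

    Each step mirrors an XSLT 1.0 string function, so the same recursion
    could be written as a (hypothetical) named template.
    """
    ids = " ".join(ids.split())              # normalize-space($ids)
    if not ids:
        return seen.strip()
    head, _, tail = ids.partition(" ")       # first token / rest (substring-before/-after)
    # contains(concat(' ', $seen, ' '), concat(' ', $head, ' '))
    if f" {head} " not in f" {seen} ":
        seen = f"{seen} {head}"              # concat($seen, ' ', $head)
    return dedup_id_string(tail, seen)


print(dedup_id_string("id1 id0 id1 id0 id4 id0 id4 id0"))  # id1 id0 id4
```

The space padding in the membership test matters: without it, `id1` would wrongly be found inside `id10`.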

Yesterday's solution [1], based on the id() function, worked well.


But after thinking about it again, I find the single-file solution
below, which applies the key() function twice for duplicate elimination,
much better:
* does not need any separately created structure (like idcopy in [1])
* is really short, just a few lines (not counting comments)
* works on ALL major browsers (IE support by David Carlisle's trick [4])
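Why applying key() twice removes duplicates can be modelled in Python. This is a sketch under the same kind of assumptions as before (toy element-only tree, hypothetical position-based ids instead of processor-generated ones), not the stylesheet itself. The first lookup turns the id-node-set back into real nodes, the ancestor step then builds `aux`, which may repeat ids, and the second lookup collapses those repeats because key() returns a node-set in document order.

```python
class Node:
    """Toy element node (hypothetical model)."""

    def __init__(self, name, parent=None):
        self.name, self.parent, self.children = name, parent, []
        if parent is not None:
            parent.children.append(self)


def document_order(node):
    yield node
    for child in node.children:
        yield from document_order(child)


def assign_ids(root):
    """Hypothetical generate-id(): 'id' + document-order position.
    The returned dict stands in for the 'nodes-by-genid' key."""
    index = {}
    for pos, node in enumerate(document_order(root)):
        node.pos, node.gid = pos, f"id{pos}"
        index[node.gid] = node
    return index


def key(index, ids):
    """key('nodes-by-genid', ids): each node once, in document order."""
    return sorted({index[i] for i in ids if i in index}, key=lambda n: n.pos)


def ancestors(node):
    """ancestor::* in this element-only model."""
    result = []
    while node.parent is not None:
        node = node.parent
        result.append(node)
    return result


def ancestor_step(index, ids):
    # First key(): ids -> real nodes; the location step may repeat ids in aux.
    aux = [anc.gid for node in key(index, ids) for anc in ancestors(node)]
    # Second key(): duplicates vanish because the result is a node-set.
    return aux, [node.gid for node in key(index, aux)]


a = Node("a")
for _ in range(2):
    b = Node("b", a)
    Node("c", b)
    Node("c", b)
index = assign_ids(a)

nodes = [n.gid for n in document_order(a) if n.name == "c"]  # ids(//c)
aux, result = ancestor_step(index, nodes)
print(len(aux), result)  # 8 ['id0', 'id1', 'id4']
```

The three resulting ids correspond to a and the two b elements, the same three nodes that the xsltproc run prints for ids($nodes/ancestor::*), there with processor-generated ids.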

Below are
* the execution by xsltproc
* the listing of dupelim3.xsl [3]
* the listing of ancestor3.xml [2] (open it in a browser)


$ xsltproc dupelim3.xsl ancestor3.xml
<html><pre><h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)


ids(//*)
a      id2619817
+-b    id2619788
! +-c  id2619830
! +-c  id2619802
+-b    id2619245
! +-c  id2619317
! +-c  id2619321
<hr>
ids(//c):
<id>id2619830</id><id>id2619802</id><id>id2619317</id><id>id2619321</id>
<hr>
nodes="ids(//c)"<br>ids($nodes/ancestor::*):
<id>id2619817</id><id>id2619788</id><id>id2619245</id>
</pre></html>
$
$ cat dupelim3.xsl
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exslt="http://exslt.org/common"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  exclude-result-prefixes="exslt msxsl"
>
  <xsl:output method="html"/>

  <xsl:key name="nodes-by-genid" match="node()" use="generate-id()"/>


  <xsl:template match="/">

    <!--
         initial node-set sample, represented by <id> nodes
    -->
    <xsl:variable name="nodes">
      <xsl:for-each select="//c">
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


    <!--
         do ancestor location step
    -->
    <xsl:variable name="result">
      <!--
           application of "ancestor::*" on $nodes;
           $aux might contain duplicate id nodes
      -->
      <xsl:variable name="aux">
        <!--
             use key() function to determine real nodes
        -->
        <xsl:for-each select="key('nodes-by-genid',exslt:node-set($nodes)/id)">
          <!--
              location step on each real node
          -->
          <xsl:for-each select="ancestor::*">
            <!--
                generate <id>s for new nodes
            -->
            <id><xsl:value-of select="generate-id()"/></id>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:variable>

      <!--
           use key() function for duplicate elimination
      -->
      <xsl:for-each select="key('nodes-by-genid',exslt:node-set($aux)/id)">
        <!--
            generate <id>s, now for unique new nodes
        -->
        <id><xsl:value-of select="generate-id()"/></id>
      </xsl:for-each>
    </xsl:variable>


<html><pre>
    <h2>Duplicate node elimination by applying key() function twice</h2>
    See <a href="dupelim3.xsl">dupelim3.xsl</a> for details.
    Tested to work with these browsers:
      Chrome
      Firefox
      Internet Explorer
      Opera
      Safari
    (clicking reload shows different ids)

    <!-- node name vs genid output -->
    <xsl:text>&#10;ids(//*)</xsl:text>
    <xsl:for-each select="//*">
      <xsl:value-of select=
        "concat('&#10;',substring('! +-',5-2*count(ancestor::*)),name(),
         substring('    ',1+2*count(ancestor::*)),'  ',generate-id())"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- for verification -->
    <xsl:text>ids(//c): </xsl:text>
    <xsl:copy-of select="$nodes"/>
    <xsl:text>&#10;</xsl:text><hr/><xsl:text>&#10;</xsl:text>

    <!-- output of result -->
    <xsl:text>nodes="ids(//c)"</xsl:text><br/>
    <xsl:text>ids($nodes/ancestor::*): </xsl:text>
    <xsl:copy-of select="$result"/>

    <xsl:text>&#10;</xsl:text>
</pre></html>

  </xsl:template>


<!--
  from
http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html
-->
<msxsl:script language="JScript" implements-prefix="exslt">
 this['node-set'] =  function (x) {
  return x;
  }
</msxsl:script>

</xsl:stylesheet>
$
$ cat ancestor3.xml
<?xml-stylesheet href="dupelim3.xsl" type="text/xsl"?>
<a>
  <b>
    <c>1</c>
    <c>2</c>
  </b>
  <b>
    <c>3</c>
    <c>4</c>
  </b>
</a>

$
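The tree drawing in the ids(//*) output comes from the concat()/substring() expression in the stylesheet. A Python transcription of that expression (hypothetical helper names; XPath substring() is 1-based, and a start position before 1 returns the whole string) reproduces the printed lines and also shows that the layout only works up to depth 2, which is all ancestor3.xml needs.

```python
def xpath_substring(s, start):
    """XPath 1.0 substring($s, $start) for an integer start (1-based)."""
    return s[max(start, 1) - 1:]


def tree_line(name, depth, gid):
    """Transcription of concat(substring('! +-', 5-2*depth), name(),
    substring('    ', 1+2*depth), '  ', generate-id()), without the
    leading '&#10;'; depth stands for count(ancestor::*)."""
    prefix = xpath_substring("! +-", 5 - 2 * depth)
    padding = xpath_substring("    ", 1 + 2 * depth)
    return prefix + name + padding + "  " + gid


for name, depth, gid in [("a", 0, "id2619817"), ("b", 1, "id2619788"),
                         ("c", 2, "id2619830")]:
    print(tree_line(name, depth, gid))
```

Beyond depth 2 both substrings saturate (the prefix stays "! +-" and the padding stays empty), so deeper elements would be drawn at the same indentation as depth 2.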


[1] http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/201008/msg00291.html

[2] http://stamm-wilbrandt.de/en/xsl-list/ancestor3.xml
[3] http://stamm-wilbrandt.de/en/xsl-list/dupelim3.xml
[4] http://dpcarlisle.blogspot.com/2007/05/exslt-node-set-function.html



From:       Michael Kay <mike@s...>
To:         xsl-list@l...
Date:       08/24/2010 02:17 PM
Subject:    Re: Fw: [xsl] Question on duplicate node elimination



  I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.

Michael Kay
Saxonica

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe@l...>
--~--




