Altova Mailing List Archives


[xsl] Re: Re: Re: Re: Unbounded element grouping/concatenation

From: "Dimitre Novatchev" <dnovatchev@--------->
To:
Date: 12/12/2003 12:34:00 AM
"Gupta, Raman K [CI]" <raman.k.gupta@xxxxxxxxxxxxx> wrote in message
news:C0CB45C72DDE2A4FA32370AB65CAEC8F059673@xxxxxxxxxxxxxxxxxxxxxxxxxx

[Nice timing results omitted]

> So all of the methods except recursion exhibit exponential
> behavior with an increase in continuation records except
> recursion.
>
> So if there is a way to do solve this problem recursively
> that would avoid the stack overflow errors (2.5.2 doesn't
> even do you the courtesy of throwing an exception -- it
> just dies in the middle of the transformation), that would
> still be by far the best solution.
>

Hi Raman,

I have both bad and good news for you.

The Bad News:
              The recursive algorithm isn't the fastest.

The Good News:
              I am enclosing here the code implementing the fastest (that I
know of) algorithm, anf it is non-recursive.


The idea is to get a string with the positions of all "record" nodes with
type="normal" among all "record nodes", then for a record with type="normal"
find its position in the string and also the position of the next record
with type="normal" The difference in these two positions - 1 is the number
of intermediate record nodes with type="continuation".

Here's the XSLT code:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 >

  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:variable name="vposArray">
    <xsl:value-of select="'|'"/>
    <xsl:for-each select="/*/record">
      <xsl:if test="@type = 'normal'">
        <xsl:value-of select="concat(position(), '|')"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>

  <xsl:template match="@* | node()" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="records">
    <records>
      <xsl:apply-templates select="record"/>
    </records>
  </xsl:template>

  <xsl:template match="record">
    <xsl:choose>
      <xsl:when test="not(@type='normal')">
        <xsl:call-template name="identity"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="vPos" select="position()"/>
        <xsl:variable name="vposNext"
        select="substring-before(
                          substring-after($vposArray,
                                          concat('|',
                                                 position(),
                                                 '|'
                                                 )
                                          ),
                          '|'
                                )"/>
         <xsl:variable name="vNumNested"
           select="$vposNext - position() - 1"/>
         <xsl:copy>
           <xsl:copy-of select="@* | node()"/>
           <xsl:if test="$vNumNested > 0">
             <xsl:copy-of select=
               "following-sibling::record
                           [position() &lt;= $vNumNested]"/>
           </xsl:if>
         </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="record[not(@type='normal')]"/>
</xsl:stylesheet>


I also measured the time it takes with the three different algorithms
(recursive, with keys, non-recursive) with two different sorce xml
documents -- one having 1000 consecutive record nodes with
type="continuation", the other with 2000 such nodes. They look like this:

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="1000" type="continuation"/>
   <record n="1001" type="continuation"/>
   <record n="1002" type="continuation"/>
   <record n="1003" type="normal"/>
   <record n="1004" type="normal"/>
</records>

and

<records>
   <record n="1" type="normal"/>
   <record n="2" type="normal"/>
   <record n="3" type="continuation"/>
   <record n="4" type="continuation"/>
   <record n="5" type="continuation"/>
   <record n="6" type="continuation"/>
   <record n="7" type="continuation"/>
 ..............................................................
   <record n="2000" type="continuation"/>
   <record n="2001" type="continuation"/>
   <record n="2002" type="continuation"/>
   <record n="2003" type="normal"/>
   <record n="2004" type="normal"/>
</records>


I tested the following XSLT processors: MSXML4, Saxon6.5.3, JD1.5.2,
XalanJ2.4.1, XalanC1.5

I also wanted to test a forth algorithm using generate-id(). It took MSXML4
233157 milliseconds on the 1000 nodes document and I had to kill it after 30
minutes with the 2000 nodes document. I did not attempt to test this with
the rest of the XSLT processors.

Here are the results:


Method                    MSXML4       Saxon               JD
XalanJ            XalanC
=============================================================
Recursive:
    1000 nodes                     37     Stack Ovfl.        150
Stack Ovfl.               170
    2000 nodes                   519     Stack Ovfl.   Stack Ovfl.   Stack
Ovfl.               621

Keys:
    1000 nodes                     44            1783           791
4677              1472
    2000 nodes                 2211            6890         2654
17466              5818

Non-Recursive:
    1000 nodes                    5.9                50           100
1152                   10
    2000 nodes                12.09              101           120
1342                   40



The non-recursive algorithm exhibits linear behaviour with MSXML4 and Saxon
and sub-linear! one with JD, XalanJ and XalanC.

The recursive and key-based algorithm exhibit quadratic behaviour.

XalanJ is clearly not in the same company as the rest of the tested
processors.


I could try still speeding up the non-recursive algorithm, by using a faster
search than linear to find the position of a record node in the string with
positions -- this will require that all positions must have the same (some
maximum) length. Or I could record the positions in a node-set, for which
binary search is straight-forward.

In case you are still not satisfied with the speed of the non-recursive
algorithm, just let me know :o)

Hope this helped.


=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL









 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Disclaimer

These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.