Home. 
.

transparent

transparent

transparent

Altova Mailing List Archives


Re: Most performant way to sort large files

From: jeh@-----------.---------.---
To: NULL
Date: 7/31/2007 10:46:00 PM

Hey,

Thanks! That's alot faster. There are however two things that I would like 
to follow up on here.

1st: The files differ greatly in size, before sort it's about 15MB and after 
it's roughly 7, what happened seems to be it removing all linebreaks, 
whitespace and such. What would I have to do to have (at least) linebreaks? I 
don't really need indention, but that would be nice as well. Linebreaks 
however is causing me to not be able to opening it in for example Visual 
Studios Xml editor, since it's too much on one line.

2nd: When researching the fastest sort algorithm I got the impression that 
using keys would be way faster, why isnt it, and when would I introduce keys 
as opposed to the simple sort statement used here? Is it when complexity grow 
to more then one sort statement, or when I introduce grouping (two levels of 
sorting)? Or is it just an "old" approach?

Thanks,
John

"Dimitre Novatchev" wrote:

> Usually the simplest way is the fastest. Using this code is 10 times faster 
> (1.6sec vs 16 sec) when transformed with .NET XSLCompiledTransform and 2 
> times faster (16se vs 32 sec) when transformed with MSXML4:
> 
> 
>     <Level1>
>       <xsl:for-each select="Level2">
>         <xsl:sort select="@UseThisToSort" data-type="text" order="ascending" 
> />
>             <xsl:copy-of select="." />
>       </xsl:for-each>
>     </Level1>
> 
> 
> Cheers,
> Dimitre Novatchev
> 
> "jeh" <jeh@d...> wrote in message 
> news:7BC89F8E-D8E3-4327-8A8B-88E9C83CE90C@m......
> > Hi all,
> >
> > I have been testing a few xsl sorting statements - I want to output the 
> > same
> > file, as I input, no direct transformations, just sorted. My files are 
> > (can
> > be) large.. somewhere in the neighbourhood of 12MB (and sometimes more) -
> > which makes them somewhere around 50k rows depending on the complexity of 
> > the
> > different documents.
> >
> > The issue that I am having is that my xsl statement is fast on small
> > documents but not so much so when they grow larger - a 12MB document takes
> > about 10 minutes in the test case presented below, and even more so when I
> > have larger more complex actual structures. The sample below is simplified 
> > to
> > act as just that. The xsl is "real" though.
> >
> > What is the most performant way of sorting a document? What can be 
> > improved
> > in my xsl below? With performant I mean it to be fast as well as 
> > preferably
> > light on the memory. Below I use keys, but in my understanding that's 
> > quite
> > memory intensive - what are the tradeoffs between different approaches?
> >
> > Oh, and I am looking to use this in BizTalk, so I'm first and foremost
> > interested in 1.0 interpreter and .Net compatible solutions. Other 
> > solutions
> > are however more then welcome - I'd be glad to get that perspective as 
> > well.
> >
> > TIA,
> > John
> >
> > ---------
> > My xsl:
> > ---------
> > <?xml version="1.0" encoding="utf-16"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> > xmlns:msxsl="urn:schemas-microsoft-com:xslt"
> > xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
> > exclude-result-prefixes="msxsl var" version="1.0"
> > xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
> >  <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
> >  <xsl:key name="sort-key" match="Level2" use="@UseThisToSort" />
> >  <xsl:template match="/">
> >    <xsl:apply-templates select="/ns0:Root" />
> >  </xsl:template>
> >  <xsl:template match="/ns0:Root">
> >    <ns0:Root>
> >      <xsl:apply-templates select="Level1" />
> >    </ns0:Root>
> >  </xsl:template>
> >  <xsl:template match="Level1">
> >    <Level1>
> >      <xsl:for-each select="Level2[count(. | key('sort-key',
> > @UseThisToSort)[1]) = 1]">
> >        <xsl:sort select="@UseThisToSort" data-type="text" 
> > order="ascending"
> > />
> >        <xsl:for-each select="key('sort-key', @UseThisToSort)">
> >            <xsl:copy-of select="." />
> >          </xsl:for-each>
> >      </xsl:for-each>
> >    </Level1>
> >  </xsl:template>
> > </xsl:stylesheet>
> >
> > ---------
> > Sample xml:
> > ---------
> > <ns0:Root xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
> >  <Level1>
> >    <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010102" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >    <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> > MiscData2="MiscData2_2" />
> >  </Level1>
> > </ns0:Root>
> >
> > ---------
> > Schema:
> > ---------
> > <?xml version="1.0" encoding="utf-16"?>
> > <xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
> > xmlns="http://BizTalkXsltSortTest.SimpleSample.Schema1"
> > targetNamespace="http://BizTalkXsltSortTest.SimpleSample.Schema1"
> > xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >  <xs:element name="Root">
> >    <xs:complexType>
> >      <xs:sequence>
> >        <xs:element minOccurs="0" maxOccurs="unbounded" name="Level1">
> >          <xs:complexType>
> >            <xs:sequence>
> >              <xs:element minOccurs="0" maxOccurs="unbounded" 
> > name="Level2">
> >                <xs:complexType>
> >                  <xs:attribute name="UseThisToSort" type="xs:string" />
> >                  <xs:attribute name="MiscData1" type="xs:string" />
> >                  <xs:attribute name="MiscData2" type="xs:string" />
> >                </xs:complexType>
> >              </xs:element>
> >            </xs:sequence>
> >          </xs:complexType>
> >        </xs:element>
> >      </xs:sequence>
> >    </xs:complexType>
> >  </xs:element>
> > </xs:schema> 
> 
> 
> 


transparent
Print
Mail
Like It
Disclaimer
.

These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.

.
.

transparent

transparent