Re: Most performant way to sort large files
Newsgroup: microsoft.public.xsl
Date: 7/31/2007 10:46:00 PM
Hey,
Thanks! That's a lot faster. There are, however, two things that I would like
to follow up on here.
1st: The files differ greatly in size: before the sort it's about 15MB and
after it's roughly 7MB. It seems that all linebreaks, whitespace and such
were removed. What would I have to do to keep (at least) linebreaks? I
don't really need indentation, but that would be nice as well. The missing
linebreaks, however, keep me from opening the file in, for example, Visual
Studio's XML editor, since there is too much on one line.
2nd: When researching the fastest sort algorithm I got the impression that
using keys would be way faster. Why isn't it, and when would I introduce keys
as opposed to the simple sort statement used here? Is it when the complexity
grows to more than one sort statement, or when I introduce grouping (two
levels of sorting)? Or is it just an "old" approach?
Thanks,
John
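
On the first question, a minimal sketch (an editorial assumption, not something posted in the thread): the fragment quoted below selects only the Level2 elements, so any whitespace text nodes between them are never copied and the output ends up on one line. XSLT 1.0 lets a stylesheet ask the serializer for line breaks and indentation with indent="yes" on xsl:output. The stylesheet below wraps the quoted fragment in the templates from the original post and adds that attribute. Whether the hint is honored depends on how the host (for example BizTalk) serializes the result, which is not known here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">

  <!-- indent="yes" is the assumed addition: it asks the serializer to
       emit line breaks and indentation in the result document -->
  <xsl:output method="xml" omit-xml-declaration="yes" indent="yes" />

  <!-- Rebuild the root element, as in the original stylesheet -->
  <xsl:template match="/ns0:Root">
    <ns0:Root>
      <xsl:apply-templates select="Level1" />
    </ns0:Root>
  </xsl:template>

  <!-- Copy each Level1 with its Level2 children sorted by the attribute;
       no key is involved, only a plain xsl:sort -->
  <xsl:template match="Level1">
    <Level1>
      <xsl:for-each select="Level2">
        <xsl:sort select="@UseThisToSort" data-type="text" order="ascending" />
        <xsl:copy-of select="." />
      </xsl:for-each>
    </Level1>
  </xsl:template>
</xsl:stylesheet>
```

With XslCompiledTransform, the xsl:output settings should apply when the result is written to a Stream or TextWriter, or to an XmlWriter created from the transform's OutputSettings property; a writer created with other settings follows its own settings instead.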
"Dimitre Novatchev" wrote:
> Usually the simplest way is the fastest. Using this code is 10 times faster
> (1.6 sec vs 16 sec) when transformed with .NET XslCompiledTransform and 2
> times faster (16 sec vs 32 sec) when transformed with MSXML4:
>
>
> <Level1>
>   <xsl:for-each select="Level2">
>     <xsl:sort select="@UseThisToSort" data-type="text" order="ascending" />
>     <xsl:copy-of select="." />
>   </xsl:for-each>
> </Level1>
>
>
> Cheers,
> Dimitre Novatchev
>
> "jeh" <jeh@d...> wrote in message
> news:7BC89F8E-D8E3-4327-8A8B-88E9C83CE90C@m......
> > Hi all,
> >
> > I have been testing a few XSL sorting statements. I want to output the
> > same file as I input, no direct transformations, just sorted. My files
> > are (can be) large, somewhere in the neighbourhood of 12MB (and sometimes
> > more), which makes them somewhere around 50k rows depending on the
> > complexity of the different documents.
> >
> > The issue that I am having is that my xsl statement is fast on small
> > documents but not so much so when they grow larger - a 12MB document takes
> > about 10 minutes in the test case presented below, and even more so when I
> > have larger more complex actual structures. The sample below is simplified
> > to act as just that. The xsl is "real" though.
> >
> > What is the most performant way of sorting a document? What can be
> > improved in my xsl below? By performant I mean fast and preferably
> > light on memory. Below I use keys, but in my understanding that's quite
> > memory intensive - what are the tradeoffs between different approaches?
> >
> > Oh, and I am looking to use this in BizTalk, so I'm first and foremost
> > interested in 1.0 interpreter and .Net compatible solutions. Other
> > solutions are, however, more than welcome - I'd be glad to get that
> > perspective as well.
> >
> > TIA,
> > John
> >
> > ---------
> > My xsl:
> > ---------
> > <?xml version="1.0" encoding="utf-16"?>
> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> >     xmlns:msxsl="urn:schemas-microsoft-com:xslt"
> >     xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
> >     exclude-result-prefixes="msxsl var" version="1.0"
> >     xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
> >   <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
> >   <xsl:key name="sort-key" match="Level2" use="@UseThisToSort" />
> >   <xsl:template match="/">
> >     <xsl:apply-templates select="/ns0:Root" />
> >   </xsl:template>
> >   <xsl:template match="/ns0:Root">
> >     <ns0:Root>
> >       <xsl:apply-templates select="Level1" />
> >     </ns0:Root>
> >   </xsl:template>
> >   <xsl:template match="Level1">
> >     <Level1>
> >       <xsl:for-each
> >           select="Level2[count(. | key('sort-key', @UseThisToSort)[1]) = 1]">
> >         <xsl:sort select="@UseThisToSort" data-type="text"
> >             order="ascending" />
> >         <xsl:for-each select="key('sort-key', @UseThisToSort)">
> >           <xsl:copy-of select="." />
> >         </xsl:for-each>
> >       </xsl:for-each>
> >     </Level1>
> >   </xsl:template>
> > </xsl:stylesheet>
> >
> > ---------
> > Sample xml:
> > ---------
> > <ns0:Root xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
> >   <Level1>
> >     <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010102" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >     <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
> >         MiscData2="MiscData2_2" />
> >   </Level1>
> > </ns0:Root>
> >
> > ---------
> > Schema:
> > ---------
> > <?xml version="1.0" encoding="utf-16"?>
> > <xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
> >     xmlns="http://BizTalkXsltSortTest.SimpleSample.Schema1"
> >     targetNamespace="http://BizTalkXsltSortTest.SimpleSample.Schema1"
> >     xmlns:xs="http://www.w3.org/2001/XMLSchema">
> >   <xs:element name="Root">
> >     <xs:complexType>
> >       <xs:sequence>
> >         <xs:element minOccurs="0" maxOccurs="unbounded" name="Level1">
> >           <xs:complexType>
> >             <xs:sequence>
> >               <xs:element minOccurs="0" maxOccurs="unbounded"
> >                   name="Level2">
> >                 <xs:complexType>
> >                   <xs:attribute name="UseThisToSort" type="xs:string" />
> >                   <xs:attribute name="MiscData1" type="xs:string" />
> >                   <xs:attribute name="MiscData2" type="xs:string" />
> >                 </xs:complexType>
> >               </xs:element>
> >             </xs:sequence>
> >           </xs:complexType>
> >         </xs:element>
> >       </xs:sequence>
> >     </xs:complexType>
> >   </xs:element>
> > </xs:schema>
>
>
>
