![]() |
![]() | ![]() | ![]() | Altova Mailing List Archives>Archive Index >microsoft.public.xsl Archive Home >Recent entries >Thread Prev - Re: Most performant way to sort large files [Thread Next] Re: Most performant way to sort large filesTo: NULL Date: 8/2/2007 8:55:00 PM
"jeh" <jeh@d...> wrote in message
news:A9A37A0D-65D9-4104-9C96-F210A4BCD81A@m......
> Hey,
>
> Thanks! That's alot faster. There are however two things that I would like
> to follow up on here.
>
> 1st: The files differ greatly in size, before sort it's about 15MB and
> after
> it's roughly 7, what happened seems to be it removing all linebreaks,
> whitespace and such. What would I have to do to have (at least)
> linebreaks? I
> don't really need indention, but that would be nice as well. Linebreaks
> however is causing me to not be able to opening it in for example Visual
> Studios Xml editor, since it's too much on one line.
Use the
<xsl:output .../>
directive. Specify
indent="yes"
>
> 2nd: When researching the fastest sort algorithm I got the impression that
> using keys would be way faster, why isnt it, and when would I introduce
> keys
> as opposed to the simple sort statement used here? Is it when complexity
> grow
> to more then one sort statement, or when I introduce grouping (two levels
> of
> sorting)? Or is it just an "old" approach?
Keys are useful for getting quick access to a *subset* of nodes in an xml
file (typically when the nodes should be accessed more than once) -- such
as grouping or implementing a "link" between two nodes.
Sort is O(N*log(N)) and cannot be made significantly faster if one of the
fastest sorting algorithms is already used.
Cheers,
Dimitre Novatchev
>
> Thanks,
> John
>
> "Dimitre Novatchev" wrote:
>
>> Usually the simplest way is the fastest. Using this code is 10 times
>> faster
>> (1.6sec vs 16 sec) when transformed with .NET XSLCompiledTransform and 2
>> times faster (16se vs 32 sec) when transformed with MSXML4:
>>
>>
>> <Level1>
>> <xsl:for-each select="Level2">
>> <xsl:sort select="@UseThisToSort" data-type="text"
>> order="ascending"
>> />
>> <xsl:copy-of select="." />
>> </xsl:for-each>
>> </Level1>
>>
>>
>> Cheers,
>> Dimitre Novatchev
>>
>> "jeh" <jeh@d...> wrote in message
>> news:7BC89F8E-D8E3-4327-8A8B-88E9C83CE90C@m......
>> > Hi all,
>> >
>> > I have been testing a few xsl sorting statements - I want to output the
>> > same
>> > file, as I input, no direct transformations, just sorted. My files are
>> > (can
>> > be) large.. somewhere in the neighbourhood of 12MB (and sometimes
>> > more) -
>> > which makes them somewhere around 50k rows depending on the complexity
>> > of
>> > the
>> > different documents.
>> >
>> > The issue that I am having is that my xsl statement is fast on small
>> > documents but not so much so when they grow larger - a 12MB document
>> > takes
>> > about 10 minutes in the test case presented below, and even more so
>> > when I
>> > have larger more complex actual structures. The sample below is
>> > simplified
>> > to
>> > act as just that. The xsl is "real" though.
>> >
>> > What is the most performant way of sorting a document? What can be
>> > improved
>> > in my xsl below? With performant I mean it to be fast as well as
>> > preferably
>> > light on the memory. Below I use keys, but in my understanding that's
>> > quite
>> > memory intensive - what are the tradeoffs between different approaches?
>> >
>> > Oh, and I am looking to use this in BizTalk, so I'm first and foremost
>> > interested in 1.0 interpreter and .Net compatible solutions. Other
>> > solutions
>> > are however more then welcome - I'd be glad to get that perspective as
>> > well.
>> >
>> > TIA,
>> > John
>> >
>> > ---------
>> > My xsl:
>> > ---------
>> > <?xml version="1.0" encoding="utf-16"?>
>> > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>> > xmlns:msxsl="urn:schemas-microsoft-com:xslt"
>> > xmlns:var="http://schemas.microsoft.com/BizTalk/2003/var"
>> > exclude-result-prefixes="msxsl var" version="1.0"
>> > xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
>> > <xsl:output omit-xml-declaration="yes" method="xml" version="1.0" />
>> > <xsl:key name="sort-key" match="Level2" use="@UseThisToSort" />
>> > <xsl:template match="/">
>> > <xsl:apply-templates select="/ns0:Root" />
>> > </xsl:template>
>> > <xsl:template match="/ns0:Root">
>> > <ns0:Root>
>> > <xsl:apply-templates select="Level1" />
>> > </ns0:Root>
>> > </xsl:template>
>> > <xsl:template match="Level1">
>> > <Level1>
>> > <xsl:for-each select="Level2[count(. | key('sort-key',
>> > @UseThisToSort)[1]) = 1]">
>> > <xsl:sort select="@UseThisToSort" data-type="text"
>> > order="ascending"
>> > />
>> > <xsl:for-each select="key('sort-key', @UseThisToSort)">
>> > <xsl:copy-of select="." />
>> > </xsl:for-each>
>> > </xsl:for-each>
>> > </Level1>
>> > </xsl:template>
>> > </xsl:stylesheet>
>> >
>> > ---------
>> > Sample xml:
>> > ---------
>> > <ns0:Root xmlns:ns0="http://BizTalkXsltSortTest.SimpleSample.Schema1">
>> > <Level1>
>> > <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010102" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010201" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > <Level2 UseThisToSort="20010101" MiscData1="MiscData1_1"
>> > MiscData2="MiscData2_2" />
>> > </Level1>
>> > </ns0:Root>
>> >
>> > ---------
>> > Schema:
>> > ---------
>> > <?xml version="1.0" encoding="utf-16"?>
>> > <xs:schema xmlns:b="http://schemas.microsoft.com/BizTalk/2003"
>> > xmlns="http://BizTalkXsltSortTest.SimpleSample.Schema1"
>> > targetNamespace="http://BizTalkXsltSortTest.SimpleSample.Schema1"
>> > xmlns:xs="http://www.w3.org/2001/XMLSchema">
>> > <xs:element name="Root">
>> > <xs:complexType>
>> > <xs:sequence>
>> > <xs:element minOccurs="0" maxOccurs="unbounded" name="Level1">
>> > <xs:complexType>
>> > <xs:sequence>
>> > <xs:element minOccurs="0" maxOccurs="unbounded"
>> > name="Level2">
>> > <xs:complexType>
>> > <xs:attribute name="UseThisToSort" type="xs:string" />
>> > <xs:attribute name="MiscData1" type="xs:string" />
>> > <xs:attribute name="MiscData2" type="xs:string" />
>> > </xs:complexType>
>> > </xs:element>
>> > </xs:sequence>
>> > </xs:complexType>
>> > </xs:element>
>> > </xs:sequence>
>> > </xs:complexType>
>> > </xs:element>
>> > </xs:schema>
>>
>>
>>
| ![]() | ![]() | ![]() |
| Company | Legal | Press | Partners | Careers | Sitemap | Contact Us | Altova Blog | |||||
|
