Altova Mailing List Archives


Re: [xsl] for-each-group grouping accented versions of letters

From: "Imsieke, Gerrit, le-tex" <gerrit.imsieke@------.-->
To: xsl-list@-----.------------.---
Date: 4/21/2012 1:03:00 AM
You can strip the accents by applying Unicode decomposition (NFKD) and then 
removing the combining diacritical marks (U+0300 to U+036F):

<xsl:for-each-group select="index-0"
   group-by="substring(
               upper-case(
                 replace(
                   normalize-unicode(heading, 'NFKD'),
                   '[&#x300;-&#x36f;]',
                   ''
                 )
               ), 1, 1
             )">
   <xsl:sort select="current-grouping-key()"/>
   <!-- ... xsl:result-document etc., as in your template ... -->
</xsl:for-each-group>

When writing the group (= starting letter) to an output file further 
down in your template, you should sort its entries by the upper-case(…) 
part as the first sort key, and by the actual heading as the second 
(tie-breaker) sort key.

So it’s best to turn that upper-case(…) expression into a function 
(call it, e.g., my:sortkey).
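A minimal sketch of that refactoring, folded into the stylesheet from the quoted message: my:sortkey supplies the grouping key, the group order, and the primary sort key inside each output file, while the heading as written breaks ties. The namespace URI urn:example:my and the parameter name text are placeholders, and the sketch assumes an XSLT 2.0 processor that implements the optional 'NFKD' normalization form of normalize-unicode() (Saxon does, for example).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="urn:example:my"
  exclude-result-prefixes="xs my">

  <!-- Hypothetical helper: decompose (NFKD), drop the combining
       diacritics U+0300..U+036F, then upper-case -->
  <xsl:function name="my:sortkey" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <xsl:sequence select="upper-case(replace(normalize-unicode($text, 'NFKD'),
                                             '[&#x300;-&#x36f;]', ''))"/>
  </xsl:function>

  <xsl:template match="/wkna-shared-cms/index">
    <!-- First character of the key: E, É and Ê all yield the group "E" -->
    <xsl:for-each-group select="index-0"
                        group-by="substring(my:sortkey(heading), 1, 1)">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:result-document href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
        <wkna-shared-cms>
          <index area="{/wkna-shared-cms/index/@area}"
                 xml:lang="{/wkna-shared-cms/index/@xml:lang}">
            <num cite="Topical Index {current-grouping-key()}">
              <xsl:sequence select="current-grouping-key()"/>
            </num>
            <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
            <!-- Primary sort key: the accent-stripped form;
                 tie-breaker: the heading as actually written -->
            <xsl:for-each select="current-group()">
              <xsl:sort select="my:sortkey(heading)"/>
              <xsl:sort select="heading"/>
              <xsl:copy-of select="."/>
            </xsl:for-each>
          </index>
        </wkna-shared-cms>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Keeping the key logic in one function means the group-by expression and the two sorts cannot drift apart when the key rules change later.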

In that function, you can also do other useful things, such as 
eliminating stop words or replacing every digit with a zero, so that 
everything that starts with a number ends up in the same group.
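Extending a my:sortkey function along those lines might look like the standalone module below, meant to be pulled into the main stylesheet with xsl:import. The stop-word list (the English articles THE, AN, A) is purely an illustrative assumption, as are the file name sortkey.xsl and the namespace URI urn:example:my; a real index would need a list suited to its language.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical module (e.g. sortkey.xsl), to be used via xsl:import -->
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="urn:example:my"
  exclude-result-prefixes="xs my">

  <!-- Illustrative stop words only, matched after upper-casing -->
  <xsl:variable name="my:stopword-pattern" as="xs:string"
                select="'^(THE|AN|A)\s+'"/>

  <xsl:function name="my:sortkey" as="xs:string">
    <xsl:param name="text" as="xs:string?"/>
    <!-- Decompose (NFKD), drop combining diacritics U+0300..U+036F,
         upper-case, then trim and collapse whitespace -->
    <xsl:variable name="plain" as="xs:string"
      select="normalize-space(upper-case(replace(normalize-unicode($text, 'NFKD'),
                                                 '[&#x300;-&#x36f;]', '')))"/>
    <!-- Drop one leading stop word: "THE ABBEY" becomes "ABBEY" -->
    <xsl:variable name="no-stopword" as="xs:string"
      select="replace($plain, $my:stopword-pattern, '')"/>
    <!-- Map each ASCII digit to 0, so every heading that starts with a
         number gets the first character "0" and lands in one group -->
    <xsl:sequence select="replace($no-stopword, '[0-9]', '0')"/>
  </xsl:function>

</xsl:stylesheet>
```

With this version, a heading such as "1999 in review" gets the key "0000 IN REVIEW", so substring(my:sortkey(heading), 1, 1) puts it in group "0" alongside every other numeric heading.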

Gerrit


On 2012-04-21 02:03, Graydon wrote:
> So I've got an XML index file, which is too large for some downstream
> processing to be entirely pleased with.  The requirement is to split the
> file up, grouping index entries (index-0 elements; the index element is
> the overall container element) by the first character of their child
> heading element.
>
> Using XSLT 2.0, this is pretty easy:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet exclude-result-prefixes="xs xd" version="2.0"
>    xmlns:xd="www.---.com"
>    xmlns:xs="http://www.w3.org/2001/XMLSchema"
>    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
>    <xsl:template match="/wkna-shared-cms/index">
>      <xsl:for-each-group group-by="substring(heading,1,1)" select="index-0">
>        <xsl:sort select="./heading"/>
>        <xsl:result-document href="eitaindex+Topical_Index_{current-grouping-key()}.xml">
>          <wkna-shared-cms>
>            <index area="{/wkna-shared-cms/index/@area}"
>              xml:lang="{/wkna-shared-cms/index/@xml:lang}">
>              <num cite="Topical Index {current-grouping-key()}">
>                <xsl:sequence select="current-grouping-key()"/>
>              </num>
>              <xsl:copy-of select="/wkna-shared-cms/index/index-metadata"/>
>              <xsl:copy-of select="current-group()"/>
>            </index>
>          </wkna-shared-cms>
>        </xsl:result-document>
>      </xsl:for-each-group>
>    </xsl:template>
> </xsl:stylesheet>
>
> The problem is that some of the initial characters of the headings have
> accents, and it's desired that the accented characters and the
> unaccented characters group together, so that E and É and Ê, etc. all
> group together in a group with a current-grouping-key() of "E".
>
> I can imagine doing this in a painful way with conditional statements
> and an exhaustive list of characters, but I'm hoping someone can tell me
> there's a better way.
>
> Thanks!
>
> -- Graydon
>
> --~------------------------------------------------------------------
> XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
> To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
> or e-mail:<mailto:xsl-list-unsubscribe@l...>
> --~--
>

-- 
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsieke@l..., http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors: Gerrit Imsieke, Svea Jelonek,
Thomas Schmidt, Dr. Reinhard Vöckler

