Altova Mailing List Archives


Re: [xsl] segmenting a paragraph

From: "G. Ken Holman" <gkholman@-------------------->
To:
Date: 10/2/2007 8:35:00 AM
At 2007-10-02 17:05 +0900, Christian Wittern wrote:
In trying to solve the following problem I am seeking your help:

I want to segment paragraphs in a text, so that sentences are 
enclosed in a <s> element and within the sentences, words between 
interpunction are within <seg> elements.



So far, I have been capturing the content of <p> in a string and 
then using two nested <xsl:analyze-string> blocks with regexes, 
which work nicely and do what I want.  Now I discovered that there 
are <note> elements with additional markup in some paragraphs, which 
get lost in this process. However, I really want to leave these 
notes alone, as they are.  So:



<p>Some text.  Some more text, with a comma. <note>This stuff, how 
boring</note></p>



should look like:



<p><s><seg>Some text.</seg></s><s><seg>Some more text,</seg><seg> 
with a comma.</seg></s><note>This stuff, how boring</note></p>



I wonder how I tell the processor to leave the note stuff alone?

From your comment "capturing the content in a string and then..." 
I'm assuming you have something like:



  <xsl:template match="p">
    <xsl:analyze-string select="." .....
  </xsl:template>

If you break this into pieces, you can work on each text node in turn and let child elements such as <note> be handled by their own templates:



  <xsl:template match="p">
    <xsl:apply-templates mode="in-p" select="node()"/>
  </xsl:template>
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/> <!--reapply in the default mode-->
  </xsl:template>
  <xsl:template mode="in-p" match="text()">
    <xsl:analyze-string select="." .....
  </xsl:template>
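
To make the fragments above concrete, here is a minimal self-contained sketch, assuming an XSLT 2.0 processor. The identity template and both regular expressions are my own placeholders and not the expressions from the original stylesheet: the outer one treats ".", "!" and "?" as sentence terminators, the inner one breaks after ",", ";" or ":".

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity template: copies everything not handled below,
       so <note> and any markup inside it pass through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Process the children of <p> in a dedicated mode -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates mode="in-p" select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Child elements: reapply in the default mode (identity copy here) -->
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/>
  </xsl:template>

  <!-- Text nodes: outer pass finds sentences, inner pass finds segments -->
  <xsl:template mode="in-p" match="text()">
    <!-- Placeholder regex: group 1 is one sentence without the
         whitespace that surrounds it -->
    <xsl:analyze-string select="."
                        regex="\s*([^.!?\s][^.!?]*[.!?]*)\s*">
      <xsl:matching-substring>
        <s>
          <!-- Placeholder regex: a run of text plus one optional
               trailing punctuation mark -->
          <xsl:analyze-string select="regex-group(1)"
                              regex="[^,;:]+[,;:]?">
            <xsl:matching-substring>
              <seg><xsl:value-of select="."/></seg>
            </xsl:matching-substring>
          </xsl:analyze-string>
        </s>
      </xsl:matching-substring>
      <!-- Keep anything the outer regex did not match, such as a
           whitespace-only text node or stray punctuation -->
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

For the sample paragraph in the question this should produce the two <s> elements followed by the untouched <note>. Two limitations of the sketch: each text node is segmented on its own, so a sentence that runs across a <note> ends up as two <s> elements, and the placeholder regex will also split at abbreviations and decimal points.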


I hope this helps.



. . . . . . . . . . . . Ken



--
Upcoming public training: UBL and code lists Oct 1/5; Madrid Spain
World-wide corporate, govt. & user group XML, XSL and UBL training
RSS feeds:     publicly-available developer resources and training
G. Ken Holman                 mailto:gkholman@xxxxxxxxxxxxxxxxxxxx
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/s/
Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
Male Cancer Awareness Jul'07  http://www.CraneSoftwrights.com/s/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal

