Altova Mailing List Archives


[xsl] regex, shortest match

From: Dave Pawson <davep@------------->
To:
Date: 8/1/2008 7:24:00 AM
I'm looking to parse sentences out of paras.



Input



<para>It is sometimes desired to have a specific heading which should 
not be numbered. This corresponds to unnumbered list headers in lists 
(see sections 4.3). To facilitate this, an optional attribute 
text:is-list-header can be used. If true, the given header will not be 
numbered, even if an explicit list-style is given. </para>

<para>A text:style-name attribute references a paragraph style, while a 
text:cond-style-name attribute references a conditional-style, that is, 
a style that contains conditions and maps to other styles (see section 
14.1.1). If a conditional style is applied to a paragraph, the 
text:style-name attribute contains the name of the style that was the 
result of the conditional style evaluation, while the conditional style 
name itself is the value of the text:cond-style-name  attribute. This 
XML structure simplifies [XSLT] transformations because XSLT only has to 
acknowledge the conditional style if the formatting attributes are 
relevant. The referenced style can be a common style or an automatic 
style.</para>

<para>A text:class-names attribute takes a whitespace separated list of 
paragraph style names. The referenced styles are applied in the order 
they are contained in the list. If both, text:style-name and 
text:class-names are present, the style referenced by the 
text:style-name attribute is as the first style in the list in 
text:class-names. If a conditional style is specified together with a 
style:class-names attribute, but without the text:style-name  attribute, 
then the first style in the style list is used as the value of the 
missing text:style-name attribute. </para>

<para>A page sequence element &lt;text:page-sequence> specifies a 
sequence of master pages that are instantiated in exactly the same order 
as they are referenced in the page sequence. If a text document contains 
a page sequence, it will consist of exactly as many pages as specified. 
Documents with page sequences do not have a main text flow consisting of 
headings and paragraphs as is the case for documents that do not contain 
a page sequence. Text content is included within text boxes for 
documents with page sequences. The only other content that is permitted 
are drawing objects. </para>



The stylesheet below 'works', but the regex grabs the longest match. I can't
come up with a regex that is broad enough to cover a whole sentence, yet stops
at the shortest match.

Any suggestions, please?



TIA DaveP




 <xsl:template match="para">
    <para>
      <xsl:variable name='contents' select="normalize-space(.)"/>
      <xsl:copy-of select="dp:sentence($contents)"/>
    </para>
  </xsl:template>

<!-- Isolate sentences within paras -->
<xsl:function name="dp:sentence">
  <xsl:param name="nd" as='xs:string'/>
  <xsl:analyze-string regex="((.+).) |$ " select="$nd">
    <xsl:matching-substring>
          <s>
            <xsl:value-of select="regex-group(1)"/>
          </s>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
          <p2><xsl:value-of select="."/></p2>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>
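
One direction that might help, offered as a sketch and not a tested fix: the XPath 2.0 regex dialect used by xsl:analyze-string supports reluctant quantifiers, so `.+?` stops at the first place the rest of the pattern can match, whereas `.+` runs to the last. The literal full stop also needs escaping as `\.`, and grouping the terminator as `( |$)` keeps the alternation from splitting the whole pattern in two. The namespace URI bound to `dp` below is a placeholder (the original post does not show one), and the input is assumed to be well-formed XML with the para elements inside some wrapper element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the dp namespace URI is a placeholder, not taken from the post. -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dp="urn:example:dp"
    exclude-result-prefixes="xs dp">

  <xsl:output method="xml" indent="yes"/>

  <!-- Collapse whitespace first so no line breaks remain inside a sentence. -->
  <xsl:template match="para">
    <para>
      <xsl:sequence select="dp:sentence(normalize-space(.))"/>
    </para>
  </xsl:template>

  <!-- Split a string into sentences, one <s> element per sentence. -->
  <xsl:function name="dp:sentence" as="element()*">
    <xsl:param name="nd" as="xs:string"/>
    <!-- (.+?\.) : shortest run of characters ending in a literal full stop
         ( |$)   : that full stop must be followed by a space or end of string -->
    <xsl:analyze-string select="$nd" regex="(.+?\.)( |$)">
      <xsl:matching-substring>
        <s>
          <xsl:value-of select="regex-group(1)"/>
        </s>
      </xsl:matching-substring>
      <!-- Anything left over, such as trailing text with no full stop. -->
      <xsl:non-matching-substring>
        <p2><xsl:value-of select="."/></p2>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
```

Because the full stop must be followed by a space or the end of the string, numbers such as 4.3 and 14.1.1 in the sample input should not end a sentence early. Abbreviations followed by a space (for example "e.g. ") would still be split, so this is a starting point and not a complete sentence detector.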


regards



--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk

