Subject: [xsl] regex, shortest match
Date: 8/1/2008 7:24:00 AM

I'm looking to parse sentences out of paras.

Input

<para>It is sometimes desired to have a specific heading which should not be numbered. This corresponds to unnumbered list headers in lists (see sections 4.3). To facilitate this, an optional attribute text:is-list-header can be used. If true, the given header will not be numbered, even if an explicit list-style is given. </para>

<para>A text:style-name attribute references a paragraph style, while a text:cond-style-name attribute references a conditional-style, that is, a style that contains conditions and maps to other styles (see section 14.1.1). If a conditional style is applied to a paragraph, the text:style-name attribute contains the name of the style that was the result of the conditional style evaluation, while the conditional style name itself is the value of the text:cond-style-name attribute. This XML structure simplifies [XSLT] transformations because XSLT only has to acknowledge the conditional style if the formatting attributes are relevant. The referenced style can be a common style or an automatic style.</para>

<para>A text:class-names attribute takes a whitespace separated list of paragraph style names. The referenced styles are applied in the order they are contained in the list. If both, text:style-name and text:class-names are present, the style referenced by the text:style-name attribute is as the first style in the list in text:class-names. If a conditional style is specified together with a style:class-names attribute, but without the text:style-name attribute, then the first style in the style list is used as the value of the missing text:style-name attribute.
</para>

<para>A page sequence element <text:page-sequence> specifies a sequence of master pages that are instantiated in exactly the same order as they are referenced in the page sequence. If a text document contains a page sequence, it will consist of exactly as many pages as specified. Documents with page sequences do not have a main text flow consisting of headings and paragraphs as is the case for documents that do not contain a page sequence. Text content is included within text boxes for documents with page sequences. The only other content that is permitted are drawing objects. </para>

This 'works', but hits the longest match.
I can't come up with a regex that has a sufficiently broad range,
yet matches on the shortest match.

Any suggestions please.

TIA DaveP

<xsl:template match="para">
  <para>
    <xsl:variable name='contents' select="normalize-space(.)"/>
    <xsl:copy-of select="dp:sentence($contents)"/>
  </para>
</xsl:template>

<!-- Isolate sentences within para's -->
<xsl:function name="dp:sentence">
  <xsl:param name="nd" as='xs:string'/>
  <xsl:analyze-string regex="((.+).) |$ " select="$nd">
    <xsl:matching-substring>
      <s>
        <xsl:value-of select="regex-group(1)"/>
      </s>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <p2><xsl:value-of select="."/></p2>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:function>

regards

--
Dave Pawson
XSLT XSL-FO FAQ.
http://www.dpawson.co.uk
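
One possible direction for the shortest-match problem is a reluctant quantifier, which the XSLT 2.0 regex dialect supports (`+?` instead of `+`). The sketch below is only an illustration, not a tested answer from the thread. It assumes that a sentence ends at a literal period followed by whitespace or the end of the string, that the period is escaped as `\.`, and that the `dp` prefix is bound to a placeholder namespace URI (`http://example.org/dp` is hypothetical).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dp="http://example.org/dp"
    exclude-result-prefixes="xs dp">
  <!-- The dp namespace URI above is a placeholder chosen for this sketch. -->

  <xsl:template match="para">
    <para>
      <!-- Collapse whitespace first so the regex never sees line breaks. -->
      <xsl:sequence select="dp:sentence(normalize-space(.))"/>
    </para>
  </xsl:template>

  <!-- Split a string into sentences, one s element per sentence. -->
  <xsl:function name="dp:sentence" as="element()*">
    <xsl:param name="nd" as="xs:string"/>
    <!-- .+?      reluctant: consume as few characters as possible
         \.       a literal full stop
         (\s|$)   followed by whitespace or the end of the string -->
    <xsl:analyze-string select="$nd" regex=".+?\.(\s|$)">
      <xsl:matching-substring>
        <!-- The match includes the trailing space, so trim it. -->
        <s><xsl:value-of select="normalize-space(.)"/></s>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Any trailing text that does not end in a full stop. -->
        <p2><xsl:value-of select="."/></p2>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
```

Because the full stop must be followed by whitespace or the end of the string, references such as "4.3)." and "14.1.1)." are not split at their internal periods. Abbreviations followed by a space (for example "e.g. ") would still be treated as sentence ends, so this is a starting point rather than a complete sentence splitter.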
