Altova Mailing List Archives

RE: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)

From: "Marc Portier" <mpo@---------------->
Date: 1/10/2002 1:48:00 PM

Hi Jeni,

> -----Original Message-----
> From: Jeni Tennison [mailto:jeni@xxxxxxxxxxxxxxxx]
> Sent: donderdag 10 januari 2002 14:05
> To: Marc Portier
> Cc: Steven Noels; xsl-list@xxxxxxxxxxxxxxxxxxxxxx
> Subject: Re: Regular expression functions (Was: Re: [xsl] comments on
> December F&O draft)
> Hi Marc,
> > some
> > <regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
> >
> > could then later be used inside
> > <matcher name="" regex="(other groups):fancy-number:(other groups)">
> > ... while nested matchers or output-selecting elements could then
> > use group selections like
> > 1.      <... select-group="1"> ... or 2, referring to counting the
> >         parentheses in the scoped regex of this matcher
> > 2.      <... select-group=":fancy-number:2" >
> > </matcher>
> >
> > could be challenging to implement (spontaneous idea of using the
> > indexes as offsets in counting parentheses)
> I like this method better than the Omnimark method of assigning the
> names within the regular expression itself, because it doesn't clutter
> the regular expression (if anything it makes it more readable) and it
> allows regular expressions to be reused.

> There are a couple of issues that would need to be worked out with it,
> though. What happens if you have a regular expression that involves
> two instances of the named subexpression at the same level:
>   <matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
>     ...
>   </matcher>
> You need to have separate indexes to indicate which one you're talking
> about, plus some kind of syntax to pull out submatches within the
> named subexpression. Borrowing from XPath syntax (which might be a bad
> idea), you might have:
>   fancy-number[2]/*[2]
Yep, I was short on internet time just before I left and sent that reply. It
crossed my mind later that one regex could indeed be reused twice inside
another; nice to see that XSLT people already have a syntax that would help
out.
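
To make the "indexes as offsets" idea concrete, here is a minimal Python sketch, not a proposal for the actual XSLT syntax. It expands each :name: reference into the pattern and records how many capturing groups precede each occurrence, so that ":fancy-number:2" on the second occurrence can be turned into an absolute group number. The helper names (count_groups, expand, select_group) and the NAMED table are made up for illustration, and the group counter is a simplification that ignores corner cases such as a literal "]" at the start of a character class.

```python
import re

# Hypothetical table of named regexes, as in <regex name="fancy-number">.
NAMED = {"fancy-number": r"[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?"}

# A :name: reference inside a matcher pattern (naive syntax, assumed here).
REF = re.compile(r":([A-Za-z][\w-]*):")


def count_groups(fragment):
    """Count capturing '(' in a regex fragment, skipping escapes and classes."""
    count, i, in_class = 0, 0, False
    while i < len(fragment):
        ch = fragment[i]
        if ch == "\\":          # escaped character: skip it and the next one
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            rest = fragment[i + 1:]
            # "(?...)" is non-capturing, except Python's "(?P<name>...)".
            if not rest.startswith("?") or rest.startswith("?P<"):
                count += 1
        i += 1
    return count


def expand(pattern, named):
    """Return (expanded pattern, offsets); offsets[name][k] is the number of
    capturing groups that precede the k-th occurrence of :name:."""
    parts, offsets = [], {}
    groups_so_far, pos = 0, 0
    for ref in REF.finditer(pattern):
        literal = pattern[pos:ref.start()]
        parts.append(literal)
        groups_so_far += count_groups(literal)
        name = ref.group(1)
        offsets.setdefault(name, []).append(groups_so_far)
        body = named[name]
        parts.append("(?:" + body + ")")   # wrapper adds no group of its own
        groups_so_far += count_groups(body)
        pos = ref.end()
    parts.append(pattern[pos:])
    return "".join(parts), offsets


def select_group(match, offsets, name, occurrence, group):
    """Resolve a 1-based (occurrence, group) pair to an absolute group."""
    return match.group(offsets[name][occurrence - 1] + group)


expanded, offsets = expand(r":fancy-number:\w:fancy-number:", NAMED)
m = re.fullmatch(expanded, "1.5E+3x2.0E-7")
print(select_group(m, offsets, "fancy-number", 2, 2))  # exponent of 2nd number
```

With this, the two occurrences of fancy-number get the offsets 0 and 2, so the exponent of the second number is absolute group 4.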

> to indicate the second subexpression of the second fancy-number
> subexpression in the matched string.
Trying to catch it completely, though:

you mean that *[index] throws all named subregexes into one array and picks
the second, regardless of its name, right?

Getting an actual parenthesis group out of a named subregex would be
different, no?
Example of the nuance I'm seeing: how would I select the exponent group out
of the second matched fancy-number in the following setting?

No sub-subregexes, only parenthesis groups:
<regex name="fancy-number">[0-9]+(\.[0-9]+)?([Ee][+-][0-9]+)?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">

compared to:
<regex name="exponent">[Ee][+-][0-9]+</regex>
<regex name="fractalpart">\.[0-9]+</regex>
<regex name="fancy-number">[0-9]+:fractalpart:?:exponent:?</regex>
<matcher name="two-numbers" regexp=":fancy-number:\w:fancy-number:">
or select-group="fancy-number[2]/exponent"
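
A rough Python sketch of how a path like fancy-number[2]/exponent could be resolved in this second setting. It is only an illustration under one assumption: the named regexes contain no plain capturing parentheses, so each :name: reference can be wrapped in exactly one capturing group. Node, expand and resolve are made-up names, and the "*" step is my reading of the XPath-like *[2].

```python
import re

# Hypothetical table of named regexes, taken from the second setting above.
NAMED = {
    "exponent": r"[Ee][+-][0-9]+",
    "fractalpart": r"\.[0-9]+",
    "fancy-number": r"[0-9]+:fractalpart:?:exponent:?",
}

REF = re.compile(r":([A-Za-z][\w-]*):")            # a :name: reference
STEP = re.compile(r"(\*|[\w-]+)(?:\[(\d+)\])?")    # "name", "name[2]", "*[2]"


class Node:
    """One named subexpression and the capturing group that wraps it."""

    def __init__(self, name, group):
        self.name = name
        self.group = group
        self.children = []


def expand(pattern, named, parent, counter):
    """Replace each :name: by a capturing group and attach a Node to parent.

    counter is a one-element list holding the last group number handed out;
    numbering follows the order of opening parentheses, as the regex engine does.
    """
    def replace(ref):
        counter[0] += 1
        node = Node(ref.group(1), counter[0])
        parent.children.append(node)
        return "(" + expand(named[node.name], named, node, counter) + ")"

    return REF.sub(replace, pattern)


def resolve(root, path):
    """Walk a path such as 'fancy-number[2]/exponent'; return the group number."""
    node = root
    for step in path.split("/"):
        name, index = STEP.fullmatch(step).groups()
        candidates = [c for c in node.children if name in ("*", c.name)]
        node = candidates[int(index or 1) - 1]     # positions are 1-based
    return node.group


root = Node(None, 0)
expanded = expand(r":fancy-number:\w:fancy-number:", NAMED, root, [0])
m = re.fullmatch(expanded, "1.5E+3x2.0E-7")
print(m.group(resolve(root, "fancy-number[2]/exponent")))
```

Here "fancy-number[2]/exponent" and "fancy-number[2]/*[2]" end up at the same group, which is the nuance between selecting by name and selecting by position among the children.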

> Actually, that syntax isn't all that bad - you can imagine the matcher
> actually builds up a tree structure based on the subexpression
yep, need some more imagination before actually building it though :-)

> matches (you need 'anonymous' elements for unnamed subexpressions, but
> you should be able to get away with that using elements in some
> restricted namespace or something)...
Mmm... I don't understand how we could get unnamed subexpressions.
As far as I can see now, we'd need :name: to slice them in, no?

> > this also makes me think about your earlier mention of dynamic
> > regexes: you probably expect anything that qualifies as a
> > text-representing xsl parameter to possibly carry part of the
> > regex to be executed...
> I think that if you could build the named regular expressions
> dynamically, then this idea would work fine. Going back to the keyword
> example that I used in an earlier mail, you could do:
> <xsl:regexp name="keyword-as-word"
>             select="concat('\W', $keyword, '\W')" />
> If named regular expressions were like variables, you could assign
> them values at the global or local level...
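
For comparison, the same dynamic construction sketched in Python. The function name keyword_as_word just mirrors the xsl:regexp name above. Escaping the keyword with re.escape is an addition of this sketch and not part of the concat() example; without it, a keyword containing regex metacharacters would change the pattern.

```python
import re


def keyword_as_word(keyword):
    """Build the equivalent of concat('\\W', $keyword, '\\W') at run time."""
    # re.escape is an assumption added here so that keywords such as "C++"
    # are matched literally instead of being read as regex syntax.
    return re.compile(r"\W" + re.escape(keyword) + r"\W")


pattern = keyword_as_word("xsl")
print(bool(pattern.search("the xsl list")))   # keyword surrounded by spaces
print(bool(pattern.search("xslt rocks")))     # keyword only as part of a word
```

Note that \W has to consume a character on each side, so a keyword at the very start or end of the string is not matched by this pattern.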

> Cheers,
> Jeni
> ---
> Jeni Tennison
