Altova Mailing List Archives


Re: [xml-dev] [OT] bugs in JDK regex engine ?

From: "Mukul Gandhi" <gandhi.mukul@-----.--->
To: xml-dev@-----.---.---
Date: 2/4/2008 3:58:00 AM

Thanks Mike, for your comments.

Below is a simple example I tried with JDK 1.6.0.

String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";

// anything from '<' to '>', and not having '/'
Pattern pattern = Pattern.compile("<[^/]+>");
Matcher matcher = pattern.matcher(str);

while (matcher.find()) {
   String group = matcher.group();
   System.out.println(group);
}

'str' is a String representation of an XML fragment.

I want to extract all the pieces of the string (the tokens) that form
a start tag (including its attributes).

I am expecting the output:
<root>
<abc x='1'>
<pqr y='1'>

But the output produced by the above program is:
<root><abc x='1'>
<pqr y='1'>

Notice that the 1st token is larger than expected ...

Can you or anybody please help ...
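
For reference, the output above looks consistent with ordinary greedy matching rather than an engine bug: the class [^/] excludes only '/', so it also matches '>' and '<', and the first match runs on to the last '>' that comes before the next '/'. A minimal sketch of one possible adjustment follows, assuming the goal is only to list start tags in simple input like 'str'. The class name StartTagDemo, the method startTags, and the pattern "<[^/<>]+>" are illustrative choices, not something from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class StartTagDemo {
    // Assumed adjustment: exclude '<' and '>' as well as '/',
    // so a single match cannot run past the end of its own tag.
    static final Pattern START_TAG = Pattern.compile("<[^/<>]+>");

    // Collects every substring of xml that matches START_TAG, in order.
    static List<String> startTags(String xml) {
        List<String> tags = new ArrayList<String>();
        Matcher matcher = START_TAG.matcher(xml);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String str = "<root><abc x='1'>text1</abc><pqr y='1'>text2</pqr></root>";
        for (String tag : startTags(str)) {
            System.out.println(tag);
        }
    }
}
```

This is still only a sketch: a start tag whose attribute value contains '/' (a URL, say) or a self-closing tag such as <a/> would be skipped, so an XML parser remains the safer tool for real input.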

On Feb 3, 2008 10:52 PM, Michael Kay <mike@s...> wrote:
> Saxon translates XML Schema and XPath regexes into JDK regexes, so it's
> pretty heavily dependent on the underlying regex engine. There are some
> cases where the behaviour is very incompletely specified, for example the
> effect of the "i" (case-blind) flag, but I've found very few cases where the
> expected behaviour is clear and the actual behaviour differs. In my
> experience, it's much more likely to be a user error.
>
> However, I think it might be stretching the (highly elastic) patience of
> this list to hold a discussion of JDK regex behaviour here.
>
> In any case, I think the whole concept of checking XML well-formedness using
> regular expressions is misguided, for the simple reason that (on theoretical
> grounds) regular expressions aren't up to the job.
>
> Michael Kay
> http://www.saxonica.com/


-- 
Regards,
Mukul Gandhi

