Altova Mailing List Archives


Re: [xml-dev] Schemas and mixed content with Relax NG and W3C XML Schema

From: rjelliffe@-------.---.--
To: "Philippe Poulard" <philippe.poulard@------.-----.-->
Date: 7/16/2008 2:43:00 PM
> hi,
>
> this is a question about schemas
>
> I know that with DTDs, when a text is allowed with elements, the best we
> can do is to allow it everywhere between other elements that can be
> repeated at any place in the text :
>
> <!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
>
> unfortunately, we can't enforce the text to be at a given place :
>
> <person>Mr <firstname>John</firstname><lastname>Doe</lastname></person>
>
> the following DTD is invalid, but explain what we'd like to have :
> <!ELEMENT person (#PCDATA,firstname,lastname)>
>
> I wonder if there are also similar limitations with Relax NG and W3C XML
> Schema and why ?

SGML DTDs do allow that kind of structure.

Unfortunately, it exposed a logical flaw that was very difficult to deal
with. It was called the pernicious mixed content problem.

Say you have a content model like this:
    <!ELEMENT person ( (title | #PCDATA) , firstname, lastname)>
where you can either mark up the title or just have it as plain text.

Now we have a document
    <person><title>Mr</title><firstname>John</firstname><lastname>Doe</lastname></person>

That is fine.

But now we take that same document and pretty print it.

<person>
    <title>Mr</title>
    <firstname>John</firstname>
    <lastname>Doe</lastname>
</person>

This is invalid!  Why? Because the initial whitespace is taken to match the
#PCDATA, and then the <title> element is unexpected.
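
To make that concrete, here is a rough Python sketch of the failure. It is
only an illustration, not how an SGML parser actually works: it assumes every
run of character data between tags, whitespace included, counts as #PCDATA,
and it checks the children against the content model above with a regular
expression. The names child_tokens, MODEL and is_valid are made up for this
example.

```python
import re
import xml.etree.ElementTree as ET


def child_tokens(xml_text):
    """Flatten an element's direct content into a list of tokens.

    Assumption for this sketch: any character data, even whitespace-only,
    becomes '#PCDATA'; each child element becomes its tag name.
    """
    root = ET.fromstring(xml_text)
    tokens = []
    if root.text:
        tokens.append("#PCDATA")
    for child in root:
        tokens.append(child.tag)
        if child.tail:
            tokens.append("#PCDATA")
    return tokens


# Stand-in for: <!ELEMENT person ((title | #PCDATA), firstname, lastname)>
MODEL = re.compile(r"(?:title|#PCDATA) firstname lastname")


def is_valid(xml_text):
    """True if the token sequence matches the content model exactly."""
    return MODEL.fullmatch(" ".join(child_tokens(xml_text))) is not None


compact = (
    "<person><title>Mr</title><firstname>John</firstname>"
    "<lastname>Doe</lastname></person>"
)
pretty = (
    "<person>\n"
    "    <title>Mr</title>\n"
    "    <firstname>John</firstname>\n"
    "    <lastname>Doe</lastname>\n"
    "</person>"
)

print(child_tokens(compact), is_valid(compact))
print(child_tokens(pretty), is_valid(pretty))
```

Under these assumptions the compact document matches, while the
pretty-printed one starts with a #PCDATA token for the leading newline and
indentation, so the <title> that follows no longer fits the model.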

This problem could happen for all sorts of strange reasons, such as if you
were using a system with automatic line breaking and the start tag for
person was at the end of the line.

So in the end, in XML it was decided to dump this as too problematic. So
only (#PCDATA | ...)* was allowed, which is roughly what XSD's mixed="true"
gives you.
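
As a rough illustration of what that trade buys, the Python sketch below
checks the XML-style mixed model, where text may appear anywhere and only the
names of the child elements are constrained. The element names come from the
person example; ALLOWED and is_valid_mixed are hypothetical names for this
sketch, not part of any validator.

```python
import xml.etree.ElementTree as ET

# Stand-in for: <!ELEMENT person (#PCDATA | title | firstname | lastname)*>
ALLOWED = {"title", "firstname", "lastname"}


def is_valid_mixed(xml_text):
    """Text is allowed anywhere, so only child element names are checked."""
    root = ET.fromstring(xml_text)
    return all(child.tag in ALLOWED for child in root)


compact = (
    "<person><title>Mr</title><firstname>John</firstname>"
    "<lastname>Doe</lastname></person>"
)
pretty = (
    "<person>\n"
    "    <title>Mr</title>\n"
    "    <firstname>John</firstname>\n"
    "    <lastname>Doe</lastname>\n"
    "</person>"
)
# Order and position of the text are not enforced by this kind of model.
scrambled = "<person><lastname>Doe</lastname>Mr<firstname>John</firstname></person>"

for doc in (compact, pretty, scrambled):
    print(is_valid_mixed(doc))
```

Pretty printing no longer matters, but neither does the order of the children
or the position of the text, which is exactly the limitation the original
question was about.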

However, with RELAX NG it was realized that the problem does not occur for
tokens. So having tokens as well as elements such as
   ( "Mr" | "Mrs" ), firstname, lastname
will not trigger this problem.

Cheers
Rick Jelliffe

