Altova Mailing List Archives


Re: XPathDocument and significant whitespace

From: Peter Flynn <peter.nosp@-.--------.-->
To: NULL
Date: 12/4/2005 11:19:00 PM

Michael Liu wrote:

> Why doesn't XPathDocument(validatingReader, XmlSpace.Default) preserve
> *all* the significant whitespace returned by the specified
> XmlValidatingReader (.NET Framework 1.1)? For example:
> 
> <!DOCTYPE p [
> <!ELEMENT p (#PCDATA | b | i)*>
> <!ELEMENT b (#PCDATA)>
> <!ELEMENT i (#PCDATA)>
> ]>
> <p>Here is <b>bold</b> <i>italic</i> text.</p>
> 
> The DTD says the <p> element contains mixed content, so when
> XmlValidatingReader reads this document, the XmlNodeType of the space
> between "</b>" and "<i>" is XmlNodeType.SignificantWhitespace. But the
> space is discarded by XPathDocument -- its private ReadChildNodes
> method treats significant whitespace as regular whitespace (which is
> then discarded) if the reader's XmlSpace property is anything but
> XmlSpace.Preserve.
> 
> Why does XPathDocument make this extra check for xml:space="preserve"?
> 
> (Since I don't want to put xml:space attributes all over the place in
> my documents, I'm going to pass XmlSpace.Preserve as the second
> argument to the XPathDocument constructor instead of XmlSpace.Default.
> But I'd still like to understand this behavior.)

I've been whingeing about this for years, as it occurs not just in your
software but in every XML processor I have seen. It's a pain in the ass,
because you lose significant whitespace in mixed content, which is
normally irrelevant to e-commerce "data" type applications but utterly
critical to normal e-publishing "document" applications, for the reasons
you outline above.

It's there because the behaviour with respect to whitespace changed
between SGML and XML. In SGML a DTD was *required*, always, so the
processor always knew in advance whether a given space occurred in
element content, character data content, or mixed content. In XML a DTD
is optional, and the WG believed that, because everyone would be working
without DTDs or Schemas, the processor would have no way to detect the
environment in which a space occurred. Hence the rule that the parser
passes *all* whitespace to the application.
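
To make the "passes *all* space" rule concrete, here is a minimal
sketch in Python (not .NET), using the standard library's ElementTree
with no DTD at all. The two sample documents and the helper name
whitespace_only_runs are made up for illustration. The parser hands over
the one meaningful space in the mixed-content paragraph and the
indentation in the data-style document in exactly the same way, so the
application has nothing to tell them apart by.

```python
import xml.etree.ElementTree as ET

# Illustrative samples: a mixed-content paragraph and an indented
# "data" document. Neither carries a DTD.
MIXED = "<p>Here is <b>bold</b> <i>italic</i> text.</p>"
DATA = "<order>\n  <id>42</id>\n  <qty>3</qty>\n</order>"


def whitespace_only_runs(xml_text):
    """Return every whitespace-only text run the parser reports."""
    root = ET.fromstring(xml_text)
    runs = []
    for elem in root.iter():
        # ElementTree stores character data as .text (before the first
        # child) and .tail (after the element's end tag).
        for chunk in (elem.text, elem.tail):
            if chunk and not chunk.strip():
                runs.append(chunk)
    return runs


mixed_runs = whitespace_only_runs(MIXED)  # the space between </b> and <i>
data_runs = whitespace_only_runs(DATA)    # indentation and line breaks
```

A blanket "drop whitespace-only text" pass would therefore remove the
space between "bold" and "italic" along with the indentation.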

The mistake, IMHO, was in how the preservation and deletion of
whitespace were implemented: processors pay insufficient attention to
the element environment in which the whitespace is found, and in any
event the model for removing whitespace in mixed content is in error.
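
What "paying attention to the element environment" could look like is
sketched below, again in Python. This is an assumption-laden toy, not
how any real processor works: the regex is not a DTD parser (no
parameter entities, no external subsets, no xml:space handling), and the
<doc> wrapper, the names text_bearing_elements and strip_ignorable, and
the sample input are all hypothetical. The idea is only that
whitespace-only text is dropped inside elements declared with element
content and left alone inside elements whose content model allows
character data.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical declarations: the DTD from the question plus a <doc>
# wrapper that has element content only.
DTD = """
<!ELEMENT doc (p)+>
<!ELEMENT p (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
"""

XML = "<doc>\n  <p>Here is <b>bold</b> <i>italic</i> text.</p>\n</doc>"


def text_bearing_elements(dtd_text):
    """Names of elements whose declared content allows character data."""
    decls = re.findall(r"<!ELEMENT\s+(\S+)\s+([^>]*)>", dtd_text)
    return {
        name
        for name, model in decls
        if "#PCDATA" in model or model.strip() == "ANY"
    }


def strip_ignorable(root, keep_text_in):
    """Drop whitespace-only text, but only inside element content."""
    for parent in root.iter():
        if parent.tag in keep_text_in:
            continue  # mixed or text content: every space is significant
        if parent.text and not parent.text.strip():
            parent.text = None
        for child in parent:
            if child.tail and not child.tail.strip():
                child.tail = None
    return root


root = strip_ignorable(ET.fromstring(XML), text_bearing_elements(DTD))
result = ET.tostring(root, encoding="unicode")
```

With these inputs the indentation inside <doc> goes away, while the
space between </b> and <i> inside <p> survives.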

Unfortunately there is nothing that can be done about it except lobby
the authors of software to fix this, which they won't: partly because
they believe (wrongly, IMHO) that it would conflict with the design of
XML, and partly because too few of them come from the document
processing field to understand why it is a problem. It was a very rare
example of carelessness, and we will pay dearly for it in time wasted
programming around it.

///Peter
-- 
XML FAQ: http://xml.silmaril.ie/


