Altova Mailing List Archives > microsoft.public.xml

Re: XPathDocument and significant whitespace
To: NULL
Date: 12/4/2005 11:19:00 PM

Michael Liu wrote:
> Why doesn't XPathDocument(validatingReader, XmlSpace.Default) preserve
> *all* the significant whitespace returned by the specified
> XmlValidatingReader (.NET Framework 1.1)? For example:
>
> <!DOCTYPE p [
> <!ELEMENT p (#PCDATA | b | i)*>
> <!ELEMENT b (#PCDATA)>
> <!ELEMENT i (#PCDATA)>
> ]>
> <p>Here is <b>bold</b> <i>italic</i> text.</p>
>
> The DTD says the <p> element contains mixed content, so when
> XmlValidatingReader reads this document, the XmlNodeType of the space
> between "</b>" and "<i>" is XmlNodeType.SignificantWhitespace. But the
> space is discarded by XPathDocument -- its private ReadChildNodes
> method treats significant whitespace as regular whitespace (which is
> then discarded) if the reader's XmlSpace property is anything but
> XmlSpace.Preserve.
>
> Why does XPathDocument make this extra check for xml:space="preserve"?
>
> (Since I don't want to put xml:space attributes all over the place in
> my documents, I'm going to pass XmlSpace.Preserve as the second
> argument to the XPathDocument constructor instead of XmlSpace.Default.
> But I'd still like to understand this behavior.)

I've been whingeing about this for years, as it occurs not just in your software but in all XML processors I have seen. It's a pain in the ass, because it means that you lose significant white-space in mixed content, which is normally irrelevant to e-commerce "data" type applications but utterly critical to normal e-publishing "document" applications for the reasons you outline above.

The reason it's there is because the behaviour wrt space was changed between SGML and XML.
In SGML a DTD was *required*, always, so the processor always knew in advance whether or not a given space was occurring in element content, character data content, or mixed content. In XML a DTD is optional, so the WG believed that because everyone would be working without DTDs or Schemas, the processor would have no way to detect the environment a space was occurring in. Hence the rule that the parser passes *all* space to the application.

The mistake IMHO was in the implementation of the preservation/deletion of space, in that processors pay insufficient attention to the element environment in which the space was found; and in any event the model for the removal of space in mixed content is in error.

Unfortunately there is nothing that can be done about it except lobby the authors of software to fix this: which they won't, because they believe (wrongly, IMHO) that it would conflict with the design of XML; and because there are too few of them from the document processing field, so they have a problem in understanding why there is a problem.

It was a very rare example of carelessness for which we will have to pay dearly in wasted time programming around it.

///Peter
--
XML FAQ: http://xml.silmaril.ie/
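
Archive note: the whitespace loss discussed in this thread can be reproduced outside .NET. The following is a minimal Python sketch, not the XPathDocument implementation. It assumes the sample document from the question (without its DTD) and a hand-written set of mixed-content element names, `MIXED_CONTENT`, standing in for what a DTD would declare, since `xml.etree.ElementTree` does not expose content models. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample from the original question, without its DTD.
SAMPLE = "<p>Here is <b>bold</b> <i>italic</i> text.</p>"

# Hypothetical, hand-written stand-in for what a DTD would tell a processor:
# the names of elements declared with mixed content.
MIXED_CONTENT = {"p"}


def flatten_naive(xml_text, keep_whitespace_only):
    """Join all text chunks, optionally dropping whitespace-only ones
    regardless of which element they occur in."""
    root = ET.fromstring(xml_text)
    chunks = [c for c in root.itertext() if keep_whitespace_only or c.strip()]
    return "".join(chunks)


def flatten_aware(elem, mixed=MIXED_CONTENT):
    """Join text chunks, keeping whitespace-only chunks only when their
    parent element is listed as mixed content."""
    keep = elem.tag in mixed
    parts = []
    if elem.text and (keep or elem.text.strip()):
        parts.append(elem.text)
    for child in elem:
        parts.append(flatten_aware(child, mixed))
        # child.tail is text that belongs to elem, directly after the child.
        if child.tail and (keep or child.tail.strip()):
            parts.append(child.tail)
    return "".join(parts)


if __name__ == "__main__":
    print(flatten_naive(SAMPLE, keep_whitespace_only=False))
    print(flatten_naive(SAMPLE, keep_whitespace_only=True))
    wrapped = "<doc>\n  " + SAMPLE + "\n</doc>"
    print(flatten_aware(ET.fromstring(wrapped)))
```

Dropping every whitespace-only chunk runs "bold" and "italic" together, which is the symptom reported above. Keeping them only inside elements known to have mixed content retains the inter-word space while still discarding the indentation around `<p>`.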
