Altova Mailing List Archives
RE: [xml-dev] CDATA headache
To: "'michael odling-smee'" <mike.odlingsmee@-----.--->, "'Bjoern Hoehrmann'"
Date: 9/9/2009 4:29:00 PM
I sympathize with your customer, and with you as well. You will note that XDM (http://www.w3.org/TR/2007/REC-xpath-datamodel-20070123/#Node), on which XSLT 2.0, XQuery and XPath 2.0 are based, has no CDATA node. XDM cannot preserve the DOCTYPE declaration either. XDM has attitude: it represents a more simplified processing model for XML than the Infoset. I like simple, but is XDM simpler than it should be?

Roundtripping of serialized forms is a current issue for XML and XDM. Tools such as

  SAX
  XSLT
  XPath/XQuery
  XML databases

all have to agree on the data model. Right now, in a pipeline, the lowest common denominator carries the day: for example, SAX can report DOCTYPE declarations, but XSLT cannot pass them through, so they get dropped. SAX's LexicalHandler supports CDATA begin and end events (startCDATA/endCDATA).

Jim

_____

From: mike@x... [mailto:mike@x...] On Behalf Of michael odling-smee
Sent: Wednesday, September 09, 2009 4:34 AM
To: Bjoern Hoehrmann
Cc: XML Developers List
Subject: Re: [xml-dev] CDATA headache

Bjoern,

Thanks for the link to the XPath specification; I had not thought to look there! Section 5.7 (http://www.w3.org/TR/xpath#section-Text-Nodes) should hopefully be enough to convince the customer.

Kind regards,
Michael

On Wed, Sep 9, 2009 at 12:12 PM, Bjoern Hoehrmann <derhoermi@g...> wrote:

* michael odling-smee wrote:
>Now I am well aware that these are entirely equivalent from an XML
>standpoint - however the customer point of view is if they are equivalent
>why has the parser altered the way the characters are escaped? On this front
>it is unlikely that links to sites such as
>http://www.dpawson.co.uk/xsl/sect2/cdata.html#d3164e447 will be enough to
>convince them that our processor is behaving correctly - they need a formal
>specification. I have searched the XML specification (to no avail) and was
>wondering whether anyone could point me to the relevant place which
>specifies that this is expected behaviour of the parser/xslt processor.
As far as XSLT 1.0 output goes, this is covered by the XPath 1.0 data model defined in http://www.w3.org/TR/xpath#data-model, in which there are no CDATA sections; without proprietary extensions the difference between the two forms cannot be represented and thus cannot be retained. You can use the cdata-section-elements attribute of the xsl:output element to have the serializer add such sections to elements with certain names.

This is the result of simplification: more work is required to retain the insignificant differences, just as you do not usually treat the difference between example='value' and example="value", or the difference between example='ö', example='&#xF6;', and example='&#246;', as something worth retaining.

To retain the differences you could, for example, preprocess the documents, replacing CDATA sections with elements, and then transform those elements back into CDATA sections later on.

--
Björn Höhrmann · mailto:bjoern@h... · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
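The behaviour discussed in this thread can be reproduced with the JAXP APIs bundled with the JDK. The following is a minimal sketch, assuming the JDK's default SAX parser and XSLT 1.0 processor (other processors may behave differently). It shows that a SAX LexicalHandler does see the CDATA section, that a plain identity stylesheet drops it because the XPath 1.0 data model has only text nodes, and that setting cdata-section-elements makes the serializer write a section again. The class name CdataDemo and the sample document are illustrative only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative class; assumes the JDK's built-in JAXP parser and XSLT 1.0 processor.
public class CdataDemo {

    // A plain identity stylesheet: every node is copied through the XPath 1.0
    // data model, which has text nodes but no CDATA section nodes.
    static final String IDENTITY_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Parses xml with SAX and records the CDATA boundaries a LexicalHandler reports. */
    static List<String> lexicalEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startCDATA() { events.add("startCDATA"); }
            @Override public void endCDATA() { events.add("endCDATA"); }
        };
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // LexicalHandler is an optional SAX extension, registered via this standard property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    /**
     * Runs xml through the identity stylesheet. If cdataElements is non-null it is
     * used as cdata-section-elements (a whitespace-separated list of element names).
     */
    static String transform(String xml, String cdataElements) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(IDENTITY_XSL)));
        if (cdataElements != null) {
            t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElements);
        }
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String input = "<doc><code><![CDATA[if (a < b && c > d)]]></code></doc>";
        System.out.println(lexicalEvents(input));      // SAX reports the CDATA boundaries
        System.out.println(transform(input, null));    // section gone, markup characters escaped
        System.out.println(transform(input, "code"));  // serializer re-creates a CDATA section
    }
}
```

Note that cdata-section-elements works by element name: the serializer wraps the text children of every listed element, so it re-creates sections where you ask for them rather than restoring the exact boundaries of the original input. Preserving those boundaries still needs the preprocessing approach described above.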