Altova Mailing List Archives

RE: [xml-dev] CDATA headache

From: "Jim Tivy" <jimt@----------.--->
To: "'michael odling-smee'" <mike.odlingsmee@-----.--->, "'Bjoern Hoehrmann'"
Date: 9/9/2009 4:29:00 PM
I sympathize with your customer and you as well.


You will note that XDM, on which XSLT 2.0,
XQuery and XPath are based, has no CDATA node.

XDM cannot preserve the DOCTYPE declaration either.


XDM has attitude - it represents a more simplified processing model for XML
than InfoSet.  I like simple, but is XDM simpler than it should be?


Roundtripping of serialized forms is a current issue for XML and XDM - the
tools like:





XML databases


all have to agree on the data model. Right now, in a pipeline, it is the
lowest common denominator that carries the day - for example, SAX can report
DOCTYPE declarations but XSLT cannot pass them through, so they get dropped.
The SAX LexicalHandler also reports where a CDATA section begins and ends.
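
To make the LexicalHandler point concrete, here is a minimal sketch using Python's standard xml.sax module, assuming its default expat-based driver. The Recorder class, the cdata_events helper and the sample documents are made up for illustration; the handler just implements the lexical callbacks the driver is expected to call.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, property_lexical_handler


class Recorder(ContentHandler):
    """Hypothetical handler: records text and CDATA boundaries in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    # Regular content callback (may fire several times per text run).
    def characters(self, content):
        self.events.append(("text", content))

    # Lexical callbacks: these carry the information XDM throws away.
    def startCDATA(self):
        self.events.append(("startCDATA",))

    def endCDATA(self):
        self.events.append(("endCDATA",))

    def comment(self, text):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def cdata_events(xml_bytes):
    """Parse xml_bytes and return the recorded event list."""
    recorder = Recorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(recorder)
    parser.setProperty(property_lexical_handler, recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.events
```

With this, cdata_events(b"<a><![CDATA[x < y]]></a>") brackets the text with startCDATA/endCDATA events, while cdata_events(b"<a>x &lt; y</a>") yields only text events. A SAX stage can therefore see the difference, but it is lost as soon as the next stage in the pipeline works on a model without CDATA nodes.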






From: mike@x... [mailto:mike@x...] On Behalf Of
michael odling-smee
Sent: Wednesday, September 09, 2009 4:34 AM
To: Bjoern Hoehrmann
Cc: XML Developers List
Subject: Re: [xml-dev] CDATA headache



Thanks for the link to the XPath specification - had not thought to look
there! Section 5.7 should hopefully be enough to convince the customer.

Kind regards,


On Wed, Sep 9, 2009 at 12:12 PM, Bjoern Hoehrmann <derhoermi@g...> wrote:

* michael odling-smee wrote:
>Now I am well aware that these are entirely equivalent from an XML
>standpoint - however the customer point of view is if they are equivalent
>why has the parser altered the way the characters are escaped? On this
>it is unlikely that links to sites such as
> will be enough to
>convince them that our processor is behaving correctly - they need a formal
>specification. I have searched the XML specification (to no avail) and was
>wondering whether anyone could point me to the relevant place which
>specifies that this is expected behaviour of the parser/xslt processor.

As far as XSLT 1.0 output goes, this is covered by the XPath 1.0 data
model, in which there are no CDATA sections; without proprietary
extensions the difference between the two forms cannot be represented
and thus cannot be retained. You can use the cdata-section-elements
attribute of the xsl:output element to have the serializer add such
sections to elements with certain names.
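
The same effect shows up in any tool whose data model has no CDATA nodes. As a minimal sketch, here it is with Python's xml.etree.ElementTree, which is not an XSLT processor and is used only as an illustration; the element name and sample text are made up.

```python
import xml.etree.ElementTree as ET

with_cdata = "<note><![CDATA[x < y & z]]></note>"
with_escapes = "<note>x &lt; y &amp; z</note>"

a = ET.fromstring(with_cdata)
b = ET.fromstring(with_escapes)

# Both inputs yield the same text; nothing records that one used CDATA.
same_text = a.text == b.text

# The serializer chooses its own escaping, so both inputs come out in
# the same form regardless of how they were written originally.
out_a = ET.tostring(a, encoding="unicode")
out_b = ET.tostring(b, encoding="unicode")
```

Once parsed, the two documents are indistinguishable, so the serializer has nothing to go on when deciding how to write the text back out.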

This is the result of simplification; more work is required to retain
the insignificant differences, just as you do not usually treat the
difference between example='value' and example="value" or the difference
between example='&#xF6;', example='&#246;', and example='ö' as something
worth retaining. To retain the differences you could, for example,
preprocess the documents, replacing CDATA sections with e.g. elements, and
then transform the elements back into CDATA sections later on.
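
A rough sketch of that pre/post-processing idea in Python follows. The cdata-marker element name and both helper functions are hypothetical, and the sketch assumes that no CDATA-like text occurs inside comments or processing instructions and that the transformation in between copies the marker elements through with text-only content and explicit start and end tags.

```python
import re
from xml.sax.saxutils import escape, unescape

# CDATA sections cannot nest, so a non-greedy match up to "]]>" suffices.
CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
MARKER_RE = re.compile(r"<cdata-marker>(.*?)</cdata-marker>", re.DOTALL)


def protect_cdata(xml_text):
    """Replace each CDATA section with a marker element (escaped text)."""
    return CDATA_RE.sub(
        lambda m: "<cdata-marker>" + escape(m.group(1)) + "</cdata-marker>",
        xml_text,
    )


def restore_cdata(xml_text):
    """Turn marker elements back into CDATA sections after processing."""
    return MARKER_RE.sub(
        lambda m: "<![CDATA[" + unescape(m.group(1)) + "]]>",
        xml_text,
    )
```

The intended use is restore_cdata(transform(protect_cdata(source))), where transform stands for whatever XSLT step sits in the middle; the stylesheet then needs a template that copies the marker elements unchanged.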
Björn Höhrmann · mailto:bjoern@h... ·
Am Badedeich 7 · Telefon: +49(0)160/4415681 ·
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 ·


