Altova Mailing List Archives


Re: [xml-dev] Line ending normalization

From: "G. Ken Holman" <gkholman@----------------.--->
To: xml-dev@-----.---.---
Date: 5/4/2009 7:14:00 PM
At 2009-05-04 12:14 -0400, Bob Kline wrote:
>I'm having a hard time finding the language in the 1.0 spec [1] 
>which would make it clear whether the line ending normalization 
>which XML processors must perform (more precisely, "must behave as 
>if it normalized all line breaks ...") happens before or after the 
>replacement of character entities.

A line-end sequence consists only of naked (literal) characters, not 
of parsed numeric character references.

>In other words, for the following document:
>
><a>x&#x000d;&#x000a;y</a>
>
>is the value returned by the XML parser for the text content of 
>element e "x\r\ny" or "x\ny"?

"x\r\ny" because that is what is in the element ... there are no line 
end sequences in the element.

>Could someone point to the language which would address this timing 
>question?

Here:

   http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends


   XML parsed entities are often stored in computer files which,
   for editing convenience, are organized into lines. These lines
   are typically separated by some combination of the characters
   CARRIAGE RETURN (#xD) and LINE FEED (#xA).

   To simplify the tasks of applications, the XML processor MUST
   behave as if it normalized all line breaks in external parsed
   entities (including the document entity) on input, before
   parsing, by translating both the two-character sequence #xD #xA
   and any #xD that is not followed by #xA to a single #xA character.
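
The translation rule quoted above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in the spec, not how any particular parser implements it; the function name normalize_line_ends is made up for this example.

```python
def normalize_line_ends(raw: str) -> str:
    """Illustrative sketch of the XML 1.0 line-end rule (hypothetical helper).

    Translates the two-character sequence #xD #xA, and any #xD not
    followed by #xA, to a single #xA. It operates on the raw entity
    text, before any parsing, so character references are left alone.
    """
    # Handle CR LF pairs first so the pair collapses to one LF,
    # then translate any remaining lone CR.
    return raw.replace("\r\n", "\n").replace("\r", "\n")


# Literal CR LF and lone CR are both translated to LF.
print(repr(normalize_line_ends("a\r\nb\rc\nd")))

# The text "&#x000d;&#x000a;" contains no literal CR or LF,
# so the rule has nothing to translate.
print(repr(normalize_line_ends("x&#x000d;&#x000a;y")))
```

Doing the two-character replacement first matters: translating lone #xD first would turn each #xD #xA pair into two #xA characters.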

Note that the "#xA" and "#xD" bits of text are *not* parsed numeric 
character references; they are only prose character references.  This 
notation is an unambiguous way of referring to the characters, but it 
is the naked characters that are being referred to.

Note the bit "before parsing" ... so the naked characters get 
replaced by a naked #xA and *then* the parsed numeric character 
references of your example would be parsed as content.
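
One way to check this ordering against a real parser is a small test like the following. It is a sketch that assumes Python's standard xml.etree.ElementTree module (backed by Expat); other parsers would need their own check, and the variable names are chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Numeric character references: these are resolved during parsing,
# which happens after line-end normalization, so they are not
# treated as a line-end sequence.
refs = ET.fromstring(b"<a>x&#x000d;&#x000a;y</a>")

# Literal CR LF bytes in the document entity: this is a real
# line-end sequence and is subject to normalization.
literal = ET.fromstring(b"<a>x\r\ny</a>")

print(repr(refs.text))     # text content built from the character references
print(repr(literal.text))  # text content after line-end normalization
```

If the parser follows the spec, the first value keeps both the carriage return and the line feed, while the second contains a single line feed.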

>And do the major XML parser implementations handle this issue consistently?

I haven't tripped over a problem with this with various 
implementations ... have you recognized inconsistent 
behaviour?  Certainly the specification seems unambiguous.

I hope this helps.

. . . . . . . . . . Ken

--
XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/x/
Training tools: Comprehensive interactive XSLT/XPath 1.0/2.0 video
Video lesson:    http://www.youtube.com/watch?v=PrNjJCh7Ppg&fmt=18
Video overview:  http://www.youtube.com/watch?v=VTiodiij6gE&fmt=18
G. Ken Holman                 mailto:gkholman@C...
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/x/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal

