comp.text.xml Archive

Re: UTF-8 & Unicode
To: NULL
Date: 2/5/2005 12:10:00 PM

In article <Pine.LNX.4.61.0502041627530.8656@p...>,
"Alan J. Flavell" <flavell@p...> wrote:

> On Fri, 4 Feb 2005, Henri Sivonen wrote:
>
> > "Alan J. Flavell" <flavell@p...> wrote:
> >
> > > But that's OK, since any plausible encoding produced by the editor can
> > > be transformed by rote into utf-8 prior to subsequent XML processing
> > > (that's the XML relevance).
> >
> > Such conversion leads to bugs like this one:
> > https://bugzilla.mozilla.org/show_bug.cgi?id=174351
>
> Does it? I'll have to ask you to explain that in more detail, please.
> As far as I can see, the bug relates to a byte stream which is not
> valid utf-8 - which by definition is therefore not utf-8 at all.
>
> What I'm talking about is taking a properly-labelled and
> properly-formed character stream in some known encoding, and
> transcoding that into properly-formed utf-8 (with appropriate
> re-labelling, of course).

The problem is that the XML spec is not only concerned with proper UTF-8
streams but also says what to do in improper cases. If the character
encoding conversion is decoupled from the XML processor, but this is
viewed as an implementation detail so that the combination of the
converter and actual XML processor is subjected to the conformance
requirements placed on XML processors, non-conformance ensues if the
converter is lenient, which they usually are.

-- 
Henri Sivonen
hsivonen@i...
http://iki.fi/hsivonen/
Mozilla Web Author FAQ: http://mozilla.org/docs/web-developer/faq.html
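
To illustrate the point about lenient converters, here is a minimal Python sketch. It is not taken from the thread or from the Mozilla bug. The sample document and the helper names (`lenient_to_utf8`, `strict_to_utf8`, `parses`) are assumptions made up for this example, and Python's bundled expat-based parser stands in for "the actual XML processor". The input is labelled UTF-8 but contains the byte 0xFF, which can never occur in UTF-8. Fed directly to the parser it is rejected, but after a lenient conversion pass it is accepted, so the converter-plus-parser combination no longer reports the error.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: labelled UTF-8, but 0xFF is never a legal UTF-8 byte.
RAW = b'<?xml version="1.0" encoding="UTF-8"?><doc>caf\xff</doc>'


def lenient_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Transcode to UTF-8, replacing malformed sequences with U+FFFD."""
    return data.decode(source_encoding, errors="replace").encode("utf-8")


def strict_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Transcode to UTF-8, raising UnicodeDecodeError on malformed input."""
    return data.decode(source_encoding, errors="strict").encode("utf-8")


def parses(data: bytes) -> bool:
    """Return True if the XML parser accepts the byte stream."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Parser alone: the malformed byte is reported as an error.
    print("raw input accepted:", parses(RAW))

    # Lenient converter in front of the parser: the error is masked.
    print("after lenient conversion accepted:",
          parses(lenient_to_utf8(RAW, "utf-8")))

    # Strict converter: the malformed input is still reported, just earlier.
    try:
        strict_to_utf8(RAW, "utf-8")
    except UnicodeDecodeError as exc:
        print("strict conversion failed:", exc.reason)
```

A strict converter keeps the combined system's behaviour aligned with the parser's own, at the cost of refusing input that a lenient converter would quietly repair.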
