Re: Non-Ascii Characters modified when Document Loaded

From: "Anthony Jones" <Ant@------------.--->
Date: 3/20/2007 9:34:00 AM

"mikes" <mikes@d...> wrote in message
> I was hoping that I was doing something really dumb and there would be a
> quick solution that would not require a lot of detail.  But I guess that's
> not the case, so here is the detail.
> My application is written in C++.  I am using VS 2003 (soon to be upgraded
> to VS 2005, I hope) and Platform SDK version 3790.1830.  I'm using the Load
> method of the IXMLDOMDocument interface to load the XML document file and
> the Save method to write the document back to a file.  The XML document I
> am loading is UTF-8 encoded.
> After a much deeper look at what is happening, I have found that the Load
> method is loading the XML correctly from the file; it is not stepping on
> the non-ASCII characters in the XML.  It is the Save method that is causing
> the problem.  It is UTF-8 encoding characters that have already been UTF-8
> encoded.  An example will make this clearer.
> The test XML document I am using has a ® (registered symbol) in the data
> associated with one of its elements.  Its Unicode value is 00AE.  When the
> document is read into the DOM, the symbol's code is C2 AE, which is the
> UTF-8 encoding of the registered symbol character.  When the document is
> saved, the two-byte C2 AE code is replaced by the four bytes C3 82 C2 AE.
> It turns out C3 82 is the UTF-8 encoding of the non-ASCII character Â,
> which has a Unicode value of 00C2.  So it appears that the Save method is
> processing the content of the DOM as though it is raw Unicode that needs
> to be UTF-8 encoded, even though the content was read from a UTF-8 encoded
> source and needs no encoding.
> How do I turn off this unwanted encoding?

Does the source file contain the UTF-8 byte order mark at the beginning of
the file?
Does the source file contain a <?xml declaration, and does it specify an
encoding?  If so, what encoding does it specify?
How have you determined that the load hasn't misinterpreted the encoding?


