Re: [xml-dev] Copying text (curly quotes) from Word into an XML document (UTF-8): what happens?

To: xml-dev@-----.---.---
Date: 9/3/2007 12:07:00 AM

On 02/09/07, G. Ken Holman <gkholman@c...> wrote:
> >Notepad doesn't understand UTF-8 encoded files.
>
> False ... I just opened Notepad and wrote out a file using UTF-8 and
> opened it up again and it was preserved. An XML processor read the
> file and didn't complain about the encoding. I'm running XP.

If you save as UTF-8 from Notepad, it adds a BOM (EF BB BF). The BOM lets
Notepad recognise the file as UTF-8 in future, but it isn't recognised by
some XML parsers, such as the default one shipped with Java 1.4 (Crimson).
See http://lists.xml.org/archives/xml-dev/200106/msg00358.html for
discussion of whether XML should be changed to make such files legal XML.

If you save as UTF-8 from other editors, they often don't add the BOM, and
if you open such a file in Notepad it doesn't deduce that it's UTF-8 (there
isn't an easy way to do that).

So Notepad can't produce files that some UTF-8 compliant applications can
process, including spec-compliant XML parsers, and it can't process UTF-8
encoded files created by some other applications.

The same applies to the UTF-8 encoding used by the .NET XML writer: it adds
a BOM, which confuses applications expecting UTF-8 encoded XML to start
with '<' or whitespace.

I got the codepoint wrong for the curly quotes.

Pete
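
For anyone hitting the BOM problem described above, one possible workaround is to strip the three bytes EF BB BF before handing the data to a parser that chokes on them. The following is only a minimal sketch in Python, not something from the thread: the helper name `strip_utf8_bom` and the sample bytes are made up for illustration, and Python's own parser is used only to show that the cleaned bytes start with '<' and parse.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration; the name is not from any library.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Illustrative bytes, assumed to resemble a file saved as "UTF-8" from Notepad:
# a BOM followed by a tiny document containing curly quotes (U+201C, U+201D).
notepad_bytes = codecs.BOM_UTF8 + "<q>\u201ccurly\u201d</q>".encode("utf-8")

# After stripping, the data starts with '<', as a BOM-intolerant parser expects.
clean = strip_utf8_bom(notepad_bytes)
root = ET.fromstring(clean)
```

Data that has no BOM passes through unchanged, so the helper is safe to apply to files from editors that don't write one.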
