Re: UTF-8 & Unicode (comp.text.xml)
Date: 2/3/2005 1:29:00 PM

EU citizen wrote:
> "Richard Tobin" <richard@c...> wrote in message
> news:ctq7fk$51s$1@p......
>> <?xml version="1.0" encoding="whatever-the-notepad-encoding-is"?>
>
> Based on what I know now, I agree. I always assumed that Notepad, being a
> simple text editor, saved files in Ascii format.

By default, Notepad saves files as Windows-1252.

The characters from 0 to 127 (0x7F) are identical to US-ASCII, ISO-8859-1, UTF-8 and many other character sets that share the same subset. Thus, any file saved as Windows-1252 that only uses those characters is compatible with all those other encodings.

The characters from 160 (0xA0) to 255 (0xFF) match those in ISO-8859-1. Thus, any file saved as Windows-1252 that only uses the aforementioned US-ASCII subset plus that range is compatible with ISO-8859-1.

The characters from 128 (0x80) to 159 (0x9F), however, do not match the corresponding positions in ISO-8859-1 or UTF-8, making any Windows-1252 file that uses them incompatible with those encodings. For XML, the encoding must then be declared appropriately in the XML declaration. This range contains the infamous "smart quotes" (left and right, single and double quotation marks: ‘ ’ “ ”) that cause so many problems for the uneducated. Using this range while declaring ISO-8859-1 or UTF-8 will cause problems: in ISO-8859-1 those byte values are control characters rather than quotation marks, and in UTF-8 they are not valid on their own at all, so the parser reports an error.

> Nothing in Notepad's Help, Windows' Help or Microsoft's website says anything
> about the format used by Notepad.

It is actually mentioned in a few places on the web, though it's not easy to find. Microsoft tend to incorrectly refer to it as ANSI, even though it is not.
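To make the three byte ranges concrete, here is a minimal sketch using Python's built-in `cp1252` and `latin-1` codecs (the function name `classify_cp1252_bytes` is just an illustration). It decodes every byte value with both codecs and sorts them into "same", "different" and "unassigned". Note that Python's `cp1252` codec treats five positions in the 0x80 to 0x9F range as undefined, so decoding them raises an error instead of producing a character.

```python
def classify_cp1252_bytes():
    """Compare how Windows-1252 (cp1252) and ISO-8859-1 (latin-1) decode each byte."""
    same, different, unassigned = [], [], []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")  # latin-1 maps every byte to U+0000..U+00FF
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            unassigned.append(value)  # Python's cp1252 codec leaves these positions undefined
            continue
        if cp1252_char == latin1_char:
            same.append(value)
        else:
            different.append(value)
    return same, different, unassigned


same, different, unassigned = classify_cp1252_bytes()
print(f"{len(same)} bytes decode identically")  # 224 (0x00-0x7F plus 0xA0-0xFF)
print("differing range:", hex(min(different)), "to", hex(max(different)))  # 0x80 to 0x9f
print("unassigned in cp1252:", [hex(v) for v in unassigned])

# The "smart quotes" all live in the differing range:
print("\u2018\u2019\u201c\u201d".encode("cp1252"))  # b'\x91\x92\x93\x94'
```

Everything outside 0x80 to 0x9F decodes to the same character under both codecs, which is why a Windows-1252 file that avoids that range can safely be labelled ISO-8859-1.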
> Through experimentation with the W3C HTML validator, I've worked out that
> iso-8859-1 will work for Notepad files with standard English text plus acute
> accented vowels.

That's because Windows-1252 is compatible with ISO-8859-1 when that subset is used.

>>>Windows 95/98 Notepad files must be saved with an encoding attribute.
>>
>>This is mysterious. What does it mean? That Notepad won't save
>>them without one? Or that you have to add one to make it work
>>in the web browser?
>
> I can't make head or tail of it.

It actually means that version of Notepad will only save as Windows-1252, so the encoding needs to be declared in the XML declaration. That is because, without a declaration (and without a byte order mark), an XML parser will assume UTF-8, and that assumption is acceptable only when the US-ASCII subset is used.

-- 
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/ Rediscover the Web
http://SpreadFirefox.com/ Igniting the Web
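The effect of the missing declaration can be reproduced with Python's standard-library ElementTree parser (which is built on expat). This is only a sketch: the sample markup and the `parse_bytes` helper are hypothetical, and the Notepad output is simulated by encoding a string with the `cp1252` codec rather than by saving a real file.

```python
import xml.etree.ElementTree as ET


def parse_bytes(data: bytes):
    """Parse raw XML bytes; return the root element's text, or None if the parser rejects them."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# Simulated Notepad output: “café” saved as Windows-1252 (smart quotes become 0x93/0x94).
body = "<p>\u201ccaf\u00e9\u201d</p>".encode("cp1252")

# No declaration: the parser assumes UTF-8, and a lone 0x93 byte is not valid UTF-8.
no_decl = parse_bytes(body)

# Declaring the real encoding lets the parser map 0x93/0x94 to the curly quotes.
declared = parse_bytes(b'<?xml version="1.0" encoding="windows-1252"?>' + body)

# Plain accented text (no smart quotes) is byte-identical in Windows-1252 and ISO-8859-1.
plain = "<p>caf\u00e9</p>".encode("cp1252")
latin1 = parse_bytes(b'<?xml version="1.0" encoding="ISO-8859-1"?>' + plain)

print(no_decl)   # None
print(declared)  # “café”
print(latin1)    # café
```

The ISO-8859-1 case works only because the sample avoids bytes 0x80 to 0x9F; with the smart quotes included, that declaration would silently turn them into control characters instead.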
