Altova Mailing List Archives


Re: UTF-8 & Unicode

From: Lachlan Hunt <spam.my.gspot@-----.--->
To: NULL
Date: 2/3/2005 1:29:00 PM

EU citizen wrote:
> "Richard Tobin" <richard@c...> wrote in message
> news:ctq7fk$51s$1@p......
>>  <?xml version="1.0" encoding="whatever-the-notepad-encoding-is"?>
> 
> Based on what I know now, I agree. I always assumed that Notepad, being a
> simple text editor, saved files in ASCII format.

By default, Notepad saves files as Windows-1252.  The characters from 0 
to 127 (0x7F) are identical to US-ASCII, ISO-8859-1, UTF-8 and many 
other character sets that make use of the same subset.  Thus, any file 
saved using Windows-1252 that only makes use of those characters is 
compatible with all those other encodings.
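
A quick way to check this, as a minimal sketch assuming Python's codec 
names ("cp1252" is Python's name for Windows-1252; the variable names 
are only for illustration):

```python
# Decode every byte from 0x00 to 0x7F under four encodings and compare.
ascii_bytes = bytes(range(0x80))
encodings = ["ascii", "cp1252", "iso-8859-1", "utf-8"]

decoded = {enc: ascii_bytes.decode(enc) for enc in encodings}

# True when all four encodings produce exactly the same text.
all_equal = len(set(decoded.values())) == 1
print(all_equal)
```

If the result is true, a file limited to those bytes reads the same 
whichever of these encodings is declared.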

The characters from 160 (0xA0) to 255 (0xFF) match those contained in 
ISO-8859-1.  Thus, any file saved using Windows-1252 that only makes use 
of the aforementioned US-ASCII subset and that range of characters is 
compatible with ISO-8859-1.
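
The same kind of check works for the upper range.  This is only a 
sketch, again assuming Python's "cp1252" codec stands in for what 
Notepad writes; the sample word is made up for illustration:

```python
# Bytes 0xA0-0xFF decode to the same characters in both encodings.
high_bytes = bytes(range(0xA0, 0x100))
same_high_range = high_bytes.decode("cp1252") == high_bytes.decode("iso-8859-1")

# A word with an acute accented vowel, saved as Windows-1252 ...
sample = "café".encode("cp1252")
# ... reads back correctly when treated as ISO-8859-1.
round_trip = sample.decode("iso-8859-1")

print(same_high_range, round_trip)
```

Note that this compatibility is with ISO-8859-1 only; the single byte 
used for the accented vowel is not a valid UTF-8 sequence.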

The characters from 128 (0x80) to 159 (0x9F), however, do not match 
those in ISO-8859-1 or Unicode, where that range is reserved for the C1 
control characters.  A Windows-1252 file using these characters is 
therefore incompatible with ISO-8859-1 and UTF-8 and, for XML, the 
encoding must be declared as Windows-1252 in the XML declaration.  The 
characters in this range include the infamous "smart quotes" (left and 
right, single and double quotation marks: ‘ ’ “ ”) that cause so many 
problems for the unwary.  Using this range while declaring ISO-8859-1 
turns those bytes into control characters instead of the intended 
punctuation, and using it while declaring UTF-8 produces malformed byte 
sequences, which an XML parser must reject.
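
To see what happens to the smart quotes under each declaration, here is 
a small sketch using Python's codecs ("cp1252" again standing in for 
Windows-1252; the variable names are hypothetical):

```python
import unicodedata

# The four smart quotes, encoded the way Windows-1252 stores them.
quotes = "\u2018\u2019\u201c\u201d"
raw = quotes.encode("cp1252")  # bytes 0x91 0x92 0x93 0x94

# Read as ISO-8859-1, the same bytes become C1 control characters.
as_latin1 = raw.decode("iso-8859-1")
categories = {unicodedata.category(ch) for ch in as_latin1}

# Read as UTF-8, the bytes are not a valid sequence at all.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(raw, categories, valid_utf8)
```

The Unicode general category "Cc" means "control", so under ISO-8859-1 
the quotes silently turn into control characters, while under UTF-8 the 
decoder refuses the input outright.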

> Nothing in Notepad's Help, Windows' Help or Microsoft's website says anything
> about the format used by Notepad.

It is actually mentioned in a few places on the web, though it's not 
easy to find.  Microsoft tend to refer to it as "ANSI", which is a 
misnomer: Windows-1252 was never standardised by ANSI.


> Through experimentation with the W3C HTML validator, I've worked out that
> iso-8859-1 will work for Notepad files with standard English text plus acute
> accented vowels.

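That's because Windows-1252 is compatible with ISO-8859-1 as long as 
only the US-ASCII subset and the range from 160 (0xA0) to 255 (0xFF) 
are used, and the acute accented vowels all fall within that range.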

>>>Windows 95/98 Notepad files must be saved with an encoding attribute.
>>
>>This is mysterious.  What does it mean?  That Notepad won't save
>>them without one?  Or that you have to add one to make it work
>>in the web browser?
> 
> I can't make head or tail of it.

It actually means that version of Notepad will only save as 
Windows-1252, so that encoding needs to be named in the XML 
declaration.  Without an encoding declaration (or a byte order mark), 
an XML parser will assume UTF-8, and that assumption is safe only when 
the file is restricted to the US-ASCII subset.
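
As a rough illustration, assuming Python's standard 
xml.etree.ElementTree parser and a made-up <note> element, the same 
Windows-1252 bytes parse with the declaration and fail without it:

```python
import xml.etree.ElementTree as ET

# An element containing smart quotes, encoded as Notepad would save it.
body = "<note>\u201cquoted\u201d</note>".encode("cp1252")

# With the encoding named in the XML declaration, the parser decodes it.
declared = b'<?xml version="1.0" encoding="windows-1252"?>' + body
text = ET.fromstring(declared).text

# Without a declaration the parser assumes UTF-8 and rejects the bytes.
try:
    ET.fromstring(body)
    undeclared_parses = True
except ET.ParseError:
    undeclared_parses = False

print(text, undeclared_parses)
```

Other parsers may word the error differently, but a conforming one 
should refuse the undeclared version in the same way.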

-- 
Lachlan Hunt
http://lachy.id.au/
http://GetFirefox.com/    Rediscover the Web
http://SpreadFirefox.com/   Igniting the Web

