Altova Mailing List Archives

Re: xsl, html entities, and encoding

From: mmodrall@------.------
Date: 1/21/2005 10:41:00 AM
Well, our desire to get the named entity stems from two things, really 
(though I know at least one is taken care of by asp.net).  The first reason 
is that named entities are more portable in html when the web dev guys don't 
always dot their i's and cross their t's on the encoding declarations, and 
regular asp doesn't help.

Regular asp does not emit a Content-Type: ...;charset= declaration unless 
the asp page explicitly departs from the system default.  This has always 
created a portability/internationalization hole in asp.  Your average system 
default encoding will be windows-1252, but there's nothing in the response 
stream by default to give the receiving browser a hint what it's getting, so 
browsers not using the same default will sometimes get it wrong and the 
display comes out wonky when the literal characters are in there.  I've 
noticed that asp.net doesn't let any response go out without a ;charset= 
declaration, so that will help, but we're not in asp.net yet.

Yes, we could tell all of the web dev guys that *every* page has to include 
a <meta> header to declare the encoding, but they often forget and they don't 
test browsers with Korean, say, so we don't know a non-portable page has gone 
out until we get complaints.
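
One possible safety net for the forgotten <meta> header, at least for pages 
that are generated through xsl: XSLT 1.0 says the html output method should 
add a META element declaring the encoding right after the HEAD start tag 
when the result has a HEAD element.  A minimal sketch, assuming the 
processor follows that recommendation; the page content and the choice of 
windows-1252 are only illustrative:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- method="html" asks the serializer to add a META charset element
       inside HEAD; encoding names the encoding it should declare. -->
  <xsl:output method="html" encoding="windows-1252"/>

  <xsl:template match="/">
    <html>
      <head>
        <!-- Illustrative content only. The head element has to be
             present for the serializer to have somewhere to put META. -->
        <title>Illustrative page</title>
      </head>
      <body>
        <p>caf&#233; &#171;quoted&#187;</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

If the transform result is handed back as a string instead of being written 
to a stream, the encoding actually used (and declared) may differ from the 
one requested, so it is worth checking the generated META on a real page.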

In this environment, it would be nicer for the webdev guys writing xsl to be 
able to say &nbsp; or &laquo; in their xsl and nicer if the html it generated 
also came out &nbsp; or &laquo; in the output stream as well.  More portable, 
less open to internationalization problems.
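
For the input half of that wish, the named entities can be declared in the 
stylesheet's own DOCTYPE so that &nbsp; and friends are legal in the xsl.  A 
minimal sketch, assuming the xml parser loading the stylesheet is allowed to 
process an internal DTD subset; the template content is only illustrative:

```xml
<?xml version="1.0"?>
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp  "&#160;">
  <!ENTITY laquo "&#171;">
  <!ENTITY raquo "&#187;">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- The parser expands these references to the plain characters
         before the xsl processor ever sees the stylesheet. -->
    <p>&laquo;&nbsp;quoted&nbsp;&raquo;</p>
  </xsl:template>

</xsl:stylesheet>
```

This only makes the stylesheet easier to write.  Because the references are 
expanded at parse time, it does not by itself decide whether the output 
stream carries a named entity, a numeric reference, or the literal character.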

The secondary desire is that it's just clearer.  &nbsp; is clearer to the 
entry-level webdev guy what's desired than &#160; - this gets even more true 
when you're talking about the more obscure symbol entity declarations, like 
&laquo; (&#171;) or &raquo; (&#187;).

I know that xml mantra is that an entity is a value and a value is an entity 
so it shouldn't matter whether the named entity, numeric entity, or literal 
char gets used in xsl, but the xml framework keeps encoding issues more in 
mind.  Html guys often forget.  I'm just trying to find tips and tricks to 
make it easier for them to make pages portable without adding a lot more 
steps for them, and getting named entities as output seems like it would make 
things easier on them.

I hope this at least explains the desire, even if there isn't an easy way to 
do it.
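
One not-so-elegant workaround that sometimes comes up for this is to write 
the entity reference out as raw text with disable-output-escaping.  A sketch 
only, assuming the processor honors disable-output-escaping (XSLT 1.0 makes 
support for it optional) and that the result is serialized by the xsl 
processor itself; the template name "nbsp" is made up for illustration:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical helper template: emits the five characters "&nbsp;"
       as-is, without escaping the ampersand. -->
  <xsl:template name="nbsp">
    <xsl:text disable-output-escaping="yes">&amp;nbsp;</xsl:text>
  </xsl:template>

  <xsl:template match="/">
    <p>10<xsl:call-template name="nbsp"/>km</p>
  </xsl:template>

</xsl:stylesheet>
```

The obvious cost is that the webdev guys have to call a template instead of 
typing the entity, and the trick has no effect if the result tree is passed 
along as a DOM instead of being serialized.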


"Martin Honnen" wrote:

> It seems even that the post is missing some stuff. I don't know of a way 
> to force MSXML to output entity references instead of the proper 
> character or instead of a proper numeric character reference.
> If you set
>    <xsl:output encoding="US-ASCII" method="html" />
> then I suppose you get numeric character references for anything not in 
> ASCII but I don't see why you would want that.

