Altova Mailing List Archives
>microsoft.public.xml Archive Home
Re: xsl, html entities, and encoding
Date: 1/21/2005 10:41:00 AM
Well, our desire to get the named entity stems from two things, really (though I know at least one is taken care of by asp.net).

The first reason is that named entities are more portable in html when the web dev guys don't always dot their i's and cross their t's on the encoding declarations, and regular asp doesn't help. Regular asp does not emit a Content-Type: ...;charset= declaration unless the asp page explicitly departs from the system default. This has always created a portability/internationalization hole in asp. Your average system default encoding will be windows-1252, but by default there's nothing in the response stream to tell the receiving browser what it's getting, so browsers that don't use the same default will sometimes guess wrong, and the display comes out wonky when the literal characters are in there. I've noticed that asp.net doesn't let any response go out without a ;charset= declaration, so that will help, but we're not on asp.net yet.

Yes, we could tell all of the web dev guys that *every* page has to include a <meta> header to declare the encoding, but they often forget, and they don't test with browsers set up for Korean, say, so we don't know a non-portable page has gone out until we get complaints. In this environment, it would be nicer for the webdev guys writing xsl to be able to say […] or &laquo; in their xsl, and nicer if the html it generated also came out as […] or &laquo; in the output stream. More portable, less open to internationalization problems.

The secondary desire is that it's just clearer. […] is clearer to the entry-level webdev guy what's desired than […] - this gets even more true when you're talking about the more obscure symbol entity declarations, like &laquo; («) or &raquo; (»). I know the xml mantra is that an entity is a value and a value is an entity, so it shouldn't matter whether the named entity, numeric entity, or literal char gets used in xsl, but the xml framework keeps encoding issues more in mind. Html guys often forget.

I'm just trying to find tips and tricks to make it easier for them to make pages portable without adding a lot more steps for them, and getting named entities as output seems like it would make things easier on them. I hope this at least explains the desire, even if there isn't an easy way to do it.

Thanks
_mark

"Martin Honnen" wrote:
> It seems even that the post is missing some stuff. I don't know of a way
> to force MSXML to output entity references instead of the proper
> character or instead of a proper numeric character reference.
> If you set
> <xsl:output encoding="US-ASCII" method="html" />
> then I suppose you get numeric character references for anything not in
> ASCII but I don't see why you would want that.
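
For reference, the two halves of this exchange can be sketched as one small stylesheet. This is only an illustration, not something either poster supplied: the internal DTD subset, the choice of the three entities, and the sample markup in the template are all assumptions. The entity declarations let stylesheet authors type the names, but the XML parser resolves them to plain characters before the XSLT processor sees them, so they have no effect on what is written out. The output side is the <xsl:output> setting Martin suggested.

```xml
<?xml version="1.0"?>
<!-- Sketch only. The entity names below are declared so that authors can
     type them in the stylesheet; the parser replaces each one with the
     corresponding character while loading the stylesheet. -->
<!DOCTYPE xsl:stylesheet [
  <!ENTITY nbsp  "&#160;">
  <!ENTITY laquo "&#171;">
  <!ENTITY raquo "&#187;">
]>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- An ASCII output encoding means characters outside ASCII cannot be
       written literally, so the serializer has to escape them. -->
  <xsl:output method="html" encoding="US-ASCII"/>

  <!-- Hypothetical sample content, chosen only to exercise the entities. -->
  <xsl:template match="/">
    <html>
      <head><title>Entity test</title></head>
      <body>
        <p>&laquo;&nbsp;quoted&nbsp;&raquo;</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Whether the escaped characters come out as numeric references (&#171;) or as names (&laquo;) is up to the processor: XSLT 1.0 allows the html output method to use named character entity references but does not require it. Either form is pure ASCII, so the page no longer depends on the browser guessing the encoding. Note that the encoding setting only takes effect when the processor itself serializes the result to bytes, and that some parser configurations refuse documents containing a DTD, in which case the internal subset above would have to be allowed explicitly.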