Altova Mailing List Archives


Re: Is it possible NOT to replace entity references?

From: Stephan Hoffmann <shh@----.--.--->
To: NULL
Date: 9/6/2005 12:54:00 AM
Hi,

thanks for the detailed explanation.

You are right, these are two separate issues. I confused them because
the Python SAX parser I use replaces both the predefined and the
non-predefined entity references, which is fine. I simply assumed an XSLT
processor would also replace both, but that assumption is probably wrong.
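
To show the parser behaviour I mean, here is a minimal sketch using
xml.sax from the Python standard library. The sample document, the
TextCollector class and the internal DTD that declares auml are made up
for illustration; they are not taken from my real files.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that only gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


# Assumed sample input: auml is declared in an internal DTD subset,
# lt and gt are predefined by XML itself.
doc = (
    b'<!DOCTYPE p [<!ENTITY auml "&#228;">]>'
    b"<p>&lt;b&gt; M&auml;dchen</p>"
)

handler = TextCollector()
xml.sax.parseString(doc, handler)
text = "".join(handler.chunks)

# Both kinds of reference have been replaced by the characters they
# stand for; the handler never sees "&lt;" or "&auml;".
print(text)
```

The handler only ever receives plain characters, so by the time the
text reaches the application there is no trace of which entity
reference, if any, produced a given character.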

I don't know why I prefer &auml; over 'ä', maybe because 7-bit
ASCII seems to be more portable, but I can't really find a use case
where 'ä' would be less portable.
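
As far as I can tell, 7-bit output does not depend on named entities
anyway. The sketch below assumes Python's xml.etree.ElementTree as the
serializer (not an XSLT processor) and uses a made-up element: with an
ASCII output encoding it writes a numeric character reference for 'ä',
and '<' stays escaped in either encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical element whose text contains a non-ASCII character and a '<'.
p = ET.Element("p")
p.text = "M\u00e4dchen < Frau"

# ASCII output: characters outside the encoding become numeric
# character references, markup characters are escaped as usual.
ascii_bytes = ET.tostring(p, encoding="us-ascii")

# UTF-8 output: 'ä' is written as itself, '<' is still escaped.
utf8_bytes = ET.tostring(p, encoding="utf-8")

print(ascii_bytes)
print(utf8_bytes.decode("utf-8"))

# Parsing the ASCII form gives back the original text.
assert ET.fromstring(ascii_bytes).text == p.text
```

A numeric reference like &#228; needs no DTD, so it works in plain XML
as well as in HTML, which &auml; does not.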

Thanks, Stephan

Martin Honnen wrote:
> 
> 
> Stephan Hoffmann wrote:
> 
> 
>> I use XML mainly as a source for HTML. HTML browsers 'know'
>> certain entity references like &eacute; or &auml;.
>> 
>> When I use XSL to transform XML to HTML or XML, these entities are
>> replaced by what they refer to.
>> 
>> Is there a way to avoid that?
> 
> XSLT/XPath 1.0 at least which is the current version and the one
> implemented by lots of processors and in wide-spread use does not
> provide anything in its data model or in its instructions to create
> entity references and to ensure that these are preserved and not
> replaced by the entity content when the result of a transformation is
> serialized.
> You would need to look at a specific XSLT processor and check whether it
> provides any mechanisms outside the standards to deal with entities and
> entity references.
> Saxon 6 has an extension function documented here:
> <http://saxon.sourceforge.net/saxon6.5.4/extensions.html#saxon:entity-ref>
> 
>> Two reasons to avoid that:
>> - On my linux machine xsltproc replaced the entities in a way that
>> my browser did not correctly display the resulting HTML
>> (I updated my linux distribution and it now works).
>> 
>> - &lt; is replaced by < and the output is no longer valid XML/HTML
> 
> But &lt; and &gt; are references to entities predefined in XML and
> certainly if any application supposed to output XML or HTML outputs &lt;
> as a plain '<' character then the application is seriously broken.
> This is a different issue, those characters '<' and '>' are obviously
> special as they delimit tags in both XML and HTML and therefore need to
> be escaped as &lt; respectively &gt;.
> &auml; in HTML 4 stands for the character 'ä' and that has no special
> meaning in XML or HTML so if an XSLT processor or other application
> supposed to output XML or HTML simply inserts 'ä' instead of &auml; in a
> document properly encoded and with the proper encoding used and declared
> then there are no problems with well-formedness (or even validity).
> 
> 
> 


