Altova Mailing List Archives


[xsl] character entities

From: Joe Barwell <jbar@-------->
To:
Date: 11/3/2008 7:33:00 AM
Hello people,



xsl 1.0, Firefox 3.0, Zend Search Lucene, php 5.2.6.



I'm having a wee spot of bother with character entities.



What I'm trying to do:



I have data stored in xml files, which I first pass to an xsl template
in order to transform it into a more usable form (technically, I'm
"flattening" it).

This data is then put into fields within a Zend Search Lucene index, via
php (that's why I first "flattened" it).

This index data is then queried (again via php) and the results sent
to/rendered by a browser.

If I put &#241_; (minus the underscore character, which I've added so
this email is not mis-parsed) in my original xml, and use
encoding="iso-8859-1" for both it and my xsl stylesheet, then my xsl
transforms that into a (Spanish) n character with a tilde on top: ñ.

If I tell ZSL to index fields using 'iso-8859-1' encoding, my Spanish n
becomes: ÃƒÂ±. If I tell ZSL to index fields using 'utf-8' encoding, my
Spanish n becomes: Ã±.
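
One common way a character such as the Spanish n gets mangled in a pipeline like this is an encoding mismatch: the bytes are written as UTF-8 at one stage and read as ISO-8859-1 at another. The sketch below only illustrates that mechanism. It uses Python as a stand-in for the PHP/ZSL steps, and it is an assumption, not a confirmed diagnosis, that this is what happens in the setup described above.

```python
# Illustration only: Python stands in for the PHP / Zend Search Lucene steps.
text = "\u00f1"  # the character that the numeric reference 241 names (n with tilde)

utf8_bytes = text.encode("utf-8")         # two bytes: 0xC3 0xB1
latin1_bytes = text.encode("iso-8859-1")  # one byte:  0xF1

# Reading UTF-8 bytes as if they were ISO-8859-1 turns one character into two.
misread_once = utf8_bytes.decode("iso-8859-1")

# Re-encoding the misread text as UTF-8 and misreading it again doubles the damage.
misread_twice = misread_once.encode("utf-8").decode("iso-8859-1")

print(utf8_bytes.hex(), latin1_bytes.hex())
print(len(text), len(misread_once), len(misread_twice))
```

If a mismatch of this kind is the cause, the usual remedy is to make every stage agree on one encoding, including the charset the browser is told to use for the final page.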

I've looked at dpawson on encoding, and Mike Brown's tutorial at
skew.org. They're v. good, but don't quite seem to explain where I'm
going wrong (or more likely, I'm just oblivious to what's under my nose).

I believe I need to prevent all parsers bar the browser at the end from
parsing my "special characters", right? But how?

I have tried putting a dtd with an entity declaration inside my original
xml, but although that works--i.e. using:

<!DOCTYPE wine [
<!ENTITY ntilde "&#241;">
]>

I can then put: &ntilde; inside my xml, this still gets parsed into: ñ
by my xsl, & then stored as: Ã± in lucene, and displayed as: Ã± in my
browser.
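
The result of the entity experiment is what XML parsing generally gives: a numeric character reference and a named entity declared in the DTD are both expanded by the parser, so downstream code sees the same character either way. Here is a minimal sketch of that behaviour, using Python's ElementTree as a stand-in for the parser inside the XSLT processor. The element names a and b are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Same internal DTD subset as above; <a> and <b> are hypothetical elements.
doc = """<!DOCTYPE wine [
<!ENTITY ntilde "&#241;">
]>
<wine><a>Espa&#241;a</a><b>Espa&ntilde;a</b></wine>"""

root = ET.fromstring(doc)
numeric = root.find("a").text  # written with the numeric character reference
named = root.find("b").text    # written with the named entity from the DTD

# After parsing, neither reference survives: both are the plain character.
print(numeric == named, len(numeric))
```

So the choice between the numeric reference, the named entity and the literal character in the source xml should not change what the later stages receive. What matters is how each stage encodes and decodes that one character.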

I've also tried playing around with php's htmlspecialchars() function,
to no avail.

Latest effort: I tried using encoding="utf-8" for all levels: my
original xml, my xsl output, and the input to ZSL's index, & I also
saved my xml file as utf-8 format, and used the Spanish n inside my xml,
i.e. ñ rather than &#241;. Doing that, the Spanish n was preserved
through the xsl output, but ZSL stores it as: Ã±, & that's also how my
browser displays it.

I've run out of ideas. Any suggestions? Ta.



Joe

