Altova Mailing List Archives


Re: [xml-dev] converting character entities to us-ascii /equivalents/

From: Robert Koberg <rob@------.--->
To: Michael Kay <michael.h.kay@--------.--->
Date: 10/6/2004 10:56:00 PM
Michael Kay wrote:

> If there's a limited number of non-ASCII characters you need to handle, you
> can use character maps in the XSLT 2.0 serializer.

Sorry, I should have specified that I am using v1.0. I intend to move to 
v2.0 sometime soon, but have not had the time to learn it and convert 
all of my stylesheets.

But that is good to know.

thanks,
-Rob

> 
> Michael Kay
> http://www.saxonica.com/
> 
> 
>>-----Original Message-----
>>From: Robert Koberg [mailto:rob@k...] 
>>Sent: 06 October 2004 22:56
>>To: XML Developers List
>>Subject: [xml-dev] converting character entities to us-ascii /equivalents/
>>
>>Hi,
>>
>>I need to output several versions of a page (through XSL
>>transformations), one of which is us-ascii (for email). But the
>>content might contain some characters that are not supported by
>>us-ascii (like em dash - &#151;).
>>
>>I want the character entities to remain in the content. When
>>transforming to us-ascii, I want to replace the entities with some
>>ascii text equivalent. For example, '&#151;' would get converted to '--'.
>>
>>The XML is pulled into the transformation through the document
>>function using a custom URIResolver.
>>
>>Is there an existing solution to this?
>>
>>Does Apache's FOP and the text renderer handle this type of thing?
>>
>>I have tried to set a ContentHandler (actually a DefaultHandler) on
>>the XMLReader and tried to replace a character entity, but I am doing
>>something wrong and am confused about how to proceed. Using the code
>>below I get a recoverable error using saxon/aelfred and a failure when
>>using saxon/xerces.
>>
>>Here is a snippet from the URIResolver:
>>
>>
>>InputSource in = new InputSource(file.getAbsolutePath());
>>SAXSource source = new SAXSource(in);
>>XMLReader reader = null;
>>try {
>>   reader = XMLReaderFactory.createXMLReader("com.icl.saxon.aelfred.SAXDriver");
>>   //reader = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
>>} catch (SAXException e) {
>>   System.err.println(e.getMessage());
>>}
>>
>>reader.setContentHandler(new AsciiHandler());
>>
>>source.setXMLReader(reader);
>>
>>return source;
>>
>>
>>
>>And the DefaultHandler has one method:
>>
>>
>>public void characters(char[] text, int start, int length) {
>>
>>   String str = new String(text, start, length);
>>   if (str.indexOf(174) > -1) {
>>    str.replaceAll("\u00AE", "(Registered Trademark)");
>>   }
>>   text = str.toCharArray();
>>}
>>
>>How can I do this? Is there a better way to handle this type of thing?
>>
>>thanks,
>>-Rob
>>
>
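
For anyone staying on XSLT 1.0, here is a minimal sketch of one possible way to do the replacement from the quoted message on the Java side. It is not a tested fix for the exact saxon/aelfred setup above. Two things in the quoted handler look like they cannot work: String.replaceAll returns a new string rather than changing str, and assigning to the text parameter never reaches the transformer. A ContentHandler set directly on the XMLReader is also likely to be replaced when the transformer installs its own. A SAX filter (org.xml.sax.helpers.XMLFilterImpl) sidesteps all three by sitting between the parser and the transformer and forwarding modified text. The class name AsciiFilter, the helper toAscii, and the replacement table are made up for illustration.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites selected non-ASCII characters in text nodes
// before they are passed on to whatever handler the transformer installs.
class AsciiFilter extends XMLFilterImpl {

    AsciiFilter(XMLReader parent) {
        super(parent);
    }

    // Illustrative mapping only; extend it with the characters you expect.
    static String toAscii(String s) {
        return s.replace("\u00AE", "(Registered Trademark)") // registered sign
                .replace("\u2014", "--")  // em dash as defined by Unicode
                .replace("\u0097", "--"); // what &#151; actually resolves to
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Strings are immutable, so keep the result of the replacement
        // and forward the new character array downstream.
        String replaced = toAscii(new String(ch, start, length));
        super.characters(replaced.toCharArray(), 0, replaced.length());
    }
}
```

In the URIResolver snippet this would mean dropping the reader.setContentHandler(...) call and using source.setXMLReader(new AsciiFilter(reader)) instead. Note that this only touches text nodes, not attribute values, and that &#151; is code point 151 (a control character in Unicode); the real em dash is &#8212;.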

