Altova Mailing List Archives


Re: [xsl] Invalid byte 2 of 2-byte UTF-8 sequence exception while transforming

From: Abel Braaksma <abel.online@--------->
To:
Date: 2/2/2007 12:25:00 PM
Pankaj Bishnoi wrote:
> Hi Owen
> The file starts with <?xml version="1.0"
> encoding="ISO-8859-1"?> so i think before transforming the encoding of file
> is changed to UTF-8(Default encoding for Xalan transformer) and since UTF-8
> encoded file cannot contain ISO-8859-1 characters so this might be the cause
> of this problem i am still debugging it.

No, UTF-8 is an encoding for Unicode, which can handle all characters 
from ISO-8859-1.
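
A quick way to check this claim yourself is the following small Python sketch (Python and the variable names are my choice for illustration, they are not part of the original thread). It decodes every possible ISO-8859-1 byte and re-encodes the result as UTF-8:

```python
# Every ISO-8859-1 byte maps to the Unicode code point with the same number,
# so all 256 byte values can be re-encoded as UTF-8 without loss.
latin1_bytes = bytes(range(256))
text = latin1_bytes.decode("iso-8859-1")   # all 256 byte values are defined
utf8_bytes = text.encode("utf-8")          # UTF-8 covers all of Unicode

# Round trip back to the original bytes.
round_trip = utf8_bytes.decode("utf-8").encode("iso-8859-1")
print(round_trip == latin1_bytes)          # True
print(len(latin1_bytes), len(utf8_bytes))  # 256 384
```

The UTF-8 form is longer because the 128 characters above code point 127 each take two bytes instead of one.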



If you use Eclipse, you can test the "looks" of your file as follows:



1. Open the XML file as-is.

2. Right-click the file in the Navigator and click Properties

3. Check "Default (determined from content: ISO-8859-1)"  (I mean: check 
what it says there, it should show "ISO-8859-1")

4. Read through your file carefully and look for any small squares 
(Eclipse's way of showing unknown chars, chars not in the font, or chars 
that are illegal). If there are some, your file contains illegally 
encoded characters.

5. It may be that, as the result of illegal characters, Xalan tries to 
read it as UTF-8 (because that is the default for XML). ISO-8859-1 and 
UTF-8 encode characters above codepoint 127 differently (one byte 
versus two bytes), and for these characters it may give this error.

6. Go again to the Properties, and type manually "UTF-8". Check again 
for any little squares.

7. Make a little change, and change the encoding string to "UTF-8". 
Eclipse will automatically and correctly save it as UTF-8 now. Change it 
back to ISO-8859-1. Eclipse will replace any character that is not 
allowed in ISO-8859-1 with a "?" char. Close and open it to see if it 
has such changed chars.
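
To illustrate steps 5 and 7 outside Eclipse, here is a minimal Python sketch. It is only an illustration under my own assumptions: the sample strings are made up, and Python's error message is not the wording Xalan uses, although it describes the same situation.

```python
# "Ärger" saved as ISO-8859-1: the first character is the single byte 0xC4.
data = "\u00c4rger".encode("iso-8859-1")     # b'\xc4rger'

# Read as UTF-8, 0xC4 announces a 2-byte sequence, so the next byte would
# have to be in the range 0x80-0xBF. Here it is 'r' (0x72), which is what
# a parser reports as an invalid second byte.
try:
    data.decode("utf-8")
    failed_at = None
except UnicodeDecodeError as err:
    failed_at = err.start                    # index of the offending lead byte
    print(err.reason, "at byte", failed_at)

# Decoding with the encoding the bytes were really saved in works fine.
print(data.decode("iso-8859-1"))

# The reverse direction (step 7): a character that does not exist in
# ISO-8859-1, such as the euro sign, is replaced by "?".
replaced = "price: 5 \u20ac".encode("iso-8859-1", errors="replace")
print(replaced)
```

If the first `decode` fails on your own file's bytes, the file was most likely not saved as UTF-8, whatever the parser assumed.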



If you don't have Eclipse, you can use a text editor where you can 
select and override the encoding. Even a browser will give you some 
hints on illegal characters when you select another encoding using the 
View menu. If you have an editor where you can search with regular 
expressions, search your document with the following expression (or the 
equivalent for your regex dialect):



[^\t\n\r\x20-\x7E]+



It will give you all "character suspects" that may have gotten the wrong 
encoding when saving the file. In fact, it gives you all characters that 
would not be allowed in your XML file if you were to encode it as US-ASCII 
(one of the most basic character sets; its 128 codepoints, 0 through 127, 
are the same in all ISO-8859-X sets, UTF-8 and many other character sets). 
Testing all these suspects one by one (by removing/changing them), you 
will quickly find the problem character.
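
If your editor has no regex search, the same scan can be sketched in a few lines of Python. This is a rough sketch, not a polished tool: the function name `find_suspects` and the sample bytes are hypothetical, and it uses \x7E as the upper bound so that every printable US-ASCII character counts as safe. It works on raw bytes, so it does not depend on guessing the encoding first.

```python
import re

# Matches runs of bytes that are neither tab, LF, CR nor printable US-ASCII.
SUSPECT = re.compile(rb"[^\t\n\r\x20-\x7E]+")

def find_suspects(data):
    """Return (line, column, bytes) for each suspect run; both are 1-based."""
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((line_no, match.start() + 1, match.group()))
    return hits

# Made-up sample: "Café" saved as ISO-8859-1, so the last letter is byte 0xE9.
sample = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n<name>Caf\xe9</name>\n'
for line_no, col, run in find_suspects(sample):
    print(line_no, col, run)
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is your own file name) and pass the bytes to `find_suspects`. The column is counted in bytes, not characters.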



Good luck researching!



Cheers,
-- Abel Braaksma
  http://www.nuntia.nl

