XSLT Processing - weird encoding issue
sy27295
Posted: Saturday, November 21, 2009 6:17:41 PM
Rank: Member

Joined: 9/22/2009
Posts: 15
Location: USA
This is the beginning of the XML File:
<?xml version="1.0" encoding="UTF-8" ?><extract createDate="2009-08-14"><manufacturerList><manufacturer id="ABB">Abbott Laboratories Limited</manufacturer><manufacturer id="ABH">Manufacturer name not currently available</manufacturer><manufacturer id="ABI">Abiogen Pharma</manufacturer><manufacturer id="ABP">Manufacturer name not currently available</manufacturer><manufacturer id="ABT">Albert Pharma Inc - &quot; DO NOT USE &quot;</manufacturer><manufacturer id="ACT">Actelion ....

The Entire File is here:
http://www.HumanGenome.org/msdn/ontData.xml

This is an example xslt template file:
Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text" />

<xsl:template match="/*">
<xsl:apply-templates />
</xsl:template>

<xsl:template match="//extract/manufacturerList/manufacturer">
<xsl:value-of select="@id"/>|<xsl:value-of select="."/><xsl:text>&#xa;</xsl:text>
</xsl:template>

<xsl:template match="text()" />

</xsl:stylesheet>


The output comes out OK, data-wise. Everything is in a columnar format, with values separated by the delimiter "|".
The problem is that each line begins with char(255) and char(254), and every legitimate character of the text is followed by a char(0) (null) character.

I tried >>>encoding="ISO-8859-1"<<< instead of UTF-8; still the same result. I strip out the characters that I hate with an intermediary process, but why is it happening? Who is to blame here? I have used this XSLT syntax and transformation for other countries' files, and everything was ASCII (or nice Unicode, I think).

Please, gimme more reason to like XML!
rip
Posted: Saturday, November 21, 2009 9:10:40 PM
Rank: Advanced Member

Joined: 7/17/2008
Posts: 185
Location: Minutiae, Triviality
Hi,

When you say "I tried" other encodings, can you state _where_ you set the other encoding?

Like this:

<?xml version="1.0" encoding="ISO-8859-1" ?>

or like this:

<xsl:output method="text" encoding="ISO-8859-1"/>

Because your example doesn't have an encoding in the xsl:output element of the stylesheet, I have to wonder whether you were putting it in the right place.

I'm assuming that if you _could_, you _would_ have attached an example of the file data in question.
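
As a side note on the symptoms: char(255) followed by char(254), plus a null after every ordinary character, is what UTF-16 little-endian text with a byte order mark looks like when it is read one byte at a time. The following is a minimal Python sketch of that byte pattern, assuming that UTF-16 is in fact what the processor wrote when no output encoding was given. The sample line is invented, modeled on the "id|name" output described above.

```python
import codecs

# Illustrative only: an invented sample line in the "id|name" layout.
line = "ABB|Abbott Laboratories Limited\n"

# Build UTF-16 little-endian bytes with an explicit byte order mark (FF FE).
data = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

# The first two bytes are the BOM; each ASCII character is followed by a 0 byte.
print(list(data[:8]))  # [255, 254, 65, 0, 66, 0, 66, 0]

# Decoding with the "utf-16" codec consumes the BOM and restores the text.
print(data.decode("utf-16") == line)  # True
```

If that assumption holds, the bytes are not corruption. They are valid text in an encoding the downstream reader did not expect.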
sy27295
Posted: Saturday, November 21, 2009 9:47:13 PM
Bummer.....
Of course, I was doing this one:
<?xml version="1.0" encoding="ISO-8859-1" ?>

Was not aware of this:
<xsl:output method="text" encoding="ISO-8859-1"/>

And the second one solved the problem, and I hate XML a little bit more. Every problem is solved with some solution that is so difficult to find, unless you go to a remote island with tons of O'Reilly books and read them from cover to cover.

I had already written the code to clean up the weird characters... and then saw your posting (2 hours wasted).
Code:

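    ' Open the transformation output for reading (1 = ForReading) and create the cleaned copy.
    Set oSource = oFs.OpenTextFile(sCsvPathFile, 1)
    Set oDest = oFs.CreateTextFile(sTextPathFile, True)
    Do Until oSource.AtEndOfStream
        ' Start a new row: reset the delimiter count and the row buffer.
        iDel = -1: sStr = "": iRow = iRow + 1
        Application.StatusBar = Now & " transforming " & sCsvPathFile & " " & iRow
        Do While True
            If oSource.AtEndOfStream Then Exit Do
            ' Read one character at a time and count the "|" delimiters.
            sChar = oSource.Read(1)
            If sChar = "|" Then iDel = iDel + 1
            If iDel = iFields And sChar = "|" Then
                ' The delimiter count reached iFields: the row is complete, so write it out.
                oDest.WriteLine sStr: Exit Do
            Else
                ' Drop the unwanted characters (255, 254, null, CR, LF) before buffering.
                sChar = Replace(sChar, Chr(255), "")
                sChar = Replace(sChar, Chr(254), "")
                sChar = Replace(sChar, Chr(0), "")
                sChar = Replace(sChar, Chr(13), "")
                sChar = Replace(sChar, Chr(10), "")
                sStr = sStr & sChar
            End If
        Loop
    Loop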
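
For comparison, if the unwanted bytes are simply UTF-16 encoding (an assumption, not something confirmed in this thread), an alternative to stripping them character by character is to decode the file as UTF-16 and write it back in the encoding that is wanted. The sketch below is in Python rather than the VBA above. The function name and file names are hypothetical, and it does not reproduce the delimiter counting or the CR/LF removal of the VBA loop.

```python
import os
import tempfile

def transcode(src_path, dst_path, src_encoding="utf-16", dst_encoding="utf-8"):
    """Decode src_path using src_encoding and rewrite it using dst_encoding."""
    # newline="" leaves the original line endings untouched on read and write.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)

# Usage with a throwaway file standing in for the transformation output.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "out_utf16.txt")  # hypothetical file name
    dst = os.path.join(tmp, "out_utf8.txt")   # hypothetical file name
    with open(src, "w", encoding="utf-16", newline="") as f:
        f.write("ABB|Abbott Laboratories Limited\n")
    transcode(src, dst)
    with open(dst, "rb") as f:
        result = f.read()
    print(result)  # b'ABB|Abbott Laboratories Limited\n'
```

Setting the encoding on xsl:output, as suggested earlier in the thread, avoids the need for any such post-processing step.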

rip
Posted: Sunday, November 22, 2009 6:51:43 AM
Or, you could read the spec.

To read:
http://www.w3.org/TR/2008/REC-xml-20081126/
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
http://www.w3.org/TR/2007/REC-xpath20-20070123/
http://www.w3.org/TR/2007/REC-xslt20-20070123/

Not to read, to refer to:
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

You might also consider working through the various modules at w3schools.com.

