Rank: Member
Joined: 9/22/2009 Posts: 15 Location: USA
This is the beginning of the XML file:
<?xml version="1.0" encoding="UTF-8" ?>
<extract createDate="2009-08-14">
  <manufacturerList>
    <manufacturer id="ABB">Abbott Laboratories Limited</manufacturer>
    <manufacturer id="ABH">Manufacturer name not currently available</manufacturer>
    <manufacturer id="ABI">Abiogen Pharma</manufacturer>
    <manufacturer id="ABP">Manufacturer name not currently available</manufacturer>
    <manufacturer id="ABT">Albert Pharma Inc - " DO NOT USE "</manufacturer>
    <manufacturer id="ACT">Actelion ....
The entire file is here: http://www.HumanGenome.org/msdn/ontData.xml
This is an example XSLT stylesheet:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="text" />
  <xsl:template match="/*">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="//extract/manufacturerList/manufacturer">
    <xsl:value-of select="@id"/>|<xsl:value-of select="."/><xsl:text>
</xsl:text>
  </xsl:template>
  <xsl:template match="text()" />
</xsl:stylesheet>
The output comes out OK, data-wise: everything is in a columnar format, with values separated by the delimiter "|". The problem is that the beginning of each line has char(255) and char(254), and each legit character of the text is followed by a char(0) (null) character.
I tried encoding="ISO-8859-1" instead of UTF-8, with the same result. I strip out the characters I hate with an intermediate process, but why is this happening? Who is to blame here? I used the same XSLT syntax and transformation for other countries' files, and everything came out as ASCII (or nice Unicode, I think).
Please, gimme more reason to like XML!
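For reference, the symptoms described above (bytes 255 and 254 at the start, a NUL after every ordinary character) match text encoded as UTF-16 little-endian with a byte-order mark (FF FE). The short Python sketch below is only an illustration; the sample line is built from the first manufacturer entry in the XML above. It encodes one output line that way, shows the same bytes, and shows that decoding as UTF-16 recovers the text while reading one byte per character yields the "junk". That the transformation actually wrote UTF-16 is an inference from the symptoms, not something the thread confirms.

```python
import codecs


def encode_utf16le_with_bom(text: str) -> bytes:
    """Encode text as UTF-16 little-endian, preceded by the FF FE byte-order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# One output line, built from the first <manufacturer> entry in the sample XML
line = "ABB|Abbott Laboratories Limited\n"
raw = encode_utf16le_with_bom(line)

# First bytes: the BOM (255, 254), then each ASCII character followed by a NUL (0)
print(list(raw[:8]))  # [255, 254, 65, 0, 66, 0, 66, 0]

# Reading the bytes one per character (Latin-1 here) reproduces the unwanted characters
print(ascii(raw.decode("latin-1")[:6]))  # '\xff\xfeA\x00B\x00'

# Decoding as UTF-16 uses the BOM to pick the byte order and then drops it
print(raw.decode("utf-16") == line)  # True
```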
Rank: Advanced Member
Joined: 7/17/2008 Posts: 185 Location: Minutiae, Triviality
Hi,
When you say you "tried" other encodings, can you state _where_ you set the other encoding?
Like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
or like this:
<xsl:output method="text" encoding="ISO-8859-1"/>
I ask because your example has no encoding attribute on xsl:output, so I have to wonder whether you were putting it in the right place.
I'm assuming that if you _could_, you _would_ have attached an example of the file data in question.
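The distinction drawn above (the XML declaration versus xsl:output) can be illustrated outside XSLT. The Python sketch below uses the standard library's xml.etree.ElementTree purely as an analogy for the XSLT processor: the encoding="UTF-8" declaration only tells the parser how to decode the input, while the bytes of the result depend entirely on the encoding chosen when the text is written. The sample is two entries copied from the question, closed off so the fragment parses; `to_pipe_lines` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Two entries copied from the question's XML, closed off so the fragment parses
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8" ?>'
    b'<extract createDate="2009-08-14"><manufacturerList>'
    b'<manufacturer id="ABB">Abbott Laboratories Limited</manufacturer>'
    b'<manufacturer id="ABI">Abiogen Pharma</manufacturer>'
    b'</manufacturerList></extract>'
)


def to_pipe_lines(xml_bytes: bytes) -> str:
    """Hypothetical helper: emit id|name lines, like the question's stylesheet."""
    root = ET.fromstring(xml_bytes)  # the declaration governs how these bytes are decoded
    return "".join(f"{m.get('id')}|{m.text}\n" for m in root.iter("manufacturer"))


text = to_pipe_lines(SAMPLE)

# The same text serialized two ways; only this step decides the output bytes
as_latin1 = text.encode("iso-8859-1")  # one byte per character, no NULs
as_utf16 = text.encode("utf-16")       # BOM plus two bytes per character

print(text, end="")
print(len(as_latin1), len(as_utf16))
```

In XSLT terms, the encode step corresponds to the encoding attribute on xsl:output, which is why changing only the declaration at the top of the file had no effect on the output.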
Rank: Member
Joined: 9/22/2009 Posts: 15 Location: USA
Bummer..... Of course, I was doing this one: <?xml version="1.0" encoding="ISO-8859-1" ?>
Was not aware of this: <xsl:output method="text" encoding="ISO-8859-1"/>
And the second one solved the problem, and I hate XML a little bit more: every problem has a solution that is very hard to find unless you go to a remote island with tons of O'Reilly books and read them from cover to cover.
I had already written the code to clean up the weird characters... and then saw your post (2 hours wasted).
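Code:
' oFs (a Scripting.FileSystemObject), sCsvPathFile, sTextPathFile and iFields are set earlier
Set oSource = oFs.OpenTextFile(sCsvPathFile, 1)   ' 1 = ForReading
Set oDest = oFs.CreateTextFile(sTextPathFile, True)
Do Until oSource.AtEndOfStream
    iDel = -1: sStr = "": iRow = iRow + 1
    Application.StatusBar = Now & " transforming " & sCsvPathFile & " " & iRow
    Do While True
        If oSource.AtEndOfStream Then Exit Do
        sChar = oSource.Read(1)
        If sChar = "|" Then iDel = iDel + 1
        If iDel = iFields And sChar = "|" Then
            ' Record complete: write it, clear the buffer, start the next record
            oDest.WriteLine sStr: sStr = "": Exit Do
        Else
            ' Drop the byte-order mark (255, 254), the NULs and the line breaks
            sChar = Replace(sChar, Chr(255), "")
            sChar = Replace(sChar, Chr(254), "")
            sChar = Replace(sChar, Chr(0), "")
            sChar = Replace(sChar, Chr(13), "")
            sChar = Replace(sChar, Chr(10), "")
            sStr = sStr & sChar
        End If
    Loop
Loop
' Write a last record that hit end of file without a closing delimiter
If Len(sStr) > 0 Then oDest.WriteLine sStr
oSource.Close
oDest.Close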
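An alternative to stripping characters one at a time is to decode the file as what it appears to be, UTF-16, and write it back out in a single-byte encoding. The sketch below is in Python rather than the poster's VBA; `reencode` and the file names are hypothetical, and it assumes the input starts with a UTF-16 byte-order mark, as the symptoms suggest. Unlike the byte-stripping loop, it does not quietly mangle characters above U+00FF (whose second byte is not a NUL); encoding those as ISO-8859-1 raises an error instead.

```python
import os
import tempfile


def reencode(src_path: str, dest_path: str, dest_encoding: str = "iso-8859-1") -> None:
    """Hypothetical helper: rewrite a UTF-16 text file in a single-byte encoding."""
    # "utf-16" reads the FF FE byte-order mark, picks the byte order from it and drops it
    with open(src_path, encoding="utf-16", newline="") as src:
        text = src.read()
    # newline="" keeps the original line endings; characters the target
    # encoding cannot represent raise UnicodeEncodeError rather than vanish
    with open(dest_path, "w", encoding=dest_encoding, newline="") as dest:
        dest.write(text)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "extract_utf16.txt")
        dest = os.path.join(tmp, "extract_latin1.txt")
        # Build a small UTF-16LE file with a BOM, like the suspected transformation output
        with open(src, "wb") as f:
            f.write(b"\xff\xfe" + "ABB|Abbott Laboratories Limited\r\n".encode("utf-16-le"))
        reencode(src, dest)
        with open(dest, "rb") as f:
            print(f.read())  # b'ABB|Abbott Laboratories Limited\r\n'
```

Within VBA itself, passing a format argument to OpenTextFile (-1 opens the file as Unicode) would be the equivalent of decoding instead of stripping.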
Rank: Advanced Member
Joined: 7/17/2008 Posts: 185 Location: Minutiae, Triviality
Or, you could read the spec.
To read:
http://www.w3.org/TR/2008/REC-xml-20081126/
http://www.w3.org/TR/2004/REC-xmlschema-0-20041028/
http://www.w3.org/TR/2007/REC-xpath20-20070123/
http://www.w3.org/TR/2007/REC-xslt20-20070123/
Not to read, to refer to:
http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/
http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/
You might also consider working through the various modules in w3schools.com