| sy27295 |
| Member |
|
| USA |
| Tuesday, September 22, 2009 |
| Saturday, November 21, 2009 9:47:13 PM |
|
Bummer... Of course, I was changing this one: <?xml version="1.0" encoding="ISO-8859-1" ?>
I was not aware of this one: <xsl:output method="text" encoding="ISO-8859-1"/>
The second one solved the problem, and I hate XML a little bit more: every problem has a solution that is nearly impossible to find unless you go to a remote island with a ton of O'Reilly books and read them from cover to cover.
I had already written the code to clean up the weird characters... and then saw your posting (2 hours wasted).
Code:
Set oSource = oFs.OpenTextFile(sCsvPathFile, 1)
Set oDest = oFs.CreateTextFile(sTextPathFile, True)
Do Until oSource.atEndOfStream
    iDel = -1: sStr = "": iRow = iRow + 1
    Application.StatusBar = Now & " transforming " & sCsvPathFile & " " & iRow
    Do While True
        If oSource.atEndOfStream Then Exit Do
        sChar = oSource.read(1)
        If sChar = "|" Then iDel = iDel + 1
        If iDel = iFields And sChar = "|" Then
            oDest.writeLine sStr: Exit Do
        Else
            sChar = Replace(sChar, Chr(255), "")
            sChar = Replace(sChar, Chr(254), "")
            sChar = Replace(sChar, Chr(0), "")
            sChar = Replace(sChar, Chr(13), "")
            sChar = Replace(sChar, Chr(10), "")
            sStr = sStr & sChar
        End If
    Loop
Loop
|
This is the beginning of the XML File: <?xml version="1.0" encoding="UTF-8" ?><extract createDate="2009-08-14"><manufacturerList><manufacturer id="ABB">Abbott Laboratories Limited</manufacturer><manufacturer id="ABH">Manufacturer name not currently available</manufacturer><manufacturer id="ABI">Abiogen Pharma</manufacturer><manufacturer id="ABP">Manufacturer name not currently available</manufacturer><manufacturer id="ABT">Albert Pharma Inc - " DO NOT USE "</manufacturer><manufacturer id="ACT">Actelion ....
The Entire File is here: http://www.HumanGenome.org/msdn/ontData.xml
This is an example xslt template file:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="text" />
<xsl:template match="/*"> <xsl:apply-templates /> </xsl:template>
<xsl:template match="//extract/manufacturerList/manufacturer"> <xsl:value-of select="@id"/>|<xsl:value-of select="."/><xsl:text>
</xsl:text> </xsl:template>
<xsl:template match="text()" />
</xsl:stylesheet>
The output comes out OK, data-wise. Everything is in a columnar format, with values separated by the delimiter "|". The problem is that the beginning of each line has char(255) and char(254), and each legitimate character of the text has a char(0) (null character) right after it.
I tried >>>encoding="ISO-8859-1"<<< instead of UTF-8; still the same result. I strip out the characters that I hate with an intermediary process, but why is it happening? Who is to blame here? I used this same XSLT syntax and transformation for other countries' files, and everything was ASCII (or nice Unicode, I think).
Please, gimme more reason to like XML!
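A note on the symptom described above: a leading char(255), char(254) pair followed by a char(0) after every character is the byte pattern of UTF-16 little-endian text with a byte-order mark, viewed one byte at a time. That would fit a processor that falls back to UTF-16 when `<xsl:output>` names no encoding, although that is an assumption about the tool used here, not something the post states. The Python sketch below only illustrates the byte pattern; the sample row is modelled on the manufacturer list, and the variable names are made up for illustration.

```python
import codecs

# Hypothetical sample row, modelled on the manufacturer list in the post.
row = "ABB|Abbott Laboratories Limited\n"

# UTF-16 little-endian with a byte-order mark: FF FE, then two bytes per character.
raw = codecs.BOM_UTF16_LE + row.encode("utf-16-le")

# Viewed one byte at a time: 255, 254, then each ASCII character followed by 0.
first_bytes = list(raw[:6])

# Decoding with the matching codec consumes the byte-order mark and restores
# the text, which can then be written out in a single-byte encoding if needed.
text = raw.decode("utf-16")
latin1_bytes = text.encode("iso-8859-1")

print(first_bytes)
print(latin1_bytes)
```

Decoding and re-encoding in this way avoids stripping bytes by hand. Naming the encoding on `<xsl:output>`, as in the follow-up post, avoids the problem at the source.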
|
Vlad
Biiiiiiiiingo. You have no idea how much experimentation I did on this thing.
It is fair!!! a) I did spend good money from upstairs (LP) on Altova (downstairs) at Cummings. If you don't believe me, ask your networking guy [DT] (P.S. he is a good guy). We are supporters of Altova.
I love to solve problems without complicated tools.
Thank you so much..... I am happy until the next problem.
|
I cannot figure it out for the life of me... I know it is something simple for you guys. (The sample output is hand-constructed; hopefully I did not make any mistakes.)
This is the XSLT that I could come up with:
++++++start++++++++++++++++++++++++++++++++++++
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="text" />
  <xsl:template match="/">
    <xsl:apply-templates />
  </xsl:template>
  <xsl:template match="/extract/formulary/pcg2/pcg6/genericName/pcgGroup/pcg9">
    <xsl:value-of select="/extract/formulary/pcg2/pcg6/@id"/>|<xsl:value-of select="@id"/> <xsl:text>
</xsl:text>
  </xsl:template>
  <xsl:template match="text()" />
</xsl:stylesheet>
++++++end++++++++++++++++++++++++++++++++++++
I get this (00005 is the first parent @id, repeated):
00005|040000086
00005|040000087
00005|040000014
00005|040000013
00005|040000084
00005|040000061
00005|040000022
But I want to get this, with nested loops (two templates, maybe?):
00005|040000086
00005|040000087
00002|040000014
00002|040000013
01195|040000084
00015|040000061
00003|040000022
This is where the entire XML file is: http://www.HumanGenome.org/msdn/ontData.xml
Here is the relevant section:
Code:
<extract createDate="2009-08-14">
  ......
    <manufacturer id="ZYN">Zymcan Pharmaceuticals Inc.</manufacturer>
  </manufacturerList>
  <formulary edition="4101" updateVer="K" formularyDate="2009-08-18" createDate="2009-08-14">
    <pcg2 id="040000000">
      <name>ANTIHISTAMINICS</name>
      <pcg6>
        <genericName id="00005">
          <name>CETIRIZINE HYDROCHLORIDE</name>
          <pcgGroup>
            <pcg9 id="040000086">
              <itemNumber>0001</itemNumber>
              <strength>10mg</strength>
              <dosageForm>Tab</dosageForm>
              <drug id="02223554" notABenefit="Y" sec3="Y">
                <name>Reactine</name>
                <manufacturerId>MCL</manufacturerId>
                <listingDate>2007-12-19</listingDate>
              </drug>
              <drug id="02231603" notABenefit="Y" sec3="Y">
                <name>Apo-Cetirizine</name>
                <manufacturerId>APX</manufacturerId>
                <listingDate>2007-12-19</listingDate>
              </drug>
            </pcg9>
            <pcg9 id="040000087">
              <itemNumber>0002</itemNumber>
              <strength>20mg</strength>
              <dosageForm>Tab</dosageForm>
              <drug id="01900978" notABenefit="Y" sec3="Y">
                <name>Reactine</name>
                <manufacturerId>MCL</manufacturerId>
                <listingDate>2008-12-23</listingDate>
              </drug>
              <drug id="02315963" notABenefit="Y" sec3="Y">
                <name>PMS-Cetirizine</name>
                <manufacturerId>PMS</manufacturerId>
                <listingDate>2008-12-23</listingDate>
              </drug>
            </pcg9>
          </pcgGroup>
        </genericName>
        <genericName id="00002">
          <name>DIPHENHYDRAMINE HCL</name>
          <pcgGroup>
            <pcg9 id="040000014" suppliedBy="L">
              <itemNumber>0003</itemNumber>
              <strength>25mg</strength>
              <dosageForm>Cap</dosageForm>
              <drug id="00022756" notABenefit="Y" sec3="Y">
                <name>Benadryl</name>
                <manufacturerId>PDA</manufacturerId>
                <listingDate>1996-10-01</listingDate>
              </drug>
              <drug id="00370517" notABenefit="Y" sec3="Y">
                <name>Allerdryl</name>
                <manufacturerId>VAL</manufacturerId>
                <listingDate>1996-10-01</listingDate>
              </drug>
            </pcg9>
            <pcg9 id="040000013" suppliedBy="L">
              <itemNumber>0004</itemNumber>
              <strength>50mg</strength>
              <dosageForm>Cap</dosageForm>
              <drug id="00022764" notABenefit="Y" sec3="Y">
                <name>Benadryl</name>
                <manufacturerId>PDA</manufacturerId>
                <listingDate>1996-10-01</listingDate>
              </drug>
              <drug id="00271411" notABenefit="Y" sec3="Y">
                <name>Allerdryl</name>
                <manufacturerId>VAL</manufacturerId>
                <listingDate>1996-10-01</listingDate>
              </drug>
            </pcg9>
          </pcgGroup>
        </genericName>
        .....
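One way to read the wanted output: each pcg9 row should carry the id of its own enclosing genericName. An absolute path (one starting with `/`) is evaluated from the document root no matter which pcg9 is current, and `xsl:value-of` on a node-set outputs only the first node in document order, which would explain the repeated 00005. In XSLT, a path relative to the current pcg9, such as `ancestor::genericName/@id`, is the usual way to reach the enclosing element; that expression is a suggestion that has not been run against the full file. The Python sketch below stands in for the stylesheet only to show the nested traversal. `SAMPLE` is a trimmed copy of the section above with the drug details dropped, and the function name `rows` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the relevant section (drug details omitted for brevity).
SAMPLE = """
<extract createDate="2009-08-14">
  <formulary edition="4101">
    <pcg2 id="040000000">
      <pcg6>
        <genericName id="00005">
          <pcgGroup>
            <pcg9 id="040000086"/>
            <pcg9 id="040000087"/>
          </pcgGroup>
        </genericName>
        <genericName id="00002">
          <pcgGroup>
            <pcg9 id="040000014"/>
            <pcg9 id="040000013"/>
          </pcgGroup>
        </genericName>
      </pcg6>
    </pcg2>
  </formulary>
</extract>
"""


def rows(xml_text):
    """Return 'genericName id|pcg9 id' for every pcg9, using its own parent group."""
    root = ET.fromstring(xml_text)
    result = []
    # Outer loop: each genericName; inner loop: only the pcg9 elements inside it.
    for generic in root.iter("genericName"):
        for pcg9 in generic.iter("pcg9"):
            result.append(f"{generic.get('id')}|{pcg9.get('id')}")
    return result


for line in rows(SAMPLE):
    print(line)
```

The point of the two loops is that the parent id is looked up relative to the current pcg9, not from the top of the document.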
|
|