Altova Mailing List Archives


Re: XSLT how do I get info from external html web page

From: "Neil Smith [MVP Digital Media]" <neil@------.--->
To: NULL
Date: 6/6/2005 8:45:00 PM
Well, let's try to stick with XSL for a few more moments: are you in a
position where you can get your document to read like this?

<?xml version="1.0"?>
<SUBJECT-COMPANY> 
	<CIK> 0000925645</CIK>
	<NAME> Big Corporation</NAME>
	<BARF>My barf node </BARF>
</SUBJECT-COMPANY>

If you can get it to that point then you can process it (within certain
limits). So let's say you save that as docinclude.xml.

Then (let's assume it's in the same directory as the XSL file, to make
the file path simple) you can do stuff like this as your XSL : 


<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" />

<xsl:variable name="lookupDoc" select="document('docinclude.xml')"/>

<xsl:template match="/">
    <html>
        <body>
            <p>The value of the CIK node was : 
                <xsl:value-of select="$lookupDoc/SUBJECT-COMPANY/CIK"/>
            </p>
        </body>
    </html>
</xsl:template>

</xsl:stylesheet>

So, there you can see you've loaded the document 'docinclude.xml' into
a variable (it still has to be well-formed XML though). And then you can
reference the nodes inside it using XPath just as if they were in your
originating XML document - it's a useful way to include external
resources into something you're processing.
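
If the text file can't be changed at the source, a small preprocessing
step outside XSLT could produce docinclude.xml for you. Here's a minimal
sketch in Python (not something from this thread, just an illustration).
It assumes every unclosed element sits on its own line as "<TAG> value",
the way CIK and NAME do in your sample; the function name close_leaf_tags
is made up for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Matches a line holding an opening tag followed by some text, e.g. "<CIK> 0000925645".
# Lines that contain only a tag (open or close) do not match and pass through unchanged.
LEAF = re.compile(r"^(\s*)<([A-Za-z][\w.-]*)>\s*(\S.*?)\s*$")


def close_leaf_tags(text):
    """Return an XML string in which every '<TAG> value' line gets a closing tag.

    Assumption: one leaf element per line, and the value contains no markup.
    """
    out = ['<?xml version="1.0"?>']
    for line in text.splitlines():
        m = LEAF.match(line)
        if m and "<" not in m.group(3):
            indent, tag, value = m.groups()
            # escape() protects characters such as '&' in company names
            out.append(f"{indent}<{tag}>{escape(value)}</{tag}>")
        elif line.strip():
            out.append(line.rstrip())
    return "\n".join(out) + "\n"


SAMPLE = """<SUBJECT-COMPANY>
      <CIK> 0000925645
      <NAME> Big Corporation
</SUBJECT-COMPANY>
"""

fixed = close_leaf_tags(SAMPLE)
# Parsing the result is a quick check that it is now well-formed XML.
root = ET.fromstring(fixed)
print(root.findtext("CIK"))
```

Writing the returned string to docinclude.xml next to the stylesheet
would then let the document() call above pick it up. Note the values are
trimmed here, so the leading space in front of the CIK goes away.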

Regarding the URL source, if you want to process it using an XML tool
you've no real option but to convert it into valid XML first.

There's an example of using HTMLTidy with VB (in FrontPage) here : 
http://www.suodenjoki.dk/us/productions/articles/tidy_integration_article.htm
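
If all you need from the page is the title, another route is to skip the
XML conversion and read the HTML with a forgiving HTML parser. A rough
sketch in Python using only the standard library follows. The names
TitleParser, extract_title and fetch_title are made up for illustration,
the sample HTML is invented, and fetch_title is shown only for
completeness; I haven't run it against the EDGAR page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element; tolerates sloppy HTML."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> arrives as "title"
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            # Collapse runs of whitespace and line breaks into single spaces
            self.title = " ".join("".join(self._parts).split())


def extract_title(html):
    """Return the page title as a string, or None if there is no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def fetch_title(url):
    """Download a page and return its title (needs network access)."""
    with urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_title(resp.read().decode(charset, errors="replace"))


# Invented sample with an unclosed <P>, which an XML parser would reject
page = "<HTML><HEAD><TITLE> Some  Company\nPage </TITLE></HEAD><BODY><P>text</BODY></HTML>"
print(extract_title(page))
```

The title could then be written into a small XML file and pulled into
the stylesheet through document(), the same way as docinclude.xml.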

HTH
Cheers - Neil


On Mon, 6 Jun 2005 11:46:03 -0700, "Frank"
<Frank@d...> wrote:

>Thanks!  That explains why I couldn't get the document() function to work.  
>Can I use VBScript while stepping through the XSLT file?  And, if so, could 
>VBScript do the job?
>
>"Neil Smith [MVP Digital Media]" wrote:
>
>> If your "plain text file" contained XML you could use the document()
>> function as described here : 
>> 
>> http://www-128.ibm.com/developerworks/xml/library/x-tipcombxslt/
>> http://www.xml.com/pub/a/2002/03/06/xslt.html
>> 
>> But, since your plain text is not XML (it has no closing tags for CIK
>> and NAME for example) you probably won't be able to extract the CIK
>> node content using XSLT.
>> 
>> It might work better though if you were able to preprocess your text
>> file to be actual XML 
>> 
>> You also *cannot* go to the web page and 'extract the title' using XML
>> http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645
>> 
>> It's not XML it's HTML4, and can't be read by an XML parser. So you
>> need to have some mechanism (such as running HTMLTidy on your web
>> server) to convert the HTML to real XML, or you need some other
>> scripting language such as ASP or PHP or ..... to extract the title
>> element from the HTML. PHP can execute fopen(url) on that web page,
>> you can then use a regular expression to extract the title.
>> 
>> Cheers - Neil
>> 
>> 
>> On Mon, 6 Jun 2005 08:07:17 -0700, "Frank"
>> <Frank@d...> wrote:
>> 
>> >I am stepping through an xslt stylesheet and transforming an xml file into 
>> >html output.  At some point in the xslt file I need to go to a plain text 
>> >file which is in the same location as the xslt file (C:\data\) and retrieve 
>> >the CIK.  The text file looks like this:
>> >
>> ><SUBJECT-COMPANY> 
>> >      <CIK> 0000925645
>> >      <NAME> Big Corporation
>> ></SUBJECT-COMPANY>
>> >
>> >and then I need to go to the webpage below and grab the Title value and plug 
>> >that into my html output.
>> >
>> >http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645
>> >
>> >Any ideas?
>> 
>> 


