microsoft.public.xsl: Re: XSLT how do I get info from external html web page
Date: 6/6/2005 8:45:00 PM

Well, let's try to stick with XSL for a few more moments. Are you in a
position where you can get your document to read like this?
<?xml version="1.0"?>
<SUBJECT-COMPANY>
<CIK> 0000925645</CIK>
<NAME> Big Corporation</NAME>
<BARF>My barf node </BARF>
</SUBJECT-COMPANY>
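If your file still looks like the one in your first message (no closing tags for CIK and NAME), a small preprocessing step outside XSLT could produce the form above. Below is a minimal Python sketch. It assumes that each unclosed element sits on its own line as `<TAG> value`, that the container tags are already paired, and that the file is UTF-8 or plain ASCII. The function names and any file paths you pass in are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Matches a line like " <CIK> 0000925645": an opening tag followed by text,
# with no closing tag on the same line. Closing tags ("</...") never match.
UNCLOSED = re.compile(r"^(\s*)<([A-Za-z][\w.-]*)>([^<]*)$")


def close_simple_tags(text: str) -> str:
    """Add a matching end tag to each '<TAG> value' line (assumed layout)."""
    fixed = []
    for line in text.splitlines():
        m = UNCLOSED.match(line)
        # Container lines such as "<SUBJECT-COMPANY>" carry no value, so skip them.
        if m and m.group(3).strip():
            indent, tag, value = m.groups()
            # Escape '&' and '>' so the value stays legal XML text.
            line = f"{indent}<{tag}>{escape(value.rstrip())}</{tag}>"
        fixed.append(line)
    body = "\n".join(fixed) + "\n"
    if not body.lstrip().startswith("<?xml"):
        body = '<?xml version="1.0"?>\n' + body
    return body


def convert_file(src_path: str, dest_path: str) -> None:
    """Read the plain-text file and write a well-formed XML copy."""
    # Assumes UTF-8 (or ASCII) input; change the encoding if yours differs.
    with open(src_path, encoding="utf-8") as src:
        xml_text = close_simple_tags(src.read())
    ET.fromstring(xml_text)  # fail early if the result still isn't well-formed
    with open(dest_path, "w", encoding="utf-8") as dest:
        dest.write(xml_text)
```

For example, `convert_file(r"C:\data\subject.txt", r"C:\data\docinclude.xml")` would write docinclude.xml next to the stylesheet. Both file names are hypothetical. The CIK keeps its leading space, just as it does in the sample above.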
If you can get it into that form, then you can process it (within certain
limits). So let's say you save that as docinclude.xml.
Then (assuming it's in the same directory as the XSL file, to keep
the file path simple) you can do something like this in your XSL:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />
  <!-- A relative URI passed to document() as a string is resolved against
       the stylesheet's own location, so docinclude.xml sits next to the XSL file. -->
  <xsl:variable name="lookupDoc" select="document('docinclude.xml')"/>
  <xsl:template match="/">
    <html>
      <body>
        <p>The value of the CIK node was:
          <xsl:value-of select="$lookupDoc/SUBJECT-COMPANY/CIK"/>
        </p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
So there you can see you've loaded the document 'docinclude.xml' into
a variable (it still has to be well-formed XML, though). You can then
reference the nodes inside it using XPath just as if they were in your
originating XML document. It's a useful way to include external
resources in whatever you're processing.
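If you want to check docinclude.xml outside the XSLT processor first, here is a rough Python equivalent of the lookup the stylesheet performs. It uses the standard library's ElementTree, which supports only a limited subset of XPath. It mirrors the same path but is not how document() works internally. Because the stored CIK has a leading space, the sketch trims whitespace roughly the way XPath's normalize-space() would. The function name `lookup_cik` is hypothetical.

```python
import xml.etree.ElementTree as ET


def lookup_cik(xml_text: str) -> str:
    """Mimic $lookupDoc/SUBJECT-COMPANY/CIK, trimmed like normalize-space()."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    # ElementTree hands back the document element itself, so check its name
    # here instead of writing "/SUBJECT-COMPANY" in the path.
    if root.tag != "SUBJECT-COMPANY":
        raise ValueError(f"unexpected root element {root.tag!r}")
    cik = root.findtext("CIK")
    if cik is None:
        raise ValueError("no CIK element found")
    # Collapse and trim whitespace (approximately what normalize-space() does).
    return " ".join(cik.split())
```

Run against a file that still lacks its closing tags, this raises `ParseError`. That is the same reason document() can't load such a file.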
Regarding the URL source: if you want to process it with an XML tool,
you have no real option but to convert it into well-formed XML first.
There's an example of using HTMLTidy with VB (in FrontPage) here:
http://www.suodenjoki.dk/us/productions/articles/tidy_integration_article.htm
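If you'd rather take the scripting route from my earlier reply (fetch the page, then pull out the title), here is a minimal Python sketch of that idea. It uses the standard library's html.parser instead of the regular expression suggested there, because html.parser tolerates HTML that isn't well-formed XML. The URL pattern comes from your original question. The function names are hypothetical, and you should check that the page's <title> really holds the value you want.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <TITLE> arrives as "title".
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html_text: str) -> str:
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())  # tidy up whitespace


def edgar_url(cik: str) -> str:
    # Same query string as the URL in the original question.
    query = urlencode({"action": "getcompany", "CIK": cik.strip()})
    return "http://www.sec.gov/cgi-bin/browse-edgar?" + query


def fetch_title(cik: str) -> str:
    # Network call. The site may have changed since 2005, and it may
    # reject requests that don't send a descriptive User-Agent header.
    with urlopen(edgar_url(cik)) as resp:
        charset = resp.headers.get_content_charset() or "latin-1"
        return extract_title(resp.read().decode(charset, errors="replace"))
```

For example, `fetch_title("0000925645")` would request the page from your question and return its title text. You could then write that title into an XML file for the stylesheet to pick up with document().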
HTH
Cheers - Neil
On Mon, 6 Jun 2005 11:46:03 -0700, "Frank"
<Frank@d...> wrote:
>Thanks! That explains why I couldn't get the document() function to work.
>Can I use VBScript while stepping through the XSLT file? And, if so, could
>VBScript do the job?
>
>"Neil Smith [MVP Digital Media]" wrote:
>
>> If your "plain text file" contained XML, you could use the document()
>> function as described here:
>>
>> http://www-128.ibm.com/developerworks/xml/library/x-tipcombxslt/
>> http://www.xml.com/pub/a/2002/03/06/xslt.html
>>
>> But, since your plain text is not XML (it has no closing tags for CIK
>> and NAME for example) you probably won't be able to extract the CIK
>> node content using XSLT.
>>
>> It might work better, though, if you were able to preprocess your text
>> file into actual XML.
>>
>> You also *cannot* go to this web page and 'extract the title' using XML tools:
>> http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645
>>
>> It's not XML; it's HTML 4, and an XML parser can't read it. So you
>> need some mechanism (such as running HTMLTidy on your web
>> server) to convert the HTML to real XML, or you need some other
>> scripting language, such as ASP or PHP, to extract the title
>> element from the HTML. PHP can execute fopen(url) on that web page,
>> and you can then use a regular expression to extract the title.
>>
>> Cheers - Neil
>>
>>
>> On Mon, 6 Jun 2005 08:07:17 -0700, "Frank"
>> <Frank@d...> wrote:
>>
>> >I am stepping through an xslt stylesheet and transforming an xml file into
>> >html output. At some point in the xslt file I need to go to a plain text
>> >file which is in the same location as the xslt file (C:\data\) and retrieve
>> >the CIK. The text file looks like this:
>> >
>> ><SUBJECT-COMPANY>
>> > <CIK> 0000925645
>> > <NAME> Big Corporation
>> ></SUBJECT-COMPANY>
>> >
>> >and then I need to go to the webpage below and grab the Title value and plug
>> >that into my html output.
>> >
>> >http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645
>> >
>> >Any ideas?
>>
>>