Altova Mailing List Archives


Re: XSLT how do I get info from external html web page

From: "Neil Smith [MVP Digital Media]" <neil@------.--->
To: NULL
Date: 6/6/2005 5:18:00 PM
If your "plain text file" contained XML, you could use the document()
function as described here:

http://www-128.ibm.com/developerworks/xml/library/x-tipcombxslt/
http://www.xml.com/pub/a/2002/03/06/xslt.html

But since your plain text is not XML (it has no closing tags for CIK
and NAME, for example), document() will hand it to an XML parser that
rejects it, so you probably won't be able to extract the CIK node
content using XSLT.

It might work better, though, if you were able to preprocess your text
file into actual XML.
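
As a rough illustration of that preprocessing step, here is a minimal
sketch in Python (my choice of language, not something from your setup).
It assumes every unclosed element sits on a single line in the form
"<TAG> value", as in your sample; the function name sgml_to_xml is made
up for this example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Matches a line such as "   <CIK> 0000925645": an opening tag followed by a
# value on the same line and no closing tag. Lines that contain only a tag,
# a closing tag, or an already closed element do not match.
UNCLOSED = re.compile(r"^(\s*)<([A-Za-z][\w-]*)>\s*([^<\s][^<]*?)\s*$")


def sgml_to_xml(text):
    """Append the missing closing tag to every "<TAG> value" line."""
    fixed = []
    for line in text.splitlines():
        match = UNCLOSED.match(line)
        if match:
            indent, tag, value = match.groups()
            # escape() protects characters such as "&" inside the value.
            fixed.append(f"{indent}<{tag}>{escape(value)}</{tag}>")
        else:
            fixed.append(line)
    return "\n".join(fixed)


if __name__ == "__main__":
    # Sample data copied from the question; a real script would read the file.
    sample = (
        "<SUBJECT-COMPANY> \n"
        "      <CIK> 0000925645\n"
        "      <NAME> Big Corporation\n"
        "</SUBJECT-COMPANY>"
    )
    root = ET.fromstring(sgml_to_xml(sample))
    print(root.findtext("CIK"))
```

Once the file has been rewritten like this, document() should be able to
load it, provided the rest of the file follows the same pattern.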

You also *cannot* go to the web page and 'extract the title' using an
XML tool:
http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645

It's not XML, it's HTML 4, and it can't be read by an XML parser. So
you need either some mechanism (such as running HTML Tidy on your web
server) to convert the HTML to real XML, or some other scripting
language, such as ASP or PHP, to extract the title element from the
HTML. PHP, for example, can execute fopen(url) on that web page, and
you can then use a regular expression to extract the title.
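
To make the fetch-then-regex idea concrete, here is a minimal sketch in
Python rather than PHP (urlopen plays the role of fopen(url)). The
function names extract_title and fetch_title are made up for this
example, and the regular expression assumes the page has a single,
ordinary <title> element.

```python
import re
from html import unescape
from urllib.request import urlopen

# Case-insensitive because HTML 4 pages often write <TITLE>; DOTALL lets the
# title text span several lines.
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Decode entities such as &amp; and collapse runs of whitespace.
    return " ".join(unescape(match.group(1)).split())


def fetch_title(url, timeout=10):
    """Download a page and return its title (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_title(html)
```

You would call fetch_title() with the browse-edgar URL and write the
result wherever your HTML output needs it. Some servers reject requests
that carry no User-Agent header, in which case the request would have to
be built with urllib.request.Request and explicit headers.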

Cheers - Neil


On Mon, 6 Jun 2005 08:07:17 -0700, "Frank"
<Frank@d...> wrote:

>I am stepping through an xslt stylesheet and transforming an xml file into 
>html output.  At some point in the xslt file I need to go to a plain text 
>file which is in the same location as the xslt file (C:\data\) and retrieve 
>the CIK.  The text file looks like this:
>
><SUBJECT-COMPANY> 
>      <CIK> 0000925645
>      <NAME> Big Corporation
></SUBJECT-COMPANY>
>
>and then I need to go to the webpage below and grab the Title value and plug 
>that into my html output.
>
>http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000925645
>
>Any ideas?


