Re: Parsing an HTML table with XML
Newsgroup: comp.text.xml
Date: 7/5/2006 5:39:00 PM
Andy Dingley <dingbat@c...> wrote:
> Rick Walsh wrote:
>
>>I have an HTML table in the following format:
>>
>><table>
>><tr><td>Header 1</td><td>Header 2</td></tr>
>><tr><td>1</td><td>2</td></tr>
>
>
>>However, I don't want to grab the very first row - because this isn't
>>data!
>
>
> Then code it with <th>, not <td>
>
> If this table isn't under your control, then be careful of parsing it
> with an XML parser -- HTML isn't XML (XHTML on the web usually isn't
> either). It's not a good assumption to make if you're trying to build
> robust code - something as simple as an embedded might break it.
>
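To illustrate the quoted warning, here is a minimal Python sketch (not from the original thread; the sample markup and the helper name parses_as_xml are hypothetical). It assumes the kind of breakage meant is something like an HTML-only entity such as &nbsp;, which a strict XML parser rejects because XML predefines only five entities.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: ordinary HTML, but not well-formed XML,
# because &nbsp; is not one of XML's predefined entities.
HTML_ROW = "<table><tr><td>1&nbsp;</td><td>2</td></tr></table>"
CLEAN_ROW = "<table><tr><td>1</td><td>2</td></tr></table>"


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses_as_xml(HTML_ROW))   # rejected by the XML parser
    print(parses_as_xml(CLEAN_ROW))  # accepted
```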
For this purpose, use an HTML parser. I personally use NekoHTML, which I
have included in the RefleX toolkit. With RefleX, parsing an HTML file
is as simple as parsing an XML file:
http://reflex.gforge.inria.fr/tips.html#N80178E
(section: HTML to XML)
Example:
<!--parse a non-well-balanced HTML file to XML-->
<xcl:parse-html name="htmlFile" source="file:///path/to/file.html"/>
<!--apply a stylesheet to it-->
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile }"
               stylesheet="file:///path/to/stylesheet.xsl"/>
Of course, you could use XPath to select the element to transform, say the
<body> element of the parsed HTML; something like this:
<xcl:transform output="file:///path/to/new-file.html" source="{ $htmlFile/html/body }"
               stylesheet="file:///path/to/stylesheet.xsl"/>
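For readers without RefleX at hand, the same idea (parse with a lenient HTML parser, then skip the header row) can be sketched with Python's standard html.parser module. This is a swapped-in technique, not NekoHTML or RefleX, and the class and function names (TableRows, data_rows) are hypothetical. The sample input is the table from the original question, left unclosed as it was quoted.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # A new row also ends any cell left unclosed in the previous one.
            self.rows.append([])
            self._in_cell = False
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between rows) is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def data_rows(html):
    """Return every table row except the first (assumed to be the header)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows[1:]


SAMPLE = """<table>
<tr><td>Header 1</td><td>Header 2</td></tr>
<tr><td>1</td><td>2</td></tr>"""

if __name__ == "__main__":
    print(data_rows(SAMPLE))
```

Skipping by position mirrors what the original poster asked for; if the header cells were marked up with <th>, filtering on the tag name would be the more robust choice.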
--
Regards,
///
(. .)
--------ooO--(_)--Ooo--------
| Philippe Poulard |
-----------------------------
http://reflex.gforge.inria.fr/
Have the RefleX !
