Altova Mailing List Archives: comp.text.xml

Re: HTML Parsing Question
To: NULL
Date: 1/2/2007 2:09:00 AM

Stefan Kleineikenscheidt wrote:
> Hi all,
>
> I'm trying to convert an HTML page to a hierarchical structure, but I am
> stuck. Consider a page like this:
>
> <h1>First Heading1</h1>
> <p>some text</p>
> <p>more text</p>
>
> <h2>First Heading2</h2>
> <p>more text</p>
>
> <h2>Second Heading2</h2>
> ...
> <h1>Second Heading1</h1>
> ...
> <h2>Third Heading2</h2>
> ...

First of all, you need to make it well-formed XHTML (use W3C Tidy for
that). This ensures that any subsequent XSLT process won't gag.

> This is my 'h1' template, where I try to process all elements between
> two 'h1' elements:
>
> <xsl:template match="//h:h1">
>   <section>
>     <title><xsl:value-of select="text()" /></title>
>     <xsl:variable name="nexth1" select="position(parent::*/*[(name() = 'h1')])" />
>     <xsl:apply-templates select="following-sibling::*[position() <= $nexth1]" />
>   </section>
> </xsl:template>

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="h1|h2|h3|h4">
    <xsl:variable name="id" select="generate-id(.)"/>
    <!-- numeric heading level: 1 for h1, 2 for h2, ... -->
    <xsl:variable name="level" select="number(translate(name(),'h',''))"/>
    <xsl:variable name="gi" select="name()"/>
    <xsl:element name="{concat('sect',$level)}">
      <xsl:attribute name="id"><xsl:value-of select="$id"/></xsl:attribute>
      <title>
        <xsl:apply-templates/>
      </title>
      <!-- following siblings whose nearest preceding heading of this
           type is the current one, minus headings of a higher level
           and anything whose nearest preceding heading is one -->
      <xsl:apply-templates select="following-sibling::*
        [generate-id(preceding-sibling::*[name()=$gi][1])=$id]
        [not(substring(name(),1,1)='h' and name()!='hr' and
             number(translate(name(),'h',''))&lt;$level)]
        [not(number(translate(name(preceding-sibling::*
             [substring(name(),1,1)='h' and name()!='hr'][1]),'h',''))&lt;$level)]"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="p">
    <para>
      <xsl:apply-templates/>
    </para>
  </xsl:template>

</xsl:stylesheet>

This needs some more work: it's not subsetting out the higher-level H*
element types, but I've run out of time here.

///Peter
-- 
XML FAQ: http://xml.silmaril.ie/
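For comparison, the nesting that the stylesheet above is aiming for can be sketched outside XSLT. The following is only an illustration in Python (standard library only), not part of the original reply: the function name nest_sections, the <doc> wrapper element and the sample input are assumptions made for the example. It keeps a stack of open sections and closes every section at the same or a deeper level whenever a new heading turns up.

```python
import re
import xml.etree.ElementTree as ET

# Matches h1..h6 but not hr, head, html, ...
HEADING = re.compile(r"h([1-6])$")

# Hypothetical sample input, modelled on the page in the question.
SAMPLE = """<body>
<h1>First Heading1</h1>
<p>some text</p>
<p>more text</p>
<h2>First Heading2</h2>
<p>more text</p>
<h2>Second Heading2</h2>
<h1>Second Heading1</h1>
<h2>Third Heading2</h2>
</body>"""


def nest_sections(body):
    """Return a <doc> element in which every hN opens a <sectN> that holds
    everything up to the next heading of the same or a higher level."""
    root = ET.Element("doc")
    # Stack of (level, element); the root counts as level 0 and is never popped.
    stack = [(0, root)]
    for child in body:
        # Drop a namespace such as {http://www.w3.org/1999/xhtml} if present.
        local = child.tag.rsplit("}", 1)[-1]
        m = HEADING.match(local)
        if m:
            level = int(m.group(1))
            # Close every open section at this level or deeper.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "sect%d" % level)
            title = ET.SubElement(section, "title")
            title.text = child.text
            title.extend(child)  # keep inline markup inside the heading
            stack.append((level, section))
        else:
            # Non-heading content goes into the innermost open section.
            name = "para" if local == "p" else local
            item = ET.SubElement(stack[-1][1], name)
            item.text = child.text
            item.extend(child)
    return root


doc = nest_sections(ET.fromstring(SAMPLE))
print(ET.tostring(doc, encoding="unicode"))
```

With the sample above, each h1 becomes a sect1 directly under doc, the two paragraphs and both h2 sections of the first h1 end up inside the first sect1, and "Third Heading2" ends up inside the second. The input is assumed to be well-formed already (for instance after running Tidy), with the headings and paragraphs as siblings under a single parent element.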
