Re: XSL for removing words less than 4 letters in a sitemap

Date: 4/2/2008 1:35:00 PM

Olagato wrote:

> I need to transform this:
>
> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>   <url>
>     <loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
>   </url>
>   <url>
>     <loc>http://localhost/index.php/index.php/Games/The-edge-of-the-wall</loc>
>   </url>
> </urlset>
>
> into this:
>
> <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
>   <url>
>     <loc>http://localhost/index.php/index./Books/Paths-for-the-extreme-player</loc>
>     <news:news>
>       <news:keywords>Books, Paths, extreme, player</news:keywords>
>     </news:news>
>   </url>
>   <url>
>     <loc>http://localhost/index.php/index.php/Games/The-edge-of-the-wall</loc>
>     <news:news>
>       <news:keywords>Games, edge, wall</news:keywords>
>     </news:news>
>   </url>
> </urlset>
>
> I mean, I need a template for creating a <news:keywords> tag which
> contents all the words from <loc> tag with words of more than 3
> letters.

Do you want to use XSLT 2.0 or 1.0? What about words like 'localhost' or 'index', how do you decide that those are not taken?
Here is an XSLT 2.0 stylesheet that should show you an approach using the tokenize function:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:news="http://example.com/2008/news"
  xmlns:sm="http://www.google.com/schemas/sitemap/0.84"
  exclude-result-prefixes="sm"
  version="2.0">

  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sm:url">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
      <news:news>
        <news:keywords>
          <xsl:value-of
            select="for $s in tokenize(sm:loc, '/')[position() > 5]
                    return tokenize($s, '[\-/]')[string-length(.) > 3]"
            separator=", "/>
        </news:keywords>
      </news:news>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The result with Saxon 9 when run against your posted input sample (with a 'root' element added and a namespace chosen for the 'news' prefix) is

<root>
  <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
    <url>
      <loc>http://localhost/index.php/index./Paths-for-the-extreme-player</loc>
      <news:news xmlns:news="http://example.com/2008/news">
        <news:keywords>Paths, extreme, player</news:keywords>
      </news:news>
    </url>
    <url>
      <loc>http://localhost/index.php/index.php/Games/The-edge-of-the-wall</loc>
      <news:news xmlns:news="http://example.com/2008/news">
        <news:keywords>Games, edge, wall</news:keywords>
      </news:news>
    </url>
  </urlset>
</root>

-- 
Martin Honnen
http://JavaScript.FAQTs.com/
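For readers who want to check the selection logic without an XSLT 2.0 processor, the following is a rough Python sketch of what the XPath expression in the stylesheet does: split the <loc> value on '/', skip the first five pieces (which is how 'localhost' and 'index' end up excluded for URLs shaped like the sample), split the remaining pieces on '-' or '/', and keep words longer than 3 characters. The function name keywords_from_loc and its parameters skip_segments and min_length are made up for this illustration and are not part of the stylesheet.

```python
import re


def keywords_from_loc(loc, skip_segments=5, min_length=4):
    """Approximate the stylesheet's XPath expression in Python.

    keywords_from_loc, skip_segments and min_length are hypothetical
    names chosen for this sketch only.
    """
    # tokenize(sm:loc, '/')[position() > 5]: drop the first five pieces.
    # For 'http://host/a/b/rest' the pieces are 'http:', '', 'host', 'a', 'b', ...
    segments = loc.split("/")[skip_segments:]

    words = []
    for segment in segments:
        # tokenize($s, '[\-/]')[string-length(.) > 3]
        for word in re.split(r"[-/]", segment):
            if len(word) >= min_length:
                words.append(word)

    # xsl:value-of with separator=", "
    return ", ".join(words)


if __name__ == "__main__":
    for loc in (
        "http://localhost/index.php/index./Paths-for-the-extreme-player",
        "http://localhost/index.php/index.php/Games/The-edge-of-the-wall",
    ):
        print(keywords_from_loc(loc))
```

Note that the cut-off of five pieces is tied to the URL layout in the posted sample; a sitemap with a different path depth would presumably need a different position test.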
