[xml-dev] How to parse XML document with default namespace with JDOM
To: xml-dev@-----.---.---
Date: 11/4/2008 1:10:00 PM

Hi All,

I am having difficulty parsing a namespaced HTML document using Saxon and the TagSoup parser. The relevant content of this document is as follows:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
……..
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title="Hollywood, CA">hollywood</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Jose, CA">san jose</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Francisco, CA">san francisco</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Diego, CA">San diego</a>
                    </td>
                </tr>
……….
</body>
</html>

Below are the relevant code snippets illustrating how I have attempted to retrieve the content (the value of <a>):

            import java.io.*;
            import java.util.*;
            import org.jdom.*;
            import org.jdom.input.SAXBuilder;
            import org.jdom.xpath.*;
            import org.saxpath.*;
            import org.ccil.cowan.tagsoup.Parser;

( 1 )       frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       brInHtml = new BufferedReader(frInHtml);
( 3 )  //   SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )       java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )       Iterator iterator = list.iterator();
( 10 )      while (iterator.hasNext())
( 11 )      {
( 12 )          Object object = iterator.next();
( 13 ) //       if (object instanceof Element)
( 14 ) //           System.out.println(((Element)object).getTextNormalize());
( 15 )          if (object instanceof Content)
( 16 )              System.out.println(((Content)object).getValue());
            }
….

This program works on the same document without the default namespace, in which case the "ns" prefix in the XPath statements (lines 6-7) is not needed either. Moreover, I have used "org.apache.xerces.parsers.SAXParser" in the past to successfully retrieve the content of <a> from the same document without the default namespace.

I would like to achieve the following objectives if possible:

( i ) Exclude the DTD and namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how should the namespace be included in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from "org.apache.xerces.parsers.SAXParser" to "org.ccil.cowan.tagsoup.Parser" make any difference as far as using XPath is concerned?
( iv ) If the DTD cannot be excluded, how can the lookup of the PUBLIC DTD be changed to a local SYSTEM one so that a local DTD is used for reference?

I am running JDK 1.6.0_06, Netbeans 6.1, JDOM 1.1, Saxon6-5-5, Tagsoup 1.2 on Windows XP platform.

Any assistance would be appreciated.

Thanks in advance,

Jack
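
For reference, questions ( ii ) and ( iv ) can be illustrated with a minimal sketch that uses only the JDK's own JAXP classes (javax.xml.parsers and javax.xml.xpath) instead of JDOM, Saxon, or TagSoup, so it is a swapped-in API and not the setup described above. It assumes a well-formed XHTML input. The class name XhtmlXPathSketch, the method linkTexts, and the inline SAMPLE document are hypothetical and exist only for illustration. The sketch binds the prefix "ns" to the XHTML namespace for the XPath query and answers the DTD lookup with empty content so that nothing is fetched from the network.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlXPathSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical, trimmed stand-in for the document in the post.
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head/><body>"
        + "<div id=\"container\"><div id=\"content\"><table class=\"sresults\"><tr>"
        + "<td><a href=\"http://www.abc.com/areas\" title=\"Hollywood, CA\">hollywood</a></td>"
        + "<td><a href=\"http://www.abc.com/areas\" title=\"San Jose, CA\">san jose</a></td>"
        + "</tr></table></div></div></body></html>";

    static List<String> linkTexts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-qualified XPath steps
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Answer every external entity request (including the DTD) with empty content.
        // To use a local DTD instead, return an InputSource pointing at a local file here.
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefix "ns" to the document's default namespace URI.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "ns" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return XHTML_NS.equals(uri)
                    ? Collections.singletonList("ns").iterator()
                    : Collections.<String>emptyList().iterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(
            "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
            + "/ns:table[@class='sresults']/ns:tr/ns:td/ns:a",
            doc, XPathConstants.NODESET);

        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(linkTexts(SAMPLE));
    }
}
```

The prefix used in the XPath expression does not have to match anything in the document; only the namespace URI it is bound to matters. In the JDOM code above, that binding is what the xpath.addNamespace call on line 7 provides.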
