RE: [xml-dev] How to parse XML document with default namespace with
To: "'Jack Bush'" <netbeansfan@-----.---.-->, <xml-dev@-----.---.--->
Date: 11/4/2008 1:44:00 PM

I see no Saxon code here. You are using the XPath engine that comes with JDOM. You might be better off asking on the JDOM list.

I have to confess I'm surprised to see you declaring namespaces AFTER compiling the XPath expression, but I can't say I'm familiar with this API.

Michael Kay
http://www.saxonica.com/

_____

From: Jack Bush [mailto:netbeansfan@y...]
Sent: 04 November 2008 13:02
To: xml-dev@l...
Subject: [xml-dev] How to parse XML document with default namespace with JDOM XPath

Hi All,

I am having difficulty parsing a namespaced HTML document using Saxon and the TagSoup parser. The relevant content of this document is as follows:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      ....
    </head>
    <body>
      <div id="container">
        <div id="content">
          <table class="sresults">
            <tr>
              <td> <a href="http://www.abc.com/areas" title="Hollywood, CA">hollywood</a> </td>
              <td> <a href="http://www.abc.com/areas" title="San Jose, CA">san jose</a> </td>
              <td> <a href="http://www.abc.com/areas" title="San Francisco, CA">san francisco</a> </td>
              <td> <a href="http://www.abc.com/areas" title="San Diego, CA">San diego</a> </td>
            </tr>
            ....
    </body>
  </html>

Below are the relevant code snippets illustrating how I have attempted to retrieve the contents (the value of <a>):

  import java.io.*;
  import java.util.*;
  import org.jdom.*;
  import org.jdom.input.SAXBuilder;
  import org.jdom.xpath.*;
  import org.saxpath.*;
  import org.ccil.cowan.tagsoup.Parser;

  ( 1 )  FileReader frInHtml = new FileReader("C:\\Tmp\\ABC.html");
  ( 2 )  BufferedReader brInHtml = new BufferedReader(frInHtml);
  ( 3 )  // SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
  ( 4 )  SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
  ( 5 )  org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
  ( 6 )  XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
  ( 7 )  xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
  ( 8 )  java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
  ( 9 )  Iterator iterator = list.iterator();
  ( 10 ) while (iterator.hasNext())
  ( 11 ) {
  ( 12 )     Object object = iterator.next();
  ( 13 )     // if (object instanceof Element)
  ( 14 )     //     System.out.println(((Element)object).getTextNormalize());
  ( 15 )     if (object instanceof Content)
  ( 16 )         System.out.println(((Content)object).getValue());
         }
  ..

This program works on the same document when the default namespace is removed, in which case the "ns" prefix in the XPath statements (lines 6-7) is not needed either. Moreover, I have used "org.apache.xerces.parsers.SAXParser" in the past to successfully retrieve the content of <a> from the same document without the default namespace.

I would like to achieve the following objectives if possible:

( i ) Exclude the DTD and namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how do I include the namespace in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from "org.apache.xerces.parsers.SAXParser" to "org.ccil.cowan.tagsoup.Parser" make any difference as far as using XPath is concerned?
( iv ) Failing to exclude the DTD, how can I change the lookup of a PUBLIC DTD to a local SYSTEM one and include a local DTD for reference?

I am running JDK 1.6.0_06, Netbeans 6.1, JDOM 1.1, Saxon6-5-5, Tagsoup 1.2 on Windows XP platform.

Any assistance would be appreciated.

Thanks in advance,
Jack
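
For objectives ( ii ) and ( iv ), here is a minimal sketch of how a prefix can be bound to the XHTML namespace for XPath 1.0 and how the DTD lookup can be short-circuited. It is not the JDOM/TagSoup setup from the post: it swaps in the JDK's own javax.xml.parsers and javax.xml.xpath classes so that it runs without extra jars, and it assumes Java 8 or later and well-formed XHTML input. The class name XhtmlLinks, the method select and the cut-down SAMPLE document are made up for illustration. Answering every DTD request with empty content is only one possible approach; a resolver could equally return a local copy of the DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class; uses only JDK classes, not JDOM, Saxon or TagSoup.
class XhtmlLinks {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Cut-down copy of the document from the post (two links instead of four).
    static final String SAMPLE =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div id=\"container\"><div id=\"content\"><table class=\"sresults\"><tr>"
        + "<td><a href=\"http://www.abc.com/areas\" title=\"Hollywood, CA\">hollywood</a></td>"
        + "<td><a href=\"http://www.abc.com/areas\" title=\"San Jose, CA\">san jose</a></td>"
        + "</tr></table></div></div></body></html>";

    // Same path as line 6 of the post, with every step prefixed.
    static final String PREFIXED =
        "/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']"
        + "/ns:table[@class='sresults']/ns:tr/ns:td/ns:a";

    // Returns the trimmed text of every node selected by expr.
    static List<String> select(String xhtml, String expr) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep namespace information in the DOM
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Answer every external entity request, including the PUBLIC DTD,
            // with empty content so that no network lookup is attempted.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the prefix "ns" to the XHTML namespace before evaluating.
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "ns" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                        ? Collections.singletonList("ns").iterator()
                        : Collections.<String>emptyList().iterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Prefixed steps match the elements in the XHTML namespace.
        System.out.println(select(SAMPLE, PREFIXED));
        // In XPath 1.0 an unprefixed name matches only elements in no namespace,
        // so this selects nothing from a document with a default namespace.
        System.out.println(select(SAMPLE, "/html/body//a"));
    }
}
```

The prefix used in the expression ("ns" here) is arbitrary; only the namespace URI it is bound to has to match the xmlns declaration in the document.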
