Altova Mailing List Archives


RE: [xml-dev] How to parse XML document with default namespace with JDOM XPath

From: "Michael Kay" <mike@--------.--->
To: "'Jack Bush'" <netbeansfan@-----.---.-->, <xml-dev@-----.---.--->
Date: 11/5/2008 9:15:00 AM
The book you are quoting is very old information. Where it says that you can
do something, it's probably right, but where it says that you can't, it may
well be wrong.
 
Frankly, I forget exactly what's in Saxon 6.5.x because it's so long ago - I
know JDOM was already supported back then but I don't remember the details
of the API. I do recall that, as Elliotte says in his book, the Saxon API
for invoking XPath was pretty clumsy in those days (because it was designed
primarily for internal use by the XSLT engine, not as a user-facing
interface). I would use a more recent release.
 
But the code you gave us wasn't even trying to use Saxon, it was using the
XPath engine within JDOM, using a JDOM API that I'm not very familiar with,
and therefore I can't tell you why it isn't working.
 
My own preference for this kind of coding would be to use Saxon's s9api
interface, documented at
 
http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/package-summary.html
 
There are sample applications in the saxon-resources download (from
SourceForge). It includes an example of XPath with JDOM like this:
 
         public void run() throws SaxonApiException {
             // Build the JDOM document
             org.jdom.input.SAXBuilder jdomBuilder = new org.jdom.input.SAXBuilder();
             File file = new File("data/books.xml");
             org.jdom.Document doc;
             try {
                 doc = jdomBuilder.build(file);
             } catch (org.jdom.JDOMException e) {
                 throw new SaxonApiException(e);
             } catch (IOException e) {
                 throw new SaxonApiException(e);
             }
             Processor proc = new Processor(false);
             DocumentBuilder db = proc.newDocumentBuilder();
             XdmNode xdmDoc = db.wrap(doc);
             XPathCompiler xpath = proc.newXPathCompiler();
             XPathExecutable xx = xpath.compile("//ITEM/TITLE");
             XPathSelector selector = xx.load();
             selector.setContextItem(xdmDoc);
             for(XdmItem item : selector) {
                 XdmNode node = (XdmNode)item;
                 org.jdom.Element element = (org.jdom.Element)node.getExternalNode();
                 System.out.println(element.getValue());
             }
         }
 
(The method getExternalNode() was added in Saxon 9.1.0.2 and is not yet in
the published Javadoc)
 
You would probably want to add a call
 
xpath.declareNamespace("prefix", "uri")
 
before the compile() call. 
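
As a rough illustration of why the prefix declaration matters, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API rather than Saxon's s9api or JDOM (a deliberate swap so that it runs without extra libraries). The class name, the sample string and the prefix "ns" are assumptions made for the example. The principle is the same in each API: an unprefixed name in an XPath 1.0 expression matches only elements in no namespace, so the XHTML elements are found only once a prefix is bound to the XHTML namespace URI.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlNamespaceXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Trimmed-down, hypothetical stand-in for the document in the question.
    static final String SAMPLE =
          "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<table class=\"sresults\"><tr>"
        + "<td><a href=\"http://www.abc.com/areas\"> hollywood </a></td>"
        + "<td><a href=\"http://www.abc.com/areas\"> san jose </a></td>"
        + "</tr></table></body></html>";

    // Binds the (arbitrary) prefix "ns" to the XHTML namespace URI.
    static final NamespaceContext NS_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "ns" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                ? Collections.singletonList("ns").iterator()
                : Collections.<String>emptyList().iterator();
        }
    };

    // Returns the trimmed text of every node selected by the expression.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for namespace-qualified matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(NS_CONTEXT); // declare the prefix before evaluating
            NodeList nodes =
                    (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, "//ns:td/ns:a")); // prints [hollywood, san jose]
        System.out.println(select(SAMPLE, "//td/a"));       // prints []
    }
}
```

The second call returns nothing because td and a, written without a prefix, refer to elements in no namespace, whereas every element in the sample is in the XHTML namespace.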
 
Michael Kay
http://www.saxonica.com/
 
  _____  

From: Jack Bush [mailto:netbeansfan@y...] 
Sent: 05 November 2008 03:48
To: Michael Kay; xml-dev@l...
Subject: Re: [xml-dev] How to parse XML document with default namespace with
JDOM XPath



Hi Michael,
 
Thanks for responding to this question.
 
I have not had any luck with jdom-interest@j... forum at all since
subscribing to them a few months back.
 
In the meantime, can you confirm that it is not possible to use Saxon 6.5.x
with JDOM according to
http://www.cafeconleche.org/books/xmljava/chapters/ch16s05.html? Or is it
because you are not familiar with JDOM?
 
Could anyone point me to a more useful JDOM forum for assistance with this
question?
 
Many thanks,
 
Jack



  _____  

From: Michael Kay <mike@s...>
To: Jack Bush <netbeansfan@y...>; xml-dev@l...
Sent: Wednesday, 5 November, 2008 12:39:48 AM
Subject: RE: [xml-dev] How to parse XML document with default namespace with
JDOM XPath


I see no Saxon code here. You are using the XPath engine that comes with
JDOM. You might be better off asking on the JDOM list. I have to confess I'm
surprised to see you declaring namespaces AFTER compiling the XPath
expression, but I can't say I'm familiar with this API.
 
Michael Kay
http://www.saxonica.com/


  _____  

From: Jack Bush [mailto:netbeansfan@y...] 
Sent: 04 November 2008 13:02
To: xml-dev@l...
Subject: [xml-dev] How to parse XML document with default namespace with
JDOM XPath



Hi All,

 

I am having difficulty parsing a namespaced HTML document using Saxon and the
TagSoup parser. The relevant content of this document is as follows:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
....
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title=" Hollywood , CA "> hollywood </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Jose , CA "> san jose </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Francisco , CA "> san francisco </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Diego , CA "> San diego </a>
                    </td>
              </tr>
....
</body>
</html>

 

Below are the relevant code snippets illustrating how I have attempted to
retrieve the contents (the value of <a>):

 

             import java.io.*;
             import java.util.*;
             import org.jdom.*;
             import org.jdom.input.SAXBuilder;
             import org.jdom.xpath.*;
             import org.saxpath.*;
             import org.ccil.cowan.tagsoup.Parser;

( 1 )       frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       brInHtml = new BufferedReader(frInHtml);
( 3 ) //    SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )       java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )       Iterator iterator = list.iterator();
( 10 )     while (iterator.hasNext())
( 11 )     {
( 12 )            Object object = iterator.next();
( 13 ) //         if (object instanceof Element)
( 14 ) //               System.out.println(((Element)object).getTextNormalize());
( 15 )             if (object instanceof Content)
( 16 )                   System.out.println(((Content)object).getValue());
              }

..

 

This program works on the same document when the default namespace is
removed, in which case the "ns" prefix in the XPath statements (lines 6-7)
is not needed either. Moreover, I have used
"org.apache.xerces.parsers.SAXParser" in the past to successfully retrieve
the content of <a> from the same document without the default namespace.

 

I would like to achieve the following objectives if possible:

 

( i ) Exclude the DTD and namespace in order to simplify the parsing process.
How could this be done?

( ii ) If this is not possible, how can the namespace be included in the XPath
statements (lines 6-7) so that the value of <a> is picked up correctly?

( iii ) Would changing from "org.apache.xerces.parsers.SAXParser" to
"org.ccil.cowan.tagsoup.Parser" make any difference as far as using XPath is
concerned?

( iv ) Failing to exclude the DTD, how can the lookup of a PUBLIC DTD be changed
to a local SYSTEM one so that a local DTD is used for reference?
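
For objectives ( i ) and ( iv ), one possible approach can be sketched with the JDK's own DOM parser and javax.xml.xpath (not JDOM, TagSoup or Saxon, so this is an illustration of the idea rather than a fix for the code above). The sketch assumes a hypothetical local copy of the DTD at dtd/xhtml1-transitional.dtd; the class name and sample string are also made up for the example. An EntityResolver intercepts the PUBLIC identifier and serves the local file if it exists, or an empty DTD otherwise, so nothing is fetched from the network. Parsing without namespace awareness then lets the XPath expression be written without prefixes.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class XhtmlWithoutNamespaces {
    static final String XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // Hypothetical location of a locally saved copy of the DTD (an assumption).
    static final File LOCAL_DTD = new File("dtd/xhtml1-transitional.dtd");

    // Trimmed-down, hypothetical stand-in for the document in the question.
    static final String SAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div id=\"container\"><div id=\"content\"><table class=\"sresults\"><tr>"
        + "<td><a href=\"http://www.abc.com/areas\"> hollywood </a></td>"
        + "<td><a href=\"http://www.abc.com/areas\"> san jose </a></td>"
        + "</tr></table></div></div></body></html>";

    // Serves the local DTD copy if present, otherwise an empty DTD.
    // Returning null for any other entity keeps the parser's default resolution.
    static final EntityResolver DTD_RESOLVER = (publicId, systemId) -> {
        if (XHTML_PUBLIC_ID.equals(publicId)) {
            return LOCAL_DTD.isFile()
                ? new InputSource(LOCAL_DTD.toURI().toString())
                : new InputSource(new StringReader(""));
        }
        return null;
    };

    // Returns the trimmed text of each <a> in the results table.
    static List<String> linkTexts(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // the xmlns declaration becomes a plain attribute
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(DTD_RESOLVER);
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/html/body/div[@id='container']/div[@id='content']"
                  + "/table[@class='sresults']/tr/td/a",
                    doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Parsing or XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(linkTexts(SAMPLE)); // prints [hollywood, san jose]
    }
}
```

Note that substituting an empty DTD also drops the entity declarations the real DTD provides, so a document that uses named entities such as &nbsp; would need the local copy to be present.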

 

I am running JDK 1.6.0_06, Netbeans 6.1, JDOM 1.1, Saxon6-5-5, Tagsoup 1.2
on Windows XP platform.

 

Any assistance would be appreciated.

 

Thanks in advance,

 

Jack

