Altova Mailing List Archives


Re: [xml-dev] practical question re: Java/XML handling

From: Uche Ogbuji <uche@------.--->
To: Mike Sokolov <sokolov@--------.--->
Date: 9/3/2009 2:15:00 PM
On Thu, Sep 3, 2009 at 7:26 AM, Mike Sokolov <sokolov@i...> wrote:

> After all the discussion about "What is data?" I don't know if this list is
> the place to discuss actual details of implementation, but please feel free
> to send me elsewhere if you can think of a better venue.
>

For my part, I find it refreshing to have a place where one can discuss such
fundamental matters as well as the lineaments of running code.  I think
you'll find in the archives plenty of discussion of code, and plenty of
code-free discussion alike.



> I have a need to handle XML that references a non-existent DTD.  The DTD is
> irrelevant to the actual processing of the XML, and isn't available
> anywhere, but it is declared in the DOCTYPE.  I'm sure many of you have
> encountered this situation: it's practically the norm, in my experience.
>
> After years of dealing with this inherently unsatisfactory situation in a
> variety of ways, I came up with a new one that I am liking at the moment,
> which is to insert a Stream into a Java XML processing stack that strips out
> the prolog of the XML document before handing it off to a parser.  This has
> the nice property that it doesn't require modifications to the stored XML
> files.  It loses PIs and comments and the XML decl, but I can live with
> that.
>
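
The prolog-stripping idea described above can be sketched in a few lines. This is only an illustration in Python, not the poster's Java stream: it works on an already-decoded string rather than a stream, the names strip_prolog and _skip_doctype are hypothetical, and it assumes no "]" or ">" appears inside quoted literals in the DOCTYPE.

```python
import xml.etree.ElementTree as ET


def _skip_doctype(text, start):
    """Return the index just past a DOCTYPE declaration starting at `start`."""
    gt = text.index(">", start)
    lb = text.find("[", start, gt)
    if lb != -1:
        # Internal subset present: the real end is the ">" after the "]".
        rb = text.index("]", lb)
        gt = text.index(">", rb)
    return gt + 1


def strip_prolog(text):
    """Drop the XML declaration, PIs, comments and DOCTYPE before the root."""
    pos = 0
    while True:
        lt = text.find("<", pos)
        if lt == -1:
            raise ValueError("no root element found")
        if text.startswith("<?", lt):           # XML declaration or PI
            pos = text.index("?>", lt) + 2
        elif text.startswith("<!--", lt):       # comment
            pos = text.index("-->", lt) + 3
        elif text.startswith("<!DOCTYPE", lt):  # document type declaration
            pos = _skip_doctype(text, lt)
        else:
            return text[lt:]                    # root element start tag


# Hypothetical sample document referencing a DTD that does not exist.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!-- <fake> -->\n'
    '<!DOCTYPE doc SYSTEM "missing.dtd" [<!ENTITY a "b">]>\n'
    '<doc><item/></doc>'
)

root = ET.fromstring(strip_prolog(SAMPLE))
print(root.tag)
```

As the post says, this discards everything before the root element, so it only suits documents that do not rely on entities declared in the DTD.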

Expat allows you to specify a standalone flag, which in effect expunges all
external parameter entity declarations (and other such external resources
incompatible with standalone="yes").  This certainly skates the edges of XML
spec compliance, but I think it's legit, because I see it as an implicit
transform.  Anyway, your Java tools might have the equivalent.  FWIW, I know
that Jython 2.5 includes Expat wrapped for the core XML libs, so that might
be an option.
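
To make the Expat route concrete, here is a minimal sketch using the Expat binding in Python's standard library (xml.parsers.expat). It does not use the standalone flag itself; it uses SetParamEntityParsing, a related setting that tells Expat never to read parameter entities or the external DTD subset. The sample document, its URL, and the helper name element_names are assumptions made up for illustration.

```python
import xml.parsers.expat as expat

# Hypothetical sample: the DOCTYPE points at a DTD that does not exist.
SAMPLE = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE doc SYSTEM "http://example.invalid/missing.dtd">\n'
    '<doc><item>hello</item></doc>'
)


def element_names(xml_text):
    """Parse xml_text with Expat and return element names in document order."""
    names = []
    parser = expat.ParserCreate()
    # Never parse parameter entities or the external DTD subset.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


print(element_names(SAMPLE))
```

The parse completes without any attempt to fetch the missing DTD, and the stored file is left untouched.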

In Amara 2.x we expose this flag very conveniently.  You can do:

import amara
doc = amara.parse(myxml, standalone=True)  # flag uses boolean values, not strings

And it will in effect ignore those pesky parameter entity decls, including
the declaration of the external subset.

The rest of your post is Java-specific, so I'll snip and run like hell :)


-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
Friendfeed: http://friendfeed.com/uche
Twitter: http://twitter.com/uogbuji
Join me at Balisage:
* http://www.balisage.net/

