Re: [xml-dev] practical question re: Java/XML handling
To: Mike Sokolov <sokolov@--------.--->
Date: 9/3/2009 2:15:00 PM

On Thu, Sep 3, 2009 at 7:26 AM, Mike Sokolov <sokolov@i...> wrote:

> After all the discussion about "What is data?" I don't know if this list is
> the place to discuss actual details of implementation, but please feel free
> to send me elsewhere if you can think of a better venue.

For my part, I find it refreshing to have a place where one can discuss such fundamental matters as well as the lineaments of running code. I think you'll find in the archives plenty of discussion of code, and plenty of code-free discussion alike.

> I have a need to handle XML that references a non-existent DTD. The DTD is
> irrelevant to the actual processing of the XML, and isn't available
> anywhere, but it is declared in the DOCTYPE. I'm sure many of you have
> encountered this situation: it's practically the norm, in my experience.
>
> After years of dealing with this inherently unsatisfactory situation in a
> variety of ways, I came up with a new one that I am liking at the moment,
> which is to insert a Stream into a Java XML processing stack that strips out
> the prolog of the XML document before handing it off to a parser. This has
> the nice property that it doesn't require modifications to the stored XML
> files. It loses PIs and comments and the XML decl, but I can live with
> that.

Expat allows you to specify a standalone flag, which in effect expunges all external parameter entity declarations (and other such external resources incompatible with standalone="yes"). This certainly skates the edges of XML spec compliance, but I think it's legit, because I see it as an implicit transform. Anyway, your Java tools might have the equivalent.

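To make the Expat behaviour concrete, here is a minimal sketch using the pyexpat binding from the Python standard library rather than Amara. It assumes that `SetParamEntityParsing` with `XML_PARAM_ENTITY_PARSING_NEVER` is the closest stdlib control to the flag described above; that mapping is an assumption, and this is not necessarily how Amara implements `standalone=True`. The sample document, the DTD URL and the `element_names` helper are all made up for illustration.

```python
import xml.parsers.expat as expat

# Hypothetical document whose DOCTYPE points at a DTD that does not exist.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE article SYSTEM "http://example.invalid/missing.dtd">
<article><title>Hello</title></article>
"""


def element_names(data):
    """Parse data with Expat and return element names in document order."""
    parser = expat.ParserCreate()
    # Ask Expat never to process parameter entities or the external subset.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_NEVER)
    names = []
    # Record each start tag; attributes are ignored in this sketch.
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # Second argument True marks this as the final chunk of input.
    parser.Parse(data, True)
    return names


if __name__ == "__main__":
    print(element_names(SAMPLE))
```

No external entity handler is installed here, so the parser never tries to fetch the missing DTD, and the stored file is left untouched.
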
FWIW, I know that Jython 2.5 includes Expat wrapped for the core XML libs, so that might be an option. In Amara 2.x we expose this flag very conveniently. You can do:

    import amara
    doc = amara.parse(myxml, standalone=True)  # flag uses boolean values, not strings

And it will in effect ignore those pesky parameter entity decls, including declarations of the external subset.

The rest of your post is Java-specific, so I'll snip and run like hell :)

--
Uche Ogbuji                http://uche.ogbuji.net
Founding Partner, Zepheira http://zepheira.com
Linked-in profile: http://www.linkedin.com/in/ucheogbuji
Articles: http://uche.ogbuji.net/tech/publications/
Friendfeed: http://friendfeed.com/uche
Twitter: http://twitter.com/uogbuji
Join me at Balisage: http://www.balisage.net/
