RE: [xml-dev] SAX - not well formed data
To: "'Johannes Lichtenberger'" <Johannes.Lichtenberger@------------.-->
Date: 2/3/2009 3:58:00 PM

Incidentally, you could also achieve the same effect with a one-line query
using the Saxon-SA streaming capabilities.
java com.saxonica.Query -qs:"saxon:stream(doc('in.xml')/xml/page)[1]"
should do the job. It will automatically stop reading the input when it has
found the data it needs.
Michael Kay
http://www.saxonica.com/
> -----Original Message-----
> From: Johannes Lichtenberger
> [mailto:Johannes.Lichtenberger@u...]
> Sent: 03 February 2009 15:49
> To: Michael Kay
> Cc: 'xml-dev'
> Subject: RE: [xml-dev] SAX - not well formed data
>
> On Tuesday, 03.02.2009, at 14:39 +0000, Michael Kay wrote:
> > > I have a document like this:
> > >
> > > <xml>
> > > <page>
> > > <rev>...</rev>
> > > <rev>...</rev>
> > > </page>
> > > ... (some hundreds of pages)
> > > <page>
> > > <rev>...
> > >
> > > so it's not well formed.
> >
> > It's not clear from that description why it isn't well-formed.
>
> Well, I'm downloading and extracting a file with `curl
> http://... | bzcat > test.xml`, but because it's very big and
> I may not have time to analyse all of the data, I only
> extract pages from the beginning and press CTRL+C some time
> afterwards. Maybe I could extract pages on the fly with
> something like `curl http://... | bzcat | java -jar
> ExtractArticles`, but I'm not really familiar with pipes and
> so on :( I would probably need an XMLStreamReader instead of
> the reader, and buffered input or something like that, but I
> tried it and failed...
>
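For the on-the-fly idea above, a minimal StAX sketch might look like the
following. It assumes the element is called page, as in the sample, and the
class name FirstPagesFromStdin, the method countPages and the limit argument
are hypothetical names chosen for illustration. It only counts complete pages
and stops pulling events once the limit is reached, so the truncated tail of
the input is never parsed.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name; reads XML from a stream and stops early.
class FirstPagesFromStdin {

    // Counts complete <page> elements, pulling no more events once
    // `limit` pages have been seen.
    static int countPages(InputStream in, int limit) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
        int pages = 0;
        try {
            while (pages < limit && reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(reader.getLocalName())) {
                    pages++;
                }
            }
        } finally {
            reader.close();
        }
        return pages;
    }

    public static void main(String[] args) throws Exception {
        // Optional first argument: how many pages to read (default 10).
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        System.out.println(countPages(System.in, limit) + " pages read");
    }
}
```

Used as the last stage of a pipe, e.g. `curl http://... | bzcat | java
FirstPagesFromStdin 100`, the program reads standard input and exits after
the requested number of pages. The upstream commands would normally stop once
the pipe is closed. If the input ends before the limit is reached, the reader
throws an XMLStreamException instead.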
> > > I only want to be able to write out the first pages, but the SAX
> > > Parser throws errors:
> >
> > You should be able to abort the parse when you have read what you
> > want, by throwing an exception from any of the callback methods
> > (e.g. endElement()). The parser will then exit back to your
> > application with an exception, which you can catch. You should
> > check that this exception is the one you were expecting, not some
> > other unrelated error in your input.
>
> Ok, that's possibly the best thing.
>
> Thank you!
>
>
>
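The abort-by-exception approach quoted above could be sketched roughly as
follows. This is an illustration under assumptions, not code from the thread:
the names FirstPages, PageHandler, EnoughPagesException and the limit
parameter are hypothetical, and the element name page is taken from the
sample document. The handler throws its own SAXException subclass from
endElement(), and the caller catches only that type, so any other parse error
still propagates.

```java
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FirstPages {

    // Marker exception (hypothetical name) thrown to stop the parse on purpose.
    static final class EnoughPagesException extends SAXException {
        EnoughPagesException() { super("page limit reached"); }
    }

    // Collects the text of each <page> and aborts after `limit` pages.
    static final class PageHandler extends DefaultHandler {
        final List<String> pages = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private final int limit;
        private boolean inPage;

        PageHandler(int limit) { this.limit = limit; }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            if ("page".equals(qName)) {
                inPage = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inPage) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if ("page".equals(qName)) {
                inPage = false;
                pages.add(text.toString());
                // Stop as soon as enough complete pages have been read.
                if (pages.size() >= limit) throw new EnoughPagesException();
            }
        }
    }

    static List<String> firstPages(InputSource in, int limit) throws Exception {
        PageHandler handler = new PageHandler(limit);
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (EnoughPagesException expected) {
            // The abort we asked for; any other SAXException is a real
            // error in the input and is allowed to propagate.
        }
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: input file, args[1]: number of pages to extract.
        List<String> pages = firstPages(
                new InputSource(new FileInputStream(args[0])),
                Integer.parseInt(args[1]));
        System.out.println(pages.size() + " pages extracted");
    }
}
```

Because the parse is abandoned at the end of the last wanted page, the
truncated tail of the file is never reached. If the file is cut off before
the limit is met, the parser's own SAXParseException is reported as usual.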
_______________________________________________________________________
XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.
[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
