Altova Mailing List Archives


Re: [xml-dev] combining XMLEvent lists

From: David <dlee@-------.--->
To: xml-dev@-----.---.---
Date: 9/28/2010 5:29:00 PM
My guess would be that "XMLEvent" refers to StAX events:

http://woodstox.codehaus.org/javadoc/stax-api/1.0/javax/xml/stream/events/XMLEvent.html

which is a parsed XML event (startDocument, startElement, characters, ...).
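For readers unfamiliar with StAX, the JDK's javax.xml.stream API can pull a document as a sequence of XMLEvent objects. Here is a minimal sketch that collects those events into a List<XMLEvent> and labels each one. The class StaxEventDemo and the small <page> document are hypothetical illustrations, not taken from the thread:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the thread.
class StaxEventDemo {

    // Pulls every parsed event (startDocument, startElement, characters, ...)
    // from an XML string into a List<XMLEvent>.
    static List<XMLEvent> readEvents(String xml) {
        List<XMLEvent> events = new ArrayList<>();
        try {
            XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    events.add(reader.nextEvent());
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Malformed XML", ex);
        }
        return events;
    }

    // Returns a short, human-readable label for one event.
    static String describe(XMLEvent e) {
        if (e.isStartDocument()) return "startDocument";
        if (e.isStartElement()) return "startElement " + e.asStartElement().getName().getLocalPart();
        if (e.isCharacters()) return "characters \"" + e.asCharacters().getData() + "\"";
        if (e.isEndElement()) return "endElement " + e.asEndElement().getName().getLocalPart();
        if (e.isEndDocument()) return "endDocument";
        return "event type " + e.getEventType();
    }

    public static void main(String[] args) {
        for (XMLEvent e : readEvents("<page id=\"1\"><title>Foo</title></page>")) {
            System.out.println(describe(e));
        }
    }
}
```

Note that attributes such as id="1" do not produce events of their own; they are carried on the StartElement event and are available through its getAttributes() method.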


David A. Lee
dlee@c...
http://www.xmlsh.org


On 9/28/2010 1:17 PM, Michael Kay wrote:
>
> On 28/09/2010 4:13 PM, Johannes.Lichtenberger wrote:
>> On 09/28/2010 04:33 PM, Michael Kay wrote:
>>> Sounds fascinating, and I wish I had time to get involved. It would
>>> certainly be elegant if you could have both the productivity of writing
>>> this declaratively in XSLT and the performance of running it on Hadoop
>>> MapReduce. Intrinsically, the two seem to fit together hand in glove,
>>> but I suspect some engineering effort is needed to make it work.
>> Hello Michael,
>>
>> I think it would be too complicated to achieve the desired grouping in
>> Java. Do you think it makes sense to first serialize the results and
>> then use Saxon's XSLT 2.0 processor to do the grouping? Or can Saxon's
>> XSLT processor take a List of XMLEvents as direct input? I assume it
>> reads XML data from an InputSource or some kind of stream.
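If the first option (serializing the results) is chosen, a List<XMLEvent> can be written back out as XML text with the JDK's XMLEventWriter. A minimal sketch follows; the class EventSerializer, its helper revisionFragment, and the <revision>/<timestamp> fragment are hypothetical illustrations, not the actual data from the thread:

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of the thread.
class EventSerializer {

    // Writes a list of StAX events back out as XML text.
    static String serialize(List<XMLEvent> events) {
        StringWriter out = new StringWriter();
        try {
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            for (XMLEvent e : events) {
                writer.add(e);
            }
            writer.flush();
            writer.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not serialize events", ex);
        }
        return out.toString();
    }

    // Hypothetical fragment <revision><timestamp>ts</timestamp></revision>,
    // built directly from events. There is no startDocument event, so no
    // XML declaration is written.
    static List<XMLEvent> revisionFragment(String timestamp) {
        XMLEventFactory f = XMLEventFactory.newInstance();
        return List.of(
            f.createStartElement("", "", "revision"),
            f.createStartElement("", "", "timestamp"),
            f.createCharacters(timestamp),
            f.createEndElement("", "", "timestamp"),
            f.createEndElement("", "", "revision"));
    }

    public static void main(String[] args) {
        System.out.println(serialize(revisionFragment("2010-09-28T13:17:00Z")));
    }
}
```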
>
> I'm not sure whether "XMLEvent" is something I'm expected to know
> about. You said earlier:
>
> "I've got an Iterator with Lists (Java) of XMLEvents, which are
> serialized fragments"
>
> so I assume they are just strings containing unparsed XML. That's not
> going to be a particularly efficient representation for processing, so
> the first step will be to parse each one to a tree (for example, a
> Saxon TinyTree).
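Parsing each serialized fragment to a tree could look roughly like the following. This sketch substitutes the JDK's built-in DOM parser (javax.xml.parsers) so that it runs without Saxon on the classpath. It does not build a Saxon TinyTree, and the class name FragmentParser is hypothetical:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the thread.
class FragmentParser {

    // Parses each serialized XML fragment into its own DOM tree,
    // reusing one DocumentBuilder for all fragments.
    static List<Document> parseAll(List<String> fragments) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            List<Document> trees = new ArrayList<>();
            for (String xml : fragments) {
                trees.add(builder.parse(new InputSource(new StringReader(xml))));
            }
            return trees;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse fragment", ex);
        }
    }

    public static void main(String[] args) {
        List<Document> trees = parseAll(List.of(
            "<revision><timestamp>2010-09-28T13:17:00Z</timestamp></revision>"));
        System.out.println(trees.get(0).getDocumentElement().getNodeName());
    }
}
```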
>
> You then said:
>
> "I want to find and combine Lists which have the same page id and the
> same revision timestamp"
>
> but you left out the critical information as to whether this would
> always combine elements that were adjacent in the list. If the groups
> are adjacent, then you could potentially devise a strategy that avoids
> holding all the trees in memory at the same time.
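If the groups are adjacent, one such strategy is to stream through the list and hold only the current run of items, which gives the same grouping semantics as group-adjacent in XSLT 2.0. A minimal plain-Java sketch, assuming a hypothetical Revision record whose grouping key is the pair (page id, timestamp); in practice each item would be a parsed tree rather than a record:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper class, not part of the thread.
class AdjacentGrouper {

    // Hypothetical revision record standing in for a parsed tree.
    record Revision(String pageId, String timestamp, String body) {}

    // Grouping key: page id plus revision timestamp.
    static List<String> key(Revision r) {
        return List.of(r.pageId(), r.timestamp());
    }

    // Streams through the items and hands each run of adjacent items that
    // share a key to onGroup. Only the current run is held in memory.
    static <T, K> void groupAdjacent(Iterator<T> items, Function<T, K> keyOf,
                                     Consumer<List<T>> onGroup) {
        List<T> current = new ArrayList<>();
        K currentKey = null;
        while (items.hasNext()) {
            T item = items.next();
            K key = keyOf.apply(item);
            if (!current.isEmpty() && !Objects.equals(key, currentKey)) {
                onGroup.accept(current);      // run ended: emit it
                current = new ArrayList<>();  // start a fresh run
            }
            current.add(item);
            currentKey = key;
        }
        if (!current.isEmpty()) {
            onGroup.accept(current);          // emit the final run
        }
    }

    public static void main(String[] args) {
        List<Revision> revs = List.of(
            new Revision("42", "2010-09-28T13:17:00Z", "a"),
            new Revision("42", "2010-09-28T13:17:00Z", "b"),
            new Revision("42", "2010-09-28T13:18:05Z", "c"));
        groupAdjacent(revs.iterator(), AdjacentGrouper::key,
            group -> System.out.println(group.size() + " revision(s)"));
    }
}
```

Because a new list is started after each emitted group, the consumer may keep the groups it receives without them being modified later.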
>
> Supplying a sequence of trees as input to Saxon grouping is not a
> problem. Using the s9api interface, you can use a DocumentBuilder to
> build each tree as an XdmNode; a sequence can then be constructed using
> the constructor public XdmValue(Iterable<XdmItem> items), this XdmValue
> can be passed as a parameter to an XsltTransformer, and a reference to
> the parameter can be used in <xsl:for-each-group select="$param">.
> Using this approach the whole structure will be held in memory, but
> there are ways of avoiding that by going to lower-level interfaces.
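For comparison, the grouping semantics of xsl:for-each-group with group-by over an in-memory sequence can be approximated in plain Java with an insertion-ordered map: items with equal keys end up in one group even when they are not adjacent, and groups come out in order of first appearance, as in XSLT 2.0. This is not the s9api approach described above, only a sketch of the grouping itself; InMemoryGrouper and its Revision record are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper class, not part of the thread or of s9api.
class InMemoryGrouper {

    // Hypothetical revision record standing in for a parsed tree.
    record Revision(String pageId, String timestamp, String body) {}

    // Collects items into groups by key, like group-by: equal keys need not
    // be adjacent, and the LinkedHashMap keeps groups in order of first
    // appearance. All items are held in memory.
    static <T, K> Map<K, List<T>> groupBy(List<T> items, Function<T, K> keyOf) {
        Map<K, List<T>> groups = new LinkedHashMap<>();
        for (T item : items) {
            groups.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Revision> revs = List.of(
            new Revision("42", "2010-09-28T13:17:00Z", "a"),
            new Revision("7", "2010-09-28T13:17:00Z", "x"),
            new Revision("42", "2010-09-28T13:17:00Z", "b"));
        Map<List<String>, List<Revision>> groups =
            groupBy(revs, r -> List.of(r.pageId(), r.timestamp()));
        groups.forEach((k, v) -> System.out.println(k + " -> " + v.size()));
    }
}
```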
>
> Michael Kay
> Saxonica
>
>
>> It's a special case, where two or more revisions of one article are made
>> at the same time (in the same second). I would have to look through the
>> XML file with BaseX or Saxon to check, but I'm pretty sure such cases
>> exist somewhere in the huge file (so far I've only extracted a small
>> subset of articles and replaced the WikiText inside text elements with XML).
>>
>> The whole task is to sort the revisions so that I can shred them (that
>> is, the deltas between revisions) into our XML data storage system, which
>> can store and retrieve revisions compactly and efficiently. In parallel,
>> I'm currently writing the import of a sorted XML file.
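Sorting the revisions before the import could be sketched with a Comparator on (page id, timestamp). The field names and types are assumptions made for illustration: a numeric page id and an ISO-8601 timestamp parsed to java.time.Instant. Because List.sort is stable, revisions of the same page made in the same second keep their original relative order:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class, not part of the thread.
class RevisionSorter {

    // Hypothetical revision record: numeric page id, second-resolution timestamp.
    record Revision(long pageId, Instant timestamp, String body) {}

    // Order by page id first, then by revision timestamp.
    static final Comparator<Revision> BY_PAGE_THEN_TIME =
        Comparator.comparingLong(Revision::pageId).thenComparing(Revision::timestamp);

    // Returns a sorted copy; the sort is stable, so ties (same page id,
    // same second) stay in input order.
    static List<Revision> sorted(List<Revision> revisions) {
        List<Revision> copy = new ArrayList<>(revisions);
        copy.sort(BY_PAGE_THEN_TIME);
        return copy;
    }

    public static void main(String[] args) {
        List<Revision> revs = List.of(
            new Revision(42, Instant.parse("2010-09-28T13:18:05Z"), "second edit"),
            new Revision(42, Instant.parse("2010-09-28T13:17:00Z"), "first edit"));
        sorted(revs).forEach(r -> System.out.println(r.timestamp() + " " + r.body()));
    }
}
```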
>>
>> My main task (master's project and thesis) is, or will be, the
>> visualization of temporal tree-structured data, to gain insights into
>> the evolution of the data that would otherwise be very difficult to obtain.
>>
>> regards,
>> Johannes
>>
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

