Altova Mailing List Archives

Re: [xml-dev] Auto schema/xpath generation from doc collection

From: "G. Ken Holman" <gkholman@----------------.--->
To: xml-dev@-----.---.---
Date: 5/20/2009 1:49:00 PM

At 2009-05-20 06:20 -0700, Paul M wrote:
>Say one has a collection of docs:
>....doc20000 (many docs)
>I am looking for a solution (application, ideas, designs) that would return:
>1. A listing of xpaths to elements

I formalized that by creating an XML vocabulary for what I've termed 
an "XPath file".  Such an instance turns out to be very useful when 
specifying the required behaviours of a stylesheet.  The first XPath 
file support I released converted a W3C Schema for UBL into an XPath 
file, but this proves unwieldy for document models such as UBL Order, 
which has 880,000 elements and attributes (not including recursion) 
for a single instance.

To make things more manageable, but also more fragile, I created a 
stylesheet that reads an XML document and enumerates the elements and 
attributes found therein.

I say "fragile" because changing an instance happens more frequently 
than changing the model.  The UBL document model hasn't changed in 
over two years, while adding a single element to an instance will 
change the enumeration of subsequent elements in that instance's XPath file.
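
As a rough illustration of instance-based enumeration (this is not the 
stylesheet mentioned above, which is XSLT), here is a minimal Python 
sketch.  The function name enumerate_xpaths, the sample document and 
the numbering rule (each element, then its attributes, then its 
children, in document order) are all assumptions made for the example, 
and namespaces are ignored.

```python
import xml.etree.ElementTree as ET


def enumerate_xpaths(xml_text):
    """Return [(ordinal, xpath), ...] for every element and attribute.

    Assumed numbering: an element, then its attributes, then its
    children, all in document order.
    """
    root = ET.fromstring(xml_text)
    entries = []

    def visit(elem, path):
        entries.append((len(entries) + 1, path))
        # Attributes are numbered right after their owning element.
        for name in elem.attrib:
            entries.append((len(entries) + 1, f"{path}/@{name}"))
        # Positional predicates distinguish same-named siblings.
        seen = {}
        for child in elem:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            visit(child, f"{path}/{child.tag}[{seen[child.tag]}]")

    visit(root, f"/{root.tag}")
    return entries


# Hypothetical sample instance, not taken from UBL.
sample = '<Order id="A1"><Line><Qty unit="EA">2</Qty></Line><Line/></Order>'
for ordinal, xpath in enumerate_xpaths(sample):
    print(ordinal, xpath)
```

Inserting one new element near the top of the sample shifts the 
ordinal of everything after it, which is the fragility described above.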

>2. A schema from the docs in a collection.
>3. Other ideas?

For what?  You've described a facility you need ... is that the 
entire problem or are you using this in a particular context that 
might spark other ideas?

As I said, my context for creating XPath files was, and is, the 
specification of stylesheet behaviours:  one prints off a blank UN 
Layout Key form and manually annotates it with the reference numbers 
of the elements and attributes that belong in each box.  That becomes 
one component of the specification of what goes where.  One of the 
outputs from an "XPath file" is an XML instance that instantiates 
every element and attribute, using the ordinal of each element or 
attribute as its content.  When you then run that instance through 
your development stylesheet, the result should be filled with numbers 
that match your manually-created specification.
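
The ordinal-filled test instance can be sketched in the same spirit.  
This Python example is only an assumption about how such an instance 
might be produced:  the function name fill_with_ordinals is 
hypothetical, ordinals are assigned in document order (element, then 
its attributes), and only leaf elements receive their ordinal as text 
so that container elements do not end up with mixed content.

```python
import xml.etree.ElementTree as ET


def fill_with_ordinals(xml_text):
    """Return the document with attribute values and leaf-element text
    replaced by ordinals assigned in document order."""
    root = ET.fromstring(xml_text)
    counter = 0
    for elem in root.iter():  # iter() walks elements in document order
        counter += 1
        # Assumption: container elements are numbered but carry no text.
        elem.text = str(counter) if len(elem) == 0 else None
        elem.tail = None
        # Attributes take the ordinals right after their owning element.
        for name in list(elem.attrib):
            counter += 1
            elem.set(name, str(counter))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample instance, not taken from UBL.
sample = '<Order id="A1"><Line><Qty unit="EA">2</Qty></Line><Line/></Order>'
print(fill_with_ordinals(sample))
```

Running a development stylesheet over such an instance should put 
each number in the box where the annotated form expects it.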

This was presented at XML Europe 2004 but I see that the IDEAlliance 
archives cannot be accessed to review a copy of my paper.

I hope this helps.

. . . . . . . . . . . . Ken

XQuery/XSLT/XSL-FO hands-on training - Los Angeles, USA 2009-06-08
Crane Softwrights Ltd.
G. Ken Holman                 mailto:gkholman@C...

