Altova Mailing List Archives


Re: [xsl] XInclude as an XSLT transformation?

From: "W. Eliot Kimber" <ekimber@------------------->
To:
Date: 1/3/2005 4:28:00 PM
Elliotte Harold wrote:
> W. Eliot Kimber wrote:
>> The issue is that in the transcluded result the IDs must be unique
>> (this is a basic requirement of XML).
>
> This is not a basic requirement of XML. IDs in XML documents may in fact
> be non-unique, and even non-name tokens (as recently came up in a
> different context). A document containing such non-unique IDs would be
> invalid but well-formed, and might still be usefully processed.

I did misspeak: if an attribute has a declared type of "ID" *and* the 
document is intended to be DTD valid then the IDs must be unique. This 
validation requirement is a non-optional feature of XML (in that if you 
want DTD validity then ID uniqueness is not optional).
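
To make the distinction concrete, here is a minimal Python sketch (not part of the original exchange) that looks for repeated ID values in a well-formed document. It assumes the ID-typed attribute is literally named "id"; the standard-library parser used here is non-validating and does not read DTD attribute types, so the attribute name is an assumption. The function name duplicate_ids is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text, id_attr="id"):
    """Return the sorted list of id_attr values that occur more than once.

    Assumption: id_attr names the attribute that a DTD would declare as
    type ID. ElementTree does not validate, so it cannot check this.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)

# Well-formed, so it parses without error, but it could not be DTD-valid
# if "id" were declared as type ID.
doc = '<doc><sec id="a"/><sec id="b"/><sec id="a"/></doc>'
print(duplicate_ids(doc))  # ['a']
```

The document parses without complaint; only a validating parser working from a DTD would reject the repeated value.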



With XSD schema you can, of course, define key attributes that have 
different scopes of uniqueness, although, unless I've missed something 
in the XSD spec, the largest possible scope for key uniqueness is still 
the physical XML document.



That is, my main point, and the crux of the issue as far as element 
addressing goes, is that in XML, regardless of which 
constraint-specification standard you use, each XML document establishes 
one or more identifier name spaces, which means that addressing elements 
using some form of standard-defined (or standard-governed) mechanism 
always involves first addressing the XML document and then the things in it.



[Note that using indirect addresses, such as those defined in the W3C 
XIndirect note submitted by Innodata Isogen 
(http://www.w3.org/TR/2003/NOTE-XIndirect-20030612/) you can impose a 
global namespace over a group of documents at will. One distinguishing 
feature of this approach is that the scope of the imposed namespace is 
flexible--it's not an all-or-nothing approach. And by having multiple 
stages of indirection you can, of course, combine distinct namespaces 
into new, larger namespaces, if needed.]



This fact is reflected in the XInclude href/xpointer attribute pair, 
which I think is the best design choice given the overall constraints 
imposed by XML syntax and practice. In particular, it clearly 
distinguishes the storage object part of the address (a URI with no 
fragment identifier) from the semantic object part of the address (e.g., 
an XPointer that addresses an element, attribute, or sequence of data 
characters) and avoids tricky syntactic interference between URI syntax 
and the syntaxes of semantic addresses.
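
The two-step addressing described above (document first, then the thing in it) can be sketched in Python as follows. This is an illustration only, not a conforming XInclude processor: it handles only a bare-name (shorthand) xpointer value, it assumes the ID attribute is named "id", and the DOCS dictionary is a hypothetical stand-in for real storage objects.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory "storage objects", keyed by URI. Both documents
# use the same ID value, which is fine because each has its own ID space.
DOCS = {
    "a.xml": '<chapter><para id="p1">From A</para></chapter>',
    "b.xml": '<chapter><para id="p1">From B</para></chapter>',
}

def resolve(include_el, id_attr="id"):
    """Resolve one xi:include: storage object first, then the element in it."""
    # Step 1: href addresses the document (a URI with no fragment identifier).
    root = ET.fromstring(DOCS[include_el.get("href")])
    pointer = include_el.get("xpointer")
    if pointer is None:
        return root  # no xpointer: the whole document is the target
    # Step 2: the pointer is interpreted only within that document.
    for el in root.iter():
        if el.get(id_attr) == pointer:
            return el
    raise LookupError("no element with %s=%r" % (id_attr, pointer))

master = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="a.xml" xpointer="p1"/>'
    '<xi:include href="b.xml" xpointer="p1"/>'
    '</book>'
)
texts = [resolve(inc).text for inc in master.iter(XI_INCLUDE)]
print(texts)  # ['From A', 'From B']
```

The same pointer value, p1, resolves to two different elements because each lookup is scoped by its href.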



>> It's not clear, at least to my reading, whether or not the XInclude
>> allows or requires ID values to be rewritten such that all IDs in the
>> result are unique even if two input elements (from two different
>> source documents) have the same ID value.
>
> The XInclude spec is clear. This is *not* allowed. ID values may not be
> rewritten by a conforming XInclude processor, even if they conflict.

I think I was insufficiently precise in making my statement. The 
challenge comes from the disconnect between how the XInclude specification 
is defined and what one has to do in practice in an XSLT (or DOM or SAX) 
processing environment.



That is, the XInclude specification is defined only in terms of infoset 
modification and at that level you are correct--IDs are not rewritten in 
the sense that the infoset information reflecting the original ID values 
is not modified in the transcluded result. What *can be* modified are 
the final reference pointer properties, which must reflect the original 
reference target in its final, transcluded location.



Thus, while the syntactic IDs themselves are not changed, the final 
effect of the references may be.



When discussing an XSLT implementation of XInclude processing the 
problem is that XSLT is not operating on the infoset but on 
XSLT-specific node trees in memory. In this tree there is no abstract 
pointer property, only the original syntactic values. Therefore there is 
no way to directly implement the XInclude reference fixup process except 
by modifying the data values that are then interpreted as references.



Thus, in the process of performing transclusion one has no choice but to 
change the ID and reference values in the transcluded result (which is a 
new document tree) such that the reference correctness constraint is 
preserved. This necessarily means that you either rewrite the values of 
the original ID and reference attributes as they are copied from the 
source to the transcluded result or you create new attributes that hold 
the pointers and IDs as they need to be constructed in the transcluded 
result.



The second approach would be closer in spirit to the XInclude spec 
(because the original ID and reference values would be unchanged) but 
would make it impossible to then process the transcluded result using 
generic templates that were written against the original attribute 
names. That is, if I have a pre-XInclude template that expects XRef 
elements to use "refid" to point to "id" attributes, that template will 
not work post-XInclude if I have used different attribute names for the 
transcluded result references. This means that XInclude processing 
cannot be an essentially transparent process for the core business logic 
in this case.



What I do (or have done to date) is rewrite the ID and reference 
attributes themselves and copy the original values to new, 
transclusion-specific attributes, so that they remain accessible to 
subsequent XSLT processing. This allows me to issue messages that reflect 
the original storage locations of references and targets while leaving the 
generic transformation business logic unchanged. It also lets you 
retrofit XInclude processing into any existing XSLT process with a 
minimal amount of effort (all you have to do is modify the root template 
and implement any required ID and reference rewriting needed for the 
document types involved).
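
As a rough illustration of this rewrite-and-preserve approach, here is a Python sketch rather than the XSLT actually used. The attribute names "orig-id" and "orig-refid", the per-document prefix scheme, and the function name transclude_copy are all hypothetical; "id" and "refid" follow the XRef example given earlier. The sketch also assumes every reference points at a target inside the same transcluded subtree, which real documents need not satisfy.

```python
import copy
import xml.etree.ElementTree as ET

def transclude_copy(element, prefix, id_attr="id", ref_attr="refid"):
    """Deep-copy element, prefixing its ID and reference values.

    The original values are kept in hypothetical 'orig-<name>' attributes
    so later processing can still report the source-document values.
    """
    result = copy.deepcopy(element)  # the source tree is left untouched
    for el in result.iter():
        for name in (id_attr, ref_attr):
            value = el.get(name)
            if value is not None:
                el.set("orig-" + name, value)  # preserve the original value
                el.set(name, prefix + value)   # rewritten, now unique per source
    return result

# Two source documents that happen to use the same ID value.
a = ET.fromstring('<sec id="s1"><xref refid="s1"/></sec>')
b = ET.fromstring('<sec id="s1"><xref refid="s1"/></sec>')

book = ET.Element("book")
book.append(transclude_copy(a, "d1."))
book.append(transclude_copy(b, "d2."))

ids = [el.get("id") for el in book.iter() if el.get("id")]
print(ids)  # ['d1.s1', 'd2.s1']
```

Because the rewritten values stay in the original attribute names, logic written against "id" and "refid" keeps working on the combined tree, while the "orig-" attributes carry the values needed for messages about the source locations.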



Note too that in my processing the transcluded result is purely an 
in-memory construct.



So I contend that at best my XSLT process is conforming, to the degree 
that the XInclude spec can have an opinion about the conformance of 
processing that does not involve direct (or effective) infoset 
processing; at worst it is a necessary concession to practicality in order 
to have a working system that does not require globally-unique element 
IDs but that is still consistent with the spirit and intent of the 
XInclude recommendation.



Note that as a potential non-conformance there is very little risk 
because the processing effect is, as far as I can tell, consistent with 
the intent of the XInclude specification, which is what really counts 
with a processing standard like XInclude.



With data standards, like XML, correctness is a binary condition. With 
processing standards, correctness is necessarily more fuzzy.



Note that my personal intent is to conform as closely as I can to the 
XInclude specification. I think it's a very good specification and very 
badly needed. But it is, by itself, insufficient to meet the 
requirements of the types of business processes I support, so I have no 
choice but to either diverge from it in some places or extend it 
unilaterally. But I try to do so in the most controlled and principled 
way that I can.



If time and energy allow, it would be nice to extend the XInclude 
spec to include support for my requirements (for example, to formalize the 
ability to specialize from xi:include). But there are lots of things 
that need doing and only so many people to do them, and it's much easier 
to just do what needs to be done for now and let standardization come 
when it's really needed.



Cheers,



Eliot
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

ekimber@xxxxxxxxxxxxxxxxxxx
www.innodata-isogen.com

