Altova Mailing List Archives: xml-dev
Re: Are we losing out because of grammars?
To: James Clark <jjc@------.--->
Date: 2/1/2001 11:07:00 AM
> <element name="x">
>   <zeroOrMore>
>     <element name="y">
>       <attribute name="z">
>         <data type="xsd:string"/>
>       </attribute>
>     </element>
>   </zeroOrMore>
>   <element name="y">
>     <attribute name="z">
>       <data type="xsd:integer"/>
>     </attribute>
>   </element>
> </element>

The example has a typo. I guess the above is what you had in mind, right?

> unless I lookahead and see whether it's the last element "y" element in
> the "x". The TREX implementation works on a stream of SAX events, so
> this is a big complication.

Right. But it's not that big a complication.

> It's not in general easy, unless you restrict the grammar.

Without restricting TREX's expressiveness, you can report types if you are allowed to parse the document twice. No random access is necessary. After the first scan, the complete result of type assignment is available, even if the grammar is ambiguous. The second scan is needed only to feed the application with SAX events together with their type information.

> Type assignment may require quite different implementation
> techniques from validation.

No. I understand you may want to see it before you believe it, but believe me, I have done it myself once :-)

> - You seem to think type-assignment is very important. Why?

Other people might have better reasons. Mine are:

(1) Type assignment makes it simple to automatically generate an object model that, in turn, parses the document automatically. Without type assignment, the "automatic parsing" process is much more complicated.

(2) As an application programmer, I don't want to check ancestor information when deciding what to do with the current element.

> dispatching on the "FQGI" (ie on the name of the element and the names
> of its ancestor elements) is sufficient for many applications. Type

I agree it's sufficient, but I don't even want to see the ancestor information. If I receive the type, the type is all I need to look at. For example, in RELAX the "tag" element has two different definitions, depending on where it appears.
If it appears under an "elementRule" element, it has

  <!ATTLIST tag name CDATA #REQUIRED>

whereas anywhere else it has

  <!ATTLIST tag role CDATA #REQUIRED
                name CDATA #IMPLIED>

> - Your ambiguity detection algorithm for RELAX detects whether it is
> possible to assign labels to elements in more than one way. I would find
> it more interesting to know whether it is possible to assign datatypes
> (as specified by the RELAX "type" attribute) to leaf elements and
> attributes in more than one way. Is it possible/easy to detect this
> kind of ambiguity?

I understand that you are interested in an algorithm that answers the following question with yes or no: "Is there any lexical value that is accepted by both of two given datatypes?" My answer depends on why you need this algorithm.

Actually, label ambiguity depends on datatype ambiguity, but I intentionally left that out of my post so that the core idea of the algorithm would be easy to understand.

Computing the intersection of two lexical spaces may or may not be decidable (in the computer-science sense). Even where it is decidable, the algorithm has to depend heavily on the datatype specification. However, some sound algorithms (again in the CS sense) are easy to implement, and I think they are reasonably practical. For example, such an algorithm can always answer the question if

 - both types are derived from decimal, without a pattern facet, or
 - both types are derived from string (some restrictions apply), or
 - one of the types has a finite lexical space (an enumeration facet).

I think this simplified algorithm covers 80% of use cases.

If the grammar is unambiguous, you never have to worry about the intersection of two datatypes. The following example is always unambiguous regardless of type1 and type2, so you don't have to compute the intersection at all.
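<element name="y">
  <choice>
    <group>
      <attribute name="z" type="type1" />
      <element name="a" />
    </group>
    <group>
      <attribute name="z" type="type2" />
      <element name="b" />
    </group>
  </choice>
</element>

Whichever child element appears, "a" or "b", selects the branch and with it the type of "z", so it does not matter whether the lexical spaces of type1 and type2 overlap.

As a result, the intersection algorithm is rarely needed, and is therefore far less important.

regards,
----------------------
K.Kawaguchi
E-Mail: k-kawa@b...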
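As a rough illustration of the sound-but-partial datatype intersection check described in this message, here is a minimal Python sketch. It is not the algorithm from the post or any library's API: the `Datatype` record and the functions `value_ok`, `accepts` and `may_overlap` are hypothetical names. The model is deliberately tiny (a "string", "decimal" or "integer" base plus enumeration, pattern and inclusive-bound facets), it ignores whitespace and length facets, and it approximates XML Schema regular expressions with Python's `re`. Within that model it returns True or False only when one of the easy cases applies and None ("don't know") otherwise, so it never gives a wrong answer.

```python
import math
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import FrozenSet, Optional

# Approximate XML Schema lexical forms for decimal and integer (assumption).
DECIMAL_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
INTEGER_RE = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class Datatype:
    """Hypothetical, heavily simplified model of a derived datatype."""
    base: str                                  # "string", "decimal" or "integer"
    enumeration: Optional[FrozenSet[str]] = None
    pattern: Optional[str] = None              # approximated with Python's re
    min_inclusive: Optional[Decimal] = None
    max_inclusive: Optional[Decimal] = None


def value_ok(dt: Datatype, value: Decimal) -> bool:
    """Numeric facets only: integer-ness, inclusive bounds, enumeration by value."""
    if dt.base == "integer" and value != value.to_integral_value():
        return False
    if dt.min_inclusive is not None and value < dt.min_inclusive:
        return False
    if dt.max_inclusive is not None and value > dt.max_inclusive:
        return False
    if dt.enumeration is not None and value not in {Decimal(v) for v in dt.enumeration}:
        return False
    return True


def accepts(dt: Datatype, lexical: str) -> bool:
    """Is `lexical` in the lexical space of `dt` (within this toy model)?"""
    if dt.pattern is not None and re.fullmatch(dt.pattern, lexical) is None:
        return False
    if dt.base == "string":
        return dt.enumeration is None or lexical in dt.enumeration
    lexical_re = INTEGER_RE if dt.base == "integer" else DECIMAL_RE
    return lexical_re.fullmatch(lexical) is not None and value_ok(dt, Decimal(lexical))


def may_overlap(a: Datatype, b: Datatype) -> Optional[bool]:
    """True/False if the lexical spaces do/don't intersect; None means "don't know"."""
    # Case: a string enumeration is a finite lexical space, so try every member.
    for enum_side, other in ((a, b), (b, a)):
        if enum_side.base == "string" and enum_side.enumeration is not None:
            return any(accepts(enum_side, v) and accepts(other, v)
                       for v in enum_side.enumeration)
    numeric = ("decimal", "integer")
    # Case: both decimal-derived without a pattern facet, so reason about values.
    if a.base in numeric and b.base in numeric and a.pattern is None and b.pattern is None:
        for enum_side, other in ((a, b), (b, a)):
            if enum_side.enumeration is not None:
                return any(value_ok(enum_side, Decimal(v)) and value_ok(other, Decimal(v))
                           for v in enum_side.enumeration)
        lows = [x for x in (a.min_inclusive, b.min_inclusive) if x is not None]
        highs = [x for x in (a.max_inclusive, b.max_inclusive) if x is not None]
        if lows and highs:
            lo, hi = max(lows), min(highs)
            if lo > hi:
                return False
            if "integer" in (a.base, b.base):
                # An integer-derived side needs a whole number inside [lo, hi].
                return math.ceil(lo) <= hi
        return True
    # Case: two string types with no pattern (and no enumeration) accept every string.
    if a.base == b.base == "string" and a.pattern is None and b.pattern is None:
        return True
    # Anything else (e.g. two patterns) is outside the easy cases.
    return None
```

For example, an integer restricted to 0..150 and a decimal restricted to 0.5..0.7 come out as disjoint, because no whole number lies in the overlap, while two string types that each carry a pattern facet fall outside the easy cases and yield None rather than a guess.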