Home. 
.

transparent

transparent

transparent

Altova Mailing List Archives


RE: [xml-dev] A single, all-encompassing data validation language - good or bad for the marketplace?

From: Amelia A Lewis <amyzing@--------.--->
To: xml-dev@-----.---.---
Date: 8/8/2007 12:01:00 AM
On 2007-08-07 00:11:44 -0400 Michael Kay <mike@s...> wrote:
>> I do not find it difficult to imagine a grammar that can specify 
>> these 
>> constraints; using the set-notation formalisms common in discussions 
>> of 
>> automata, it's relatively straightforward (handling the various 
>> gregorian 
>> exceptions gets increasingly difficult and verbose as one grows more 
>> and 
>> more precise, but it is not inherently beyond the scope of a grammar 
>> ... is 
>> it?).
> 
> I've no idea what the theoretical limits of what you can do with a 
> grammar
> are (I don't even know what the accepted definition of "grammar" is), 
> but I

Err.  Murata-san has a very nice paper that characterizes the sort of 
grammar embodied by several available grammars for XML, including W3C 
XML Schema (which is characterized as a "restrained-competition" 
grammar).

Are you speaking of "grammar" used colloquially?  Because--disclaimer, 
I am a humanities-trained sort just now investigating what amounts to 
Computer Science 101, via texts on formal languages, automata, and 
discrete mathematics--there does seem to be a fairly clear definition. 
  A grammar of a particular type (that is, with particular rules for 
what it can define) defines a particular class of languages, and in 
turn is strongly associated with a class of automata.

DTDs define a grammar for a regular language, and I believe that this 
was the design point of W3C XML Schema as well--a language that could 
be parsed by a deterministic finite automaton (WXS missed that, as a 
consequence of complex interactions, as Murata-san points out in the 
paper on taxonomy of schema languages).  RNG has regular hedge grammar 
as its design point (and hedge automata).

Context free grammars define a class of languages that includes 
regular languages (every regular grammar is a context free grammar), 
but with more power, and are associated with pushdown automata.  It's 
very likely that most W3C XML Schema parsers are implemented in this 
fashion, rather than as deterministic (or equivalently, 
nondeterministic) finite automata.  If that *is* the case, then it 
seems (to me) rather sensible to make use of the power of that style 
of automata more fully, rather than providing a sort of "escape" 
mechanism to break out of the automaton (which effectively puts you 
into turing machine territory, with an implementation problem of 
unknown complexity).

> think I have a feel for the practical limits of when grammar ceases 
> to be
> the most convenient way of stating the rules. I see people sometimes 
> doing
> things with regular expressions that in my view are well beyond that 
> limit.

That's a particular case of grammars.  Regular language == regular 
expressions == regular grammar == deterministic and nondeterministic 
finite automata.  Try and apply regular expressions to paren matching, 
and joy will not be your reward.  You can't express "string over Sigma 
followed immediately by its reversal".  But if W3C XML Schema has 
already passed the border from regular grammar into the land of 
context free grammar, why not investigate the limits of a CFG, before 
introducing yet another, unrelated technology?

In fact, while I was teaching myself this stuff, I borrowed the brain 
of a mathematician friend of mine (hey, hardly been used, you kn-- 
owww!).  She pointed out (and the various texts eventually explained, 
somewhat more verbosely) that you can gain quite a bit of power by 
equipping a finite state machine with a pushdown stack (that is, going 
from regular language to context free, from nfa/dfa to pushdown 
automata), so I naturally immediately suggested having *two* ... but 
was informed that such an automaton is functionally equivalent to a 
turing machine.  Determination of its capabilities is *much* more 
complex, I gather.

But these are ... surely they must be ... issues that the 1.1 W3C XML 
Schema working group has considered, are they not?  With the example 
of RNG, and the formalisms underlying it, and the literature that 
increasingly surrounds the characterization of XML as a regular hedge 
language (consequently a language most comfortably defined by a 
regular hedge grammar, though not *necessarily* practically most 
effectively implemented in a hedge automaton), surely the original 
design point of W3C XML Schema 1.0, of a regular language which could 
be efficiently processed by a finite automaton (in fact, by a top-down 
deterministic finite automaton, which is the justification of UPA if 
I'm not mistaken) was *at least* discussed?  The whole question of 
determinism (the unique particle attribution rule) *ought* to be on 
the WXS 1.1 WG's radar; it's a sore point with users of the technology 
(as is co-occurrence constraints, which is being addressed, albeit in 
a fashion I find rather worthy of remark).

> Why stretch one technology to its limits when you've crossed into a 
> domain
> where other technologies do it better?

Uh ... like moving from topdown deterministic finite automata to 
pushdown automata?

I'm sorry, but grammar defines language and is associated with a class 
of automata; WXS 1.0 had a very clear (if, in my opinion, mistaken) 
design goal in this area.  WXS 1.1 has already introduced an exception 
to the UPA (one implicit in WXS 1.0, as I understand it, but the 
formal acknowledgement is an advance, or potentially is so), so ... 
either the NFA/DFA is equipped with a sort of "exception" machinery of 
indeterminate complexity, or a different class of automata is 
indicated as the design point.

At the risk of repeating myself, I'm just learning stuff that they 
(supposedly?) teach CS undergraduates, so perhaps I've gone off the 
rails.  If so, I'd be happy to hear in what fashion I've done so (and 
my employer would prolly be happy if I get unrealistic ideas out of my 
tiny little head).

Amy!
-- 
Amelia A. Lewis                    amyzing {at} talsever.com
The flesh is strong.  The spirit stronger.  So shed your skin, baby.
Let it through.  Come on over.
                 -- Amy Ray


transparent
Print
Mail
Like It
Disclaimer
.

These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.

.
.

transparent

transparent