Altova Mailing List Archives


Discrepancies in the W3C Schema docs?

From: "Dan Maharry" <dan@---.---->
To: <xmlschema-dev@--.--->
Date: 6/8/2007 5:24:00 PM
All I did was try to write a small set of extension methods to validate whether a given string was valid according to the built-in schema string types, and the editor in me comes out and starts nit-picking. The W3C Schema docs are very good but sometimes annoyingly ambiguous without a degree in lateral thinking.

Problem #1: Is "" valid?

Section 3.2.1 says

    The *value space* of string is the set of finite-length sequences of characters (as defined in [XML 1.0 (Second Edition)]) that *match* the Char production from [XML 1.0 (Second Edition)].

So, is the empty string valid then? Taking this definition at face value, the answer seems to depend on what 'finite-length' means. According to the dictionary, finite means

    1. having bounds or limits; not infinite; measurable.
    2. Mathematics.
       a. (of a set of elements) capable of being completely counted.
       b. not infinite or infinitesimal.
       c. not zero.

So maybe an empty string isn't valid then? The dictionary implies it. Alas, no. The XML Schema spec at the top of section 4 also states

    Any property identified as having a set, subset or *list* value may have an empty value unless this is explicitly ruled out: this is not the same as absent.

OK, so the empty string is valid as a string, but could the W3C please link to this last note about sets containing the empty value from the many uses of the word 'set' around the document? Either that or define the phrase 'finite-length' in situ as 'zero or greater'.

Problem #2: In which string data types is "" invalid?

The problem with the note about sets is that it states a type must explicitly rule the empty string as invalid before it really is invalid. But what about types where it is only implied elsewhere rather than stated in black and white as it is in, say, the value space of the NMTOKENS type?

    NMTOKENS represents the NMTOKENS attribute type from [XML 1.0 (Second Edition)]. The *value space* of NMTOKENS is the set of finite, non-zero-length sequences of *NMTOKEN*s

Let's go one step back up the type hierarchy to the NMTOKEN type.

    NMTOKEN represents the NMTOKEN attribute type from [XML 1.0 (Second Edition)]. The *value space* of NMTOKEN is the set of tokens that *match* the Nmtoken production in [XML 1.0 (Second Edition)].

No explicit mention of non-zero-length anything here. But the definition of Nmtoken in XML 1.0 says that it should consist of one or more characters.

    NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    Nmtoken  ::= (NameChar)+

By those rules, a valid NMTOKEN cannot be empty even if the writer of the schema sets minLength to 0. The same logic applies to the Language and Name string types in the schema definition as well. So if none of them can be empty, neither can NCName, ID, IDREF, IDREFS, ENTITY or ENTITIES, even though IDREFS and ENTITIES are the only ones of these whose definitions explicitly require non-zero length.

So then, what phrase is missing from "must explicitly rule the empty string as invalid"? Because it's definitely not all there.

Problem #3: Colons or not?

The next issue spans three W3C recommendations and it's a question of colons. In the XML Schema document,

    [the Name type is] the set of all strings which *match* the Name production of [XML 1.0 (Second Edition)].

From the XML spec, the Name production looks like this

    NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    Name     ::= (Letter | '_' | ':') (NameChar)*

The Name type has several derived types - ID, IDREF and ENTITY, all of which are defined similarly and have the same ambiguity. Let's use IDREF.

    IDREF represents the IDREF attribute type from [XML 1.0 (Second Edition)]. The *value space* of IDREF is the set of all strings that *match* the NCName production in [Namespaces in XML]. The *lexical space* of IDREF is the set of strings that *match* the NCName production in [Namespaces in XML].

From the [Namespaces in XML] spec then, the basic gist of the NCName production is that it's the same as the Name production in [XML 1.0 (Second Edition)] but without the colons

    NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender
    NCName     ::= (Letter | '_') (NCNameChar)*

OK? Name with colons. NCName without. Now the XML spec defines the IDREF attribute type as follows

    Values of type IDREF must match the Name production....

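As a rough illustration of the two productions quoted above, here is a minimal Python sketch. It is not taken from either spec: the function names are made up for the example, and the character classes are an ASCII-only approximation that leaves out the Unicode Letter, Digit, CombiningChar and Extender ranges.

```python
import re

# ASCII-only approximations of the quoted productions (an assumption made
# to keep the sketch short). The real Letter, Digit, CombiningChar and
# Extender classes cover far more of Unicode.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")   # Name: colons allowed
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")   # NCName: no colons


def is_name(value: str) -> bool:
    """True if value matches the approximated XML 1.0 Name production."""
    return NAME_RE.fullmatch(value) is not None


def is_ncname(value: str) -> bool:
    """True if value matches the approximated NCName production."""
    return NCNAME_RE.fullmatch(value) is not None


# Print each candidate with its Name and NCName results side by side.
for candidate in ("chapter1", "doc:chapter1", ""):
    print(repr(candidate), is_name(candidate), is_ncname(candidate))
```

Under these assumptions "doc:chapter1" passes the Name check but fails the NCName check, and the empty string fails both because each production requires a first character.
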
So then, values of the schema type IDREF, which cannot have colons, must be able to represent XML IDREF attributes, which can have colons. Is it me or is there potential for a problem with that? I realise that 'represent' doesn't mean 'be the same as' but still.

Problem #4: Single spaces or more?

The last issue is another ambiguity which could be easily sorted out if the W3C ever revised the Schema docs. At the bottom of the string type derivation tree are two 'plural' types, IDREFS and ENTITIES. Both are defined in the same way, so let's use IDREFS.

    IDREFS represents the IDREFS attribute type from [XML 1.0 (Second Edition)]. The *value space* of IDREFS is the set of finite, non-zero-length sequences of IDREFs. The *lexical space* of IDREFS is the set of space-separated lists of tokens, of which each token is in the *lexical space* of IDREF.

For me at least, the ambiguity is in the word "space-separated". How many spaces? Whitespace in general or literally just the space character, \x20? Again, we have to consult the XML specification to get the answer, where we're told

    values of type IDREFS must match [the] Names [production]

and [the] Names [production] reveals that each IDREF must be separated from the next by a single \x20 character only; otherwise the string isn't a valid IDREFS string.

    Names ::= Name (#x20 Name)*

So why can't the schema spec just say something like

    The *lexical space* of IDREFS is the set of lists of tokens each separated by a single \x20 character,....

and take the ambiguity out of the statement?

Thanks,

Dan Maharry

P.S. This is formatted slightly better online at http://blogs.ipona.com/dan/archive/2007/05/17/8381.aspx
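
For Problem #4, the single-space reading of the Names production quoted in the message can be sketched in Python as below. This is an illustration only: the function name is hypothetical, the Name pattern is an ASCII-only approximation, and the check looks at the literal string without any whitespace normalisation a processor might apply beforehand.

```python
import re

# ASCII-only stand-in for the XML 1.0 Name production (an assumption; the
# real production also admits many non-ASCII characters).
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"

# Names ::= Name (#x20 Name)*
# Exactly one \x20 between consecutive names, none leading or trailing.
NAMES_RE = re.compile(rf"{NAME}(?:\x20{NAME})*")


def is_names(value: str) -> bool:
    """True if value matches the approximated Names production."""
    return NAMES_RE.fullmatch(value) is not None


# Single space, double space, tab, and empty string in turn.
for candidate in ("a1 b2 c3", "a1  b2", "a1\tb2", ""):
    print(repr(candidate), is_names(candidate))
```

Read this way, a doubled space or a tab between tokens is rejected, and so is the empty string, since at least one Name is required.
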

