Discrepancies in the W3C Schema docs?
To: <xmlschema-dev@--.--->
Date: 6/8/2007 5:24:00 PM

All I did was try to write a small set of extension methods to validate whether a given string was valid according to the built-in schema string types and the editor in me comes out and starts nit picking. The W3C Schema docs are very good but sometimes annoyingly ambiguous without a degree in lateral thinking.

Problem #1 : Is "" valid?

Section 3.2.1 says

The *value space* of string is the set of finite-length sequences of characters (as defined in [XML 1.0 (Second Edition)]) that *match* the Char production from [XML 1.0 (Second Edition)].

So, is the empty string valid then? Taking this definition on spec, the answer seems to depend on what 'finite-length' means. According to the dictionary finite means

1. having bounds or limits; not infinite; measurable.
2. Mathematics.
   (of a set of elements) capable of being completely counted.
   not infinite or infinitesimal.
   not zero.

So maybe an empty string isn't valid then? The dictionary implies it.
Alas, no.
The XML Schema spec at the top of section 4 also states

Any property identified as a having a set, subset or *list* value may have an empty value unless this is explicitly ruled out: this is not the same as absent.

OK, so the empty string is valid as a string, but could the W3C please link to this last note about sets containing the empty value from the many uses of the word 'set' around the document? Either that or define the phrase 'finite-length' in situ as 'zero or greater'.

Problem #2 : In which string data types is "" invalid?

The problem with the note about sets is that it states a type must explicitly rule the empty string as invalid before it really is invalid. But what about it being implied elsewhere but not in black and white as, say, the value space of the NMTOKENS type?

NMTOKENS represents the NMTOKENS attribute type from [XML 1.0 (Second Edition)]. The *value space* of NMTOKENS is the set of finite, non-zero-length sequences of *NMTOKEN*s

Let's go one step back up the type hierarchy to the NMTOKEN type.

NMTOKEN represents the NMTOKEN attribute type from [XML 1.0 (Second Edition)]. The *value space* of NMTOKEN is the set of tokens that *match* the Nmtoken production in [XML 1.0 (Second Edition)].

No explicit mention of non-zero-length anything here. But the definition of the NMTOKEN in XML 1.0 says that it should consist of one or more characters.

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
Nmtoken ::= (NameChar)+

By those rules, a valid NMTOKEN cannot be empty even if the writer of the schema sets minLength to 0.
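
As a rough sketch of what the Nmtoken rule above means for a validator: this is Python rather than the extension methods mentioned at the top, `is_nmtoken` is a hypothetical helper name, and the character class is an assumed ASCII-only stand-in for NameChar (the real production also admits non-ASCII letters, digits, combining characters and extenders).

```python
import re

# Assumption: ASCII-only approximation of NameChar. The real production
# also admits non-ASCII Letter, Digit, CombiningChar and Extender characters.
NAME_CHAR = r"[A-Za-z0-9._:\-]"

# Nmtoken ::= (NameChar)+  -- the '+' is what rules out the empty string.
NMTOKEN_RE = re.compile(NAME_CHAR + r"+")


def is_nmtoken(value: str) -> bool:
    """Return True if value matches the (approximated) Nmtoken production."""
    return NMTOKEN_RE.fullmatch(value) is not None


print(is_nmtoken(""))       # the empty string has no NameChar to match
print(is_nmtoken("a.b-1"))  # dots, hyphens and digits are all NameChars
```

The point of the sketch is only that the emptiness check falls out of the `+` in the production, not out of any facet such as minLength.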
The same logic applies to the Language and Name string types in the schema definition as well, so if none of them can be empty, neither can NCName, ID, IDREF, IDREFS, ENTITY or ENTITIES, despite the fact that IDREFS and ENTITIES are the only ones of these whose definitions explicitly require values to be non-zero-length.

So then, what phrase is missing from "must explicitly rule the empty string as invalid"? It's definitely not all there.

Problem #3 : Colons or not?

The next issue spans three W3C recommendations and it's a question of colons. In the XML Schema document,

[the Name type is] the set of all strings which *match* the Name production of [XML 1.0 (Second Edition)].

From the XML spec, the Name production looks like this

NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
Name ::= (Letter | '_' | ':') (NameChar)*

The Name type has several derived types - ID, IDREF and ENTITY - all of which are defined similarly and which have the same ambiguity. Let's use IDREF

IDREF represents the IDREF attribute type from [XML 1.0 (Second Edition)]. The *value space* of IDREF is the set of all strings that *match* the NCName production in [Namespaces in XML]. The *lexical space* of IDREF is the set of strings that *match* the NCName production in [Namespaces in XML].

From the [Namespaces in XML] spec then, the basic gist of the NCName production is that it's the same as the Name production in [XML 1.0 (Second Edition)] but without the colons

NCNameChar ::= Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender
NCName ::= (Letter | '_') (NCNameChar)*

OK? Name with colons. NCName without.
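
To make the difference between the two productions concrete, here is a minimal Python sketch. The helper names `is_name` and `is_ncname` are hypothetical, and ASCII letters and digits are assumed as stand-ins for the Letter and Digit productions, with CombiningChar and Extender left out.

```python
import re

# Assumption: ASCII letters and digits stand in for the Letter and Digit
# productions; CombiningChar and Extender are left out entirely.
NAME_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")   # Name: colons allowed
NCNAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._\-]*")   # NCName: no colons


def is_name(value: str) -> bool:
    """Return True if value matches the (approximated) Name production."""
    return NAME_RE.fullmatch(value) is not None


def is_ncname(value: str) -> bool:
    """Return True if value matches the (approximated) NCName production."""
    return NCNAME_RE.fullmatch(value) is not None


# Print each candidate with its Name and NCName results side by side.
for candidate in ("chapter1", "ns:chapter1", ""):
    print(repr(candidate), is_name(candidate), is_ncname(candidate))
```

Under these assumptions a value such as "ns:chapter1" passes the Name check but fails the NCName check, and the empty string fails both because each production requires a first character.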
Now the XML spec defines the IDREF attribute type as follows

Values of type IDREF must match the Name production....

So then, values of the schema type IDREF which cannot have colons must be able to represent XML IDREF attributes which can have colons. Is it me or is there potential for a problem with that? I realise that 'represent' doesn't mean 'be the same as' but still.

Problem #4 : Single spaces or more?

Last issue is another ambiguity which could be easily sorted if the W3C ever revised the Schema docs. At the bottom of the string type derivation tree are two 'plural' types, IDREFS and ENTITIES. Both are defined in the same way, so let's use IDREFS.

IDREFS represents the IDREFS attribute type from [XML 1.0 (Second Edition)]. The *value space* of IDREFS is the set of finite, non-zero-length sequences of IDREFs. The *lexical space* of IDREFS is the set of space-separated lists of tokens, of which each token is in the *lexical space* of IDREF.

For me at least, the ambiguity is in the word "space-separated". How many spaces? Whitespace in general or literally just the space character, \x20? Again, we have to consult the XML specification to get the answer where we're told

values of type IDREFS must match [the] Names [production]

and [the] Names [production] reveals that it means each IDREF must be separated by a single \x20 character only else the string isn't a valid IDREFS type string.

Names ::= Name (#x20 Name)*

So why can't the schema spec just say something like

The *lexical space* of IDREFS is the set of lists of tokens each separated by a single \x20 character,....

and take the ambiguity out of the statement?

Thanks,

Dan Maharry

P.S.
This is formatted slightly better online at
http://blogs.ipona.com/dan/archive/2007/05/17/8381.aspx
