Altova Mailing List Archives

Re: Strong Typing in SGML and XML

From: Richard Light <richard@-----.-----.--.-->
To: Eric Albright <eric_albright@-------.--->
Date: 5/7/1997 5:15:00 PM
In message <199705070346.UAA02957@m...>, Eric Albright
<eric_albright@s...> writes
>Having said that, I ask when is strong data typing necessary? As far as I
>can tell there is only one place where it is useful -- when the document is
>being created or altered. There will always be data validation that cannot
>be handled by data typing and as such must be delegated to a validating
>application or a human. e.g.

>From a museum perspective, we have found the need for two types of data
validation/strong typing, which we call 'syntax control' and 'vocabulary

Syntax control deals with things like the form of personal names.  These
are _not_ analysed in our application, but expressed in a consistent way
suitable for alphabetical sorting, e.g.:

        Light, Richard B.
rather than
        Richard B. Light

The syntax check would pick up non-capitalised words (apart from a 'stop
list' of known weak prefixes), inconsistent use of full stop and/or
spaces after initials, etc.  This starts to be hard work for a regular
expression, and might more easily be supported as a 'notation', for
which an external helper applet is called up in the context of editing.

Vocabulary control involves checking the data content against an
external authority, which could be a simple termlist or a complex

Another use we make of data syntax is as a short-cut for markup.  (This
was before we knew about SGML, by the way!  The conventions were
originally devised to make optimal use of A5 catalogue cards ...)  We
use colons as a 'field separator', e.g.:

        <person>maker : Light, R.B.</person>
                <persname>Light, R.B.</persname>

and ampersands (definitely pre-SGML!) as keyword separators:

        <place>Burgess Hill & W. Sussex & U.K.</place>
                <placename>Burgess Hill</placename>
                <placename>W. Sussex</placename>

These practices tie in with the SGML concept of short references, which
are not available in XML.  So a general conclusion I have come to is
that ':' and '&' need to be mapped to suitable subelements, and our
users need to come to terms with more heavily tagged records than they
are used to.  

This is relevant (really!) in the context of Tim's suggestion that
strong typing should apply only to PCDATA-only elements.  In the more
general case of 'data validation' we might well want to validate
elements with substructure.

Richard Light
SGML and Museum Information Consultancy
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
tel. (44) 1444 232067

xml-dev: A list for W3C XML Developers
Archived as:
To unsubscribe, send to majordomo@i... the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@i...)


These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.