Altova Mailing List Archives


RE: [xml-dev] Data versioning strategy: address semantic, relationship, and syntactic changes?

From: "Costello, Roger L." <costello@-----.--->
To: <xml-dev@-----.---.--->
Date: 12/20/2007 7:23:00 PM
Hi Folks,

Thanks for your excellent insights into the creation of a data
versioning strategy!  

I am still in the process of assimilating all of your ideas.  

The discussion has given me a glimpse into the immensity and complexity
of the "versioning strategy problem."

To help me cope with all the information, I have focused on a few
comments that were made. 

A FEW SELECT COMMENTS

Greg Hunt challenges us to think in terms of managing change as part of
a "business process":

> I think that you need to look at some other things, 
> semantics, structure and syntax are at too low a 
> level because useful version management needs to 
> be embedded in a business process or a set of 
> business agreements.

Greg notes that a change may cause neither syntactic nor semantic
problems, yet may nonetheless cause trouble for some consumers:

> A semantically non-breaking change for one class of 
> consumer might present problems for another.  Consider 
> a statistical data flow with a number of elements in it 
> that are not summed (e.g. a structure containing a count 
> of heart attacks, count of ambulance movements and a 
> textual status report).  On the face of it, in semantic 
> terms adding another statistical element for morbidity 
> should not be a problem if the element can be ignored.  
> However, someone out there will eventually try to count 
> instances of morbidity statistics.

Bruce Cox challenges us to create a change management strategy that
makes no assumptions about the consumers of the data:

> We cannot even dream of placing any constraints on 
> the consumers of the data.


CLARITY SOUGHT

What does this mean: "The version management needs to be embedded in a
business process"?

What does it mean: "Avoid placing constraints on consumers of the
data"?

Can we view an example of: "A semantically non-breaking change for one
class of consumer might present problems for another"?
 
EXAMPLE

Let's take an example to illustrate the ideas that Greg and Bruce are
raising.

Suppose that the Centers for Disease Control and Prevention (CDC) makes
available data about deaths in the U.S.  Here is sample data: 
 
VERSION 1 DATA

<deaths year="2004" source="http://www.cdc.gov/nchs/fastats/lcod.htm">
      <heart-disease>652486</heart-disease>
      <cancer>553888</cancer>
      <stroke>150074</stroke>
 
      <chronic-lower-respitory-diseases>121987</chronic-lower-respitory-diseases>
      <accidents>112012</accidents>
      <diabetes>73138</diabetes>
      <alzheimers>65965</alzheimers>
      <influenza-and-pneumonia>59664</influenza-and-pneumonia>
 
      <nephritis-and-nephrotic-syndrome-and-nephrosis>42480</nephritis-and-nephrotic-syndrome-and-nephrosis>
</deaths>

The data conforms to an XML Schema that the CDC created [see the
version 1 schema at the end of this message].  Further, the CDC has
documented the meaning of each piece of data. [The document defines,
for example, what is meant by "the number of deaths due to accidents."]

Consumers of the CDC data happily use it.

Later, the CDC updates the data to also provide information on "the
number of deaths due to septicemia."  Here is a sample of the updated
data: 

VERSION 2 DATA 

<deaths year="2004" source="http://www.cdc.gov/nchs/fastats/lcod.htm">
      <heart-disease>652486</heart-disease>
      <cancer>553888</cancer>
      <stroke>150074</stroke>
 
      <chronic-lower-respitory-diseases>121987</chronic-lower-respitory-diseases>
      <accidents>112012</accidents>
      <diabetes>73138</diabetes>
      <alzheimers>65965</alzheimers>
      <influenza-and-pneumonia>59664</influenza-and-pneumonia>
 
      <nephritis-and-nephrotic-syndrome-and-nephrosis>42480</nephritis-and-nephrotic-syndrome-and-nephrosis>
      <septicemia>33373</septicemia>
</deaths>

This data conforms to the CDC's updated XML schema, which now includes
a declaration of the <septicemia> element [see updated schema below].
The document containing the meaning of each piece of data is also
updated to define what is meant by "the number of deaths due to
septicemia."


BREAKAGE?

What will break as a result of the CDC adding the data on septicemia?


VALIDATE NEW DATA AGAINST OLD SCHEMA

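Validating the new data against the old XML Schema produces validation
errors: the version 1 schema's <sequence> ends with
<nephritis-and-nephrotic-syndrome-and-nephrosis>, so the trailing
<septicemia> element is not permitted.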
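
To make the failure concrete, here is a minimal Python sketch that stands in for a schema validator. It does not run a real XML Schema processor; it only mimics the version 1 rule that <deaths> holds exactly the nine listed elements, in order. The names check_against_v1 and build_deaths, and the dummy counts of 0, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Element order required by the version 1 schema's <sequence>.
V1_SEQUENCE = [
    "heart-disease", "cancer", "stroke",
    "chronic-lower-respitory-diseases", "accidents", "diabetes",
    "alzheimers", "influenza-and-pneumonia",
    "nephritis-and-nephrotic-syndrome-and-nephrosis",
]


def check_against_v1(xml_text):
    """Return a list of error strings; an empty list means 'valid'.

    Stand-in for a real validator: it checks only child element names
    and their order, not data types or attributes.
    """
    root = ET.fromstring(xml_text)
    actual = [child.tag for child in root]
    errors = []
    for position, tag in enumerate(actual):
        if position >= len(V1_SEQUENCE):
            errors.append(f"unexpected element <{tag}>")
        elif tag != V1_SEQUENCE[position]:
            errors.append(f"expected <{V1_SEQUENCE[position]}>, found <{tag}>")
    if len(actual) < len(V1_SEQUENCE):
        errors.append(f"missing element <{V1_SEQUENCE[len(actual)]}>")
    return errors


def build_deaths(tags):
    """Build a <deaths> document with a dummy count of 0 for each tag."""
    body = "".join(f"<{tag}>0</{tag}>" for tag in tags)
    return f'<deaths year="2004">{body}</deaths>'


v1_doc = build_deaths(V1_SEQUENCE)
v2_doc = build_deaths(V1_SEQUENCE + ["septicemia"])

print(check_against_v1(v1_doc))  # []
print(check_against_v1(v2_doc))  # ['unexpected element <septicemia>']
```

A consumer that validates every incoming document against the version 1 schema would therefore reject the version 2 data outright, even though every value it previously relied on is still present.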


AVERAGE NEW DATA AGAINST OLD COUNT OF DEATH CAUSES

In the version 1 data there are nine causes of death listed
(heart-disease, cancer, stroke, etc). An application which computes the
average number of deaths per cause by summing all the values and
dividing by nine will produce an incorrect answer with the new data. 
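
The arithmetic can be sketched in Python. The function names average_hardcoded and average_counted and the helper build_deaths are hypothetical; the counts are the ones from the sample data above.

```python
import xml.etree.ElementTree as ET

V1_COUNTS = {
    "heart-disease": 652486, "cancer": 553888, "stroke": 150074,
    "chronic-lower-respitory-diseases": 121987, "accidents": 112012,
    "diabetes": 73138, "alzheimers": 65965,
    "influenza-and-pneumonia": 59664,
    "nephritis-and-nephrotic-syndrome-and-nephrosis": 42480,
}
V2_COUNTS = {**V1_COUNTS, "septicemia": 33373}


def build_deaths(counts):
    """Serialize a mapping of cause -> count as a <deaths> document."""
    body = "".join(f"<{tag}>{n}</{tag}>" for tag, n in counts.items())
    return f'<deaths year="2004">{body}</deaths>'


def average_hardcoded(xml_text):
    """Sum every child value, then divide by the version 1 count of nine."""
    root = ET.fromstring(xml_text)
    return sum(int(child.text) for child in root) / 9


def average_counted(xml_text):
    """Sum every child value, then divide by the number of children found."""
    root = ET.fromstring(xml_text)
    values = [int(child.text) for child in root]
    return sum(values) / len(values)


v1_doc = build_deaths(V1_COUNTS)
v2_doc = build_deaths(V2_COUNTS)

# On version 1 data the two functions agree.
print(average_hardcoded(v1_doc) == average_counted(v1_doc))  # True

# On version 2 data the hard-coded divisor overstates the average.
print(f"{average_hardcoded(v2_doc):.0f}")  # 207230
print(f"{average_counted(v2_doc):.0f}")    # 186507
```

Counting the elements removes the arithmetic error, but the result is now an average over ten causes rather than nine, so it is no longer comparable with figures computed from version 1 data. That is the kind of consumer-specific problem Greg describes.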

UNANTICIPATED PROBLEMS

We cannot anticipate or control what consumers of the data do with the
data or how they write their applications.  The new data could cause
problems that we cannot anticipate.   


LESSONS LEARNED?

1. Greg challenges us to think in terms of managing change as part of a
"business process."  What does this mean for the CDC example?  For
example, should the CDC post "usage rules" for consumers of its data,
such as:

--> Do not validate the data

--> Anticipate new data will be added

2. Bruce challenges us to create a change management strategy that
makes no assumptions about the consumers of the data. What does this
mean for the CDC, which wants to add data about the number of deaths
due to septicemia? Can the CDC meet the challenge by simply setting up
two URLs, one for the old version and one for the new version?
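
As one possible reading of the "anticipate new data will be added" rule, here is a minimal Python sketch of a consumer that looks up only the elements it knows by name and ignores everything else. The function read_known_causes, the KNOWN_CAUSES tuple, and the abbreviated sample documents are assumptions made for illustration, not something the CDC publishes.

```python
import xml.etree.ElementTree as ET

# Hypothetical consumer that cares about only these three causes.
KNOWN_CAUSES = ("heart-disease", "cancer", "stroke")

# Abbreviated documents: most causes are omitted to keep the sketch short.
V1_DOC = (
    '<deaths year="2004">'
    "<heart-disease>652486</heart-disease>"
    "<cancer>553888</cancer>"
    "<stroke>150074</stroke>"
    "</deaths>"
)
V2_DOC = (
    '<deaths year="2004">'
    "<heart-disease>652486</heart-disease>"
    "<cancer>553888</cancer>"
    "<stroke>150074</stroke>"
    "<septicemia>33373</septicemia>"
    "</deaths>"
)


def read_known_causes(xml_text, known=KNOWN_CAUSES):
    """Return counts for the known causes only; unknown elements are skipped."""
    root = ET.fromstring(xml_text)
    counts = {}
    for tag in known:
        element = root.find(tag)
        if element is not None:
            counts[tag] = int(element.text)
    return counts


# The added <septicemia> element does not change what this consumer sees.
print(read_known_causes(V1_DOC) == read_known_causes(V2_DOC))  # True
```

A consumer written this way survives the addition, but only because its author chose to write it that way. It places exactly the kind of constraint on consumers that Bruce says we cannot assume.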

/Roger 

----------------------------------------------
CDC VERSION 1 SCHEMA

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
    <element name="deaths">
        <complexType>
            <sequence>
                <element name="heart-disease" type="unsignedInt"/>
                <element name="cancer" type="unsignedInt"/>
                <element name="stroke" type="unsignedInt"/>
                <element name="chronic-lower-respitory-diseases" type="unsignedInt"/>
                <element name="accidents" type="unsignedInt"/>
                <element name="diabetes" type="unsignedInt"/>
                <element name="alzheimers" type="unsignedInt"/>
                <element name="influenza-and-pneumonia" type="unsignedInt"/>
                <element name="nephritis-and-nephrotic-syndrome-and-nephrosis" type="unsignedInt"/>
            </sequence>
            <attribute name="year" type="gYear"/>
            <attribute name="source" type="anyURI"/>
        </complexType>
    </element>
</schema>

CDC VERSION 2 SCHEMA

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">
    <element name="deaths">
        <complexType>
            <sequence>
                <element name="heart-disease" type="unsignedInt"/>
                <element name="cancer" type="unsignedInt"/>
                <element name="stroke" type="unsignedInt"/>
                <element name="chronic-lower-respitory-diseases" type="unsignedInt"/>
                <element name="accidents" type="unsignedInt"/>
                <element name="diabetes" type="unsignedInt"/>
                <element name="alzheimers" type="unsignedInt"/>
                <element name="influenza-and-pneumonia" type="unsignedInt"/>
                <element name="nephritis-and-nephrotic-syndrome-and-nephrosis" type="unsignedInt"/>
                <element name="septicemia" type="unsignedInt"/>
            </sequence>
            <attribute name="year" type="gYear"/>
            <attribute name="source" type="anyURI"/>
        </complexType>
    </element>
</schema>

