Altova Mailing List Archives


RE: [xml-dev] XML and entropy, again

From: "Roger L. Costello" <costello@-----.--->
To: <xml-dev@-----.---.--->
Date: 12/21/2004 7:03:00 PM

Shannon uses the term "entropy" as a measure of the amount of information.
The amount of information grows with the number of choices that the sender
has (logarithmically, when the choices are equally likely), or, equivalently,
with the amount of uncertainty that the receiver has.  A high number of
choices for the sender, and a high uncertainty for the receiver, means a
high entropy (information).  So, which of these approaches has the greater
entropy:

Approach #1

<Object>
    <Name>Roger L. Costello</Name>
    <HairColor>Red</HairColor>
    <SSN>123-45-6789</SSN>
    <Height>176 cm</Height>
    <Weight>74 kg</Weight>
</Object>

Approach #2

<Object>
    <hasA property="Name">Roger L. Costello</hasA>
    <hasA property="HairColor">Red</hasA>
    <hasA property="SSN">123-45-6789</hasA>
    <hasA property="Height">176 cm</hasA>
    <hasA property="Weight">74 kg</hasA>
</Object>

More accurately, which of these XML Schemas has the greater entropy:

Approach #1

<element name="Object">
    <complexType>
        <sequence>
            <element name="Name" type="string"/>
            <element name="HairColor" type="string"/>
            <element name="SSN" type="string"/>
            <element name="Height" type="string"/>
            <element name="Weight" type="string"/>
        </sequence>
    </complexType>
</element>

Approach #2

<element name="Object">
    <complexType>
        <sequence>
            <element name="hasA" maxOccurs="unbounded">
                <complexType>
                    <simpleContent>
                        <extension base="string">
                            <attribute name="property" type="string" use="required"/>
                        </extension>
                    </simpleContent>
                </complexType>
            </element>
        </sequence>
    </complexType>
</element>

With Approach #1 the sender has virtually no choice of structure - Object
must contain these properties: Name, HairColor, SSN, Height, and Weight.
Likewise, the receiver has no uncertainty about which properties will
arrive.  Thus, the entropy (information) is low.

With Approach #2 there is no limit to the variety of properties that Object
can contain.  The sender has an unlimited choice of messages and the
receiver has a high uncertainty.  Thus, the entropy (information) is high.
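
To put rough numbers on the comparison, here is a minimal Python sketch.
It counts only the choice of property names, not the values inside them.
The assumptions are mine, not part of either schema: every permitted choice
is equally likely, the sender of Approach #2 fills 5 hasA slots, and each
slot draws independently from a hypothetical vocabulary of 1024 property
names (a finite stand-in for "unlimited").

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits: the sum of -p * log2(p) over all outcomes."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def uniform_entropy(num_choices):
    """Entropy when all num_choices outcomes are equally likely (log2 of N)."""
    return shannon_entropy([1 / num_choices] * num_choices)


# Approach #1: the schema fixes the property names, so the sender has
# exactly one permitted structure to choose from.
h_approach_1 = uniform_entropy(1)

# Approach #2: ASSUMPTION, for illustration only. The vocabulary size and
# the slot count below are hypothetical, and slots are treated as independent.
VOCABULARY_SIZE = 1024
SLOTS = 5
h_approach_2 = SLOTS * uniform_entropy(VOCABULARY_SIZE)

print(f"Approach #1: {h_approach_1:.1f} bits of structural choice")
print(f"Approach #2: {h_approach_2:.1f} bits of structural choice")
```

Under these assumptions Approach #1 carries zero bits of structural choice,
while the figure for Approach #2 grows with both the vocabulary size and the
number of slots.  Whether that extra entropy makes a schema better is exactly
the open question.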

Does this indicate that Approach #2 is superior?  Is entropy a way to
measure the value (quality) of Schemas? /Roger
