Altova Mailing List Archives


Re: Structuring flat data

From: Peter Flynn <peter.nosp@-.--------.-->
To: NULL
Date: 3/5/2006 10:13:00 PM
ariana_paris@y... wrote:
> Cutting a long story short, I have some files in a rather flat XML
> structure which I now want to upgrade to a more sophisticated schema.

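The problem of encapsulation can be solved using SGML if you can write a
DTD that allows start-tags and end-tags to be omitted on the right
selection of elements. In each element declaration, the two characters
after the name say whether the start-tag and end-tag may be omitted (O)
or are required (-). For example: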

<!DOCTYPE xml [
<!ELEMENT xml - - (chapter+)>
<!ELEMENT chapter O O (title?,section+)>
<!ELEMENT section O O (p|break)*>
<!ELEMENT (title,p) - - (#PCDATA)>
<!ELEMENT break - O EMPTY>
]>
<xml>
   <title>Title 1</title>
     <p>Aaaaaa</p>
     <break>
     <p>Bbbbbb</p>
   <title>Title 2</title>
     <p>Cccccc</p>
</xml>

(The XML empty-element tag <break/> has been converted to the SGML form
<break>.) Running this through osgmlnorm (part of OpenSP, the same
package that provides onsgmls), using the SGML Declaration for DocBook 3
with NAMECASE GENERAL NO so that element names are not folded to upper
case, gives:

<xml>
<chapter>
<title>Title 1</title>
<section>
<p>Aaaaaa</p>
<break>
<p>Bbbbbb</p>
</section>
</chapter>
<chapter>
<title>Title 2</title>
<section>
<p>Cccccc</p>
</section>
</chapter>
</xml>

You just need to convert <break> back to <break/> afterwards.
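Both conversions (before and after the SGML pass) are plain text substitutions. A minimal Python sketch, assuming <break> is the only EMPTY element and never carries attributes; the function names are hypothetical:

```python
import re

# Assumption: <break> is the only EMPTY element and it has no attributes.
EMPTY_ELEMENTS = ("break",)
_NAMES = "|".join(re.escape(name) for name in EMPTY_ELEMENTS)

# XML empty-element tag: <break/> or <break />
_XML_EMPTY_TAG = re.compile(rf"<({_NAMES})\s*/>")
# SGML start-tag with no end-tag: <break> or <break >
_SGML_EMPTY_TAG = re.compile(rf"<({_NAMES})\s*>")


def xml_to_sgml(text: str) -> str:
    """Before the SGML pass: rewrite <break/> as <break>."""
    return _XML_EMPTY_TAG.sub(r"<\1>", text)


def sgml_to_xml(text: str) -> str:
    """After the SGML pass: rewrite <break> back to <break/>."""
    return _SGML_EMPTY_TAG.sub(r"<\1/>", text)
```

Longer names such as <breakfast> are left alone, because the name must be followed directly by optional whitespace and the tag close.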

However, you would need to ensure that all non-ASCII characters are
given as numeric character references, or do some deep surgery on the
SGML Declaration to allow non-ASCII characters.
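For the first option, Python's built-in 'xmlcharrefreplace' codec error handler does the rewriting. A sketch (the function name is hypothetical); note that the document character set in the SGML Declaration may still limit which character numbers the parser will accept:

```python
def ascii_with_ncrs(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#233;.

    Markup is assumed to be pure ASCII already, so a whole file can be
    passed through unchanged apart from its non-ASCII characters.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(ascii_with_ncrs("<p>café</p>"))  # prints <p>caf&#233;</p>
```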

> The files can have the following structures:
> 
> Example 1:
> <xml>
>   <p>blablabla</p>
> </xml>

Gives:

<xml>
<chapter>
<section>
<p>blablabla</p>
</section>
</chapter>
</xml>


> Example 2:
> <xml>
>   <p>Aaaaaa</p>
>   <break/>
>   <p>Bbbbbb</p>
> </xml>

Gives:

<xml>
<chapter>
<section>
<p>Aaaaaa</p>
<break>
<p>Bbbbbb</p>
</section>
</chapter>
</xml>
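For readers without SP to hand, the grouping this DTD produces for these examples can be imitated in a few lines of Python with ElementTree. This is a hand-written emulation for this one DTD, not what osgmlnorm does internally, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def structure(flat_root):
    """Group a flat <xml> body into <chapter>/<section> elements.

    Imitates the tag inference the example DTD gets from SGML:
    chapter = (title?, section+), section = (p|break)*.
    """
    root = ET.Element(flat_root.tag)
    chapter = None
    section = None
    for child in flat_root:
        if child.tag == "title":
            # title is not allowed in section, so it closes the current
            # section and chapter; it can only start a new chapter.
            chapter = ET.SubElement(root, "chapter")
            section = None
            chapter.append(child)
        else:
            # p or break: open a chapter and/or section if none is open.
            if chapter is None:
                chapter = ET.SubElement(root, "chapter")
            if section is None:
                section = ET.SubElement(chapter, "section")
            section.append(child)
    # Caveat: a title followed directly by another title (or by the end of
    # the document) yields a chapter with no section, which the DTD rejects.
    return root


flat = ET.fromstring("<xml><p>Aaaaaa</p><break/><p>Bbbbbb</p></xml>")
print(ET.tostring(structure(flat), encoding="unicode"))
# prints <xml><chapter><section><p>Aaaaaa</p><break /><p>Bbbbbb</p></section></chapter></xml>
```

The advantage of the SGML route is that these rules live in the DTD rather than in procedural code, so changing the target structure means editing content models, not rewriting a loop.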

> The point being that the files can have various combinations of titles
> and breaks, or none at all. I'm hoping I can get a one-pass solution 

Modulo the caveats on content models and character encoding, this should
work if you get the DTD right.
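Getting the DTD right mostly means getting the content models right. A quick sanity check on the converted output is to treat each model as a regular expression over the child element names. A rough Python sketch; the CONTENT_MODELS table is a hand translation of the example DTD and the function name is hypothetical (a validating parser does this job properly):

```python
import re
import xml.etree.ElementTree as ET

# Hand translation (an assumption) of the example DTD's content models into
# regular expressions over child element names, each followed by a space.
CONTENT_MODELS = {
    "xml": r"(chapter )+",
    "chapter": r"(title )?(section )+",
    "section": r"((p|break) )*",
    "title": r"",  # (#PCDATA): no child elements
    "p": r"",      # (#PCDATA): no child elements
    "break": r"",  # EMPTY
}


def invalid_elements(tree_root):
    """List the tags of elements whose child sequence breaks its model.

    Only element structure is checked; character data is ignored.
    """
    bad = []
    for elem in tree_root.iter():
        children = "".join(child.tag + " " for child in elem)
        model = CONTENT_MODELS.get(elem.tag)
        if model is None or re.fullmatch(model, children) is None:
            bad.append(elem.tag)
    return bad
```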

SGML does still have its uses :-)

///Peter
-- 
XML FAQ: http://xml.silmaril.ie/

