Altova Mailing List Archives


Re: [xml-dev] text to xml conversion

From: Rick Jelliffe <rjelliffe@-------.---.-->
To: ycao5@---.--------.--
Date: 6/2/2009 5:14:00 AM

ycao5@s... wrote:
>
> Hello everyone,
>
>     I want to ask one question about converting text to an xml file. Is 
> there any way to attach a schema to a text document and parse it into 
> xml according to the rules defined in the schema? Can I find such a 
> tool? Otherwise I plan to write one myself. Please give me some 
> references. Thanks.

There is one called SP, which is open source from James Clark.

It parses data files using SGML configuration files and schemas, and 
is suitable when the file contains Wiki-style markup, CSV, or 
other formats with explicit delimiters, but not so much for more 
free-form data. It is probably only worth using if you will have to do 
this kind of thing many times.

See http://www.xml.com/lpt/a/1377 for an overview of this approach. SP 
is industrial strength.

You could convert your XML Schema to an XML DTD, then decorate it with 
the extra information that makes it an SGML DTD, to say:

 1) Which delimiters in your text should be substituted for which tags
 2) In which contexts this recognition takes place
 3) Which tags won't have corresponding delimiters in your file and are 
allowed to be implied
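
To make those three points concrete, here is a rough Python sketch of the 
same idea outside SGML. It is not SP and not how SP is configured; the rule 
names (DELIMITER, ROW_TAG, FIELD_TAG, ROOT_TAG) and the function text_to_xml 
are made up for illustration, and it assumes simple comma-separated lines 
with no quoting.

```python
from xml.sax.saxutils import escape

# Hypothetical rules, loosely mirroring the three kinds of decoration:
DELIMITER = ","      # 1) delimiter in the text that marks a field boundary
ROW_TAG = "row"      # 2) context: the delimiter is only recognised inside one line (a row)
FIELD_TAG = "field"  # 3) tags with no delimiter of their own in the text;
ROOT_TAG = "table"   #    they are implied by the structure


def text_to_xml(text: str) -> str:
    """Convert comma-delimited lines to XML using the rules above."""
    out = [f"<{ROOT_TAG}>"]                      # implied root start tag
    for line in text.splitlines():
        if not line.strip():
            continue                             # blank lines carry no data
        out.append(f"<{ROW_TAG}>")               # each line opens a row context
        for value in line.split(DELIMITER):      # delimiter separates fields
            # escape() protects &, < and > so the output stays well-formed
            out.append(f"<{FIELD_TAG}>{escape(value.strip())}</{FIELD_TAG}>")
        out.append(f"</{ROW_TAG}>")
    out.append(f"</{ROOT_TAG}>")                 # implied root end tag
    return "".join(out)


if __name__ == "__main__":
    print(text_to_xml("a,b\nc & d,e"))
```

A hand-written converter like this is fine for one format; the SGML route 
pays off when the rules need to live in a schema rather than in code.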

The output is XML. SGML has many gotchas for new players, but if you 
already know HTML and XML and DTDs or XSD, then they will be much easier 
to cope with. (SGML, XML's precursor, got a bad rep because people needed 
to learn the equivalent of XML + HTML bits + this kind of text parsing 
system all at the same time.)

I also made some software for this task that wasn't based on grammars. 
It was called Psyche and was written in Java; Micah Dubinko also made an 
implementation of it (for .NET?), but we never released them. If there is 
interest I could drag it out again. It also requires delimiters.

Cheers
Rick Jelliffe

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


