Altova Mailing List Archives


Re: [xml-dev] XML not ideal for Big Data

From: Liam Quin <liam@--.--->
To: "Simon St.Laurent" <simonstl@--------.--->
Date: 9/3/2009 7:00:00 PM
On Thu, Sep 03, 2009 at 11:53:40AM -0400, Simon St.Laurent wrote:
> Perhaps there were better ways to have made XML work with his 
> problems... but I think on the whole he's right.
> 
> http://dataspora.com/blog/xml-and-big-data/

Nonsense, XML is perfect! :-)

OK, I'll be serious.

Today, loading a few tens of gigabytes of XML into 
Oracle or DB2 or SQL Server isn't likely to be such a
huge bottleneck in performance (and if you find yourself
loading data into a database on a daily basis, you
should ask yourself why you are using the database).

Of course, today one could use MarkLogic, Qizx, DB XML,
or any of a number of other native-XML databases.

There's nothing to say one has to use XML of course.

[[
In its natural habitat, data lives in relational databases or as data
structures in programs. The common import and export formats of these
environments do not resemble XML, so much effort is dedicated to making
XML fit.
]]

I'd argue that there's typically more information _outside_ the
databases, in documents.  Documents are data too.

The article's complaint that the redundancy of XML tags is a
bad thing is misplaced: there's a trade-off between making
the data robust against errors, and easy to debug, vs size.

People write to me every so often and say we should bring
back </> and I give them an example like,
    <title>Simon Green</author>
    <author>Jennifer Lumpnose</title>

With </> we get
    <title>Simon Green</>
    <author>Jennifer Lumpnose</>

and there's no XML error.  But the correct markup
should have been
    <author>Simon Green</author>
    <title>Jennifer Lumpnose</title>

That is, it was the start tags that the programmer had
transposed by mistake.  Using </> reduces the chance of
catching that error considerably, and there's often no
automatic checking available.
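
For anyone who wants to see the difference, here is a rough sketch
in Python using the standard xml.etree.ElementTree parser. XML has
no </>, so the second string only simulates it by closing each
element with whatever name was open; the <book> wrapper and the
check() helper are mine, purely for illustration.

```python
import xml.etree.ElementTree as ET


def check(doc):
    """Return None if doc is well-formed, otherwise the parser's message."""
    try:
        ET.fromstring(doc)
        return None
    except ET.ParseError as err:
        return str(err)


# Start tags transposed, end tags named: the names disagree.
# The <book> wrapper is a hypothetical root element added for the example.
named = "<book><title>Simon Green</author><author>Jennifer Lumpnose</title></book>"

# What a parser would effectively see if </> closed the open element
# (simulated; XML itself does not allow </>).
anonymous = "<book><title>Simon Green</title><author>Jennifer Lumpnose</author></book>"

named_error = check(named)          # a "mismatched tag" parse error
anonymous_error = check(anonymous)  # None: the mistake goes unnoticed

print("named end tags:    ", named_error)
print("anonymous end tags:", anonymous_error)
```

The first document is rejected at the first end tag; the second
parses cleanly even though the author and title are still swapped.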

Of course, extreme crazy tagging is a disease all too
common -- I've done it too -- no argument there.

The argument about "we already know LaTeX and don't
want to learn something else" carries weight in the
present, but if the future is longer than the past,
it's an _awful_ lot longer than the present!

For my part I'd rather receive a terabyte of data in XML
than a gigabyte of undocumented binary data -- if bit 3 is
set then the 9 following bits represent the number of bytes
in the header of the next chunk, unless the current chunk is
the last in a segment, in which case the next header will be
168 bytes in length... no thanks.  The month you spend
writing the software to read it, and the six months you
spend debugging, and the time everyone else working with
the format does the same, will never be paid back in
most cases.

It'll be interesting to see if "Efficient XML Interchange"
makes a difference here, though.

I think the bottom line is that badly-done XML projects
are bad, but a mediocre XML project is often better than
a mediocre or even good project with entirely custom formats.

Liam

-- 
Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


