Altova Mailing List Archives


Re: [xml-dev] XML not ideal for Big Data

From: Kurt Cagle <kurt.cagle@-----.--->
To: Liam Quin <liam@--.--->
Date: 9/3/2009 8:31:00 PM
I'd also raise another point here, and this is an issue that I've had before
in the discussion of streaming: There comes a certain stage where it makes
far more sense to plan your data strategies around an XML database, use some
intelligent indexing to decompose complex documents into simpler ones, and
use XQuery and the like to manage that data in a more cohesive manner.

I keep encountering these stories and shake my head - it's like someone who
keeps a 100 MB CSV file and runs a Perl parser on it in lieu of storing it
in a relational database, then complains that SQL relational databases are
just too damned poor at handling data management, when in reality it's usually
that they are not willing to make the investment to learn SQL properly,
prefer working in Perl, and are not willing to invest the time to build
efficient data architectures.

This is also a reason why I think that XML best practices will ultimately
end up promoting XML RESTful Services (aka XRX or MODS) and XML Databases
over method- (and format-) dependent SOA systems. Store your internal data in
an XML repository, assign URLs to collections as well as individual entries,
let each resource in those collections have both multiple input and output
representations and so forth, and bind XQuery operations to each
representation.  This means it doesn't matter whether your wire format is
JSON or XML or HTML or YAML - so long as you have the relevant
representation processors, the internal data abstraction and querying remain
the same.
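
As a rough illustration of that last point, here is a minimal sketch in Python. It is not XQuery and not any particular XML database: the in-memory COLLECTION, the query() helper, and the REPRESENTATIONS table are all hypothetical stand-ins, and ElementTree's limited XPath takes the place of a real query language. The only thing it tries to show is one internal query feeding several output representations.

```python
import html
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML repository: one small collection.
COLLECTION = ET.fromstring(
    "<books>"
    "<book id='b1'><title>First</title><year>2007</year></book>"
    "<book id='b2'><title>Second</title><year>2009</year></book>"
    "</books>"
)

def query(year):
    """The single internal query; every wire format shares it."""
    return COLLECTION.findall(f"book[year='{year}']")

def to_xml(entries):
    # Wrap the matching entries in a results element and serialize.
    root = ET.Element("results")
    root.extend(entries)
    return ET.tostring(root, encoding="unicode")

def to_json(entries):
    return json.dumps([
        {"id": e.get("id"),
         "title": e.findtext("title"),
         "year": e.findtext("year")}
        for e in entries
    ])

def to_html(entries):
    items = "".join(
        f"<li>{html.escape(e.findtext('title'))}</li>" for e in entries
    )
    return f"<ul>{items}</ul>"

# One representation processor per media type (assumed names).
REPRESENTATIONS = {
    "application/xml": to_xml,
    "application/json": to_json,
    "text/html": to_html,
}

def get(year, media_type):
    """Same data and same query; only the serializer changes."""
    return REPRESENTATIONS[media_type](query(year))
```

Calling get("2009", "application/json") and get("2009", "text/html") runs the same query both times; adding YAML or another format would mean adding one more entry to the table, with the query left alone.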

Of course, that would require that the author spend some time learning
XQuery. What is it about programmers that if the code isn't in their
absolute favorite language then they think the technology sucks?



Kurt Cagle
Managing Editor
http://xmlToday.org


On Thu, Sep 3, 2009 at 11:59 AM, Liam Quin <liam@w...> wrote:

> On Thu, Sep 03, 2009 at 11:53:40AM -0400, Simon St.Laurent wrote:
> > Perhaps there were better ways to have made XML work with his
> > problems... but I think on the whole he's right.
> >
> > http://dataspora.com/blog/xml-and-big-data/
>
> Nonsense, XML is perfect! :-)
>
> OK, I'll be serious.
>
> Today, loading a few tens of gigabytes of XML into
> Oracle or DB2 or SQL Server isn't likely to be such a
> huge bottleneck in performance (and if you find yourself
> loading data into a database on a daily basis, you
> should ask yourself why you are using the database).
>
> Of course, today one could use MarkLogic, Qizx, DB XML,
> or any of a number of other native-XML databases.
>
> There's nothing to say one has to use XML of course.
>
> [[
> In its natural habitat, data lives in relational databases or as data
> structures in programs. The common import and export formats of these
> environments do not resemble XML, so much effort is dedicated to making
> XML fit.
> ]]
>
> I'd argue that there's typically more information _outside_ the
> databases, in documents.  Documents are data too.
>
> The article's complaint that the redundancy of XML tags is a
> bad thing is misplaced: there's a trade-off between making
> the data robust against errors, and easy to debug, vs size.
>
> People write to me every so often and say we should bring
> back </> and I give them an example like,
>    <title>Simon Green</author>
>    <author>Jennifer Lumpnose</title>
>
> With </> we get
>    <title>Simon Green</>
>    <author>Jennifer Lumpnose</>
>
> and there's no XML error.  But the correct markup
> should have been
>    <author>Simon Green</author>
>    <title>Jennifer Lumpnose</title>
>
> That is, it was the start tags that the programmer had
> transposed by mistake.  Using </> reduces the chance of
> catching that error considerably, and there's often no
> automatic checking available.
>
> Of course, extreme crazy tagging is a disease all too
> common -- I've done it too -- no argument there.
>
> The argument about "we already know LaTeX and don't
> want to learn something else" carries weight in the
> present, but if the future is longer than the past,
> it's an _awful_ lot longer than the present!
>
> For my part I'd rather receive a terabyte of data in XML
> than a gigabyte of undocumented binary data -- if bit 3 is
> set then the 9 following bits represent the number of bytes
> in the header of the next chunk, unless the current chunk is
> the last in a segment, in which case the next header will be
> 168 bytes in length... no thanks.  The month you spend
> writing the software to read it, and the six months you
> spend debugging, and the time everyone else working with
> the format does the same, will never be paid back in
> most cases.
>
> It'll be interesting to see if "Efficient XML Interchange"
> makes a difference here, though.
>
> I think the bottom line is that badly-done XML projects
> are bad, but a mediocre XML project is often better than
> a mediocre or even good project with entirely custom formats.
>
> Liam
>
> --
> Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
> http://www.holoweb.net/~liam/ <http://www.holoweb.net/%7Eliam/> *
> http://www.fromoldbooks.org/
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>
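
Liam's transposed-tag example in the quoted message can be checked mechanically. The sketch below uses Python's standard ElementTree parser; the is_well_formed() helper and the wrapping record element are assumptions added for illustration. Since </> is not legal XML, the sketch only shows the named-end-tag side of the argument: the redundant end tag is what lets the parser notice the mistake.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses once wrapped in a single root."""
    try:
        # The <record> wrapper is hypothetical; XML needs one root element.
        ET.fromstring(f"<record>{fragment}</record>")
        return True
    except ET.ParseError:
        # Raised on a mismatched end tag, among other syntax errors.
        return False

# Start tags transposed by mistake, named end tags left in place.
transposed = (
    "<title>Simon Green</author>"
    "<author>Jennifer Lumpnose</title>"
)

# The markup the programmer meant to write.
corrected = (
    "<author>Simon Green</author>"
    "<title>Jennifer Lumpnose</title>"
)
```

Here is_well_formed(transposed) is False and is_well_formed(corrected) is True, so the error surfaces at parse time with no schema or other checking involved.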

