Re: [xml-dev] XML not ideal for Big Data

To: Liam Quin <liam@--.--->
Date: 9/3/2009 8:31:00 PM

I'd also raise another point here, and this is an issue that I've had before in the discussion of streaming: There comes a certain stage where it makes far more sense to plan your data strategies around an XML database, use some intelligent indexing to decompose complex documents into simpler ones, and use XQuery and the like to manage that data more effectively and more cohesively.

I keep encountering these stories and shake my head - it's like someone who keeps a 100 MB CSV file and runs a Perl parser on it in lieu of storing it in a relational database, then complains that SQL relational databases are just too damned poor at handling data management, when in reality it's usually that they are not willing to make the investment to learn SQL properly, prefer working in Perl, and are not willing to invest the time to build efficient data architectures.

This is also a reason why I think that XML best practices will ultimately end up promoting XML RESTful Services (aka XRX or MODS) and XML Databases over method-dependent (and format-dependent) SOA systems. Store your internal data in an XML repository, assign URLs to collections as well as individual entries, let each resource in those collections have multiple input and output representations and so forth, and bind XQuery operations to each representation. This means it doesn't matter whether your wire format is JSON or XML or HTML or YAML - so long as you have the relevant representation processors, the internal data abstraction and querying remain the same.

Of course, that would require that the author spend some time learning XQuery.
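
To make the representation-independence point concrete, here is a minimal sketch in Python rather than XQuery. Everything in it is an assumption made for illustration: the in-memory `REPOSITORY` dict stands in for a real XML database, and the `/books/1` URL, the `<book>` and `<entry>` element names, and the function names are hypothetical. The author and title values are borrowed from the example quoted later in this thread. The point is only that the internal query stays the same while the wire format is chosen per request.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an XML repository: URL path -> stored XML document.
REPOSITORY = {
    "/books/1": "<book><author>Simon Green</author>"
                "<title>Jennifer Lumpnose</title></book>",
}

def query_fields(path):
    """Internal query over the stored XML; identical for every wire format."""
    root = ET.fromstring(REPOSITORY[path])
    return {child.tag: child.text for child in root}

def to_json(fields):
    """Representation processor for a JSON wire format."""
    return json.dumps(fields, sort_keys=True)

def to_xml(fields):
    """Representation processor for an XML wire format."""
    root = ET.Element("entry")
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

# One processor per output representation, keyed by media type.
PROCESSORS = {"application/json": to_json, "application/xml": to_xml}

def get(path, media_type):
    """Resolve a resource URL, run the shared query, then pick a representation."""
    return PROCESSORS[media_type](query_fields(path))

print(get("/books/1", "application/json"))
print(get("/books/1", "application/xml"))
```

Adding another wire format in this sketch means adding one entry to `PROCESSORS`; `query_fields` is untouched.
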
What is it about programmers that if the code isn't in their absolute favorite language then they think the technology sucks?

Kurt Cagle
Managing Editor
http://xmlToday.org

On Thu, Sep 3, 2009 at 11:59 AM, Liam Quin <liam@w...> wrote:

> On Thu, Sep 03, 2009 at 11:53:40AM -0400, Simon St.Laurent wrote:
> > Perhaps there were better ways to have made XML work with his
> > problems... but I think on the whole he's right.
> >
> > http://dataspora.com/blog/xml-and-big-data/
>
> Nonsense, XML is perfect! :-)
>
> OK, I'll be serious.
>
> Today, loading a few tens of gigabytes of XML into
> Oracle or DB2 or SQL Server isn't likely to be such a
> huge bottleneck in performance (and if you find yourself
> loading data into a database on a daily basis, you
> should ask yourself why you are using the database).
>
> Of course, today one could use MarkLogic, Qizx, DB XML,
> or any of a number of other native-XML databases.
>
> There's nothing to say one has to use XML of course.
>
> [[
> In its natural habitat, data lives in relational databases or as data
> structures in programs. The common import and export formats of these
> environments do not resemble XML, so much effort is dedicated to making
> XML fit.
> ]]
>
> I'd argue that there's typically more information _outside_ the
> databases, in documents. Documents are data too.
>
> The article's complaint that the redundancy of XML tags is a
> bad thing is misplaced: there's a trade-off between making
> the data robust against errors, and easy to debug, vs size.
>
> People write to me every so often and say we should bring
> back </> and I give them an example like,
>   <title>Simon Green</author>
>   <author>Jennifer Lumpnose</title>
>
> With </> we get
>   <title>Simon Green</>
>   <author>Jennifer Lumpnose</>
>
> and there's no XML error. But the correct markup
> should have been
>   <author>Simon Green</author>
>   <title>Jennifer Lumpnose</title>
>
> That is, it was the start tags that the programmer had
> transposed by mistake.
> Using </> reduces the chance of
> catching that error considerably, and there's often no
> automatic checking available.
>
> Of course, extreme crazy tagging is a disease all too
> common -- I've done it too -- no argument there.
>
> The argument about "we already know LaTeX and don't
> want to learn something else" carries weight in the
> present, but if the future is longer than the past,
> it's an _awful_ lot longer than the present!
>
> For my part I'd rather receive a terabyte of data in XML
> than a gigabyte of undocumented binary data -- if bit 3 is
> set then the 9 following bits represent the number of bytes
> in the header of the next chunk, unless the current chunk is
> the last in a segment, in which case the next header will be
> 168 bytes in length... no thanks. The month you spend
> writing the software to read it, and the six months you
> spend debugging, and the time everyone else working with
> the format does the same, will never be paid back in
> most cases.
>
> It'll be interesting to see if "Efficient XML Interchange"
> makes a difference here, though.
>
> I think the bottom line is that badly-done XML projects
> are bad, but a mediocre XML project is often better than
> a mediocre or even good project with entirely custom formats.
>
> Liam
>
> --
> Liam Quin, W3C XML Activity Lead, http://www.w3.org/People/Quin/
> http://www.holoweb.net/~liam/ * http://www.fromoldbooks.org/
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
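
The end-tag example in the quoted message can be checked mechanically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` parser and a wrapping `<book>` element added only so that each sample is a single document. The `expand_short_end_tags` helper is hypothetical: `</>` is not part of XML, so the helper imitates what a short end tag would mean by closing whichever element was opened most recently. It ignores comments, CDATA sections and processing instructions.

```python
import re
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if a conforming XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def expand_short_end_tags(text):
    """Hypothetical helper: rewrite each '</>' as the end tag of the
    most recently opened element, which is what a short end tag implies."""
    stack, out = [], []
    for token in re.split(r"(<[^>]*>)", text):
        if token == "</>":
            out.append("</%s>" % stack.pop())
            continue
        if token.startswith("</"):
            stack.pop()                              # explicit end tag
        elif token.startswith("<") and not token.endswith("/>"):
            stack.append(token[1:-1].split()[0])     # start tag: remember its name
        out.append(token)
    return "".join(out)

# Full end tags: the transposed start tags no longer match, so the parser objects.
explicit = "<book><title>Simon Green</author><author>Jennifer Lumpnose</title></book>"
# Short end tags: the same transposition goes through without complaint.
short = "<book><title>Simon Green</><author>Jennifer Lumpnose</></book>"

print(is_well_formed(explicit))
print(is_well_formed(expand_short_end_tags(short)))
```

With explicit end tags the parser rejects the document at the first mismatch; with the short form the transposed start tags produce a well-formed document whose content is still wrong, and only a human or a schema with content checks would notice.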
