Home. 
.

transparent

transparent

transparent

Altova Mailing List Archives


Re: [xml-dev] Shredding XML

From: michael odling-smee <mike.odlingsmee@-----.--->
To: Fraser Goffin <goffinf@----------.--->
Date: 11/3/2009 12:52:00 PM
Fraser,

I was thinking about this problem whilst stuck in a traffic jam this morning
- and have devised the approach I would take. I will leave it to you to
decide whether it is appropriate for the concrete scenario you face.

To start with I would probably put the XML aside and work out how I would
represent the same information in a relational way - what are the
significant entities and what attributes do they have? IMHO I believe that
any tooling based "auto-magical mapping" approach tends to make compromises
such as flexibility versus simplicity. By way of example if your XML
represented a purchase order you could imagine having tables such as
PURCHASE_ORDER, LINE_ITEM, ADDRESS (delivery, invoice etc.). For the
attributes I would suggest that you make a first guess as to what
information the MI reports are likely to contain and make sure that these
are represented as columns within your primary entities. To allow for
extensibility you could define some additional tables (e.g.
PURCHASE_ORDER_ADDITIONAL_INFO, LINE_ITEM_ADDITIONAL_INFO) that allows any
additional information to be stored as key value pairs.

Once you have devised the significant entities the problem becomes one of
mapping the XML into the schema. This is where technology choices could be
taken - do you directly map XML -> DB, XML -> Object (e.g. via Java) -> DB.
Seeing as your requirements indicate that you only want to put items in the
DB for reporting purposes I am not sure that an object layer adds anything.
To provide flexibility I would use XSLT to map the XML directly into INSERT
SQL statements that could then be run against the DB (you could take
advantage of Saxon's SQL extension functions for this). You have mentioned
that you may need to join some of the new information to existing records -
perhaps some Saxon or java extension functions embedded in the XSLT could
serve here (e.g. to do the address/postcode lookup). Alternatively you could
delegate this to a stored procedure on the DB in which case the XSLT would
probably require an extension function to call the stored procedure. I would
also zip up the XML (to reduce storage overhead) and store it as BLOB for
retrieval if other information is required in the future.

Now let's consider some change scenarios.

1.] An ad-hoc/infrequent report needs to contain a new field that is not
currently extracted and stored within the DB. The new field does not need to
be available for retrospective reporting.

Solution: modify the XSLT slightly to extract the additional information
which can then be stored in you extension tables. Optionally you could
create a materialised view for reporting convenience. Update or create a
report that then uses the new information.

2.] The MI dashboard/BAM report (runs frequently) is missing a significant
bit of information. This needs to be extracted from the XML and is also
required for retrospective analysis.

Solution: Create a new column on the appropriate table. Update the XSLT to
map the relevant information into this new column for all new transactions.
Write an upgrade tool that pulls out the original XML from the BLOB, runs an
XSLT to create an UPDATE script that populates this new column. This
upgrading tool could run as a background task (possible working in reverse
chronological order) that would enrich the level of information available
for reporting over time. Note: this may not be appropriate if your
transaction volumes are high where the tooling could never keep up.

In summary I think the combination of XSLT and a DB schema with deliberate
extensibility factored in balances flexibility with simplicity.

Michael
www.xml-solutions.com


On Tue, Nov 3, 2009 at 9:40 AM, Fraser Goffin <goffinf@g...>wrote:

> > Do your applications that use the relational database need to update
>
> Thats a difficult one. As far as producing up-stream interfaces, no,
> they are created just by selecting from the database.
>
> However, as I mentioned (lost in the detail) earlier, messages
> received will have a relationship to existing records. I'm not at all
> convinced that update will be possible because not all of the keys
> will be available in the message received. Some form of 'fuzzy'
> matching may be possible in some case, for example, if I receive a new
> address then the 1st line + PostCode + the relationship of this
> address to something else may be sufficient.
>
> It might be the case that we have to consider time based snap-shots,
> i.e. the 'state' of a collection of data at a point in time. Not sure
> yet, its one of those areas of difficulty that probably doesn't change
> much whether I use relational of native XML stores.
>
> At present the primary purpose of the database is for retreival, i.e.
> its NOT an operational store, its for MI, so optimiastions for read
> rather than update are appropriate.
>
> > do they need to produce reports that present the information in multiple
> ways
>
> Yes.
>
> Fraser.
>
> 2009/11/3 Jim Tivy <jimt@b...>:
> > Do your applications that use the relational database need to update and
> do
> > they need to produce reports that present the information in multiple
> ways
> > hierarchial ways?  This may suggest whether your database needs to be
> > normalized (approaching 3rd Normal form)?
> >
> > -----Original Message-----
> > From: Fraser Goffin [mailto:goffinf@g...]
> > Sent: Monday, November 02, 2009 3:31 PM
> > To: xml-dev@l...
> > Subject: Re: [xml-dev] Shredding XML
> >
> > Hi Jim,
> >
> > thats interesting ... which should be the 'driving' schema, XML or Db ?
> >
> > I guess I've been somewhat tiptoe'ing around this one.
> >
> > I should admit my bias if its not already apparent. I work mainly in
> > the SOA integration space and since XML is the primary exchange format
> > and XML schema does a reasonable job as the type system, I favour
> > processing XML as .. well XML ... whilst I understand the argument
> > around leveraging existing technologies and skillset .. often-times
> > this is little more than protectionism and continually [de]composing
> > from XML to objects then to CopyLibs and then to relational just seems
> > unnecessary a lot of the time (sorry - soap box over).  But of course
> > the whole world isn't XML and just like most other large organisations
> > the vast majority of our processing capability and data isn't and
> > probably never will be ... I have no issue with that.
> >
> > On the one hand it is the end product that drives the design (even if
> > that design has a relatively short shelf-life ... but hey, we all do
> > agile right). In that case it is definately the Database schema that
> > prevails from the pure delivery point of view, since this is the
> > desired source for the staging area from which to produce interface
> > files for upstream applications. At present there appears to be no
> > possiblility of revisiting that choice. At the same time, I don't want
> > to 'paint myself into a corner' or promote this as an exemplar for all
> > future approaches (unless it turns out that way :-)
> >
> > My unease is around the brittleness of the database schema in the face
> > of change, but I suppose that situation is almost inevitable since I
> > can't crystal-ball what changes might be coming along next week and
> > its probably folly to try. XML changes that dynamic, but not
> > completely.
> >
> > I have been having this internal debate about, .. if I concede I'm
> > going to have a relational database then should its design be derived
> > from the XML schema or should the XML schema change to accomodate the
> > database, indeed one of the Solution Designers on this has already
> > indicated a desire to 'flatten' the XML schema (although I have to say
> > I disagree that it is necessary). I have some degree of opportunity to
> > change the XML schema (although messages
> > are received from external sources, within reason, I can transform
> > them into any 'shape' I like so long as thats a loss-less exchange).
> > The database is green field so it can be any shape, but clearly some
> > designs are going to lend themsleves better than others to XML mapping
> > I would have thought ?
> >
> > Surely there are some structures in XML that don't map
> > straight-forwardly. Ted Neward called this the 'last mile' (a familiar
> > term to us all I'm sure), where the illusion of a high fidelity
> > solution draws us in, and indeed 80%+ appears to go quite well, but
> > that last few % hold a disproportiate cost and increasing complexity
> > (but you don't realise that until late on at which point some are
> > going to object to a rethink). I want to know where that 'last mile'
> > lives so I can try and avoid it !
> >
> > Fraser.
> >
> > 2009/11/2 Jim Tivy <jimt@b...>:
> >> Fraser
> >>
> >> I am not entirely hearing firm commitment that you plan to establish an
> > RDB
> >> schema and make it the driving schema.  In other words, what this would
> > mean
> >> is that data elements cannot be put into the RDB unless they exist in
> the
> >> RDB schema.  For example, if some new data elements show up in some
> > external
> >> XML to be imported then the DBA decides whether to allow them into the
> >> appropriate RDB column or not, or drop them for the time being.
> >>
> >> Another option (from the infinite number) would be to let the XML schema
> >> generate the RDB schema and the mapping code.  For your application
> >> programmers using SQL on the RDB this would likely lead to gagging and
> >> hacking and an "out of body experience"  This is not something I would
> >> recommend and if this is what you want then get a database that supports
> >> XQuery and retrain your developers.
> >>
> >> But I think you have to choose between these two - the first being what
> it
> >> sounds like you want - then work backwards from that decision.
> >>
> >> Jim
> >>
> >> -----Original Message-----
> >> From: Fraser Goffin [mailto:goffinf@g...]
> >> Sent: Monday, November 02, 2009 12:22 PM
> >> To: xml-dev@l...
> >> Subject: Re: [xml-dev] Shredding XML
> >>
> >> Yes Jim, that is spot on.
> >>
> >> Whilst there has been much discussion thus far on the technolgies and
> >> techniques of getting data out of the database (and that has been
> >> interesting), the programming for doing so are 'bread and butter' to
> >> our mainframe Cobol and Sapiens guys, so thats not really my problem.
> >>
> >> Mine is the task of getting the data from a fairly complex XML content
> >> model into an appropriately factored relational database. The design
> >> of that database is 'green field' but (and thanks to many on this
> >> thread who have posted related papers) this may not be as easy at it
> >> might at first appear, what with impedence mismatches here there and
> >> everywhere ;-)
> >>
> >> Its also the case that the XML data doesn't contain enough data
> >> inherently to represent primary or foreign key values for all of the
> >> relationships that are likely to arise. In some cases I MAY be
> >> permitted to generate them myself (say using a UUID) as I 'walk' the
> >> XML, in other cases I MAY be required to get the database to provide
> >> the value(s), not sure yet. The later may increase the complexity
> >> somewhat (sidenote: our DBAs don't allow stored procs (don't ask)  so
> >> I'm going to be doing whole bunches of INSERTs as part of the
> >> tree-walk I suspect)
> >>
> >> I'm really interested in the gotchas and best practices. Some have
> >> already been mentioned like the fact that the XML schema may define
> >> optional items and unrestricted length facets and such like. Others
> >> I've seen in reading talk about the mis-match of identity approaches
> >> (although this was talking primarily about OO/Relational mapping but
> >> the idea is similar I suspect). This could be important, since some
> >> messages received may 'relate' to others already loaded and, given
> >> what I said about not having all of the data in the XML to form all of
> >> the keys, this might be a significant problem.
> >>
> >> It is my intention to look into other options (we have recently
> >> acquired DB2 v9 which includes pureXML) but as is so often the case,
> >> the immediate project delivery pressures won't allow it. The PM is
> >> very nervous about using any new tech, perhaps justifiably, but my
> >> sense of unease is more to do with the perhaps misplaced assumption
> >> that 'tried and tested' tech like relational databases will always
> >> provide a workable solution, imho sometimes they actually represent
> >> the most significant constraint.
> >>
> >> So yes, back to the actual problem. How to come up with a database
> >> design that provides the capability of staging the shredded XML in a
> >> reasonable efficient manner and enables it to be loaded from XML
> >> instances received, again efficiently (ideally without 100's of tables
> >> and joins to negotiate). As far as efficiency of storage, well that
> >> MAY be a concern although perhaps not a huge one so long as the Db
> >> doesn't bloat up too much if normalisation is preferred over extra
> >> tables.
> >>
> >> Please add your thoughts and suggestions and experiences as you are
> >> able. Nothing is too trivial (or rude) to mention (i.e. if you want to
> >> say don't do this if you want to keep your sanity, thats ok).
> >>
> >> regards
> >>
> >> Fraser.
> >>
> >> I'm
> >>
> >>
> >> 2009/11/1 Jim Tivy <jimt@b...>:
> >>> Interesting post, but I am not sure that "now is the time to talk of
> many
> >>> things".
> >>>
> >>> Let me try to focus:
> >>>
> >>> Proper software execution comes from the choice of appropriate
> >>> actions/technologies to match the driving requirements.  But more
> >>> importantly, the greatest Wisdom, is to frame the driving requirements
> >>> correctly before "going off half cocked" or doing something that is
> >>> unnecessary and unwarranted.
> >>>
> >>> So lets start by framing the requirements again:
> >>>
> >>> Fraser Gofin wrote:
> >>>
> >>> "
> >>> The basics are we receive XML messages from an external trading partner
> >> and
> >>> process those messages, enriching and routing to a number of internal
> >>> subscriber applications. One of these applications is MI and the deal
> > here
> >>> is that they want the data to been put into a relational database so
> that
> >>> they can create a number of interfaces 'files' which are sent to still
> >> more
> >>> applications.
> >>> "
> >>>
> >>> OR
> >>>
> >>> "
> >>> I am mainly interested in the process of LOADING XML data to a database
> >>> rather than extracting (at least for the purposes of this discussion).
> >>> "
> >>>
> >>> It is possible that the "mother persistent application datamodel" is
> >>> contained in the relational database in all its normalized glory.  If
> so,
> >>> then, "processing the messages" is simply a "data import" operation.
>  So
> >> the
> >>> question is, how do I get XML X* to tables T*.  It would strike me that
> >> lots
> >>> of people are doing this.  Are there common techniques and technologies
> >> for
> >>> doing this import?
> >>>
> >>> Fraser, is that a proper framing of the question/requirements?
> >>>
> >>> Jim
> >>>
> >>>
> >>> -----Original Message-----
> >>> From: Petite Abeille [mailto:petite.abeille@g...]
> >>> Sent: Sunday, November 01, 2009 9:56 AM
> >>> To: xml-dev@l...
> >>> Subject: Re: [xml-dev] Shredding XML
> >>>
> >>>
> >>> On Oct 29, 2009, at 10:20 PM, Fraser Goffin wrote:
> >>>
> >>>> opinions on the subject of decomposing XML into relational databases
> >>>
> >>> Outside of the most trivial case, this is a major PITA of the same
> >>> epic proportion as the object-relational one:
> >>>
> >>>
> >
> http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
> >>>
> >>> Good luck.
> >>>
> >>>
> >>>
> >>> _______________________________________________________________________
> >>>
> >>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> >>> to support XML implementation and development. To minimize
> >>> spam in the archives, you must subscribe before posting.
> >>>
> >>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> >>> Or unsubscribe: xml-dev-unsubscribe@l...
> >>> subscribe: xml-dev-subscribe@l...
> >>> List archive: http://lists.xml.org/archives/xml-dev/
> >>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________________________________
> >>>
> >>> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> >>> to support XML implementation and development. To minimize
> >>> spam in the archives, you must subscribe before posting.
> >>>
> >>> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> >>> Or unsubscribe: xml-dev-unsubscribe@l...
> >>> subscribe: xml-dev-subscribe@l...
> >>> List archive: http://lists.xml.org/archives/xml-dev/
> >>> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> >>>
> >>>
> >>
> >> _______________________________________________________________________
> >>
> >> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> >> to support XML implementation and development. To minimize
> >> spam in the archives, you must subscribe before posting.
> >>
> >> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> >> Or unsubscribe: xml-dev-unsubscribe@l...
> >> subscribe: xml-dev-subscribe@l...
> >> List archive: http://lists.xml.org/archives/xml-dev/
> >> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> >>
> >>
> >>
> >>
> >
> > _______________________________________________________________________
> >
> > XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> > to support XML implementation and development. To minimize
> > spam in the archives, you must subscribe before posting.
> >
> > [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> > Or unsubscribe: xml-dev-unsubscribe@l...
> > subscribe: xml-dev-subscribe@l...
> > List archive: http://lists.xml.org/archives/xml-dev/
> > List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
> >
> >
> >
> >
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>


transparent
Print
Mail
Like It
Disclaimer
.

These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.

.
.

transparent

transparent