Altova Mailing List Archives


Re: [xml-dev] Profiling, diff and change tracking best practices?

From: Lech Rzedzicki <xchaotic@-----.--->
To: xml-dev@-----.---.---
Date: 10/1/2009 4:12:00 PM

On Thu, Oct 1, 2009 at 4:40 PM, michael odling-smee
<mike.odlingsmee@g...> wrote:
>
> Funnily enough I have just started thinking about this for my own project
> with a similar use-case - i.e. understanding the changes between two
> different baselines of an XML document or XML document set.

Great to hear that - I was expecting just that. It is a common
failing in the computer world that developers reinvent the wheel,
when all it takes is a bit of google-fu and creative
discussion.
>
> My high-level thoughts so far are:
>
> 1.] Add suitable meta-data attributes (e.g. version/create and modify
> date/author) to fairly coarse grained components within the XML data model.

At a slightly lower level, have you already thought about what would be a
complete-enough set of metadata to fit your requirements? I have
tried to follow the Dublin Core model, but it might be overly complex
for your purposes...
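
As a rough illustration of what a small Dublin Core-style metadata set on a
coarse-grained component might look like, here is a minimal Python sketch. The
two namespace URIs are the standard Dublin Core ones; everything else (the
`stamp` helper, the `chapter` element, the choice of creator/created/modified,
and carrying them as attributes rather than child elements) is an assumption
made for the example, not something either poster specified.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespaces (element set and terms).
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def stamp(component, author, created, modified):
    """Attach a minimal metadata set to one component (hypothetical helper)."""
    component.set(f"{{{DC}}}creator", author)
    component.set(f"{{{DCTERMS}}}created", created)
    component.set(f"{{{DCTERMS}}}modified", modified)
    return component


# A made-up coarse-grained component carrying the metadata.
chapter = ET.Element("chapter", {"id": "ch1"})
stamp(chapter, "lech", "2009-09-01", "2009-10-01")
print(ET.tostring(chapter, encoding="unicode"))
```

Three attributes per component is probably close to the minimum that still
supports change tracking; the full Dublin Core vocabulary offers much more
than that.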
> 2.] Create a baseline of the document or set of XML documents by:
> 2.1] Creating a fairly lightweight XML file (perhaps using XSLT) that only
> contains this meta-data. Save this to disk (i.e. create a memento of the
> meta-data)
> 2.2] Saving a copy of the original XML in a version control system/file
> system where it will not be edited further.
> 3.] Later on when trying to do a diff. between the original baseline and
> current:
> 3.1] Using the same mechanism as in step 2.1 create a new memento of the
> current XML document or set of XML documents
> 4.] Compare the two mementos reporting on changes - if required the baseline
> copy of the XML can be used to compute exactly what content has changed (I
> think you need add/delete and update) between the two versions.
>
> I am still undecided whether both the memento and document copy are required
> - logically the memento is not actually required. However the lightweight
> memento may prove useful if:
>
> - The XML document or set of documents is very large such that it would not
>   be desirable to store a complete copy of the document(s).
> - To aid with deep differencing optimisation (especially relevant where
>   there is a set of XML documents that you are comparing so you only have to
>   parse files where differences occur).
> - The diff. report is only meant to identify where differences are, not what
>   they are.
>
> Anyway I have only had early thoughts on the subject so would gladly listen
> to any other suggestions that the community has to offer.
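
For illustration, steps 2.1, 3.1 and 4 quoted above could be sketched roughly
as follows in Python. This is only one possible reading of the proposal: it
assumes each tracked component carries an `id` attribute plus `version`,
`modified` and `author` attributes, and it keeps the memento as an in-memory
dictionary rather than a separate XML file produced with XSLT. The names
`META_ATTRS`, `make_memento` and `compare_mementos` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed metadata attributes on each tracked component.
META_ATTRS = ("version", "modified", "author")


def make_memento(root):
    """Map each component id to its metadata tuple (steps 2.1 and 3.1)."""
    memento = {}
    for comp in root.iter():
        comp_id = comp.get("id")
        if comp_id is not None:
            memento[comp_id] = tuple(comp.get(a) for a in META_ATTRS)
    return memento


def compare_mementos(old, new):
    """Report added, deleted and updated component ids (step 4)."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Made-up baseline and current documents.
baseline = ET.fromstring(
    '<doc>'
    '<sec id="c1" version="1" modified="2009-09-01" author="mike"/>'
    '<sec id="c2" version="1" modified="2009-09-01" author="mike"/>'
    '</doc>'
)
current = ET.fromstring(
    '<doc>'
    '<sec id="c1" version="2" modified="2009-10-01" author="lech"/>'
    '<sec id="c3" version="1" modified="2009-10-01" author="lech"/>'
    '</doc>'
)

report = compare_mementos(make_memento(baseline), make_memento(current))
print(report)
```

The report only says where things changed; working out what changed inside an
updated component would still need the stored baseline copy from step 2.2.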

Sounds like a neat approach, but just like you, my initial feeling is
that separating the metadata is an awkward thing to do and
might make processing a bit too complex. After all, to create a simple
delta document, you would need to compare the two mementos and then go
back to the original files to locate the changes. I agree that it
might be necessary when dealing with large documents, but in such
cases I suppose you could apply stream processing such as SAX instead,
especially for comparing things...
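
A minimal sketch of the stream-processing idea, using Python's standard
`xml.sax` module: the handler collects per-component metadata while the
document is being parsed, so the full tree is never held in memory. The
attribute names (`id`, `version`, `modified`) and the `MementoHandler` and
`stream_memento` names are assumptions made for the example.

```python
import xml.sax


class MementoHandler(xml.sax.ContentHandler):
    """Collect metadata for every element that carries an id attribute."""

    def __init__(self):
        super().__init__()
        self.memento = {}

    def startElement(self, name, attrs):
        comp_id = attrs.get("id")
        if comp_id is not None:
            # Only the metadata is kept; element content is ignored.
            self.memento[comp_id] = (attrs.get("version"), attrs.get("modified"))


def stream_memento(xml_bytes):
    """Parse a document as a stream and return its id -> metadata map."""
    handler = MementoHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.memento


sample = b'<doc><sec id="a" version="1" modified="2009-10-01">text</sec></doc>'
print(stream_memento(sample))
```

For files on disk, `xml.sax.parse` takes a filename or file object in place of
the byte string, which is where streaming starts to pay off for large
documents.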

I don't know if that's the case in your environment, but in my
scenario the raw XML is going to be maintained by people, so I am
striving for simplicity. Separating the metadata as you propose
might mean slightly more complex processing, but the XML that people see
could in effect be more manageable, so I'll certainly have a think
about it...

Lech

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


