Altova Mailing List Archives


Re: [xml-dev] Profiling, diff and change tracking best practices?

From: michael odling-smee <mike.odlingsmee@-----.--->
To: Lech Rzedzicki <xchaotic@-----.--->
Date: 10/1/2009 3:41:00 PM
Hi Lech,

Funnily enough, I have just started thinking about this for my own project
with a similar use case, i.e. understanding the changes between two
different baselines of an XML document or XML document set.

My high-level thoughts so far are:

1.] Add suitable metadata attributes (e.g. version, creation and modification
date, author) to fairly coarse-grained components within the XML data model.
2.] Create a baseline of the XML document or set of XML documents by:
2.1] Creating a fairly lightweight XML file (perhaps using XSLT) that only
contains this metadata. Save this to disk (i.e. create a memento of the
metadata).
2.2] Saving a copy of the original XML in a version control system/file
system where it will not be edited further.
3.] Later on, when trying to do a diff between the original baseline and the
current version:
3.1] Using the same mechanism as in step 2.1, create a new memento of the
current XML document or set of XML documents.
4.] Compare the two mementos, reporting on changes. If required, the baseline
copy of the XML can be used to compute exactly what content has changed
between the two versions (I think you need add, delete and update).
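The baseline-and-compare steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: it uses the standard library's xml.etree.ElementTree in place of the XSLT suggested in step 2.1, and it assumes components are identified by xml:id and carry hypothetical version, modified and author attributes. The function names are likewise illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata attribute names: the message suggests
# "version/create and modify date/author" but does not fix a schema.
META_ATTRS = ("version", "modified", "author")
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_memento(root):
    """Steps 2.1/3.1: map each component's xml:id to its metadata."""
    memento = {}
    for elem in root.iter():
        comp_id = elem.get(XML_ID)
        if comp_id is None:
            continue  # only identified (coarse-grained) components count
        memento[comp_id] = {
            name: elem.get(name) for name in META_ATTRS if elem.get(name) is not None
        }
    return memento


def memento_to_xml(memento):
    """Lightweight XML form, e.g. for ET.ElementTree(elem).write(path)."""
    root = ET.Element("memento")
    for comp_id, meta in sorted(memento.items()):
        ET.SubElement(root, "component", {"ref": comp_id, **meta})
    return root


def memento_from_xml(root):
    """Read a memento back from its lightweight XML form."""
    return {
        comp.get("ref"): {k: v for k, v in comp.attrib.items() if k != "ref"}
        for comp in root.iter("component")
    }


def compare_mementos(baseline, current):
    """Step 4: report which components were added, deleted or updated."""
    common = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "deleted": sorted(baseline.keys() - current.keys()),
        "updated": sorted(k for k in common if baseline[k] != current[k]),
    }


if __name__ == "__main__":
    old = ET.fromstring(
        '<book><section xml:id="s1" version="1"/>'
        '<section xml:id="s2" version="1"/></book>'
    )
    new = ET.fromstring(
        '<book><section xml:id="s1" version="2"/>'
        '<section xml:id="s3" version="1"/></book>'
    )
    # Round-trip the baseline through its saved form before comparing.
    baseline = memento_from_xml(memento_to_xml(build_memento(old)))
    print(compare_mementos(baseline, build_memento(new)))
    # {'added': ['s3'], 'deleted': ['s2'], 'updated': ['s1']}
```

Because only ids and metadata are compared, this reports where a component changed; the saved baseline copy from step 2.2 is still needed to work out what changed inside it.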

I am still undecided whether both the memento and document copy are required
- logically the memento is not actually required. However the lightweight
memento may prove useful if:

   - The XML document or set of documents is very large, such that it would
   not be desirable to store a complete copy of the document(s).
   - You want to optimise deep differencing (especially relevant where
   there is a set of XML documents that you are comparing, so you only have to
   parse files where differences occur).
   - The diff report is only meant to identify where differences are, not
   what they are.
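For a set of documents, a memento can also decide which files are worth deep-diffing at all. The sketch below assumes a SHA-256 digest of each file's bytes as a stand-in for the metadata memento described above; the function names and the *.xml file pattern are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(data):
    """SHA-256 of a file's raw bytes (stand-in for a metadata memento)."""
    return hashlib.sha256(data).hexdigest()


def memento_from_contents(contents):
    """Map each file name to the digest of its contents."""
    return {name: file_digest(data) for name, data in contents.items()}


def memento_from_dir(directory, pattern="*.xml"):
    """Build the same memento from the matching files in a directory."""
    return memento_from_contents(
        {p.name: p.read_bytes() for p in Path(directory).glob(pattern)}
    )


def files_to_deep_diff(baseline, current):
    """Only new, removed or modified files need to be parsed and diffed."""
    names = baseline.keys() | current.keys()
    return sorted(n for n in names if baseline.get(n) != current.get(n))


if __name__ == "__main__":
    base = memento_from_contents(
        {"ch1.xml": b"<chapter/>", "ch2.xml": b"<chapter/>"}
    )
    curr = memento_from_contents(
        {
            "ch1.xml": b"<chapter/>",
            "ch2.xml": b"<chapter><para/></chapter>",
            "ch3.xml": b"<chapter/>",
        }
    )
    print(files_to_deep_diff(base, curr))  # ['ch2.xml', 'ch3.xml']
```

A byte-level digest also flags files whose only change is whitespace or attribute order, so it can over-report, but it will not miss a change to a file's bytes.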

Anyway, I have only had early thoughts on the subject, so I would gladly listen
to any other suggestions that the community has to offer.

Kind regards,

Michael Odling-Smee

On Thu, Oct 1, 2009 at 3:44 PM, Lech Rzedzicki <xchaotic@g...> wrote:

> Hi all.
>
> I am at a fortunate stage where we are redesigning our XML schema so
> that it fits our requirements better.
> To give you an idea of the XML we're dealing with, it's loosely based
> on DocBook and used for multi-channel publishing.
> Some frequent scenarios include updating XML with new content,
> comparing versions, handling different languages, sending diffs to translation,
> but also producing slight variations depending on the output. Tracking
> changes (by being able to see what's been added and deleted) is also a
> nice-to-have feature.
> Basically, what I aim to put in place is a set of structures to help with
> these functions that are not so verbose as to overwhelm editors, yet powerful
> enough for 'future' scenarios.
>
> My initial thoughts are to employ xml:id attributes on block-level
> elements and add a set of attributes for each facet of profiling,
> possibly reusing DocBook attributes such as condition, version and
> audience, but my fear is that this won't be powerful enough in the future.
>
> I would love to hear your general thoughts on best practices in this
> area of managing XML content and specifically on:
>
> 1. How low should we go with IDs on elements? My main concern here is
> making diffs as easy as possible and possibly identifying chunks of
> XML that are as small as possible, making translation cheaper. On the
> other hand, should I be bothered about performance at all, since all
> the documents are size-limited to a book size of ca. 1000 pages (a few
> MB of XML)?
> 2. Use a possibly verbose set of elements/attributes on the elements
> directly, or use a meta-attribute that links to an attribute/element
> set in a secondary file (less verbose but more complex)?
> 3. Are 'add' and 'remove' sufficient change-tracking marks to cover
> all scenarios? (I think any more complex edit, such as an update, can be
> built up from those two.)
>
> I really hope I can get some good feedback from you, and thanks in
> advance for that,
>
> Lech Rzedzicki
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
>
>
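Lech's idea of reusing DocBook attributes such as condition and audience to produce output variations amounts to filtering the tree before publishing. A minimal sketch in Python, assuming the convention used by the DocBook XSL profiling stylesheets that an attribute may hold several values separated by semicolons and that elements lacking the attribute are kept; `matches` and `profile` are hypothetical names, not part of any DocBook tooling.

```python
import xml.etree.ElementTree as ET


def matches(elem, selection, sep=";"):
    """True if every profiling attribute on elem shares a value with the selection."""
    for attr, wanted in selection.items():
        value = elem.get(attr)
        if value is None:
            continue  # elements without this attribute are always kept
        if not set(value.split(sep)) & wanted:
            return False
    return True


def profile(root, selection):
    """Remove, in place, each non-matching element and its subtree.

    The root itself is never removed. ElementTree's remove() also drops
    the removed element's tail text.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            if not matches(child, selection):
                parent.remove(child)
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        '<section><para audience="expert">A</para>'
        '<para audience="novice;expert">B</para>'
        '<para condition="print">C</para><para>D</para></section>'
    )
    profile(doc, {"audience": {"novice"}, "condition": {"web"}})
    print([p.text for p in doc.iter("para")])  # ['B', 'D']
```

Keeping the selection as a mapping from attribute name to allowed values means a new profiling facet only needs a new key, rather than new code.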



These Archives are provided for informational purposes only and have been generated directly from the Altova mailing list archive system and are comprised of the lists set forth on www.altova.com/list/index.html. Therefore, Altova does not warrant or guarantee the accuracy, reliability, completeness, usefulness, non-infringement of intellectual property rights, or quality of any content on the Altova Mailing List Archive(s), regardless of who originates that content. You expressly understand and agree that you bear all risks associated with using or relying on that content. Altova will not be liable or responsible in any way for any content posted including, but not limited to, any errors or omissions in content, or for any losses or damage of any kind incurred as a result of the use of or reliance on any content. This disclaimer and limitation on liability is in addition to the disclaimers and limitations contained in the Website Terms of Use and elsewhere on the site.
