Re: [xml-dev] Profiling, diff and change tracking best practices?
To: Lech Rzedzicki <xchaotic@-----.--->
Date: 10/1/2009 3:41:00 PM

Hi Lech,

Funnily enough I have just started thinking about this for my own
project with a similar use-case - i.e. understanding the changes
between two different baselines of an XML document or XML document set.

My high-level thoughts so far are:

1.] Add suitable meta-data attributes (e.g. version, create and modify
    date, author) to fairly coarse-grained components within the XML
    data model.
2.] Create a baseline of the XML document or document set by:
    2.1] Creating a fairly lightweight XML file (perhaps using XSLT)
         that only contains this meta-data. Save this to disk (i.e.
         create a memento of the meta-data).
    2.2] Saving a copy of the original XML in a version control
         system/file system where it will not be edited further.
3.] Later on, when trying to do a diff between the original baseline
    and the current version:
    3.1] Using the same mechanism as in step 2.1, create a new memento
         of the current XML document or set of XML documents.
4.] Compare the two mementos, reporting on changes - if required, the
    baseline copy of the XML can be used to compute exactly what
    content has changed (I think you need add/delete and update)
    between the two versions.

I am still undecided whether both the memento and the document copy are
required - logically the memento is not actually required, since it can
be regenerated from the baseline copy. However, the lightweight memento
may prove useful:

- If the XML document or set of documents is so large that it would
  not be desirable to store a complete copy of the document(s).
- To aid with deep differencing optimisation (especially relevant
  where you are comparing a set of XML documents, so that you only
  have to parse files where differences occur).
- If the diff report is only meant to identify where differences are,
  not what they are.

Anyway, I have only had early thoughts on the subject, so I would
gladly listen to any other suggestions that the community has to offer.

Kind regards,

Michael Odling-Smee

On Thu, Oct 1, 2009 at 3:44 PM, Lech Rzedzicki <xchaotic@g...> wrote:
> Hi all.
>
> I am at a fortunate stage where we are redesigning our XML schema so
> that it fits our requirements better.
> To give you an idea of the XML we're dealing with, it's loosely based
> on DocBook and used for multi-channel publishing.
> Some frequent scenarios include updating XML with new content,
> comparing versions, different languages, sending diffs to translation,
> but also producing slight variations depending on the output. Tracking
> changes (by being able to see what's been added and deleted) is also a
> nice-to-have feature.
> Basically what I aim to put in place is structures to help with these
> functions that are not so verbose as to overwhelm editors, yet powerful
> enough for 'future' scenarios.
>
> My initial thoughts are to employ xml:id attributes on block-level
> elements and add a set of attributes for each facet of profiling,
> possibly reusing DocBook attributes such as condition, version,
> audience, but my fear is that it won't be powerful enough in the future.
>
> I would love to hear your general thoughts on best practices in this
> area of managing XML content and specifically on:
>
> 1. How low should we go with id's on elements? My main concern here is
> making diffs as easy as possible and possibly identifying chunks of
> XML that are as small as possible, making translation cheaper. On the
> other hand, should I be bothered at all about the performance, since all
> the documents are size-limited to a book size of ca. 1000 pages (a few
> MB of XML)?
> 2. Use a possibly verbose set of elements/attributes on the elements
> directly, or use a meta-attribute that links to an attribute/element
> set in a secondary file? (less verbose but more complex)
> 3. Are 'add' and 'remove' sufficient change tracking marks to cover
> all scenarios? (I think any more complex edits such as update can be
> built up from those two)?
>
> I really hope I can get some good feedback from you and thanks in
> advance for that,
>
> Lech Rzedzicki
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>
> [Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
> Or unsubscribe: xml-dev-unsubscribe@l...
> subscribe: xml-dev-subscribe@l...
> List archive: http://lists.xml.org/archives/xml-dev/
> List Guidelines: http://www.oasis-open.org/maillists/guidelines.php
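
The memento idea in the reply (steps 2.1, 3.1 and 4) can be sketched roughly as follows. This is only an illustration under stated assumptions, not either poster's implementation: it uses Python's standard library where the reply suggests XSLT, it assumes components are identified by xml:id as in the original question, and the attribute names (version, modified, author) and function names (make_memento, compare_mementos) are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical metadata attribute names; the thread only suggests
# "version/create and modify date/author" in general terms.
META_ATTRS = ("version", "modified", "author")


def make_memento(xml_text):
    """Steps 2.1 / 3.1: keep only the metadata of each identified component."""
    root = ET.fromstring(xml_text)
    memento = {}
    for elem in root.iter():
        comp_id = elem.get(XML_ID)
        if comp_id is not None:
            memento[comp_id] = {name: elem.get(name) for name in META_ATTRS}
    return memento


def compare_mementos(baseline, current):
    """Step 4: report where the two mementos differ, not what changed."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "deleted": sorted(baseline.keys() - current.keys()),
        "updated": sorted(k for k in shared if baseline[k] != current[k]),
    }


BASELINE = """<book>
  <section xml:id="intro" version="1" modified="2009-09-01" author="lech"/>
  <section xml:id="setup" version="1" modified="2009-09-01" author="lech"/>
</book>"""

CURRENT = """<book>
  <section xml:id="intro" version="2" modified="2009-10-01" author="michael"/>
  <section xml:id="usage" version="1" modified="2009-10-01" author="michael"/>
</book>"""

report = compare_mementos(make_memento(BASELINE), make_memento(CURRENT))
print(report)
# {'added': ['usage'], 'deleted': ['setup'], 'updated': ['intro']}
```

A report like this only says which components differ. Working out what changed inside an "updated" component would still need the stored baseline copy from step 2.2, and the sketch relies on editors (or tooling) actually updating the metadata attributes whenever a component is edited.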
