Subject: Re: BBC news story: Judge bans Microsoft Word sales
Newsgroup: comp.text.xml
Date: 8/18/2009 12:32:00 AM

>>>>> Pete Becker <pete@v...> (PB) wrote:

>PB> The cited news article is rather superficial. Be careful about drawing
>PB> conclusions about how the legal system works from reading such sources.
>PB> They're often wrong.
>PB>
>PB> The patent itself was filed in 1994 (not 1998, as the article says) and
>PB> issued in 1998. It mentions SGML (the parent of XML) in several places, and
>PB> says that the method at issue is fundamentally different because it does
>PB> not put structural information in the data stream. More particularly:
>PB>
>PB>     Thus, in sharp contrast to the prior art the present
>PB>     invention is based on the practice of separating encoding
>PB>     conventions from the content of a document. The invention
>PB>     does not use embedded metacoding to differentiate the content
>PB>     of the document, but rather, the metacodes of the document are
>PB>     separated from the content and held in distinct storage in a
>PB>     structure called a metacode map, whereas document content is
>PB>     held in a mapped content area. Raw content is an extreme
>PB>     example of mapped content wherein the latter is totally
>PB>     unstructured and has no embedded metacodes in the data stream.
>PB>
>PB> That doesn't sound like a description of XML.

Well, read the whole patent. What they do is process a document with embedded markup (like troff, SGML, XML, or maybe even TeX) in such a way that inside the program the markup is separated from the plain text. The external representation is still the marked-up text. So it does apply to XML. This is quite a primitive way of parsing the markup.
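To make the separation concrete, here is a minimal Python sketch of splitting marked-up text into a contiguous content area and a metacode map. It is not Piet's program and not the patent's exact procedure. Two things are assumptions made for illustration: a "tag" is anything from `<` to the next `>`, and the metacode map is a list of `(offset, tag)` pairs. A real XML parser would also have to handle comments, CDATA sections, and `>` inside attribute values.

```python
def separate(marked_up):
    """Split marked-up text into (content, metacode_map).

    content      -- all plain text, stored in one contiguous string
    metacode_map -- list of (offset, tag) pairs; offset is the position in
                    content where the tag appeared (assumed representation)
    """
    content = []        # pieces of plain text, joined at the end
    metacode_map = []   # tags in document order, with their content offsets
    length = 0          # length of the content area built so far
    pos = 0             # scan position in the input
    while True:
        start = marked_up.find("<", pos)
        if start == -1:
            break                      # no more tags
        end = marked_up.find(">", start)
        if end == -1:
            break                      # unterminated tag: rest is plain text
        text = marked_up[pos:start]    # text before the tag goes to content
        content.append(text)
        length += len(text)
        metacode_map.append((length, marked_up[start:end + 1]))
        pos = end + 1
    content.append(marked_up[pos:])    # trailing text after the last tag
    return "".join(content), metacode_map


if __name__ == "__main__":
    print(separate("<p>Hello <b>world</b></p>"))
    # ('Hello world', [(0, '<p>'), (6, '<b>'), (11, '</b>'), (11, '</p>')])
```

Note that the "tree" here is only a flat list of tags, and the text itself contains no markup at all.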
The parse just scans the input until it finds a tag (called a metacode in the patent), copies the text before the tag to an output area, and copies the tag to a list of tags (called a metacode map in the patent). Compared to modern parsing techniques there are two differences:

(1) nowadays you usually build a parse tree; they have just a degenerate tree (only a list).
(2) usually the plain text is put in the leaves of the tree; they keep the text in one contiguous area, and the `parse tree' contains pointers or indices into this area.

The advantage of their structure shows when you need more than one tag structure on top of the text: for example, when you have both the hierarchical XML structure and a structure of lines and pages. SGML allows more than one structure in the same document, and the patent mentions that.

The only innovative idea in the patent is this separation, because it makes it easier to edit the document when you have more than one structure on top of it. And I don't know how innovative even that is, because once you need to edit a marked-up text with more than one (markup) structure on top of it, this is quite a logical choice. Moreover, ideas cannot be patented, so the idea doesn't count (but IANAL). Once you have this idea, implementing it is peanuts. You could give this to any student in a beginner's programming course once they have had strings, arrays and loops, and they should be able to solve it.

So the patent is about the transformation of the marked-up text to the separated data structure and vice versa, about calculating another structure from the first one, plus some other minor things. I find it really silly that you can get a patent for this kind of thing.

I am writing a small Python program that illustrates the patented algorithms.

-- 
Piet van Oostrum <piet@c...>
URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
Private email: piet@v...
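The reverse transformation, and the point about keeping more than one structure over the same text, can be sketched in Python as well. This is a hedged illustration, not the patent's method. It assumes each structure is a separate list of `(offset, tag)` pairs over one shared content string. `line_map` (a `<br/>` marker every `width` characters) is a hypothetical stand-in for a line/page structure, and `insert_text` shows why editing is easy: the text changes once and every map only needs its offsets shifted.

```python
def merge(content, metacode_map):
    """Rebuild marked-up text from content plus one metacode map.

    metacode_map: (offset, tag) pairs sorted by offset; tags sharing an
    offset are emitted in list order.
    """
    out = []
    pos = 0
    for offset, tag in metacode_map:
        out.append(content[pos:offset])  # text up to the tag's position
        out.append(tag)
        pos = offset
    out.append(content[pos:])            # text after the last tag
    return "".join(out)


def line_map(content, width):
    """Hypothetical second structure: a <br/> marker every `width` chars."""
    return [(i, "<br/>") for i in range(width, len(content), width)]


def insert_text(content, maps, offset, text):
    """Insert text into the shared content and shift every map.

    Tags strictly after `offset` move right; tags exactly at `offset`
    stay put (an arbitrary choice made for this sketch).
    """
    new_content = content[:offset] + text + content[offset:]
    shift = len(text)
    new_maps = [
        [(o + shift if o > offset else o, tag) for o, tag in m]
        for m in maps
    ]
    return new_content, new_maps


if __name__ == "__main__":
    content = "Hello world"
    xml_map = [(0, "<p>"), (6, "<b>"), (11, "</b>"), (11, "</p>")]
    lines = line_map(content, 4)
    print(merge(content, xml_map))  # <p>Hello <b>world</b></p>
    print(merge(content, lines))    # Hell<br/>o wo<br/>rld
    content, (xml_map, lines) = insert_text(content, [xml_map, lines], 6, "big ")
    print(merge(content, xml_map))  # <p>Hello <b>big world</b></p>
    print(merge(content, lines))    # Hell<br/>o big wo<br/>rld
```

Both structures stay independent of each other and of the text. Only the offsets change on an edit, which is the separation argument above in code form.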
