Altova Mailing List Archives
>comp.text.xml Archive Home
Re: RFC: thoughts for a "streamlined" XML syntax variant...
Date: 5/11/2012 5:01:00 PM
On 5/11/2012 1:44 PM, Peter Flynn wrote:
> On 11/05/12 18:40, BGB wrote:
>> one issue partly in the case of XML for its use in structured data
>> is its relative verbosity, especially in cases where it is entered by
>> hand or being read by a human (say, for debugging reasons, ...).
>
> I think this was expected to be a very rare case, which is why the spec
> says that terseness in XML markup is of minimal importance.
>

fair enough.

I mostly use it for things like compiler ASTs, network protocols, and file-formats (generally structured-data).

currently used forms of XML are:
raw/plaintext XML;
as deflated plaintext XML;
as an in-use binary format (similar to an "improved" version of WBXML with a few more features and density-improvements, with both being byte-based).

I have another format I could use, but going into it likely pushes topicality (it is a Huffman-compressed binary serialization format, currently used for sending messages over a TCP socket in a 3D game engine, but this doesn't have much in particular to do with XML, as the message format it is currently used with is S-Expression based, rather than XML based).

but, yeah, I guess originally XML was intended for markup of mostly textual documents (like in HTML or similar), rather than for representing structured data (or being used for humans viewing said structured data as debugging output).

I wonder if anyone ever really considered "scene-graph delta-update messages in a 3D FPS game" as a possible use-case for XML either? somehow I doubt it (I had intended to do this originally, despite eventually opting for a different representation for said deltas).

even as such, I did end up aggressively compressing them (via a specialized encoding scheme), as otherwise the bandwidth usage would have been a bit steep for a typical end-user internet connection.
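as a rough illustration of the "deflated plaintext XML" form mentioned above, here is a minimal Python sketch using the standard `zlib` module (which wraps a deflate stream in a small zlib header). the sample document is an assumption made up for illustration: the AST fragment from this thread repeated 200 times to stand in for a large dump, so the ratio it prints is far better than real, less repetitive data would give.

```python
import zlib

# Hypothetical sample data: one small AST fragment, repeated to mimic a large dump.
FRAGMENT = (
    '<if><cond><binary op="&lt;"><ref name="x"/><number value="3"/></binary></cond>'
    '<then><funcall name="foo"><args/></funcall></then></if>\n'
)
raw = ("<module>\n" + FRAGMENT * 200 + "</module>\n").encode("utf-8")

# Deflate at the highest compression level, then inflate again to check the round trip.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

print(len(raw), "bytes raw,", len(packed), "bytes deflated")
print("round trip ok:", restored == raw)
```

this only helps with storage and transfer size; the text a human reads after inflating is exactly as verbose as before.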
>> so, the thought here would be to allow a "modest" syntax extension
>> (probably would be limited to particular implementations which
>> support the extension).
>>
>> more specifically, I was considering it as a possible extension
>> feature to my own implementation, but have some doubts given that,
>> yes, this would be a non-standard extension. note that there probably
>> would be a feature to manually "enable" it, such as to avoid
>> necessarily breaking compatibility.
>
> Switchable is good.
>

yeah.

>> in my case, the current primary use is for things like compiler ASTs,
>> where it competes some with the use of S-Expressions for ASTs (Lisp
>> style, not the "Rivest" variant / name-hijack). note that these ASTs
>> normally never leave the application which created them, so the
>> impact of using a non-standard syntax when serializing them is likely
>> fairly small.
>>
>> example, say that a person has an expression like:
>> <if>
>>   <cond>
>>     <binary op="&lt;">
>>       <ref name="x"/>
>>       <number value="3"/>
>>     </binary>
>>   </cond>
>>   <then>
>>     <funcall name="foo">
>>       <args/>
>>     </funcall>
>>   </then>
>> </if>
>>
>> representing, say, the AST of the statement "if(x<3)foo();".
>>
>> the parser and printer could use a more compact encoding, say:
>> <if
>> <cond<binary op="&lt;"<ref name="x"/> <number value="3"/>>>
>> <then<funcall name="foo"<args/>>>>
>
> This syntax (or very nearly) is already available in SGML:
>
> <!doctype if [
> <!element if - - (cond,then)>
> <!element cond - - (binary)>
> <!element binary - - (ref,number)>
> <!element number - - empty>
> <!element then - - (funcall)>
> <!element funcall - - (args)>
> <!element (args,ref) - - empty>
> <!attlist binary op cdata #required>
> <!attlist (ref,funcall) name cdata #required>
> <!attlist number value cdata #required>
> <!entity lt sdata "<">
> ]>
> <if<cond<binary op="&lt;"<ref name=x<number value="3"></></>
> <then<funcall name=foo<args></></></>
>

fair enough.
>> which would be regarded as functionally-equivalent to the prior
>> expression (and would generate equivalent DOM trees when read back in).
>>
>> with the following rules:
>> <tag>...</tag> and <tag/> are the same as before.
>>
>> while:
>> <tag<...> ...>
>> would use an alternate parsing strategy, where ">" is significant (since
>> the prior tag didn't actually end), and indicates the end of the
>> expression (the magic here would be seeing another "<" within a tag).
>>
>> similarly, maybe "<[[" could also be parsed as a shorthand for
>> "<![CDATA[" as well (and would also match nicer with the closing bracket
>> "]]>").
>>
>> note that it would be possible to mix them, as in:
>> <foo> <bar<baz/>> </foo>
>> and:
>> <foo<bar> <baz/> </bar>>
>>
>> maybe also a different "name" would be a good idea, like "XEML" or
>> similar would make sense, such as to reduce possible confusion.
>>
>> any thoughts or relevant information to look at?...
>
> I think you'd need a special editor: if the objective is to abbreviate
> the syntax, there is a delicate breakpoint between the denseness of the
> reduced syntax and the ability of the creator/user to understand it.
>

I hadn't considered this case.

if the code is being viewed/edited in a generic text editor (such as Notepad), it shouldn't make too much of a difference, but granted a specialized XML editor could very well get confused.

but, in this case, I doubt that such a change would render the syntax unreadable (to humans), but it could reduce verbosity and sprawl somewhat (in intermediate data files spit out by the application), which is currently the main problem area (finding things in multi-MB files is hard enough as-is, much less when the AST for a single function in a C-like syntax can span over a fairly large number of pages).

but, I don't think it would be too much of a different issue from that of a person trying to read S-Expressions, if using a more compact format.
this is partly because a C-style (programming language) syntax is fairly information-dense, but when parsed into ASTs and then dumped as XML, there is a significant amount of expansion.

> What about writing up the method as a paper for the Balisage (markup)
> conference? That's really the place to discuss new syntaxes.
>

I don't know much about them, I hadn't heard of this before.

> ///Peter
>
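to make the parsing rule quoted earlier more concrete (a "<" seen inside a tag that has not ended opens a child, and a bare ">" then closes the enclosing element), here is a minimal recursive-descent sketch in Python. it is not the implementation discussed in this thread. all names (`parse_compact`, `shape`, and the helpers) are hypothetical, and it assumes element-only content with double-quoted attributes: text nodes, comments, CDATA and the "<[[" shorthand are left out.

```python
import html
import re
import xml.etree.ElementTree as ET

NAME = re.compile(r"[A-Za-z_][\w.\-]*")
ATTR = re.compile(r'\s*([A-Za-z_][\w.\-]*)\s*=\s*"([^"]*)"')


def _skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def _children(text, pos, parent, closer):
    """Append child elements to parent until closer is seen; return the offset after it."""
    while True:
        pos = _skip_ws(text, pos)
        if text.startswith(closer, pos):
            return pos + len(closer)
        child, pos = _element(text, pos)
        parent.append(child)


def _element(text, pos):
    """Parse one element starting at pos; return (element, offset after it)."""
    m = NAME.match(text, pos + 1) if text.startswith("<", pos) else None
    if m is None:
        raise ValueError("expected an element at offset %d" % pos)
    elem = ET.Element(m.group())
    pos = m.end()
    while True:
        a = ATTR.match(text, pos)
        if a is None:
            break
        elem.set(a.group(1), html.unescape(a.group(2)))
        pos = a.end()
    pos = _skip_ws(text, pos)
    if text.startswith("/>", pos):   # <tag/>: empty element, as in plain XML
        return elem, pos + 2
    if text.startswith(">", pos):    # <tag>...</tag>: standard form
        return elem, _children(text, pos + 1, elem, "</%s>" % elem.tag)
    if text.startswith("<", pos):    # <tag<...> ...>: compact form, a bare ">" ends it
        return elem, _children(text, pos, elem, ">")
    raise ValueError("malformed tag at offset %d" % pos)


def parse_compact(text):
    """Parse a document holding a single element in standard, compact or mixed form."""
    elem, pos = _element(text, _skip_ws(text, 0))
    if _skip_ws(text, pos) != len(text):
        raise ValueError("trailing input at offset %d" % pos)
    return elem


def shape(elem):
    """Reduce a tree to (tag, attributes, children) so two trees can be compared."""
    return (elem.tag, dict(elem.attrib), [shape(child) for child in elem])


VERBOSE = """<if>
  <cond>
    <binary op="&lt;">
      <ref name="x"/>
      <number value="3"/>
    </binary>
  </cond>
  <then>
    <funcall name="foo">
      <args/>
    </funcall>
  </then>
</if>"""

COMPACT = """<if
<cond<binary op="&lt;"<ref name="x"/> <number value="3"/>>>
<then<funcall name="foo"<args/>>>>"""

if __name__ == "__main__":
    same = shape(parse_compact(COMPACT)) == shape(ET.fromstring(VERBOSE))
    print("compact and verbose forms give the same tree:", same)
```

the only lookahead needed is the character that follows the tag name and attributes, which is why the two forms can be mixed freely inside one document. the comparison against `ET.fromstring` ignores whitespace-only text, which matches the AST use case but would not be right for mixed-content documents.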