Altova Mailing List Archives
RE: [xml-dev] MSXML DOM Special Chars Less Than 32
Date: 3/24/2002 12:44:00 PM
> From: Joshua Allen [mailto:joshuaa@m...]
> Sent: Sunday, March 24, 2002 2:43 AM
> To: Julian Reschke; michael.h.kay@n...; Rick Jelliffe;
> xml-dev@l...
> Subject: RE: [xml-dev] MSXML DOM Special Chars Less Than 32
>
> >That's a bit like saying that XML should not be used as marshalling
> >information when arbitrary strings are sent around. So should SOAP and
> >WebDAV be changed?
>
> I don't see the problem with SOAP; I don't recall seeing SOAP
> implementations that break the spec. I'll be the first to admit

So what is a SOAP implementation supposed to do when a string parameter was
declared as xs:string, but the parameter to be passed contains a BEL
character?

> that certain WebDAV implementations from a company in Redmond are
> prone to emit broken XML. The same as the database people who
> say "round-tripping is PARAMOUNT!". I personally have no
> sympathy at all for this mentality.
>
> >However, ignoring the issue doesn't exactly help either. Many
> >applications/protocols are stuck with the task of marshalling "arbitrary"
> >strings as XML (and datatype xs:string), so it would be good if there
> >was an XML-1.0 compliant, cross-protocol format to do this.
>
> Nobody is saying to ignore the issue. The tradeoff has already
> been made, and the best anyone can do is just follow the spec.
> There ARE solutions (base64, using something other than XML,
> etc.) and I do not think anyone (including Microsoft) should be
> excused for creating crap and calling it XML. It poisons the
> wells, and has negative impacts far down stream, as we have seen
> with numerous XML parsers breaking on so-called "XML" from
> certain WebDAV servers. If the stuff can't be processed by a
> single conforming XML processor, there is really no point in
> calling it XML. And I would disagree with your sentence "Many

I completely agree so far...

> applications are stuck with the task of marshalling 'arbitrary'
> strings as XML datatype xs:string". You are in essence saying,
> "many applications have the requirement to emit something that is
> NOT xs:string, but call it xs:string anyway".

My point is that I think this is the situation with SOAP and WebDAV. SOAP
1.1 says:

  "The datatype "string" is defined in "XML Schema Part 2: Datatypes"
  Specification. Note that this is not identical to the type called
  "string" in many database or programming languages, and in particular
  may forbid some characters those languages would permit. (Those values
  must be represented by using some datatype other than xsd:string.)"

So the expectation clearly is that xs:string is only used when no "special"
control characters are needed, but I doubt that many programmers actually
take this into account when defining their interfaces.

> *no* application has such a requirement; they may *think* they
> have such a requirement, but that is like saying "I need to be
> able to store a string in this longint field" -- eventually the
> developer figures out that maybe that wasn't the right datatype
> to choose -- maybe a more appropriate datatype would be in order.

OK, assuming the data type *can* be changed: what encoding would you suggest
for arbitrary Unicode data (where control characters may appear, but only
occasionally)? Surely not base64 (it's for byte streams, adds a lot of
overhead and makes your XML unreadable to humans).

BTW: another side of this problem is DOM's current approach.
createTextNode() doesn't have to throw an exception when the string contains
forbidden characters. There is no standard method to test for XML character
code compliance (note that there's also an issue regarding Java characters
not being valid Unicode characters in all cases). DOM level 2 doesn't
describe serialization, so current serializers in the best case throw an
exception (which is pretty late...) or ignore the issue altogether
(producing broken XML).
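
To make the last point concrete, here is a minimal sketch in Python of the kind of character test that DOM does not provide. The function names (forbidden_positions, is_xml10_text, serializes_to_well_formed_xml) are made up for this illustration; the character ranges are those of the Char production in XML 1.0. The last function uses Python's xml.dom.minidom purely as an example of a DOM implementation that accepts the text node without complaint and only reveals the problem when the output is parsed again. It says nothing about what MSXML or any other implementation does.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Complement of the XML 1.0 "Char" production:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
#          | [#x10000-#x10FFFF]
_FORBIDDEN = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def forbidden_positions(text):
    """Return (index, code point) pairs for characters XML 1.0 does not allow."""
    return [(m.start(), ord(m.group())) for m in _FORBIDDEN.finditer(text)]


def is_xml10_text(text):
    """True if every character of text matches the XML 1.0 Char production."""
    return _FORBIDDEN.search(text) is None


def serializes_to_well_formed_xml(text):
    """Put text into a minidom text node, serialize, and try to re-parse."""
    doc = minidom.Document()
    root = doc.createElement("value")
    doc.appendChild(root)
    # createTextNode() performs no character validation at this point.
    root.appendChild(doc.createTextNode(text))
    try:
        minidom.parseString(doc.toxml())
    except ExpatError:
        # The serializer wrote the character as-is; the parser rejects it.
        return False
    return True
```

A caller that wants to fail early could run is_xml10_text() on a value before handing it to createTextNode(), and use forbidden_positions() to report which characters caused the rejection. A lone surrogate such as "\ud800" is also reported, which is the Python analogue of the Java issue mentioned above.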