Altova Mailing List Archives


Re: [xml-dev] xml over http - RFC 3023

From: "Andrew Welch" <andrew.j.welch@-----.--->
To: "Rick Jelliffe" <rjelliffe@-------.---.-->
Date: 12/1/2008 10:17:00 AM
Hi Rick,

> The out-of-band signalling of character encoding is a fundamentally broken
> idea, because there are no mechanisms for programs which generate data to
> memoize the character encoding used that can then feed the rest of the
> food-chain.

How about the BOM - that's one way isn't it?  I wonder if a similar
ignorable byte sequence could be added to the start of all byte
sequences to indicate the encoding of what's coming.
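
A minimal sketch of what BOM sniffing looks like in practice, assuming Python and its standard `codecs` constants. The function name `sniff_bom` is hypothetical, and a BOM only distinguishes the Unicode encodings, so it cannot signal something like ISO-8859-1:

```python
import codecs

# UTF-32 marks are tested before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would skip `bom_length` bytes before decoding the rest of the stream with the returned encoding.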


>> At the moment it all seems pretty complicated...

> It is not complicated. Use application/xml
>
> If you do find intermediate web systems that implement the ASCII default or
> the ISO-8859-1 default as anything other than 8-bit clean for text/xml submit
> a bug report.


I'm dealing with RSS feeds from all over the world, so it's:

- 3 different types of feeds
- multiple languages, multiple encodings
- embedded inconsistently escaped html, or cdata sections, or both
- and even use of entities without including the doctype, so the feed
doesn't parse without help

It is possible to reject some of the feeds, but other readers accept
them so this one needs to at least match them before taking the moral
high ground (and it's not too hard to code around the problems).

So this is a real test of XML on the web.  The complicated part I was
referring to is reading the bytes from the http input stream in the
right encoding:

- extract the encoding from the Content-Type header
- if it's not there, read the first few bytes of the stream as us-ascii
and then extract the encoding from the prolog
- if it's not there, use utf-8
- hope that the actual encoding of the file and the encoding you've discovered match

...and that's not even completely correct as far as I understand.
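
The steps listed above can be sketched roughly as follows, assuming Python. The names `charset_from_content_type` and `detect_encoding` are hypothetical, and the prolog check assumes an ASCII-compatible encoding, so UTF-16 feeds would need a BOM check first. It follows the steps as written rather than RFC 3023, which gives text/xml without a charset parameter a default of us-ascii:

```python
import re
from email.message import Message

# Matches an XML declaration at the very start and captures its encoding name.
_PROLOG_RE = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_from_content_type(header):
    """Return the charset parameter of a Content-Type value, or None."""
    if not header:
        return None
    msg = Message()
    msg["Content-Type"] = header
    value = msg.get_param("charset")
    return value if isinstance(value, str) else None


def detect_encoding(content_type, body: bytes) -> str:
    # Step 1: charset parameter from the HTTP Content-Type header.
    charset = charset_from_content_type(content_type)
    if charset:
        return charset.lower()
    # Step 2: encoding declared in the XML prolog, read as ASCII bytes.
    head = body[:200]
    if head.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 BOM if present
        head = head[3:]
    match = _PROLOG_RE.match(head)
    if match:
        return match.group(1).decode("ascii").lower()
    # Step 3: fall back to utf-8.
    return "utf-8"
```

Nothing here verifies that the bytes really are in the discovered encoding; decoding with `errors="replace"` is one way to survive a mismatch.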

So when you say:

"It is not complicated. Use application/xml"

I don't get it, what am I missing?

I would've thought the webserver would be aware that it was serving
xml and take care of it - it could extract the encoding from the xml prolog
and ensure the file was served with that (maintaining it however it
liked)... it seems odd that the client should go through this process
every time.

thanks
-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@l...
subscribe: xml-dev-subscribe@l...
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php


