Re: Multilingual support in generated XML (RSS)
Newsgroup: microsoft.public.xml
To: NULL
Date: 11/2/2006 4:07:00 PM

"BarakF" <frohlinger@y...> wrote in message news:1162476269.294020.111830@i......

> > Is there any difference between the ASP page that reads from the DB and
> > writes the XML and that backpage.asp that reads from the DB and writes
> > the HTML output (which is UTF-8 encoded)? Are those ASP pages encoded
> > with the same code page/encoding?
>
> The backpage.asp - which reads the string from the DB and displays it -
> has the meta tag on top:
> <meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=UTF-8">
>
> The server page that prepares the XML (it is a single function called
> on a server page), as well as the page that inserts it into the DB -
> does not have any encoding directive.
> None of the following:
> Response.CharSet = "UTF-8"
> Response.ContentType = "text/xml"
> Response.CodePage = 65001
>
> > How is the text stored in the DB, what DB is that, what column type is
> > the text stored in?
>
> The DB is Microsoft SQL Server 2000
> The field is defined as nvarchar and when inserting the string into the
> DB, through SP, I use
> "adVarWChar" define.
>
> BUT -
> When adding Response.CodePage = 65001 to all server pages - I see the
> Hebrew characters correctly.
> Should I always add Response.CodePage = 65001 to server pages?
> Should I leave the <meta HTTP-EQUIV="content-type" CONTENT="text/html;
> charset=UTF-8"> for HTML pages?
>
> Now, what I inserted previously looks like garbage, and the new posts
> in Hebrew look OK (in the DB, in the XML and in the ASP pages).
>
> I am confused. I don't know in which case I did right, and on which
> case I did wrong...
> What are the basic rules for UTF-8 support?
The problem is in the way ASP decodes the form inputs. First, you need to understand that the encoding a browser uses when submitting a form is taken from the encoding of the page that contains the form. Somewhat counterintuitively, ASP uses the Response code page to decide how to decode the form fields in the Request. Hence, for correct operation, a page receiving a form post should have its Response code page set to the code page that matches the character set of the page that sent the original form.

So in your case you have a form in a UTF-8 page, and text entered into it is posted to the server in UTF-8 encoding. However, the receiving page is currently set to an ANSI code page, so each 2-byte UTF-8 sequence (every Hebrew letter takes two bytes in UTF-8) is treated as two individual characters, and that's how it's stored in the DB. Hence the content of the DB is corrupt.

Now that you are specifying the code page of your output pages correctly, you are seeing the corruption. Before that change you were telling the client it was receiving UTF-8 while using an ANSI code page in the response. It appeared to work because this incorrect setup reverses the corruption when the characters in the DB are sent to the client: each stored character is written back out as the single byte it was decoded from, and the browser reassembles those bytes as UTF-8.

I hope I've made it clear; character sets and code pages can really bend your mind. ;)

> Thanks, Gabi.
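To make the byte-level mechanics above concrete, here is a small Python model of the round trip. It is not ASP code: the helpers `receive` and `render` are hypothetical stand-ins for a receiving page's form decoding and an output page's response encoding. `latin-1` stands in for the ANSI code page because it maps all 256 byte values, unlike most Windows ANSI code pages, which leave a few byte values unassigned. `utf-8` plays the role of `Response.CodePage = 65001`.

```python
from urllib.parse import parse_qs, quote

# Hypothetical form field: Hebrew "shalom" typed into a form on a UTF-8 page.
original = "שלום"

# The browser encodes the form using the page's charset (UTF-8 here)
# and percent-escapes the resulting bytes in the POST body.
body = "msg=" + quote(original, encoding="utf-8")


def receive(post_body, codepage):
    """Decode a posted field with the given code page
    (models the receiving page's Response code page)."""
    return parse_qs(post_body, encoding=codepage)["msg"][0]


def render(text, codepage):
    """Encode text into the bytes an output page would send
    under the given code page."""
    return text.encode(codepage)


# Old setup: receiving page on a single-byte (ANSI stand-in) code page.
stored_old = receive(body, "latin-1")
print(len(original), len(stored_old))  # → 4 8  (each 2-byte letter became 2 chars)

# Old output page: also single-byte, but the meta tag says UTF-8,
# so the browser decodes the bytes as UTF-8 and the corruption cancels out.
shown_old = render(stored_old, "latin-1").decode("utf-8")

# New setup: UTF-8 on every page, for both decoding input and encoding output.
stored_new = receive(body, "utf-8")
shown_new = render(stored_new, "utf-8").decode("utf-8")

# Rows stored under the old setup, rendered under the new one:
# the corruption in the DB is now visible.
shown_legacy = render(stored_old, "utf-8").decode("utf-8")
```

Under the old setup `shown_old` matches the typed text even though `stored_old` holds eight wrong characters, which is the "appeared to work" effect. With UTF-8 on both sides, new rows are stored and displayed correctly, while `shown_legacy` reproduces why the rows inserted earlier now look like garbage.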
