Re: Multilingual support in generated XML (RSS)
Newsgroup: microsoft.public.xml
Date: 11/3/2006 1:41:00 PM
"BarakF" <frohlinger@y...> wrote in message
news:1162500281.988446.178660@k......
> It does bend your mind INDEED!
>
> So let me see if I figured it out:
> The HTML (client) pages are set as UTF-8 (charset = UTF-8).
> But the server's codepage was set as 1255 (I printed the
> Response.CodePage, without setting it previously to 65001, and received
> 1255 - the Hebrew ANSI codepage).
> So the data arrived from the client as UTF-8, treated with the wrong
> codepage, and inserted corrupted into the DB, which expected valid
> UTF-8.
Pretty close. To be more exact, the DB stores Unicode characters, which is
the native (and only) character type used in ASP. When the post was read via
ASP's Form object, the UTF-8 encoded bytes were treated as 1255 and a
codepage conversion from 1255 to Unicode was made, which is where the
corruption occurred.
> On the other way around - since the same corrupted conversions were
> made, the ASP page displayed the data correctly,
Yes, the Unicode characters from the DB were converted back to 1255 when they
were passed to the Response.Write method. That is the reverse of what happened
when they were read from the Form object earlier, so the UTF-8 encoding was
restored. Since the client is told it is getting UTF-8, it displays the
characters correctly.
>but the XML, which
> expected valid UTF-8, showed the corruption.
Yes, assigning the text of an XML element does no conversion from Unicode,
since internally the XML is Unicode. So the corruption remains in place.
>
> So to sum it up - if I set all the pages with codepage 65001 - it
> should fix everything (as it did, when I tested it), and all will have
> a valid UTF-8 - the client, server, DB and XML.
Yep.
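As a rough sketch of what setting every page to codepage 65001 can look like at the top of each ASP page: the exact directive form and the explicit Response.CharSet line are assumptions here, not something quoted from this thread, and the .asp file itself would also need to be saved as UTF-8.

```vbscript
<%@ Language="VBScript" CodePage="65001" %>
<%
' Assumption: placed at the top of every page, before any output is written.
' Read Form/QueryString values and encode Response.Write output as UTF-8.
Response.CodePage = 65001
' Tell the client which encoding the response body uses.
Response.CharSet = "utf-8"
%>
```

With both the codepage and the declared charset set to UTF-8, the conversion ASP makes on the way in and on the way out should match what the client actually sends and expects.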
>
> One more question left - I have lots of corrupted UTF-8 strings in the
> DB now.
> Is there a way to perform some kind of conversion to fix it?
Function MapBetweenCharSets(rsIn, charsetIn, charsetOut)
    Dim oStream : Set oStream = CreateObject("ADODB.Stream")
    oStream.Type = 2 ' Text
    oStream.CharSet = charsetIn
    oStream.Open
    oStream.WriteText rsIn   ' encode the string to bytes using charsetIn
    oStream.Position = 0
    oStream.CharSet = charsetOut
    MapBetweenCharSets = oStream.ReadText   ' decode the same bytes using charsetOut
    oStream.Close
End Function
s = MapBetweenCharSets("Â£", "ISO-8859-1", "UTF-8")
"Â£" is a corruption of the British pound sign (£). The result of the
function above restores it.
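Your usage would be:
Do Until rs.EOF
    ' "windows-1255" is assumed here because the server codepage reported above was 1255
    rs("yourfield").Value = MapBetweenCharSets(rs("yourfield").Value, _
        "windows-1255", "UTF-8")
    rs.Update
    rs.MoveNext
Loop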
Back up your DB first, just in case ;)
>
> Thanks,
> Gabi.
>