Altova Mailing List Archives


Re: Multilingual support in generated XML (RSS)

From: "Anthony Jones" <Ant@------------.--->
To: NULL
Date: 11/3/2006 1:41:00 PM


"BarakF" <frohlinger@y...> wrote in message
news:1162500281.988446.178660@k......
> It does bend your mind INDEED!
>
> So let me see if I figured it out:
> The HTML (client) pages are set as UTF-8 (charset = UTF-8).
> But the server's codepage was set as 1255 (I printed the
> Response.CodePage, without setting it previously to 65001, and received
> 1255 - the Hebrew ANSI codepage).
> So the data arrived from the client as UTF-8, treated with the wrong
> codepage, and inserted corrupted into the DB, which expected valid
> UTF-8.

Pretty close.  To be more exact, the DB stores Unicode characters, which is
the native (and only) character type used in ASP. When the post was read via
ASP's Form object, the UTF-8 encoded bytes were treated as 1255 and a
codepage conversion from 1255 to Unicode was made, which is where the
corruption occurred.
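
A minimal sketch of that read-side corruption, written in Python rather than ASP only so the byte-level steps are visible. The sample word and variable names are assumptions for illustration, and Python's `cp1255` codec stands in for the server's code page 1255 conversion.

```python
# Hypothetical sample text: the Hebrew word "bayit" (house), three letters.
original = "\u05d1\u05d9\u05ea"

# The browser posts the form field as UTF-8: two bytes per Hebrew letter.
posted_bytes = original.encode("utf-8")

# The server wrongly treats those bytes as code page 1255 and converts them
# to Unicode. Each byte becomes its own character, so the text is corrupted.
corrupted = posted_bytes.decode("cp1255")

print(len(original), len(posted_bytes), len(corrupted))
print(corrupted == original)
```

The corrupted string is what ends up stored in the DB: six unrelated characters instead of three Hebrew letters.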


> On the other way around - since the same corrupted conversions were
> made, the ASP page displayed the data correctly,

Yes, the Unicode characters from the DB were converted back to 1255 when they
were passed to the Response.Write method. That is the reverse of what happened
when they were read from the Form object earlier, so the original UTF-8 bytes
are restored.  Since the client is told it's getting UTF-8, it displays the
characters correctly.
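
The round trip can be sketched in Python as follows. This is an illustration under assumptions, not the ASP code path itself: the sample word is hypothetical and Python's `cp1255` codec stands in for the server's code page 1255.

```python
# Hypothetical sample text: the Hebrew word "bayit" (house).
original = "\u05d1\u05d9\u05ea"

# Corrupted text as held in the DB: UTF-8 bytes misread as code page 1255.
stored = original.encode("utf-8").decode("cp1255")

# Writing the response converts the Unicode string back to code page 1255,
# which reproduces exactly the UTF-8 bytes the browser originally posted.
response_bytes = stored.encode("cp1255")

# The page declares charset=UTF-8, so the browser decodes the bytes as UTF-8.
displayed = response_bytes.decode("utf-8")

print(response_bytes == original.encode("utf-8"))
print(displayed == original)
```

Two wrong conversions cancel out, which is why the ASP pages looked fine while the stored data was not.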

>but the XML, which
> expected valid UTF-8, showed the corruption.

Yes, assigning the text of an XML element does no conversion from Unicode, since
the XML DOM holds text as Unicode internally. So the corruption remains in place.
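
A small Python sketch of why the XML exposes the problem. It uses the standard library's `xml.etree.ElementTree` as a stand-in for the XML DOM used from ASP. The element name and sample word are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text and its corrupted form as stored in the DB.
original = "\u05d1\u05d9\u05ea"
stored = original.encode("utf-8").decode("cp1255")

item = ET.Element("title")   # hypothetical element name
item.text = stored           # assigned as-is; no code page conversion happens

# Serialising as UTF-8 faithfully encodes the already-corrupted characters.
xml_bytes = ET.tostring(item, encoding="utf-8")
xml_text = xml_bytes.decode("utf-8")

print(stored in xml_text, original in xml_text)
```

Nothing on this path applies the cancelling 1255 conversion, so the reader of the XML sees the corrupted characters.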

>
> So to sum it up - if I set all the pages with codepage 65001 - it
> should fix everything (as it did, when I tested it), and all will have
> a valid UTF-8 - the client, server, DB and XML.

Yep.


>
> One more question left - I have lots of corrupted UTF-8 strings in the
> DB now.
> Is there a way to perform some kind of conversion to fix it?

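Function MapBetweenCharSets(rsIn, charsetIn, charsetOut)

 ' Re-interprets the bytes of the string rsIn:
 ' encode with charsetIn, then decode the same bytes with charsetOut.
 Dim oStream : Set oStream = CreateObject("ADODB.Stream")

 oStream.Type = 2 ' adTypeText
 oStream.CharSet = charsetIn
 oStream.Open
 oStream.WriteText rsIn        ' string is encoded to bytes using charsetIn
 oStream.Position = 0          ' rewind; CharSet can only be set at position 0
 oStream.CharSet = charsetOut
 MapBetweenCharSets = oStream.ReadText ' same bytes decoded using charsetOut
 oStream.Close
 Set oStream = Nothing
End Function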

s =  MapBetweenCharSets("Â£", "ISO-8859-1", "UTF-8")

Â£ is a corruption of the British pound sign (its UTF-8 bytes read as
ISO-8859-1).  The result of the function above restores it.
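
For comparison, the same re-interpretation can be sketched in Python, where a string is encoded with one charset and the resulting bytes are decoded with another. The function name `map_between_charsets` is hypothetical and simply mirrors the VBScript function. It is not part of any library.

```python
def map_between_charsets(text: str, charset_in: str, charset_out: str) -> str:
    """Encode text with charset_in, then decode the same bytes with charset_out."""
    return text.encode(charset_in).decode(charset_out)


# The two characters produced when the UTF-8 bytes of a pound sign (C2 A3)
# are misread as ISO-8859-1.
corrupted = "\u00c2\u00a3"

restored = map_between_charsets(corrupted, "iso-8859-1", "utf-8")
print(restored)
```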

Your usage would be:

Do Until rs.EOF
    ' Code page 1255 did the damage, so windows-1255 is used here to undo it
    rs("yourfield").value = MapBetweenCharSets(rs("yourfield").value, _
        "windows-1255", "UTF-8")
    rs.update
    rs.MoveNext
Loop

Back up your DB first just in case ;)
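
One risk with a bulk repair is running it over rows that are already correct, or running it twice. The Python sketch below shows one possible guard, under assumptions: the `repair` helper and the sample rows are hypothetical, and Python's `cp1255` codec stands in for the server's code page. A value is replaced only if its bytes re-decode cleanly as UTF-8. This check is a heuristic, not a guarantee.

```python
def repair(value: str, wrong_charset: str = "cp1255") -> str:
    """Undo a UTF-8-read-as-wrong_charset corruption; leave other text alone."""
    try:
        return value.encode(wrong_charset).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a reversible corruption (for example, already valid text).
        return value


good = "\u05d1\u05d9\u05ea"                      # hypothetical sample: "bayit"
bad = good.encode("utf-8").decode("cp1255")      # its corrupted form

rows = [bad, good, "plain ASCII"]                # hypothetical field values
fixed = [repair(value) for value in rows]

print(fixed[0] == good, fixed[1] == good, fixed[2] == "plain ASCII")
```

Correct Hebrew text fails the UTF-8 decode step and is returned unchanged, so a second pass over the same rows does no further damage in this example.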

>
> Thanks,
> Gabi.
>



