
Does it have unicode?

Author
28 Jun 2005 5:34 PM
Mike Labosh
I need to determine if a string contains double-byte (unicode) characters.

In SQL, it was easy.  Cast it from NVARCHAR to VARCHAR and back again, and
see if it got lossage.

But in VB.NET, all strings are stored as unicode, so I'm not sure what to
do.  I'd like to do something like this: [p-code]

Dim s1, s2 As String

s1 = [my db value]
s2 = CType(CType(s1, AnsiString), String)

If s1 = s2 Then
    All characters are ANSI ones
Else
    Some characters are double byte
End If

What I certainly don't want to do is loop over the characters to see if
AscW(Mid(s1, i, 1)) > 255 because I'm already looping over 100's of
thousands of records and I don't want to have to sniff individual characters
in a batch like this.

--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"

Author
28 Jun 2005 6:18 PM
Armin Zingler
I don't understand what you're trying to achieve. Strings are Unicode
already, as you wrote. Maybe your actual question is whether the String can
be encoded as ANSI and converted back to Unicode without data loss, right?
You can have a look at System.Text.Encoding.Convert, but I'm interested in
what the actual goal is.

Armin

Author
28 Jun 2005 9:45 PM
Jay B. Harlow [MVP - Outlook]
Mike,
| What I certainly don't want to do is loop over the characters to see if
| AscW(Mid(s1, i, 1)) > 255 because I'm already looping over 100's of
| thousands of records and I don't want to have to sniff individual
| characters in a batch like this.

AscW doesn't return ANSI character codes; it returns Unicode character
codes. If you want ANSI character codes you need to use Asc. However,
non-ANSI characters (those with no equivalent in the current code page)
will be returned as a placeholder ANSI character.

As Armin stated, you can use the System.Text.Encoding classes.
Encoding.Default converts a String to and from an array of bytes in your
current ANSI code page, as defined in the Windows Control Panel.

Something like:

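        ' Requires "Imports System.Text" at the top of the file.
        Dim s1, s2 As String    ' s1 holds the value to test
        Dim bytes() As Byte

        ' Encode to the current ANSI code page, then decode again.
        bytes = Encoding.Default.GetBytes(s1)
        s2 = Encoding.Default.GetString(bytes)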
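
To finish the check from the original question, the two strings still have to be compared. Below is a minimal sketch of the whole round trip wrapped in a function. The names AnsiRoundTripSketch and RoundTripsThroughAnsi are made up for illustration, and it assumes the .NET Framework on Windows, where Encoding.Default is the ANSI code page. On .NET Core and later, Encoding.Default is UTF-8, so this test would not tell you anything useful there.

```vbnet
Imports System
Imports System.Text

Module AnsiRoundTripSketch
    ' Hypothetical helper: True when s is unchanged after being encoded to
    ' the current ANSI code page and decoded again.
    Public Function RoundTripsThroughAnsi(ByVal s As String) As Boolean
        Dim ansi As Encoding = Encoding.Default
        Dim bytes() As Byte = ansi.GetBytes(s)
        Dim roundTripped As String = ansi.GetString(bytes)
        ' Ordinal comparison: any substituted character counts as a difference.
        Return String.Equals(s, roundTripped, StringComparison.Ordinal)
    End Function

    Sub Main()
        ' Plain ASCII text should survive the round trip.
        Console.WriteLine(RoundTripsThroughAnsi("plain ASCII text"))
    End Sub
End Module
```

The result for non-ASCII text depends on the machine's code page, so the same string can pass on one system and fail on another.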

I would expect a loop that exits early to perform better than converting
the entire string to ANSI and back again. Something like:

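    ' True if every character survives conversion to the current ANSI
    ' code page and back; returns at the first character that does not.
    Public Function IsAnsi(ByVal s As String) As Boolean
        For Each ch As Char In s
            If Chr(Asc(ch)) <> ch Then Return False
        Next
        Return True
    End Function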
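
Note that this is not quite the same test as the AscW(...) > 255 check in the original question. The Asc/Chr test depends on the current code page, while a check against 255 does not. For example, the euro sign (U+20AC) is above 255 but is present in Windows-1252. For comparison, here is a sketch of the code-page-independent version with the same early exit; the names CodePointCheckSketch and HasCharAbove255 are hypothetical.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CodePointCheckSketch
    ' Hypothetical helper: True when any UTF-16 code unit in s is above 255.
    Public Function HasCharAbove255(ByVal s As String) As Boolean
        For Each ch As Char In s
            If AscW(ch) > 255 Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Console.WriteLine(HasCharAbove255("abc"))                           ' False
        Console.WriteLine(HasCharAbove255("price: " & ChrW(&H20AC) & "5"))  ' True
    End Sub
End Module
```

Which of the two is appropriate depends on whether the goal is "fits in the local ANSI code page" or "fits in a single byte as Latin-1".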

Hope this helps
Jay
