
ReadAllText, special characters

Author
20 Apr 2006 8:11 PM
kenny
I have a problem reading special characters, such as the German ß, from Unicode files.

I use the following code:

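' UTF-16, little-endian (bigEndian:=False), no byte order mark (byteOrderMark:=False)
Dim enc As System.Text.Encoding = New System.Text.UnicodeEncoding(False, False)

' Read the whole file with that encoding, then write the text to Avkon.r05 (append:=False)
Dim a As String = Microsoft.VisualBasic.FileIO.FileSystem.ReadAllText( _
    Application.StartupPath & "\Avkon.r03", enc)

Microsoft.VisualBasic.FileIO.FileSystem.WriteAllText( _
    Application.StartupPath & "\Avkon.r05", a, False, enc)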


The weird thing is that some ß characters are read correctly, but others are simply
left out.
If I set throwOnInvalidBytes to True, then I get an error, of course...

Author
21 Apr 2006 9:25 AM
Larry Lard
kenny wrote:
> If I set throwOnInvalidBytes to True, then I get an error, of course...

Which somewhat suggests that the file contains invalid Unicode...

I'm far from an expert on Unicode; is there some external resource you
can use to validate your files?
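One way to validate the file from code instead of an external tool is sketched below. It assumes the file is meant to be little-endian UTF-16 without a BOM, as the UnicodeEncoding(False, False) in the original code implies. The raw bytes are decoded with throwOnInvalidBytes set to True, and the resulting DecoderFallbackException is caught; its BytesUnknown and Index properties identify the first bad sequence. The module and function names are hypothetical, and Avkon.r03 is simply the file name from the thread.

```vb
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Module ValidateUtf16

    ' Returns Nothing if the bytes decode cleanly as little-endian UTF-16;
    ' otherwise a short description of the first invalid sequence found.
    Function FindFirstInvalidUtf16(data As Byte()) As String
        ' bigEndian:=False, byteOrderMark:=False, throwOnInvalidBytes:=True
        Dim checker As Encoding = New UnicodeEncoding(False, False, True)
        Try
            checker.GetString(data)
            Return Nothing
        Catch ex As DecoderFallbackException
            ' BytesUnknown holds the offending bytes; Index is the position
            ' the decoder reports for them.
            Return String.Format("invalid bytes {0} at index {1}", _
                BitConverter.ToString(ex.BytesUnknown), ex.Index)
        End Try
    End Function

    Sub Main()
        Dim fileName As String = "Avkon.r03" ' file name from the thread; adjust as needed
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        Dim problem As String = FindFirstInvalidUtf16(File.ReadAllBytes(fileName))
        If problem Is Nothing Then
            Console.WriteLine("The file decodes cleanly as UTF-16LE.")
        Else
            Console.WriteLine(problem)
        End If
    End Sub

End Module
```

This only reports the first problem; the decoder stops at the first invalid sequence.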

--
Larry Lard
Replies to group please
Author
21 Apr 2006 11:45 AM
kenny
Well, I am absolutely sure that the file is OK. And if it weren't, why would
only some characters fail to be read? Or is there perhaps a way to read files
independently of the encoding?
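A text file does not record its encoding, except optionally through a byte order mark (BOM) at the start. The closest thing to encoding-independent reading is therefore BOM detection. A minimal sketch, assuming .NET 2.0 or later: StreamReader's detectEncodingFromByteOrderMarks argument makes it switch to the encoding a BOM names, and otherwise it uses the fallback encoding supplied. The module and function names are hypothetical.

```vb
Option Strict On

Imports System
Imports System.IO
Imports System.Text

Module DetectBom

    ' Reads all text from the stream. If the data starts with a byte order mark,
    ' StreamReader switches to the encoding it names; otherwise it uses fallback.
    ' The encoding actually used is returned through "detected".
    Function ReadWithBomDetection(source As Stream, fallback As Encoding, _
                                  ByRef detected As Encoding) As String
        ' Third argument: detectEncodingFromByteOrderMarks:=True
        Using reader As New StreamReader(source, fallback, True)
            Dim content As String = reader.ReadToEnd()
            ' CurrentEncoding reflects BOM detection only after the first read.
            detected = reader.CurrentEncoding
            Return content
        End Using
    End Function

    Sub Main()
        Dim fileName As String = "Avkon.r03" ' file name from the thread; adjust as needed
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        Dim used As Encoding = Nothing
        Dim content As String = ReadWithBomDetection(File.OpenRead(fileName), _
            New UnicodeEncoding(False, False), used)
        Console.WriteLine("Read {0} characters using {1}", content.Length, used.EncodingName)
    End Sub

End Module
```

Without a BOM, nothing is detected and the fallback encoding decides how the bytes are interpreted, so this does not help with a file that has none.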

Author
21 Apr 2006 1:18 PM
Larry Lard
kenny wrote:
> Well, I am absolutely sure that the file is OK. And if it weren't, why would
> only some characters fail to be read? Or is there perhaps a way to read files
> independently of the encoding?

Have a look at the file with a hex editor. Each instance of the German
Eszett should be represented by the same two bytes: the Eszett is Unicode
U+00DF, so the bytes will be 00 and DF in some order (DF 00 in the
little-endian UTF-16 that UnicodeEncoding(False, ...) expects). If the
file is corrupted, one of them will be wrong.
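The same check can be automated. A minimal sketch, assuming the file is meant to be UTF-16LE so that every ß appears as DF 00 starting at an even byte offset (a 2-byte BOM keeps offsets even): it lists each DF byte that breaks that pattern, with a few neighbouring bytes in hex. The module and function names are hypothetical.

```vb
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO

Module EszettScan

    ' Returns the offset of every &HDF byte that is NOT the start of a
    ' little-endian "DF 00" pair on an even offset (the UTF-16LE form of U+00DF).
    ' Assumes mostly Latin text: other characters whose code contains a DF byte
    ' (for example U+01DF, stored as DF 01) are reported as well.
    Function SuspiciousDfOffsets(data As Byte()) As List(Of Integer)
        Dim result As New List(Of Integer)()
        For i As Integer = 0 To data.Length - 1
            If data(i) = &HDF Then
                Dim wellFormed As Boolean = (i Mod 2 = 0) AndAlso _
                    (i + 1 < data.Length) AndAlso (data(i + 1) = 0)
                If Not wellFormed Then result.Add(i)
            End If
        Next
        Return result
    End Function

    Sub Main()
        Dim fileName As String = "Avkon.r03" ' file name from the thread; adjust as needed
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        Dim data As Byte() = File.ReadAllBytes(fileName)
        For Each offset As Integer In SuspiciousDfOffsets(data)
            ' Show a few bytes either side, like a hex editor would.
            Dim start As Integer = Math.Max(0, offset - 4)
            Dim count As Integer = Math.Min(data.Length, offset + 5) - start
            Console.WriteLine("{0:X8}: {1}", offset, BitConverter.ToString(data, start, count))
        Next
    End Sub

End Module
```

A DF at an odd offset is a sign that the byte alignment has slipped somewhere before it, for example through a missing or extra byte.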

Of course, the problem might be somewhere else. Instead of using
ReadAllText, you could try reading the file a line at a time, and
seeing which line causes the problem.
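A sketch of the line-at-a-time idea, assuming .NET 2.0 or later and a UTF-16LE file: decode with an explicit U+FFFD replacement fallback, so undecodable bytes become a visible marker instead of an exception, and report the lines that contain that marker. The module and function names are hypothetical, and a line that legitimately contains U+FFFD would be reported too.

```vb
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module LineScan

    ' Decodes the stream as UTF-16LE line by line and returns the 1-based numbers
    ' of lines that contain U+FFFD, the replacement character inserted here for
    ' bytes the decoder cannot decode.
    Function LinesWithReplacementChars(source As Stream) As List(Of Integer)
        ' "utf-16" is little-endian UTF-16; undecodable input becomes U+FFFD.
        Dim enc As Encoding = Encoding.GetEncoding("utf-16", _
            EncoderFallback.ReplacementFallback, _
            New DecoderReplacementFallback(ChrW(&HFFFD).ToString()))
        Dim result As New List(Of Integer)()
        Using reader As New StreamReader(source, enc, False)
            Dim lineNumber As Integer = 0
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                lineNumber += 1
                If line.IndexOf(ChrW(&HFFFD)) >= 0 Then result.Add(lineNumber)
                line = reader.ReadLine()
            End While
        End Using
        Return result
    End Function

    Sub Main()
        Dim fileName As String = "Avkon.r03" ' file name from the thread; adjust as needed
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        For Each n As Integer In LinesWithReplacementChars(File.OpenRead(fileName))
            Console.WriteLine("Undecodable bytes on line {0}", n)
        Next
    End Sub

End Module
```

A strict (throwing) decoder is avoided here because StreamReader decodes in buffered blocks, so an exception would surface for the block being decoded rather than for the line being read.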

If even that doesn't help, maybe read the file into a byte array using
a BinaryReader, and then try decoding it one character at a time...
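A sketch of that last approach, assuming the file fits in memory and is meant to be UTF-16LE: a BinaryReader pulls in the raw bytes, and a strict Decoder is fed one 2-byte code unit at a time, so each failure can be tied to a byte offset. The names are hypothetical. Because a Decoder keeps state between calls, a surrogate pair split across two calls still decodes correctly.

```vb
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module UnitScan

    ' Reads the whole stream with a BinaryReader, then feeds a strict UTF-16LE
    ' decoder one 2-byte code unit at a time. Returns the byte offsets at which
    ' the decoder threw. A lone high surrogate is only detected when the NEXT
    ' unit arrives, so it is reported at the offset of that following unit.
    Function FailingOffsets(source As Stream) As List(Of Integer)
        Dim data As Byte()
        Using br As New BinaryReader(source)
            data = br.ReadBytes(CInt(source.Length)) ' assumes a seekable stream
        End Using

        Dim dec As Decoder = New UnicodeEncoding(False, False, True).GetDecoder()
        Dim chars(1) As Char ' a completed surrogate pair yields two chars
        Dim result As New List(Of Integer)()
        Dim i As Integer = 0
        While i + 1 < data.Length
            Try
                dec.GetChars(data, i, 2, chars, 0, False)
            Catch ex As DecoderFallbackException
                result.Add(i)
                dec.Reset() ' discard decoder state and carry on with the next unit
            End Try
            i += 2
        End While

        ' Flush: catches a dangling high surrogate or a trailing odd byte.
        Try
            dec.GetChars(data, i, data.Length - i, chars, 0, True)
        Catch ex As DecoderFallbackException
            result.Add(i)
        End Try
        Return result
    End Function

    Sub Main()
        Dim fileName As String = "Avkon.r03" ' file name from the thread; adjust as needed
        If Not File.Exists(fileName) Then
            Console.WriteLine("File not found: " & fileName)
            Return
        End If
        For Each offset As Integer In FailingOffsets(File.OpenRead(fileName))
            Console.WriteLine("Decoding failed at byte offset {0} (0x{0:X})", offset)
        Next
    End Sub

End Module
```

The reported offsets can be looked up directly in a hex editor.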
