
TextFieldParser - reading tab delimited file

Author
21 Sep 2006 11:34 AM
al jones
I'm using TextFieldParser to read a data file, which contains, for example:

Amondó Szegi    Amondo Szegi
andré nossek    André Nossek
© Characte    Character

Note the vowels with diacritics and the copyright symbol: it is dropping
these (and other similar) characters, which (apparently) fall outside the
ASCII range.

The code is simple and looks like:
        ' Requires Imports Microsoft.VisualBasic.FileIO at the top of the file
        Using MyReader As New TextFieldParser(Application.StartupPath & "\designers.txt")
            MyReader.TextFieldType = FileIO.FieldType.Delimited
            MyReader.CommentTokens = New String() {"#"}   ' skip lines starting with #
            MyReader.Delimiters = New String() {vbTab}    ' fields are tab-separated
            MyReader.TrimWhiteSpace = True
            Dim currentRow As String()
            intElement = 0
            While Not MyReader.EndOfData
                Try
                    currentRow = MyReader.ReadFields()
                    ' A row whose first field starts with "UNKNOWN" holds the fallback name
                    If Microsoft.VisualBasic.Left(currentRow(0), 7) = "UNKNOWN" Then
                        strUnknownDesigner = currentRow(1)
                        Continue While
                    End If
                    arDesigner(intElement, 0) = currentRow(0)
                    arDesigner(intElement, 1) = currentRow(1)
                    arDesignerCounter(intElement) = 0
                    intElement += 1
                Catch ex As MalformedLineException
                    ' ex.Message already contains "Line N cannot be parsed...",
                    ' so report the line number directly instead
                    MsgBox("Designer line " & ex.LineNumber.ToString() & " is not valid and will be skipped.")
                End Try
            End While
        End Using

I can't see any reason in the documentation for it dropping the copyright
symbol or the French and German (etc.) accented vowels.

Comments or suggestions anyone??

Thanks //al

Author
21 Sep 2006 12:02 PM
Andrew Morton
al jones wrote:
> I'm using textfieldparser to read a data file. which contains, for
> example:
>
> Amondó Szegi Amondo Szegi
> andré nossek André Nossek
> © Characte Character
>
> Note the vowels with diacriticals and the copyright symbol - it is
> dropping these (and other similar) characters which fall outside
> ascii range (apparently)

It appears to be an encoding problem: I'm guessing the file uses
ISO-8859-1 or maybe Windows-1252, whereas the .NET Framework defaults to
reading text as a Unicode encoding (UTF-8). Does TextFieldParser have a
setting for that (or a base class that does)?
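TextFieldParser does have such a setting: besides the path-only constructor, it has an overload that takes a path plus a System.Text.Encoding. The following is a minimal sketch, not a tested fix for this particular file. It assumes the file really is ISO-8859-1 (code page 28591) or Windows-1252 (code page 1252), and the module and function names are hypothetical. On .NET Core / .NET 5+, code page 1252 additionally needs CodePagesEncodingProvider registered.

```vbnet
Imports System.Collections.Generic
Imports System.Text
Imports Microsoft.VisualBasic
Imports Microsoft.VisualBasic.FileIO

Module EncodedFieldReader
    ' Hypothetical helper: reads a tab-delimited file with an explicit code page
    ' instead of TextFieldParser's UTF-8 default.
    ' ASSUMPTION: codePage matches the file's real encoding, e.g. 28591
    ' (ISO-8859-1) or 1252 (Windows-1252). On .NET Core/.NET 5+, 1252 also
    ' requires Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    Function ReadTabDelimited(ByVal path As String, ByVal codePage As Integer) As List(Of String())
        Dim rows As New List(Of String())
        ' The (String, Encoding) constructor overload supplies the encoding used
        ' to decode the file (a byte-order mark, if present, can still override it).
        Using parser As New TextFieldParser(path, Encoding.GetEncoding(codePage))
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {vbTab}
            parser.CommentTokens = New String() {"#"}
            parser.TrimWhiteSpace = True
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```

Applied to the original code, the equivalent change would be passing the encoding as a second argument, e.g. New TextFieldParser(Application.StartupPath & "\designers.txt", System.Text.Encoding.GetEncoding(1252)), assuming Windows-1252 is the right guess.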

Or perhaps you can arrange for the file to be saved in a Unicode encoding such as UTF-8?
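If the file is re-saved as UTF-8, TextFieldParser's default decoding should handle it unchanged. A minimal sketch of a one-off conversion, assuming the current encoding is known (again 28591 or 1252); ReencodeAsUtf8 is a hypothetical helper, and it overwrites the file in place, so keep a backup.

```vbnet
Imports System.IO
Imports System.Text

Module Utf8Converter
    ' Hypothetical one-off helper: rewrites a file from a single-byte code page
    ' to UTF-8 with a byte-order mark, so TextFieldParser's default decoding works.
    ' ASSUMPTION: sourceCodePage is the file's actual encoding; a wrong guess
    ' silently produces the wrong characters. The file is overwritten in place.
    Sub ReencodeAsUtf8(ByVal path As String, ByVal sourceCodePage As Integer)
        ' Decode the existing bytes with the assumed legacy code page.
        Dim content As String = File.ReadAllText(path, Encoding.GetEncoding(sourceCodePage))
        ' New UTF8Encoding(True) writes the EF BB BF byte-order mark first.
        File.WriteAllText(path, content, New UTF8Encoding(True))
    End Sub
End Module
```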

Andrew
Author
21 Sep 2006 5:31 PM
al jones
On Thu, 21 Sep 2006 13:02:59 +0100, Andrew Morton wrote:

Possibly my confusion comes from the fact that I maintain these files (there
are three of them) within VS 2005, so I would have expected them to be
Unicode. The characters exist in the files (the three example lines are
cut and pasted from the file itself), so I don't understand why reading them
would literally eliminate the characters.
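Characters displaying correctly in the VS editor does not by itself show the file is Unicode; the editor may be decoding a BOM-less file with the system code page. One rough check is to look for a byte-order mark at the start of the file. This is a sketch assuming the file either carries a BOM or is in a single-byte code page; DescribeBom is a hypothetical helper, and a result of "none" cannot distinguish BOM-less UTF-8 from Windows-1252.

```vbnet
Imports System.IO

Module BomInspector
    ' Hypothetical diagnostic: reports which byte-order mark, if any, the file
    ' starts with. "none" means BOM-less UTF-8 or a single-byte code page such
    ' as Windows-1252; this check alone cannot tell those two apart.
    Function DescribeBom(ByVal path As String) As String
        Dim head(2) As Byte            ' first three bytes of the file
        Dim count As Integer
        Using fs As FileStream = File.OpenRead(path)
            count = fs.Read(head, 0, head.Length)
        End Using
        If count >= 3 AndAlso head(0) = &HEF AndAlso head(1) = &HBB AndAlso head(2) = &HBF Then
            Return "UTF-8"
        End If
        If count >= 2 AndAlso head(0) = &HFF AndAlso head(1) = &HFE Then
            Return "UTF-16 LE"         ' also the start of a UTF-32 LE mark
        End If
        If count >= 2 AndAlso head(0) = &HFE AndAlso head(1) = &HFF Then
            Return "UTF-16 BE"
        End If
        Return "none"
    End Function
End Module
```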

I've been over the TextFieldParser docs and see nothing indicating that
it shouldn't take the data as presented.
Author
22 Sep 2006 4:42 PM
Jeff Glatt
Try the OrchidGrid control, which can parse/import data from delimited files.


