
How to read a large text file?

Author
15 Aug 2006 3:47 PM
elena
Hi, all
I have an ASCII text file with a record length of 388 bytes and no record delimiter.
The size of the file is 562477780 bytes, which is 1449685 records altogether.
How can I read such a file record by record?

Please, help

Author
15 Aug 2006 5:03 PM
Tom Shelton

First of all, I wouldn't read this record-by-record.  Disk reads are
slow, and so it is always best to minimize the number of reads if
possible.  You can do this by reading in chunks...  You will want your
chunk size to be a multiple of 388 (your record length).  Then you can
process those records before you read your next chunk.  So, in
pseudocode you would do something like:

while more records
    read group of records
    process records
end while

Now, in code you would probably want to use the System.IO.StreamReader
class for this.  The built-in VB.NET file functions are very slow and
are best avoided when working with larger files.  There are examples in
the documentation of this class that show how to read a specific number
of chars at a time, so I'll direct you to the docs for that :)
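
For illustration only, here is a minimal sketch of the chunked loop described above, written in Python rather than VB.NET so it can be run as-is. The function name `read_records` and the default of 1000 records per chunk are assumptions made for this example; only the 388-byte record length comes from the question. In .NET the same shape would apply to a stream opened on the file.

```python
import io
from typing import BinaryIO, Iterator

RECORD_LENGTH = 388  # bytes per record, as stated in the original question


def read_records(stream: BinaryIO, records_per_chunk: int = 1000) -> Iterator[bytes]:
    """Yield fixed-length records while reading many records per read call.

    records_per_chunk is a hypothetical tuning knob; the chunk size is kept
    a multiple of the record length so chunks normally end on a record boundary.
    """
    chunk_size = RECORD_LENGTH * records_per_chunk
    leftover = b""  # bytes of an incomplete record carried over from a short read
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of file
        data = leftover + chunk
        usable = len(data) - (len(data) % RECORD_LENGTH)
        # Slice the buffer into whole records.
        for offset in range(0, usable, RECORD_LENGTH):
            yield data[offset:offset + RECORD_LENGTH]
        leftover = data[usable:]
    if leftover:
        raise ValueError(f"file ends with a partial record of {len(leftover)} bytes")
```

A caller would open the file in binary mode and iterate, for example `for record in read_records(open(path, "rb")): ...`, where `path` is whatever file name applies. Only one chunk is held in memory at a time, so the 562477780-byte file never has to be loaded whole.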

You may also want to consider not hardcoding the number of records to
process.  That way, you can optimize it for speed and memory usage :)
I would consider putting the group size in an app.config file.
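
As a rough sketch of the "don't hardcode the group size" idea, the snippet below reads the value from an INI-style settings text using Python's standard `configparser`. This is not the .NET app.config mechanism; the section name `reader`, the key `records_per_chunk`, and the fallback of 1000 are all made up for this example.

```python
import configparser

DEFAULT_RECORDS_PER_CHUNK = 1000  # assumed fallback, not a value from the thread


def load_records_per_chunk(config_text: str) -> int:
    """Return the group size from INI-style text, or the default if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # "reader" and "records_per_chunk" are hypothetical names for this sketch.
    value = parser.getint("reader", "records_per_chunk",
                          fallback=DEFAULT_RECORDS_PER_CHUNK)
    if value < 1:
        raise ValueError("records_per_chunk must be at least 1")
    return value
```

With the setting outside the code, the group size can be tuned against real timings and memory use without recompiling.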

--
Tom Shelton [MVP]
Author
15 Aug 2006 5:06 PM
Cor Ligthert [MVP]
Elena,

First you have to read the file using the ASCII encoding (are you sure it
is ASCII? That 7-bit code is seldom used).

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemiostreamreaderclassctortopic9.asp

Then you can go through the text using Mid or Substring; I prefer
Substring.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfSystemStringClassSubstringTopic2.asp
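
To make the decode-then-substring step concrete, here is a small Python sketch (not VB.NET). The field layout in `FIELDS` is entirely hypothetical, since the thread never describes what is inside a record; only the 388-byte length is from the question. Strict ASCII decoding also answers the "are you sure it is ASCII" question, because any byte above 127 raises an error.

```python
RECORD_LENGTH = 388  # bytes per record, as stated in the original question

# Hypothetical layout for illustration: (field name, start offset, length).
FIELDS = [
    ("id", 0, 10),
    ("name", 10, 30),
    ("rest", 40, 348),
]


def parse_record(raw: bytes) -> dict:
    """Decode one record as ASCII and slice it into fields by position."""
    if len(raw) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(raw)}")
    # Strict decoding: raises UnicodeDecodeError if the data is not 7-bit ASCII.
    text = raw.decode("ascii")
    # Slicing plays the role of Substring(start, length); pad spaces are trimmed.
    return {name: text[start:start + length].rstrip()
            for name, start, length in FIELDS}
```

If decoding fails, the file is probably in an 8-bit code page rather than true ASCII, and the matching encoding should be used instead.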

I hope this helps,

Cor


Author
15 Aug 2006 8:14 PM
elena
Thank you so much for your input. I will get started now.

