
Variable String Limit

Author
8 Jun 2009 10:14 PM
Freddy Coal
Hi, I have a strange error.

I have a 200 MB text file. For speed I am trying to load it into memory (my PC
has 3 GB of RAM and runs 32-bit Windows XP), but when I try to load it into a
String variable I get this error:

Exception of type 'System.OutOfMemoryException' was thrown.

If the limit for a String is 2^31 - 1 (2,147,483,647, about 2 GB), why do I
get that error when loading a file far smaller than the theoretical limit of a
32-bit operating system?

To read the file I am using this code:

Dim Lector As New StreamReader(Path_File, True)
Dim Cadena As String

Cadena = Lector.ReadToEnd()

At the moment I read the file line by line, but that is very slow because I
have to process each line separately instead of the whole block (for example
to remove quotes or extract data with Regex). If I could load the whole file,
I could apply Regex to it in several passes.

Thanks in advance for any help.

Freddy Coal

Author
8 Jun 2009 11:21 PM
Armin Zingler
Freddy Coal wrote:
> Hi, I have an strange error;
>
> I have a 200Mb txt file, for speed I try to load that in memory (my
> PC have 3Gb in Ram with Win XP 32Bit), my problem is trying to load
> that in an string variable, I get the next error:
>
> Exception of type 'System.OutOfMemoryException' was thrown.
>
> My question is if the memory limit for the string is 2^31
> (2.147.483.647 more than 2Gb) why I can get that error when try to
> load a file of less than the theorical limit of the 32bit operating
> system.
> For read the file I'm using the next code:
>
> Dim Lector As New StreamReader(Path_File, True)
> Dim Cadena as string
>
> Cadena = Lector.ReadToEnd
>
> In the moment I read the file line by line, but this process is very
> slow, because I need process each line instead of all the block (for
> example for remove quotes or get data with Regex), if I can load all
> the file, I can apply regex for make multiples process.

I get the same error with the overloaded constructor you use. If I specify
the encoding (Unicode), there's no problem. Which character encoding is used
in the file? Can you specify the correct encoding as well?
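
As a rough illustration of passing the encoding explicitly, here is a minimal sketch. The module and function names are made up for this example, and which encoding is right depends on how the file was written, which the thread does not establish.

```vbnet
Imports System.IO
Imports System.Text

Module EncodingAwareReader
    ' Hypothetical helper: reads the whole file with an encoding chosen by
    ' the caller instead of relying on byte-order-mark detection alone.
    Function ReadWholeFile(ByVal path As String, ByVal enc As Encoding) As String
        Using reader As New StreamReader(path, enc)
            Return reader.ReadToEnd()
        End Using
    End Function
End Module
```

A call such as ReadWholeFile(Path_File, Encoding.Unicode) would correspond to the Unicode case mentioned above. The Using block also closes the file, which the original snippet never did.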


Armin
Author
8 Jun 2009 11:38 PM
Nobody
"Freddy Coal" <freddyc***@gmail.com> wrote in message
news:uT6WFaI6JHA.1092@TK2MSFTNGP06.phx.gbl...
> Hi, I have an strange error;
>
> I have a 200Mb txt file, for speed I try to load that in memory (my PC
> have 3Gb in Ram with Win XP 32Bit), my problem is trying to load that in
> an string variable, I get the next error:
>
> Exception of type 'System.OutOfMemoryException' was thrown.

The 200 MB could become 400 MB in memory if the source is ANSI, because .NET
strings store two bytes per character. Also, the out-of-memory error could be
because there is no contiguous block of memory of that size. You may have 1 GB
free, but fragmented.
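
The doubling can be estimated before loading anything. This is only a sketch under the assumption that the file uses a single-byte (ANSI) encoding, so one byte on disk becomes one two-byte character in memory; the names are hypothetical.

```vbnet
Imports System.IO

Module StringSizeEstimate
    ' Hypothetical helper: rough number of bytes the file's text would need
    ' as one String, assuming one byte per character on disk.
    Function EstimateStringBytes(ByVal path As String) As Long
        Dim bytesOnDisk As Long = New FileInfo(path).Length
        ' Each character in a .NET String takes two bytes (UTF-16).
        Return bytesOnDisk * 2
    End Function
End Module
```

The result ignores object overhead and any temporary buffers used while reading, so the real peak is likely higher.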
Author
9 Jun 2009 1:39 AM
Mike
Others have answered on the ANSI/Unicode limits. I'm commenting on a design
alternative, since you mentioned reading from disk and slowness.

I'm relatively new to VB.NET, so if I had this large-text-file need in VB.NET
(and I will), I would naturally look for memory-mapped I/O support.

A memory map is virtualized between disk and memory, so it is much faster
than plain reads from disk and not much slower than having everything in
memory; you would have to profile it to tell the difference. Loading
everything into memory will probably create other scalability and performance
issues anyway, such as more page faults and GC stress.

If .NET already has library support for it, you might want to check that
out. I did a quick search for it last week but didn't find anything. Let me
research again... OH WONDERFUL!

It appears .NET 4.0 adds a new class for this:

        System.IO.MemoryMappedFiles.MemoryMappedFile

and as best I can see, the only MSDN search result for it is:

http://blogs.msdn.com/salvapatuel/archive/2009/06/08/working-with-memory-mapped-files-in-net-4.aspx

In theory, all allocated memory is virtualized anyway, but not under your
control, e.g. a big string.

I wonder if MemoryStream is the .NET version of a memory map. The MSDN docs
for MemoryStream don't exactly shout it out, but I suspect the underlying
implementation is memory mapped.  :-)

As you can see from the example in that blog, creating or opening one is
easy.

This is going to be a godsend for VB.NET developers! Large files will be
much easier to handle in VB.NET now!

I'm still going to see if I can write a CMemoryMapFile class for
.NET 2.0. :-)
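
For readers who can target .NET 4.0 or later, a minimal sketch of reading a text file through a memory-mapped view might look like the following. The module and function names are invented for this example, it is not taken from the linked blog post, and it will not build against .NET 2.0.

```vbnet
Imports System.IO
Imports System.IO.MemoryMappedFiles
Imports System.Text

Module MappedFileSketch
    ' Hypothetical helper: counts the lines of a file by reading it through
    ' a memory-mapped view instead of an ordinary FileStream.
    Function CountLines(ByVal path As String, ByVal enc As Encoding) As Long
        Dim length As Long = New FileInfo(path).Length
        Dim count As Long = 0
        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(path, FileMode.Open)
            ' Pass the exact file length: a default view is rounded up to a
            ' page boundary and would expose trailing zero bytes.
            Using view As MemoryMappedViewStream = mmf.CreateViewStream(0, length)
                Using reader As New StreamReader(view, enc)
                    While reader.ReadLine() IsNot Nothing
                        count += 1
                    End While
                End Using
            End Using
        End Using
        Return count
    End Function
End Module
```

Two caveats: CreateFromFile rejects an empty file, and in a 32-bit process a single view over the whole file still needs one contiguous range of address space, so mapping smaller windows may be necessary for very large files.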

Author
9 Jun 2009 2:08 AM
Mike
Follow-up:

Here are some examples of using memory maps in VB.NET:

http://www.pinvoke.net/default.aspx/kernel32/MapViewOfFile.html

The OP might want to see whether MMF works for his large text file
operations.

Author
9 Jun 2009 8:16 PM
Freddy Coal
Thanks for all the answers. I got some results reading the file with the
following code:

Dim cadena As String = ""
Dim dato As Byte()
If File.Exists(ruta) Then
    dato = My.Computer.FileSystem.ReadAllBytes(ruta)
    cadena = System.Text.Encoding.GetEncoding(0).GetString(dato)
    Return cadena
End If

Now I get the memory error when I try to split the string into an array.
What is the size limit for an array?

Thanks in advance.

Freddy Coal

Author
9 Jun 2009 8:57 PM
Mike
My engineering opinion:

Using a String class is an inefficient way to operate on a large block of
memory.

While you might be able to LOAD a huge string, working with it is another
story, because it brings in all sorts of string-related behavior: unbounded
lengths, temporary duplication, memory being held. With a high frequency of
such temporary operations, it is highly inefficient and you will see
performance hits.

If this were C/C++ you would have more power using pointers, so maybe you can
emulate the same functionality in VB.NET. It really all depends on what you
want to do. Based on my experience with high-load products, you just wouldn't
work with a huge string, especially one that is managed and wrapped in OOP,
unless you had a set practical limit. 200 MB? You are asking for a lot, IMO.
Loading is one thing; working with it is an entirely different set of issues.

I know there is a tendency to use the tools, like a String class, to
handle any kind of length requirement. I fall into that too, but it is not
practical in many huge-data cases. You need limits and a working knowledge of
how data is manipulated in memory by all these higher-level wrappers,
classes, functions and methods.

The point?

There are solutions for huge data and arrays, but you need to do more than
just use the basic classes provided to you. For example: virtualize it using
a memory map, work in clusters/blocks of data, make more use of pointers. In
fact, I think there is a solution here using a custom stream class.

.NET 4.0 now includes a System.IO.MemoryMappedFiles.MemoryMappedFile class.
That is going to do wonders for high-end VB.NET development. I really wish MS
would provide it for .NET 2.0 environments.

Author
9 Jun 2009 10:41 PM
Freddy Coal
Mike, thank you very much for your response (you are making me improve my
code : ) ). At the moment I'm still learning...

I also think that loading everything into memory is inefficient (and requires
a very robust machine), but I don't know a better way to read the file. I
don't know whether reading the file line by line (which is what I do at the
moment) is faster than loading everything into memory and processing it in
arrays, and I don't know how to read the text file in 'blocks', where each
block is the set of lines that share the same time value. Getting those
blocks is very easy with tools like MatchCollection when everything is in a
string, but when everything is in a file, the only solution I know of is to
read the file line by line.

My text file looks something like this:

"1","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-95.50"
"2","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-89.88"
"3","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-79.75"

The most important values for me are the last column of each line and the
date/time. I gather all the values with the same time, and from them I build
a trace. Many of the other values are the same throughout the file.

Mike, thanks for your time; any advice is welcome.

Freddy Coal

Author
14 Jun 2009 3:32 PM
Göran_Andersson
Freddy Coal wrote:
> I think to that load all in memory is inefficient (and require a very robust
> machine), but I don't know the better way for read the file, I don't know if
> read the file line by line (in the moment I make that) is more fast that
> load all in memory and process that in arrays,

There shouldn't really be that much difference in speed, and your main
problem is your memory usage.

If you read the file into a byte array, then decode it, then split it
into lines, you are using about 1 GB of memory to read a 200 MB file:
200 MB for the bytes, 400 MB for the decoded string and another 400 MB for
the lines. You should clearly do some basic processing of the data while
reading the stream, so that you don't have three copies of all the data
in memory at once.

> and I don't know how read the
> text file in 'blocks' where each block it's the integration of different
> lines with the same value of time inside;

You can't. Files are not line oriented (or even character oriented), so you
can't do any line-based operations on the file itself.

Use a StreamReader to read the file line by line. The FileStream will
buffer the input, and the StreamReader will handle decoding and
detecting line breaks.

> get that 'blocks' its very easy
> with tools like MatchCollection when you have all in the string, but when
> you have all in a file, the only solution in my ignorance is read the file
> line by line.

There isn't reasonably any other way to do it for such a large file.
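
Reading line by line does not rule out working in blocks. The sketch below collects consecutive lines that share a timestamp and hands each block to a callback. It assumes each record sits on one physical line in the format of the sample data, that the timestamp is the second field and the value of interest is the last one; the module, function and parameter names are made up for this example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.IO
Imports System.Text

Module TraceReader
    ' Hypothetical helper: calls handler once for every run of consecutive
    ' lines that share the same timestamp (second field of each line).
    Sub ReadTraces(ByVal path As String, ByVal enc As Encoding, _
                   ByVal handler As Action(Of String, List(Of Double)))
        Dim separator As String() = New String() {""","""}
        Dim currentStamp As String = Nothing
        Dim levels As New List(Of Double)
        Using reader As New StreamReader(path, enc)
            Dim line As String = reader.ReadLine()
            While line IsNot Nothing
                If line.Length > 2 Then
                    ' Drop the outer quotes, then split on the "," separators.
                    Dim fields As String() = line.Substring(1, line.Length - 2) _
                        .Split(separator, StringSplitOptions.None)
                    Dim stamp As String = fields(1)
                    If currentStamp IsNot Nothing AndAlso stamp <> currentStamp Then
                        handler(currentStamp, levels)
                        levels = New List(Of Double)
                    End If
                    currentStamp = stamp
                    ' The last field is the value; parse it culture-independently.
                    levels.Add(Double.Parse(fields(fields.Length - 1), _
                                            CultureInfo.InvariantCulture))
                End If
                line = reader.ReadLine()
            End While
        End Using
        ' Flush the final block.
        If currentStamp IsNot Nothing Then handler(currentStamp, levels)
    End Sub
End Module
```

Only one block is held in memory at a time, so the peak usage depends on the largest block rather than on the file size. Lines with the same timestamp that are not adjacent would end up in separate blocks.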

> The example of my text file is something like:
>
> "1","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-95.50"
> "2","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-89.88"
> "3","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-79.75"
>
> The most important parameter for my is the last column of each line, and the
> date/time, I gather all the values with the same time, and with that get a
> trace. Many of the other values are the same in all the txt file.
>

You should read the lines and parse each line into an object, which you
can then easily work with. When you parse a string into numerical data,
it will also take up less memory. A string holding one line will take
up about 220 bytes, while an object holding the parsed data would take
up about 70 bytes.

The class for parsing and holding the data could look something like
this (guessing wildly about what the data is actually for...):

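Imports System
Imports System.Globalization

Public Class TempData

    Private _id As Integer
    Private _time As Date
    Private _latitude As Double
    Private _longitude As Double
    Private _id2 As Integer
    Private _x As Double
    Private _y As Double
    Private _temperature As Double

    Public Sub New(ByVal data As String)
        ' Strip the outer quotes, then split on the "," separators.
        Dim s As String() = data.Substring(1, data.Length - 2).Split( _
            New String() {""","""}, StringSplitOptions.None)
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        _id = Integer.Parse(s(0), inv)
        ' The invariant culture reads 07/01/2008 as month/day/year;
        ' change this if the file uses day/month/year.
        _time = DateTime.Parse(s(1), inv)
        _latitude = ParseCoordinate(s(2))
        _longitude = ParseCoordinate(s(3))
        ' s(4) is empty in the sample lines, so it is skipped.
        _id2 = Integer.Parse(s(5), inv)
        _x = Double.Parse(s(6), inv)
        _y = Double.Parse(s(7), inv)
        _temperature = Double.Parse(s(8), inv)
    End Sub

    ' Converts "2:08:25.26N" (assumed to be degrees:minutes:seconds plus a
    ' hemisphere letter) to decimal degrees; S and W become negative.
    Private Shared Function ParseCoordinate(ByVal value As String) As Double
        Dim inv As CultureInfo = CultureInfo.InvariantCulture
        Dim hemisphere As Char = value.Chars(value.Length - 1)
        Dim parts As String() = value.Substring(0, value.Length - 1).Split(":"c)
        Dim degrees As Double = Double.Parse(parts(0), inv) + _
            Double.Parse(parts(1), inv) / 60 + _
            Double.Parse(parts(2), inv) / 3600
        If hemisphere = "S"c OrElse hemisphere = "W"c Then degrees = -degrees
        Return degrees
    End Function

    Public ReadOnly Property Id() As Integer
        Get
            Return _id
        End Get
    End Property

    Public ReadOnly Property Time() As Date
        Get
            Return _time
        End Get
    End Property

    Public ReadOnly Property Latitude() As Double
        Get
            Return _latitude
        End Get
    End Property

    Public ReadOnly Property Longitude() As Double
        Get
            Return _longitude
        End Get
    End Property

    Public ReadOnly Property Id2() As Integer
        Get
            Return _id2
        End Get
    End Property

    Public ReadOnly Property X() As Double
        Get
            Return _x
        End Get
    End Property

    Public ReadOnly Property Y() As Double
        Get
            Return _y
        End Get
    End Property

    Public ReadOnly Property Temperature() As Double
        Get
            Return _temperature
        End Get
    End Property

End Class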

--
Göran Andersson
_____
http://www.guffa.com
Author
9 Jun 2009 10:10 PM
Armin Zingler
Freddy Coal wrote:
> Thanks for all the answers, I get some results reading the file with
> the next code:
>
> Dim cadena As String = ""
> Dim dato As Array
> If File.Exists(ruta) = True Then
> dato = My.Computer.FileSystem.ReadAllBytes(ruta)
> cadena = System.Text.Encoding.GetEncoding(0).GetString(dato)
> Return cadena
> End If

Taking the long way round through the My namespace usually does not help. You
are better off calling IO.File.ReadAllBytes directly.

You are also using GetEncoding(0), which returns the default encoding.
System.Text.Encoding.Default would do the same.

But the main problem is that you don't seem to know which encoding the file
uses. This time it's the Default encoding; last time you passed
detectEncodingFromByteOrderMarks = True to the StreamReader. Which one is
correct?

I suspect that it's not even a pure text file that you want to read, is it?

> Now I have the memory error when I try to split the string in an
> array, which is the limit for the array?

System.IO.File.ReadAllLines should work fine if you pass the appropriate
encoding.
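
A minimal sketch of that call, with the encoding named explicitly by code page. The module and function names are hypothetical, and the right code page for this file is not known from the thread; 1252 (Western European ANSI) would be one guess for a file written on a Western Windows system.

```vbnet
Imports System.IO
Imports System.Text

Module ReadLinesSketch
    ' Hypothetical helper: loads all lines using an explicitly chosen
    ' code page rather than whatever the reader would guess.
    Function LoadLines(ByVal path As String, ByVal codePage As Integer) As String()
        Dim enc As Encoding = Encoding.GetEncoding(codePage)
        Return File.ReadAllLines(path, enc)
    End Function
End Module
```

This still holds every line in memory at once, so it only addresses the encoding question, not the overall memory footprint.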


Armin
Author
9 Jun 2009 10:20 PM
Armin Zingler
Armin Zingler wrote:
> I suspect that it's not even a pure text file that you want to read,
> is it?

Forget that. You wrote it is a text file.

Armin
Author
10 Jun 2009 12:16 AM
FC
Thanks Armin. Yes and no: I use that piece of code to read other files that
are not pure text, but they are very small (less than 3 MB), and the code
works great for me.

You are right that the encoding is very important in some cases, but this is
not one of them.

Thanks for your comments Armin.

Freddy Coal

Author
10 Jun 2009 2:12 AM
Armin Zingler
FC wrote:
> Thanks Armin, Yes and No, I use that piece of code for read other
> files, that not are pure text, but the size is minimun (less than
> 3Mb), and that code work great for me.
>
> You are right in your comment, the encoding is very important in some
> cases, but this not the case.

I don't understand you because, if I specify the correct encoding, I can
call ReadAllLines without an exception.

> Thanks for your comments Armin.


Armin