Variable String Limit

I have a 200 MB text file. For speed, I am trying to load it into memory (my PC has 3 GB of RAM and runs Windows XP 32-bit). My problem is that when I try to load the file into a String variable, I get the following error:

Exception of type 'System.OutOfMemoryException' was thrown.

My question is: if the limit for a String is 2^31 (2,147,483,647, more than 2 GB), why do I get that error when I try to load a file that is far below the theoretical limit of a 32-bit operating system?

To read the file I'm using the following code:

    Dim Lector As New StreamReader(Path_File, True)
    Dim Cadena As String

    Cadena = Lector.ReadToEnd

At the moment I read the file line by line, but that is very slow, because I have to process each line separately instead of the whole block (for example, to remove quotes or extract data with Regex). If I could load the whole file, I could apply a regex to do several operations at once.

Thanks in advance for any help.

Freddy Coal

Freddy Coal wrote:
> I have a 200 MB text file. For speed, I am trying to load it into
> memory [...] I get the following error:
>
> Exception of type 'System.OutOfMemoryException' was thrown.
> [...]
> Dim Lector As New StreamReader(Path_File, True)
> Dim Cadena As String
>
> Cadena = Lector.ReadToEnd

I get the same error with the overloaded constructor you use. If I specify the encoding (Unicode), there's no problem. Which character encoding is used in the file? Could you also specify the correct encoding?

Armin

"Freddy Coal" <freddyc***@gmail.com> wrote in message news:uT6WFaI6JHA.1092@TK2MSFTNGP06.phx.gbl...
> I have a 200 MB text file. For speed, I am trying to load it into
> memory [...] I get the following error:
>
> Exception of type 'System.OutOfMemoryException' was thrown.

The 200 MB could become 400 MB if the source is ANSI. Also, the out-of-memory error could be because there is no contiguous block of memory of that size. You may have 1 GB free, but fragmented.

Others answered on the ANSI/Unicode limits. I'm commenting on a design alternative, since you mentioned reading from disk and slowness.

I'm relatively new to VB.NET, so if I had this large-text-file need in VB.NET (and I will), I would naturally look for memory-mapped I/O support.

A memory map is virtualized between disk and memory, so it is tons faster than reading from disk only; you can't even tell the difference unless you profile it. It is really not that much slower than getting everything into memory, which will probably create other scalability and performance issues anyway, such as increased page faults and GC stress.

If .NET already has library support for it, you might want to check that out. I did a quick search for it last week but didn't find anything. Let me research again... OH WONDERFUL!

It appears .NET 4.0 has support for a new System library:

System.IO.MemoryMappedFiles

and as best I can see, the only MSDN search result for it is:

http://blogs.msdn.com/salvapatuel/archive/2009/06/08/working-with-memory-mapped-files-in-net-4.aspx

In theory, all allocated memory is virtualized anyway, but not under your control, e.g., a big string.

I wonder if MemoryStream is the .NET version of a memory map. The MSDN docs for MemoryStream are not quite yelling that out, but I suspect the underlying implementation is memory mapped. :-)

But as you can see from the example in this blog, creating/opening one is easy.

This is going to be a godsend for VB.NET developers! Large file names will be optimized for VB.NET now!

I'm still going to see if I can write a CMemoryMapFile class for .NET 2.0. :-)

Follow up:
Here are some examples of using memory maps in VB.NET:

http://www.pinvoke.net/default.aspx/kernel32/MapViewOfFile.html

The OP might want to see whether MMF works for his large-text-file needs.

Thanks for all the answers. I get some results reading the file with the following code:

    Dim cadena As String = ""
    Dim dato As Array
    If File.Exists(ruta) = True Then
        dato = My.Computer.FileSystem.ReadAllBytes(ruta)
        cadena = System.Text.Encoding.GetEncoding(0).GetString(dato)
        Return cadena
    End If

Now I get the memory error when I try to split the string into an array. What is the limit for an array?

Thanks in advance.

Freddy Coal

My Engineering Opinion:
This is an inefficient method for operating on a large block of memory using a String class.

While you might be able to LOAD a huge string, working with it is another story, because string handling brings all sorts of related overhead: unbounded limits, temporary duplication, memory held for intermediate copies. With a high frequency of such temporary operations, it is highly inefficient and you will see performance hits.

If this were C/C++ you would have more power using pointers, so maybe you can emulate the same functionality in VB.NET. It really all depends on what you want to do. Based on my experience with high-load products, you just wouldn't try to work with a huge string, especially one that is managed and wrapped in OOP, unless you had a set practical limit. 200 MB? You are asking for a lot, IMO. Loading is one thing; working with it is an entirely different set of issues.

I know there is a tendency to use the tools, like a String class, to handle any kind of length requirement. I fall into that too, but it is not practical in many huge-data cases. You need limits and a working knowledge of how data is manipulated in memory by all these higher-level wrappers, classes, functions and methods.

The point?

There are huge data/array solutions, but you need to do more than just use the basic classes provided to you. For example, virtualize it using a memory map, work in clusters/blocks of data, make more use of pointers. In fact, I think there is a solution here using a custom stream class.

.NET 4.0 now includes a System.IO.MemoryMappedFiles.MemoryMappedFile class. That is going to do wonders for high-end VB.NET development. I really wish MS would provide it to .NET 2.0 environments.
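A minimal sketch of the memory-mapped approach discussed above, assuming .NET 4.0 or later. The module and function names (MemoryMapSketch, CountLines) are made up for illustration, and the code assumes a non-empty file that can be mapped as a single view. As a side note, MemoryStream is backed by an ordinary in-memory byte array and is not a file mapping.

```vbnet
Imports System.IO
Imports System.IO.MemoryMappedFiles
Imports System.Text

Module MemoryMapSketch
    ' Counts the lines of a text file through a read-only memory-mapped view.
    ' Assumptions: .NET 4.0 or later, a non-empty file, and a file small enough
    ' to map as a single view in the process address space.
    Function CountLines(filePath As String, enc As Encoding) As Integer
        Dim length As Long = New FileInfo(filePath).Length
        Dim count As Integer = 0

        Using mmf As MemoryMappedFile = MemoryMappedFile.CreateFromFile(
                filePath, FileMode.Open, Nothing, 0, MemoryMappedFileAccess.Read)
            ' Pass the exact file length: a default-sized view is rounded up to
            ' a page boundary and would expose trailing zero bytes.
            Using view As MemoryMappedViewStream =
                    mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read)
                Using reader As New StreamReader(view, enc)
                    Dim line As String = reader.ReadLine()
                    While line IsNot Nothing
                        count += 1
                        line = reader.ReadLine()
                    End While
                End Using
            End Using
        End Using

        Return count
    End Function
End Module
```

A call such as MemoryMapSketch.CountLines(ruta, Encoding.Default) would walk the file without ever building one giant string. Whether this is actually faster than a plain StreamReader for a sequential pass is something to measure rather than assume.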
Mike, thank you very much for your response (you are making me improve my code :) ). At the moment I'm learning...

I also think that loading everything into memory is inefficient (and requires a very robust machine), but I don't know a better way to read the file. I don't know whether reading the file line by line (which is what I do at the moment) is faster than loading everything into memory and processing it in arrays, and I don't know how to read the text file in 'blocks', where each block is the set of lines that share the same time value. Getting those 'blocks' is very easy with tools like MatchCollection when you have everything in a string, but when you have everything in a file, the only solution I know of is to read the file line by line.

My text file looks something like this:

"1","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-95.50"
"2","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-89.88"
"3","07/01/2008 16:03:23.304","2:08:25.26N","76:38:38.27W","","8","869800000.00","10400000.00","-79.75"

The most important parameters for me are the last column of each line and the date/time. I gather all the values with the same time, and from those I get a trace. Many of the other values are the same throughout the whole file.

Mike, thanks for your time; any advice is welcome.

Freddy Coal

Freddy Coal wrote:
> I also think that loading everything into memory is inefficient (and
> requires a very robust machine), but I don't know a better way to read
> the file. I don't know whether reading the file line by line (which is
> what I do at the moment) is faster than loading everything into memory
> and processing it in arrays,

There shouldn't really be that much difference in speed, and your main problem is your memory usage. If you read the file into a byte array, then decode it, then split it into lines, you are using roughly 1 GB of memory to read a 200 MB file (200 MB of bytes, 400 MB for the decoded string, and another 400 MB for the split lines). You should clearly do some basic processing of the data while reading the stream, so that you don't have three copies of all the data in memory at once.

> and I don't know how to read the text file in 'blocks', where each
> block is the set of lines that share the same time value.

You can't. Files are not line oriented (or even character oriented), so you can't do any line-based operations on a file. Use a StreamReader to read the file line by line. The FileStream will buffer the input, and the StreamReader will handle decoding and detecting line breaks.

> Getting those 'blocks' is very easy with tools like MatchCollection
> when you have everything in a string, but when you have everything in
> a file, the only solution I know of is to read the file line by line.

There isn't reasonably any other way to do it for such a large file.

> My text file looks something like this:
> [...]
> The most important parameters for me are the last column of each line
> and the date/time. I gather all the values with the same time, and
> from those I get a trace. Many of the other values are the same
> throughout the whole file.

You should read the lines and parse each line into an object, which you can then easily work with. When you parse a string into numerical data, it will also take up less memory. A string holding one line will take up about 220 bytes, while an object holding the parsed data would take up about 70 bytes.

The class for parsing and holding the data could look something like this (guessing wildly about what the data is actually for...):

    Imports System.Globalization

    Public Class TempData

        Private _id As Integer
        Private _time As Date
        Private _latitude As Double
        Private _longitude As Double
        Private _id2 As Integer
        Private _x As Double
        Private _y As Double
        Private _temperature As Double

        Public Sub New(data As String)
            ' Strip the outer quotes, then split on the "," separators.
            Dim s As String() = data.Substring(1, data.Length - 2).Split(New String() {""","""}, StringSplitOptions.None)
            Dim inv As CultureInfo = CultureInfo.InvariantCulture
            _id = Integer.Parse(s(0), inv)
            _time = DateTime.Parse(s(1), inv) ' assumes month/day/year order
            _latitude = ParseCoordinate(s(2))
            _longitude = ParseCoordinate(s(3))
            ' s(4) is empty in the sample lines, so it is skipped.
            _id2 = Integer.Parse(s(5), inv)
            _x = Double.Parse(s(6), inv)
            _y = Double.Parse(s(7), inv)
            _temperature = Double.Parse(s(8), inv)
        End Sub

        ' Converts "d:mm:ss.ssH" (for example "2:08:25.26N") to signed decimal
        ' degrees. This helper was not spelled out in the post; the format is
        ' inferred from the sample lines.
        Private Shared Function ParseCoordinate(value As String) As Double
            Dim inv As CultureInfo = CultureInfo.InvariantCulture
            Dim p As String() = value.Substring(0, value.Length - 1).Split(":"c)
            Dim degrees As Double = Double.Parse(p(0), inv) + Double.Parse(p(1), inv) / 60 + Double.Parse(p(2), inv) / 3600
            Dim hemisphere As Char = value(value.Length - 1)
            If hemisphere = "S"c OrElse hemisphere = "W"c Then
                Return -degrees
            End If
            Return degrees
        End Function

        Public ReadOnly Property Id As Integer
            Get
                Return _id
            End Get
        End Property

        Public ReadOnly Property Time As Date
            Get
                Return _time
            End Get
        End Property

        Public ReadOnly Property Latitude As Double
            Get
                Return _latitude
            End Get
        End Property

        Public ReadOnly Property Longitude As Double
            Get
                Return _longitude
            End Get
        End Property

        Public ReadOnly Property Id2 As Integer
            Get
                Return _id2
            End Get
        End Property

        Public ReadOnly Property X As Double
            Get
                Return _x
            End Get
        End Property

        Public ReadOnly Property Y As Double
            Get
                Return _y
            End Get
        End Property

        Public ReadOnly Property Temperature As Double
            Get
                Return _temperature
            End Get
        End Property

    End Class
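The "process while reading" advice above can be sketched as follows. This is only an illustration under stated assumptions: the names TraceReaderSketch and ReadTraces are hypothetical, the timestamp is taken to be the second field and the value of interest the last field (as in the sample lines), and no field is assumed to contain the "," sequence itself.

```vbnet
Imports System.Collections.Generic
Imports System.Globalization
Imports System.IO

Module TraceReaderSketch
    ' Reads quoted, comma-separated lines one at a time and groups the last
    ' column by the timestamp text in the second column. Only one line is held
    ' as a string at any moment; the parsed values are kept as Doubles.
    Function ReadTraces(reader As TextReader) As Dictionary(Of String, List(Of Double))
        Dim traces As New Dictionary(Of String, List(Of Double))()
        Dim separator As String() = New String() {""","""}

        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            If line.Length > 2 Then
                ' Strip the outer quotes, then split on the "," separators.
                Dim f As String() = line.Substring(1, line.Length - 2).Split(separator, StringSplitOptions.None)

                Dim values As List(Of Double) = Nothing
                If Not traces.TryGetValue(f(1), values) Then
                    values = New List(Of Double)()
                    traces.Add(f(1), values)
                End If
                values.Add(Double.Parse(f(f.Length - 1), CultureInfo.InvariantCulture))
            End If
            line = reader.ReadLine()
        End While

        Return traces
    End Function
End Module
```

It could be called with a StreamReader opened on the file with the right encoding, for example inside a Using block around New StreamReader(ruta, System.Text.Encoding.Default). Each dictionary entry then holds one trace, that is, all the last-column values that share a timestamp.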
> Thanks for all the answers. I get some results reading the file with
> the following code:
>
> Dim cadena As String = ""
> Dim dato As Array
> If File.Exists(ruta) = True Then
>     dato = My.Computer.FileSystem.ReadAllBytes(ruta)
>     cadena = System.Text.Encoding.GetEncoding(0).GetString(dato)
>     Return cadena
> End If

Going the longer way by using My.Crap usually does not help. You had better make a straight call to IO.File.ReadAllBytes. Then you are using GetEncoding(0), which returns the default encoding. System.Text.Encoding.Default would do the same. But the main problem you have is that you don't seem to be aware of the encoding that is used in the file. This time it's the default encoding; last time you passed detectEncodingFromByteOrderMarks = True to the StreamReader. Which one is correct? I suspect that it's not even a pure text file that you want to read, is it?

> Now I get the memory error when I try to split the string into an
> array. What is the limit for an array?

System.IO.File.ReadAllLines should work fine if you pass the appropriate encoding.

Armin

Armin Zingler wrote:
> I suspect that it's not even a pure text file that you want to read,
> is it?

Forget that. You wrote it is a text file.

Armin

Thanks Armin. Yes and no: I use that piece of code to read other files that are not pure text, but they are very small (less than 3 MB), and the code works great for me.

You are right in your comment, the encoding is very important in some cases, but this is not one of them.

Thanks for your comments, Armin.

Freddy Coal

FC wrote:
> Thanks Armin. Yes and no: I use that piece of code to read other
> files that are not pure text, but they are very small (less than
> 3 MB), and the code works great for me.
>
> You are right in your comment, the encoding is very important in some
> cases, but this is not one of them.

I don't understand you, because if I specify the correct encoding, I can call ReadAllLines without an exception.

Armin
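For reference, passing the encoding explicitly, as suggested in the replies above, might look like the sketch below. The module and function names are hypothetical, and which Encoding is correct depends entirely on how the file was written, which the thread never establishes.

```vbnet
Imports System.IO
Imports System.Text

Module EncodingSketch
    ' Reads every line of a text file using an explicitly supplied encoding,
    ' rather than relying on byte-order-mark detection alone.
    Function ReadLinesWith(filePath As String, enc As Encoding) As String()
        Return File.ReadAllLines(filePath, enc)
    End Function

    ' The same idea for StreamReader: the second argument names the encoding.
    Function ReadTextWith(filePath As String, enc As Encoding) As String
        Using reader As New StreamReader(filePath, enc)
            Return reader.ReadToEnd()
        End Using
    End Function
End Module
```

For an ANSI file on the local code page the call would be EncodingSketch.ReadLinesWith(ruta, Encoding.Default); for a UTF-16 file it would be Encoding.Unicode. Both functions still hold the whole file in memory as strings, so the memory concerns raised earlier in the thread continue to apply.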