Binary Read Method?

I wish to extract embedded string data from a file using a Binary Read method.

The following code sample is used in VB.NET and similar code is used in VB6 -

(Assume variable declarations etc.)

    FileOpen(iFileIn, sInputFile, OpenMode.Binary, OpenAccess.Read)
    iRecordEndAddress = iRecordCount * iRecordSize
    For iRecordStartAddress = 1 To iRecordEndAddress Step iRecordSize
        FileGet(iFileIn, sData, iRecordStartAddress)
        sA = Trim(Strings.Left(sData, 8))
        sB = Trim(Strings.Mid(sData, 10, 60))
        .
        .
        .
        sOutPutText &= sA & "," & sB & vbCrLf
    Next
    FileClose(iFileIn)

On the same datafile the VB6 app does the job in <2 secs, however in VB.NET it takes >15 secs. Now I'm not getting into the issues surrounding performance between the two languages, but I would like to know what others suggest as the best/quickest way to perform such a task under VB.NET (2005).

I've tried the obvious My.Computer.FileSystem.ReadAllBytes and FileStream methods, however any possible speed advantages are lost in converting the input-stream back into String Characters for my OutPut Text - unless someone can give me a quick way to do that!

Any suggestions (apart from going back to VB6) would be appreciated.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

I would be inclined to use a StreamReader something like this:
    Dim _sr As New StreamReader(sInputFile)
    Dim _chars(iRecordSize - 1) As Char
    Dim _text As New StringBuilder

    While _sr.Peek() >= 0
        _sr.Read(_chars, 0, _chars.Length)
        _text.AppendFormat("{0},", (New String(_chars, 0, 8)).Trim)
        _text.AppendFormat("{0},", (New String(_chars, 9, 60)).Trim)
        ...
        ' For the last 'field', do not append a comma
        _text.AppendFormat("{0}", (New String(_chars, x, y)).Trim)
        _text.Append(Environment.NewLine)
    End While

    _sr.Close()

    sOutPutText = _text.ToString

You will find that repeated operations on a StringBuilder object are far more efficient than the equivalent operations on String objects.

You could also create an array of 'field' lengths and an array of 'field' start positions and use those in a loop like this:

    ' Make sure that _fieldstarts and _fieldlengths are the same length
    Dim _fieldstarts As Integer() = New Integer() {0, 9, ... , n}
    Dim _fieldlengths As Integer() = New Integer() {8, 60, ... , n}

    While _sr.Peek() >= 0
        _sr.Read(_chars, 0, _chars.Length)
        Dim _i As Integer
        For _i = 0 To _fieldstarts.Length - 2
            _text.AppendFormat("{0},", (New String(_chars, _fieldstarts(_i), _fieldlengths(_i))).Trim)
        Next
        ' When the inner loop finishes, _i points to the final element of
        ' _fieldstarts and _fieldlengths
        ' For the last 'field', do not append a comma, but do append a cr/lf pair
        _text.AppendFormat("{0}{1}", (New String(_chars, _fieldstarts(_i), _fieldlengths(_i))).Trim, Environment.NewLine)
    End While

Stephany Young wrote:
> I would be inclined to use a StreamReader something like this:

Thank-you Stephany for your very thorough answer.

I did try something similar but found the constant looping (around 100K records) was still clobbering performance. I will, however, follow your example more precisely and test if it works quicker.

In the meantime, I've been experimenting again with ReadAllBytes and have found some tweaks that gain some speed. One in particular: as you mentioned, String objects are not too efficient on repeated operations, so simply removing the following line -

    sOutPutText &= sA & "," & sB & vbCrLf

and modifying it to (an already Open File) -

    Print(iFileOut, sA & "," & sB & vbCrLf)

has given a staggering 50% reduction in the overall execution time! (Now down to >5 secs with other tweaks).

Do you know of a quick method to transfer a consecutive block of bytes (stored in a Byte Array) into a String? If I could find that I believe I'd be able to deliver satisfactory performance, as this is currently my bottleneck. I've looked at System.Text.Encoding.Unicode.GetString but can't seem to make it work properly!

ShaneO

Glad I could help.
What would be interesting from your point of view is to find which 'bits' are taking the time. For example: How long does it take just to read the input file?

    Dim _start As DateTime = DateTime.Now
    Dim _sr As New StreamReader(sInputFile)
    Dim _chars(iRecordSize - 1) As Char

    While _sr.Peek() >= 0
        _sr.Read(_chars, 0, _chars.Length)
    End While

    _sr.Close()

    Console.WriteLine(DateTime.Now.Subtract(_start).TotalMilliseconds)

Then add in the various 'bits' one at a time and take note of the elapsed time. You will soon find where the bottlenecks are and can concentrate on techniques to reduce those.

Stephany Young wrote:
> Glad I could help.

I thought you'd be interested in what I finally came up with. I used mostly what you provided, plus a bit of modifying, and ended up with -

(Assume some variable declarations)

    Dim chInputChars(iRecordSize - 1) As Char
    Dim sbOutPutText As New System.Text.StringBuilder

    Using srInputFile As StreamReader = New StreamReader(sInputFileName, System.Text.Encoding.ASCII)
        Do While srInputFile.Peek() >= 0
            srInputFile.Read(chInputChars, 0, chInputChars.Length)
            sA = (New String(chInputChars, 0, 8)).Trim
            sB = (New String(chInputChars, 9, 60)).Trim
            sbOutPutText.Append(sA & "," & sB & vbCrLf)
        Loop
        srInputFile.Close()
    End Using
    My.Computer.FileSystem.WriteAllText(sOutPutFileName, sbOutPutText.ToString, False)

I'm wrapping everything inside a "Using" statement as I'm actually reading from more than one file in this section of code, so it allows me to use the same Variable name (srInputFile) a little later. Also, the "System.Text.Encoding.ASCII" is critical, otherwise the ".Read" statement wouldn't work properly (??) I also found it quicker to use ".Append" in the manner that I show, rather than ".AppendFormat".

I've never ventured much into the StreamReader but now you've whetted my appetite I believe I'll use it wherever possible in future!

Finally, the GREAT news is that the timing for what I'm doing is now <1 sec, which is more than twice as fast as what I was achieving in VB6 and around 25 times faster than where I was when I started this thread.

I've also included the following routine that I use for timing sections of code, maybe someone will find it useful.

    ' Assumed module-level declaration (not shown in the original post)
    Private CodeTimer As New Stopwatch

    ''' <summary>
    ''' First call Starts the CodeTimer. Second call returns elapsed Milliseconds.
    ''' Sample Usage: (on 2nd call) Debug.Print (TimeSection)
    ''' </summary>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Function TimeSection() As Double
        If CodeTimer.IsRunning Then
            CodeTimer.Stop()
            Return CodeTimer.ElapsedMilliseconds
        Else
            CodeTimer = Stopwatch.StartNew
            Return 0 ' Nothing has been measured yet on the first call
        End If
    End Function

Thanks again Stephany. It's input like yours that makes these NG's worthwhile.

ShaneO

Tres cool :)
Now it's time for Strings 101. (And no, I didn't mean 101 Strings, which was the name of a very good orchestra for those who didn't know that, or didn't want to know that.)

Because a String object is 'immutable', every time we do an operation that 'changes' it or assigns its value to something else, we actually create a new string. In a lot of cases this is hardly noticeable, however when we have a lot of such operations happening in a fairly short space of time (a tight loop for instance) the overhead inherent in handling strings soon makes its presence felt.

Take for example:

    Dim _s As String = (New String(chInputChars, 0, 8)).Trim

The 'New String(chInputChars, 0, 8)' creates one string, the Trim method returns a second string and the assignment to _s creates yet a third string.

Now multiply that by however many 'fields' you have in your 'record' and then multiply the result by the number of 'records', and the number of new String objects created inside the loop is not insignificant. If you have 10 'fields' and 100,000 'records' then that is 3,000,000 new strings. Not only are they created, they also have to be dealt with by the garbage collector.

I assume from your code that you may have extraneous trailing whitespace on any given 'field' and that it does, in fact, need to be trimmed off. This means that you do need the Trim operation, which needs a String object, so there are 2 new strings per 'field' that you can't do away with.

If the Trim operation is not, in fact, necessary, then doing away with it will save 10 operations per 'record', which is 1,000,000 operations over the process. Now we have only 2,000,000 new strings, which is a significant saving.

Now the question has to be, are you doing anything else with the variables sA, sB, etc., or are you just using them as a convenience?
If it is the latter then modifying:

    sbOutPutText.Append(sA & "," & sB & vbCrLf)

to:

    sbOutPutText.Append(New String(chInputChars, 0, 8) & "," & New String(chInputChars, 9, 60) & vbCrLf)

means that, for our 100,000 'records' of 10 'fields' each, the number of new strings is now reduced to 1,000,000 for the entire process, an even more significant saving.

So the loop would now become:

    Do While srInputFile.Peek() >= 0
        srInputFile.Read(chInputChars, 0, chInputChars.Length)
        sbOutPutText.Append(New String(chInputChars, 0, 8) & "," & New String(chInputChars, 9, 60) & "," & ... & vbCrLf)
    Loop

Try it and see how you get on.

This is slow:
> sbOutPutText.Append(New String(chInputChars, 0, 8) & "," & New String(chInputChars, 9, 60) & vbCrLf)

This is MUCH faster (about 300-400%):

    sbOutPutText.Append(chInputChars, 0, 8)
    sbOutPutText.Append(","c)
    sbOutPutText.Append(chInputChars, 9, 60)
    sbOutPutText.Append(vbCrLf)

Aha ... You spotted the deliberate mistake :)
Just goes to show how easy it is to throw strings about willy-nilly and not bother checking out all the overloads of methods that are available.
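The overload Mudhead points out, StringBuilder.Append(Char(), Integer, Integer), can also be combined with trimming done directly in the character buffer, so that no temporary String is created for a field that only needs to be written out. What follows is only a minimal sketch of that idea, not code from any poster in this thread: the module name FieldAppendSketch and the helper AppendTrimmed are hypothetical names introduced here, and the sample record in Main simply mirrors the 8- and 60-character fields used above.

```vbnet
Imports System
Imports System.Text

Public Module FieldAppendSketch

    ' Appends buffer(start) .. buffer(start + length - 1) to sb, skipping
    ' leading and trailing whitespace, without creating a temporary String.
    Public Sub AppendTrimmed(ByVal sb As StringBuilder, ByVal buffer As Char(), ByVal start As Integer, ByVal length As Integer)
        Dim first As Integer = start
        Dim last As Integer = start + length - 1

        ' Move 'first' forward past leading whitespace.
        Do While first <= last AndAlso Char.IsWhiteSpace(buffer(first))
            first += 1
        Loop

        ' Move 'last' backward past trailing whitespace.
        Do While last >= first AndAlso Char.IsWhiteSpace(buffer(last))
            last -= 1
        Loop

        ' Append only the non-blank part; an all-blank field appends nothing.
        If last >= first Then
            sb.Append(buffer, first, last - first + 1)
        End If
    End Sub

    Public Sub Main()
        ' Hypothetical record: 8-char field, one delimiter char, 60-char field.
        Dim record As Char() = ("AB".PadRight(8) & "?" & "Hello".PadRight(60)).ToCharArray()

        Dim sb As New StringBuilder()
        AppendTrimmed(sb, record, 0, 8)
        sb.Append(","c)
        AppendTrimmed(sb, record, 9, 60)

        Console.WriteLine(sb.ToString())   ' prints: AB,Hello
    End Sub

End Module
```

This only helps for fields that go straight to the output. A field that has to be inspected or altered as a String still needs New String(...) and Trim.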
> Take for example:
>
> Dim _s As String = (New String(chInputChars, 0, 8)).Trim
>
> I assume from your code that you may have extraneous trailing whitespace...

Yes, the "strings" do have varying amounts of whitespace that needs to be removed.

> Now the question has to be, are you doing anything else with the variables sA, sB, etc., ......

You guessed it, there are other things being done with these variables. They are being interrogated for certain values, possibly being altered, and then added to a Structure, so I do need to separate these variables out for this purpose.

One other question you might be able to answer for me -

Using the same routine, how would you extract a numeric Double in addition to Strings? Any ideas? I've started to look at Buffer.BlockCopy to take the data from the Char Array and put it into a Byte Array and then convert to a Double variable, however, I wonder if you (or anyone else) knows of a simpler method? I don't believe there's a simple "ConvertCharArrayToDouble" method; if there is, I haven't found it!!

Thank-you for all your assistance so far, it is greatly appreciated.

ShaneO

Typical Aussie - Bowl an underarm when you're not looking :)
Up to now you have given the impression that the file contained 'records' of fixed length 'fields' of purely textual data.

Now you are implying that the file contains the binary representation of various data types.

Is this the case?

If so, then you need to be reading the data from the file as bytes rather than chars or strings.

You have also implied that the input files are not so big that you can't fit an entire file into memory. Realistically, what is the biggest file (in bytes) that you need to deal with?

Assuming that you can fit it into memory then the System.IO.File.ReadAllBytes() method will read the entire file into an array of bytes in a single chunk:

    Dim _bytes As Byte() = File.ReadAllBytes(_filename)

You will also need a 'pointer' that always indicates the next byte to be dealt with:

    Dim _pointer As Integer = 0

Your processing loop now becomes:

    While _pointer < _bytes.Length

    End While

Inside the loop, for each 'field', you need to deal with the appropriate number of bytes as the expected type and advance the pointer accordingly:

    While _pointer < _bytes.Length
        sA = Encoding.ASCII.GetString(_bytes, _pointer, 8).Trim
        _pointer += 9
        sB = Encoding.ASCII.GetString(_bytes, _pointer, 60).Trim
        _pointer += 60
        ...
    End While

If you need to deal with a double then BitConverter is your friend:

    Dim _d As Double = BitConverter.ToDouble(_bytes, _pointer)
    _pointer += 8

You are not going to be able to expect lightning speed because of the processing that needs to be done.

Referring back to your original post, I note that you also imply that the 'fields' are not contiguous. It appears that the first 'field' is from position 1 thru position 8 but the second 'field' is from position 10 to position 69. If this is correct, what is the value of the byte at position 9 and what is its purpose?

Stephany Young wrote:
> Up to now you have given the impression that the file contained 'records' of fixed length 'fields' of purely textual data.
>
> Now you are implying that the file contains the binary representation of various data types.
>
> Is this the case?

Yes, for the original file it's all Text, but I'm now trying to apply my new-found StreamReader/StringBuilder methods to another file which contains both Text and Numeric fields. (I did mention I'd do this ;-) )

> If so, then you need to be reading the data from the file as bytes rather than chars or strings.

Yes, I agree.

> You have also implied that the input files are not so big that you can't fit an entire file into memory. Realistically, what is the biggest file (in bytes) that you need to deal with?

In this case, the largest file should not exceed 200MB.

> Assuming that you can fit it into memory then the System.IO.File.ReadAllBytes() method will read the entire file into an array of bytes in a single chunk:
>
> Dim _bytes As Byte() = File.ReadAllBytes(_filename)

Hmmmm..... Not all machines have sufficient "available" memory.

> You are not going to be able to expect lightning speed because of the processing that needs to be done.

As also mentioned in an earlier post, an existing VB6 app (which I developed some time ago) does this VERY fast, however the almost identical code in VB.NET is a dog. I have faith that, with the right method, VB.NET will equal or exceed the performance I've experienced under VB6. It's already been proven with the earlier code you so graciously provided. And before anyone asks, I'm converting this app to VB.NET because the clients require some considerable enhancements to the original app and .NET is clearly the best option for what they need. It's just letting me down in this single area.

> Referring back to your original post, I note that you also imply that the 'fields' are not contiguous. It appears that the first 'field' is from position 1 thru position 8 but the second 'field' is from position 10 to position 69. If this is correct, what is the value of the byte at position 9 and what is its purpose?

The datafile is a proprietary format (hence the need for proprietary Read methods) with fixed field lengths and uses chr$(0) as a delimiter between fields.

I certainly don't expect you to write the code for me; your help so far has been invaluable, as until now I've been reluctant to venture into the StreamReader and StringBuilder methods. I can clearly see the path I need to take from here. I guess I just needed a push in the right direction.

ShaneO

Back to the original post again.
So the 'fields' in a 'record' are separated by NUL (&H0, 0x0, Chr(0), Chr$(0), or whatever you want to call it).

Are the 'records' separated by anything special, perhaps 2 consecutive NULs?

If so then I would be inclined to take a completely different approach that would be far more efficient.

Stephany Young wrote:
> Back to the original post again.
>
> So the 'fields' in a 'record' are separated by NUL (&H0, 0x0, Chr(0),
> Chr$(0), or whatever you want to call it).
>
> Are the 'records' separated by anything special, perhaps 2 consecutive
> NULs?

No, the last field of one record is only separated by &H0 from the first field of the next record. Each record is of a fixed length however. You're considering Binary Block-Grabs at the data?

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

And the last 'record' in the file obeys the fixed-length rule, i.e., the last byte in the file is &H0?

And the next question is, what does the varying amount of whitespace consist of? Is it a sequence of spaces (&H20)? If so, does a sequence of 2 spaces occur anywhere else other than within the 'whitespace' areas?

Stephany Young wrote:
> And the last 'record' in the file obeys the fixed-length rule, i.e., the
> last byte in the file is &H0?

Yes, and the file-sizes are evenly divisible by the number of Records.

> And the next question is, what does the varying amount of whitespace consist
> of? Is it a sequence of spaces (&H20)? If so, does a sequence of 2 spaces
> occur anywhere else other than within the 'whitespace' areas?

Yes, whitespaces are &H20. No, sequential &H20's only occur in the Text areas as whitespace, but I can't rule-out &H20 appearing in consecutive bytes within a numeric field.

If it's of any consequence, "Deleted" records are filled with &HFF, but still delimited with &H0.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

Whoa up!!!!!!!
We're talking about your textual file here, not your binary file.

How can &H20's appear in consecutive bytes within a numeric field? Seeing as all your fields are strings, that's a contradiction in terms.

I assume that your "Deleted" records comment also applies to your binary file and not your textual file.

By your definition thus far, a 'record' in your file looks like:

AAAAAAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0} .... {0}

where {0} represents the NUL delimiter.

A record may also look like:

AAAA    {0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0} .... {0}

but will never look like:

AA  AAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0} .... {0}

The way I would be inclined to approach this is:

    Dim _sb As New StringBuilder
    Dim _br As New BinaryReader(File.Open(infileName, FileMode.Open))

    While _br.PeekChar() <> -1
        ' Read one fixed-length record, drop the trailing NUL, split on NUL
        Dim _fields As String() = Encoding.ASCII.GetString(_br.ReadBytes(_recsize)).TrimEnd(Char.MinValue).Split(Char.MinValue)
        For _i As Integer = 0 To _fields.Length - 1
            _fields(_i) = _fields(_i).Trim
            ' Do whatever else needs doing with the 'field'
        Next
        _sb.AppendLine(String.Join(",", _fields))
    End While

    _br.Close()
    File.WriteAllText(outfilename, _sb.ToString)

_br.PeekChar() will return -1 when there are no more characters in the stream.

Encoding.ASCII.GetString(_br.ReadBytes(_recsize)) reads the specified number of bytes from the stream (advancing the stream pointer) and converts them to a string.

TrimEnd(Char.MinValue) strips the final &H0 from the string.

Split(Char.MinValue) returns an array of strings from the string, using &H0 as the split delimiter.

The inner loop is used to trim any whitespace from all the strings in the array and makes each 'field' available for further processing. If the value of the 'field' needs to be modified, it can be done here.

String.Join(",", _fields) creates a comma-delimited string comprising all the array elements.
The AppendLine method of the StringBuilder appends the supplied string and automatically adds a NewLine.

Now all your variables (sA, sB, etc.) are not required, nor do you have to worry about where a given 'field' starts or how long it is. Each 'field' is determined by its (0-based) position in the array, i.e., sA equates to position 0, sB equates to position 1, etc.

Stephany Young wrote:
> Whoa up!!!!!!!
>
> We're talking about your textual file here, not your binary file.

Sorry, I forgot!

> I assume that your "Deleted" records comment also applies to your binary
> file and not your textual file.

No. Both the Binary & Textual files have &HFF as identifiers of a Deleted Record. (I had to modify the .ASCII to be .UTF8 because of this).

> By your definition thus far, a 'record' in your file looks like:
>
> AAAAAAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
> ... {0}
>
> where {0} represents the NUL delimiter.
>
> A record may also look like:
>
> AAAA    {0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
> ... {0}
>
> but will never look like:
>
> AA  AAAA{0}BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB{0}
> ... {0}

All the above is perfectly correct! But it can look like -

AAAA {0}BBBBB BBBBBBBBBBBBBB {0}... {0}

Which is why I read to the End of the Field, not just to the first &H20. (The above example demonstrates that I'm reading Customer/Supplier Codes followed by Customer/Supplier Names)

> The way I would be inclined to approach this is:

OK, it's going to take a bit of time to test what you've provided, but I will certainly see how it goes. In another post, I've already mentioned the amazing results of simply using the ReadAllBytes approach, at least for the smaller Text files, so the options here may be somewhat moot, but it will no doubt stand me in good stead for the Binary files!

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

Stephany Young wrote:
> Typical Aussie - Bowl an underarm when you're not looking :)

Oh, and as you're a Kiwi (?), I might sometimes refer to VBsux! :-)

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

And you can have fush n chups but not VBsex !!!!!!!!!!!!!!
Stephany Young wrote:
> And you can have fush n chups but not VBsex !!!!!!!!!!!!!!

That's very funny - So funny I nearly fell off my Chilly-Bin!!!

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

That's not surprising seeing as it doesn't take much to amuse an Aussie :)
Stephany Young wrote:
> Dim _bytes As Byte() = File.ReadAllBytes(_filename)
>
> You will also need a 'pointer' that always indicates the next byte to be
> dealt with:
>
> Dim _pointer As Integer = 0
>
> Your processing loop now becomes:
>
> While _pointer < _bytes.Length
> End While

I've just tested this method on the "smaller" files (up to 20MB) and it's working even better than the StreamReader method! It now only takes 1/3 the amount of time of that method.

Keep this up and the application will be so fast it will extract the data even before it's been launched!!!!!!

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.

Shane,
I have not any idea about the time aspect, but I surely would use the binary reader in your case.

http://msdn2.microsoft.com/en-us/library/system.io.binaryreader.aspx

I hope this helps,

Cor

Cor Ligthert [MVP] wrote:
> Shane,
>
> I have not any idea about the time aspect, but I surely would use the binary
> reader in your case.
>
> http://msdn2.microsoft.com/en-us/library/system.io.binaryreader.aspx
>
> I hope this helps,
>
> Cor

Thank-you Cor, I was however able to resolve this one with help from Stephany (see post above). There is certainly some merit in using the Binary Reader, and I did try this also, but for absolute simplicity (and speed) the StreamReader solution has resulted in exactly what I wanted.

Regards,

ShaneO

There are 10 kinds of people - Those who understand Binary and those who don't.
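
For anyone landing on this thread later, here is a minimal sketch of the record-at-a-time approach discussed above, written so that a 200MB file never has to sit in memory all at once. It is not the code either poster ended up with. The names RecordDump, ParseRecord, ReadDoubleField and ConvertFile are made up for illustration, and it rests on several assumptions: records are fixed length, text fields are delimited by &H0, a record is treated as "Deleted" when its first byte is &HFF, and any Double is stored as 8 bytes in the machine's native byte order at a known offset. The bytes are decoded as Latin-1 (code page 28591) rather than ASCII or UTF8, because Latin-1 maps every byte value to exactly one Char.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RecordDump

    ' Splits one fixed-length record into trimmed text fields.
    ' Assumption: fields are delimited by NUL (&H0) and the record ends with a NUL.
    Function ParseRecord(ByVal recordBytes As Byte(), ByVal count As Integer) As String()
        ' Latin-1 maps each byte to one Char, so no byte value is lost in decoding.
        Dim raw As String = Encoding.GetEncoding(28591).GetString(recordBytes, 0, count)
        Dim fields As String() = raw.TrimEnd(Char.MinValue).Split(Char.MinValue)
        For i As Integer = 0 To fields.Length - 1
            fields(i) = fields(i).Trim()
        Next
        Return fields
    End Function

    ' Hypothetical helper for the mixed text/numeric file.
    ' Assumption: the Double occupies 8 bytes starting at a known offset.
    Function ReadDoubleField(ByVal recordBytes As Byte(), ByVal offset As Integer) As Double
        Return BitConverter.ToDouble(recordBytes, offset)
    End Function

    ' Reads the input one record at a time and writes one comma-separated line per record.
    Sub ConvertFile(ByVal inFileName As String, ByVal outFileName As String, ByVal recordSize As Integer)
        Dim recordBytes(recordSize - 1) As Byte
        Using inStream As New FileStream(inFileName, FileMode.Open, FileAccess.Read)
            Using outWriter As New StreamWriter(outFileName)
                Do
                    ' Stream.Read may return fewer bytes than requested, so keep reading
                    ' until the record is full or the end of the file is reached.
                    Dim filled As Integer = 0
                    While filled < recordSize
                        Dim n As Integer = inStream.Read(recordBytes, filled, recordSize - filled)
                        If n = 0 Then Exit While
                        filled += n
                    End While
                    If filled < recordSize Then Exit Do

                    ' Assumption: a deleted record starts with &HFF; skip it.
                    If recordBytes(0) = &HFF Then Continue Do

                    outWriter.WriteLine(String.Join(",", ParseRecord(recordBytes, filled)))
                Loop
            End Using
        End Using
    End Sub

End Module
```

A call would look like ConvertFile(sInputFile, sOutputFile, iRecordSize), where sOutputFile is a hypothetical output path. Note that splitting on &H0 is only safe for the all-text file: the 8 bytes of a Double can themselves contain &H0, so for the mixed file the fields would have to be picked out by fixed offset (as ReadDoubleField does) rather than by Split.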