Home All Groups Group Topic Archive Search About
Author
17 Mar 2006 12:22 AM
William Foster
Good morning all,

I have an issue where I am importing a large CSV into Visual Studio 2005
- Visual Basic; as always the creators of the CSV have included the
details of the data at the end file (Some 1.6 million records from the
beginning). I am doing some evaluation on the file before I undertake
the importation, currently this is being done by reading all rows until
I get to the end then returning the last 10 or so rows; does anyone know
how I can read from the bottom of the file up to negate the 1.6 million
cycles of the read procedure before I can get to the data I need.

Any assistance you may be able to provide would be greatly appreciated.

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***

Author
17 Mar 2006 1:28 AM
ShaneO
William Foster wrote:
Show quoteHide quote
> Good morning all,
>
> I have an issue where I am importing a large CSV into Visual Studio 2005
> - Visual Basic; as always the creators of the CSV have included the
> details of the data at the end file (Some 1.6 million records from the
> beginning). I am doing some evaluation on the file before I undertake
> the importation, currently this is being done by reading all rows until
> I get to the end then returning the last 10 or so rows; does anyone know
> how I can read from the bottom of the file up to negate the 1.6 million
> cycles of the read procedure before I can get to the data I need.
>
> Any assistance you may be able to provide would be greatly appreciated.
>
> Yours sincerely,
>
> William Foster
>
> *** Sent via Developersdex http://www.developersdex.com ***

There's probably a simpler way, but I've done it before by first opening
the file in Binary mode, and as CSV files are generally line-delimited
by CrLf (Hex 0D 0A) characters, I start by obtaining the file length and
reading backwards (using SEEK) to find the second-last CrLf.

Between the second-last and the last CrLf would be your header
information you were looking for.

I trust this helps, although as already said, maybe someone else has a
better way.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
Author
17 Mar 2006 3:04 AM
William Foster
Shane,

Thanks for your assistance, I will give that a go, it sounds a bit
awkward, but it is much better than the way I am doing it.

Thank you !

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***
Author
17 Mar 2006 3:40 AM
ShaneO
William Foster wrote:
> Shane,
>
> Thanks for your assistance, I will give that a go, it sounds a bit
> awkward, but it is much better than the way I am doing it.
>
> Thank you !
>
> Yours sincerely,
>
> William Foster
>
> *** Sent via Developersdex http://www.developersdex.com ***

Hello William,

For what it's worth, I've found the code I mentioned earlier.

You'll have to excuse the syntax, it was just a quick and dirty routine
I needed for a one-off.

If you copy & paste it (watch for line-wrapping!), then set the
filename, you'll find it will display your Header data as required.


Dim blPossibleEOL As Boolean = False
Dim byTemp As Byte = 0
Dim iFileNum As UInt16 = FreeFile()
Dim sA As String = "YOUR FILENAME HERE!!!"
Dim sHeader As String = ""

Dim iFileEnd As Integer = FileLen(sA) - 2
Dim iFileSeekPosition As Integer = iFileEnd
FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read)

Do
     FileGet(iFileNum, byTemp, iFileSeekPosition)
     If blPossibleEOL Then
         If byTemp = &HD Then
             sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ")
             FileGet(iFileNum, sHeader, iFileSeekPosition + 2)
             Exit Do
         End If
     ElseIf byTemp = &HA Then
         blPossibleEOL = True
     Else
         blPossibleEOL = False
     End If
     iFileSeekPosition -= 1
Loop
FileClose(iFileNum)

Debug.Print(sHeader)


Hope this helps.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
Author
17 Mar 2006 7:26 AM
William Foster
Shane,

Thanks a lot ! That will save me a fair bit of work.

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***
Author
18 Mar 2006 12:21 PM
William Foster
Shane,

I tried that routine out and it worked beautifully; it was amazingly
quick to search through on of my data files which was in excess of
700,000 rows.

I have just one more question, now that I have the last row, do you have
any ideas on how to capture the last sixteen rows?  I have tried a few
different things with your existing code but found that the FileGet must
work on some type of algorithm as it doesn't read sequentially and in
order to find the end of the file it only did 26 loops within the whole
file.

Thank you very much for your assistance so far it has been invaluable.

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***
Author
18 Mar 2006 10:09 PM
ShaneO
William Foster wrote:
Show quoteHide quote
> Shane,
>
> I tried that routine out and it worked beautifully; it was amazingly
> quick to search through on of my data files which was in excess of
> 700,000 rows.
>
> I have just one more question, now that I have the last row, do you have
> any ideas on how to capture the last sixteen rows?  I have tried a few
> different things with your existing code but found that the FileGet must
> work on some type of algorithm as it doesn't read sequentially and in
> order to find the end of the file it only did 26 loops within the whole
> file.
>
> Thank you very much for your assistance so far it has been invaluable.
>
> Yours sincerely,
>
> William Foster
>
> *** Sent via Developersdex http://www.developersdex.com ***

William,

The following code will do what you want.  (Watch for line-wrapping!) -


Dim blPossibleEOL As Boolean = False
Dim byTemp As Byte = 0
Dim iFileNum As UInt16 = FreeFile()
Dim sA As String = "YOUR FILENAME HERE!!"
Dim sHeader As String = ""
Dim iCounter As UInt16 = 0

Dim iFileEnd As Integer = FileLen(sA) - 2
Dim iFileSeekPosition As Integer = iFileEnd
FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read)

Do
     FileGet(iFileNum, byTemp, iFileSeekPosition)
     If blPossibleEOL Then
         If byTemp = &HD Then
             iCounter += 1
             If iCounter = 26 Then 'Change this for whatever number of
lines you are wanting.
                 sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ")
                 FileGet(iFileNum, sHeader, iFileSeekPosition + 2)
                 Exit Do
             Else
                 blPossibleEOL = False
             End If
         Else
             blPossibleEOL = False
         End If
     ElseIf byTemp = &HA Then
         blPossibleEOL = True
     Else
         blPossibleEOL = False
     End If
     iFileSeekPosition -= 1
Loop
FileClose(iFileNum)

Debug.Print(sHeader)

One thing to be careful of is that I have declared both "iFileEnd" and
"iFileSeekPosition" as Integer variables.  If the CSV file you're
reading is larger that approx 2GB you'll need to change that to Long
Integer (UInt64).

Hope this helps.

ShaneO

There are 10 kinds of people - Those who understand Binary and those who
don't.
Author
17 Mar 2006 2:09 AM
Homer J Simpson
"William Foster" <nospam@devdex.com> wrote in message
news:%23tX3vjVSGHA.424@TK2MSFTNGP12.phx.gbl...
> Good morning all,
>
> I have an issue where I am importing a large CSV into Visual Studio 2005
> - Visual Basic; as always the creators of the CSV have included the
> details of the data at the end file (Some 1.6 million records from the
> beginning). I am doing some evaluation on the file before I undertake
> the importation, currently this is being done by reading all rows until
> I get to the end then returning the last 10 or so rows; does anyone know
> how I can read from the bottom of the file up to negate the 1.6 million
> cycles of the read procedure before I can get to the data I need.

Have you tried MSWord?
Author
17 Mar 2006 3:06 AM
William Foster
Homer,

I was previously using Microsoft Excel, but I need to move to Visual
Studio for a number of reasons.  In Microsoft Word I would still need to
seek the last record by reading all rows anyway wouldn't I ? Or is there
a quick way ?

Thanks for you assistance.

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***
Author
17 Mar 2006 4:25 AM
Homer J Simpson
"William Foster" <nospam@devdex.com> wrote in message
news:eUA9U$WSGHA.4264@TK2MSFTNGP11.phx.gbl...

> Homer,
>
> I was previously using Microsoft Excel, but I need to move to Visual
> Studio for a number of reasons.  In Microsoft Word I would still need to
> seek the last record by reading all rows anyway wouldn't I ? Or is there
> a quick way ?

I believe you could just go <CTRL>-<END> and get the last several lines.
Author
17 Mar 2006 7:31 AM
William Foster
Homer,

Thanks for the tip, however, I need to do it progammatically as it is
part of a larger program to clean a file, this first bit just assesses
the data.  There are in excess of 1.6 million records, with
73 fields per record so opening it up into Microsoft Word and then
getting the program to scroll to the bottom with your method will result
in a computer crash.   Kind of the way that Microsoft Excel is currently
dying whcih is why I am moving to Visual Studio.

Thanks anyway !

Yours sincerely,

William Foster

*** Sent via Developersdex http://www.developersdex.com ***
Author
17 Mar 2006 6:40 PM
Homer J Simpson
"William Foster" <nospam@devdex.com> wrote in message
news:eBajbTZSGHA.4384@tk2msftngp13.phx.gbl...

> Thanks for the tip, however, I need to do it progammatically as it is
> part of a larger program to clean a file, this first bit just assesses
> the data.  There are in excess of 1.6 million records, with
> 73 fields per record so opening it up into Microsoft Word and then
> getting the program to scroll to the bottom with your method will result
> in a computer crash.   Kind of the way that Microsoft Excel is currently
> dying whcih is why I am moving to Visual Studio.

OK. If it was a one off I'd try Word. You could also do this with vi on a
Unix system.

Otherwise the best method is to open the file, go to the end, back up some
and then read the data in. This is what Word and vi can do as both can
handle files far larger than main memory.