|
web
newsgroups
|
|||||||||||||||||||||||
|
|||||||||||||||||||||||
CSV File ImportGood morning all,
I have an issue where I am importing a large CSV into Visual Studio 2005 - Visual Basic; as always the creators of the CSV have included the details of the data at the end file (Some 1.6 million records from the beginning). I am doing some evaluation on the file before I undertake the importation, currently this is being done by reading all rows until I get to the end then returning the last 10 or so rows; does anyone know how I can read from the bottom of the file up to negate the 1.6 million cycles of the read procedure before I can get to the data I need. Any assistance you may be able to provide would be greatly appreciated. Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** William Foster wrote:
Show quoteHide quote > Good morning all, There's probably a simpler way, but I've done it before by first opening > > I have an issue where I am importing a large CSV into Visual Studio 2005 > - Visual Basic; as always the creators of the CSV have included the > details of the data at the end file (Some 1.6 million records from the > beginning). I am doing some evaluation on the file before I undertake > the importation, currently this is being done by reading all rows until > I get to the end then returning the last 10 or so rows; does anyone know > how I can read from the bottom of the file up to negate the 1.6 million > cycles of the read procedure before I can get to the data I need. > > Any assistance you may be able to provide would be greatly appreciated. > > Yours sincerely, > > William Foster > > *** Sent via Developersdex http://www.developersdex.com *** the file in Binary mode, and as CSV files are generally line-delimited by CrLf (Hex 0D 0A) characters, I start by obtaining the file length and reading backwards (using SEEK) to find the second-last CrLf. Between the second-last and the last CrLf would be your header information you were looking for. I trust this helps, although as already said, maybe someone else has a better way. ShaneO There are 10 kinds of people - Those who understand Binary and those who don't. Shane,
Thanks for your assistance, I will give that a go, it sounds a bit awkward, but it is much better than the way I am doing it. Thank you ! Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** William Foster wrote:
> Shane, Hello William,> > Thanks for your assistance, I will give that a go, it sounds a bit > awkward, but it is much better than the way I am doing it. > > Thank you ! > > Yours sincerely, > > William Foster > > *** Sent via Developersdex http://www.developersdex.com *** For what it's worth, I've found the code I mentioned earlier. You'll have to excuse the syntax, it was just a quick and dirty routine I needed for a one-off. If you copy & paste it (watch for line-wrapping!), then set the filename, you'll find it will display your Header data as required. Dim blPossibleEOL As Boolean = False Dim byTemp As Byte = 0 Dim iFileNum As UInt16 = FreeFile() Dim sA As String = "YOUR FILENAME HERE!!!" Dim sHeader As String = "" Dim iFileEnd As Integer = FileLen(sA) - 2 Dim iFileSeekPosition As Integer = iFileEnd FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read) Do FileGet(iFileNum, byTemp, iFileSeekPosition) If blPossibleEOL Then If byTemp = &HD Then sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ") FileGet(iFileNum, sHeader, iFileSeekPosition + 2) Exit Do End If ElseIf byTemp = &HA Then blPossibleEOL = True Else blPossibleEOL = False End If iFileSeekPosition -= 1 Loop FileClose(iFileNum) Debug.Print(sHeader) Hope this helps. ShaneO There are 10 kinds of people - Those who understand Binary and those who don't. Shane,
Thanks a lot ! That will save me a fair bit of work. Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** Shane,
I tried that routine out and it worked beautifully; it was amazingly quick to search through on of my data files which was in excess of 700,000 rows. I have just one more question, now that I have the last row, do you have any ideas on how to capture the last sixteen rows? I have tried a few different things with your existing code but found that the FileGet must work on some type of algorithm as it doesn't read sequentially and in order to find the end of the file it only did 26 loops within the whole file. Thank you very much for your assistance so far it has been invaluable. Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** William Foster wrote:
Show quoteHide quote > Shane, William,> > I tried that routine out and it worked beautifully; it was amazingly > quick to search through on of my data files which was in excess of > 700,000 rows. > > I have just one more question, now that I have the last row, do you have > any ideas on how to capture the last sixteen rows? I have tried a few > different things with your existing code but found that the FileGet must > work on some type of algorithm as it doesn't read sequentially and in > order to find the end of the file it only did 26 loops within the whole > file. > > Thank you very much for your assistance so far it has been invaluable. > > Yours sincerely, > > William Foster > > *** Sent via Developersdex http://www.developersdex.com *** The following code will do what you want. (Watch for line-wrapping!) - Dim blPossibleEOL As Boolean = False Dim byTemp As Byte = 0 Dim iFileNum As UInt16 = FreeFile() Dim sA As String = "YOUR FILENAME HERE!!" Dim sHeader As String = "" Dim iCounter As UInt16 = 0 Dim iFileEnd As Integer = FileLen(sA) - 2 Dim iFileSeekPosition As Integer = iFileEnd FileOpen(iFileNum, sA, OpenMode.Binary, OpenAccess.Read) Do FileGet(iFileNum, byTemp, iFileSeekPosition) If blPossibleEOL Then If byTemp = &HD Then iCounter += 1 If iCounter = 26 Then 'Change this for whatever number of lines you are wanting. sHeader = StrDup(iFileEnd - (iFileSeekPosition + 1), " ") FileGet(iFileNum, sHeader, iFileSeekPosition + 2) Exit Do Else blPossibleEOL = False End If Else blPossibleEOL = False End If ElseIf byTemp = &HA Then blPossibleEOL = True Else blPossibleEOL = False End If iFileSeekPosition -= 1 Loop FileClose(iFileNum) Debug.Print(sHeader) One thing to be careful of is that I have declared both "iFileEnd" and "iFileSeekPosition" as Integer variables. If the CSV file you're reading is larger that approx 2GB you'll need to change that to Long Integer (UInt64). Hope this helps. ShaneO There are 10 kinds of people - Those who understand Binary and those who don't. "William Foster" <nospam@devdex.com> wrote in message Have you tried MSWord?news:%23tX3vjVSGHA.424@TK2MSFTNGP12.phx.gbl... > Good morning all, > > I have an issue where I am importing a large CSV into Visual Studio 2005 > - Visual Basic; as always the creators of the CSV have included the > details of the data at the end file (Some 1.6 million records from the > beginning). I am doing some evaluation on the file before I undertake > the importation, currently this is being done by reading all rows until > I get to the end then returning the last 10 or so rows; does anyone know > how I can read from the bottom of the file up to negate the 1.6 million > cycles of the read procedure before I can get to the data I need. Homer,
I was previously using Microsoft Excel, but I need to move to Visual Studio for a number of reasons. In Microsoft Word I would still need to seek the last record by reading all rows anyway wouldn't I ? Or is there a quick way ? Thanks for you assistance. Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** "William Foster" <nospam@devdex.com> wrote in message I believe you could just go <CTRL>-<END> and get the last several lines.news:eUA9U$WSGHA.4264@TK2MSFTNGP11.phx.gbl... > Homer, > > I was previously using Microsoft Excel, but I need to move to Visual > Studio for a number of reasons. In Microsoft Word I would still need to > seek the last record by reading all rows anyway wouldn't I ? Or is there > a quick way ? Homer,
Thanks for the tip, however, I need to do it progammatically as it is part of a larger program to clean a file, this first bit just assesses the data. There are in excess of 1.6 million records, with 73 fields per record so opening it up into Microsoft Word and then getting the program to scroll to the bottom with your method will result in a computer crash. Kind of the way that Microsoft Excel is currently dying whcih is why I am moving to Visual Studio. Thanks anyway ! Yours sincerely, William Foster *** Sent via Developersdex http://www.developersdex.com *** "William Foster" <nospam@devdex.com> wrote in message OK. If it was a one off I'd try Word. You could also do this with vi on a news:eBajbTZSGHA.4384@tk2msftngp13.phx.gbl... > Thanks for the tip, however, I need to do it progammatically as it is > part of a larger program to clean a file, this first bit just assesses > the data. There are in excess of 1.6 million records, with > 73 fields per record so opening it up into Microsoft Word and then > getting the program to scroll to the bottom with your method will result > in a computer crash. Kind of the way that Microsoft Excel is currently > dying whcih is why I am moving to Visual Studio. Unix system. Otherwise the best method is to open the file, go to the end, back up some and then read the data in. This is what Word and vi can do as both can handle files far larger than main memory. |
|||||||||||||||||||||||