Author
25 Mar 2005 9:05 PM
GeorgeAtkins
Using VB.NET in VS 2003.
This should be a simple routine, but it has me flummoxed.
I have to compare strings in two text files:
FILE1 (srAR) consists of lines of book titles.
FILE2 (srAC) consists of multi-line “book records”, separated by blank rows.
The number of rows in each “book record” varies, as in this diagram:

1xxxxxx
xxxxxx
TitleRow
xxxxxx

2xxxxxx
TitleRow
xxxxxx
xxxxxx
xxxxxx


All I’m trying to do is to first read a title in File1, then read through
the entire File2 file, one “book record” at a time, looking for a matching
title. After other processing (not relevant here) and all book records have
been searched, go to the next File1 title and read all the book records in
File2 again. In short:  typical looping within a loop.

The problem is that I cannot get it to work correctly! I know the issue
involves the blank rows and how the streamreader works. Here are core
excerpts of my code. Any enlightenment is greatly appreciated!

Dim RowCntr, ARBldr, ARCntr, MARCRowCntr, z, StartPos, TabPos As Long
Dim C1, C2, BkTitle, ArTitle, Arline, Acline, ArWtr, AcWtr, arbk As String
Dim srAR As System.IO.StreamReader = New StreamReader(FILE1.txt, Encoding.GetEncoding(1252))
Dim srAC As System.IO.StreamReader = New StreamReader(FILE2.txt, Encoding.GetEncoding(1252))

' Loop through the AR file (FILE1), reading each book title
  For ARCntr = 0 To RdrArray.GetUpperBound(1)
      MarcCntr = 1 ' initialize marc record counter variable
      ' Read through FILE2, one "book record" at a time.
      Do
         RowCntr = 1 ' reset  variable for next book record row counter
         ' Now, the Inner loop supposed to read all lines for a single book record.
         Do
            Acline = srAC.ReadLine
            LibList.Add(Acline) ' new field row
            If Acline = "" Then ' found the blank row
               LibList.Add(vbCr) ' new field row
               RowCntr += 1  ' set value of RowCntr
            End If
         Loop Until srAC.Peek = -1
      Loop Until srAC.Peek = -1 ' of first DO. Get another book record
  Next ARCntr  ' of the original FOR loop. Get another book title

I hope this is enough information to work with. I'll reply with more info if
necessary.
Thanks again for any help!

George

Author
26 Mar 2005 9:11 PM
Rod Gill
Hi,

Can't give you code (I'm learning vb.net myself!) but I would:

Read the second file into memory ignoring all blank rows and ignoring any
non-title rows if possible.
I would save possible title rows into a collection you define to hold all
the data you need
Then you can search the list letting the code behind the collection do all
the searching for you. It should also run a lot faster.
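
A minimal VB.NET sketch of that idea, offered as an illustration rather than tested production code. It assumes .NET 2.0 or later (generic collections are not available in VS 2003, where an ArrayList would have to stand in), and it assumes title rows can be recognised by a prefix. The thread never says how a title row is identified, so the titlePrefix parameter and the names LoadTitles and HasTitle are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module TitleListSketch
    ' Collects only the title rows, skipping blank rows and every other row.
    ' titlePrefix is a hypothetical marker for a title row (an assumption).
    Function LoadTitles(ByVal reader As TextReader, ByVal titlePrefix As String) As List(Of String)
        Dim titles As New List(Of String)
        Dim line As String = reader.ReadLine()
        Do While line IsNot Nothing ' ReadLine returns Nothing at end of input
            If line.StartsWith(titlePrefix, StringComparison.Ordinal) Then
                titles.Add(line.Substring(titlePrefix.Length).Trim())
            End If
            line = reader.ReadLine()
        Loop
        Return titles
    End Function

    ' Lets the collection do the searching (exact match only).
    Function HasTitle(ByVal titles As List(Of String), ByVal wanted As String) As Boolean
        Return titles.Contains(wanted)
    End Function
End Module
```

Only the title text is held in memory here, not the full records, so the list stays small even for a large second file.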

--

Rod Gill


Author
27 Mar 2005 2:27 AM
GeorgeAtkins
Thanks for the reply, Rod.
Unfortunately, there are reasons I cannot do this (otherwise, it would have
been a no-brainer). And I probably didn't go into sufficient detail about
this, so I apologize:

1. These "book records" must be written back out to a text file to be
imported into a database; hence, removing blank rows, extracting only certain
rows, etc., will do no good.
2. As for the blank rows, they must remain, as they are the legal
"separators" between the records. Otherwise, the import process (which also
involves a file format change that we don't need to talk about here) will
fail. However, I have tried stopping the loop when it hits a blank row and
programmatically putting the blank row back in before it is written out to a
file.

And the reason I'm trying to read these records one at a time is that I'm
dealing with tens of thousands of them and I fear running out of memory if I
try to read them all at one time. Remember each "record" contains numerous
rows of data.

In fact, after I finish processing a single record, I clear the array and
read in another record to process.

Perhaps another way for me to ask this question is this:

With the outside For loop, looking through the AR book title files, what is
the most effective way for me to run an inner loop to read through the book
records file, one "book record" of rows at a time?

If anything else comes to mind, don't hesitate to respond! Thanks for your
ideas and time, Rod!

George


Author
27 Mar 2005 7:09 AM
Nick Malik [Microsoft]
Your logic is strange.

Best to do this as pseudo-code first.  Then convert to the language.  I also
suggest that you pull out a separate function to get the group of records
from the second file and return only the book title.

    Open File1
    Read a record from File1
    Do while not EOF(File1)
        FoundIt = false
        Open File2
        BookTitle = GetBookTitle(File2)
        Do while not EOF(File2)
            if (File1.Record == BookTitle) then
                FoundIt = true
                break out of inner loop
            end if
            BookTitle = GetBookTitle(File2)
        Loop
        if (FoundIt) then
            ' do your processing
        end if
        Close File2
        Read a record from File1
    Loop


The inner call: GetBookTitle is pretty simple at this point:
      Function GetBookTitle(file File2) as string
           GetBookTitle = blank   ' make sure you return a valid value
           read a line from File2
           do while a line was read and the line is not blank
              save line to array or other object
              if line is the title
                 GetBookTitle = line read
              end if
              read a line from File2
           loop
      End Function

Hopefully this will clear up the log jam.
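
The record-reading step above can be sketched in VB.NET roughly as follows. This is an illustration under stated assumptions, not a drop-in fix: it assumes .NET 2.0 or later (List(Of String) and Using do not exist in VS 2003), and ReadRecord and CountRecords are hypothetical names. The one library fact it relies on is that ReadLine returns Nothing at the end of the input, so the loop tests for that rather than Peek. In the posted excerpt the inner loop runs until Peek = -1, which consumes the whole file instead of stopping at the blank row, and the reader is never reopened for the next title.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module RecordReaderSketch
    ' Reads the next record: every line up to the next blank line or end of input.
    ' Returns Nothing when no record is left. Blank lines before a record are
    ' skipped, so doubled separators do not produce empty records.
    Function ReadRecord(ByVal reader As TextReader) As List(Of String)
        Dim rows As New List(Of String)
        Dim line As String = reader.ReadLine()
        Do While line IsNot Nothing
            If line.Trim().Length = 0 Then
                If rows.Count > 0 Then Exit Do ' separator closes the record
            Else
                rows.Add(line)
            End If
            line = reader.ReadLine()
        Loop
        If rows.Count = 0 Then Return Nothing
        Return rows
    End Function

    ' Counts the records in a block of text; the loop body is where the
    ' per-record processing would go.
    Function CountRecords(ByVal text As String) As Integer
        Dim count As Integer = 0
        Using reader As New StringReader(text)
            Dim record As List(Of String) = ReadRecord(reader)
            Do While record IsNot Nothing
                count += 1
                record = ReadRecord(reader)
            Loop
        End Using
        Return count
    End Function
End Module
```

With a file, the same ReadRecord call works on a StreamReader, since StreamReader inherits from TextReader. Opening a fresh reader for each File1 title gives the "read File2 again" behaviour, and because the separator is not stored in the record, a blank line has to be written after each record on output.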

BTW: if you are searching for items from the first set in the second set,
why not just load both of them into database tables and then do a simple
join?  It's a LOT more efficient and would go much quicker!

You mention also that these files can get big.  This algorithm, while it
will solve the immediate problem, will present you with a new one: it is not
efficient at all.  You will reread the entire large File 2 for each record
in File 1.  I don't know how many records are in File 1 or File 2, but if
you are afraid of running out of memory, I'd mention that this method will
have you running out of processor time and will put an amazing strain on the
garbage collector!

If you must do this kind of processing, is there any way you can presort
both files?  If so, you won't have to read each file from the beginning.
You can read from File 2 until you pass where the first File 1 record would
occur alphabetically.  Then, if you don't find record 1, you move forward to
record 2 and read forward from there.  Each file is read once.  This is
considerably more efficient.  (It's how we used to do this in the old days,
when every cycle on a computer was counted and charged back to the person
running the job.)
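
A minimal VB.NET sketch of the presorted approach, assuming both title lists are already sorted with the same ordinal ordering and that titles match exactly. It works on in-memory lists only to keep the example short; MatchSorted and its parameter names are hypothetical, and generics require .NET 2.0 or later.

```vbnet
Imports System
Imports System.Collections.Generic

Module SortedMergeSketch
    ' Walks two ordinally sorted title lists in step and returns the titles
    ' that appear in both. Each list is traversed once.
    Function MatchSorted(ByVal wanted As List(Of String), ByVal available As List(Of String)) As List(Of String)
        Dim matches As New List(Of String)
        Dim i As Integer = 0
        Dim j As Integer = 0
        Do While i < wanted.Count AndAlso j < available.Count
            Dim order As Integer = String.CompareOrdinal(wanted(i), available(j))
            If order = 0 Then
                matches.Add(wanted(i))
                i += 1 ' keep j, so a repeated wanted title matches again
            ElseIf order < 0 Then
                i += 1 ' this wanted title is not in the second list
            Else
                j += 1 ' the second list has not reached the wanted title yet
            End If
        Loop
        Return matches
    End Function
End Module
```

The same stepping logic applies when the two inputs are read from sorted files instead of lists: advance whichever side holds the smaller title.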

The best approach is still to load both tables into a database and do a join.

HTH,
--
--- Nick Malik [Microsoft]
    MCSD, CFPS, Certified Scrummaster
    http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
   I do not answer questions on behalf of my employer.  I'm just a
programmer helping programmers.
--
Author
27 Mar 2005 11:51 PM
GeorgeAtkins
A great reply, Nick. Thanks! You have some good ideas to consider.

However, let me try to clarify a few things. Sorry I did not mention all of
this earlier, but I didn't want my message to go on and on, and I thought
that my problem was simply one of finding the right syntax for the code.

Everything you say about using a DB or sorting data makes sense; however, I
have to deal with a big constraint: The file I am reading (so-called FILE2) is
composed of what are called MARC records (I referred to them earlier as "book
records" that are composed of a variable number of rows).

Each MARC record must be kept in (or at least returned to) its original state.
The reason is that I have to take the updated MARC records and import them
back into their library database program, with my updates. (And the library
program is a proprietary, closed system, alas.)

Thus, it seems to me that extracting and sorting out the titles will still
force me to keep track of their relationship back to the MARC records in
order that I do my other processing (which is to edit a specific "field"
value in the MARC record of a matched book title).

As for creating joins on titles, that is a good idea; however, here is the
rub:
The book titles in File1 may not exactly match the titles in File2
(containing the MARC records). There can be variations based on data entry
errors, abbreviations, use of sub-titles, etc. What I've done is to
"normalize" titles on both sides by stripping out all spaces, same-casing all
text, substituting normal alphabetic characters for "foreign" ones (a for á,
etc.), and then doing substring matches.
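
The normalizing steps described above might look roughly like this in VB.NET. This is a sketch, not the code actually used in the project: NormalizeTitle and TitlesMatch are hypothetical names, String.Normalize and CharUnicodeInfo.GetUnicodeCategory require .NET 2.0 or later (so not VS 2003), and matching the substring in either direction is an assumption about what "substring matches" means here.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

Module TitleNormalizeSketch
    ' Lower-cases, drops whitespace, and strips accents (á becomes a).
    Function NormalizeTitle(ByVal title As String) As String
        ' FormD splits an accented letter into base letter + combining mark.
        Dim decomposed As String = title.Normalize(NormalizationForm.FormD)
        Dim sb As New StringBuilder()
        For Each ch As Char In decomposed
            Dim category As UnicodeCategory = CharUnicodeInfo.GetUnicodeCategory(ch)
            If category <> UnicodeCategory.NonSpacingMark AndAlso Not Char.IsWhiteSpace(ch) Then
                sb.Append(Char.ToLowerInvariant(ch))
            End If
        Next
        Return sb.ToString()
    End Function

    ' Substring match on the normalized forms, in either direction (assumption).
    Function TitlesMatch(ByVal a As String, ByVal b As String) As Boolean
        Dim na As String = NormalizeTitle(a)
        Dim nb As String = NormalizeTitle(b)
        If na.Length = 0 OrElse nb.Length = 0 Then Return False
        Return na.IndexOf(nb, StringComparison.Ordinal) >= 0 OrElse
               nb.IndexOf(na, StringComparison.Ordinal) >= 0
    End Function
End Module
```

Punctuation is left in place by this sketch, so titles that differ only in a colon or comma would still fail to match unless that is stripped as well.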

I could read all of the data into database tables, but I'm not sure what it
buys me in the end, given the data "normalizing", pattern matching and
importing back into the library system that has to be done, in any event.
But, I'll take a look at it. Perhaps I'm just dense or too close to the
project.

Thanks again for the advice!

George Atkins


Author
28 Mar 2005 12:50 AM
Nick Malik [Microsoft]
Hello George,

You still get some value out of reading the data from both files into
tables.
Look at it this way: after all of your normalizing and tweaking, you still
have to compare the value of the book title from one table to the title of
another.
So, when you are loading your data into the tables in SQL, create a new
column with a normalized title, just for comparison.

You can do this for both tables.  You can even create "many" normalized
possibilities for a single title by creating a detail table.

Then the join is easy.
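
The same idea can be tried without a database. The sketch below swaps the SQL join for an in-memory dictionary keyed on the normalized title, which is a different technique from the one suggested here and only handles exact matches on the key. It assumes .NET 2.0 or later; JoinKey, BuildIndex, and Lookup are hypothetical names, and JoinKey is a deliberately simple stand-in for whatever normalizing is really needed.

```vbnet
Imports System
Imports System.Collections.Generic

Module NormalizedJoinSketch
    ' Stand-in normalizer: lower-case and drop spaces only (an assumption).
    Function JoinKey(ByVal title As String) As String
        Return title.Replace(" ", "").ToLowerInvariant()
    End Function

    ' Maps each normalized title to the positions of the records that carry it.
    Function BuildIndex(ByVal recordTitles As List(Of String)) As Dictionary(Of String, List(Of Integer))
        Dim index As New Dictionary(Of String, List(Of Integer))
        For position As Integer = 0 To recordTitles.Count - 1
            Dim k As String = JoinKey(recordTitles(position))
            If Not index.ContainsKey(k) Then index.Add(k, New List(Of Integer))
            index(k).Add(position)
        Next
        Return index
    End Function

    ' Returns the record positions whose normalized title equals the wanted one,
    ' or an empty list when there is no match.
    Function Lookup(ByVal index As Dictionary(Of String, List(Of Integer)), ByVal wanted As String) As List(Of Integer)
        Dim hits As List(Of Integer) = Nothing
        If index.TryGetValue(JoinKey(wanted), hits) Then Return hits
        Return New List(Of Integer)
    End Function
End Module
```

Storing record positions rather than the records themselves keeps the index small, and the original records stay untouched in their file for writing back out.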

Either way, you have a hard problem.  Good luck.

Author
28 Mar 2005 2:11 AM
GeorgeAtkins
Well, you make a compelling argument, Nick. Looks like I have a new approach
to try out. Thanks for the insights and methodology!

George
