name parser

"mcnews" <mcour***@mindspring.com> wrote:
> anybody willing to share a name parsing routine?

I suggest you describe in more detail what exactly you want to achieve.

--
M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
V B   <URL:http://dotnet.mvps.org/dotnet/faqs/>


On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-h***@gmx.at> wrote:
> I suggest you describe in more detail what exactly you want to achieve.

parse names


On Nov 21, 4:53 pm, mcnews <mcour***@mindspring.com> wrote:
> parse names

As in turn johnsmith into "John Smith", or into "John" and "Smith" separately?

What exactly is the input? First name and last name in a list like this:

johnsmith
johndoe
ericsmith
jackjackson

or just a long string like this: "johnjacksonericsmithjohndoephilliprosstaylor"? And what exactly would the correct output be? Are there any delimiters? Do some records have middle names as well? Is it for parsing names from a document like "dear john b. smith", where you want just "john", or do you want "john b"? Do you want the full stop, like "John B."? Is it for the purpose of case correction, like turning

i heard that john smith isn't very well

into

i heard that John Smith isn't very well

--

The way you write "parse names" as if it's so obvious makes you look obnoxious.

Phill


On Nov 21, 12:09 pm, Phillip Taylor <Phillip.Ross.Tay***@gmail.com> wrote:
> The way you write "parse names" as if it's so obvious makes you look
> obnoxious.

i am obnoxious. i know.

anyway, the names will always be lastname, firstname, middle init (no period), unless they don't have a middle init. i am not sure about titles such as dr. or esq. just yet.


On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-h***@gmx.at> wrote:
> I suggest you describe in more detail what exactly you want to achieve.

Input                 Output      Output  Output  Output       Output
--------------------- ----------- ------- ------- ------------ ------
Unparsed name         Prefix      First   Middle  Last         Suffix
===================== =========== ======= ======= ============ ======
Smith                                             Smith
Smith Sr.                                         Smith        Sr
Mrs. Smith            Mrs                         Smith
Rev. Smith Jr.        Rev                         Smith        Jr
Mr. and Mrs. E.Jones  Mr and Mrs  E               Jones
Mr. & Mrs. Bix, CPA   Mr & Mrs                    Bix          CPA
Wilson, Mr & Mrs Jim  Mr & Mrs    Jim             Wilson
J. J.Johnson V                    J       J       Johnson      V
Sir T. S. Eliot       Sir         T       S       Eliot
e e cummings, IV                  e       e       cummings     IV
ee cummings                       ee              cummings
Lt. Gen. C James Phd  Lt Gen      C               James        Phd
W.E.B. DuBois                     W       E B     DuBois
Du Pont, Jackie                   Jackie          Du Pont
Clyde Smith-Jones                 Clyde           Smith-Jones
Mike O'Donnell                    Mike            O'Donnell
O Donnell, Mike                   Mike            O Donnell
Jimmy Mac Donald                  Jimmy           Mac Donald
Mr. A. E. Von Sturm   Mr          A       E       Von Sturm
Ms. Beverly D'Angelo  Ms          Beverly         D'Angelo


On Nov 21, 5:03 pm, mcnews <mcour***@mindspring.com> wrote:
> Unparsed name         Prefix      First   Middle  Last         Suffix
> [...]

There's a program called "MatchIT" which has a built-in names database as well as fuzzy matching for increased accuracy. Fuzzy matching is generally considered better than standard logic for this job, so if anyone suggested an algorithm it would probably be in a fuzzy language (so not VB).

http://www.printsoft.co.uk/web/products_matchit.htm

It has an API accessible to VB.NET that you can use if you want to go with that solution. If it's a one-off cleaning you need, you might find it easier to just forward your records to a data cleaning company like CCR Data (http://www.ccr.co.uk/), who can do a one-off clean using these types of tools for a one-off charge. Just email them a sample of the records and the total number, and they'll email back with a cost.

The fuzzy algorithms are largely available in the public domain, although implementing them and tweaking them isn't exactly an easy task.

Phill
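
For the simple case mcnews describes (lastname, firstname, optional middle initial with no period), a rough sketch might look like the following. It is in Python rather than VB purely for brevity. The input shape "Smith, John B" is an assumption, since the thread never shows a sample record, and the names ParsedName and parse_last_first are made up for this illustration. Titles such as dr. or esq. are not handled.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    """Hypothetical result type for this sketch."""
    first: str
    middle: Optional[str]
    last: str


def parse_last_first(raw: str) -> ParsedName:
    """Parse 'Last, First M' where the middle initial is optional.

    Assumes a comma always follows the last name, which is a guess
    about the data described in the thread.
    """
    last, sep, rest = raw.partition(",")
    if not sep:
        raise ValueError(f"expected 'Last, First [M]', got {raw!r}")

    # Accept both 'Smith, John B' and 'Smith, John, B'.
    tokens = rest.replace(",", " ").split()
    if not tokens:
        raise ValueError(f"missing first name in {raw!r}")

    middle = None
    # Treat a trailing single letter as the middle initial.
    if len(tokens) > 1 and len(tokens[-1]) == 1:
        middle = tokens.pop()

    return ParsedName(first=" ".join(tokens), middle=middle, last=last.strip())


if __name__ == "__main__":
    for sample in ("Smith, John B", "Du Pont, Jackie", "O Donnell, Mike"):
        print(sample, "->", parse_last_first(sample))
```

Because everything before the comma is taken as the last name, multi-word surnames such as "Du Pont" and "O Donnell" survive intact. The free-form rows in the table above (prefixes, suffixes, "Mr. and Mrs.") would need lookup lists on top of this.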
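
To give an idea of what one of the public-domain fuzzy algorithms Phill mentions can look like, here is a minimal Python sketch of Levenshtein edit distance used to pick the closest entry from a list of known names. This is only an illustration: nothing in the thread says which algorithms MatchIT uses, and the function names and the sample name list are invented for the example.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Number of single-character inserts, deletes, or substitutions
    needed to turn a into b (classic dynamic-programming version)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def closest_name(candidate: str, known_names: Iterable[str]) -> str:
    """Return the known name with the smallest case-insensitive distance.
    Ties go to whichever name comes first in known_names."""
    return min(known_names,
               key=lambda name: levenshtein(candidate.lower(), name.lower()))


if __name__ == "__main__":
    surnames = ["Johnson", "Jackson", "Jones"]  # made-up sample list
    print(closest_name("Jonson", surnames))
```

A real cleaning job would likely want a distance threshold as well, so that a name with no reasonable match is flagged for review instead of being forced onto the nearest entry.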