Author
21 Nov 2007 3:13 PM
mcnews
anybody willing to share a name parsing routine?
tia,
mcnewsxp

Author
21 Nov 2007 4:30 PM
Herfried K. Wagner [MVP]
"mcnews" <mcour***@mindspring.com> wrote:
> anybody willing to share a name parsing routine?

I suggest describing in more detail what exactly you want to achieve.

--
M S   Herfried K. Wagner
M V P  <URL:http://dotnet.mvps.org/>
V B   <URL:http://dotnet.mvps.org/dotnet/faqs/>
Author
21 Nov 2007 4:53 PM
mcnews
On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-
h***@gmx.at> wrote:
> "mcnews" <mcour***@mindspring.com> wrote:
>
> > anybody willing to share a name parsing routine?
>
> I suggest describing in more detail what exactly you want to achieve.
>

parse names
Author
21 Nov 2007 5:09 PM
Phillip Taylor
On Nov 21, 4:53 pm, mcnews <mcour***@mindspring.com> wrote:
> On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-
>
> h***@gmx.at> wrote:
> > "mcnews" <mcour***@mindspring.com> wrote:
>
> > > anybody willing to share a name parsing routine?
>
> > I suggest describing in more detail what exactly you want to achieve.
>
> parse names

As in turning johnsmith into "John Smith", or into "John" and "Smith"
separately?

What exactly is the input? Firstname and lastname in a list like this:

johnsmith
johndoe
ericsmith
jackjackson

or just a long string like this:
"johnjacksonericsmithjohndoephilliprosstaylor"? And what would be the
correct output exactly? Are there any delimiters? Do some records
have middle names as well? Is it for parsing names from a document like
"dear john b. smith", where you want just "john", or do you want "john b"?
Do you want the full stop, like "John B."? Or is it for the purpose
of case correction, like this:

i heard that john smith isn't very well

into

i heard that John Smith isn't very well

--

The way you write "parse names" as if it's so obvious makes you look
obnoxious.

Phill
Author
21 Nov 2007 5:37 PM
mcnews
On Nov 21, 12:09 pm, Phillip Taylor <Phillip.Ross.Tay***@gmail.com>
wrote:
> On Nov 21, 4:53 pm, mcnews <mcour***@mindspring.com> wrote:
>
> > On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-
>
> > h***@gmx.at> wrote:
> > > "mcnews" <mcour***@mindspring.com> wrote:
>
> > > > anybody willing to share a name parsing routine?
>
> > > I suggest describing in more detail what exactly you want to achieve.
>
> > parse names
>
> As in turning johnsmith into "John Smith", or into "John" and "Smith"
> separately?
>
> What exactly is the input? Firstname and lastname in a list like this:
>
> johnsmith
> johndoe
> ericsmith
> jackjackson
>
> or just a long string like this:
> "johnjacksonericsmithjohndoephilliprosstaylor"? And what would be the
> correct output exactly? Are there any delimiters? Do some records
> have middle names as well? Is it for parsing names from a document like
> "dear john b. smith", where you want just "john", or do you want "john b"?
> Do you want the full stop, like "John B."? Or is it for the purpose
> of case correction, like this:
>
> i heard that john smith isn't very well
>
> into
>
> i heard that John Smith isn't very well
>
> --
>
> The way you write "parse names" as if it's so obvious makes you look
> obnoxious.
>
i know.
i am obnoxious.

anyway, the names will always be lastname, firstname, middle init (no
period).
unless they don't have a middle init.
i am not sure about titles such as dr. or esq. just yet.
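
For the format just described, a minimal sketch (in Python rather than VB.NET, but the same split-and-trim logic ports directly) might look like the following. It assumes the input looks like "Last, First M" with a comma after the last name; the function name `parse_last_first_middle` and the `ParsedName` type are made up for illustration, and titles such as dr. or esq. are deliberately not handled.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    last: str
    first: str
    middle: str  # empty string when there is no middle initial


def parse_last_first_middle(raw: str) -> ParsedName:
    """Split 'Last, First M' (or 'Last, First') into its parts."""
    last, sep, rest = raw.partition(",")
    if not sep:
        raise ValueError(f"expected 'Last, First [M]', got {raw!r}")
    # Treat any further commas as plain separators, then split on whitespace.
    tokens = rest.replace(",", " ").split()
    if not tokens:
        raise ValueError(f"no first name found in {raw!r}")
    # Assumption: a trailing single-letter token is the middle initial.
    middle = ""
    if len(tokens) > 1 and len(tokens[-1]) == 1:
        middle = tokens.pop()
    return ParsedName(last.strip(), " ".join(tokens), middle)


if __name__ == "__main__":
    print(parse_last_first_middle("Smith, John Q"))
    print(parse_last_first_middle("Du Pont, Jackie"))
```

Anything without a comma is rejected rather than guessed at, since the stated format always puts the last name first.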
Author
21 Nov 2007 5:03 PM
mcnews
On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-
h***@gmx.at> wrote:
> "mcnews" <mcour***@mindspring.com> wrote:
>
> > anybody willing to share a name parsing routine?
>
> I suggest describing in more detail what exactly you want to achieve.
>
Input                 Output      Output  Output  Output     Output
-------------------   ----------  ------  ------  ---------  ------
Unparsed name         Prefix      First   Middle  Last       Suffix
===================   ==========  ======  ======  =========  ======
Smith                                             Smith
Smith Sr.                                         Smith      Sr
Mrs. Smith            Mrs                         Smith
Rev. Smith Jr.        Rev                         Smith      Jr
Mr. and Mrs. E.Jones  Mr and Mrs  E               Jones
Mr. & Mrs. Bix, CPA   Mr & Mrs                    Bix        CPA
Wilson, Mr & Mrs Jim  Mr & Mrs    Jim             Wilson
J. J.Johnson V                    J       J       Johnson    V
Sir T. S. Eliot       Sir         T       S       Eliot
e e cummings, IV                  e       e       cummings   IV
ee cummings                       ee              cummings
Lt. Gen. C James Phd  Lt Gen      C               James      Phd
W.E.B. DuBois                     W       E B     DuBois
Du Pont, Jackie                   Jackie          Du Pont
Clyde Smith-Jones                 Clyde           Smith-Jones
Mike O'Donnell                    Mike            O'Donnell
O Donnell, Mike                   Mike            O Donnell
Jimmy Mac Donald                  Jimmy           Mac Donald
Mr. A. E. Von Sturm   Mr          A       E       Von Sturm
Ms. Beverly D'Angelo  Ms          Beverly         D'Angelo
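
A rough rule-based sketch of how the table above could be approached, written in Python for brevity. Everything here is an assumption made for illustration: the word lists (`TITLES`, `SUFFIXES`, `PARTICLES`) cover only what appears in the table, and the names `parse_name` and `Name` are hypothetical. It will misread real-world names outside these rules (for example, a middle initial "V" is taken as a suffix).

```python
from typing import NamedTuple

# Illustrative word lists only; real data would need much longer ones.
TITLES = {"mr", "mrs", "ms", "dr", "rev", "sir", "lt", "gen"}
CONNECTORS = {"and", "&"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "v", "cpa", "phd", "esq"}
PARTICLES = {"du", "mac", "von", "o"}


class Name(NamedTuple):
    prefix: str
    first: str
    middle: str
    last: str
    suffix: str


def _tokens(text):
    # Periods become spaces so "E.Jones" and "W.E.B." split into initials.
    return text.replace(".", " ").split()


def _take_prefix(tokens):
    taken = []
    while tokens:
        word = tokens[0].lower()
        # A connector only counts when it joins two titles ("Mr and Mrs").
        joins = bool(word in CONNECTORS and taken and len(tokens) > 1
                     and tokens[1].lower() in TITLES)
        if not (word in TITLES or joins):
            break
        taken.append(tokens.pop(0))
    return " ".join(taken)


def _take_suffix(tokens):
    taken = []
    # Always leave at least one token behind for the name itself.
    while len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        taken.insert(0, tokens.pop())
    return " ".join(taken)


def parse_name(raw):
    head, comma, tail = raw.partition(",")
    head_tokens, tail_tokens = _tokens(head), _tokens(tail)
    last_tokens, suffix = None, ""
    if comma and tail_tokens and all(t.lower() in SUFFIXES for t in tail_tokens):
        suffix = " ".join(tail_tokens)   # "e e cummings, IV"
        tokens = head_tokens
    elif comma:
        last_tokens = head_tokens        # "Du Pont, Jackie"
        tokens = tail_tokens
    else:
        tokens = head_tokens
    prefix = _take_prefix(tokens)
    trailing = _take_suffix(tokens)
    suffix = " ".join(s for s in (trailing, suffix) if s)
    if last_tokens is None:
        # No "Last, First" comma: last name is the final token,
        # plus a preceding particle such as "Von" or "Mac" if present.
        split_at = len(tokens) - 1
        if split_at >= 1 and tokens[split_at - 1].lower() in PARTICLES:
            split_at -= 1
        tokens, last_tokens = tokens[:split_at], tokens[split_at:]
    first = tokens[0] if tokens else ""
    middle = " ".join(tokens[1:])
    return Name(prefix, first, middle, " ".join(last_tokens), suffix)


if __name__ == "__main__":
    for sample in ("Mr. and Mrs. E.Jones", "Wilson, Mr & Mrs Jim",
                   "Lt. Gen. C James Phd", "Mr. A. E. Von Sturm"):
        print(sample, "->", parse_name(sample))
```

The order of the steps matters: the comma is interpreted first, then prefixes and suffixes are peeled off the ends, and only what is left is split into first, middle, and last.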
Author
21 Nov 2007 5:16 PM
Phillip Taylor
On Nov 21, 5:03 pm, mcnews <mcour***@mindspring.com> wrote:
> On Nov 21, 11:30 am, "Herfried K. Wagner [MVP]" <hirf-spam-me-h...@gmx.at> wrote:
> > "mcnews" <mcour***@mindspring.com> wrote:
>
> > > anybody willing to share a name parsing routine?
>
> > I suggest describing in more detail what exactly you want to achieve.
>
>  Input                 Output      Output  Output  Output     Output
> -------------------   ----------  ------  ------  ---------  ------
> Unparsed name         Prefix      First   Middle  Last       Suffix
> ===================   ==========  ======  ======  =========  ======
> Smith                                             Smith
> Smith Sr.                                         Smith      Sr
> Mrs. Smith            Mrs                         Smith
> Rev. Smith Jr.        Rev                         Smith      Jr
> Mr. and Mrs. E.Jones  Mr and Mrs  E               Jones
> Mr. & Mrs. Bix, CPA   Mr & Mrs                    Bix        CPA
> Wilson, Mr & Mrs Jim  Mr & Mrs    Jim             Wilson
> J. J.Johnson V                    J       J       Johnson    V
> Sir T. S. Eliot       Sir         T       S       Eliot
> e e cummings, IV                  e       e       cummings   IV
> ee cummings                       ee              cummings
> Lt. Gen. C James Phd  Lt Gen      C               James      Phd
> W.E.B. DuBois                     W       E B     DuBois
> Du Pont, Jackie                   Jackie          Du Pont
> Clyde Smith-Jones                 Clyde           Smith-Jones
> Mike O'Donnell                    Mike            O'Donnell
> O Donnell, Mike                   Mike            O Donnell
> Jimmy Mac Donald                  Jimmy           Mac Donald
> Mr. A. E. Von Sturm   Mr          A       E       Von Sturm
> Ms. Beverly D'Angelo  Ms          Beverly         D'Angelo

There's a program called "MatchIT" which has a built-in names database
as well as fuzzy matching for increased accuracy. Fuzzy matching is
generally considered better than standard rule-based logic for this job, so if
anyone suggested an algorithm it would probably be a fuzzy one
(so not plain VB).

http://www.printsoft.co.uk/web/products_matchit.htm

It has an API accessible from VB.NET that you can use if you want to use the
solution or, if it's a one-off cleaning you need, you might find it
easier to just forward your records to a data cleaning company like
CCR Data (http://www.ccr.co.uk/), who can do a one-off clean using
these types of tools for a one-off charge. Just email them a sample of
the records and the total number and they'll email back with a
cost.

The fuzzy algorithms are largely available in the public domain
although implementing them and tweaking them isn't exactly an easy
task.
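
To give a flavour of what such a public-domain fuzzy algorithm looks like, here is a small Python sketch of Levenshtein edit distance used to match a possibly misspelled name against a list of known names. This is not how MatchIT works internally (that isn't documented here); the helper `closest_name`, the sample name list, and the `max_distance=2` threshold are all assumptions picked for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_name(candidate, known_names, max_distance=2):
    """Return the known name nearest to candidate, or None if none is close."""
    scored = [(edit_distance(candidate.lower(), name.lower()), name)
              for name in known_names]
    distance, name = min(scored, default=(max_distance + 1, None))
    return name if distance <= max_distance else None


if __name__ == "__main__":
    known = ["John", "Jack", "Eric", "Phillip"]
    print(closest_name("Jhon", known))
    print(closest_name("Xavier", known))
```

The tweaking mentioned above mostly comes down to choosing the threshold and the reference list: too loose and "Jack" starts matching "John", too strict and obvious typos are missed.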

Phill