Note:
This thread on unique (primary) keys is from the Clipper echo of FidoNet:



 From:  ED DAVIS                  Sent: 06-10-92 15:59
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

 PW> I often come up against a situation where a database contains
people's
 PW> names, birth dates, sex, and address.  For relating to other
datasets
 PW> (and other purposes), I want to use this information to create a
unique
 PW> identifier for the person using this information.  Note that it's
.....
 PW> Solutions like making an id of something like initials, birthdate,
sex,
 PW> and a sequential sequence number are the best I've come up with,
but

You can't use birthdate, at least not by itself, because of the high
probability of having 2 or more people with the same DOB.  Combining it
with sex, initials, etc is just plain too cumbersome.  More than
likely, your data sets already have SSN's - so use them!  They are
_supposed_ to be unique among LIVING individuals.  You also mention
that you most often need this unique id in hospital/patient datasets.
Here again, you probably already have unique data available to you in
the form of insurance policy numbers....but, admittedly, if the patient
changes insurance companies......

One option I have employed is the use of the full telephone number
(including area code) as a unique identifier.  The limitation here is
that is only useful for tracking households, not individuals.  Two
patients from the same family would blow you away.  But it may prove a
good starting point.

You should also be aware that people with different names could end up
with the same initials, so this, too, is not good data to work with.
Sorry I couldn't be more help.



 From:  ROBERT KELLEY             Sent: 06-10-92 15:26
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

Since your usual application is healthcare related, that means you will
likely be able to get access to the people's Social Security Number.
Although there ARE rare cases of duplicate SSN's, they work well as a
key for most applications, with a very high accuracy rate.

Insofar as fuzzy searching, I usually find it more useful to maintain
secondary indexes on fields like last name, telephone, zip code, etc.
Then use soft seek logic to get close or perform incremental searches.

regards,

    robert
===
progress?

 From:  TOM HELD                  Sent: 06-10-92 20:19
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

Paul,

I use a fairly simple method.  I use the first four letters of the last
name and the last four digits of the social security number to make an
8 character field.  It positively and economically identifies an
individual for use in relational databases.  If the last name is
shorter than four letters, append it with underline characters (_) or
                   dashes (-).

If the person's last name is "Duh", and the SSN ends in "1234", the
field would contain "DUH_1234".

Fewer characters would work nearly as well, but this length has the
advantage that the operator can usually tell at a glance to whom the
record belongs.

This system is handy in the military.  A person's SSN is considered
FOUO (for official use only), and cannot legally be stored on an
unclassified fixed hard disk - but the name and last four can!  A
related database on a FOUO floppy containing only sensitive information
may be inserted in the floppy drive during processing, as required, but
locked in a file cabinet at other times.

Precise calculations reveal that a duplication of this ID code between
2 random individuals will not occur within 6.125 jillion recyclations,
which takes us exactly 2.3 revelations beyond the second coming of Star
Trek, the 5th Generation - so replication, duplication or
misidentification is not of much concern to us right now, in this
dimension.
Tom


 From:  TERRY CARMEN              Sent: 06-10-92 22:46
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

As long as you have the person's data, you can add a unique identifier
to ithe record, such as a sequentially generated number which is
guaranteed to be unique.

The trick is to identify the patient on the next visit so you don't end
up with multiple records. The SSN is handy, but babies and illegal
immigrants don't ususally have one.

Terry

 From:  MIKE NELSON               Sent: 06-15-92 08:49
   To:  ED DAVIS                  Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE I

In a message to George Jereza <06-14-92 10:32> Ed Davis wrote:

ED>  But apply for a credit card - and you have to supply a SSN.  Why?
ED>  Interest is no longer tax deductable.

That was never the reason for wanting the ID.  If your name were John
Smith, you would really appreciate the value of having your SSN
associated with your account.  I used to work in credit and having the
SSN was a VERY valuable way to determine which accounts belonged to
whom.  You see, credit report bureaus do a 'close match' on credit
information... The SSN is a VERY valuable tool to determine which
credit file belongs to whom.   We used to have a mother and daughter
who had the same name (including middle) and lived at the same address.
 The mother's credit history was golden... the daughter's
rotten-to-the-core.  So the daughter would try to use the mother's
account.  SSN and date-of-birth were the only way to keep these things
separate.

It also comes in very handy when you have a Jr or Sr in the same
household. It often happens that the father opens an account before jr
is born so the generation indicator never gets attached to the file...
The SSN comes in VERY handy.

A few years ago, a lot of negative data showed up in my credit
profile... Providing the SSAN on my ap as well as the dispute saved my
bacon...


 From:  ROBERT KELLEY             Sent: 06-15-92 13:58
   To:  MIKE NELSON               Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE

    On <sunday, june 14, 1992>, <Mike Nelson> wrote to <Robert Kelley>,
    regarding "RE: HOW TO MAKE A UNIQUE":

  >RK>  ... access to the people's Social Security Number.
  >  >Although there ARE rare cases of duplicate SSN's,

MN>Well, that is fine, assuming that you indeed have the right SSAN.  I
do
  >the payroll for a client and recent discovered to our mutual
chagrin
  >that the wrong SSAN was entered in the original payroll program.

That is absolutely true.  If ANY key is dependant upon actual data, the
risk is run that the application will have to provide the ability to
correct keys across all applicable files.  Can be a PITA.

MN>I personally recommend using a key that is unique and generated by
the
  >database program

Yep, I agree.  That was the second method I mentioned as well. There
are many advantages ... and some disadvantages.

regards,

    robert
===



 From:  ANGUS SCOTT-FLEMING       Sent: 06-15-92 06:15
   To:  GEORGE JEREZA @ 914/20    Rcvd: 06-19-92 20:07
   Re:  (R)HOW TO MAKE A UNIQUE

   On 06-12-92, George Jereza @ 914/201 wrote to Ed Davis:

GJ > Hope you don't mind my dropping in?  Quick question.  I thought it
was
GJ > illegal to ask for social security numbers?  ... or... is that
just in
GJ > California?  Thanks in advance.

No, SSNs are limitted in where they can be used.  Credit apps, banks,
and
jobs can require them.  Medical offices, schools, driver's license
bureaus, etc. can't.  Lots of grey areas.  If you're in doubt, push
back
and refuse to give it to 'em.

 From:  ROBERT KELLEY             Sent: 06-15-92 18:10
   To:  ED DAVIS                  Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

    On <monday, june 15, 1992>, <Ed Davis> wrote to <Robert Kelley>,
    regarding "HOW TO MAKE A UNIQUE ID":

ED> RK> Database theorists would say that once a unique key is
  > RK> derived for a record, you NEVER change the key.  There
  > RK> is also a lot to be said for a numerical "serial
  > RK> number" which is not derived at ALL from underlying
  > RK> data, if the SSN is not available.

ED>Correct - but this too has its' share of problems - mainly that the
key
  >cannot be reproduced, if need be, if it isn't based on the record
  >itself. And, assigning a unique id in this manner is ok if you're
  >building the application and its maintenance routines from scratch.

That is usually the "cleaner" approach. <g>.  Yes I agree that it
creates its' own share of problems, mainly that one cannot easily do
ad-hoc browse/searches based on the key.  So secondary indexes become
critical to support any ad-hoc user searches.

ED>Trying to relate multiple old datasets, without the "unique keys"
is
  >next to impossible.

Using ANY method of generating keys, the above is next to impossible
<g>.

ED>I think everyone is agreed that using SSN's is the best solution -
but
  >what if the person doesn't have a SSN?

How about combining the two approaches.  Maybe a concatenated key
consisting of two fields.  The first field specifies whether the key a
derived from the SSN or is a computer-generated serial number.  Then
use a SSN if you have it in the second field, or generate a unique
serialized number if you don't?

regards,

    robert
===




 From:  MIKE NELSON               Sent: 06-16-92 16:08
   To:  ANGUS SCOTT-FLEMING       Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE

In a message to George Jereza <06-15-92 06:15> Angus Scott-Fleming
wrote:

AS>  GJ > illegal to ask for social security numbers?  ... or... is
that
AS>  just in
AS>  GJ > California?  Thanks in advance.
AS>

Au contraire on the driver's licenses... At least one state uses the
SSAN with an alpha character in front of it as the driver's license
number.

Credit grantors may ASK for the SSAN but you are under NO obligation to
provide it.

By the same token, if you are writing a database program that stores
SSANs, be prepared to change the field... The Soc. Sec. Admin. is
running out of numbers and they have declared that they will NOT reuse
the number of anyone who is deceased.  As far as I know, they have not
decided whether to add another digit or whether to insert alphas.



 From:  DAVID SOLLY               Sent: 06-15-92 22:51
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

Paul...

  You might like to take a look at the Cutter-Sandbourn tables which
many
libraries use to convert author's names into a single initial plus a
number.
The system is flexible enough that you can add subtract numbers as you
need
them.

A typical C-S number looks something like this

Solly =   S6559
Wilford = W554673

I have a program which will encode a string up to 15 characters long
using the
Library of Congress variant of the Cutter-Sandbourn tables.  I could
pass it
on to you if you are interested.

If you just need a sequence of numbers which are unique but have
nothing to do
with the patient's name, dob or whatever, PW World published a program
which
creates a unique series of "filenames" derived from the date and time
setting
of the computer when run.

A typical example looks like

C6FA0800
C6FA0801
C6FA0802
C6FA0803 etc.

I have been using this system to solve a problem similar to yours.
Each
morning I print off a sheet of number and use them one-by-one until
done and
print off a second sheet if I need more.  I print a new sheet of
numbers every
morning even if I have not used up the numbers on yesterday's sheet.


Again, I can send you the program if you are interested.

Hope this helps

David Solly


... OFFLINE 1.22 * Bibliotheca Sagittarii, Ottawa, Canada



 From:  BRIAN ASTILL              Sent: 06-11-92 23:50
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

In a message of <Jun 08> Paul Wilford wrote to All:

 PW> (My usual application is a hospital clinic where patients make a
 PW> number of visits and datasets like Patient Information are related
 PW> to datasets like Clinic Results).

First question.  Have you thought of using the Social Security Number?
If you _can_ use it, you could precede it with the first few letters of
the person's last name, for indexing purposes (easier for us humans to
find the file for WILF123-456-789 than look for 123-456-789), and to
allow a quick search for 'WILF'.

Boondoggle BiBi,
-=>Brian<=-




 From:  ARTHUR MARSH              Sent: 06-15-92 20:40
   To:  TOM HELD                  Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F


 TH> Precise calculations reveal that a duplication of this ID code
between
 TH> 2 random individuals will not occur within 6.125 jillion
recyclations,
 TH> which takes us exactly 2.3 revelations beyond the second coming of

 TH> Star Trek, the 5th Generation - so replication, duplication or
 TH> misidentification is not of much concern to us right now, in this
 TH> dimension.

Sorry, duplicate SSN's of living persons have been found. It was
reported some months back in Communications of the ACM.



 From:  ANGUS SCOTT-FLEMING       Sent: 06-17-92 06:17
   To:  MIKE NELSON               Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE

   On 06-16-92, Mike Nelson wrote to Angus Scott-Fleming:

MN > In a message to George Jereza <06-15-92 06:15> Angus Scott-Fleming
wrote:

ASF>  GJ > illegal to ask for social security numbers?  ... or... is
that
ASF>  just in
ASF>  GJ > California?  Thanks in advance.
ASF>
ASF>  No, SSNs are limitted in where they can be used.  Credit apps,
banks,
ASF>  and
ASF>  jobs can require them.  Medical offices, schools, driver's
license
ASF>  bureaus, etc. can't.  Lots of grey areas.  If you're in doubt,
push

MN > Au contraire on the driver's licenses... At least one state
MN > uses the SSAN with an alpha character in front of it as
MN > the driver's license number.

Just because they do it that way doesn't make it legal.

MN > By the same token, if you are writing a database program that
stores
MN > SSANs, be prepared to change the field... The Soc. Sec. Admin. is
MN > running out of numbers and they have declared that they
MN > will NOT reuse the number of anyone who is deceased.  As
MN > far as I know, they have not decided whether to add
MN > another digit or whether to insert alphas.

Great.  Another one bites the dust.  Like the IRS, they'll probably
spring
this on us _after_ we've had to make adjustments for it.  And they'll
make
it retroactive, assigning a new number consisting of your old number
plus
the third letter of your maternal grandfather's cat's third kitten's
first
name.  And they'll use this as an excuse to lose everyone's "benefits"
from this gigantic Ponzi scheme.  GRRR.  If only we could opt out of
the
SS system _now_ I'd forgo all the "benefits" I've save up - and I'd
like
to retire in less than 20 years.

Yours in freedom,
                                       -=< Angus S-F >=-

 From:  ANGUS SCOTT-FLEMING       Sent: 06-17-92 06:17
   To:  PAUL WILFORD              Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE

   On 06-16-92, Paul Wilford wrote to Bill Clark:

PW > What about sex?  At least people don't get that wrong a
PW > lot.  How have you used this?

No. But even _that_ isn't fixed <grin> any more (or is it called "being
fixed" when your sex is changed?  I'm sooo confused!).  And all
programmers _have to_ (well, should) take into account the exception
that
breaks the rule or their systems will burp, hiccup, and die.

Yours in freedom,
                                       -=< Angus S-F >=-

 From:  BILL DEVINNEY             Sent: 06-17-92 04:10
   To:  MIKE NELSON               Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE I

** here's what Mike Nelson said to Robert Kelley on 12 Jun 92  18:16:00
 >
 >I personally recommend using a key that is unique and generated
 >by the database program--that way you can change any data you
 >want in the database without worrying about 'losing' data...
We use SS numbers as id numbers.  We deal with aliens who may not have
a social security number, and, assign them a temporary number.  When
they get a ss number, however, it may not be picked up by the opeartor
and entered as a new record.  Even when the social security number is
substituted for a temporary id, we have two years of inactive records
which need to be changed as well.  The ss number as main key is part of
a dbms given to us by the state department of education.  Not my idea.
I'm going to change to a unique, computer generated number and convert
the file to the state's structure for annual reporting purposes.  I've
experienced this headache long enough, as have many of you.

Now that I've decided to do that, I guess I need to do the conversion
and test for possible duplicates.  Counting the records with unique on
and off would result in different results if there's a duplicate.  Am I
going in the right direction?

Regards

 From:  ED DAVIS                  Sent: 06-17-92 11:11
   To:  ROBERT KELLEY             Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

 RK> How about combining the two approaches.  Maybe a
 RK> concatenated key consisting of two fields.  The first
 RK> field specifies whether the key a derived from the SSN
 RK> or is a computer-generated serial number.  Then use a
 RK> SSN if you have it in the second field, or generate a
 RK> unique serialized number if you don't?

This is probably the best approach.  The more we discuss the problem of
unique ID's, the more I realize that any field we would base a key on
(name, address, phone number, telephone, etc.) are all fields
containing data that could (and most likely will) change, thereby
complicating our applications.  And trying to generate keys on portions
of these fields increases the possibility of duplication.  So, maybe a
unique serialized number, with no relation to the data, is the best
solution after all......


 From:  PAUL WILFORD              Sent: 06-20-92 11:39
   To:  BRIAN ASTILL              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

 BA> In a message of <Jun 08> Paul Wilford wrote to All:

 PW> (My usual application is a hospital clinic where patients make a
 PW> number of visits and datasets like Patient
 PW> Information are related
 PW> to datasets like Clinic Results).

 BA> First question.  Have you thought of using the Social Security
Number?
 BA> If you _can_ use it, you could precede it with the first few
letters
 BA> of the person's last name, for indexing purposes
 BA> (easier for us humans to find the file for WILF123-
 BA> 456-789 than look for 123-456-789), and to allow a
 BA> quick search for 'WILF'.

                    BA> Boondoggle BiBi,
-=>Brian<=-




Something like the SSN would be great, except that not everyone has one
-
so it's not reliable.  While it may work for some applications, I'd
like
to find a better - and more universal - method.  I wonder if some fancy
encoding method exists whereby a long string of birth date, names
and so on could be converted to a unique id number through some
formula.


 From:  PAUL WILFORD              Sent: 06-20-92 11:48
   To:  DAVID SOLLY               Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

 DS>
 DS>   You might like to take a look at the Cutter-
 DS> Sandbourn tables which many
 DS> libraries use to convert author's names into a
 DS> single initial plus a number.
 DS> The system is flexible enough that you can add
 DS> subtract numbers as you need
 DS> them.
 DS>
That's the sort of thing that I had been looking for.  In fact I went
to the Vancouver Public Library to see how librarians dealt with the
problem of duplicate names but only found rules about eliminating
duplicates by putting a date after the author's name - beyond which the
rule was that you live with any further duplications.  I'd like very
much
to see your program code - can you post it?
 *
Paul.


 From:  TOM HELD                  Sent: 06-21-92 21:16
   To:  ARTHUR MARSH              Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID F

Arthur,

15 Jun 92, Arthur Marsh writes to Tom Held:

 TH>> Precise calculations reveal that a duplication of this ID code
 TH>> between 2 random individuals will not occur within 6.125 jillion
 TH>> recyclations, which takes us exactly 2.3 revelations beyond the
 TH>> second coming of Star Trek, the 5th Generation - so replication,
 TH>> duplication or misidentification is not of much concern to us
 TH>> right now, in this dimension.

 AM> Sorry, duplicate SSN's of living persons have been found. It was
 AM> reported some months back in Communications of the ACM.

I did not suggest the use of SSN by itself, but a combination of
surname and SSN.  Did any of the people with the duplicate SSNs have
the same last name as well?

Tom


 From:  ANGUS SCOTT-FLEMING       Sent: 06-22-92 06:22
   To:  MIKE NELSON               Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE

   On 06-21-92, Mike Nelson wrote to Angus Scott-Fleming:

MN > In a message to Jim Lynch <06-19-92 06:19> Angus Scott-Fleming
wrote:

ASF>  I thought this code fragment was germane to the discussion and
worth
ASF>  sharing.  I saw it in a page torn out of the May92 DBMS.  It is
used
ASF>  to
ASF>  generate unique IDs. I have taken the liberty of dHanging the
ASF>  variable
ASF>  names and translating the code slightly.  I believe the original
ASF>  article
ASF>  was a Fox article.
ASF>
                   ASF>    cLN     = SubStr(LASTNAME,1,3)
ASF>    cFN     = SubStr(FIRSTNAME,1,2)
ASF>    nSeq    = 0
ASF>    lUnique = .F.
ASF>    do while .not. lUnique:
ASF>      nSeq = nSeq + 1
ASF>      CUSTID = LN + FN + right("000"+trim(transform(nSeq,"999")),3)
ASF>      lUnique = .not. seek(CUSTID)
ASF>    enddo
ASF>
ASF>  Author (I only saw last page of article, it had been ripped out
of
ASF>  magazine to give me a copy of an ad on that page GRRR) claimed
that:
ASF>
ASF>    "...In a recently completed client application with more than
ASF>  300,000
ASF>  entries in the CUSTOMER table, the highest sequence number in the
ASF>  database
ASF>  was 12!

MN > That is fine as long as the person doesn't change their last name.

Well, you can still use the original index code if you've built it into
the system as a separate field, which is implied in the code fragment
above.  When you don't FIND someone who claims to be in the database,
you
just have to ask her if she's changed her <grin> surname recently.

MN > I'm setting up an accounts payable program which does a
MN > similar thing except that I use 3 letters of the company
MN > name... Which raises a problem...
MN >
MN > Would I use MON-001 for Montgomery Wards or WAR-001 or MWA-001?
MN > Would I use JCP-001 for J C Pennys       or PEN-001 or J_C-001?
MN >
MN > I've decided to arbitrarily use the 'surname' if the
MN > business contains first and last name (similar to what the
MN > phone book does).

I'd use JCP-001 in the example above, since people generally think of
them
as "J C Penneys" rather than PENNEYs.  I sure wouldn't use J_C out of
laziness as I see no reason to make a data-entry clerk use two keys
                   (Shift
is required to make the "_").  But your rule should work fine as long
as
people don't end up making two addresses for such companies.

Yours in freedom,
                                       -=< Angus S-F >=-

 * JABBER v1.1 #75 * Mind own business, always cut cards


 From:  ROBERT KELLEY             Sent: 06-22-92 12:17
   To:  ED DAVIS                  Rcvd: -NO-
   Re:  HOW TO MAKE A UNIQUE ID

    On <sunday, june 21, 1992>, <Ed Davis> wrote to <Robert Kelley>,
    regarding "HOW TO MAKE A UNIQUE ID":

ED>...  The more we discuss the problem of
  >unique ID's, the more I realize that any field we would base a key
on
  >(name, address, phone number, telephone, etc.) are all fields
containing
  >data that could (and most likely will) change,

I'd have to agree.

regards,

    robert

 From:  MIKE NELSON               Sent: 06-21-92 15:23
   To:  ED DAVIS                  Rcvd: -NO-
   Re:  (R)HOW TO MAKE A UNIQUE I

In a message to Mike Nelson <06-17-92 11:18> Ed Davis wrote:

ED>  So the SSN is used in just the manner that many of us have
proposed -
ED>  as a "near perfect" identifier.  I knew it was widely used, and
ED>  almost always requested on things involving any form of financing
or
ED>  tax reporting.

I *STILL* would not use it as a key for relating databases, though...

Let me throw additional reasons:

1)  It is not at all uncommon for a person to use their SPOUSE's SSAN
for id--especially in single-income families...

2)  The very REAL mistake of the data-entry clerk putting the WRONG
SSAN in.
One of the major problems with SSAN is that it is NOT check-digitable.

3)  If the person doesn't provide their SSAN and you generate a
pseudo-SSAN, there is the possiblity of creating one that is already in
use (or will be in the future).

I've seen fraud cases where someone transposed SSAN or knew enough
about SOUNDEX principals to fool the credit bureaus into pulling
perfectly legitimate credit reports--but NOT of the person who applied
for credit.

I remember such a fraud case about 10 years ago that cost creditors
well over $100,000 in fraudulent charge.  My employer was lucky... we
only got stuck for about $200 but some of the major department stores
got hit for over $2000--not to mention banks on VISA/MasterCard and a
car loan.

Furthermore, I am a firm believer that ALL data going into a database
SHOULD be verified--either by a second dataentry operator or by a blind
                   double-entry system.
