Discussion: more about keys
Let’s look again at the relation scheme diagram for Customers and Orders.
Other views of this diagram: Large image - Data dictionary (text)
Remember that a super key is any set of attributes whose values, taken together, uniquely identify each row of a table—and that a primary key is the specific super key set of attributes that we picked to serve as the unique identifier for rows of this table. We are showing these attributes in the scheme diagram for convenience, understanding that keys are a property of the table (relation), not of the scheme.
Candidate keys
Before picking the pk, we need to identify any candidate key or keys that we can find for this table. A ck is a minimal super key; “minimal” means that if you take away any one attribute from the set, it is no longer a super key. If you add one attribute to the set, it is no longer minimal (but it’s still a super key). The word “candidate” simply means that this set of attributes could be used as the primary key, since we don’t want any more attributes in the pk than are necessary.
Whether a set of attributes constitutes a ck or not depends entirely on the data in the table—not just on whatever data happens to be in the table at the moment, but on any set of data that could realistically be in this table over the life of the database. Does the Customers pk {cFirstName, cLastName, cPhone} meet this test?
Every table must have at least one candidate key; if you can’t find one, your design isn’t finished. Former heavyweight boxing champion George Foreman is said to have named every one of his sons “George Foreman.” If they were all customers of ours, living at the same address with the same phone number, we would need at least one more descriptive attribute—probably the birth date—to distinguish between them.
PK size might matter
You’re probably thinking that it’s a real nuisance and waste of space to copy all three of the Customer pk attributes to make the fk in Orders. If so, you’re right. Remember that we purposely designed the Customers table without considering its association with Orders. Now that it’s the parent in a one-to-many association, we have to ask if the pk is small enough to be copied into the child table. In some tables, it will be— so we’re done. In this one, it isn’t—so we’ll have to make up a pk that is small enough. There are two types of “made up” primary keys:
Do not automatically add “ID numbers” (surrogate keys) or substitute keys to a table until you are sure that:
These rules apply to surrogate and substitute keys that you (and your co-workers) add to your own tables. However, you might find that a class type already has an attribute that appears to be a surrogate or substitute key, but has been defined by someone else—usually a standards-setting organization or a government agency. We call this attribute an external key. In the external organization’s database, there is a candidate key for it, whether or not you have access to it or include its value(s) in your own database. You may use it as a descriptive attribute in both UML and the relation scheme diagram. Like other descriptive attributes, it might or might not become part of a candidate key in your database. We will encounter many of these (such as the zip code, UPC, and ISBN) in our examples.
• There is one special case of an external key that requires careful handling in your database design: the United States social security number (SSN). Originally intended for use only to identify social security participants, it has now become so over-used as an identifier that access to it poses risks of serious damage to individuals, even including identity theft. Please do not ever use the SSN in your database unless you are required to do so by law (for example, to file tax information). Even then, do not use it as a primary key that would be viewable to everyone who can access your database.
Revising the relation scheme
All of what we have done here applies to the relation scheme and tables only—the UML class diagram doesn’t change (unless we have to add descriptive attributes). Our revised scheme, with surrogate pk customer ID, looks like this:
Other views of this diagram: Large image - Data dictionary (text)
In the Orders scheme, the custID fk still represents the ck attributes from Customers—it tells us which customer placed the order. In the child table, it can be used as if it were a descriptive attribute in a candidate key or primary key, as shown here.
The Customers and Orders tables can be joined exactly as before, only this time the sole join attribute (intersection of the schemes) is the custID:
| custid | cfirstname | clastname | cphone | cstreet | czipcode |
|---|---|---|---|---|---|
| 1234 | Tom | Jewett | 714-555-1212 | 10200 Slater | 92708 |
| 5678 | Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 |
| 9012 | Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 |
| custid | orderdate | soldby |
|---|---|---|
| 5678 | 2003-07-14 | Patrick |
| 9012 | 2003-07-14 | Patrick |
| 5678 | 2003-07-18 | Kathleen |
| 5678 | 2003-07-20 | Kathleen |
| custid | cfirstname | clastname | cphone | cstreet | czipcode | orderdate | soldby |
|---|---|---|---|---|---|---|---|
| 5678 | Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-14 | Patrick |
| 9012 | Wayne | Dick | 562-777-3030 | 1250 Bellflower | 90840 | 2003-07-14 | Patrick |
| 5678 | Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-18 | Kathleen |
| 5678 | Alvaro | Monge | 562-333-4141 | 2145 Main | 90840 | 2003-07-20 | Kathleen |
Copyright © 2002–2006, by Tom Jewett. Links to this site are welcome and encouraged. Individual copies may be printed for non-commercial classroom or personal use; however, this material may not be reposted to other web sites or newsgroups, or included in any printed or electronic publication, whether modified or not, without specific permission from the author.


