SQL technique: join types
All of the joins that you have seen so far have used the natural join syntax—for example, to produce a list of customers and dates on which they placed orders. Remember that if this syntax is available, it will automatically pick the join attributes as those with the same name in both tables (intersection of the schemes). It will also produce only one copy of those attributes in the result table.
SELECT cFirstName, cLastName, orderDate FROM customers NATURAL JOIN orders;
• The join does not consider the pk and fk attributes you have specified. If there are any non-pk/fk attributes that have the same names in the tables to be joined, they will also be included in the intersection of the schemes, and used as join attributes in the natural join. The results will certainly not be correct! This problem might be especially difficult to detect in cases where many natural joins are performed in the same query. Fortunately, you can always specify the join attributes yourself, as we describe next.
• Another keyword that produces the same results (without the potential attribute name problem) is the inner join. With this syntax, you may specify the join attributes in a USING clause. (Multiple join attributes in the USING clause are separated by commas.) This also produces only one copy of the join attributes in the result table. Like the NATURAL JOIN syntax, the USING clause is not supported by all systems.
SELECT cFirstName, cLastName, orderDate FROM customers INNER JOIN orders USING (custID);
• The most widely-used (and most portable) syntax for the inner join substitutes an ON clause for the USING clause. This requires you to explicitly specify not only the join attribute names, but the join condition (normally equality). It also requires you to preface (qualify) the join attribute names with their table name, since both columns will be included in the result table. This is the only syntax that will let you use join attributes that have different names in the tables to be joined. Unfortunately, it also allows you to join tables on attributes other than the pk/fk pairs, which was a pre-1992 way to answer queries that can be written in better ways today.
SELECT cFirstName, cLastName, orderDate FROM customers INNER JOIN orders ON customers.custID = orders.custID;
• You can save a bit of typing by specifying an alias for each table name (such as c and o in this example), then using the alias instead of the full name when you refer to the attributes. This is the only syntax that will let you join a table to itself, as we will see when we discuss recursive relationships.
SELECT cFirstName, cLastName, orderDate FROM customers c INNER JOIN orders o ON c.custID = o.custID;
• All of the join statements above are specified as part of the 1992 SQL standard, which was not widely supported for several years after that. In earlier systems, joins were done with the 1986 standard SQL syntax. Although you shouldn’t use this unless you absolutely have to, you just might get stuck working on an older database. If so, you should recognize that the join condition is placed confusingly in the WHERE clause, along with all of the tests to pick the right rows:
SELECT cFirstName, cLastName, orderDate FROM customers c, orders o WHERE c.custID = o.custID;
One important effect of all natural and inner joins is that any unmatched PK value simply drops out of the result. In our example, this means that any customer who didn’t place an order isn’t shown. Suppose that we want a list of all customers, along with order date(s) for those who did place orders. To include the customers who did not place orders, we will use an outer join, which may take either the USING or the ON clause syntax.
SELECT cFirstName, cLastName, orderDate FROM customers c LEFT OUTER JOIN orders o ON c.custID = o.custID;
• Notice that for customers who placed no orders, any attributes from the Orders table are simply filled with NULL values.
• The word “left” refers to the order of the tables in the FROM clause (customers on the left, orders on the right). The left table here is the one that might have unmatched join attributes—the one from which we want all rows. We could have gotten exactly the same results if the table names and outer join direction were reversed:
SELECT cFirstName, cLastName, orderDate FROM orders o RIGHT OUTER JOIN customers c ON o.custID = c.custID;
• An outer join makes sense only if one side of the relationship has a minimum cardinality of zero (as Orders does in this example). Otherwise, the outer join will produce exactly the same result as an inner join (for example, between Orders and OrderLines).
• The SQL standard also allows a FULL outer join, in which unmatched join attributes from either side are paired with null values on the other side. You will probably not have to use this with most well-designed databases.
Multiple joins in a query are evaluated left-to-right in the order that you write them, unless you use parentheses to force a different evaluation order. (Some database systems require parentheses in any case.) The schemes of the joins are also cumulative in the order that they are evaluated; in RA, this means that
• It is especially important to remember this rule when outer joins are mixed with other joins in a query. For example, if you write:
SELECT cFirstName, cLastName, orderDate, UPC, quantity FROM customers LEFT OUTER JOIN orders USING (custID) NATURAL JOIN orderlines;
you will lose the customers who haven’t placed orders. They will be retained if you force the second join to be executed first:
SELECT cFirstName, cLastName, orderDate, UPC, quantity FROM customers LEFT OUTER JOIN (orders NATURAL JOIN orderlines) USING (custID);
Other join types
For sake of completeness, you should also know that if you try to join two tables with no join condition, the result will be that every row from one side is paired with every row from the other side. Mathematically, this is a Cartesian product of the two tables, as you have seen before. It is almost never what you want. In pre-1992 syntax, it is easy to do this accidently, by forgetting to put the join condition in the WHERE clause:
SELECT cFirstName, cLastName, orderDate FROM customers, orders;
• If your system is backward-compatible (most are), you might actually try this just to prove to yourself that the result is pure nonsense. However, if you ever have an occasion to really need a Cartesian product of two tables, use the new cross join syntax to prove that you really mean it. Notice that this example still produces nonsense.
SELECT cFirstName, cLastName, orderDate FROM customers CROSS JOIN orders;
• It is possible, but confusing, to specify a join condition other than equality of two attributes; this is called a non-equi-join. If you see such a thing in older code, it probably represents a WHERE clause or subquery in disguise.
• You may also hear the term self join, which is nothing but an inner or outer join between two attributes in the same table. We’ll look at these when we discuss recursive relationships.
Copyright © 2002–2006, by Tom Jewett. Links to this site are welcome and encouraged. Individual copies may be printed for non-commercial classroom or personal use; however, this material may not be reposted to other web sites or newsgroups, or included in any printed or electronic publication, whether modified or not, without specific permission from the author.