SQL technique: union and minus
Set operations on tables
Some students initially think of the join as being a sort of union between two tables. It’s not (except for the schemes). The join pairs up data from two very different tables. In RA and SQL, union can operate only on two identical tables. Remember the Venn-diagram representation of the union and minus operations on sets. Union includes members of either or both sets (with no duplicates). Minus includes only those members of the set on the left side of the expression that are not contained in the set on the right side of the expression.
• Both sets, R and S, have to contain objects of the same type. You can’t union or minus sets of integers with sets of characters, for example. All sets, by definition, are unordered and cannot contain duplicate elements.
• SQL and RA set operations treat tables as sets of rows. Therefore, the tables on both sides of the union or minus operator have to have at least the same number of attributes, with corresponding attributes being of the same data type. It’s usually cleaner and more readable if you just go ahead and give them the same name using the AS keyword.
Union
For this example, we will add a Suppliers table to our sales data entry model. “A supplier is a company from which we purchase products that we will re-sell.” Each supplier suppliers zero to many products; each product is supplied by one and only one supplier. The supplier class attributes include the company name and address, plus the name and phone number of the representative from whom we make our purchases.
• We would like to produce a listing that shows the names and phone numbers of all people we deal with, whether they are suppliers or customers. We need rows from both tables, but they have to have the same attribute list. Looking at the relation scheme, we find corresponding first name, last name, and phone number attributes, but we still need to show what company each of the supplier representatives works for.
• We can create an extra column in the query output for the Customers table by simply giving it a name and filling it with a constant value. Here, we’ll use the value 'Customer' to distinguish these rows from supplier representatives. SQL uses the column names of the first part of the union query as the column names for the output, so we will give each of them aliases that are appropriate for the entire set of data.
• Build and test each component of the union query individually, then put them together. The ORDER BY clause has to come at the end.
SELECT cLastName AS "Last Name", cFirstName AS "First Name",
cPhone as "Phone", 'Customer' AS "Company"
FROM customers
UNION
SELECT repLName, repFName, repPhone, sCompanyName
FROM suppliers
ORDER BY "Last Name";
| Last Name | First Name | Phone | Company |
|---|---|---|---|
| Bradley | Jerry | 888-736-8000 | Industrial Tool Supply |
| Dick | Wayne | 562-777-3030 | Customer |
| Jewett | Tom | 714-555-1212 | Customer |
| Monge | Alvaro | 562-333-4141 | Customer |
| O'Brien | Tom | 949-567-2312 | Bosch Machine Tools |
Minus
Sometimes you have to think about both what you do want and what you don’t want in the results of a query. If there is a WHERE clause predicate that completely partitions all rows of interest (the result set) into those you want and those you don’t want, then you have a simple query with a test for inequality.
• The multiplicity of an association can help you determine how to build the query. Since each product has one and only one supplier, we can partition the Products table into those that are supplied by a given company and those that are not.
SELECT prodName, sCompanyName
FROM Products NATURAL JOIN Suppliers
WHERE sCompanyName <> 'Industrial Tool Supply';
• Contrast this to finding customers who did not make purchases in 2002. Because of the optional one-to-many association between Customers and Orders, there are actually four possibilities:
• If you try to write this as a simple test for inequality,
SELECT DISTINCT cLastName, cFirstName, cStreet, cZipCode
FROM Customers NATURAL JOIN Orders
WHERE TO_CHAR(orderDate, 'YYYY') <> '2002';
• We can show in set notation what we need to do:
There are two ways to write this in SQL.
• The easiest syntax in this case is to compare only the customer IDs. We’ll use the NOT IN set operator in the WHERE clause, along with a subquery to find the customer ID of those who did made purchases in 2002.
SELECT cLastName, cFirstName, cStreet, cZipCode
FROM Customers
WHERE custID NOT IN
(SELECT custID
FROM Orders
WHERE TO_CHAR(orderDate, 'YYYY') = '2002');
• We can also use the MINUS operator to subtract rows we don’t want from all rows in Customers. (Some versions of SQL use the keyword EXCEPT instead of MINUS.) Like the UNION, this requires the schemes of the two tables to match exactly in number and type of attributes.
SELECT cLastName, cFirstName, cStreet, cZipCode
FROM Customers
MINUS
SELECT cLastName, cFirstName, cStreet, cZipCode
FROM Customers NATURAL JOIN Orders
WHERE TO_CHAR(orderDate, 'YYYY') = '2002';
Other set operations
SQL has two additional set operators. UNION ALL works like UNION, except it keeps duplicate rows in the result. INTERSECT operates just like you would expect from set theory; again, the schemes of the two tables must match exactly.
Copyright © 2002–2006, by Tom Jewett. Links to this site are welcome and encouraged. Individual copies may be printed for non-commercial classroom or personal use; however, this material may not be reposted to other web sites or newsgroups, or included in any printed or electronic publication, whether modified or not, without specific permission from the author.


