After covering aggregation in the previous section, I will now go over one of the most important SQL concepts — joins. Often times, the data we need are not just from one table but instead span over multiple ones. We use joins to combine multiple tables containing common identifiers (i.e. unique ID). Note that we can also join a table to itself.
I will first go over the core concepts of SQL joins, which include some venn diagram visual representation that will hopefully help with your understanding. Then, I will cover some specific examples using the different types of joins. Lastly, I will introduce UNION, a concept similar to joins.
The Syntax of JOIN
SELECT p.first_name, p.last_name, a.city, a.state
FROM person p
JOIN address a
ON p.person_ID = a.person_ID
The above query combines the person and the address tables with person_ID as the common identifier. Note: we rename (provide aliases) both tables as “p” and “a” respectively, which can be especially useful when dealing with long table names.
Joins can be broken down into two types: inner joins and outer joins. Mastering the key differences can come a long way when dealing with different use cases. I will provide some examples that illustrate why one type of join might be the superior option in certain use cases.
INNER JOIN outputs the intersection of 2 tables, which includes only the elements that are common in both tables. In other words, rows from either table that are unmatched in the other table are not returned.
Here’s an example where using INNER JOIN is the appropriate method. Let’s say we want to count the number of unique users/day who logged in from both iPhone and web.
SELECT i.date, COUNT(DISTINCT i.user_ID) AS num_users
FROM iPhone i
JOIN web w
ON i.user_ID = w.user_ID
AND i.date = w.date
Why do we use INNER JOIN? Because we want to display the users from both iPhone and web (i.e. elements that are common in both tables). Note that JOIN and INNER JOIN are equivalent and can be used interchangeably.
Unmatched rows in one or both tables can be returned in an outer join. We will cover the three types of outer joins: LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN.
LEFT JOIN returns all rows in table1 in the FROM clause and only matching rows from table2 (after the LEFT JOIN clause).
Let’s say we want to return a list of cities without any users:
SELECT cities.name, users.ID
LEFT JOIN users
ON cities.ID = users.city_ID
WHERE users.ID IS NULL
Why not use an INNER JOIN here? Because joining by city_ID will display all of the cities that have users, and all of the cities without users will be filtered out. Using a LEFT JOIN will keep all rows from the cities table, even if a city doesn’t contain any user (in which user_ID is NULL). Ultimately, we use the WHERE clause to include only those cities that do not have any user.
Similarly, RIGHT JOIN returns all rows from table2 (after the RIGHT JOIN clause) and only matching rows from table1 in the FROM clause. We rarely use RIGHT JOIN in practice because we can achieve the same results by switching the order of the two tables in a LEFT JOIN.
FULL OUTER JOIN
FULL JOIN displays the union of 2 tables, which includes all unmatched rows from both tables. Regardless of which rows have null values, all of our data will be displayed.
While joins allow you to combine two tables horizontally (side-by-side), UNION will stack tables vertically. More specifically, UNION allows you to stack one SELECT statement on top of another.
There are two types of UNION: UNION only appends distinct values while UNION ALL appends all values including duplicates.
SQL has two strict rules for appending data using UNION: there must be the same number of columns, and the selected columns must have the same data types in the same order.
(SELECT date, user1, message_count
(SELECT date, user2, message_count
The above query returns the number of messages by each user by date.