How To Join With A WHERE Clause In SQL
A WHERE
clause is the most common and straightforward way to filter SQL query results. It’s often used in conjunction with a WHERE
clause, although the logic can get a little more complicated.
To join with a WHERE
clause, simply place the WHERE
clause after the join(s) but before any other clauses. Inner joins may also act like filters, which may help you to simplify your WHERE
clause. Take particular care if you WHERE
clauses filters on the results of an outer join, since these often produce NULL values.
All examples will refer to the following tables:
EMPLOYEE
name | role_id |
---|---|
Bob | 10 |
Jane | 20 |
Andrew | 20 |
Susan | 30 |
New hire TBD | (null) |
(null) | 100 |
ROLE
role_id | title |
---|---|
10 | Sales Rep |
20 | Accountant |
90 | President |
100 | (null) |
If you aren’t familiar with basic JOIN
syntax, then start with this overview of which join to use in SQL.
Example: SQL JOIN with WHERE
Let’s say we want to retrieve all male employees (i.e. Bob and Andrew) plus their titles if available. There are multiple solutions, but one is LEFT JOIN
with WHERE
as follows:
SELECT
e.name
,r.title
FROM employee e
LEFT JOIN role r
ON r.role_id = e.role_id
WHERE e.name IN ('Bob', 'Andrew')
name | title |
---|---|
Bob | Sales Rep |
Andrew | Accountant |
Logically, joins run as part of the FROM
clause, and the FROM
clause runs before WHERE
. In other words, the query above:
- Takes all employees.
- Looks up their role, if any.
- Filters to employees named Bob or Andrew.
Filtering with INNER JOIN vs. WHERE
As mentioned earlier, inner joins can have their own filtering effects. For instance, we could get an identical result by rewriting the previous query as follows:
SELECT
e.name
,r.title
FROM employee e
INNER JOIN role r
ON r.role_id = e.role_id
AND e.name IN ('Bob', 'Andrew')
Previously, a match only depended on role_id
. Now, we’ve added a join condition so that a match also depends on name
. INNER JOIN
discards non-matching rows, so we lose rows without a matching name—exactly the same effect as the WHERE
clause in the previous example!
So, in this situation, INNER JOIN
and WHERE
are logically interchangeable.
That reveals two common design patterns. If we want to filter TABLE_A to rows that match TABLE_B, then we can accomplish this with either:
SELECT *
FROM table_a
WHERE <join column> IN (
SELECT <join column>
FROM table_b
) x
Or:
SELECT table_a.*
FROM table_a
[INNER] JOIN table_b
ON table_b.<join column> = table_a.<join column>
In the first query, SELECT *
is sufficient because there is only one table in the FROM
clause. Rows simply cannot come from anywhere else.
In the second query, we specify SELECT table_a.*
because there are two tables in the FROM
clause. We only want to filter by the existence of a match in table_b
, not actually return that match.
(There are several more options, too. Many of them use the EXISTS
operator, which is beyond the scope of this article.)
Handling NULL with WHERE & JOIN
When a WHERE
clause refers to a join, think carefully about potential NULL values. Here are the most common implications to consider.
First, an inner join will exclude rows where the join field is NULL, since it’s impossible to match on unknown values:
SELECT e.*
FROM employee e
[INNER] JOIN role r
ON r.role_id = e.role_id
name | role_id |
---|---|
Bob | 10 |
Jane | 20 |
Andrew | 20 |
(null) | 100 |
EMPLOYEE originally contains a row where role_id
is NULL. the inner join above omits that row even though we didn’t explicitly filter out NULLs. In other words, it would be redundant to add WHERE e.role_id IS NOT NULL
to the query.
Second, the WHERE
clause may return different results if it filters on the first table’s join field versus the second table’s join field. This usually arises with NOT
filters in particular.
Let’s say we want name
and title
for all employees whose role_id
is not 20. We can filter on role_id
in EMPLOYEE:
SELECT
e.name
,r.title
FROM employee e
LEFT JOIN role r
ON r.role_id = e.role_id
WHERE e.role_id <> 20
name | title |
---|---|
Bob | Sales Rep |
Susan | (null) |
(null) | (null) |
Or we can filter on role_id
in ROLE:
SELECT
e.name
,r.title
FROM employee e
LEFT JOIN role r
ON r.role_id = e.role_id
WHERE r.role_id <> 20
name | title |
---|---|
Bob | Sales Rep |
(null) | (null) |
Notice that Susan is in the first result set but not in the second result set. This often causes confusion in real-world queries, since the difference in logic is subtle but critical.
The first query filters on the first table in the join (EMPLOYEE) so it’s independent from the results of the join. Susan’s role_id
is 30 in EMPLOYEE. Obviously, 30 is definitively not 20, so it passes the <> 20
filter on EMPLOYEE.
The second query filters on the second table in the join (ROLE) so it’s dependent upon the results of the join. Susan’s role_id
does not exist in ROLE, so it’s passed to the filter as NULL, meaning “unknown.” An unknown value may or may not be 30—we can’t definitively say either way—so it fails the <> 20
filter on ROLE.
In short, if your WHERE
clause filters on the results (the “look-up” side) of a left join, then you may be filtering on implicit NULL values. These might theoretically pass the filter in terms of business context, yet still fail it because SQL treats NULL as unknown, not as a specific value!
As always, it’s essential to think about the logical order of execution: FROM
and JOIN
first, followed by WHERE
.