To CTE or not to CTE: The Case for Subqueries
CTEs have been getting a lot of attention but subqueries offer distinct advantages when using iterative query writing style.
I've been seeing a lot of criticism of subqueries in favor of CTEs and wanted to make the counter argument. As always, the choice between the two depends on your specific use case and you shouldn't listen to anyone who preaches absolutes, including me.
The crux of my argument is that subqueries are easier to iterate with when writing the first version of a query. Similar to writing code, we start with a basic query and gradually add complexity, checking the results at each until we get to where we want.
The benefit of using a subquery is due to its constraints. A common approach is to select a subquery in a database IDE and execute it to examine the intermediate results. Then, you wrap that inside another query and repeat the process. Although CTEs may be easier to read, when one CTE references another, you lose the ability to do this execute-review loop as each individual CTE block is not self-contained.
Everything is always clearer with an example so here are two queries that yield the identical results. The first query uses CTEs, while the second uses subqueries.
-- Using CTEs
WITH EmployeeProjectHours AS (
SELECT
EmployeeID,
COUNT(DISTINCT ProjectID) AS ProjectsCount,
SUM(HoursWorked) AS TotalHours
FROM Projects
GROUP BY EmployeeID
),
HighContributors AS (
SELECT
EPH.EmployeeID,
E.DepartmentID,
EPH.TotalHours
FROM EmployeeProjectHours EPH
JOIN Employees E ON EPH.EmployeeID = E.EmployeeID
WHERE EPH.TotalHours > 100
)
SELECT
HC.EmployeeID,
HC.DepartmentID,
HC.TotalHours
FROM HighContributors HC
JOIN Departments D ON HC.DepartmentID = D.DepartmentID
WHERE D.DepartmentName IN ('Engineering', 'Product');
-- Using subqueries
SELECT
HC.EmployeeID,
HC.DepartmentID,
HC.TotalHours
FROM
(
SELECT
EPH.EmployeeID,
E.DepartmentID,
EPH.TotalHours
FROM
(
SELECT
EmployeeID,
COUNT(DISTINCT ProjectID) AS ProjectsCount,
SUM(HoursWorked) AS TotalHours
FROM Projects
GROUP BY EmployeeID
) AS EPH
JOIN Employees E ON EPH.EmployeeID = E.EmployeeID
WHERE EPH.TotalHours > 100 -- Employees with more than 100 hours
) AS HC
JOIN Departments D ON HC.DepartmentID = D.DepartmentID
WHERE D.DepartmentName IN ('Engineering', 'Product');
Both of the above can be optimized further but I wanted to highlight the differences. While the query utilizing CTEs may offer better readability, the subquery-based version is more convenient from a workflow perspective. There’s something magical about being able to select the inner subquery, hit “Cmd + Enter,” and instantly see results. This real-time feedback is just not possible when using referencing CTEs. Although this example is simple enough for both queries to be easily understood and followed, one can imagine how complex it can get as they evolve.
At the end of the day, both subqueries and CTEs can achieve the same results. They should be treated as tools and depending on the situation, you may want to use one or the other. Once you start optimizing your queries for performance you’ll want to attempt both anyways since the query optimizers often work in mysterious ways. Treat subqueries and CTEs as tools in your SQL toolbelt and experiment with both.