SQL Intermediate Cheatsheet — Aggregate Functions

The intermediate SQL tutorial will go over 2 main areas: aggregate functions (aggregate data using summary statistics) and joins (combine 2 or more tables). I will cover various aggregate functions in this section, ranging from COUNT and SUM to GROUP BY and CASE WHEN. In the next part, I will also go over the various types of joins, such as INNER JOIN and LEFT JOIN.

These cheatsheets focus on first presenting the theories and then applying them to real-world examples to illustrate how the concepts work in practice.

Aggregate Functions

The specific functions I will cover are COUNT, SUM, DISTINCT, GROUP BY, HAVING, and CASE. There are also other basic aggregate functions such as MIN/MAX and AVG. Specifically, MIN(price), MAX(price), and AVG(price) will output the minimum, maximum, and average price from a given table.


COUNT all rows

FROM table_name

COUNT(*) will output a single number which equals the total number of rows in the entire table. (Note: using COUNT(1) has the same effect as COUNT(*) )

COUNT individual columns

FROM table_name

The query above returns the number of rows in which user_ID is not null.


FROM table_name

This returns the total unique values of the user_ID column.


The query below will calculate the sum of the volume column from a table with Apple stock prices.

SELECT SUM(volume)
FROM apple_stock_price

Note that aggregators like SUM only aggregate vertically. Calculation across rows can be done using simple arithmetic.


GROUP BY is often used with the aforementioned aggregate functions: COUNT, SUM, AVG, MIN, MAX.

SELECT year, month, SUM(volume)
FROM apple_stock_price
GROUP BY year, month
ORDER BY year, month

The above query will output the total number of Apple shares traded (grouped by) each month, sorted in (ascending) chronological order.

SELECT product, COUNT(*)
FROM transactions
GROUP BY product

The query above will calculate the number of transactions for (grouped by) each product.


SELECT year, month, MAX(price)
FROM apple_stock_price
GROUP BY year, month
HAVING MAX(price) > 400

Here we’re finding every month in which Apple stock worked its way over $400.

SELECT product, AVG(price)
FROM transactions
GROUP BY product
HAVING AVG(price) <= 200

This query above will output which product had an average price of $200 or less.


SELECT name, school_year, CASE WHEN school_year IN ('senior', 'junior') THEN 'yes' ELSE NULL END AS is_senior 
FROM ncaa_players

The query above can be interpreted as the following:

  1. CASE WHEN checks each row to see if school_year is ‘junior’ or ‘senior.
  2. For a given row: if the conditional statement is true, ‘yes’ gets printed in the newly created is_senior column. If false, there will be a null value in the is_senior column.
  3. Meanwhile, SQL will output all values in the name and school_year columns.

Here are two specific examples that illustrate the power of CASE WHEN.

SELECT CASE WHEN state IN ('CA', 'WA', 'OR') THEN 'west coast' WHEN state = 'TX' THEN 'texas' ELSE 'other' END AS region, COUNT(1) AS player_count
FROM ncaa_players
WHERE weight > 200

Count the number of players weighing more than 200 pounds for West Coast, Texas, and the other regions. (Note: COUNT(1) and GROUP BY 1 both refer to aggregation on the first column, which is the region classification).

SELECT state, COUNT(CASE WHEN year = 'junior' THEN 1 ELSE NULL END) AS junior_count, COUNT(CASE WHEN year = 'senior' THEN 1 ELSE NULL END) AS senior_count, COUNT(1) AS total_players_by_state
FROM ncaa_players
GROUP BY state
ORDER BY total_players_by_state DESC

Display the number of juniors and seniors by state, as well as the total number of players in each state. Order the output results such that states with the most players come first.



M.S.E Data Science @Johns Hopkins University with a B.S. in Applied Mathematics. Previously @MongoDB, current Data Science Intern @EA