SQL stands for Structured Query Language, which is one of the most important coding languages to learn especially for Data Scientists and Data Analysts. Its main use case is to query and analyze data from a relational database, which consists of rows (records) and columns (features). For example, we can have a table of transactions, with each row indicating a different customer transaction, and columns such as user_ID, transaction_date, and transaction_amount.
I’ll be breaking down some important SQL concepts into beginner, intermediate, and advanced sections. In the beginner portion, we will cover the basic structures such as SELECT, WHERE, and ORDER BY. We’ll also go over various comparison and logical operators used in the WHERE clause to filter your data.
These basic concepts are the foundational building blocks for writing complex queries, so mastering the basic syntax and structure is definitely important but don’t worry, they’re relatively intuitive.
Basic SQL Syntactic Structure
GROUP BY column_na
ORDER BY column_name
Does this look confusing? Let’s go over what each line means in terms of querying the specific data we want.
We use SELECT to specify the data we would like to pull from a database.
SELECT all columns
SELECT specific columns
SELECT column1, column2, column3
We use the WHERE clause to set up specified filters for the values we want to pull.
Filtering results can be done in a number of ways using comparison and logical operators. Let’s dive into what we can specify in the WHERE condition.
Comparison operators are used to compare values- for example, if we only want to look at the rows where the customer’s first name is Peter. I’ll demonstrate examples of comparison operators using a table of customer transactions.
WHERE first_name = 'PETER' -- exact match
WHERE first_name != 'PETER' -- everything except PETER
WHERE transaction_amount > 100 -- greater than 100
WHERE transaction_amount < 100 -- less than 100
WHERE user_ID >= 50 -- greater than or equal to 50
WHERE user_ID <= 90 -- less than or equal to 50
We can use logical operators to apply multiple filters/operators (i.e. using AND/OR). We’ll also go over some crucial logical operators, which are LIKE (match similar values), IN (specify a list of values to include), BETWEEN (select rows within a certain range), IS NULL (select rows with no data in a given column), and NOT (selects rows that do not match some condition).
WHERE product LIKE 'A%' -- values starting with capital A
WHERE product LIKE '%a' -- values ending with lowercase A
WHERE product LIKE '%hair%' -- values containing "hair" in any position
WHERE product ILIKE 'a%A' -- values starting and ending with
'a', note: ILIKE is not case
WHERE ranking IN (1,2,3) -- values in the top 3 rankings
WHERE first_name NOT IN ('ROB','BOB') -- exclude both valuesBETWEEN
WHERE year BETWEEN 2000 AND 2020 -- values between 2020 and 2020NOT
WHERE product NOT ILIKE '%a%' -- exclude products that contain 'a'
WHERE user_ID IS NOT NULL -- include rows with non-null userIDIS NULL
WHERE product IS NULL -- include rows with no product namesMultiple Conditions
WHERE FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH' -- both conditions
WHERE FIRSTNAME = 'BOB' OR FIRSTNAME = 'JASON' -- either condition
ORDER BY is a useful function to present your results in sorted order.
ORDER BY transaction_amount DESC
Here we’re pulling all transaction details from the highest to the lowest transaction amount (in descending order). Note that ORDER BY sorts values in ascending order by default.
ORDER BY transaction_amount, year
Now we’re sorting the results by transaction amount, and then by year (in ascending order).
We use LIMIT to specify how many rows we want to display in the final results. Let’s say we want to output the top 5 customers who spent the most money in a single transaction:
ORDER BY transaction_amount DESC
Hopefully you found this tutorial helpful in beginning your SQL journey. In the next intermediate section, I’ll go over two very important concepts: aggregation and joins. Aggregate functions (such as COUNT, SUM, and GROUP BY) are used to aggregate data similar to how you would in an Excel pivot table, which helps to process data using summary statistics. Joins are used to combine multiple tables that are related to one another- this is especially applicable, since we rarely only work with one table in the real world.