SQL stands for Structured Query Language, which is one of the most important coding languages to learn especially for Data Scientists and Data Analysts. Its main use case is to query and analyze data from a relational database, which consists of rows (records) and columns (features). For example, we can have a table of transactions, with each row indicating a different customer transaction, and columns such as user_ID, transaction_date, and transaction_amount.

Image for post
Image for post

I’ll be breaking down some important SQL concepts into beginner, intermediate, and advanced sections. In the beginner portion, we will cover the basic structures such as SELECT, WHERE, and ORDER BY. We’ll also go over various comparison and logical operators used in the WHERE clause to filter your data.

These basic concepts are the foundational building blocks for writing complex queries, so mastering the basic syntax and structure is definitely important but don’t worry, they’re relatively intuitive.

Basic SQL Syntactic Structure

SELECT column_name(s)
FROM table_name
WHERE some_condition
GROUP BY column_na
HAVING some_aggregate_condition
ORDER BY column_name
LIMIT N

Does this look confusing? Let’s go over what each line means in terms of querying the specific data we want.

SELECT

We use SELECT to specify the data we would like to pull from a database.

SELECT *
FROM table_name
SELECT column1, column2, column3
FROM table_name

WHERE

We use the WHERE clause to set up specified filters for the values we want to pull.

SELECT *
FROM table_name
WHERE some_condition

Filtering results can be done in a number of ways using comparison and logical operators. Let’s dive into what we can specify in the WHERE condition.

Comparison operators are used to compare values- for example, if we only want to look at the rows where the customer’s first name is Peter. I’ll demonstrate examples of comparison operators using a table of customer transactions.

WHERE first_name = 'PETER'      -- exact match
WHERE first_name != 'PETER' -- everything except PETER
WHERE transaction_amount > 100 -- greater than 100
WHERE transaction_amount < 100 -- less than 100
WHERE user_ID >= 50 -- greater than or equal to 50
WHERE user_ID <= 90 -- less than or equal to 50

We can use logical operators to apply multiple filters/operators (i.e. using AND/OR). We’ll also go over some crucial logical operators, which are LIKE (match similar values), IN (specify a list of values to include), BETWEEN (select rows within a certain range), IS NULL (select rows with no data in a given column), and NOT (selects rows that do not match some condition).

LIKE
WHERE product LIKE 'A%' -- values starting with capital A
WHERE product LIKE '%a' -- values ending with lowercase A
WHERE product LIKE '%hair%' -- values containing "hair" in any position
WHERE product ILIKE 'a%A' -- values starting and ending with
'a', note: ILIKE is not case
sensitive
IN
WHERE ranking IN (1,2,3) -- values in the top 3 rankings
WHERE first_name NOT IN ('ROB','BOB') -- exclude both values
BETWEEN
WHERE year BETWEEN 2000 AND 2020 -- values between 2020 and 2020
NOT
WHERE product NOT ILIKE '%a%' -- exclude products that contain 'a'
WHERE user_ID IS NOT NULL -- include rows with non-null userID
IS NULL
WHERE product IS NULL -- include rows with no product names
Multiple Conditions
WHERE FIRSTNAME = 'BOB' AND LASTNAME = 'SMITH' -- both conditions
WHERE FIRSTNAME = 'BOB' OR FIRSTNAME = 'JASON' -- either condition

ORDER BY

ORDER BY is a useful function to present your results in sorted order.

SELECT *
FROM transactions
ORDER BY transaction_amount DESC

Here we’re pulling all transaction details from the highest to the lowest transaction amount (in descending order). Note that ORDER BY sorts values in ascending order by default.

SELECT *
FROM transactions
ORDER BY transaction_amount, year

Now we’re sorting the results by transaction amount, and then by year (in ascending order).

LIMIT

We use LIMIT to specify how many rows we want to display in the final results. Let’s say we want to output the top 5 customers who spent the most money in a single transaction:

SELECT *
FROM transactions
ORDER BY transaction_amount DESC
LIMIT 5

Overview

Hopefully you found this tutorial helpful in beginning your SQL journey. In the next intermediate section, I’ll go over two very important concepts: aggregation and joins. Aggregate functions (such as COUNT, SUM, and GROUP BY) are used to aggregate data similar to how you would in an Excel pivot table, which helps to process data using summary statistics. Joins are used to combine multiple tables that are related to one another- this is especially applicable, since we rarely only work with one table in the real world.

Written by

M.S.E Data Science @Johns Hopkins University with a B.S. in Applied Mathematics. Previously @MongoDB, current Data Science Intern @EA

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store