2020 has truly been a rollercoaster ride of a year so far - a worldwide pandemic, economic downturn, racial unrest following the death of George Floyd, just to name a few. In light of recent events, I wanted to analyze online user comments to uncover the different types of online hate speech and extract insightful patterns. This analysis was based on web scraping comments from Disqus, an online public comment sharing platform where users create profiles to participate in interactive conversations.


Overview — Topic Modeling

Topic Modeling is an unsupervised Natural Language Processing (NLP) method which aims to discover the abstract “topics” (themes) that…

Window functions are similar to aggregate functions in that they compute aggregate values by performing calculations across a set of rows related to the current row. Unlike aggregate functions, using window functions does not group the result set (into a single output row) but instead retains the same number of rows as the input.

Does this sound confusing? Let’s dissect window functions by going over the basic structures and a few examples.

Window Function Basics

SELECT duration, 
SUM(duration) OVER (ORDER BY start_time) AS running_total
FROM bikeshare_stats

The above query sums duration seconds over the entire result set, in order by start time.

As we begin trying to answer more in-depth questions about the data, our SQL queries will become more complex as well. In the beginner cheatsheet, I covered the basic structures of SQL in terms of querying from one table. In the intermediate portion, I explained aggregate functions and joins which are used to aggregate data using summary statistics and combine multiple tables.

In my cheatsheets, I first present the theories and then apply them to real-world examples to illustrate how the concepts work in practice.


The advanced SQL cheatsheet will cover two main areas: subqueries and window functions. …

After covering aggregation in the previous section, I will now go over one of the most important SQL concepts — joins. Often times, the data we need are not just from one table but instead span over multiple ones. We use joins to combine multiple tables containing common identifiers (i.e. unique ID). Note that we can also join a table to itself.

I will first go over the core concepts of SQL joins, which include some venn diagram visual representation that will hopefully help with your understanding. Then, I will cover some specific examples using the different types of joins…

The intermediate SQL tutorial will go over 2 main areas: aggregate functions (aggregate data using summary statistics) and joins (combine 2 or more tables). I will cover various aggregate functions in this section, ranging from COUNT and SUM to GROUP BY and CASE WHEN. In the next part, I will also go over the various types of joins, such as INNER JOIN and LEFT JOIN.

These cheatsheets focus on first presenting the theories and then applying them to real-world examples to illustrate how the concepts work in practice.

Aggregate Functions

SQL is excellent at aggregating data similar to how you might in…

SQL stands for Structured Query Language, which is one of the most important coding languages to learn especially for Data Scientists and Data Analysts. Its main use case is to query and analyze data from a relational database, which consists of rows (records) and columns (features). For example, we can have a table of transactions, with each row indicating a different customer transaction, and columns such as user_ID, transaction_date, and transaction_amount.

I’ll be breaking down some important SQL concepts into beginner, intermediate, and advanced sections. In the beginner portion, we will cover the basic structures such as SELECT, WHERE, and…

Understanding how to utilize algorithms ranging from random forest to TextRank for different use cases

Machine Learning has been popularized in recent years due to its scalability and accuracy in recognizing patterns and predicting outcomes. It has already been widely adopted in various industries, ranging from detecting fraudulent transactions to predicting customer churn.

There are so many readily available Machine Learning models out there that it can get quite overwhelming to learn, especially for those who are relatively new to data science. …

Alexander Chang

M.S.E Data Science @Johns Hopkins University with a B.S. in Applied Mathematics. Previously @MongoDB, current Data Science Intern @EA

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store