Most web applications rely on storing their data in a relational database, consisting of tables which are comprised of columns and rows. PostgreSQL, MySQL and SQLite are the most popular and established relational databases, and luckily, they have a LOT in common.
1 — Foundation of Relational Databases
Before we dive into our workshop project, we will spend some time to lay the foundation for relational databases and SQL. Learning whether a given task is best handled by your database or application layer is a big part of ensuring your apps perform well under heavy loads.Agenda
Welcome and Tech Check
We’ll get to know each other and ensure everyone is set up for the workshop project.
Relational Algebra and Codd’s Relational Model
In 1970, Edgar Codd invented a new way to model large, organized and shared piles of data using the expressive semantics provided by relational algebra. Today, virtually all relational databases are still based on these fundamental principles. We’ll cover the conceptual models behind tables, columns, result sets and “joins”.
Structured Query Language
Virtually all relational databases use some variant of a (mostly) declarative programming language called Structured Query Language (SQL) to perform operations. We’ll learn what SQL looks like, and try writing a few statements together.
Three forms of SQL databases
We’ll focus on a few types of databases, all of which fit our definition of “SQL Databases”.
- Hosted databases exist as a completely independent system component, often running on their own server. PostgreSQL and MySQL are among the most popular hosted relational database products today.
- Embedded databases are often packaged with an application instance, often as a file on disk. From the outside world, it’s hard or impossible to separate “app” from “database”. We’ll be working extensively with SQLite - a very popular relational embedded database widely used in mobile, desktop and web applications.
- Spreadsheets can also be regarded as databases. Although more limited than the other types of databases, a surprising amount of SQL syntax can be used to perform advanced queries and calculations! We’ll be using the google visualization API, which allows us to “query” a google spreadsheet using a SQL-like syntax and get JSON back.
DB Management Tools
We’ll look at a few tools that will help us on our journey to learn more about SQL databases.
2 — Retrieving Data
The first thing we will learn is how to get data out of a database in a variety of ways. We will begin with the simplest possible queries; move on to filtering our result set; join tables together to retrieve the data we are interested as quickly and easily as possible.Agenda
SELECTing a collection of data
SELECT is the best and easiest way to begin working with a SQL database! But as we’ll see later on, it is by far the most complex type of query we’ll encounter.
EXERCISE: Selecting Columns
Selecting all columns in a table is generally inappropriate for a production app. We’ll explicitly pick which columns we need for several collections of data that our app needs, and witness the improved performance gained by making this simple change.
Filtering via WHERE clauses
It is often undesirable to work with all tuples or “rows” from a given table. Adding a WHERE clause to our SELECT query allows us to specify one or more criteria for filtering the result set, down to only what we are interested in.
EXERCISE 2 - Filtering via WHERE clauses
We’ll add WHERE clauses to the collection queries for two pages on our app.
- On the products list page, we’ll allow the user to filter by those products that need to be reordered, those that are discontinued, or the full list,
- On the customer list page, we’ll add a rudimentary search field and use a LIKE clause to find matching rows.
Break for Lunch
LIMITing and ORDERing the result set
Particularly when working with large collections of data, it is important to be able to sort data the way we want and paginate or scroll through the results. We’ll learn how to use LIMIT and OFFSET to retrieve the records of interest, and ORDER BY to sort.
EXERCISE: Sorting and Paging
In our example app, the orders page has over 16,000 records. This is way too much data to show to users all at once. Even looking at an individual customer’s orders is a bit overwhelming. We’ll use the existing user interface for sorting and pagination, and modify the “orders list” and “customer orders list” queries as appropriate.
3 — Querying Across Tables
Time to put our newfound knowledge of relational algebra into practice! One of the great advantages of a relational database is the ability to mix tables together in queries, and aggregate or group across columns. Databases are built to do this kind of work, so it’s often much faster to build the right query than to move similar logic into our application code.Agenda
Inner and outer JOINs
There are five types of joins in most relational database systems, but we can get away with focusing almost entirely on the two categories: INNER and OUTER joins. We’ll learn about the distinction between these two types, and how to pick the right join operation for the job.
EXERCISE: JOIN to replace ids with names
There are several places in our app where alphanumeric IDs are shown to users. Humans prefer referring to things by names, so let’s use JOIN to transform these references into records that are more user-friendly!
Aggregate Functions and GROUP BY
Often, we are interested in summary data that is aggregated over a result set (example: “give me the number of products we have in each category”). Through using GROUP BY, we can define the records we are interested in. We can use aggregate functions like sum, count, group_concat to aggregate over duplicate data.
EXERCISE: Aggregate Functions and GROUP BY
There are several places where some additional aggregate information is needed in order to “fix” the currently broken experience. Firstly, we need to get the subtotal of an order’s line items and display it prominently at the bottom of the order page. Then, we’ll count and concatenate aggregate results as we group records on the employee, customer, and product list pages.
4 — Creating, Updating and Deleting
Now that we have gotten used to the different ways to retrieve data from our database, we will learn how to create, manipulate and destroy records.Agenda
Creating and Deleting Records
CREATE and DELETE are considerably simpler than the SELECT statement we have been working with so far. More often than not, you’ll be building these queries with values entered by users, so this is a great time to discuss SQL injection attacks and how we can defend against them.
EXERCISE: Creating and Updating Orders
We’ll build the proper queries for creating new orders and updating existing ones, being sure to avoid susceptibility to SQL injection attacks.
Transactions allow a sequence of SQL statements to be grouped together and treated by the database as one “all or nothing” unit. This important tool allows us to achieve an even higher level of data consistency and integrity - through the assurance that the entire transaction will either complete, or the database will be left totally unaffected.
We’ll use a transaction to update our SQL statement for creating a new order.
Wrap up and review
We’ll review everything we have covered so far, and set our sights on tomorrow’s topics.
5 — The Schema Evolves
Over time, you will often need to update the schema or “shape” of your data to meet your application’s needs. In this unit, we will learn about using migrations to manage these changes. We can apply database-level constraints via the schema to ensure that even if our application logic misbehaves, our data is always consistent.Agenda
As a database-driven system evolves, we often need to make changes to its schema. We’ll discuss best practices for treating a database as a semi-free-standing system component and learn how to commit schema changes to a git repository along with our source code, to provide ourselves with a reliable way to maintain multiple environments across a team.
When we create an index in a database, we are telling it to do some bookkeeping as records are added and updated. When the time comes to search in a particular way, results can be retrieved quickly and directly using indices.
Boost JOIN performance via INDEXes
You may have noticed that our database query times increased over the last few exercises. One contributor to this problem has to do with the JOINs we added in exercise 4. Adding an index will tell the database to keep track of particular slices of data at all times, and should dramatically improve these JOINed queries
To ensure data integrity, sometimes we have the option to put constraints directly on a database. For example, if one record refers to another, we can require that the other record actually exists. We’ll look at several different database-level constraints we can put in place, including NOT NULL, UNIQUE indices and foreign keys.
EXERCISE: Adding DB constraints
We’ll add a few constraints to our database to ensure that even if our business logic runs into problems, only consistent and valid records can be stored.
We’ll recap everything we have learned, and provide some resources for further learning.