Database:Adding constraints to prevent garbage data

From Juneday education
Revision as of 15:00, 10 November 2018 by Rikard (Talk | contribs) (Using a check constraint for validating dates in SQLite3)

Jump to: navigation, search

Meta

Expand using link to the right to see the meta section with instructor's notes.

Description

This topic deals with adding constraints to columns when creating a table. The constraints restrict the allowed values for the column, in order to prevent us from inserting junk data into the table.

Prepare

Expect the main lecture on constraints to take 25-30 minutes. As usual, we recommend that you read the online text, see the video lecture and review the exercises before giving this lecture.

If you go for also the optional extra lecture, expect it to take around 10 minutes but prepare some tables as a proof of concept and show it live after the lecture.

Topics addressed or used in the online exercises

  • FOREIGN KEY constraints
  • CHECK constraints (including checks for well-formed dates)
  • The functions STRFTIME and DATETIME
  • Constraint failure error messages
  • Using the IN operator for explicit enum-like constraints
    • The optional extra lecture shows an alternative approach
  • NOT NULL constraints
  • UNIQUE constraints
  • Globbing using the GLOB function
    • Also basic globbing in bash, which uses the same syntax
  • Check-your-progress questions with answers on a different page

Heads-up

Think about

  • This lecture uses syntax and constructs that are specific to SQLite3 - Other DBMSes have a different syntax and often more types of constraints
  • Explain that a key can be a foreign key without a constraint (as we have seen in the lecture about JOIN). The FOREIGN KEY constraint is only a tool to ensure consistency so that we can guarantee that a foreign key refers to an existing row in the foreign table
  • Globbing can be done either using the GLOB function or the GLOB infix operator
    • The order of the glob expression and the string to be tested are different between the function and operator - this is confusing to students (and the authors!)
  • A colum with a PRIMARY KEY modifier gets the UNIQUE constraint implicitly
    • This does not mean that PRIMARY KEY is a constraint - it is a modifier which includes the UNIQUE constraint
    • There are many use cases for UNIQUE constraints for columns which are not primary keys
  • Using a CHECK constraint for testing date well-formedness is rather compicated in SQLite compared to other DBMSes that are more strongly typed. SQLite accepts basically anything as the value for a column of type DATETIME. Make sure you read up on how to do the check using the strftime and datetime functions, so that you can explain this to the students (even if we think they will learn it hands-on in the online exercises)
  • One way to create a kind of ENUM-like construct in SQLite is shown in the optional extra lecture. Prepare some tables which demonstrate this in action and show it live, if you opt to give also the extra lecture.

MariaDB difference warning!

Syntax difference ahead! Click on the image to see MariaDB version.

The syntax in this chapter differs a lot from the syntax used in MariaDB. The syntax in this chapter works for SQLite3. If you want to see how you can do constraints in MariaDB, please see This page

Constraints

Unless otherwise specified, this chapter uses syntax and functionality of SQLite3. This chapter starts with some theory and examples. Then we link the lecture (as videos and PDF). After this chapter, move on to the next with exercises.

Introduction

This chapter deals with adding constraints to columns when creating a table. The constraints restricts the allowed values for the column, in order to prevent us from inserting junk data into the table. See the videos, read the PDF, check out the external links for further readin and head on to the exercises in the next chapter.

Foreign key constraints

SQlite syntax diagram

The purpose of foreign key constraints is to ensure referential integrity, which is to say that a foreign key in a table can only contain values that really exist in the referred table.

Consider the following tables:

CREATE TABLE artist(a_id integer primary key, a_name text);
CREATE TABLE track(t_id integer, t_name text, a_id);

The idea is to have the track table refer to the artist table by connecting rows on the a_id column, so that a track in the track table has a name and refers to an artist of the artist table using the a_id column.

For instance, we can have the following state of the database:

sqlite> SELECT t_name, a_name FROM track NATURAL JOIN artist;
t_name                a_name              
--------------------  --------------------
space oddity          bowie               
cortez the killer     neil young          

sqlite> SELECT * FROM artist;
a_id                  a_name              
--------------------  --------------------
1                     bowie               
2                     neil young          

sqlite> SELECT * FROM track;
t_id                  t_name                a_id      
--------------------  --------------------  ----------
1                     space oddity          1         
2                     cortez the killer     2         
sqlite>

As you can see, the track table uses an a_id for representing the track's artist, and that column obviously refers to the artist table, so that a track with a_id of 2, refers to "neil young" which is the row in artist with an a_id of 2.

But what happens if we INSERT a new row in track with an a_id of 3? There is no artist with such an a_id in the artist table. We will end up with an inconsistency. Referential integrity is lost.

The solution is to use a foreign key constraint which ensures that only such a_id values are allowed in the artist (referring) table that are present in the artist</table>. In other words, we don't want to allow a track with <code>a_id of 3, because there is no artist in the artist table that has a_id 3.

This is the syntax for the create table syntax for the track table with a foreign key constraint:

CREATE TABLE track(t_id INTEGER PRIMARY KEY, t_name TEXT, a_id REFERENCES ARTIST);

However, in SQLite3, foreign key constraints are turned off (not enforced) by default, so we also need to turn them on (for every connection or session to the database):

sqlite> PRAGMA FOREIGN_KEYS=ON;

Now, if we try to insert a track with an invalid a_id, we will get an error (which is what we want rather than creating an inconsistency):

sqlite> INSERT INTO track(t_name, a_id) VALUES('monkey stole my face', 3);
Error: FOREIGN KEY constraint failed

Foreign key constraints - advanced

In SQLite (and most other database mangement systems), there are ways to defer the enforcing of foreign key constraints.

Let's say we really want to add a track with name 'monkey stole my face' (which is a brilliant song, by the way) with an a_id of 3, before we add the artist with a_id 3. It is not a consistency as long as we remember to add the artist with a_id 3, we are just doing it in the "wrong" order.

We can use transactions in combination with another pragma for this. If we start a new transaction, set foreign keys to deferred and insert the track before the artist, this can be done. Deferring the foreign key constraints means to wait with the validation until we commit the transaction. Here's what it would look like:

sqlite> BEGIN TRANSACTION;
sqlite> PRAGMA DEFER_FOREIGN_KEYS=ON;
sqlite> INSERT INTO track(t_name, a_id) VALUES('monkey stole my face', 3);
sqlite> INSERT INTO artist(a_name) VALUES('mighty boosh');
sqlite> COMMIT;

If you want foreign keys to work like the above (being deferrable by default), they you could have created the track table as folows:

CREATE TABLE track(t_id INTEGER PRIMARY KEY, t_name TEXT, a_id REFERENCES artist DEFERRABLE INITIALLY DEFERRED);

Using deferrable foreign keys only works in a transaction. This is because doing two inserts as an atomic action requires a transaction.

Note: Deferring is an advance concept we include here for completeness. It's quite advanced for an introductory database course. Also, this section is specific to SQLite3. Look up the documentation if you are using a DBMS other than SQLite, in order to learn the rules and syntax for that DBMS.

Using a check constraint for validating dates in SQLite3

SQlite syntax diagram

SQLite3 is very liberal when it comes to types. In fact, you can even put text (strings) into a column of type INTEGER (if it is not a primary key). So what about dates in columns of type DATETIME? It pretty much also suffer from this loose typing. But you can use the CHECK function together with a date formatting function and the DATETIME function. This is what such a constraint looks like in SQLite3:

CREATE TABLE authors(
        authorid INTEGER PRIMARY KEY,
        name TEXT,
        born DATETIME
         check(
           born IS STRFTIME( '%Y-%m-%d', DATETIME(born) )
         )
);

The way that works, is that we specify a dateformat for our dates. The format %Y-%m-%d means YYYY-MM-DD as in for instance 2018-05-17. The function STRFTIME takes a format string and date as arguments. We use the DATETIME function takes a date string and tries to parse it to a date.

If we'd try to insert, e.g. "9/11/10", the constraint would do the following check:

  1. take 9/11/10 and convert it to a date (fails and returns null)
  2. convert the result of the above (NULL) to a string, using the format string '%Y-%m-%d' (fails - evaluates to NULL)
  3. compare the result to the string to be inserted "9/11/10" which also fails

The constraint let's us specify the exact input format of dates. Internally, the dates inserted (if valid according to our pattern) will be stored as dates, but only date strings compliant to our date format will be accepted. Another example:

We'll try to insert the date string '2018-05-17 07:30:00' (which is a string SQLite3 can parse into a date). However, the string doesn't match our specified date format (we didn't want the hour, minute and seconds part!). This will happen:

  1. Take "2018-05-17 07:30:00" and convert it do a date (works - we'll get the date 2018-05-17 07:30:00 back!)
  2. convert 2018-05-17 07:30:00 to a string, using our date format '%Y-%m-%d' - which returns '2018-05-17'
  3. compare the string '2018-05-17' to the original '2018-05-17 07:30:00' - which returns false

The check fails, since we got false back from the last step.

Force integer values to INTEGER columns

As we mentioned before, SQLite3 is very loosely typed and even accepts strings on columns whose type is INTEGER (if the column is not a primary key).

If we really want to only accept actual integer values in SQLite3 for columns whose type is INTEGER, we need a constraint also for this.

This is what such a constraint could look like:

CREATE TABLE example (
  column_name INTEGER CHECK(TYPEOF(column_name) = 'integer')
);

The result of trying to insert a string value to the column_name column would result in the following error:

sqlite> INSERT INTO example VALUES(3);
sqlite> INSERT INTO example VALUES('Some text');
Error: CHECK constraint failed: example

Certain text formats using GLOB

You can use the GLOB function (or operator) to create a constraint for checking that a text follows a GLOB pattern.

You can learn about the GLOB function and globbing in the exercises in the next chapter (follow the next-link at the end of this page).

Here's an example for a (not very complete) GLOB constraint for Swedish personal identification numbers:

CREATE TABLE person(id INTEGER PRIMARY KEY,
                    id_num TEXT CHECK( GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num) ));

The above requires six numbers followed by a hyphen followed by four numbers. In addition, the third number can only be 0 or 1 (the first number of the month needs to be 0 or 1 - there are no months whose number start with 2 (23 is not a valid month), the fifth number must be between 0 and 3 (the day number cannot start with e.g. 4 - day 48 doesn't exist).

The expression isn't complete since it would accept e.g. 001339-1234 (month 13, day 39), but you get the picture.

The brackets is the syntax for expressing one single character and all the valid characters as a list. That means [abc] means "one character from the list 'a', 'b', 'c'". In order be more expressive, intervals of characters are allowed in the list (as long as they represent the orders of characters in the ascii table. So we could also say [a-c] to express the same thing. We can also combine intervals: [a-cf-g] meaning "one character from the list 'a', 'b', 'c', 'f', 'g'".

Naming a constraint for better error messages

Using the GLOB example above, we'd get an error message like this:

CREATE TABLE person(
  id INTEGER PRIMARY KEY,
  id_num TEXT check(GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num)));

insert into person(id_num) values ('711144-0000');
Error: CHECK constraint failed: person

We could name the constraint for at better error message:

CREATE TABLE person(
  id INTEGER PRIMARY KEY,
  id_num TEXT
   constraint valid_id_num
  CHECK(GLOB('[0-9][0-9][0-1][0-9][0-3][0-9]-[0-9][0-9][0-9][0-9]', id_num))
);

INSERT INTO person(id_num) VALUES('711144-0000');
Error: CHECK constraint failed: valid_id_num

Not accepting NULL

If you insert data into a table and leave out a column and its value, the default behavior is that the row gets NULL as the value for the left out columns.

If you don't want to accept NULL for a column (meaning that the row always must have an actual value for a column), you just add a NOT NULL constraint:

sqlite> CREATE TABLE beer(beer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

sqlite> INSERT INTO beer(beer_id) VALUES(1);
Error: NOT NULL constraint failed: beer.name

Force a column to have unique values in the table

If you don't want to allow duplicate values for a column in your table, you can add a UNIQUE constraint. Note that columns that are PRIMARY KEY get this implicitly - primary keys are always unique in order to make sense.

sqlite> CREATE TABLE users(user_id INTEGER PRIMARY KEY,
                           email TEXT UNIQUE,
                           user_name TEXT UNIQUE NOT NULL);

sqlite> INSERT INTO users(email, user_name) VALUES('rikard@gu.se', 'xrazor');

sqlite> INSERT INTO users(email, user_name) VALUES('rikard@gu.se', 'xhesa'); --email must be UNIQUE
Error: UNIQUE constraint failed: users.email

sqlite> INSERT INTO users(email, user_name) VALUES('henrik@gu.se', 'xrazor'); --name must be UNIQUE
Error: UNIQUE constraint failed: users.user_name

-- What if we leave out user_name? Then it will be set to NULL 
sqlite> INSERT INTO users(email) VALUES('henrik@gu.se'); --email must be UNIQUE and NOT NULL
Error: NOT NULL constraint failed: users.user_name

Videos

Links

Further reading

Where to go next

Next page contains Exercises on constraints

« PreviousBook TOCNext »

Work in progress

Remove this section when the page is production ready.

TODO:

  • Record English version of the videos (medium priority)
  • Create PDF and video for the MariaDB triggers alternative (low priority) DONE!