Database:Exercise - Adding constraints to prevent garbage data

From Juneday education
Jump to: navigation, search

Work in progress

Remove this section when the page is production-ready.

Exercises for Constraints

Notes on typography

Commands are written in a fixed width font and commands issued in a terminal in your computer shell are prefixed with a $.

Commands issued inside the SQLite3 interactive shell are prefixed with sqlite>

Example: List the files in the current directory using the command ls:

$ ls

Issue the .tables command inside the SQLite3 interactive shell:

sqlite> .tables

Purpose of this exercise

We will learn how to put constraints on columns when creating a table, in order to make sure that only acceptable values may be inserted (or updated) for rows in the table. There are not so many different constraints one may use with SQLite3, so we are going to focus on a few useful ones.

Task 1 - FOREIGN KEY constraint

First off, create a directory for this exercise. You may call it Ex09 or something similar. Use cd to enter this new directory and start working there.

As we learned in the lecture and exercise about JOIN (connecting tables), we can have columns referencing columns in a different table. As an example we had a books table which referenced the publisherid of a publishers table. But what if we insert a book with a publisherid that doesn’t exist in the publishers table? We would end up having inconsistent data.

The solution to this problem is to add a "foreign key" which limits the valid values for the publisherid columns to only such values that are also values already in the publisher table for the related column there. Or put differently, the books table would now only accept valid IDs for the publisherid column - such values which are publisherid values in the publisher table.

Let’s start with the publishers table. Open an interactive session with SQLite3 for a new database called my_books. Create the publishers table with the following columns:

  • publisherid INTEGER PRIMARY KEY
  • name TEXT

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE publishers
        (publisherid INTEGER PRIMARY KEY, name TEXT);

Next, create the books table and add a foreign key which references the publisherid of the publishers table. You can find the syntax here, here and here. There are also examples in the video lecture and slides.

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE books
         (author TEXT,
          title TEXT,
          isbn TEXT PRIMARY KEY,
          publisherid INTEGER,
                      FOREIGN KEY(publisherid)
                      REFERENCES publishers(publisherid)
        );

Let’s add some publishers! Add one publisher called "Bonnier", then one called "Books R us".

Expand using link to the right to see the syntax.

sqlite> INSERT INTO publishers(name) VALUES('Bonnier');
sqlite> INSERT INTO publishers(name) VALUES('Books R us');

Now, before we can use the foreign key constraint support, we need to enable it for this session with SQLite3, using the PRAGMA FOREIGN_KEYS=TRUE; statement.

Expand using link to the right to see the syntax.

sqlite> PRAGMA FOREIGN_KEYS=TRUE;

You didn't see that one coming, eh?

Next, we can try to enter a book with an invalid publisherid (an ID missing from the publishers table). Insert a book with the following values:

  • author should be 'Antonio Venitiana del Rabina'
  • title should be 'Grand Grimoire'
  • isbn should be '0-0-0-0-0-6'
  • publisherid should be 666 (which is invalid!)

Expand using link to the right to see the syntax.

sqlite> INSERT INTO books(author, title, isbn, publisherid)
             VALUES('Antonio Venitiana del Rabina',
                    'Grand Grimoire','0-0-0-0-0-6',
                    666);

What was the error message you got?

Expand using link to the right to see the answer.

Error: foreign key constraint failed

Let’s add some books with valid publisher ids as well, so that you feel comfortable working with foreign keys. First check what the valid publisherids are.

Expand using link to the right to see the syntax.

sqlite> SELECT * FROM publishers;
1|Bonnier
2|Books R us

So, we have 1 for Bonnier, and, 2 for Books R us. Let’s add some books!

Expand using link to the right to see the syntax.

sqlite> INSERT INTO books (author, title, isbn, publisherid) VALUES('Astral Dingren', 'Peppe Longsocks', '0-0-0-0-0-1', 1);
sqlite> INSERT INTO books (author, title, isbn, publisherid) VALUES('Hesa and Razor', 'Databases are fun', '0-0-0-0-0-2', 2);
sqlite> INSERT INTO books (author, title, isbn, publisherid) VALUES('Lennie Alanis', 'What the heck are constraints?', '0-0-0-0-0-3', 2);

We can confirm that the two tables are related, using a joining select statement, asking for all books with the publisher name looked up from the publishers table. You should try this, in order to practice SELECT with JOIN statements.

Expand using link to the right to see the syntax.

sqlite> .headers on        -- include the column names in the result
sqlite> .mode columns      -- align columns
sqlite> .width 15 30 13 10 -- column widths
sqlite> SELECT author, title, isbn, name AS 'Publisher'
          FROM books
  NATURAL JOIN publishers;
author           title                           isbn           Publisher 
---------------  ------------------------------  -------------  ----------
Astral Dingren   Peppe Longsocks                 0-0-0-0-0-1    Bonnier   
Hesa and Razor   Databases are fun               0-0-0-0-0-2    Books R us
Lennart Alanson  What the heck are constraints?  0-0-0-0-0-3    Books R us

Or using the alternative syntax for joins:

sqlite> SELECT b.author, b.title, b.isbn, p.name AS 'Publisher'
          FROM books b, publishers p
         WHERE b.publisherid = p.publisherid;
author           title                           isbn           Publisher 
---------------  ------------------------------  -------------  ----------
Astral Dingren   Peppe Longsocks                 0-0-0-0-0-1    Bonnier   
Hesa and Razor   Databases are fun               0-0-0-0-0-2    Books R us
Lennart Alanson  What the heck are constraints?  0-0-0-0-0-3    Books R us

Task 2 - Valid and consistent dates

Choosing TEXT as the type for dates is not a good idea. The TEXT data type will accept all kinds of strings representing a date or something completely different. Therefore, for dates, it would be great if we used a constraint that would accept only dates in a format provided by us.

For checking dates, we can use the CHECK function as a constraint, and check that the input value for a date, when passed to some date functions, fulfills the format of our choice.

Let’s start by looking at some date formats:

sqlite> .headers on
sqlite> .mode column
sqlite> SELECT DATETIME('2016-01-05');

DATETIME('2016-01-05')        
------------------------------
2016-01-05 00:00:00

sqlite> SELECT DATETIME('Måndag 3e december 2017');

DATETIME('Måndag 3e december 2017')
------------------------------

The first string, ‘2016-01-05’, is accepted by the datetime function, which returns a slightly longer representation of that same date. The second one, however, is not accepted and nothing is returned.

Can we format the longer date string to a date string of format YYYY-MM-DD? Yes, we can, using the strftime function:

sqlite> SELECT STRFTIME('%Y-%m-%d', '2016-01-05 00:00:00');

STRFTIME('%Y-%m-%d', '2016-01--05 00:00:00')
------------------------------
2016-01-05

OK, good. What if we combine the two functions?

sqlite> SELECT STRFTIME('%Y-%m-%d',DATETIME('2016-01-05'));

STRFTIME('%Y-%m-%d',DATETIME('2016-01-05'))
------------------------------
2016-01-05

That’s good! We get the same string back, which we can use in a check constraint. We want to check that any given string, when passed first to the datetime function, and then to the strftime function with the format '%Y-%m-%d' results in the same string. This way we can create a check constraint that checks if an input string conforms with a date of that format (YYYY-MM-DD is produced by the formatting string %Y-%m-%d).

Let’s run some tests interactively, before we start!

sqlite> SELECT '2016-01-05' IS
                STRFTIME('%Y-%m-%d', DATETIME('2016-01-05'));

'2016-01-05' IS STRFTIME('%Y-%m-%d', DATETIME('2016-01-05'))
---------------------------------------------------------------
1
sqlite>

The interactive shell answers with 1 (which means true!).

We can also try it with the .mode line directive:

sqlite> .mode line   -- turn off column mode
sqlite> SELECT '2016-01-05' IS
                STRFTIME('%Y-%m-%d', DATETIME('2016-01-05'));

'2016-01-05' IS STRFTIME('%Y-%m-%d', DATETIME('2016-01-05')) = 1
---------------------------------------------------------------

The interactive shell again answers with 1 (which means true!).

sqlite> SELECT '1/5/2016' IS STRFTIME('%Y-%m-%d', DATETIME('2016-01-05'));

'1/5/2016' IS STRFTIME('%Y-%m-%d', DATETIME('2016-01-05')) = 0

Now the shell answers with 0 (false). Which is what we wanted, since '1/5/2016' is not formatted as we specified '%Y-%m-%d'!

Now it’s time to create a table with a date column that checks that all input strings for the date column conform to our format. Use the checkconstraint which is described here together with the syntax above.

The aim is for you to create a table with a DATETIME column with a check constraint which uses the syntax above. We'll give you the syntax once, since this is rather complicated to write the first time:

sqlite> CREATE TABLE authors(
        authorID INTEGER PRIMARY KEY,
        name TEXT,
        born DATETIME
         CHECK(
           born IS STRFTIME( '%Y-%m-%d', DATETIME(Born) )
         )
        );

Does it work? There’s only one way to know, testing it yourself! Insert three authors with different formats for the born date:

sqlite> INSERT INTO authors(name, born) VALUES('Oscar Wilde', '1854-10-16');
sqlite> INSERT INTO authors(name, born) VALUES('Sture Dahlström', '1922-12-28');
sqlite> INSERT INTO authors(name, born) VALUES('Arne Anka', '1922/12/28');

One of the INSERT statements will fail with an error message. What is the error message?

Expand using link to the right to see the answer.

Error: CHECK constraint failed: authors

What INSERT statement failed and why?

Expand using link to the right to see the answer.

The INSERT with a date of '1922/12/28' failed, because its date format does not conform with our mandated date format.

Practice writing check constraints for different date formats. You will need to know how to do this for one of the assignments.

Note: SQLite uses ISO 8601 date format, which means you can only use formats compliant with ISO 8601. Here are some example formats:

  • 2018-11-02 - which is a date
  • 2018-11-02T13:31:08+00:00 - which is a UTC-datetime
  • 2018-11-02T13:31:08Z - which is another UTC-datetime format
  • 2018-W44 - which is 2018, week number 44

For the corresponding format codes, see sqlite manual - date functions.

Task 3 - Enumerations (kind of)

Information
MySQL/MariaDB does have an enum type. See here for an introduction.

SQLite3 does not have an Enumeration type, where you may list enumerated values. To enforce a constraint to allow only one of an enumerated list of values, you can use two methods. The first method is to use a second table for the enumerated values, and use a foreign key to restrict values to be one of the values in that table. Let’s try that for an enumeration of price types.

Create a table price_types with only one column named pricetype of type TEXT and modifier PRIMARY KEY. Insert four rows with the values

  • 'Low price'
  • 'Nice price'
  • 'Mid price'
  • 'High price'

Note the the column referred to by a foreign key must be UNIQ (for instance primary key). For this reason, the pricetype column of the price_types table should be declared as the primary key for the table.

Also select all rows from that table to confirm it worked.

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE price_types(pricetype Text PRIMARY KEY);
sqlite> INSERT INTO price_types VALUES('Low price'), ('Nice price'), ('Mid price'), ('High price');
sqlite> select * from price_types;
Low price
Nice price
Mid price
High price

(Note that inserting many rows like in the insert statement above, requries SQLite3 of version 3.7.11 or later - if you have an older version, break it up to four distinct insert statements)

Now, we can create a books2 table, that has a column pricetype with a constraint to only allow values that come from the price_types table. For simplicity, the books2 table only has the Title (TEXT) and pricetype (TEXT) columns, so that we may focus on the pricetype constraint. The constraint for the pricetype column should be a foreign key referencing the price_types table. Review task 1 if you don't remember the syntax for foreign keys.

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE books2(title TEXT, pricetype TEXT, FOREIGN KEY(pricetype) REFERENCES price_types(pricetype));

Next insert one book with the values 'The C Programming Language', 'Nice price', which should work fine, and one book with the values 'Programming Language Technology', 'Outrageous price', which should fail because of the illegal pricetype.

Expand using link to the right to see the syntax.

sqlite> PRAGMA FOREIGN_KEYS=TRUE;
sqlite> INSERT INTO books2
 VALUES('The C Programming Language', 'Nice price');
sqlite> INSERT INTO books2
 VALUES('Programming Language Technology', 'Outrageous price');
Error: FOREIGN KEY constraint failed
sqlite>

The advantage of this method of using a second table for enumerations, is that you may add new values dynamically later on, and the constraint will still work as expected. Add a row to the price_types table with the value 'Outrageous price'. Then try to insert the book that earlier failed from the test above.

Expand using link to the right to see the syntax.

sqlite> INSERT INTO price_types VALUES('Outrageous price');
sqlite> INSERT INTO books2
 VALUES('Programming Language Technology', 'Outrageous price');
sqlite>

The second method we may use as an alternative to Enumerations, is to create a check constraint, that checks that a value is in a named list of permitted values. The drawback is that it makes it harder to add new permitted values later on. The syntax is using the IN conditional operator. Here's the syntax but you should do this in order to try it out.

sqlite> CREATE TABLE books3(title TEXT,
                            pricetype TEXT
                            CHECK(pricetype IN ('Low price','Nice price','Mid price','High price')));

Try inserting the same two books from above into books3.

Expand using link to the right to see the syntax.

sqlite> INSERT INTO books3
 VALUES('The C Programming Language', 'Nice price');
sqlite> INSERT INTO books3
 VALUES('Programming Language Technology', 'Outrageous price');
Error: CHECK constraint failed: books3

Hint: see Simulating enum types in SQLite as a form of constraint slides for some examples.

Task 4 - NOT NULL constraint

Sometimes it is handy not to allow NULL values in a table, such as when a row makes no sense if a column value is missing. An obvious example would be primary keys. To set a constraint for a column in order to disallow NULL values (values left out) you use the NOT NULL constraint. Just add NOT NULL after the column name and type in the CREATE TABLE statement.

Create a table highscores with the following columns:

  • player TEXT NOT NULL
  • score INTEGER NOT NULL

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE highscores(player TEXT NOT NULL, score INTEGER NOT NULL);

Try to insert a row into higscores with the values NULL and 88. Did it work? And then try with the values 'Adam' and NULL. Did that work?

Expand using link to the right to see the syntax.

sqlite> INSERT INTO highscores VALUES(NULL,88);
Error: NOT NULL constraint failed: highscores.player
sqlite> INSERT INTO highscores VALUES('Adam',NULL);
Error: NOT NULL constraint failed: highscores.score

Task 5 - UNIQUE constraint

Often, one wants some column to only contain unique values, such as for the primary key. There is a way to create a constraint for only allowing unique values, also for non-primary key columns. Just add the constraint UNIQUE to the column in the CREATE TABLE statement.

Create a table persons with the following columns:

  • name TEXT NOT NULL
  • personal_identity_number TEXT UNIQUE NOT NULL

Expand using link to the right to see the syntax.

sqlite> CREATE TABLE persons
(name TEXT NOT NULL,
 personal_identity_number TEXT UNIQUE NOT NULL);

Try to insert two persons into persons with different name but the same personal_identity_number. Did it work?

Expand using link to the right to see the syntax.

For instance this won't work:

sqlite> INSERT INTO persons VALUES('Kalle Anka', '340611-0010');
sqlite> INSERT INTO persons VALUES('Farmor Anka', '340611-0010');
Error: UNIQUE constraint failed: persons.personal_identity_number

Task 6 - Text matching using GLOB()

If we look at the personal identity number above, apart from all numbers being unique, we’d also want to limit numbers to be of the form NNNNNN-NNNN. We could actually accomplish that using a text matching function called GLOB().

Hint: This exercise is useful for solving handin 02 ;-)

The function GLOB takes a pattern and a string and returns 0 if the string doesn’t match the pattern and 1 if it does. A * works like a wildcard character and means "anything". A few examples:

sqlite> SELECT GLOB('a*', 'abc');
1
sqlite> SELECT GLOB('a*', 'bc');
0
sqlite> SELECT GLOB('a*', 'a');
1

You can list a set of characters using brackets, where for instance [0-9] means "one of the characters 0 1 2 3 4 5 6 7 8 9" and [a-z] means "any of the characters between 'a' and 'z' (in the ascii table)". A few examples:

sqlite> SELECT GLOB('[a-z]', 'a123');
0
sqlite> SELECT GLOB('[a-z]', 'a');
1
sqlite> SELECT GLOB('[a-z]*', 'a123');
1
sqlite> SELECT GLOB('[abc]*', 'a123');
1
sqlite> SELECT GLOB('[abc]*', 'b123');
1
sqlite> SELECT GLOB('[abc]*', 'c123');
1
sqlite> SELECT GLOB('[abc]*', 'd123');
0

So, if we want to check that a personal identification number is correct, we could use this knowledge forming a pattern matching our NNNNNN-NNNN pattern. Here's an example:

sqlite> SELECT
 glob('[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]', 
      '711124-0000');
1

The answer 1 means TRUE as before.

Note that we could theoretically add even more logic to the check, to see that it is a valid date etc, but this is something that becomes a little extreme and quite complex. So we skip that part and leave it as a challenge for the reader.

If we are satisfied with our simpler check, we can proceed to create a constraint for a personal identification column in a table. Drop the old persons table and create a new one with the same fields as before (Name and PersonalIdentificationNumber), but this time, add a check constraint with a glob expression which ensures a valid PersonalIdentificationNumber using the pattern above.

Expand using link to the right to see the syntax.

sqlite> DROP TABLE IF EXISTS persons;
sqlite> CREATE TABLE persons(
   name TEXT,
   personal_identification_number TEXT PRIMARY KEY NOT NULL
    CHECK (
     GLOB('[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]', 
     personal_identification_number)
    )
);

Now, try to insert a few names (of your free choice) but use the following values for personal_identification_number:

  • '111111-0000'
  • '111111-00a0'
  • '1234567-0000'
  • '123456-12345'
  • 'A23456-1234'

Which ones fail and why? Does any of the values work? Which? Why?

Expand using link to the right to see the syntax.

sqlite> INSERT INTO persons(name, personal_identification_number) VALUES('Kalle Anka', '111111-0000');
sqlite> INSERT INTO persons(name, personal_identification_number) VALUES('Kalle Anka', '111111-00a0');
Error: CHECK constraint failed: persons
sqlite> INSERT INTO persons(name, personal_identification_number) VALUES('Kalle Anka', '1234567-0000');
Error: CHECK constraint failed: persons
sqlite> INSERT INTO persons(name, personal_identification_number) VALUES('Kalle Anka', '123456-12345');
Error: CHECK constraint failed: persons
sqlite> INSERT INTO persons(name, personal_identification_number) VALUES('Kalle Anka', 'A23456-1234');
Error: CHECK constraint failed: persons

Only the first one works, because it is the only one matching the pattern of the glob expression '[0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'.

Check your progress

If you want to check your progress, here's a challenge for you. You'll find the answer on a different page (link below) in order to keep you from peeking at the suggested solution.

Check 1

Create a table user with the following columns:

  • user_id (INTEGER and PRIMARY KEY)
  • email (TEXT and a GLOB constraint for a plausible email address, and also only unique values)
  • user_name (should be unique and not null)

The email GLOB constraint should check that the emails inserted

  • starts with a lower case letter or a number before the @
  • has at least one @ (if we could check that it only had exactly one @ that'd be great but we can't using globbing with one single expression)
  • starts with a lower case letter or a number after the @
  • has at least one . after the @
  • doesn't end with a .

The following inserts should fail:

sqlite> INSERT INTO users(email, user_name) VALUES('rikard@gu.se.', 'xrazor'); -- ends with a dot
Error: CHECK constraint failed: valid_email
sqlite> INSERT INTO users(email, user_name) VALUES('rikardgu.se.', 'xrazor');  -- ends with a dot and no @
Error: CHECK constraint failed: valid_email
sqlite> INSERT INTO users(email, user_name) VALUES('rikardgu.se', 'xrazor');   -- no @
Error: CHECK constraint failed: valid_email
sqlite> INSERT INTO users(email, user_name) VALUES('-rikard@gu.se', 'xrazor'); -- doesn't start with a number or letter
Error: CHECK constraint failed: valid_email
sqlite> INSERT INTO users(email, user_name) VALUES('rikard@guse', 'xrazor');   -- no dot after the @
Error: CHECK constraint failed: valid_email

But the following should be accepted:

sqlite> INSERT INTO users(email, user_name) VALUES('rikard@gu.se', 'xrazor');
sqlite> INSERT INTO users(email, user_name) VALUES('1rikard@gu.se', '1razor');
sqlite> INSERT INTO users(email, user_name) VALUES('1rikard@127.0.0.1', '2razor');
sqlite> INSERT INTO users(email, user_name) VALUES('rikard1@gu.se', 'xrazor1');

Check 2 challenge

Drop the table from Check 1 and create a new table which uses two GLOB expressions in the constraint (joined with e.g. AND) in order to prevent emails with more than one @. You can use the expression from the previous question and add an add an additional check (using AND between the two) for the rule of no more than one @.

Hint: How would you check that there are indeed at least two @? Use that and twist it around.

Suggested solutions to check-your-progress questions

SQL_constraints_Check_Answers

Links

MariaDB/MySQL syntax

For those readers/students who wishes to learn a little about how MariaDB/MySQL handles constraints (optional reading!), here's a few hints:

Source code etc

  • TODO

Further reading

Where to go next

Next page is on the topic of scraping data from the web.

« PreviousBook TOCNext »