Database:Assignment 2 - JOINs and Constraints

From Juneday education
Jump to: navigation, search

Assignment 2 - JOINs and constraints

This chapter makes most sense if you are using this book as the course literature in a database course. If you are using it for self-studies, you obviously must ignore the passages mentioning handing stuff in to the "teachers".

Goal of the assignment

The goal of this assignment is to verify that you are confident with the concepts from the chapters about JOIN and constraints. It also contains parts about using bash to work with your database from the command line. As with assignment 1, you will get information and knowledge from the text for the various tasks so you will hopefully learn some new stuff from this assignment too.

Files to be handed in to the teacher

You shall create a zip file with the exact name answer-handin02.zip . This is how you create it in bash:

  • Create a directory named exactly answer-handin02
  • The directory must contain the following files (created throughout the assignment):
    • name.txt (shall contain your complete name - you create this file manually)
    • license_number_constraint.txt (text file with the create talbe statement with constraints for valid LicenseNumber)
    • test_cases_license_number.txt (text file with your INSERT statements to verify that illegal LicenseNumber genererates Error: CHECK constraint failed)
    • silver_cars.txt (see below)
    • dodges.txt (see below)
    • red_mazdas.txt (see below)
    • select_statements.txt (see below)

Make sure you have the zip command installed in your bash environment. In order to zip the directory with all files (of course, this is something you do when you are done with all steps in the assignment!) in bash, you do this from the directory containing answer-handin02. See assignment 1 if you forgot the syntax for zipping the directory and all its contents. The file to hand in must be called answer-handin02.zip .

Hint: You cannot "be" in the directory you are zipping. You should be in the same directory where you created the answer-handin02 directory. When you are finished with a task, put the files you create in the task (from the list above) in this directory. Make sure every file is there (and has correct name) before you zip and hand in the zip file.

The Database and a short introduction

We will continue to work with the database many_cars and the table cars from the first assignment. The first thing we will do is to add a constraint to the column LicenseNumber so that we only allow LicenseNumber texts which are valid, i.e. three capital letters followed by one space followed by three numbers. A formal way to express that could be

[A-Z][A-Z][A-Z] [0-9][0-9][0-9].

The next step will be to create two additional tables, colors and make in order to "normalize" the cars table. Remember, normalizing a table was what we did when we moved columns from one self-contained table to tables of their own, in order to avoid repetition of values in many rows in the first table. When we are finished with the re-design of our database, the cars table will only contain the following columns: make_id, color_id and LicenseNumber. We will also take measures to keep the integrity of the tables (make sure the tables contain sane data).

Getting started with the assignment

Create a directory named handin2. This is where you will do all the work for this assignment. You will not submit this directory, it is only your project directory where you do the work.

Task 1 - Create the table cars with a constraint for LicenseNumber

We will start from scratch with a new database called many_cars, initially without any tables. Create a table called cars2 with only one column, LicenseNumber TEXT PRIMARY KEY, but we will also add a constraint for the license numbers. This table is only for testing and learning how to create this constraint! When we see that the constraint works as specified, we'll throw away this table and start over.

When you have set up a working test table cars2, you shall come up with test cases for your constraint (INSERT statements showing that the constraint works). This task will produce two text files to put in your answer-handin02 directory:

  • license_number_constraint.txt (where you shall paste the CREATE TABLE statement with the constraint)
  • test_cases_license_number.txt (where you should paste at least three INSERT statements with malformed license numbers)

The point of the files is to let the teachers verify that the CREATE TABLE creates a table with a working constraint for license numbers and that the SQLite program rejects the INSERT statements in your test cases. The teachers will, of course, also verify that the table accepts well-formed license numbers! You don't have to provide any working INSERT statements, however, the teachers will handle that part.

Now, lets get to work! In order to be able to create a constraint for the license numbers, we'll have to learn the function called GLOB (below shown as an "infix operator" - an operator which sits between two operands).

One way of understanding the concept of "globbing" is to use it in Bash! The examples below are just to get you familiar with globbing and you don't have to submit any bash code. Make a new directory for this part of the assignment so that you can practice globbing without interfering with your assignmnet code. You will now use some bash in order to learn how * works and a couple of useful operators.

The operators [], * and {}

We will create nine files in your test directory. The files will be empty but created with one single command:

$ touch file_{1..9}.txt   # {} is specific to bash and actually not part of globbing!
$ ls
file_1.txt  file_3.txt  file_5.txt  file_7.txt  file_9.txt
file_2.txt  file_4.txt  file_6.txt  file_8.txt

Now, lets learn about the globbing symbol *. A single * means "any number of any characters". The expression [0-9] means "one character which is a number between 0 and 9".

Let's use a combination of * and [0-9]:

$ ls *_[0-9].txt
#all files that have *_ followed by one digit followed by .txt:
file_1.txt  file_3.txt  file_5.txt  file_7.txt  file_9.txt
file_2.txt  file_4.txt  file_6.txt  file_8.txt

Next, create a new empty file:

$ touch file_99.txt

Let's list all the files which have only one number character after the underscore:

$ ls *_[0-9].txt  #the file_99.txt file will not be listed!
file_1.txt  file_3.txt  file_5.txt  file_7.txt  file_9.txt
file_2.txt  file_4.txt  file_6.txt  file_8.txt

As you see, the new file file_99.txt didn't match the globbing expression *_[0-9].txt, since it had more than one number between the underscore and the .txt part.

Now, let's use a globbing expression which will match file_99.txt but not the other files:

$ ls *_[0-9][0-9].txt
file_99.txt
# The file was listed, because it, and only it,
# had _ followed by two digits followed by .txt
# When we are done playing, remove the newly created text files.

The function GLOB in SQLite3 works with the same syntax for the expressions to match. The function returns 1 if the string operator matches the pattern, and 0 otherwise. In SQLite3 you may think of 1 as meaning TRUE and 0 as meaning FALSE.

Some examples in the interactive shell for SQLite3 (you must try this in order to learn about GLOB!):

sqlite> SELECT 'ABCabc012' GLOB '[A-Z]*';
1
-- any upper case letter followed by anything

sqlite> SELECT 'ABCabc012' GLOB '[A-Z][A-Z][A-Z]*';
1
-- any three upper case letters followed by anything

sqlite> SELECT 'ABCabc012' GLOB '[A-Z][A-Z][A-Z][A-Z]*';
0
-- No match, because string didn’t start with four upper case letters

sqlite> SELECT 'ABCabc012' GLOB '[A-Z][A-Z][A-Z]*[0-9]';
1
-- Matches three A-Z followed by anything followed by a number

sqlite> SELECT 'ABCabc012' GLOB '[A-Z][A-Z][A-Z][a-z][a-z][a-z][0-9][0-9][0-9]'; 
1
-- Matches the form of the string exactly

sqlite> SELECT 'A 0' GLOB '[A-Z] [0-9]';
1
-- Matches one upper case letter, a space, and one number.

Now you should be familiar with how to use GLOB as an infix operator to match "one or several capital letters", "a space", and, "one or several numbers". So it should be fairly simple now to create a slightly more complex pattern for the constraint for valid license numbers. Your task now is to create the test table cars2 which uses a GLOB expression for valid Swedish license numbers: "three capital letters between A and Z" followed by "a single space" followed by "three digits(numbers) between 0 and 9".

We'll give you the CREATE TABLE syntax but you'll have to figure out the GLOB part:

CREATE TABLE cars2(
  licensenumber TEXT PRIMARY KEY
                CHECK( licensenumber GLOB 'enter the glob pattern here!' )
);

If it's easier to read on one long line, here it is (but your browser will probably break the line anyway):

CREATE TABLE cars2( licensenumber TEXT PRIMARY KEY CHECK( licensenumber GLOB 'enter the glob pattern here!' ) );

It is that statement you shall save in the file license_number_constraint.txt to be included in the zip file.

Verify that the following INSERT statements fail:

sqlite> INSERT INTO cars2 (licensenumber) VALUES('abc 123');
Error: CHECK constraint failed: cars2
sqlite> INSERT INTO cars2 (licensenumber) VALUES('ABC123');
Error: CHECK constraint failed: cars2
sqlite> INSERT INTO cars2 (licensenumber) VALUES('ABC 12A');
Error: CHECK constraint failed: cars2
sqlite> INSERT INTO cars2 (licensenumber) VALUES('ABC 1234');
Error: CHECK constraint failed: cars2
sqlite> INSERT INTO cars2 (licensenumber) VALUES('ABCD 123');
Error: CHECK constraint failed: cars2

Now, you shall come up with at least three test cases of your own which will look like spelling mistakes for license numbers. Verify that they render the error message as shown above before you add them to the file test_cases_license_number.txt also to be included in the zip file.

Without creating any files, make sure that the table works for correctly formatted license numbers, e.g.:

sqlite> INSERT INTO cars2 (licensenumber) VALUES('ABC 123');
sqlite> INSERT INTO cars2 (licensenumber) VALUES('ZZZ 999');

You don't have to include such tests. They are mentioned here to give you a method to verify that your constraint isn't too narrow and accepts the format you are allowing!

Some tips for those who want to learn some more tricks

It is rather tiresome to write a lot of test cases manually in the interactive shell. What we will learn instead, is to create a text file with the name bad_licenses.txt with one malformed license number on each row. Then we'll test that every license number in the file is rejected thanks to our constraint, by creating INSERT statements from the file. This is just to show you some tesing techniques, so do not include this file in the zip file to be handed in! Here's the syntax for testing every license number from the file (in bash):

$ cat bad_licenses.txt|while read license;do echo "INSERT INTO cars2 VALUES('$license');";done | sqlite3 many_cars
Error: near line 1: CHECK constraint failed: cars2
Error: near line 2: CHECK constraint failed: cars2
Error: near line 3: CHECK constraint failed: cars2
Error: near line 4: CHECK constraint failed: cars2
Error: near line 5: CHECK constraint failed: cars2
Error: near line 6: CHECK constraint failed: cars2
Error: near line 7: CHECK constraint failed: cars2
Error: near line 8: CHECK constraint failed: cars2

Now, we might be interested to check how many errors we got. In order to do so, we'll redirect the standard error stream from SQLite to standard out using 2>&1 and pipe the result to wc -l (the expert in counting rows!):

$ cat bad_licenses.txt|while read license;do echo "INSERT INTO cars2 VALUES('$license');";done | sqlite3 many_cars 2>&1 | wc -l
8
$ wc -l bad_licenses.txt 
8 bad_licenses.txt

We confirmed that we got 8 errors from 8 rows in the file.

Explanation of the command line used above:

cat bad_licenses.txt       # print the contents of the file to standard out
| while read license;      # use the printout to read each line into the variable license 
do                         # loop:
echo "INSERT INTO cars2 VALUES('$license');";
# print the INSERT statement and use the value of the variable license inside the single quotes
done                       # end of the while-loop
| sqlite3 my_cars 2>&1     # send the result of the loop to
                           # SQLite3 for the database my_cars
                           # but let error printouts from SQLite3 go to standard out
                           # rather than standard error
| wc -l                    # pipe any error printouts to wc -l
                           # which will echo out the number or rows of errors

When you are done experimenting with this, make sure you have created the two text files for this part, put them in the directory to be zipped, and drop the table cars2. Use a search engine if you forgot the syntax for dropping a table.

Task 2 - Create the tables colors and makes

Note that in task 2, no files should be produced for inclusion in the zip file - you will merely create two tables and read data into them.

In this part, we'll prepare two tables with data, in order to be able to create a more normalized cars table than the one in assignment 1. We'll do this by changing the table cars to only contain these columns: make_id, color_id and LicenseNumber.

We move the colors and makes into tables of their own right, so that we don't have to repeat a lot of text strings for color and make in the cars table. Remember, normalizing was about consolidating data by moving some columns out to tables of their own and reference the tables instead.

So we need two new tables, colors and makes.

Create the table colors with the following columns:

  • color TEXT
  • color_id INTEGER PRIMARY KEY

Create the table makes with the following columns:

  • make TEXT
  • make_id INTEGER PRIMARY KEY

Expand using link to the right to get a sneak-peek of the create table syntax.

CREATE TABLE colors(color TEXT, color_id INTEGER PRIMARY KEY);
CREATE TABLE makes(make TEXT, make_id INTEGER PRIMARY KEY);

Download the files insert_colors.sql and insert_makes.sql and let SQLite3 run the statements in those files for the database many_cars.

Expand using link to the right to get a sneak-peek of the syntax.

$ sqlite3 many_cars < insert_colors.sql
$ sqlite3 many_cars < insert_makes.sql

Those statements will only work correctly if you have been successful in creating the tables as detailed above.

Log into the sqlite3 interactive shell for the database many_cars and verify that you have the colors and makes inserted in your tables:

sqlite> SELECT * FROM colors;
Blue|1
Red|2
Green|3
Yellow|4
Black|5
White|6
Brown|7
Silver|8
sqlite> SELECT * FROM makes;
Volvo|1
Mazda|2
Honda|3
Suzuki|4
Ford|5
Chevrolet|6
Saab|7
Simca|8
Dodge|9

If that is what you see, then this task is done. There is nothing to add to the zip file from this part.

Task 3 - Create the final version of the cars table

In this task, we'll create the cars table which you will use in task 4 below. Note that there are no files produced here in task 3 for inclusion in the zip file, but you must finish this task in order for the following tasks to work.

You should now create the cars table with two foreign keys, color_id and make_id and the column licensenumber with the constaint you created in a previous task. Here's the nearly complete CREATE TABLE statement, but or course, we'll hide the constraint from you, since that is part of a previous task!

sqlite> CREATE TABLE cars(color_id INTEGER,
                          make_id INTEGER, 
                          licensenumber TEXT PRIMARY KEY
                            CHECK(licensenumber GLOB 'your GLOB pattern here!'),
                          FOREIGN KEY(color_id) REFERENCES colors(color_id),
                          FOREIGN KEY(make_id) REFERENCES makes(make_id)
        );

Again, note that you must put the globbing pattern you created earlier between the single quotes of the GLOB expression above.

Task 4 - Populate the cars table with cars

Now that you have the tables colors, makes and cars, it's time for you to populate the cars table with rows of some cars, and see how you can make SELECT statements to get information about the cars, by joining the three tables.

You will want SQLite3 to enforce the foreign keys, so that you make sure that you don't enter rows referencing bad IDs in colors or makes. Do you remember the directive to SQLite3 for enabling foreign key checks?

Expand using link to the right to see the directive.

sqlite> PRAGMA FOREIGN_KEYS=ON;

Let's make sure that it works by trying to insert a car with a color id of 9 (which is illegal since there only exist 8 colors):

sqlite> INSERT INTO cars VALUES(9,1,'ABC 123');
Error: FOREIGN KEY constraint failed

Do the same, but this time check that the constraint for make_id works. Use 99 as the value for make_id.

Expand using link to the right to see the directive.

sqlite> INSERT INTO cars VALUES(1,99,'ABC 123');
Error: FOREIGN KEY constraint failed

Since there is no make_id 99 in the makes table, it should fail.

Now that you verified that the foreign key constraints work, it's finally time for some valid cars. Download insert_normalised_cars.sql and execute the statements in that file for the database many_cars.

Expand using link to the right to see how to create the directory and enter it.

$ sqlite3 many_cars < insert_normalised_cars.sql

Again, there are no files produced in this task to be included in the zip file. This task was only here for you to set up the tables and populate them with data so that you can do the following tasks where you will demonstrate that you know how to make SELECT statements joining the three tables together.

Task 5 - Make queries about cars

To refresh your knowledge about joining tables together, here's a SELECT statement which fetches all information about every car where the value for color is fetched from the colors table and the value for make is fetched from the makes table, using the color_id and make_id columns as the joining factors:

sqlite> SELECT make, color, licensenumber
        FROM cars c, colors co, makes m
        WHERE
          c.color_id = co.color_id
          AND
           c.make_id = m.make_id;

Alternative syntax using JOIN:

sqlite> SELECT make, color, licensenumber
        FROM cars
        JOIN makes, colors ON
           cars.make_id = makes.make_id
          AND
          cars.color_id = colors.color_id;

An alternative syntax is:

sqlite> SELECT make, color, licensenumber
        FROM cars
        JOIN makes ON
           cars.make_id = makes.make_id
        JOIN colors ON
          cars.color_id = colors.color_id;

An alternative syntax using NATURAL JOIN is:

sqlite> SELECT make, color, licensenumber
          FROM cars
  NATURAL JOIN colors
  NATURAL JOIN makes;

We recommend one of the versions using JOIN, because when you add conditions the WHERE clause quickly becomes complicated. The NATURAL JOIN is kind of neat, since it is certainly the shortest. It works because the columns we are joining on have the same name in the joined tables.

Now that you have refreshed the syntax for JOIN, your task is to produce four text files:

  • silver_cars.txt
  • dodges.txt
  • red_mazdas.txt
  • select_statements.txt

The file silver_cars.txt shall contain all the data on cars with the color Silver. Here's a hint to get you started:

.once silver_cars.txt
SELECT make, color, licensenumber -- you'll have to fill in the rest here - look at the examples above!

Hint: Do not use the id columns for your condition. You should use the text 'Silver', because it is clearer than a magic number representing Silver in the colors table. And what if the id for Silver changes in the future?

The file dodges.txt shall contain all the data on cars of make Dodge. Here's a hint to get you started:

.once dodges.txt
SELECT make, color, licensenumber -- you'll have to fill in the rest here - look at the examples above!

Hint: Do not use the id columns for your condition. You should use the text 'Dodge', because it is clearer than a magic number representing Dodge in the makes table. And what if the id for Dodge changes in the future?

The file red_mazdas.txt shall contain all the data on cars whose make is Mazda and whose color is Red. Hint:

.once red_mazdas.txt
SELECT make, color, licensenumber -- you'll have to fill in the rest here - look at the examples above!

Hint: Do not use the id columns for your condition. You should use the texts 'Mazda' and 'Red', because it is clearer than a magic number representing Mazda in the makes table and a magic number representing Red in the colors table. And what if the ids for Mazda and Red change in the future?


The file select_statements.txt shall contain all the SQL-statements you used to create the three files above. You don't need to include the .once directives.

Hint: Look carefully at the example JOINs above. Add suitable conditions to filter the results using a WHERE clause at the end of the statements.

Check list

Before submitting the zip file, verify that your directory answer-handin02 contains the following files with the exact same names:

  • name.txt
  • license_number_constraint.txt
  • test_cases_license_number.txt
  • silver_cars.txt
  • dodges.txt
  • red_mazdas.txt
  • select_statements.txt

Zip the directory and the resulting zip file should be named exactly answer-handin02.zip .

The reason we are so particular about the exact names of the files, is that we use a script to correct the submitted files. The script will only work if you follow our naming conventions. But a more important reason for requiring exact names, is to prepare you for very specific requirements. It is a skill which often is required when working with computers. If you have difficulties spelling names etc, you can use copy and paste for the file names. And double check with a friend before submitting.

The file name.txt will be used by our script in order to extract your name when correcting, and create a directory with your name. So it is very important that the file is called exactly name.txt and that it contains your name and nothing else.

Links

Source code etc

  • Download the files in one big zip here (via DownGit)

Where to go next

This is the last chapter of the database book! Well done!

« PreviousBook TOCNext »