CSCI 305: Introduction to Database Systems

Activity 4-4: Bulk-loading Data with SQLite, Data Integrity

In this exercises, we are going to practice with two issues. One is to create SQLite database with multiple tables from CSV files within SQLite3. The other is to have the database to check the data integrity.

1. Import CSV files directly into a SQLite database

In Homework 2, we learned how to write a Python program to import CSV files into a database and create tables corresponding to each of the CSV files. In this exercise, we will learn how to import CSV files directly into a SQLite database.

SQLite3 supports a set of meta-commands See this link for a complete list of meta-commands in SQLite3. The website calls them special commands or dot-commands. One of them is import that allows one to import data from CSV file to create tables. The following is an example of how to use such a command. Assume you have two CSV files test1.csv and test2.csv in your current Linux directory. Start SQLite3 at the Linux command line prompt.

sqlite3 test.db -- create a new db called 'test.db'
sqlite>.separator ,  -- specifies the comma ',' is the separator
sqlite>.import test1.csv table1
sqlite>.import test2.csv table2
sqlite>.schema
CREATE TABLE table1(
  "attribute1" TEXT,
  "attribute2" TEXT,
  "attribute3" TEXT
);
sqlite3>select * from table1;
......

The tables will be created as shown, with all fields (attributes) being as "TEXT." This is identical to what you would see if you create a database from CSV files using Python program. Typically you want to change the field type to what is best for the applications. You may also create other attributes as needed.

Please do the following as your exercise.

  1. Please download this Excel file that contains some NFL team information. This file has the rosters for three NFL teams, the Philadelphia Eagles, the New England Patriots, and the Oakland Raiders, as well as a complete list of NFL teams. In this part of the exercise, first, export from each of the sheet in the Excel file to a separate CSV file, give the CSV file some appropriate name, e.g., eagles.csv, patriots.csv, raiders.csv, and nfl-teams.csv. Then use the above described procedures to import the data into a SQLite database, naming the database nfl-data.db or something like that.
  2. Once importing successfully, please re-construct the tables such that the Number and Weight attributes in the team rosters are actually integer values, not text. Also specify the Number attribute as the primary key.
  3. Your next step is to combine the rosters from the three teams into one table and add a Team attribute for each of the players in the combined roster. Again you can use the re-construct method we learned earlier, i.e., rename the old table first, create a new table with desired attributes, then copy the records from the old table into the new table.
  4. Now that you have a complete nfl-data.db, try a few queries to convince yourself and your friend that the information in the database is correct.

2. Create triggers and check integrity of data

The nfl-data.db database we just created, or any database, needs to maintain its integrity. For example, we know the weight of an athlete, or any person, has to be in a reasonable range. A player in an NFL team has to belong to an existing, real NFL team. Any NFL team has to be associated with a real U.S. city. All these indicators are a part of the integrity of the data in a database. SQLite provides mechanisms (any SQL language would) to check the integrity of the data.

The basic flow of work is that you define triggers in database schema so that if certain conditions are met, these triggers will cause error messages to be printed. This is very similar to exception handling of try-catch-except in high level languages such as Python or Java. The following is an example.

Assume you created the Players table correctly from the first exercise where all NFL players are in one complete table. One can set a reasonable value for a weight of a player, e.g., the weight should meet the conditions of 150 ≤ weight ≤ 400. You can then set a trigger such that if the weight of a player is outside this range, an error message is printed and the operation aborts. Here operations can be a insertion or update. The way you set up such a trigger works the following way.

CREATE TRIGGER weight_trigger BEFORE UPDATE ON players
BEGIN
  SELECT
  CASE
  WHEN NEW.weight ≥ 400 or NEW.weight ≤ 150 THEN
  RAISE (
  ABORT,
  'Invalid value of weight'
  )
  END;
END;

This trigger specifies that when updating the weight for a player, if the value is greater than or equal 400 or less than or equal 150, then the update operation aborts and the error message will be printed. The name of the trigger is weight_trigger. This is associated with the table Players. You can read this tutorial for more details.

Create the above trigger in the Players table. Try a few updates with invalid weights to see if your database reacts properly.

Because we now have the team name as an attribute in the Players table. We can check the integrity of this piece of information through the enforcement of non-empty foreign key. Please read this tutorial about foreign key enforcement and specify the Team attribute in the Players table as a foreign key. Try a few updates with invalid team names to see if your database reacts properly.

References