CSCI 305: Introduction to Database Systems

Project 2: Implementing SFW in A High Level Language

Due Dates

Assigned Due
Fri 04/06 Mon 04/30

0. Objectives

After completing the project, students are expected to be able to

1. Overview

This is a team project of 2-3 students. Please let the instructor know if you'd prefer solo.

You will implement a simplified SFW clause in a database using a high level programming language such as Python. That is, when a user issues a database command such as select * from books where pubdate > 1990; you are to search through the set of data you mention and return the information as requested. In this case, all the books that are published after the year 1990.

A database system such as sqlite3 has its own user interface, or set of commands through which a user can interact with the system to accomplish tasks such as importing data, reorganizing data, imposing constraints, or querying the data to find useful information. These database functionality are implemented using some lower (compared to the database language) level programming languages such as C, Java, or Python. Database systems are typically implemented using a data structure such as B+ tree that supports efficient search and storage. In this project, you are given a implementation of a B+ and B tree. Your tasks are to understand how this version of B+ tree implementation works, then use it to realize the functionality you want, that is, to support a simplified SFW operation. While the given version of B+ tree is in Python, you can certainly search and use other versions of B+ tree in Python or other programming languages such as C or Java.

2. A Quick Review of B+ Tree and The Given Implementation of B+ Tree

A B+ tree is a N-ary search tree with a variable but often large number of children per node. A B+ tree has the following properties.

Here is a diagram from Wikipedia to illustrate the concept of a B+ tree.

B+ tree

Figure: A Sample B+ Tree

By Grundprinzip - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10758840

The above figure shows a simple B+ tree example linking the keys 1-7 to data values d1-d7. The linked list (red) allows rapid in-order traversal. This particular tree's branching factor is 4.

Because implementing data structures such as B+ tree is not the focus of our course, you are given a complete implementation of B+ tree in Python. The code can be downloaded from this link at the course website, or you can download it directly from the original website.

3. Your Tasks

Your overall task is to implement a simplified SFW SQL clause. That is, when a user issues an SFW command, your program should go through the data you have and return the datasets that meet the specified criteria. We say it will be simplified in the sense that we limit the functionality of SFW to have simple condition of one Boolean expression only. We also do not implement any nested features or aggregate functions in our base implementation. Some of these features can be implemented as a challenge. (See description later.)

Here are some specific points you'd want to pay attention.

While your implementation should work for any data, we provide a sample data set here. Download this CSV file that contains the top 100 books by Newsweek. The data has four columns, Rank, Title, Author, and Year. The meaning of these attributes are self-explained. The following are some sample queries that your program should support. Assume the table in the database is called books.

  1. select * from books where author='Ralph Ellison';
  2. select * from books where author!='Ralph Ellison';
  3. select title,author from books where year>1900;
  4. select title from books where author='Jane Austen';

3.1 Challenges

Your base implementation should support the queries listed above and the ones similar to these queries. You are asked to choose and implement a challenge task that goes beyond the base implementation. You and your partner can decide what to do. Pick something interesting and challenge, yet within your reach. We will leave a 10 percent of the project grade for this part. Here are some examples for your reference. Note that the difficult levels in these tasks vary widely.

4. Submission

You are asked to submit all program and data files in a single folder under your git repo tree. In addition to programs and data, submit a README file of a page or two describing your experiences, successes and challenges.

Submit everything through git. Make sure the submission is complete. The key is that when downloaded from git, the files should be self-contained that they run directly without requiring other files.


Last modified: Thu Mar 13 13:20:55 EDT 2018