{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Activity 6-1\n", "------------\n", "\n", "The goal for this activity will be to compute some BCNF decompositions, using the tools from last lecture\n", "\n", "First we'll load those tools, and some sample data:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from closure import compute_closure, display_side_by_side, print_setup" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Connected: None@None'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%load_ext sql\n", "%sql sqlite://" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n", "1 rows affected.\n", "1 rows affected.\n", "1 rows affected.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql DROP TABLE IF EXISTS T;\n", "CREATE TABLE T(course VARCHAR, classroom INT, time INT);\n", "INSERT INTO T VALUES ('CS 364', 132, 900);\n", "INSERT INTO T VALUES ('CS 245', 140, 1000);\n", "INSERT INTO T VALUES ('EE 101', 210, 900);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "First, let's decompose `T` into BCNF! Explicitly go through the steps of the BCNF algorithm using the `compute_closure` function, then decompose the following table (i.e. by creating new SQL tables) into BCNF:\n", "\n", "We've also made a function, `display_side_by_side`, for nicer display!" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
courseclassroomtime
CS 364132900
CS 2451401000
EE 101210900
" ], "text/plain": [ "[('CS 364', 132, 900), ('CS 245', 140, 1000), ('EE 101', 210, 900)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql SELECT * FROM T;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are given the following FDs:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Attributes = {time,classroom,course}\n", "FDs:\n", "\n", "=> \tcourse -> classroom\n", "\n", "=> \ttime,classroom -> course\n" ] } ], "source": [ "A = set(['course', 'classroom', 'time'])\n", "F = [('course', 'classroom'), (set(['classroom', 'time']), 'course')]\n", "print_setup(A, F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What real-world constraints do these FDs express?\n", "\n", "Now, use the `compute_closure` function to help decompose this table to BCNF:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'classroom', 'course'}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# We find an X s.t. X^+ != A and X^+ != X\n", "X = set(['course'])\n", "compute_closure(X, F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compose into two tables, $T_1$ and $T_2$:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "DROP TABLE IF EXISTS T1;\n", "CREATE TABLE T1 AS SELECT DISTINCT * FROM (\n", " SELECT course, classroom FROM T\n", ");" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "DROP TABLE IF EXISTS T2;\n", "CREATE TABLE T2 AS SELECT DISTINCT * FROM (\n", " SELECT course, time FROM T\n", ");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run the below to display the decomposed tables side-by-side:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/html": [ "
courseclassroom
CS 364132
CS 245140
EE 101210
coursetime
CS 364900
CS 2451000
EE 101900
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "l = %sql SELECT * FROM T1;\n", "r = %sql SELECT * FROM T2;\n", "display_side_by_side(l,r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** Is this now in BCNF?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "In the next section of lecture, we'll discuss a shortcoming of BCNF decompositions; let's see if we can get a glimpse of this now.\n", "\n", "See if you can insert rows into $T_1$ and/or $T_2$ _which respect the local FDs that still hold_, such that **when $T_1$ and $T_2$ are now recomposed, the original FDs do not hold!**" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 rows affected.\n", "1 rows affected.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "INSERT INTO T1 VALUES ('CS 145', 132);\n", "INSERT INTO T2 VALUES ('CS 145', 900);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, reconstruct and print the re-composed table using a SQL query:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
courseclassroomtime
CS 364132900
CS 2451401000
EE 101210900
CS 145132900
" ], "text/plain": [ "[('CS 364', 132, 900),\n", " ('CS 245', 140, 1000),\n", " ('EE 101', 210, 900),\n", " ('CS 145', 132, 900)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT T1.course, T1.classroom, T2.time\n", "FROM T1, T2\n", "WHERE T1.course = T2.course;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Q:** What went wrong?? And how could we prevent this from occuring?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }