{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Activity 5-1\n", "------------\n", "\n", "## Section 2: FDs & Closures\n", "\n", "Recall that given a set of attributes $\\{A_1, \\dots, A_n\\}$ and a set of FDs $\\Gamma$\n", "\n", "The closure, denoted $\\{A_1, \\dots, A_n\\}^+$, is defined to be the largest set of attributes B s.t. $$A_1,\\dots,A_n \\rightarrow B \\text{ using } \\Gamma.$$\n", "\n", "We've built some functions to compute the closure of a set of attributes and other such operations (_feel free to look at the code- it's pretty simple and clean, if we say so ourselves..._):" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from closure import compute_closure" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "Consider a schema with attributes $X=\\{A,B,C,D,E,F,G,H\\}$.\n", "\n", "In this exercise, you are given a set of attributes $A\\subset X$ and a set of FDs $F$. Find **one FD** to add to $F$ so that the closure $A^+=X$\n", "\n", "**Note: you can add FDs to the below set $F$ using e.g. `F.append((set([...]), set([...])))` and then check how you're doing using the `compute_closure` function from above!**\n", "\n", "(As we'll find out immediately after this activity, this equivalent to saying: _Find one FD to add such that $A$ becomes a superkey for $X$_)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "A = set(['A', 'B','F'])\n", "F = [(set(['A', 'C']), 'D'),\n", " (set(['D','H', 'G']), 'E'),\n", " (set(['A', 'B']), 'G'),\n", " (set(['F', 'B', 'G']), 'C')]" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'A', 'B', 'C', 'D', 'F', 'G'}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compute_closure(A,F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solution:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "F.append((set(['A']),'H'))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using FD A,B -> G\n", "\t Updated x to {A,G,B,F}\n", "Using FD G,B,F -> C\n", "\t Updated x to {G,A,B,F,C}\n", "Using FD A -> H\n", "\t Updated x to {H,G,A,B,F,C}\n", "Using FD A,C -> D\n", "\t Updated x to {H,G,D,A,B,F,C}\n", "Using FD D,H,G -> E\n", "\t Updated x to {H,G,D,A,B,F,E,C}\n" ] }, { "data": { "text/plain": [ "{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "compute_closure(A, F, verbose=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Section 3: Superkeys & Keys\n", "\n", "Next, we'll add some new functions now for finding superkeys and keys. Recall:\n", "* A _superkey_ for a relation $R(B_1,\\dots,B_m)$ is a set of attributes $\\{A_1,\\dots,A_n\\}$ s.t.\n", "$$ \\{A_1,\\dots,A_n\\} \\rightarrow B_{j} \\text{ for all } j=1,\\dots m$$\n", "* A _key_ is a minimal (setwise) _superkey_\n", "\n", "The algorithm to determine if a set of attributes $A$ is a superkey for $X$ is actually very simple- we just see if the $A^+=X$:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def is_superkey_for(A, X, fds, verbose=False): \n", " return X.issubset(compute_closure(A, fds, verbose=verbose))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, to check if $A$ is a key for $X$, we just confirm that:\n", "* (a) it is a superkey\n", "* (b) there are no smaller superkeys (_Note that we only need to check for superkeys of one size smaller- think about why!_)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import itertools\n", "def is_key_for(A, X, fds, verbose=False):\n", " subsets = set(itertools.combinations(A, len(A)-1))\n", " return is_superkey_for(A, X, fds) and \\\n", " all([not is_superkey_for(set(SA), X, fds) for SA in subsets])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1\n", "\n", "Given the schema $R=\\{A,B,C\\}$, define a set of FDs such that there are two- _and only two_- keys, and check using the above functions!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "R = set(['A','B','C'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Solution:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "True\n", "False\n" ] } ], "source": [ "F = [(set(['A','B']), 'C'),\n", " (set(['B','C']), 'A')]\n", "\n", "# AB & BC are keys, but not AC\n", "print (is_key_for(set(['A','B']), R, F))\n", "print (is_key_for(set(['C','B']), R, F))\n", "print (is_key_for(set(['A','C']), R, F))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2\n", "\n", "Now, given the below relation $R$, and the above tools, define a set of FDs to result in the most keys possible. How many keys can you make? Largest number wins it all!\n", "\n", "_Bonus question: how many different sets of FDs will result in this maximum number of keys?_" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "R = set(['A','B','C','D','E'])" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "#### Solution:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n", "True\n" ] } ], "source": [ "F = [(set(['A','B']), set(['C','D','E'])),\n", " (set(['A','C']), set(['B','D','E'])),\n", " (set(['A','D']), set(['C','B','E'])),\n", " (set(['A','E']), set(['C','D','B'])),\n", " (set(['B','C']), set(['A','D','E'])),\n", " (set(['B','D']), set(['A','C','E'])),\n", " (set(['B','E']), set(['A','D','C'])),\n", " (set(['C','D']), set(['A','B','E'])),\n", " (set(['C','E']), set(['A','B','D'])),\n", " (set(['D','E']), set(['A','B','C']))]\n", " \n", "print (is_key_for(set(['A','B']), R, F))\n", "print (is_key_for(set(['A','C']), R, F))\n", "print (is_key_for(set(['A','D']), R, F))\n", "print (is_key_for(set(['A','E']), R, F))\n", "print (is_key_for(set(['B','C']), R, F))\n", "print (is_key_for(set(['B','D']), R, F))\n", "print (is_key_for(set(['B','E']), R, F))\n", "print (is_key_for(set(['C','D']), R, F))\n", "print (is_key_for(set(['C','E']), R, F))\n", "print (is_key_for(set(['D','E']), R, F))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }