{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Activity 5-1\n",
    "------------\n",
    "\n",
    "## Section 2: FDs & Closures\n",
    "\n",
    "Recall that given a set of attributes  $\\{A_1, \\dots, A_n\\}$ and a set of FDs $\\Gamma$\n",
    "\n",
    "The closure, denoted $\\{A_1, \\dots, A_n\\}^+$, is defined to be the largest set of attributes B s.t. $$A_1,\\dots,A_n \\rightarrow B \\text{ using } \\Gamma.$$\n",
    "\n",
    "We've built some functions to compute the closure of a set of attributes and other such operations (_feel free to look at the code- it's pretty simple and clean, if we say so ourselves..._):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from closure import compute_closure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1\n",
    "\n",
    "Consider a schema with attributes $X=\\{A,B,C,D,E,F,G,H\\}$.\n",
    "\n",
    "In this exercise, you are given a set of attributes $A\\subset X$ and a set of FDs $F$.  Find **one FD** to add to $F$ so that the closure $A^+=X$\n",
    "\n",
    "**Note: you can add FDs to the below set $F$ using e.g. `F.append((set([...]), set([...])))` and then check how you're doing using the `compute_closure` function from above!**\n",
    "\n",
    "(As we'll find out immediately after this activity, this equivalent to saying: _Find one FD to add such that $A$ becomes a superkey for $X$_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "A = set(['A', 'B','F'])\n",
    "F = [(set(['A', 'C']), 'D'),\n",
    "     (set(['D','H', 'G']), 'E'),\n",
    "     (set(['A', 'B']), 'G'),\n",
    "     (set(['F', 'B', 'G']), 'C')]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'A', 'B', 'C', 'D', 'F', 'G'}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compute_closure(A,F)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Solution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "F.append((set(['A']),'H'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using FD A,B -> G\n",
      "\t Updated x to {A,G,B,F}\n",
      "Using FD G,B,F -> C\n",
      "\t Updated x to {G,A,B,F,C}\n",
      "Using FD A -> H\n",
      "\t Updated x to {H,G,A,B,F,C}\n",
      "Using FD A,C -> D\n",
      "\t Updated x to {H,G,D,A,B,F,C}\n",
      "Using FD D,H,G -> E\n",
      "\t Updated x to {H,G,D,A,B,F,E,C}\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "compute_closure(A, F, verbose=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Section 3: Superkeys & Keys\n",
    "\n",
    "Next, we'll add some new functions now for finding superkeys and keys.  Recall:\n",
    "* A _superkey_ for a relation $R(B_1,\\dots,B_m)$ is a set of attributes $\\{A_1,\\dots,A_n\\}$ s.t.\n",
    "$$ \\{A_1,\\dots,A_n\\} \\rightarrow B_{j} \\text{ for all } j=1,\\dots m$$\n",
    "* A _key_ is a minimal (setwise) _superkey_\n",
    "\n",
    "The algorithm to determine if a set of attributes $A$ is a superkey for $X$ is actually very simple- we just see if the $A^+=X$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def is_superkey_for(A, X, fds, verbose=False): \n",
    "    return X.issubset(compute_closure(A, fds, verbose=verbose))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, to check if $A$ is a key for $X$, we just confirm that:\n",
    "* (a) it is a superkey\n",
    "* (b) there are no smaller superkeys (_Note that we only need to check for superkeys of one size smaller- think about why!_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import itertools\n",
    "def is_key_for(A, X, fds, verbose=False):\n",
    "    subsets = set(itertools.combinations(A, len(A)-1))\n",
    "    return is_superkey_for(A, X, fds) and \\\n",
    "        all([not is_superkey_for(set(SA), X, fds) for SA in subsets])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1\n",
    "\n",
    "Given the schema $R=\\{A,B,C\\}$, define a set of FDs such that there are two- _and only two_- keys, and check using the above functions!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "R = set(['A','B','C'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Solution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "F = [(set(['A','B']), 'C'),\n",
    "     (set(['B','C']), 'A')]\n",
    "\n",
    "# AB & BC are keys, but not AC\n",
    "print (is_key_for(set(['A','B']), R, F))\n",
    "print (is_key_for(set(['C','B']), R, F))\n",
    "print (is_key_for(set(['A','C']), R, F))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2\n",
    "\n",
    "Now, given the below relation $R$, and the above tools, define a set of FDs to result in the most keys possible.  How many keys can you make?  Largest number wins it all!\n",
    "\n",
    "_Bonus question: how many different sets of FDs will result in this maximum number of keys?_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "R = set(['A','B','C','D','E'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "#### Solution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "F = [(set(['A','B']), set(['C','D','E'])),\n",
    "     (set(['A','C']), set(['B','D','E'])),\n",
    "     (set(['A','D']), set(['C','B','E'])),\n",
    "     (set(['A','E']), set(['C','D','B'])),\n",
    "     (set(['B','C']), set(['A','D','E'])),\n",
    "     (set(['B','D']), set(['A','C','E'])),\n",
    "     (set(['B','E']), set(['A','D','C'])),\n",
    "     (set(['C','D']), set(['A','B','E'])),\n",
    "     (set(['C','E']), set(['A','B','D'])),\n",
    "     (set(['D','E']), set(['A','B','C']))]\n",
    "     \n",
    "print (is_key_for(set(['A','B']), R, F))\n",
    "print (is_key_for(set(['A','C']), R, F))\n",
    "print (is_key_for(set(['A','D']), R, F))\n",
    "print (is_key_for(set(['A','E']), R, F))\n",
    "print (is_key_for(set(['B','C']), R, F))\n",
    "print (is_key_for(set(['B','D']), R, F))\n",
    "print (is_key_for(set(['B','E']), R, F))\n",
    "print (is_key_for(set(['C','D']), R, F))\n",
    "print (is_key_for(set(['C','E']), R, F))\n",
    "print (is_key_for(set(['D','E']), R, F))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}