Activity 5-1
------------

## Section 2: FDs & Closures

Recall that, given a set of attributes $\{A_1, \dots, A_n\}$ and a set of FDs $\Gamma$, the closure, denoted $\{A_1, \dots, A_n\}^+$, is defined to be the largest set of attributes $B$ s.t. $$A_1,\dots,A_n \rightarrow B \text{ using } \Gamma.$$

We've built some functions to compute the closure of a set of attributes and other such operations (_feel free to look at the code; it's pretty simple and clean, if we say so ourselves..._):

In [1]:
from closure import compute_closure
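
The `closure` module itself is not reproduced in this activity. As a rough idea of what such a routine can look like, here is a minimal fixed-point sketch. The name `sketch_closure` is hypothetical and the code is not claimed to match the real `compute_closure` (it has no `verbose` option, for instance). It assumes each FD is a `(lhs, rhs)` pair whose sides are sets, or single-character strings, of attribute names.

```python
def sketch_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) pairs.

    Hypothetical helper for illustration; not the course's compute_closure.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD can fire once its whole left-hand side is in the closure.
            # set(...) lets a single-character string such as 'D' stand for {'D'}.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Small made-up example: A,B -> C and C -> D fire, E -> F never does.
example_fds = [({'A', 'B'}, {'C'}), ({'C'}, {'D'}), ({'E'}, {'F'})]
print(sorted(sketch_closure({'A', 'B'}, example_fds)))  # ['A', 'B', 'C', 'D']
```

The loop stops as soon as a full pass over the FDs adds nothing new, so it always terminates: the closure can only grow, and it is bounded by the finite set of attributes.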

### Exercise 1

Consider a schema with attributes $X=\{A,B,C,D,E,F,G,H\}$.

In this exercise, you are given a set of attributes $A\subset X$ and a set of FDs $F$. Find **one FD** to add to $F$ so that the closure satisfies $A^+=X$.

**Note: you can add FDs to the below set $F$ using e.g. `F.append((set([...]), set([...])))` and then check how you're doing using the `compute_closure` function from above!**

(As we'll find out immediately after this activity, this is equivalent to saying: _Find one FD to add such that $A$ becomes a superkey for $X$_)

In [2]:
A = set(['A', 'B','F'])
F = [(set(['A', 'C']), 'D'),
 (set(['D','H', 'G']), 'E'),
 (set(['A', 'B']), 'G'),
 (set(['F', 'B', 'G']), 'C')]

In [3]:
compute_closure(A,F)

{'A', 'B', 'C', 'D', 'F', 'G'}

#### Solution:

In [4]:
F.append((set(['A']),'H'))

In [5]:
compute_closure(A, F, verbose=True)

Using FD A,B -> G
	 Updated x to {A,G,B,F}
Using FD G,B,F -> C
	 Updated x to {G,A,B,F,C}
Using FD A -> H
	 Updated x to {H,G,A,B,F,C}
Using FD A,C -> D
	 Updated x to {H,G,D,A,B,F,C}
Using FD D,H,G -> E
	 Updated x to {H,G,D,A,B,F,E,C}


{'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'}

## Section 3: Superkeys & Keys

Next, we'll add some new functions for finding superkeys and keys. Recall:
* A _superkey_ for a relation $R(B_1,\dots,B_m)$ is a set of attributes $\{A_1,\dots,A_n\}$ s.t.
$$ \{A_1,\dots,A_n\} \rightarrow B_{j} \text{ for all } j=1,\dots,m$$
* A _key_ is a minimal (setwise) _superkey_

The algorithm to determine if a set of attributes $A$ is a superkey for $X$ is actually very simple; we just check whether $A^+=X$:

In [6]:
def is_superkey_for(A, X, fds, verbose=False):
    # A is a superkey for X iff its closure contains every attribute of X
    return X.issubset(compute_closure(A, fds, verbose=verbose))

Then, to check if $A$ is a key for $X$, we just confirm that:
* (a) it is a superkey
* (b) there are no smaller superkeys (_Note that we only need to check for superkeys of one size smaller; think about why!_)

In [7]:
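import itertools

def is_key_for(A, X, fds, verbose=False):
    # (a) A must itself be a superkey
    if not is_superkey_for(A, X, fds, verbose=verbose):
        return False
    # (b) no subset with exactly one attribute removed may be a superkey
    subsets = itertools.combinations(A, len(A) - 1)
    return all(not is_superkey_for(set(SA), X, fds) for SA in subsets)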
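
As a quick empirical cross-check of the "one size smaller" shortcut, the sketch below compares it against a slower test that tries every proper subset. It is self-contained, so it uses its own stand-in `closure_of` rather than the course's `compute_closure`. All function names and the small FD set are made up for illustration.

```python
from itertools import combinations


def closure_of(attrs, fds):
    """Stand-in closure routine (assumption: FDs are (lhs, rhs) pairs of sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_key_one_smaller(A, X, fds):
    """Superkey, and no subset with exactly one attribute removed is a superkey."""
    if not X <= closure_of(A, fds):
        return False
    return all(not X <= closure_of(A - {a}, fds) for a in A)


def is_key_exhaustive(A, X, fds):
    """Slower reference: superkey, and no proper subset of any size is a superkey."""
    if not X <= closure_of(A, fds):
        return False
    members = sorted(A)
    return all(
        not X <= closure_of(set(sub), fds)
        for size in range(len(members))
        for sub in combinations(members, size)
    )


X_demo = {'A', 'B', 'C', 'D'}
fds_demo = [({'A'}, {'B'}), ({'B', 'C'}, {'D'}), ({'D'}, {'A'})]
candidates = [set(c) for n in range(len(X_demo) + 1)
              for c in combinations(sorted(X_demo), n)]
agree = all(is_key_one_smaller(c, X_demo, fds_demo) == is_key_exhaustive(c, X_demo, fds_demo)
            for c in candidates)
print(agree)  # True
```

Agreement on one example is of course not a proof, but it is a useful sanity check while working out the argument.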

### Exercise 1

Given the schema $R=\{A,B,C\}$, define a set of FDs such that there are two (_and only two_) keys, and check using the above functions!

In [8]:
R = set(['A','B','C'])

#### Solution:

In [10]:
F = [(set(['A','B']), 'C'),
 (set(['B','C']), 'A')]

# AB & BC are keys, but not AC
print(is_key_for(set(['A','B']), R, F))
print(is_key_for(set(['C','B']), R, F))
print(is_key_for(set(['A','C']), R, F))

True
True
False
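
The three checks above cover the two-attribute candidates only. To confirm the "only two" part, one option is to enumerate every subset of $R$ and collect the minimal superkeys. The sketch below does that with hypothetical helpers `closure_of` and `all_keys` (neither comes from the `closure` module), and it writes each right-hand side as a set.

```python
from itertools import combinations


def closure_of(attrs, fds):
    """Stand-in closure routine (assumption: FDs are (lhs, rhs) pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def all_keys(schema, fds):
    """Brute-force key search; exponential in len(schema), fine for tiny schemas."""
    keys = []
    # Visit candidates from smallest to largest so every key is found
    # before any of its supersets is considered.
    for size in range(len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # A superset of a known key is a superkey but not minimal: skip it.
            if any(key <= candidate for key in keys):
                continue
            if closure_of(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


R = {'A', 'B', 'C'}
F = [({'A', 'B'}, {'C'}), ({'B', 'C'}, {'A'})]
print([sorted(k) for k in all_keys(R, F)])  # [['A', 'B'], ['B', 'C']]
```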


### Exercise 2

Now, given the relation $R$ below and the tools above, define a set of FDs that results in the most keys possible. How many keys can you make? Largest number wins it all!

_Bonus question: how many different sets of FDs will result in this maximum number of keys?_

In [11]:
R = set(['A','B','C','D','E'])

#### Solution:

In [12]:
F = [(set(['A','B']), set(['C','D','E'])),
 (set(['A','C']), set(['B','D','E'])),
 (set(['A','D']), set(['C','B','E'])),
 (set(['A','E']), set(['C','D','B'])),
 (set(['B','C']), set(['A','D','E'])),
 (set(['B','D']), set(['A','C','E'])),
 (set(['B','E']), set(['A','D','C'])),
 (set(['C','D']), set(['A','B','E'])),
 (set(['C','E']), set(['A','B','D'])),
 (set(['D','E']), set(['A','B','C']))]
 
print(is_key_for(set(['A','B']), R, F))
print(is_key_for(set(['A','C']), R, F))
print(is_key_for(set(['A','D']), R, F))
print(is_key_for(set(['A','E']), R, F))
print(is_key_for(set(['B','C']), R, F))
print(is_key_for(set(['B','D']), R, F))
print(is_key_for(set(['B','E']), R, F))
print(is_key_for(set(['C','D']), R, F))
print(is_key_for(set(['C','E']), R, F))
print(is_key_for(set(['D','E']), R, F))

True
True
True
True
True
True
True
True
True
True
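
Why stop at 10? No key can contain another key, so the keys form an antichain of subsets of $R$, and by Sperner's theorem an antichain in a 5-element set has at most $\binom{5}{2}=10$ members. The sketch below rebuilds the same pairwise FDs programmatically and counts the keys by brute force. `closure_of` and `is_key` are hypothetical stand-ins for the functions used above, not the course's own code.

```python
from itertools import combinations
from math import comb


def closure_of(attrs, fds):
    """Stand-in closure routine (assumption: FDs are (lhs, rhs) pairs of sets)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_key(candidate, schema, fds):
    """Superkey, and dropping any single attribute breaks the superkey property."""
    if not closure_of(candidate, fds) >= schema:
        return False
    return not any(closure_of(candidate - {a}, fds) >= schema for a in candidate)


R = {'A', 'B', 'C', 'D', 'E'}
# Every pair of attributes determines the remaining three.
F = [(set(pair), R - set(pair)) for pair in combinations(sorted(R), 2)]

keys = [set(c)
        for size in range(len(R) + 1)
        for c in combinations(sorted(R), size)
        if is_key(set(c), R, F)]
print(len(keys), comb(5, 2))  # 10 10
```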
