{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "NOTE:\n",
    "-----\n",
    "\n",
    "Please run the below cells first before proceeding- you'll need them soon!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The sql extension is already loaded. To reload it, use:\n",
      "  %reload_ext sql\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "u'Connected: None@dataset_1.db'"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%load_ext sql\n",
    "%sql sqlite:///dataset_1.db"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Activity 3-2\n",
    "------------\n",
    "Aggregation operators, GROUP BY"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise #1\n",
    "-----------\n",
    "\n",
    "Consider a set of tables that describe the up-and-coming bagel startup industry; for now let's just look at two tables here, `bagel`, which describes types of bagels made by the different bagel companies:\n",
    "> * name STRING\n",
    "> * price FLOAT\n",
    "> * made_by STRING\n",
    "\n",
    "And `purchase`:\n",
    "> * bagel_name STRING\n",
    "> * franchise STRING\n",
    "> * date INT\n",
    "> * quantity INT\n",
    "> * purchaser_age INT\n",
    "\n",
    "Where `purchase.bagel_name` references `bagel.name` and `purchase.franchise` references `bagel.made_by`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>name</th>\n",
       "        <th>price</th>\n",
       "        <th>made_by</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>1.99</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>2.39</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>eBagel Drinkable Bagel</td>\n",
       "        <td>27.99</td>\n",
       "        <td>eBagel</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(u'Plain with shmear', 1.99, u'Bobs Bagels'),\n",
       " (u'Egg with shmear', 2.39, u'Bobs Bagels'),\n",
       " (u'eBagel Drinkable Bagel', 27.99, u'eBagel')]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM bagel LIMIT 3;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>bagel_name</th>\n",
       "        <th>franchise</th>\n",
       "        <th>date</th>\n",
       "        <th>quantity</th>\n",
       "        <th>purchaser_age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "        <td>1</td>\n",
       "        <td>12</td>\n",
       "        <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "        <td>2</td>\n",
       "        <td>6</td>\n",
       "        <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>BAGEL CORP</td>\n",
       "        <td>2</td>\n",
       "        <td>12</td>\n",
       "        <td>24</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(u'Plain with shmear', u'Bobs Bagels', 1, 12, 28),\n",
       " (u'Egg with shmear', u'Bobs Bagels', 2, 6, 47),\n",
       " (u'Plain with shmear', u'BAGEL CORP', 2, 12, 24)]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM purchase LIMIT 3;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Can you write a query to get the _total revenue_ for each bagel type **which had an average purchaser age over 18**?  Type your query below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>name</th>\n",
       "        <th>Revenue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>14.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>84.51</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(u'Egg with shmear', 14.34), (u'Plain with shmear', 84.50999999999999)]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%sql\n",
    "SELECT b.name, SUM(p.quantity * b.price) AS Revenue\n",
    "FROM bagel b, purchase p\n",
    "WHERE b.name = p.bagel_name AND b.made_by = p.franchise\n",
    "GROUP BY b.name\n",
    "HAVING AVG(p.purchaser_age) > 18;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Exercise #2\n",
    "-----------\n",
    "\n",
    "Here we'll use a simplified version of the `precipitation_full` table, which just has _daily_ rainfall _in CA only_, and has the following schema:\n",
    "\n",
    "> * station_id\n",
    "> * day\n",
    "> * precipitation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "        <th>day</th>\n",
       "        <th>precipitation</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>1</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>4</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>24</td>\n",
       "        <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>21201</td>\n",
       "        <td>1</td>\n",
       "        <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>21201</td>\n",
       "        <td>20</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(16102, 1, 10),\n",
       " (16102, 4, 10),\n",
       " (16102, 24, 30),\n",
       " (21201, 1, 0),\n",
       " (21201, 20, 10)]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM precipitation LIMIT 5;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to get station_ids which have average precipitations > 75.  Try doing this first as a nested query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%sql\n",
    "SELECT DISTINCT p.station_id\n",
    "FROM precipitation p \n",
    "WHERE (\n",
    "    SELECT AVG(precipitation) \n",
    "    FROM precipitation \n",
    "    WHERE station_id = p.station_id) > 75;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, try re-writing as a GROUP BY:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%sql\n",
    "SELECT station_id\n",
    "FROM precipitation\n",
    "GROUP BY station_id\n",
    "HAVING AVG(precipitation) > 75;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now time it by using `%time` followed by single-line versions of your queries above (clunky, but will work) to see how they compare!\n",
    "\n",
    "**Note:** Yes, currently the answers are filled in below for convenience... but you should still try getting them on your own above!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n",
      "CPU times: user 41.9 ms, sys: 1.39 ms, total: 43.3 ms\n",
      "Wall time: 42.5 ms\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time %sql SELECT DISTINCT p.station_id FROM precipitation p WHERE (SELECT AVG(precipitation) FROM precipitation WHERE station_id = p.station_id) > 75;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n",
      "CPU times: user 3.06 ms, sys: 1.43 ms, total: 4.49 ms\n",
      "Wall time: 3.11 ms\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%time %sql SELECT p.station_id FROM precipitation p GROUP BY p.station_id HAVING AVG(p.precipitation) > 75;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**An ~ 10-20x difference in execution time!!**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}