{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "When we want to work with SQL easily in an Ipython notebook, we'll load the ipython-sql [extension](https://github.com/catherinedevlin/ipython-sql) as follows:\n", "\n", "**Note: DO NOT PANIC** \n", "* Don't worry if you get (a) a big red-highlighted warning or (b) a note that the extension has already been loaded! As long as your SQL commands work, it's loaded properly!\n", "\n", "* If you don't understand all the SQL queries... very soon, you will!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Library/Python/2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.\n", " \"You should import from traitlets.config instead.\", ShimWarning)\n", "/Library/Python/2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.\n", " warn(\"IPython.utils.traitlets has moved to a top-level traitlets package.\")\n" ] } ], "source": [ "%load_ext sql" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll load an SQLite database stored as a file as follows:\n", "\n", "**NOTE: We load a file below (here, \"dataset_1.db\", which must be in the same directory as the notebook. You'll use this file throughout the next few lectures and PS1. Make sure to download the from the webpage!!**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "ERROR:root:Line magic function `%sql` not found.\n" ] } ], "source": [ "%sql sqlite:///dataset_1.db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets look at one table from the US National Oceanic and Atmospheric Administration (NOAA) Rainfall dataset- `precipitation_full`- having the following schema:\n", "\n", "> * `state_code`\n", "> * `station_id`\n", "> * `year`\n", "> * `month`\n", "> * `day`\n", "> * `hour`\n", "> * `precipitation`\n", "> * `flag_1`\n", "> * `flag_2`\n", "\n", "Each tuple in this table describes one hour of rainfall (`precipitation`- in hundredths of an inch) at one station (`station_id`) in one state (`state_code`). Note that tuples with `hour=25` record the total rainfall for that day, and that we can ignore the values of attributes `flag_1` and `flag_2` for now.\n", "\n", "Let's take a look at some sample rows of the table via SQL queries." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that you can also load an empty in-memory database using:\n", "\n", "``%sql sqlite://``\n", "\n", "Now let's try out some queries! (Don't worry, if you're new to SQL, we'll go over all the syntax in more depth next lecture)\n", "\n", "**`%sql` is used for single line SQL commands:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql SELECT * FROM precipitation_full LIMIT 5;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%sql SELECT COUNT(*) FROM precipitation_full;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And **`%%sql` is used for multi-line SQL commands:**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sql\n", "SELECT SUM(p.precipitation) \n", "FROM precipitation_full p, states s \n", "WHERE p.state_code = s.code AND s.abbrev = 'CA';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also work with the output of the queries we issue:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = %sql SELECT * FROM states;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result.keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note that the first row is the header row with column names!*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result[1]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result[1].name" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now feel free to have fun with the dataset- we'll see more of it later though!" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [ipykernel_py3]", "language": "python", "name": "Python [ipykernel_py3]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }