{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulated Data Sets and a Demonstration of Central Limit Theorem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Material to accompany coverage in Hughes and Hase.\n", "\n", "- Introductory section complements Section 3.5, and generates graphs like those in Figs.3.6, 3.7, 3.8.\n", "- Second section complements Chapter 8 on hypothesis testing using $\\chi^2$\n", "\n", "NOTE: In this notebook I use the module scipy.stats for all statistics functions, including generation of random numbers. There are other modules with some overlapping functionality, e.g., the regular python random module, and the scipy.random module, but I do not use them here. The scipy.stats module includes tools for a large number of distributions, it includes a large and growing set of statistical functions, and there is a unified class structure. (And namespace issues are minimized.) See https://docs.scipy.org/doc/scipy/reference/stats.html." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import scipy as sp\n", "from scipy import stats\n", "\n", "import matplotlib as mpl # As of July 2017 Bucknell computers use v. 2.x \n", "import matplotlib.pyplot as plt\n", "\n", "# Following is an Ipython magic command that puts figures in the notebook.\n", "# For figures in separate windows, comment out following line and uncomment\n", "# the next line\n", "# Must come before defaults are changed.\n", "%matplotlib notebook\n", "#%matplotlib\n", "\n", "# As of Aug. 2017 reverting to 1.x defaults.\n", "# In 2.x text.ustex requires dvipng, texlive-latex-extra, and texlive-fonts-recommended, \n", "# which don't seem to be universal\n", "# See https://stackoverflow.com/questions/38906356/error-running-matplotlib-in-latex-type1cm?\n", "mpl.style.use('classic')\n", " \n", "# M.L. modifications of matplotlib defaults using syntax of v.2.0 \n", "# More info at http://matplotlib.org/2.0.0/users/deflt_style_changes.html\n", "# Changes can also be put in matplotlibrc file, or effected using mpl.rcParams[]\n", "plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures\n", "plt.rc('axes', labelsize=16, titlesize=14)\n", "plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Rolling of the dice" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider rolling a fair 6-sided die many times and recording the results.\n", "This \"original\" distribution has a mean and variance given by \n", "- $\\mu = \\sum_{i=1}^6 p_i\\, i = \\sum_{i=1}^6 \\frac{1}{6}\\times i$\n", "- $\\sigma^2 = \\frac{1}{6}\\sum_{i=1}^6(i-\\mu)^2$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(3.5, 1.707825127659933)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "i = sp.arange(1, 7, 1)\n", "mu = sum(i)/6\n", "sigma = sp.sqrt(sum((i - mu)**2)/6)\n", "mu, sigma" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following code simulates rolling the die 10 times by sampling random numbers betwen 1 and 6:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 5, 4, 5, 3, 5, 4, 2, 5, 3])" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "low, high, n = (1, 6, 10)\n", "sp.stats.randint.rvs(low, high+1, size=n)\n", "#sp.random.randint(low, high+1, n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And the average of a sample of 10 numbers from this distribution is" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.8999999999999999" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(sp.stats.randint.rvs(low, high+1, size=n))/n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "or" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.7000000000000002" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sp.mean(sp.stats.randint.rvs(low, high+1, size=n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average of a fixed number of rolls is itself a random variable. The question is: What is the probability distribution for the average? Let's repeat the \"experiment\" 1000 times and look at the distribution of values, but this time let's let the number of rolls in each experiment be $n=400$.\n", "\n", "In the language of Hughes and Hase on p. 33 (right before the beginning of Section 3.5.1,\n", "the values of $x_1$, $x_2$, $x_3$, etc., are the results of individual rolls of the die, and $N$ corresponds the number of rolls that are going to be averaged (400) to get a sample mean. The number of \"experiments\" (nEx) is the number of times we calculate the mean to get the distribution of the sample means. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Simulation - \"unpythonic\" version\n", "n_ex = 1000\n", "n = 400\n", "sim_data = sp.array([]) # Create an empty array\n", "for i in range(n_ex):\n", " sim_data = sp.append(sim_data, sp.mean(sp.stats.randint.rvs(low, high+1, size=n)))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Simulation - more concise\n", "n_ex = 1000\n", "n = 400\n", "sim_data = sp.array([sp.mean(sp.random.randint(low, high+1, n)) for i in range(n_ex)])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support.' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. ' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. \" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " if (mpl.ratio != 1) {\n", " fig.send_message(\"set_dpi_ratio\", {'dpi_ratio': mpl.ratio});\n", " }\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " this.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '');\n", " var titletext = $(\n", " '');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var backingStore = this.context.backingStorePixelRatio ||\n", "\tthis.context.webkitBackingStorePixelRatio ||\n", "\tthis.context.mozBackingStorePixelRatio ||\n", "\tthis.context.msBackingStorePixelRatio ||\n", "\tthis.context.oBackingStorePixelRatio ||\n", "\tthis.context.backingStorePixelRatio || 1;\n", "\n", " mpl.ratio = (window.devicePixelRatio || 1) / backingStore;\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width * mpl.ratio);\n", " canvas.attr('height', height * mpl.ratio);\n", " canvas.attr('style', 'width: ' + width + 'px; height: ' + height + 'px;');\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('')\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('');\n", " button.addClass('ui-button ui-widget ui-state-default ui-corner-all ' +\n", " 'ui-button-icon-only');\n", " button.attr('role', 'button');\n", " button.attr('aria-disabled', 'false');\n", " button.click(method_name, toolbar_event);\n", " button.mouseover(tooltip, toolbar_mouse_event);\n", "\n", " var icon_img = $('');\n", " icon_img.addClass('ui-button-icon-primary ui-icon');\n", " icon_img.addClass(image);\n", " icon_img.addClass('ui-corner-all');\n", "\n", " var tooltip_span = $('');\n", " tooltip_span.addClass('ui-button-text');\n", " tooltip_span.html(tooltip);\n", "\n", " button.append(icon_img);\n", " button.append(tooltip_span);\n", "\n", " nav_element.append(button);\n", " }\n", "\n", " var fmt_picker_span = $('');\n", "\n", " var fmt_picker = $('');\n", " fmt_picker.addClass('mpl-toolbar-option ui-widget ui-widget-content');\n", " fmt_picker_span.append(fmt_picker);\n", " nav_element.append(fmt_picker_span);\n", " this.format_dropdown = fmt_picker[0];\n", "\n", " for (var ind in mpl.extensions) {\n", " var fmt = mpl.extensions[ind];\n", " var option = $(\n", " '', {selected: fmt === mpl.default_extension}).html(fmt);\n", " fmt_picker.append(option)\n", " }\n", "\n", " // Add hover states to the ui-buttons\n", " $( \".ui-button\" ).hover(\n", " function() { $(this).addClass(\"ui-state-hover\");},\n", " function() { $(this).removeClass(\"ui-state-hover\");}\n", " );\n", "\n", " var status_bar = $('