{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Statistics Tools\n",
    "\n",
    "<span style=\" color:red\">NOTE</span>: In this notebook I use the`stats`  submodule of `scipy` for <em>all</em> statistics functions, including generation of random numbers. There are other modules with some overlapping functionality, e.g., the regular python `random` module, and the `np.random` module, but I do not use them here.  The `stats` submodule includes tools for a large number of distributions, it includes a large and growing set of statistical functions, and there is a unified class structure.  In addition, potential namespace issues are minimized. See https://docs.scipy.org/doc/scipy/reference/stats.html.\n",
    "\n",
    "Marty Ligare, August 2020\n",
    "\n",
    "Modified by Tom Solomon, February 2021\n",
    "\n",
    "Modified by Ned Ladd, January 2022 to remove np.random references in favor of stats.randint.rvs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Following is an Ipython magic command that puts figures in notebook.\n",
    "%matplotlib notebook\n",
    "        \n",
    "# M.L. modifications of matplotlib defaults\n",
    "# Changes can also be put in matplotlibrc file, \n",
    "# or effected using mpl.rcParams[]\n",
    "mpl.style.use('classic') \n",
    "plt.rc('figure', figsize = (6, 4.5)) # Reduces overall size of figures\n",
    "plt.rc('axes', labelsize=16, titlesize=14)\n",
    "plt.rc('figure', autolayout = True) # Adjusts supblot parameters for new size\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generating random integers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you want to generate just one random integer, here is a way to do it. You have to specify a \"low\" and \"high\" for the range; i.e., you want a random integer between \"low\" and \"high\".  Note that you have to add one to the \"high\" limit.\n",
    "\n",
    "This example pulls a random integer between 3 and 7.  Execute the cell a few times to get a feel for it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "low = 3\n",
    "high = 7\n",
    "stats.randint.rvs(low,high+1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is more likely that you will want to get a full array of integers in one fell swoop.  Here is how to do that."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate n integers between low and high:\n",
    "low = -3\n",
    "high = 6\n",
    "n = 100\n",
    "\n",
    "# or equivalently:\n",
    "# loq, high, n = (-3, 6, 100)\n",
    "\n",
    "stats.randint.rvs(low, high+1, size=n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling random numbers from a uniform p.d.f."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sample n random numbers in interval [0.0,1.0]:\n",
    "n = 10\n",
    "stats.uniform.rvs(size=n)        # \"uniform\" in the command indicates all values are equally likely"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Or, if you need a uniform distribution within a different interval\n",
    "# you can specify the lower bound of the interval, and the range\n",
    "#\n",
    "low = 2\n",
    "rng = 7\n",
    "stats.uniform.rvs(low,rng,size=10)    # this command will produce 10 random values in the interval [2.0,9.0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling random numbers from a normal distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sample $n$  random numbers from the normal distribution with mean $\\mu$, standard deviation $\\sigma$, and pdf of Eq.(2.4) of Hughes & Hase:\n",
    "\\begin{equation}\n",
    "p(x) = \\frac{1}{\\sigma\\sqrt{2\\pi}}\\exp\\left[-\\frac{(x-\\mu)^2}{2\\sigma^2}\\right]\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sampling from normal distribution\n",
    "n = 10\n",
    "mean = 10.\n",
    "sigma = 2.\n",
    "stats.norm.rvs(mean, sigma, size=n)   # \"norm\" in the command indicates a \"normal\" or \"Gaussian\" distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph the Probability Distribution Function (PDF) of a Normal Distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "x = np.linspace(mean-3.*sigma, mean+3.*sigma,200)  # make an array of 200 evenly spaced values, starting with\n",
    "                                                   # mean-3*sigma = 4 and going to mean+3*sigma = 10\n",
    "y = stats.norm.pdf(x, mean, sigma)                 # determine the value of the pdf at each of the points in 'x'\n",
    "plt.title(\"pdf of normal distribution\")\n",
    "plt.xlabel(\"$x$\")\n",
    "plt.ylabel(\"$p(x)$\")\n",
    "plt.grid()\n",
    "plt.plot(x, y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph of the Cumulative Distribution Function (CDF) of normal distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "x = np.linspace(mean-3.*sigma, mean+3.*sigma, 200)   # don't really need to do this again....\n",
    "y = stats.norm.cdf(x, mean, sigma)\n",
    "plt.title(\"cdf of normal distribution\")\n",
    "plt.xlabel(\"$x$\")\n",
    "plt.ylabel(\"$\\int_{-\\infty}^x p(x^\\prime)\\, dx^\\prime$\")\n",
    "plt.grid()\n",
    "plt.plot(x, y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sampling random numbers from a Poisson distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sample $n$  random numbers from the Poisson distribution with average count $\\overline{N}$, and probability distibution given by  Eq.(3.1) of Hughes & Hase:\n",
    "\\begin{equation}\n",
    "p(N;\\overline{N}) = \\frac{\\exp\\left(-\\overline{N}\\right)\\overline{N}^N}{N!}\n",
    "\\end{equation}\n",
    "The standard deviation of the Poisson distribution is given by \n",
    "$$ \\sigma = \\sqrt{\\overline{N}}.  $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Sampling from a Poisson distribution \n",
    "n = 1000\n",
    "mean = 2\n",
    "stats.poisson.rvs(mean, size=n)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.mean(_)    # The underscore \"_\" is similar to Mathematicas \"%\"\n",
    "              # It refers to the output of the previously executed cell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.std(__)    # Notice the double underscore \"__\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.sqrt(mean)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph of the Probability Mass Function (PMF; it's like a PDF) of Poisson distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "x = np.linspace(0, 12, 13)\n",
    "y = stats.poisson.pmf(x, mean)\n",
    "plt.title(\"pmf of Poisson distribution\")\n",
    "plt.xlabel(\"$n$\")\n",
    "plt.ylabel(\"$p(n)$\")\n",
    "plt.grid()\n",
    "plt.axhline(0)\n",
    "plt.scatter(x, y);\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Graph of the CDF of Poisson distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "x = np.linspace(0, 12, 13)\n",
    "y = stats.poisson.cdf(x, mean)\n",
    "plt.title(\"cdf of Poisson distribution\")\n",
    "plt.xlabel(\"$x$\")\n",
    "plt.ylabel(\"$C_{DF}$\")\n",
    "plt.xlim(-1, 13)\n",
    "plt.grid()\n",
    "plt.axhline(0)\n",
    "plt.scatter(x, y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sampling random numbers from a binomial distribution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Consider $n$ trials, with probability of success $p$ in each trial.  The array below is the number successes in each of $size$ trials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sampling from a binomial distribution\n",
    "n = 2\n",
    "p = 0.4\n",
    "stats.binom.rvs(n, p, size=100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "np.mean(_)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The probablity of getting $x$ successes is given by the probability mass function (pmf).  This is analogous to the continous pdf (and it's called the PDF in Mathematica).  As an example, the probability of 2 successes in 3 trials with a probability of success in each trial of 0.4 is\n",
    "29%:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n, s, p = (3, 2, 0.4)\n",
    "stats.binom.pmf(s, n, p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Graph of pmf (~pdf) of binomial distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "n = 5\n",
    "x = np.linspace(0, n, n+1)\n",
    "y = stats.binom.pmf(x, n, p)\n",
    "\n",
    "plt.title(\"pmf ($\\sim$pdf) of binom. dist.; $n=5$, $p = 0.4$\")\n",
    "plt.xlabel(\"$n$\")\n",
    "plt.ylabel(\"probability of $n$ successes\")\n",
    "plt.grid()\n",
    "plt.axhline(0)\n",
    "plt.scatter(x, y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Graph of cdf of binomial distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "n = 5\n",
    "x = np.linspace(0, n, n+1)\n",
    "y = stats.binom.cdf(x, n, p)\n",
    "plt.title(\"cdf of binomial dist.; $n=5$, $p = 0.4$\")\n",
    "plt.xlabel(\"$n$\")\n",
    "plt.ylabel(\"cdf\")\n",
    "plt.grid()\n",
    "plt.axhline(0)\n",
    "plt.scatter(x, y);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Histograms"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### Generate some random data from a normal distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n = 100\n",
    "mean = 10.\n",
    "sigma = 2.\n",
    "data = stats.norm.rvs(mean, sigma, size=n)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Select number of bins between low and high values.\n",
    "\n",
    "<b>NOTE:</b> `plt.hist` plots the histogram, and ouputs the binned data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "nbins = 6\n",
    "low = mean - 3*sigma\n",
    "high = mean + 3*sigma\n",
    "plt.xlabel(\"value\")\n",
    "plt.ylabel(\"occurences\")\n",
    "plt.title(\"Histogram; equal sized bins\")\n",
    "out = plt.hist(data, nbins, [low,high])\n",
    "out[0],out[1]    # occurrences and bin boundaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### OR ... Select specific bin boundaries\n",
    "Again, `plt.hist` outputs the binned data and plots the histogram.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "bins = [4, 7, 8, 10, 13, 16]\n",
    "plt.xlabel(\"value\")\n",
    "plt.ylabel(\"occurences\")\n",
    "plt.title(\"Histogram; specified (nonuniform) bins\")\n",
    "out = plt.hist(data, bins)\n",
    "out[0],out[1]   # occurrences and bin boundaries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Version Information \n",
    "`version_information` is from J.R. Johansson (jrjohansson at gmail.com); see <a href='http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-0-Scientific-Computing-with-Python.ipynb'>Introduction to scientific computing with Python</a> for more information and instructions for package installation.\n",
    "\n",
    "`version_information` is installed on the linux network at Bucknell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext version_information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "version_information numpy, scipy, matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}