{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prelude to Hughes & Hase Chapter 8, A Bad model\n",
    "\n",
    "In Part B of the Prelude to Chapter 8, we looked at the expected distribution \n",
    "of values of $\\chi^2$ for the fit of a straight line to 5 data points.  In this part we\n",
    "repeat exactly the same process, but the model is an exponential decay, and \n",
    "we fit the model to a larger number of data points that don't fit the model as well.\n",
    "\n",
    "Ned Ladd, March 2022"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from scipy import stats\n",
    "from scipy import optimize\n",
    "\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Following is an Ipython magic command that puts figures in notebook.\n",
    "%matplotlib notebook\n",
    "        \n",
    "# M.L. modifications of matplotlib defaults\n",
    "# Changes can also be put in matplotlibrc file, \n",
    "# or effected using mpl.rcParams[]\n",
    "mpl.style.use('classic') \n",
    "plt.rc('figure', figsize = (6, 4.5))  # Reduces overall size of figures\n",
    "plt.rc('axes', labelsize=16, titlesize=14)\n",
    "plt.rc('figure', autolayout = True)   # Adjusts supblot parameters for new size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Functions\n",
    "Remember:\n",
    "$$\n",
    "\\chi^2 = \\sum_i\\frac{\\left(y(x_i) - y_i\\right)^2}{\\alpha_i^2}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def f(x,q,r,s):\n",
    "    '''Simple exponential decay function (with offset); nonlinear\n",
    "    in the fit parameteres q, r, and s.'''\n",
    "    return q*np.exp(-r*x) + s\n",
    "\n",
    "def chi2(x,y,u,q,r,s):\n",
    "    '''Chisquare as a function of data (x, y, and yerr=u), and model \n",
    "    parameters q, r, and s'''\n",
    "    return np.sum((y - f(x,q,r,s))**2/u**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Set III"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = np.array([[0.        , 7.46129812, 0.4       ],\n",
    "       [0.2       , 5.85452157, 0.4       ],\n",
    "       [0.4       , 7.07717576, 0.4       ],\n",
    "       [0.6       , 5.4736062 , 0.4       ],\n",
    "       [0.8       , 3.8904504 , 0.4       ],\n",
    "       [1.        , 4.79926158, 0.4       ],\n",
    "       [1.2       , 4.55971739, 0.4       ],\n",
    "       [1.4       , 3.60873722, 0.4       ],\n",
    "       [1.6       , 2.86732949, 0.4       ],\n",
    "       [1.8       , 2.84260122, 0.4       ],\n",
    "       [2.        , 2.13946228, 0.4       ],\n",
    "       [2.2       , 2.22060587, 0.4       ],\n",
    "       [2.4       , 1.63133617, 0.4       ],\n",
    "       [2.6       , 2.46693153, 0.4       ],\n",
    "       [2.8       , 2.32689348, 0.4       ],\n",
    "       [3.        , 1.5128004 , 0.4       ],\n",
    "       [3.2       , 1.20809552, 0.4       ],\n",
    "       [3.4       , 1.43849374, 0.4       ],\n",
    "       [3.6       , 1.36242689, 0.4       ],\n",
    "       [3.8       , 2.13797785, 0.4       ],\n",
    "       [4.        , 1.72819865, 0.4       ]])\n",
    "\n",
    "x, y, u = data.T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "plt.title('Data set #2')\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('$y$')\n",
    "plt.xlim(-0.5,4.5) \n",
    "plt.ylim(0,9)\n",
    "plt.errorbar(x,y,yerr=u,fmt='o');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Find best-fit parameters (\"old business\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p0 = 5, 1, 1   # Initial estimates of q, r, and s\n",
    "popt, pcov = optimize.curve_fit(f, x, y, p0, sigma=u, absolute_sigma=True)\n",
    "chi2_data = chi2(x,y,u,*popt)\n",
    "print(\"chi2_data =\", chi2_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xc = np.linspace(0,6,201) # quasi-continuous set of x's for function plot\n",
    "plt.figure()\n",
    "plt.title(\"Data set #2 with best fit function\")\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('$y$')\n",
    "plt.axhline(0, color='magenta')\n",
    "plt.xlim(-0.5,4.5)  # Pad x-range on plot\n",
    "plt.errorbar(x, y, yerr=u, fmt='o');\n",
    "plt.plot(xc ,f(xc, *popt));"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Normalized residuals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure()\n",
    "plt.axhline(0,color='magenta')\n",
    "plt.xlabel('$x$')\n",
    "plt.ylabel('normalized residuals')\n",
    "plt.title('Data set #2: normalized residuals')\n",
    "plt.grid(True)\n",
    "plt.errorbar(x,(f(x,*popt)-y)/u,1,fmt='o')\n",
    "plt.xlim(-0.5,5.5);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What confidence can we have in the model? (\"new business\")\n",
    "\n",
    "Normalized residuals look ok according to qualitative assessments \n",
    "mentioned, so precede to more quantitative measures.\n",
    "\n",
    "1. <b>Assume the model is correct</b>, and that any deviations \n",
    "from the model are due to the fact that data is the result of\n",
    "random sampling from normal probability distributions characterized by the \n",
    "known standard errors of the data points.  This assumption is called\n",
    "the <b>null hypothesis</b>.\n",
    "2. <b>Generate many simulated data sets</b> based on the model, best-fit parameters, and uncertainties. (Use same values for $x_i$).\n",
    "3. <b>Fit each of the simulated data sets</b> and store the values of \n",
    "$\\chi^2$ in an array.\n",
    "4. <b>Examine the histogram of simulated values of $\\chi^2$.</b>\n",
    "Where does the value of $\\chi^2_{\\text{min}}$ from the fit to the \n",
    "original data fall within the histogram?  What's the probability of \n",
    "getting values of $\\chi^2$ from the simulated data sets \n",
    "that are greater than the value given by the data that was actually measured? \n",
    "In other words, is our value of $\\chi^2_{\\text{min}}$ a reasonable result from the model\n",
    "and the fluctuations we expect in our data?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Simulation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_sim = 10000               # Number of data sets to simulate\n",
    "count = 0\n",
    "\n",
    "chi2_sim = np.zeros(n_sim)  # Array for chisq valuesfrom simulated sets\n",
    "\n",
    "for i in range(n_sim):\n",
    "    y_sim = stats.norm.rvs(f(x,*popt), u)  # Generate simulated data set\n",
    "    popt_sim, pcov_sim = optimize.curve_fit(f, x, y_sim, popt, sigma=u, absolute_sigma=True)\n",
    "    a = chi2(x,y_sim,u,*popt_sim)   # Calculate chisq for simulated data\n",
    "    chi2_sim[i] = a                 # store value of chisq\n",
    "    if a > chi2_data:               # increment counter\n",
    "        count += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"% of simulated sets with chisq larger than that from orig. data =\", 100*count/n_sim)\n",
    "mean  = np.mean(chi2_sim)\n",
    "sigma = np.sqrt(np.var(chi2_sim))\n",
    "print(\"mean of simulated values of chisq =\", mean)\n",
    "print(\"std of simulated values of chisq =\", sigma)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Histogram of results from simulation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nbins = 25  # number of bis\n",
    "low = 0     # lower limit for histogram \n",
    "#high = mean + 5*sigma\n",
    "high = 50   # upper limit for histogram\n",
    "\n",
    "plt.figure()\n",
    "plt.xlabel(\"$\\chi^2$\")\n",
    "plt.ylabel(\"occurrences\")\n",
    "plt.title(\"Histogram; equal sized bins\")\n",
    "out = plt.hist(chi2_sim, nbins, [low,high], label=\"$\\chi^2$ from sim. data\")\n",
    "#out[0],out[1]   # occurrences and bin boundaries (for use in future)\n",
    "\n",
    "plt.axvline(chi2_data, label='$\\chi^2 = 34.7$')\n",
    "plt.legend();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Discussion\n",
    "\n",
    "Again, the histogram gives \n",
    "a <i>probability distribution</i> for values $\\chi^2$ that \n",
    "minimize data sets consistent with the model and uncertainties.\n",
    "With more data points, the probability distribution is \n",
    "more symmetric than that in Part A.\n",
    "\n",
    "Again, the value of $\\chi^2$ obtained\n",
    "by fitting the original data is in the range of values we expect from \n",
    "the model and uncertainties.  To be more quantitative, \n",
    "there is a 1% chance of getting a value of $\\chi^2$ that \n",
    "is 34.7 or bigger, so it's not very reasonable at all to expect a value \n",
    "of about 34.7.\n",
    "\n",
    "<b>Therefore, you should be very suspicious of the exponential decay model for these data, based on the value of \n",
    " $\\chi^2$.</b>\n",
    " \n",
    "If you end up in a situation like this, there are two possibilities to consider:\n",
    "<Ul>\n",
    "    <li> The model you're trying to fit doesn't adequately explain the data. Perhaps you need to consider a more sophisticated version, or even some different physical understanding of the situation that produced these data. \n",
    "    <li> Your uncertainties are too small, and don't adequately represent the variation that one might expect betweeen the (assumed correct) model and the data you measured. You might want to re-examine your measurement technique and make sure that you have properly characterized its measuring capacity. \n",
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The $\\chi^2$ Distribution\n",
    "\n",
    "You don't need to simulate the distribution of $\\chi^2$ values each time you do a fit, because the functional form for this distribution is well-known. It's characterized by the number of <b>degrees of freedom</b> in your fitting process. In nearly all cases, you can determine the number of degrees of freedom by subtracting the number of parameters in your model (in this case, 3) from the number of datapoints to be fit (21). \n",
    "\n",
    "Then, we can use the CDF for this distribution to determine the probability of finding a value of $\\chi^2$ less than or equal to our determined $\\chi^2_{min}$. Or, as is more useful in this case, 1 - CDF will give us the probability of finding a $\\chi^2$ greater than our value of $\\chi^2_{min}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob = 1 - stats.chi2.cdf(chi2_data,18)\n",
    "print(\"Probability that we would find chi-square larger than what we found:\",prob)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Version details\n",
    "`version_information` is from J.R. Johansson (jrjohansson at gmail.com); see <a href='http://nbviewer.jupyter.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-0-Scientific-Computing-with-Python.ipynb'>Introduction to scientific computing with Python</a> for more information and instructions for package installation.\n",
    "\n",
    "`version_information` is installed on the linux network at Bucknell"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext version_information"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "version_information numpy, scipy, matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}