{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe1ecdc0-e850-49a6-8063-a2f828e75e84",
   "metadata": {},
   "source": [
    "# Data Analysis VI: Monte Carlo Methods\n",
    "\n",
    "Goals for this class:\n",
    "\n",
    "1) **Simulate many trials**: Perform a Monte Carlo simulation of an experiment with multiple data points per trial, and also simulate many trials of the experiment.\n",
    "\n",
    "2) **Monte Carlo error propagation**: Use a Monte Carlo simulation to generate the PDF for a quantity that is derived from other measured quantities, each of which has an uncertainty."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5c9584c0-2542-433b-8b2a-0dfc561b6943",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from scipy import stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ce3a0b60-fd54-4d3e-b652-4bb21bbd9406",
   "metadata": {},
   "source": [
    "## Goal 1: Simulate many trials - the coin flip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "184ed1ea-11c9-4633-812d-a03504ba880f",
   "metadata": {},
   "source": [
    "_Question_: I flip a fair coin 100 times. What is the probability that I get 60 or more \"heads\"?\n",
    "\n",
    "One way to answer this question is to use the known properties of the binomial distribution, but we will use it as a way to practice Monte Carlo simulation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b06bcd31-e03e-4deb-8aad-13d4711056e5",
   "metadata": {},
   "source": [
    "**Task:** Type into this markdown box to outline a plan for how you can answer the above question, using a Monte Carlo simulation.  Your outline should include: will you have any loops (how many)?  what will happen inside the loop(s)?\n",
    "\n",
    "**ADD YOUR OUTLINE HERE**\n",
    "\n",
    "Once you're done with your outline, we'll have a class discussion and map out an approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "283e0006-53bd-4fce-95b0-dd4d5159c46f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# (complete task in above box)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef264638-7b0c-4b16-bacf-f2577b94f931",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### Approach 1A: Simulate all the individual coin flips in each trial ###"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54a580db-5086-486b-a06a-526b43c9261c",
   "metadata": {},
   "source": [
    "#### i) Simulate one trial first to test code\n",
    "\n",
    "When you're writing a loop, it's a good idea to write and test the code that will go _inside_ the loop before you actually make the loop.  Before looping across many trials, let's write the code for one trial.\n",
    "\n",
    "First: use `stats.randint.rvs` to generate 100 coin flips (1 or 0).  Print your result to make sure it's what you expect.  (Run `stats.randint?` to review the inputs to stats.randint.rvs.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "3e547bda-6135-4804-bb3c-78a56e83fd2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fccce232-caeb-4949-ac3a-10768c5c276b",
   "metadata": {},
   "source": [
    "Now, write code to count and print the number of heads in your 100 flips."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f1544caf-0390-4c8f-88f0-6d9ab87dcc11",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc485634-a938-47f9-8800-8e13e91f1000",
   "metadata": {},
   "source": [
    "#### ii) Simulate 1000 trials to get a probability"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17ac612f-a367-4a50-9719-330e39ef9957",
   "metadata": {},
   "source": [
    "Now, write a loop to run this experiment 1000 times.  In each iteration of the loop, save the number of heads from that trial into a list or array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ab970373-cc1b-4059-84a1-a923bd68d20f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c9c53db4-53b9-481e-80b7-90438fd99551",
   "metadata": {},
   "source": [
    "Now, make a histogram showing the distribution of number of heads from the different trials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "bf943f41-dd4b-4c66-9339-ba8197a66fcc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d498d4bb-a850-4f9c-a180-bf87850de79a",
   "metadata": {},
   "source": [
    "Count the number of trials that have a number of heads >= 60.  To do this, first try something like:\n",
    "\n",
    "`print(heads_list>29)`   (This is not the exact code you'll need, it's just an example.)\n",
    "\n",
    "Now try:\n",
    "\n",
    "`np.sum([True,False,True])`     What is this doing?  How can you combine an inequality and np.sum to achieve your goal?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "d16cb901-aef8-4031-86eb-b784b94ac799",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "354f4fe3-e744-467e-a1f0-ae5a99cc5a03",
   "metadata": {},
   "source": [
    "Finally, to answer the question at the beginning of Goal 1, convert your number of occurrences (of >=60 heads) into a probability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b0598560-6d65-4e76-b441-eb4dd61c32d6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc8ebbf6-046e-4b37-a698-c22fc7cadba2",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### (optional) Approach 1B: Simulate the number of heads in each trial (no individual flips)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "37b030be-98cb-4c9c-a26f-059ddba8f2fa",
   "metadata": {},
   "source": [
    "The binomial distribution describes the scenario where you have $n$ measurements, each of which has a probability $p$ of coming out positive.  For example, $n=100$ for 100 coin flips, and $p=0.5$ for coin flips, or $p=1/6$ for the probability of rolling a 1 when you roll a 6-sided die.\n",
    "\n",
    "You can use ```stats.binom.rvs``` to simulate a single run of 100 coin flips, and return the number of positive outcomes (heads) in that run.\n",
    "\n",
    "First, **run stats.binom?** and look for the help line that says what inputs go to stats.binom.rvs.  (It turns out this is more informative than the documentation for stats.binom.rvs.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "180616f5-ae67-4f38-aefa-6f956fdee77d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# view help for stats.binom"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a6d57e30-0335-4237-86f7-336e2aa9ce8d",
   "metadata": {},
   "source": [
    "First try generating and printing the outcome (number of heads) of one trial of 100 flips."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "6a438a1e-9b60-4916-9162-2c6059954ac0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dee40976-cb80-4152-8066-489b6d515d01",
   "metadata": {},
   "source": [
    "Now try generating and printing the outcome (number of heads) from 1000 trials of 100 flips."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "9f10c1e1-b575-443d-8c06-2aa20cea3d89",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69b45cce-3c59-4a6e-ad18-2680c1bb7387",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "tags": []
   },
   "source": [
    "### (optional) Approach 1C: Use the binomial CDF"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79829e23-0cc1-4f4e-9454-fbbbe6e8658b",
   "metadata": {},
   "source": [
    "$C_{DF}(x)$ tells you the probability of getting $\\leq n$ heads in a single experiment of $n$ flips.  Figure out how to use `stats.binom.cdf` to calculate the answer to the original question in a single line of code.  Is this a Monte Carlo simulation?  Why or why not?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "08b68a3a-4d84-472a-af60-61221fc786f9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbcfa351-32da-4c61-836f-dd4a5a0f7db8",
   "metadata": {},
   "source": [
    "## Goal 2: Monte Carlo Error Propagation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8ef3c1db-e883-4dce-80d4-ffcd36c52fbe",
   "metadata": {},
   "source": [
    "Distant galaxies are generally moving away from us due the expansion of the universe, and the further the galaxy, the faster it's moving away.  Hubble's Law states that the speed $v$ at which distant galaxies move away from us is proportional to their distance $d$:\n",
    "$$\n",
    "v = H_0 d\n",
    "$$\n",
    "The constant of proportionality $H_0$ is difficult to measure.  It's more precise now, but in the late 1990s, the rough estimate was $H_0 = 70 \\pm 20$ km/s/Mpc (a 30% uncertainty!).  In more convenient units for our purposes, that's:\n",
    "$$\n",
    "H_0 = 21000 \\pm 6000 \\textrm{ km/s/Gly}\n",
    "$$\n",
    "where a Gly = 1 billion lightyears.\n",
    "\n",
    "Astronomers use Hubble's Law to estimate the distance of other galaxies, based on their recession speed.  To measure the recession speed $v$, they first measure redshift $z$, which is the fractional change in the wavelength due to the Doppler effect.  A 23% redshift ($z=0.23$) means a galaxy is moving away from us at 23% of the speed of light:\n",
    "$$\n",
    "v = cz\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32358099-08ba-4cb7-899d-83a499db32a9",
   "metadata": {},
   "source": [
    "**Task:** Combine the two equations for $v$ and rearrange them to solve for distance $d$ of a galaxy in terms of its measured redshift $z$ and the constants $H_0$ and $c$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "26dd2abd-6ac4-4f89-9fec-ca139ad11f48",
   "metadata": {},
   "outputs": [],
   "source": [
    "# add your equation to the above Markdown cell"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fa4dad1-4139-4988-b796-cbb422dd8cb4",
   "metadata": {},
   "source": [
    "_Question_: An astronomer uses the photometric redshift technique to measure a galaxy's redshift, obtaining: $z=0.23\\pm0.05$. (The photometric redshift technique uses colors in photos to estimate the Doppler shift - this is not very precise!)  Based on their measurement, what is the distance to this galaxy, with uncertainties?\n",
    "\n",
    "To answer this question, we will first ask another question:\n",
    "\n",
    "_Question_: If the uncertain measurements of $z$ and $H_0$ were each conducted a bajillion times, what is the range of possible $d$ values that could be measured?  (To answer this, we will make a PDF and CDF of the possible values of $d$ for this galaxy.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "778a8c4a-05d8-4d50-ae0a-08638b7b1e72",
   "metadata": {},
   "source": [
    "**Class discussion**: Outline a plan for how you can answer the above question, using a Monte Carlo simulation.  (You can do this one without loops!)\n",
    "\n",
    "**ADD YOUR OUTLINE HERE**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "f96a878f-935c-4981-9c0b-43c2b1cbb400",
   "metadata": {},
   "outputs": [],
   "source": [
    "# type into above Markdown cell"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b1cc0cbb-7a52-48ff-9e94-317bbfc5c4e7",
   "metadata": {},
   "source": [
    "Let's do 10,000 trials.  For each, let's generate a random $z$ and a random $H_0$, selected from a Gaussian probability distribution.  You can generate 10,000 values of $z$ all at once using `stats.norm.rvs` (and same for $H_0$)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "d058a573-0e3a-48c1-830c-bd8181607dce",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b6baa00-3f85-4aff-a49a-3add12f18b47",
   "metadata": {},
   "source": [
    "Make a histogram of your random values of $z$ and $H_0$ to make sure the center and width are what you expect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "7e7938bf-e29b-419d-9a10-81094aae2818",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ad5c682-239d-4be5-bf19-2dbc172bc2f2",
   "metadata": {},
   "source": [
    "Now calculate the value of $d$ for each of your 10,000 trials."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "eae089f5-bc54-4101-9a9b-f9e33ac12d4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7aaad73a-5084-4fe0-8ff3-415f7fcce834",
   "metadata": {},
   "source": [
    "Calculate the _nominal_ value of $d$ (using the measured value of $z$ and $H_0$)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "61e93e2a-687b-4ab5-87f1-c7c5af6246e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f51351e-d039-4bdd-b34b-bc5afc5d9ba5",
   "metadata": {},
   "source": [
    "Now make a histogram of your simulated $d$ values.  Make your histogram a PDF.  Plot the nominal value of $d$ as a vertical line (you can use plt.axvline).  Is the nominal value the same as the most likely value of $d$?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1380d78b-e941-4dba-8dd8-7075d5c3ee7c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3b6de62-f7da-462c-8203-c90a31c0a519",
   "metadata": {},
   "source": [
    "Now use the histogram feature to make a CDF.  What is the median value of $d$?  How does it compare to the most likely and the nominal value?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "b399d717-ac73-4b30-83ce-02056d4172f8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# instructions above"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4a28534-c301-440d-a5b5-69a86e3c06aa",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}