{ "cells": [ { "cell_type": "markdown", "id": "fe1ecdc0-e850-49a6-8063-a2f828e75e84", "metadata": {}, "source": [ "# Data Analysis VI: Monte Carlo Methods\n", "\n", "Goals for this class:\n", "\n", "1) **Simulate many trials**: Perform a Monte Carlo simulation of an experiment with multiple data points per trial, and also simulate many trials of the experiment.\n", "\n", "2) **Monte Carlo error propagation**: Use a Monte Carlo simulation to generate the PDF for a quantity that is derived from other measured quantities, each of which has an uncertainty." ] }, { "cell_type": "code", "execution_count": 2, "id": "5c9584c0-2542-433b-8b2a-0dfc561b6943", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from scipy import stats" ] }, { "cell_type": "markdown", "id": "ce3a0b60-fd54-4d3e-b652-4bb21bbd9406", "metadata": {}, "source": [ "## Goal 1: Simulate many trials - the coin flip" ] }, { "cell_type": "markdown", "id": "184ed1ea-11c9-4633-812d-a03504ba880f", "metadata": {}, "source": [ "_Question_: I flip a fair coin 100 times. What is the probability that I get 60 or more \"heads\"?\n", "\n", "One way to answer this question is to use the known properties of the binomial distribution, but we will use it as a way to practice Monte Carlo simulation." ] }, { "cell_type": "markdown", "id": "b06bcd31-e03e-4deb-8aad-13d4711056e5", "metadata": {}, "source": [ "**Task:** Type into this markdown box to outline a plan for how you can answer the above question, using a Monte Carlo simulation. Your outline should include: will you have any loops (how many)? what will happen inside the loop(s)?\n", "\n", "**ADD YOUR OUTLINE HERE**\n", "\n", "Once you're done with your outline, we'll have a class discussion and map out an approach." ] }, { "cell_type": "code", "execution_count": 3, "id": "283e0006-53bd-4fce-95b0-dd4d5159c46f", "metadata": {}, "outputs": [], "source": [ "# (complete task in above box)" ] }, { "cell_type": "markdown", "id": "ef264638-7b0c-4b16-bacf-f2577b94f931", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "### Approach 1A: Simulate all the individual coin flips in each trial ###" ] }, { "cell_type": "markdown", "id": "54a580db-5086-486b-a06a-526b43c9261c", "metadata": {}, "source": [ "#### i) Simulate one trial first to test code\n", "\n", "When you're writing a loop, it's a good idea to write and test the code that will go _inside_ the loop before you actually make the loop. Before looping across many trials, let's write the code for one trial.\n", "\n", "First: use `stats.randint.rvs` to generate 100 coin flips (1 or 0). Print your result to make sure it's what you expect. (Run `stats.randint?` to review the inputs to stats.randint.rvs.)" ] }, { "cell_type": "code", "execution_count": 10, "id": "3e547bda-6135-4804-bb3c-78a56e83fd2d", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "fccce232-caeb-4949-ac3a-10768c5c276b", "metadata": {}, "source": [ "Now, write code to count and print the number of heads in your 100 flips." ] }, { "cell_type": "code", "execution_count": 11, "id": "f1544caf-0390-4c8f-88f0-6d9ab87dcc11", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "cc485634-a938-47f9-8800-8e13e91f1000", "metadata": {}, "source": [ "#### ii) Simulate 1000 trials to get a probability" ] }, { "cell_type": "markdown", "id": "17ac612f-a367-4a50-9719-330e39ef9957", "metadata": {}, "source": [ "Now, write a loop to run this experiment 1000 times. In each iteration of the loop, save the number of heads from that trial into a list or array." ] }, { "cell_type": "code", "execution_count": 12, "id": "ab970373-cc1b-4059-84a1-a923bd68d20f", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "c9c53db4-53b9-481e-80b7-90438fd99551", "metadata": {}, "source": [ "Now, make a histogram showing the distribution of number of heads from the different trials." ] }, { "cell_type": "code", "execution_count": 14, "id": "bf943f41-dd4b-4c66-9339-ba8197a66fcc", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "d498d4bb-a850-4f9c-a180-bf87850de79a", "metadata": {}, "source": [ "Count the number of trials that have a number of heads >= 60. To do this, first try something like:\n", "\n", "`print(heads_list>29)` (This is not the exact code you'll need, it's just an example.)\n", "\n", "Now try:\n", "\n", "`np.sum([True,False,True])` What is this doing? How can you combine an inequality and np.sum to achieve your goal?" ] }, { "cell_type": "code", "execution_count": 16, "id": "d16cb901-aef8-4031-86eb-b784b94ac799", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "354f4fe3-e744-467e-a1f0-ae5a99cc5a03", "metadata": {}, "source": [ "Finally, to answer the question at the beginning of Goal 1, convert your number of occurrences (of >=60 heads) into a probability." ] }, { "cell_type": "code", "execution_count": 17, "id": "b0598560-6d65-4e76-b441-eb4dd61c32d6", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "dc8ebbf6-046e-4b37-a698-c22fc7cadba2", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "### (optional) Approach 1B: Simulate the number of heads in each trial (no individual flips)" ] }, { "cell_type": "markdown", "id": "37b030be-98cb-4c9c-a26f-059ddba8f2fa", "metadata": {}, "source": [ "The binomial distribution describes the scenario where you have $n$ measurements, each of which has a probability $p$ of coming out positive. For example, $n=100$ for 100 coin flips, and $p=0.5$ for coin flips, or $p=1/6$ for the probability of rolling a 1 when you roll a 6-sided die.\n", "\n", "You can use ```stats.binom.rvs``` to simulate a single run of 100 coin flips, and return the number of positive outcomes (heads) in that run.\n", "\n", "First, **run stats.binom?** and look for the help line that says what inputs go to stats.binom.rvs. (It turns out this is more informative than the documentation for stats.binom.rvs.)" ] }, { "cell_type": "code", "execution_count": 8, "id": "180616f5-ae67-4f38-aefa-6f956fdee77d", "metadata": {}, "outputs": [], "source": [ "# view help for stats.binom" ] }, { "cell_type": "markdown", "id": "a6d57e30-0335-4237-86f7-336e2aa9ce8d", "metadata": {}, "source": [ "First try generating and printing the outcome (number of heads) of one trial of 100 flips." ] }, { "cell_type": "code", "execution_count": 18, "id": "6a438a1e-9b60-4916-9162-2c6059954ac0", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "dee40976-cb80-4152-8066-489b6d515d01", "metadata": {}, "source": [ "Now try generating and printing the outcome (number of heads) from 1000 trials of 100 flips." ] }, { "cell_type": "code", "execution_count": 19, "id": "9f10c1e1-b575-443d-8c06-2aa20cea3d89", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "69b45cce-3c59-4a6e-ad18-2680c1bb7387", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "### (optional) Approach 1C: Use the binomial CDF" ] }, { "cell_type": "markdown", "id": "79829e23-0cc1-4f4e-9454-fbbbe6e8658b", "metadata": {}, "source": [ "$C_{DF}(x)$ tells you the probability of getting $\\leq n$ heads in a single experiment of $n$ flips. Figure out how to use `stats.binom.cdf` to calculate the answer to the original question in a single line of code. Is this a Monte Carlo simulation? Why or why not?" ] }, { "cell_type": "code", "execution_count": 9, "id": "08b68a3a-4d84-472a-af60-61221fc786f9", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "cbcfa351-32da-4c61-836f-dd4a5a0f7db8", "metadata": {}, "source": [ "## Goal 2: Monte Carlo Error Propagation" ] }, { "cell_type": "markdown", "id": "8ef3c1db-e883-4dce-80d4-ffcd36c52fbe", "metadata": {}, "source": [ "Distant galaxies are generally moving away from us due the expansion of the universe, and the further the galaxy, the faster it's moving away. Hubble's Law states that the speed $v$ at which distant galaxies move away from us is proportional to their distance $d$:\n", "$$\n", "v = H_0 d\n", "$$\n", "The constant of proportionality $H_0$ is difficult to measure. It's more precise now, but in the late 1990s, the rough estimate was $H_0 = 70 \\pm 20$ km/s/Mpc (a 30% uncertainty!). In more convenient units for our purposes, that's:\n", "$$\n", "H_0 = 21000 \\pm 6000 \\textrm{ km/s/Gly}\n", "$$\n", "where a Gly = 1 billion lightyears.\n", "\n", "Astronomers use Hubble's Law to estimate the distance of other galaxies, based on their recession speed. To measure the recession speed $v$, they first measure redshift $z$, which is the fractional change in the wavelength due to the Doppler effect. A 23% redshift ($z=0.23$) means a galaxy is moving away from us at 23% of the speed of light:\n", "$$\n", "v = cz\n", "$$" ] }, { "cell_type": "markdown", "id": "32358099-08ba-4cb7-899d-83a499db32a9", "metadata": {}, "source": [ "**Task:** Combine the two equations for $v$ and rearrange them to solve for distance $d$ of a galaxy in terms of its measured redshift $z$ and the constants $H_0$ and $c$." ] }, { "cell_type": "code", "execution_count": 5, "id": "26dd2abd-6ac4-4f89-9fec-ca139ad11f48", "metadata": {}, "outputs": [], "source": [ "# add your equation to the above Markdown cell" ] }, { "cell_type": "markdown", "id": "2fa4dad1-4139-4988-b796-cbb422dd8cb4", "metadata": {}, "source": [ "_Question_: An astronomer uses the photometric redshift technique to measure a galaxy's redshift, obtaining: $z=0.23\\pm0.05$. (The photometric redshift technique uses colors in photos to estimate the Doppler shift - this is not very precise!) Based on their measurement, what is the distance to this galaxy, with uncertainties?\n", "\n", "To answer this question, we will first ask another question:\n", "\n", "_Question_: If the uncertain measurements of $z$ and $H_0$ were each conducted a bajillion times, what is the range of possible $d$ values that could be measured? (To answer this, we will make a PDF and CDF of the possible values of $d$ for this galaxy.)" ] }, { "cell_type": "markdown", "id": "778a8c4a-05d8-4d50-ae0a-08638b7b1e72", "metadata": {}, "source": [ "**Class discussion**: Outline a plan for how you can answer the above question, using a Monte Carlo simulation. (You can do this one without loops!)\n", "\n", "**ADD YOUR OUTLINE HERE**" ] }, { "cell_type": "code", "execution_count": 6, "id": "f96a878f-935c-4981-9c0b-43c2b1cbb400", "metadata": {}, "outputs": [], "source": [ "# type into above Markdown cell" ] }, { "cell_type": "markdown", "id": "b1cc0cbb-7a52-48ff-9e94-317bbfc5c4e7", "metadata": {}, "source": [ "Let's do 10,000 trials. For each, let's generate a random $z$ and a random $H_0$, selected from a Gaussian probability distribution. You can generate 10,000 values of $z$ all at once using `stats.norm.rvs` (and same for $H_0$)." ] }, { "cell_type": "code", "execution_count": 20, "id": "d058a573-0e3a-48c1-830c-bd8181607dce", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "3b6baa00-3f85-4aff-a49a-3add12f18b47", "metadata": {}, "source": [ "Make a histogram of your random values of $z$ and $H_0$ to make sure the center and width are what you expect." ] }, { "cell_type": "code", "execution_count": 21, "id": "7e7938bf-e29b-419d-9a10-81094aae2818", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "6ad5c682-239d-4be5-bf19-2dbc172bc2f2", "metadata": {}, "source": [ "Now calculate the value of $d$ for each of your 10,000 trials." ] }, { "cell_type": "code", "execution_count": 22, "id": "eae089f5-bc54-4101-9a9b-f9e33ac12d4d", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "7aaad73a-5084-4fe0-8ff3-415f7fcce834", "metadata": {}, "source": [ "Calculate the _nominal_ value of $d$ (using the measured value of $z$ and $H_0$)." ] }, { "cell_type": "code", "execution_count": 24, "id": "61e93e2a-687b-4ab5-87f1-c7c5af6246e3", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "8f51351e-d039-4bdd-b34b-bc5afc5d9ba5", "metadata": {}, "source": [ "Now make a histogram of your simulated $d$ values. Make your histogram a PDF. Plot the nominal value of $d$ as a vertical line (you can use plt.axvline). Is the nominal value the same as the most likely value of $d$?" ] }, { "cell_type": "code", "execution_count": null, "id": "1380d78b-e941-4dba-8dd8-7075d5c3ee7c", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "f3b6de62-f7da-462c-8203-c90a31c0a519", "metadata": {}, "source": [ "Now use the histogram feature to make a CDF. What is the median value of $d$? How does it compare to the most likely and the nominal value?" ] }, { "cell_type": "code", "execution_count": 25, "id": "b399d717-ac73-4b30-83ce-02056d4172f8", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "code", "execution_count": null, "id": "e4a28534-c301-440d-a5b5-69a86e3c06aa", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }