{ "cells": [ { "cell_type": "markdown", "id": "a05e2b97-54a9-496f-ade1-784b93f373d8", "metadata": {}, "source": [ "## Physics 310\n", "\n", "### Class 4\n", "\n", "### Tuesday Feb 3, 2027\n", "\n", "Goals:\n", "1. _Gaussian distribution_: Relate PDF to CDF. Discuss frequentist interpretation. Find the probability of measuring values more extreme than a given value, or within a certain range.\n", "2. _Poisson distribution_: Discuss traits of PDF: discreteness, asymmetry. Use number of counts to find rate and uncertainty on rate. Find probability of measuring (less/greater than) a certain number of counts. " ] }, { "cell_type": "markdown", "id": "e43a9546-911d-4c32-b6fc-2881eee618c5", "metadata": {}, "source": [ "We need to import numpy (as np), scipy.stats (as stats), and matplotlib.pyplot (as plt)." ] }, { "cell_type": "code", "execution_count": null, "id": "8f391d32-8145-41ed-946a-5dfc222eb18f", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "cf808a2c-c31c-4041-b77f-dc65cf460831", "metadata": {}, "source": [ "# 1) Gaussian distribution" ] }, { "cell_type": "markdown", "id": "74d89ffb-76cf-4bf1-a2c8-a24b53d7228f", "metadata": {}, "source": [ "## a) Relate PDF to CDF\n", "\n", "The _cumulative distribution function_ (CDF), denoted $C_{DF}(x)$ in your textbook, is the integral of the _probability distribution (or density) function_ (PDF), denoted $P_{DF}(x)$, from the minimum x-value (for a Gaussian this is $-\\infty$) up to an arbitrary value of $x$.\n", "\n", "**Task**: Modify the following LaTeX equation to add the limits of integration. 
\n", "\n", "$$\n", "C_{DF}(x) \\equiv \\int_{?}^{?} P_{DF}(x)\\, dx \n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "9aacb551-1dce-4a58-aeea-a0ec8aab475f", "metadata": {}, "outputs": [], "source": [ "# (Not a coding task - modify above Markdown)" ] }, { "cell_type": "markdown", "id": "ffdad05f-f817-4332-be1e-128efa8bf36b", "metadata": {}, "source": [ "Generate a list of closely-spaced x-values from -10 to 20. For a mean of 5 and a standard deviation of 3, generate the PDF (stats.norm.pdf) and the CDF (guess the function name). (Recommend: view help or review last week's lesson to refresh what the arguments are.)" ] }, { "cell_type": "code", "execution_count": null, "id": "6526a4f5-581f-48aa-9668-6af085d27907", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "38d0b8e3-9e48-40da-96a1-2137136c5c86", "metadata": {}, "source": [ "Here is a subplot example that makes a grid of subplots that is 3 wide and 2 high, and puts (hypothetical) data in the upper left plot and center bottom plot:\n", "\n", "```\n", "plt.subplot(231)\n", "plt.plot(x,y1)\n", "\n", "plt.subplot(235)\n", "plt.plot(x,y5)\n", "```\n", "\n", "Your goal: make two vertically aligned subplots, with the PDF on top and the CDF below. Then, tell your neighbor or instructor how the two plots are related. What is the value of the CDF at x=5, and why is that the case?" 
] }, { "cell_type": "code", "execution_count": null, "id": "70b83d1d-55e1-4630-b3f7-e10a09354c50", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "7e35e56f-c792-4ac7-b360-ecafea8b87f8", "metadata": {}, "source": [ "## b) H&H example 3.2.2: Likelihood of a single measurement, when true mean is known" ] }, { "cell_type": "markdown", "id": "4f691f40-853a-42df-accd-f5f8b4195291", "metadata": {}, "source": [ "3.3.2 in H&H: A box contains resistors with $R=100\\,\\Omega$, which are known to have a standard deviation of $2\\,\\Omega$." ] }, { "cell_type": "markdown", "id": "0b08546c-e044-42ab-969a-529d02561226", "metadata": {}, "source": [ "### b.i) Probability of getting an extreme data point (\"this distance or further from the mean\")" ] }, { "cell_type": "markdown", "id": "023cda20-b9ca-4f80-863f-49e9f66ac0e3", "metadata": {}, "source": [ "What is the probability of selecting a resistor with a value of 95 $\\Omega$ or less?\n", "\n", "**Task**: Modify the following LaTeX equation to add the limits of integration, and the value of $x$ where the CDF is being evaluated, in order to solve this problem. \n", "\n", "$$\n", "C_{DF}(?) \\equiv \\int_{?}^{?} P_{DF}(x)\\, dx \n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "c6793adb-942c-4e0e-8aee-a75d230dc9c0", "metadata": {}, "outputs": [], "source": [ "# (Not a coding task - modify above Markdown)" ] }, { "cell_type": "markdown", "id": "74ec0ba0-9778-405b-bc26-adc46d139e82", "metadata": {}, "source": [ "Now, use the stats CDF function to find this probability. After generating your result, explain your probability in a sentence." 
] }, { "cell_type": "code", "execution_count": null, "id": "055dcd49-0d79-4054-8f0a-f0ac16d7eea5", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "eb45fd4e-5d6d-4c2c-9d2d-3e9482f08eec", "metadata": {}, "source": [ "### b.ii) (optional) Probability of data lying in a certain range" ] }, { "cell_type": "markdown", "id": "d8f56f6a-f797-44c8-8974-cd2c5ef56485", "metadata": {}, "source": [ "(Do this part if you finish the tasks above before the instructor reviews them and have extra time.)\n", "\n", "What is the probability of finding a resistor in the range 99-101 $\\Omega$?\n", "\n", "**Task**: Modify the following LaTeX equation to replace the ? marks, in order to solve this problem. \n", "\n", "\\begin{eqnarray*}\n", "P &=& \\int_{?}^{?} P_{DF}(R)\\, dR\\\\\n", " &=& \\int_{-\\infty}^{?}P_{DF}(R)\\, dR - \\int_{-\\infty}^{?} P_{DF}(R)\\, dR \\\\\n", " &=& C_{DF}(?) - C_{DF}(?)\n", "\\end{eqnarray*}\n" ] }, { "cell_type": "code", "execution_count": null, "id": "741b703e-77e9-42e4-8887-99ddad874f09", "metadata": {}, "outputs": [], "source": [ "# (Not a coding task - modify above Markdown)" ] }, { "cell_type": "markdown", "id": "c53d436d-c460-48f1-a71b-5e6584c4fdb3", "metadata": {}, "source": [ "Now, use the stats CDF function to find this probability. After generating your result, explain your probability in a sentence." ] }, { "cell_type": "code", "execution_count": null, "id": "9e8c25db-8c86-4aaf-ac75-dd038c7e5141", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "1d4470f8-1ba2-478d-a063-aedd55f0f9e6", "metadata": {}, "source": [ "## c) Frequentist interpretation" ] }, { "cell_type": "markdown", "id": "256fa182-b8e8-4168-a74c-2e9ce900efa4", "metadata": {}, "source": [ "In physics, two schools of thought about statistics are _frequentist_ and _Bayesian_. They overlap a lot in practice but have different underlying views of what the statistics mean. 
We will mostly interpret our statistics from a frequentist perspective.\n", "\n", "**Frequentist view of statistics:** I ran this experiment once (and probably collected multiple data points as part of it). If I ran this same experiment a million times, what range of outcomes would I get?\n", "\n", "This week, we will apply that lesson to comparing a theoretical predicted value (assumed to be the \"true mean\" $\\mu$ of our Gaussian distribution) to a measured value $d$ (for \"data\").\n", "\n", "**Frequentist view of comparing a theoretical value to a measured value:** If I ran this experiment a million times, and the true value is the theoretical prediction, then what percent of the time would I get results as extreme as my measurement from this experiment?" ] }, { "cell_type": "code", "execution_count": null, "id": "a65fde39-9084-4794-bb7a-26c9f7e3794b", "metadata": {}, "outputs": [], "source": [ "# read above text! ask questions!" ] }, { "cell_type": "markdown", "id": "9eede46f-6a0e-4bab-8198-9ecc55f8a15a", "metadata": {}, "source": [ "Now let's do a modified version of the H&H resistor problem. The manufacturer's box says that their resistors have a $100\\,\\Omega$ resistance (the \"true value\") but you're not so sure. You measure 15 resistors and get a mean resistance of $102.30\\,\\Omega$ and a standard deviation of $2.1\\,\\Omega$. Does your data contradict the manufacturer's claim that the mean resistance of their resistors is $100.000\\,\\Omega$?\n", "\n", "To test this, a frequentist may ask the following question: \"If the true resistance is 100.000 $\\Omega$ but there is some fluctuation, what is the probability of taking 15 resistors and measuring a mean of 102.30 purely due to random fluctuation? 
In other words, if I took a giant mountain of all the 100-$\\Omega$ resistors made by this manufacturer, and I conducted the experiment a million times, each time drawing 15 resistors and measuring the mean, in what fraction of experiments would I measure a mean resistance that is at least 2.30 $\\Omega$ away from 100.000 $\\Omega$?\"\n", "\n", "This is known as a \"2-tailed test\". You want to know the probability that your data would be at least 2.30 $\\Omega$ off in either direction: less than 97.70, or more than 102.30.\n", "\n", "**Task:** To generate a CDF for the outcomes of running the experiment a million times (drawing 15 each time), given the above statement, what should I use as:\n", "* the true mean of the distribution?\n", "* the standard deviation?\n", "\n", "Careful: Both answers are tricky!\n" ] }, { "cell_type": "code", "execution_count": null, "id": "700e4a24-9870-4dea-b96a-0403262356e6", "metadata": {}, "outputs": [], "source": [ "# (Discussion question - answers above)" ] }, { "cell_type": "markdown", "id": "44a7de2c-3e14-4074-90ed-9c3093d8e7ae", "metadata": {}, "source": [ "**Task**: Modify the following LaTeX equation to replace the ? marks.\n", "\n", "$$\n", "\\textrm{Prob}(x < 97.70 \\textrm{ or } x > 102.30) = \\int_{-\\infty}^{?} P_{DF}(x)\\, dx + \\int_{?}^{\\infty} P_{DF}(x)\\, dx\n", " = C_{DF}(?) + [1 - C_{DF}(?)]\n", " = 2 C_{DF}(?)\n", "$$" ] }, { "cell_type": "code", "execution_count": 2, "id": "d1312319-999c-4733-9333-68d1fdeb3e1e", "metadata": {}, "outputs": [], "source": [ "# (modify Markdown above)" ] }, { "cell_type": "markdown", "id": "6a86b0d6-f20c-4446-85a7-cb94c07e886f", "metadata": {}, "source": [ "Now, write code to answer the question above. State your result in a sentence. Do you think the manufacturer's estimate of 100.000 $\\Omega$ is correct for the box of resistors you received?" 
] }, { "cell_type": "code", "execution_count": null, "id": "2128dbd6-7dcf-4896-8d6c-5519d7a41f41", "metadata": {}, "outputs": [], "source": [ "# Instructions above" ] }, { "cell_type": "markdown", "id": "4027f27b-c115-49d8-9633-fea7bbbdc19c", "metadata": {}, "source": [ "# 2) Poisson distribution" ] }, { "cell_type": "markdown", "id": "b81ff3d7-a6cb-47cf-9654-3366beb02fe4", "metadata": {}, "source": [ "Example: An X-ray detector on the Swift telescope measures the number of counts from a distant supernova explosion that arrive in each 10-minute interval. Here are the counts measured in 100 back-to-back 10-minute intervals:\n", "\n", " [4, 1, 2, 0, 0, 2, 1, 3, 1, 1, 3, 1, 3, 2, 4, 2, 3, 1, 1, 0, 1, 1,\n", " 0, 2, 0, 3, 3, 2, 2, 1, 2, 3, 2, 2, 1, 1, 1, 4, 2, 2, 0, 1, 1, 1,\n", " 1, 1, 1, 0, 3, 2, 2, 1, 2, 4, 0, 0, 0, 1, 0, 2, 2, 1, 0, 0, 1, 2,\n", " 3, 2, 2, 2, 2, 3, 2, 3, 0, 3, 0, 4, 1, 2, 0, 1, 1, 4, 2, 1, 3, 2,\n", " 2, 4, 0, 1, 3, 1, 3, 0, 5, 0, 1, 2]" ] }, { "cell_type": "markdown", "id": "a71f77a5-e4ff-41ad-8143-c1b2beba3b08", "metadata": {}, "source": [ "## 2a) Visualize Poisson distribution" ] }, { "cell_type": "markdown", "id": "8b30e0d5-4640-4339-bc77-7749b44f5fa9", "metadata": {}, "source": [ "Plot a histogram of the number of counts. Is this a symmetric or asymmetric distribution? Is it a continuous or discrete distribution? Why?\n", "\n", "*Challenge:* Modify your code to plot the histogram with bins of width 1. np.arange or np.linspace would be useful to help define the bins." ] }, { "cell_type": "code", "execution_count": null, "id": "cc424211-bece-42b6-8b4d-775256454a25", "metadata": {}, "outputs": [], "source": [ "# Instructions above" ] }, { "cell_type": "markdown", "id": "864fc706-e43f-46f9-9622-b411708b898e", "metadata": {}, "source": [ "## 2b) Calculate rate and uncertainty on rate" ] }, { "cell_type": "markdown", "id": "2b89621e-87b8-420c-9b63-4004924cd2b1", "metadata": {}, "source": [ "What is the total number of counts in all 1000 min? 
What is the uncertainty on that total? Write your answer in a sentence using $\\pm$." ] }, { "cell_type": "code", "execution_count": null, "id": "fa370be6-69c7-4066-8df4-d87691c6ef2c", "metadata": {}, "outputs": [], "source": [ "# Instructions above" ] }, { "cell_type": "markdown", "id": "454ba524-a846-4774-95dd-92be1d0d9c8e", "metadata": {}, "source": [ "What is the average count rate (counts per minute)? What is the uncertainty on that rate? Write your answer in a sentence with proper formatting and units." ] }, { "cell_type": "code", "execution_count": null, "id": "71878657-8346-4b5d-848a-aaf255ceeb3e", "metadata": {}, "outputs": [], "source": [ "# Instructions above" ] }, { "cell_type": "markdown", "id": "bd28fbb1-2dba-4170-8171-357fcee2fd97", "metadata": {}, "source": [ "Notice that you **have to** start by calculating uncertainty on the total number of counts, before you get uncertainty on the rate." ] }, { "cell_type": "markdown", "id": "2fd37dc2-2fc8-47ee-bfb4-663ce49f95ca", "metadata": {}, "source": [ "## 2c) Calculate probability of data lying in a certain range" ] }, { "cell_type": "markdown", "id": "febfdc4f-f8ab-4b82-be8c-f84f8a01d5cd", "metadata": {}, "source": [ "What is the average number of counts in a single measurement, from the data so far? (You'll need this - don't worry about uncertainty on this one.)" ] }, { "cell_type": "code", "execution_count": null, "id": "6b3fcfdd-b762-444d-8d7f-ce0286613b7c", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "ddc72062-666a-43fa-9970-728c4dfaa451", "metadata": {}, "source": [ "The CDF of the Poisson distribution is discrete: a sum of the PDF from the minimum possible value for a data point (which is what???) 
up to an arbitrary value $x$.\n", "\n", "**Task:** Fill in the question marks for the start and end value of the sum:\n", "\n", "$$\n", "C_{DF}(x) \\equiv \\sum_{n=?}^{?} P_{DF}(n) \n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "75ddf41c-7f3b-4edd-b37d-74ffda6ea4ff", "metadata": {}, "outputs": [], "source": [ "# (modify Markdown above)" ] }, { "cell_type": "markdown", "id": "f6606ed4-3ef0-42d2-a224-6ce6e129cae7", "metadata": {}, "source": [ "Exercise: If you took a million more 10-minute measurements, what fraction of them would be 2 or less?\n", "\n", "**Task:** Fill in the question marks for the CDF x-value and the start and end value of the sum:\n", "\n", "$$\n", "C_{DF}(?) = \\sum_{n=?}^{?} P_{DF}(n) \n", "$$\n" ] }, { "cell_type": "code", "execution_count": null, "id": "32096981-a712-414e-bc7c-6c80976ba1d7", "metadata": {}, "outputs": [], "source": [ "# (modify Markdown above)" ] }, { "cell_type": "markdown", "id": "3f8cb6ef-5aaf-4605-849b-cb0f309614d9", "metadata": {}, "source": [ "Now write code to determine this probability. (Guess the name of the cdf function: it's a similar format to stats.norm.cdf. Look at the help - it takes fewer parameters than the Gaussian version!)" ] }, { "cell_type": "code", "execution_count": null, "id": "360d5594-66ea-4a63-9b04-31ec9914bc09", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "80c905bc-c09a-4467-a11f-334a5f09713f", "metadata": {}, "source": [ "Exercise: In future 10-minute measurements, what is the probability of obtaining a measurement of 10 or more counts?\n", "\n", "**Task:** Fill in the question marks to solve this problem:\n", "\n", "$$\n", "Prob (x\\geq10) =\n", "\\sum_{n=?}^{\\infty} P_{DF}(n) \n", "= 1 - \\sum_{n=0}^{?} P_{DF}(n)\n", "= 1 - C_{DF}(?)\n", "$$\n", "\n", "Be careful about what number is going into the CDF!" 
] }, { "cell_type": "code", "execution_count": null, "id": "84943c36-234e-4a08-b467-c0d0a1abbf52", "metadata": {}, "outputs": [], "source": [ "# (modify Markdown above)" ] }, { "cell_type": "markdown", "id": "c0568bbd-1e6b-47de-8785-4db62bdb2881", "metadata": {}, "source": [ "Now write code to determine this probability." ] }, { "cell_type": "code", "execution_count": null, "id": "6b56defa-b80c-4636-881f-4cc5617ebc6f", "metadata": {}, "outputs": [], "source": [ "# Instructions above" ] }, { "cell_type": "markdown", "id": "cd058f00-ed90-4878-a635-c865af73ff48", "metadata": {}, "source": [ "# 3) Monte Carlo simulation\n", "\n", "Question: I flip a fair coin 100 times. What is the probability that I get 60 or more \"heads\"?\n", "\n", "You can answer this question using the binomial probability distribution, which we have not studied, so instead we will answer it by simulating this experiment a bunch of times." ] }, { "cell_type": "markdown", "id": "b8edcc81-248a-4fc0-adac-d6ad7f3831b3", "metadata": {}, "source": [ "## 3a) Simulating a single run (\"trial\") of the experiment: 100 coin flips\n", "\n", "Generate 100 random 0's and 1's, and let 1 represent a \"heads\" and 0 represent a \"tails.\" (Useful function: stats.randint.rvs.) Save the output into a variable and print it." ] }, { "cell_type": "code", "execution_count": null, "id": "8fc4b83f-ce7a-471e-a8aa-8ecaa226d3f1", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "a6a7e3fb-38cc-4acf-8000-e5c6fbaf2117", "metadata": {}, "source": [ "Now count the heads using np.sum, and print your result." 
] }, { "cell_type": "code", "execution_count": null, "id": "26972472-9c4c-4a28-a420-e04f86645224", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "02da46cb-4a8c-4f78-b1ef-bc45bb2984a0", "metadata": {}, "source": [ "## 3b) Simulating many trials of 100 coin flips, to see how many times I get 60 or more heads" ] }, { "cell_type": "markdown", "id": "999d8e2d-2cc8-4bd0-9910-a54d9b095a7d", "metadata": {}, "source": [ "Edit the code below. Your goal is to run 10000 trials, of 100 coin flips each, and count the number of heads each time." ] }, { "cell_type": "code", "execution_count": null, "id": "ac45647d-8972-4d19-844b-9075634bcc50", "metadata": {}, "outputs": [], "source": [ "n_expts = 10000 # Number of experiments to simulate\n", "results = np.zeros(n_expts) # Create array for results of simulations\n", "\n", "for i in range(n_expts):\n", "    # add code here to run a single experiment and count the heads\n", "    \n", "    results[i] = # add something here to save the number of heads" ] }, { "cell_type": "markdown", "id": "a6f95205-5cb7-462e-8b10-b89eff910103", "metadata": {}, "source": [ "## 3c) Inspect your results to see if they make sense" ] }, { "cell_type": "markdown", "id": "0c5cfa21-51f9-4cd3-92b1-d3ba84cdef37", "metadata": {}, "source": [ "Calculate the mean value of number of heads you obtained. Plot a histogram of your results." 
] }, { "cell_type": "code", "execution_count": null, "id": "dd69a201-3652-406c-9979-dc59fc70bc21", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "code", "execution_count": null, "id": "91df4291-33ce-4b39-9bc4-f981b97cac56", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "e718908e-733b-4b5e-b613-1674d2ae5bed", "metadata": {}, "source": [ "## 3d) Turn your results into a probability" ] }, { "cell_type": "markdown", "id": "8c45d8be-531b-4ba6-9e8b-3091924e0a25", "metadata": {}, "source": [ "Try writing an inequality to check if the contents of results are 60 or above. Print the results of that inequality: you should see an array of True and False." ] }, { "cell_type": "code", "execution_count": null, "id": "31d6fede-e722-4ed4-b273-09fa5205176b", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "8dfbb798-01af-4752-beba-4d9c858dc682", "metadata": {}, "source": [ "Now try taking the sum of that array using numpy. Summing treats True as 1 and False as 0. What do you get? What does it mean?" ] }, { "cell_type": "code", "execution_count": null, "id": "475c8791-c1d4-4721-bac6-ec597276808b", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] }, { "cell_type": "markdown", "id": "9b795f31-cecc-4bf7-a2fd-63584967043a", "metadata": {}, "source": [ "Turn this into a probability. Based on your Monte Carlo simulation, what is the probability of obtaining 60 or more heads?" 
] }, { "cell_type": "code", "execution_count": null, "id": "4b4ebeb0-b831-4d4b-a824-0ebd6cb0ee41", "metadata": {}, "outputs": [], "source": [ "# instructions above" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }