Monte Carlo simulation of a simple experiment using scipy.stats tools

Question: I flip a fair coin 100 times. What is the probability that I get 60 or more "heads"?

One way to answer this question is to use the known properties of the binomial distribution (see the end of this notebook), but I want to get the answer by using tools from the scipy.stats submodule.

In [37]:
import scipy as sp
from scipy import stats

import matplotlib as mpl       # As of July 2017 Bucknell computers use v. 2.x 
import matplotlib.pyplot as plt
%matplotlib notebook
# As of Aug. 2017 M.L. reverting to 1.x defaults.
# In 2.x text.ustex requires dvipng, texlive-latex-extra, and texlive-fonts-recommended, 
# which don't seem to be universal
# See https://stackoverflow.com/questions/38906356/error-running-matplotlib-in-latex-type1cm?
mpl.style.use('classic')
        
# M.L. modifications of matplotlib defaults using syntax of v.2.0 
# More info at http://matplotlib.org/2.0.0/users/deflt_style_changes.html
# Changes can also be put in matplotlibrc file, or effected using mpl.rcParams[]
plt.rc('figure', figsize = (6, 4.5))            # Reduces overall size of figures
plt.rc('axes', labelsize=16, titlesize=14)
plt.rc('figure', autolayout = True)             # Adjusts supblot parameters for new size

Simulating spin flips

I am going to generate 100 random 0's and 1's, and let 1 represent a "heads" and 0 represent a "tails."

In [38]:
n_flips = 100 # Number of spin flips
data = sp.stats.randint.rvs(0, 2, size=n_flips)
print(data)
[0 0 1 0 0 0 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 1 1 0 1 0 1 1 1 1 0 1 1 0 0 1 1
 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 0 1 1 1 0 0 1 0 0 0 0 1 1 0 1
 0 0 1 0 1 0 1 1 0 1 0 0 1 0 1 1 0 0 1 1 1 1 0 0 0 0]

I can count the "heads" by summing the array data.

In [39]:
n_heads = sp.sum(data)
print(n_heads)
51

Simulating many trials of 100 spin flips, to see how many times I get 60 or more heads

In [40]:
n_expts = 100000               # Number of experiments to simulate
results = sp.zeros(n_expts)  # Create array in which to store results of experiments

for i in range(n_expts):
    results[i] = sp.sum(sp.stats.randint.rvs(0, 2, size=n_flips))

Histogram of results

In [41]:
nbins = 101
low = -.5
high = 100.5
plt.figure(1)
plt.xlabel("# of heads")
plt.ylabel("occurences")
out = plt.hist(results, nbins, [low,high])
plt.xlim(30,70)
Out[41]:
(30, 70)

Find average number of "heads"; should be close to 50

In [42]:
sp.mean(results)
Out[42]:
49.97569

Count experiments which give 60 or more "heads," Version I)

In [43]:
success = 0
for i in range(len(results)):
    if results[i] > 59:
        success += 1
In [44]:
print("number of heads =",success, ", probability =",success/n_expts)
number of heads = 2893 , probability = 0.02893

Count experiments which give 60 or more "heads," Version II)

In [45]:
sum(1 if i>59 else 0 for i in results)
Out[45]:
2893

Using properties of binomial distribution to analyze the same experiment

The probablity of obtaining 60 or more heads is

$$1 - \mbox{probability of obtaining 59 or fewer heads} = 1 - C_{DF}(59)$$

In [46]:
p = 0.5    # probability of getting a "head" in a single trial
n = 100    # number of trials

print("probability =", 1 - sp.stats.binom.cdf(59, n, p))
probability = 0.0284439668205