{
"cells": [
{
"cell_type": "markdown",
"id": "74976eec-e9a9-46b1-824d-01dad15478bc",
"metadata": {
"cell_style": "center",
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Lecture 2\n",
"\n",
"---\n",
"\n",
"## Probability density functions and confidence intervals\n",
"\n",
"<br>\n",
"<br>\n",
"\n",
" Hartmut Stadie\n",
"\n",
"hartmut.stadie@uni-hamburg.de"
]
},
{
"cell_type": "markdown",
"id": "2552c4a9",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Examples of probability density functions"
]
},
{
"cell_type": "markdown",
"id": "0383c396",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Discrete distributions\n",
"\n",
"### Binomial distribution\n",
"\n",
"Consider a series of independent tries, each having two possible outcomes where $p$ is the probability for success.<br>\n",
"The probability for $k$ success with $n$ tries is given by the **binomial distribution**:\n",
"$$P(k) = {n \\choose k} p^k(1-p)^{n-k} \\text{, } k = 0,1,2...n$$\n",
"\n",
"Expectation value and variance\n",
"$$<k> = E[k] = \\sum \\limits_{k = 0}^{n} k P(k) = np$$\n",
"$$V[k] = \\sigma^2 = np(1-p)$$"
]
},
{
"cell_type": "markdown",
"id": "9350625b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example "
]
},
{
"cell_type": "markdown",
"id": "2039f885",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Tossing five coins $n = 5$, $p = 0.5$ \n",
"\n",
"| k | 0 | 1 | 2 | 3 | 4 | 5 |\n",
"|:-----|:----:|:----:|:-----:|:-----:|:----:|:----:|\n",
"| P(k) | 1/32 | 5/32 | 10/32 | 10/32 | 5/32 | 1/32 |\n",
"\n",
"\n",
"\n",
"<br>\n",
"\n",
"\n",
"plot using [`scipy.stats.binom`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "662bb993",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy\n",
"import matplotlib.pyplot as plt\n",
"\n",
"ks = np.arange(0,5.1)\n",
"plt.plot(ks, scipy.stats.binom.pmf(ks, 5, 0.5),'+')\n",
"plt.xlabel(\"$k$\")\n",
"plt.ylabel(\"$P(k)$\")"
]
},
{
"cell_type": "markdown",
"id": "7ec58fe0",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercise II \n",
"\n",
"Statistical uncertainty on efficiency of a device/selection criterion.\n",
"<p>\n",
"Estimate the efficiency and its uncertainty from a sample with $n$ events where $k$ events are selected. \n",
"\n",
"Random variable is the measured efficiency: $h_k = \\frac{k}{n}$. \n",
"<br>\n",
"How large is the uncertainty?\n",
"<br>\n",
" \n",
"Assumption (maybe wrong): $k$ is binomially distributed with $p_k = E[h_k] = E[\\frac{k}{n}]$: \n",
" $$\\begin{aligned}\n",
" \\sigma(h_k) &= &\\sqrt{V[\\frac{k}{n}]} = \\sqrt{\\frac{1}{n^2} V[k]} = \\sqrt{\\frac{1}{n^2}\\cdot np_k(1-p_k)}\\\\\n",
" &=& \\sqrt{\\frac{p_k(1-p_k)}{n}}\\\\ \n",
"\\end{aligned}$$"
]
},
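{
"cell_type": "markdown",
"id": "eff-uncert-note-0001",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A minimal numerical sketch of the formula above; the values $k = 45$ and $n = 60$ are made-up example numbers, not data from the lecture:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eff-uncert-code-0001",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# hypothetical example: k selected events out of n events\n",
"k, n = 45, 60\n",
"eff = k / n  # measured efficiency h_k\n",
"sigma_eff = np.sqrt(eff * (1 - eff) / n)  # binomial uncertainty from the formula above\n",
"print(f\"efficiency: {eff:.3f} +/- {sigma_eff:.3f}\")"
]
},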
{
"cell_type": "markdown",
"id": "1180ce3f",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Poisson distribution\n",
"\n",
"Consider the binomial case for very large $n$, very small $p$ and $Np$ remains equal to some finite value $\\mu$:\n",
"$$P(k) = \\frac{\\mu^ke^{-\\mu}}{k!}$$\n",
"\n",
"Expectation value and variance:\n",
"$$E[k] = \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^k}{k!}\n",
" = \\mu \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^{k-1}}{(k-1)! k}\n",
" = \\mu \\sum \\limits_{s = 0}^{\\infty} \\frac{e^{-\\mu}\\mu^{s}}{s!} = \\mu$$\n",
"$$V[k] = \\sigma^2 = \\mu$$"
]
},
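{
"cell_type": "markdown",
"id": "poisson-check-note-0002",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"As a quick cross-check of the expectation value and variance (not part of the derivation), `scipy.stats.poisson` can return both moments; $\\mu = 10$ is an arbitrary example value:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "poisson-check-code-0002",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import scipy.stats\n",
"\n",
"mu = 10  # arbitrary example value\n",
"mean, var = scipy.stats.poisson.stats(mu, moments=\"mv\")\n",
"print(\"mean:\", mean, \" variance:\", var)  # both should equal mu"
]
},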
{
"cell_type": "markdown",
"id": "06c5b8a1",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Poisson and binomial distributions\n",
"\n",
"binomial distribution with $n= 1000$ and $p = 0.01$<br>\n",
"poisson distribution with $\\mu = 10$ (hatched) \n",
"\n",
"<img src=\"./figures/08/bp.jpg\" width=\"55.0%\"\n",
"alt=\"image\" />"
]
},
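{
"cell_type": "markdown",
"id": "binom-poisson-note-0003",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A sketch that reproduces the comparison shown in the figure with `scipy.stats`, using the same parameters ($n = 1000$, $p = 0.01$, $\\mu = 10$):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "binom-poisson-code-0003",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"import matplotlib.pyplot as plt\n",
"\n",
"ks = np.arange(0, 31)\n",
"# Poisson pmf as bars, binomial pmf as markers on top\n",
"plt.bar(ks, scipy.stats.poisson.pmf(ks, 10), alpha=0.5, label=r\"Poisson, $\\mu=10$\")\n",
"plt.plot(ks, scipy.stats.binom.pmf(ks, 1000, 0.01), \"k+\", label=\"binomial, $n=1000$, $p=0.01$\")\n",
"plt.xlabel(\"$k$\")\n",
"plt.ylabel(\"$P(k)$\")\n",
"plt.legend();"
]
},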
{
"cell_type": "markdown",
"id": "0d43ae1c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Example from old statistic books"
]
},
{
"cell_type": "markdown",
"id": "dd995e3f",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"Deaths from horse kicks in the Prussian army\n",
"\n",
"The deaths from horse kicks were recorded for each year and army corps (?). \n",
"\n",
"<br>\n",
"For 20 years (1875-1894) and 14 army corps, one gets:\n",
"\n",
"\n",
"| number of deaths $k$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |\n",
"|:------------------------------------------|----:|----:|----:|----:|----:|----:|----:|\n",
"| number of corps-years with $k$ death | 144 | 91 | 32 | 11 | 2 | 0 | 0 |"
]
},
{
"cell_type": "markdown",
"id": "fce70dd5",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"<img src=\"./figures/08/poisson70.png\" width=\"80.0%\" />\n",
"\n",
"poisson distribution for $\\mu = \\frac{196}{280} = 0.70$ and data with (wrong) error bars"
]
},
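{
"cell_type": "markdown",
"id": "horsekick-note-0004",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A short sketch comparing the table above with the Poisson expectation; the mean $\\mu = 196/280 = 0.70$ follows directly from the table:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "horsekick-code-0004",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"import matplotlib.pyplot as plt\n",
"\n",
"ks = np.arange(0, 7)\n",
"observed = np.array([144, 91, 32, 11, 2, 0, 0])  # corps-years from the table\n",
"mu = np.sum(ks * observed) / np.sum(observed)    # 196/280 = 0.70\n",
"expected = np.sum(observed) * scipy.stats.poisson.pmf(ks, mu)\n",
"print(\"mu:\", mu)\n",
"plt.bar(ks, observed, alpha=0.5, label=\"data\")\n",
"plt.plot(ks, expected, \"k+\", label=\"Poisson expectation\")\n",
"plt.xlabel(\"number of deaths $k$\")\n",
"plt.ylabel(\"corps-years\")\n",
"plt.legend();"
]
},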
{
"cell_type": "markdown",
"id": "b3097b69-91ec-413e-935e-e310c96d5c25",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Continuous distributions\n",
"\n",
"### Normal distribution\n",
"\n",
"Normal- oder Gaussian distribution\n",
"$$f(x) = \\frac{1}{\\sqrt{2\\pi}\\sigma}e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}}$$\n",
"\n",
"Expectation value and variance:\n",
"$$<x> = E[x] = \\mu$$ \n",
"$$V[x] = \\sigma^2$$"
]
},
{
"cell_type": "markdown",
"id": "32da631f",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Standard Gaussian distribution\n",
"\n",
"normal distribution with $\\mu = 0$ and $\\sigma = 1$\n",
"\n",
"<img src=\"./figures/10/gauss_alpha.png\" width=\"60.0%\"\n",
"alt=\"image\" />"
]
},
{
"cell_type": "markdown",
"id": "f7050597-71f8-45a4-a4ac-64567d17827a",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"Probability for some intervals:\n",
"\n",
"| | | |\n",
"|:----------------------|---------------------------------:|------------:|\n",
"| $|x-\\mu| \\ge \\sigma$ | (x außerhalb $\\pm 1\\sigma$) ist: | $31{,}74$ % |\n",
"| $|x-\\mu| \\ge 2\\sigma$ | (x außerhalb $\\pm 2\\sigma$) ist: | $4{,}55$ % |\n",
"| $|x-\\mu| \\ge 3\\sigma$ | (x außerhalb $\\pm 3\\sigma$) ist: | $0{,}27$ % |"
]
},
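{
"cell_type": "markdown",
"id": "gauss-tails-note-0005",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"These numbers can be verified with the cumulative distribution function of the standard Gaussian; a quick sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gauss-tails-code-0005",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import scipy.stats\n",
"\n",
"for z in [1, 2, 3]:\n",
"    # P(|x - mu| >= z*sigma) for a Gaussian, via the survival function\n",
"    p_outside = 2 * scipy.stats.norm.sf(z)\n",
"    print(f\"outside {z} sigma: {100 * p_outside:.2f} %\")"
]
},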
{
"cell_type": "markdown",
"id": "b5910862-3a33-46be-91e0-868a3093cd48",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Multivariate normal distribution \n",
"\n",
"$$f_{X}(\\vec x )= \\frac {1}{\\sqrt {(2\\pi )^{p}\\det({C)}}}\\exp \\left(-{\\frac {1}{2}}(\\vec x- \\vec\\mu)^{\\top}C^{-1}(\\vec x - \\vec \\mu)\\right)$$\n",
"\n",
"<img src=\"./figures/10/Multivariate_Gaussian.png\"\n",
"width=\"60.0%\" alt=\"image\" />"
]
},
{
"cell_type": "markdown",
"id": "39087e01",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### 2-D multivariate normal distribution\n",
"\n",
"$$\\vec\\mu = (\\bar x, \\bar y) \\text{ und } C = \\left( \n",
" \\begin{array}{rr} \n",
" \\sigma_x ^2 & \\rho \\sigma_x \\sigma_y \\\\ \n",
" \\rho \\sigma_x \\sigma_y & \\sigma_y^2\\\\ \n",
" \\end{array} \\right)$$\n",
"\n",
"<img src=\"./figures/10/error_elipse.png\" width=\"70%\" alt=\"image\" />"
]
},
{
"cell_type": "markdown",
"id": "bf784648-04e4-4d6b-a6d0-e85f4210ee1d",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Poisson- Binomial- und Gauß-Verteilung \n",
"\n",
"binomial distribution with $n= 1000$ and $p = 0.01$<br> \n",
"poisson distribution with $\\mu = 10$(schraffiert)<br>\n",
"Gaussian distribution with $\\mu = 10$, $\\sigma = \\sqrt{10}$\n",
"\n",
"<img src=\"./figures/08/bpg.jpg\" width=\"55.0%\"\n",
"alt=\"image\" />"
]
},
{
"cell_type": "markdown",
"id": "91e4ed52",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Uniform distribution\n",
"\n",
"Uniform distribution: the probability density function f(x) is constant and equal to \n",
"$\\frac{1}{b-a}$ for $a \\leq x \\leq b$ and zero outside the interval."
]
},
{
"cell_type": "markdown",
"id": "a3b105a2-4332-46f7-a423-df435d4b06e8",
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"Expectation value and variance:\n",
"$$<x> = E[x] = \\int_a^b \\frac{x}{b-a}\\, dx = \\frac{1}{2(b-a)} [b^2-a^2] = \\frac{a + b}{2}$$\n",
"\n",
"$$\\begin{aligned}\n",
" V[x] & = &\\sigma^2 = E[(x-<x>)^2] = E[x^2] - <x>^2 \\\\\n",
" & = & \\int_a^b \\frac{x^2}{b-a}\\, dx - <x>^2 = \\frac{b^3 - a^3}{3(b-a)} - \\frac{(a+b)^2}{4}\\\\\n",
" & = & \\frac{b^2+ab+a^2}{3}-\\frac{a^2 + 2ab + b^2}{4} = \\frac{(b-a)^2}{12} \\\\ \n",
"\\end{aligned}$$"
]
},
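{
"cell_type": "markdown",
"id": "uniform-check-note-0006",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A quick numerical cross-check of $<x>$ and $V[x]$ with `scipy.stats.uniform`; the interval $[a, b] = [2, 5]$ is an arbitrary example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "uniform-check-code-0006",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import scipy.stats\n",
"\n",
"a, b = 2.0, 5.0  # arbitrary example interval\n",
"mean, var = scipy.stats.uniform.stats(loc=a, scale=b - a, moments=\"mv\")\n",
"print(\"mean:\", mean, \"  expected:\", (a + b) / 2)\n",
"print(\"variance:\", var, \"  expected:\", (b - a) ** 2 / 12)"
]
},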
{
"cell_type": "markdown",
"id": "dfb30ccb",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Example: strip detectors"
]
},
{
"cell_type": "markdown",
"id": "4b00e1a6",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"<img src=\"./figures/10/cms_strip.jpg\" width=\"80.0%\"\n",
"alt=\"image\" />"
]
},
{
"cell_type": "markdown",
"id": "ddd2a24f",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"<img src=\"./figures/10/strip.png\" width=\"80.0%\"\n",
"alt=\"image\" /> "
]
},
{
"cell_type": "markdown",
"id": "2acc164e-255b-43f4-93ee-d569d0bb9e50",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"one measurement: signal in strip $i$, $p(x)$ is uniformly distributed between\n",
"$b_i$ und $a_i$ \n",
"\n",
"How is a combination of measurements distributed?"
]
},
{
"cell_type": "markdown",
"id": "e4d5dad2-983e-4f8f-b305-fb0a787282d6",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Zentraler Grenzwertsatz \n",
"\n",
"Zentraler Grenzwertsatz: Die Wahrscheinlichkeitsdichte der Summe\n",
"$\\sum_{i=0}^{n} x_i$ einer Stichprobe aus $n$ unabhängigen\n",
"Zufallsvariablen $x_i$ mit einer beliebigen Wahrscheinlichkeitsdichte\n",
"mit Mittelwert $<x>$ und Varianz $\\sigma^2$ geht in der Grenze\n",
"$n \\to \\infty$ gegen eine Gau\"s-Wahrscheinlichkeitsdichte mit Mittelwert\n",
"$\\mu = n \\cdot <x>$ und Varianz $V[w] = n \\cdot \\sigma^2$.\n",
"\n",
"<embed src=\"./figures/08/Summe-von-Gleichverteilungen3.pdf\"\n",
"style=\"width:60.0%\" /> "
]
},
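{
"cell_type": "markdown",
"id": "clt-demo-note-0007",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A small Monte Carlo sketch of the statement: histogram the sum of $n$ uniform random numbers and compare with the Gaussian expectation. The choice $n = 12$ and the number of pseudo-experiments are arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "clt-demo-code-0007",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"import matplotlib.pyplot as plt\n",
"\n",
"n = 12  # number of uniform variables per sum (arbitrary)\n",
"sums = np.sum(np.random.uniform(0, 1, size=(10000, n)), axis=1)\n",
"mu, sigma = n * 0.5, np.sqrt(n / 12.0)  # n*<x> and sqrt(n*V[x]) for U(0,1)\n",
"plt.hist(sums, bins=50, density=True, label=\"sum of %d uniforms\" % n)\n",
"xs = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 200)\n",
"plt.plot(xs, scipy.stats.norm.pdf(xs, mu, sigma), label=\"Gaussian\")\n",
"plt.xlabel(\"sum\")\n",
"plt.legend();"
]
},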
{
"cell_type": "markdown",
"id": "fb2a4c3c",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## What is meant with error/uncertainty on a measured quantity?"
]
},
{
"cell_type": "markdown",
"id": "ffd68924",
"metadata": {},
"source": [
"If we quote $a = 1 \\pm 0.5$, we usually mean that the probability for the *true* value of $a$ is Gaussian $G(a, \\mu, \\sigma)$ distributed with $\\mu = 1$ and $\\sigma = 0.5$. "
]
},
{
"cell_type": "markdown",
"id": "f8037279",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Exercise: How often can/should the measurement be outside one sigma?\n",
"\n",
"Let's use pseudo-experiments/Monte Carlo:\n",
"\n",
" * generate 10.000 Gaussian distributed measurements\n",
" * count how ofter they differ by more than one sigma\n",
" \n",
" Relatively easy with *scipy* and *numpy*:\n",
" * use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)\n",
" * use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) class\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b4557d9",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "a92fa7d1",
"metadata": {
"slideshow": {
"slide_type": "notes"
}
},
"source": [
"import scipy.stats as stats\n",
"import numpy as np\n",
"\n",
"pseudo_a = stats.norm.rvs(1, 0.5, 10000)\n",
"print(pseudo_a)\n",
"is_outside = abs(pseudo_a - 1) > 0.5\n",
"print(is_outside)\n",
"print(\"fraction outside one sigma:\", sum(is_outside)/len(pseudo_a)) "
]
},
{
"cell_type": "markdown",
"id": "54d58609",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Why is it a Gaussian?\n",
"\n",
"Central limit theorem:\n",
"\n",
"\"let $X_{1},X_{2},\\dots ,X_{n}$ denote a statistical sample of size $n$ from a population with expected value (average) $\\mu$ and finite positive variance $\\sigma ^{2}$, and let $\\bar {X_{n}}$ denote the sample mean (which is itself a random variable). Then the limit as $n\\to \\infty$ of the distribution of $\\frac {({\\bar {X}}_{n}-\\mu )}{\\frac {\\sigma }{\\sqrt {n}}}$, is a normal distribution with mean 0 and variance 1.\"\n"
]
},
{
"cell_type": "markdown",
"id": "977b88f5-3cb7-445b-adac-f44be4d69c90",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Back to the Bundesliga"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "641ec6cc-6f1a-4399-b652-7e3da333782a",
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"data = np.loadtxt(\"./exercises/09_data.txt\")\n",
"summe = data[:,0] + data[:,1]\n",
"freq = plt.hist(summe, bins=10, range=(-0.5,9.5))\n",
"plt.xlabel(\"k\")\n",
"print(freq)"
]
},
{
"cell_type": "markdown",
"id": "e6bcf92b-386d-492b-9f09-e29118e93505",
"metadata": {
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"In 18 matches exactly six goals were scored.<br>\n",
"How large is the error on this 18?<br>\n",
"What is a 68,27\\%-confidence interval for the 6-goals bin?"
]
},
{
"cell_type": "markdown",
"id": "8c4a1016-7fc9-4eb6-b515-b51c0d16fed8",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"naively: $k=18$, poisson distribution with $\\hat \\mu = 18$ and $\\sigma = \\sqrt{\\hat \\mu} = \\sqrt{18}$: $18\\pm 4.12$"
]
},
{
"cell_type": "markdown",
"id": "b26cd06e-7700-4011-b5d8-6bf56e22489f",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Confidence: \n",
"$P(x_- \\le x \\le x_+) = \\int_{x_-}^{x^+} P(x) dx$"
]
},
{
"cell_type": "markdown",
"id": "1903f299-4d16-41b0-bb48-5f17a07a9006",
"metadata": {
"cell_style": "center",
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"### However, what is $x$? $\\mu$ or $k$???"
]
},
{
"cell_type": "markdown",
"id": "b54b11f9-e897-4d2a-a5df-961c16cdbb3f",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"source": [
"Confidence for $P(k=18) = 1$ (our measurement!) \n",
"\n",
"We need the confidence interval for $\\mu$.\n",
"\n",
"\n",
"Hence: $P(\\mu_- \\le \\mu \\le\\mu_+) = \\frac{\\int_{\\mu_-}^{\\mu_+} P(k, \\mu) d\\mu}{\\int_{0}^{\\infty} P(k, \\mu) d\\mu}$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5052bdb2-6e4b-4ea6-b08e-ba22f2a4a3d2",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"outputs": [],
"source": [
"import scipy\n",
"\n",
"mus = np.linspace(0,50,1000)\n",
"plt.plot(mus, scipy.stats.poisson.pmf(18,mus))\n",
"plt.xlabel(r\"$\\mu$\")\n",
"scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 0, 100)"
]
},
{
"cell_type": "markdown",
"id": "d278b736-1f45-4b6c-8e1a-c851bb618aeb",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Test of naive interval"
]
},
{
"cell_type": "markdown",
"id": "71c0df91-c1e0-4fbf-a94b-fd88f839ef07",
"metadata": {
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"$P(\\mu_- \\le \\mu \\le\\mu_+) = \\int_{\\mu_-}^{\\mu_+} P(k, \\mu) d\\mu$"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c595233d",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "7c461ddf-0cb8-49b0-adc6-a7ae6f11212b",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 18-np.sqrt(18), 18+np.sqrt(18))"
]
},
{
"cell_type": "markdown",
"id": "e34f6d9d",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"...Ok-ish..."
]
},
{
"cell_type": "markdown",
"id": "88f176b2-deb2-4212-b4ba-d86b7964e545",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Example from Publication:\n",
"\"Search for flavor changing neutral currents in decays of top quarks\" (D0)\n",
"\n",
"Physics Letters B 701 (2011), pp. 313-320\n",
"[https://arxiv.org/abs/1103.4574]\n",
"\n",
"<img src=\"./figures/12/ht_bq_1jets_comb_5pc.png width=\"100%\" \\>"
]
},
{
"cell_type": "markdown",
"id": "5135b8e6-711b-440e-8507-f049d017c137",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Let's try another bin\n",
"\n",
"1 match had exactly seven goals: \n",
"\n",
"Naive confidence interval: $1\\pm 1$\n",
"\n",
"Test it:\n",
" - draw the Poisson probability $P(1,\\mu$ vs. \\mu\n",
" - compute the confidence for the interval $[0, 2]$ numerically with [scipy.integrate.quad](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "06885d9b",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6102fec-1275-439a-b6b4-8cb1ee4960fc",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"mus = np.linspace(0,10,1000)\n",
"plt.plot(mus, scipy.stats.poisson.pmf(1,mus))\n",
"plt.xlabel(r\"$\\mu$\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a0595e97",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(1,x), 1-np.sqrt(1), 1+np.sqrt(1))"
]
},
{
"cell_type": "markdown",
"id": "2c24eace",
"metadata": {
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"too small!"
]
},
{
"cell_type": "markdown",
"id": "3139be1a-8a4d-4ae9-9fa7-aa99d7efffe6",
"metadata": {
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Let's find a better interval: $[\\mu_-, \\mu_+]$\n",
"\n",
"here:\n",
"$\\mu_- = 0$\n",
"\n",
"\n",
"find $\\mu_+$, so that the integral from $[0,\\mu_+]$ is $0.6827$:\n",
" - define a function that returns the integral for a given $mu_+$\n",
" - plot the function \n",
" - use root finding with [scipy.optimize.brentq](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brentq.html) to find the value of $\\mu_+$ where the integral minus 0.6827 is 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2d187a99",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "630c83e0-2be4-41dd-a124-2ac672a2c507",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"def intervall(mu_plus):\n",
" return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(1,x), 0, mu_plus)[0]\n",
"\n",
"mus = np.linspace(0,10,100)\n",
"plt.plot(mus, np.vectorize(intervall)(mus))\n",
"plt.grid()\n",
"plt.xlabel(r\"$\\mu$\")\n",
"\n",
"scipy.optimize.brentq(lambda x: intervall(x)-0.6827, 0,10)"
]
},
{
"cell_type": "markdown",
"id": "f20e18c3",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"Korrektes Intervall: $[0; 2.36]$"
]
},
{
"cell_type": "markdown",
"id": "1c4ad7b4-bbdb-4f9d-a795-0e2dae39a181",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### What is now the correct interval for the six-goals bin?\n",
"\n",
"Which one?\n",
"\n",
"symmetric around $\\mu$ or...\n",
"\n",
"e.g. symmetric in the confidence:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5fffd76e-14da-4854-8af0-5262e0c3c9a3",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"def intervall_minus(mu):\n",
" return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), mu, 18)[0]\n",
"\n",
"def intervall_plus(mu):\n",
" return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 18, mu)[0]\n",
"\n",
"print(\"naive:\", 18-np.sqrt(18), 18+np.sqrt(18))\n",
"mu_minus = scipy.optimize.brentq(lambda x: intervall_minus(x)-0.6827/2, 0,18)\n",
"mu_plus = scipy.optimize.brentq(lambda x: intervall_plus(x)-0.6827/2, 18,40)\n",
"print(\"symmetric in P:\", mu_minus, mu_plus) \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "52f8f412-cb94-4c32-95b6-03f178cbcdb6",
"metadata": {},
"outputs": [],
"source": [
"scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), mu_minus, mu_plus)"
]
},
{
"cell_type": "markdown",
"id": "f7ef04e8-1dc9-4e8b-b597-39d277bde591",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Confidence intervals for Poisson"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e0a6084d-dd5d-4ffc-8849-a61a8327fb4e",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"def intervall_minus(k, mu):\n",
" return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(k,x), mu, k)[0]\n",
"\n",
"def intervall_plus(k, mu):\n",
" return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(k,x), k, mu)[0]\n",
"\n",
"def conv_limits(k, c):\n",
" c_low = intervall_minus(k, 0)\n",
" c_up = c / 2\n",
" if c_low < c/2:\n",
" mu_minus = 0\n",
" else:\n",
" c_low = c / 2\n",
" mu_minus = scipy.optimize.brentq(lambda x: intervall_minus(k, x)-c_low, 0,k)\n",
" c_up = c - c_low\n",
" mu_plus = scipy.optimize.brentq(lambda x: intervall_plus(k, x)-c_up, k, 2*k + 20)\n",
" return mu_minus, mu_plus\n",
"\n",
"conv_limits(3, 0.6827)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "382e5a13-5f5f-44e5-90dd-7272fbdb171a",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"ks = np.arange(0, 20)\n",
"\n",
"limits = np.zeros((len(ks), 2))\n",
"for i,k in enumerate(ks):\n",
" limits[i] = conv_limits(k, 0.6827)\n",
"#print(limits)\n",
"plt.plot(ks, limits[:,0], 'b_')\n",
"plt.plot(ks, limits[:,1], 'r_')\n",
"plt.plot(ks, ks-np.sqrt(ks), 'b.')\n",
"plt.plot(ks, ks+np.sqrt(ks), 'r.')\n",
"plt.xlabel(\"$k$\")\n",
"plt.ylabel(r\"$\\mu$\")\n",
"plt.grid()\n",
"\n",
"print(limits[2])\n"
]
},
{
"cell_type": "markdown",
"id": "61d94d51-fc1c-4ddb-9cbc-c416a28158f8",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Confidence intervals for Gaussian distribution"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb5826ae-29df-41c3-ad8f-dde61b5d90ba",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"def intervall_minus(x, mu, sigma):\n",
" return scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(x,y, sigma), mu, x)[0]\n",
"\n",
"def intervall_plus(x, mu, sigma):\n",
" return scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(x, y, sigma), x, mu)[0]\n",
"\n",
"def conv_limits(x, c, sigma):\n",
" c_low = c / 2\n",
" mu_minus = scipy.optimize.brentq(lambda mu: intervall_minus(x, mu, sigma)-c_low, x - 10*sigma,x)\n",
" c_up = c - c_low\n",
" mu_plus = scipy.optimize.brentq(lambda mu: intervall_plus(x, mu, sigma)-c_up, x, x + 10*sigma)\n",
" return mu_minus, mu_plus\n",
"\n",
"conv_limits(3, 0.6827, 1)\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e87da118-0c11-4345-a656-096de6f980e9",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"xs = np.linspace(0,20,100)\n",
"sigma = 1\n",
"limits = np.zeros((len(xs), 2))\n",
"for i,k in enumerate(xs):\n",
" limits[i] = conv_limits(k, 0.6827, sigma)\n",
"#print(limits)\n",
"plt.plot(xs, limits[:,0], 'b')\n",
"plt.plot(xs, limits[:,1], 'r')\n",
"plt.plot(xs, xs-sigma, 'b.')\n",
"plt.plot(xs, xs+sigma, 'r.')\n",
"plt.xlabel(\"$x$\")\n",
"plt.ylabel(r\"$\\mu$\")\n",
"plt.grid()\n",
"\n",
"print(limits[2])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "202144ae-1057-41fc-8757-7f6f40c75868",
"metadata": {},
"outputs": [],
"source": [
"print(scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(3, y, 1), 2, 3)[0], scipy.integrate.quad(lambda x: scipy.stats.norm.pdf(x, 3, 1), 2,3)[0])"
]
},
{
"cell_type": "markdown",
"id": "24359dd4-7289-4235-872e-1138d5fe0b0b",
"metadata": {},
"source": [
"For Gaussian: $G(x, \\mu, \\sigma) = G(\\mu, x, \\sigma)$: $\\int_{\\mu_-}^{\\mu_+} G(x, \\mu, \\sigma) d\\mu = \\int_{\\mu_-}^{\\mu_+} G(\\mu, x, \\sigma) d\\mu$"
]
},
{
"cell_type": "markdown",
"id": "10f482d3-6667-4250-aefa-5df896e46936",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Confidence regions\n",
"\n",
"Confidence values for Gaussian distribution and intervals: $[\\mu - z\\sigma, \\mu - z\\sigma]$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "580b83dc-3cb0-40df-bc77-8ad48d90fce5",
"metadata": {},
"outputs": [],
"source": [
"def conv_gaus(z):\n",
" return scipy.stats.norm.cdf(z) - scipy.stats.norm.cdf(-z)\n",
"\n",
"for z in [1,2,3,4,5]:\n",
" print(z, conv_gaus(z))"
]
},
{
"cell_type": "markdown",
"id": "3904ad68-2b4d-4207-a7e8-35def0372238",
"metadata": {},
"source": [
"$z$ for 68\\%, 90\\%, 95\\% und 99\\%: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a71ee06-db32-4349-a3da-ae0e394fbd74",
"metadata": {},
"outputs": [],
"source": [
"for c in [0.68, 0.90, 0.95, 0.99]:\n",
" print(c, scipy.optimize.brentq(lambda z: conv_gaus(z)-c,0, 10))"
]
},
{
"cell_type": "markdown",
"id": "2d5f1dab-cebd-45fe-934e-36cd06ba2782",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Regions in two or three dimensions:\n",
"\n",
"$\\vec x^T = (x_1, x_2, \\dots, x_n)$ \n",
"\n",
"Let $x_i$ be independent random variables and Gaussian distributed"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73c11f8d-c8e4-4dc7-ad37-4ca4a33c55ea",
"metadata": {},
"outputs": [],
"source": [
"def conv_gaus_nd(z, n):\n",
" return np.power(scipy.stats.norm.cdf(z) - scipy.stats.norm.cdf(-z),n)\n",
"\n",
"for z in [1,2,3,4,5]:\n",
" print(z, conv_gaus_nd(z, 1), conv_gaus_nd(z, 2), conv_gaus_nd(z, 3))"
]
},
{
"cell_type": "markdown",
"id": "8dd1036c-edf9-40c4-be16-56025f49134b",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Regions in two or three dimensions:\n",
"\n",
"$z$ for 68\\%, 90\\%, 95\\% und 99\\%: "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7f14d38e-1e41-460e-9876-2afb62bed08b",
"metadata": {},
"outputs": [],
"source": [
"for c in [0.68, 0.90, 0.95, 0.99]:\n",
" print(c, scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 1)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 2)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z,3)-c,0, 10))"
]
},
{
"cell_type": "markdown",
"id": "9aca662e-bdd6-408e-8f28-852614e51577",
"metadata": {},
"source": [
"1,2,3-$\\sigma$-equivalents in two or three dimensions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "702daac8-81e8-406c-b70f-c3146c58a493",
"metadata": {},
"outputs": [],
"source": [
"for k in [1, 2, 3]:\n",
" c = conv_gaus_nd(k, 1)\n",
" print(c, scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 2)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z,3)-c,0, 10))"
]
},
{
"cell_type": "markdown",
"id": "bba9c7fc-9e11-4441-90e0-f1d186a1726e",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"# Hypothesis tests\n",
"\n",
"**Bundesliga**:\n",
"Hypothesis: \"The $k_i$ goals in each Bundesliga match $i$ are Poisson distributed with a common parameter $\\mu = <k>$.\"\n",
"\n",
"We need an alternative Hypothesis for the test: \"The goals in each Bundesliga match $k_i$ are Poisson distributed with parameter $\\mu_i = ki$ for each match.\"\n",
"\n",
"Errors of first and second kind:\n",
"* error of first kind: The hypothesis is true, but is rejected.\n",
" \n",
" significance: $\\alpha$ \n",
" \n",
" specificity: $1-\\alpha$ (efficiency) \n",
"\n",
"\n",
"* error of the second kind: The hypothesis is accepted, but is wrong (false positive).\n",
"\n",
" probability for error: $\\beta$\n",
" \n",
" power: $1-\\beta$ "
]
},
{
"cell_type": "markdown",
"id": "644cb69d",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Example\n",
"Gaussian distributed random variable $x$ ($\\sigma = 1$)\n",
"\n",
"Hypothesis: $\\mu = 0$"
]
},
{
"cell_type": "markdown",
"id": "4e67fff2-211b-4eef-95e2-44e3b363098c",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "-"
},
"tags": []
},
"source": [
"For which region $x_- < x < x_+$ shall we accept the hypothesis for an error of the first kind of $\\alpha = 5\\%$?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "85e25147-c902-4434-b3dc-908486408c5b",
"metadata": {
"cell_style": "split",
"slideshow": {
"slide_type": "fragment"
},
"tags": []
},
"outputs": [],
"source": [
"x = np.linspace(-5,5,100)\n",
"xout1 = np.linspace(-5, -1.9599639845401602)\n",
"xout2 = np.linspace(1.9599639845401602, 5)\n",
"plt.plot(x, scipy.stats.norm.pdf(x))\n",
"plt.fill_between(xout1, scipy.stats.norm.pdf(xout1), np.zeros_like(xout1),color=\"c\")\n",
"plt.fill_between(xout2, scipy.stats.norm.pdf(xout2), np.zeros_like(xout2),color=\"c\")\n",
"\n",
"plt.xlabel(\"$x$\")\n",
"\n",
"plt.grid()"
]
},
{
"cell_type": "markdown",
"id": "6cbb0d56-981d-4dfa-9a7a-0d6d4a51a357",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### What is the error of the second kind?"
]
},
{
"cell_type": "markdown",
"id": "cb79afd6-ba74-448b-9096-05cedeb842a6",
"metadata": {
"cell_style": "split"
},
"source": [
"Needs an alternative hypothesis!\n",
"\n",
"Example: true $\\mu = 3$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "603a31e1-1425-4eb6-9b12-c44ba31d06a6",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"x = np.linspace(-5,8,200)\n",
"xin = np.linspace(-1.9599639845401602, 1.9599639845401602)\n",
"xout1 = np.linspace(-5, -1.9599639845401602)\n",
"xout2 = np.linspace(1.9599639845401602, 5)\n",
"plt.plot(x, scipy.stats.norm.pdf(x))\n",
"plt.fill_between(xout1, scipy.stats.norm.pdf(xout1), np.zeros_like(xout1),color=\"c\")\n",
"plt.fill_between(xout2, scipy.stats.norm.pdf(xout2), np.zeros_like(xout2),color=\"c\")\n",
"plt.plot(x, scipy.stats.norm.pdf(x, 3))\n",
"plt.fill_between(xin, scipy.stats.norm.pdf(xin, 3), np.zeros_like(xin),color=\"orange\")\n",
"plt.xlabel(\"$x$\")\n",
"plt.grid()\n",
"print(\"error of second kind: beta:\", scipy.stats.norm.cdf(1.9599639845401602, 3) - scipy.stats.norm.cdf(-1.9599639845401602, 3))"
]
},
{
"cell_type": "markdown",
"id": "f6a454b7-fce4-4dca-ad3c-9e220956706b",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"## Neyman-Pearson Lemma\n",
"\n",
"Hypothesis: $H$\n",
"\n",
"Alternative hypothesis: $A$\n",
"\n",
"Goal: a criterion for an acceptance region giving the highest power (signal purity).\n",
"\n",
"Assume $H$ is accepted for $x$ in range $V$:\n",
"\n",
"$\\int_{V} P_{H}(x) dx = 1 - \\alpha$ (large)\n",
"\n",
"$\\int_{V} P_{A}(x) dx = \\beta$ (small)\n",
"\n",
"In $V$ are the values of $x$, for which $\\frac{P_{H}(x)}{P_{A}(x)}$ is large.\n",
"\n",
"Neyman-Pearson lemma: best range selected with $\\frac{P_{H}(x)}{P_{A}(x)} > c$ (likelihood ratio)"
]
},
{
"cell_type": "markdown",
"id": "b636ff7e-5ba1-4fc5-bb40-fb29b63ca88c",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Finally: Does are model for the Bundesliga work?\n",
"\n",
"**Bundesliga**:\n",
"Hypothesis: \"The $k_i$ goals in each Bundesliga match $i$ are Poisson distributed with a common parameter $\\mu = <k>$.\"\n",
"\n",
"Alternative hypothesis: \"The goals in each Bundesliga match $k_i$ are Poisson distributed with parameter $\\mu_i = ki$ for each match.\"\n",
"\n",
"Neyman-Pearson: likelihood ratio: $\\frac{P_{A}(x)}{P_{H}(x)} = \\frac{\\hat L(\\vec k; \\vec k)}{L(\\mu; \\vec k)} > c$\n",
"\n",
"We look at the $log$ of the likelihood ratio for our model $-lnL(\\mu; \\vec k)$ and the alternative (saturated model) $-ln\\hat L(\\vec k; \\vec k)$.\n",
"\n",
"$P_{H} (\\vec k) = \\prod_{i} P(k_i,\\mu)$; $P_{A} (\\vec k) = \\prod_{i} P(k_i, k_i)$\n",
"\n",
"\n",
"$d = \\ln \\frac{P_{A}(\\vec k)}{P_{H}(\\vec k)} = -\\ln \\frac{P_{H}(\\vec k)}{P_{A}(\\vec k)} = -\\ln L(\\mu; \\vec k)-(-\\ln\\hat L(\\vec k; \\vec k))$\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "75234801-341f-4474-a2bf-93aafe9add77",
"metadata": {},
"outputs": [],
"source": [
"mu = np.mean(summe)\n",
"print(\"mu:\", mu)\n",
"print(\"P(H):\", np.prod(scipy.stats.poisson.pmf(summe,mu)))\n",
"print(\"P(A):\", np.prod(scipy.stats.poisson.pmf(summe,summe)))\n",
"print(\"-ln P(H):\", -np.sum(scipy.stats.poisson.logpmf(summe,mu)))\n",
"print(\"-ln P(A):\", -np.sum(scipy.stats.poisson.logpmf(summe,summe)))\n",
"print(\"d:\", -np.sum(scipy.stats.poisson.logpmf(summe,mu)) + np.sum(scipy.stats.poisson.logpmf(summe,summe)))"
]
},
{
"cell_type": "markdown",
"id": "e840a509-4f5b-48ef-9594-eb1dc0bbdc4a",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### What does this mean?\n",
"\n",
"Which distribution of our test statistic $d$ do we expect for pseudo-data from our model and $\\mu=<k>$?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2d1227f",
"metadata": {
"cell_style": "split",
"jupyter": {
"source_hidden": true
},
"scrolled": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"#simuliere 1000 Spielzeiten\n",
"muobs = np.mean(summe)\n",
"tore = scipy.stats.poisson.rvs(muobs,size=(1000,306))\n",
"d = np.zeros(len(tore))\n",
"mu = np.zeros(len(tore))\n",
"for i in range(len(tore)):\n",
" mu[i] = np.mean(tore[i,:])\n",
" d[i] = np.sum(-scipy.stats.poisson.logpmf(tore[i],mu[i]) + scipy.stats.poisson.logpmf(tore[i],tore[i]))\n",
" \n",
"plt.hist(mu, bins=50)\n",
"muobs = np.mean(summe)\n",
"plt.plot([muobs, muobs], [0, 100], linestyle = 'dotted')\n",
"plt.grid()\n",
"plt.xlabel(\"$\\hat \\mu$\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "90e27f87-36fd-4b11-add1-c82a4eeb79dd",
"metadata": {
"cell_style": "split",
"jupyter": {
"source_hidden": true
},
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"dobs = -np.sum(scipy.stats.poisson.logpmf(summe,muobs)) + np.sum(scipy.stats.poisson.logpmf(summe,summe))\n",
"plt.hist(d, bins=50)\n",
"plt.plot([dobs, dobs], [0, 100], linestyle = 'dotted')\n",
"\n",
"plt.grid()\n",
"plt.xlabel(\"$-\\ln(P(H)/P(A))$\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "f6db4bd4-de67-48e1-add9-4fdc05dbbcf7",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### The $p$-value \n",
"\n",
"Fraction of expected outcomes that have a higher value than we get for data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18f65b9d",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"plt.hist(mu, bins=50)\n",
"plt.plot([muobs, muobs], [0, 100], linestyle = 'dotted')\n",
"plt.grid()\n",
"plt.xlabel(\"$\\hat \\mu$\")\n",
"plt.show()\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b82090eb",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"dobs = -np.sum(scipy.stats.poisson.logpmf(summe,muobs)) + np.sum(scipy.stats.poisson.logpmf(summe,summe))\n",
"plt.hist(d, bins=50)\n",
"plt.plot([dobs, dobs], [0, 100], linestyle = 'dotted')\n",
"\n",
"plt.grid()\n",
"plt.xlabel(\"$-\\ln(P(H)/P(A))$\")\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6647f14e-2282-4080-9139-c332db4af23a",
"metadata": {},
"outputs": [],
"source": [
"print(\"p:\", np.sum(d > dobs)/len(d))"
]
},
{
"cell_type": "markdown",
"id": "c835358c-1291-431c-b0ab-792a4b940d68",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Can we do this analytically, without Monte Carlo?\n",
"\n",
"Simple case: Gaussian distributions:\n",
"\n",
"$$d = -\\ln L(\\mu; \\vec x)-(-\\ln\\hat L(\\vec x; \\vec x) = -\\ln \\prod_{i} \\frac{G(x_i,\\mu, \\sigma)}{G(x_i, x_i, \\sigma)}=-\\ln \\prod_{i} \\frac{\\exp(-\\frac{1}{2}(\\frac{x_i-\\mu}{\\sigma})^2)}{\\exp(-\\frac{1}{2}(\\frac{x_i-x_i}{\\sigma})^2)} = -\\ln \\exp(-\\frac{1}{2}(\\frac{x_i-\\mu}{\\sigma})^2) = \\frac{1}{2} \\sum_i\\left(\\frac{x_i - \\mu}{\\sigma}\\right)^2 = \\frac{1}{2}\\chi^2$$\n",
"\n",
"For Gaussian: $-2 \\ln\\frac{P(H}{P(A)}$ follows $\\chi^2$ \n",
"\n",
"\n",
"Wilk's Theorem:\n",
"$-2 \\ln\\frac{P(H}{P(A)}$ approaches a $\\chi^2$ distribution (asymptotic limit) with $n$ degrees of freedom $n = \\text{number of data points} - \\text{number of model parameters}$"
]
},
{
"cell_type": "markdown",
"id": "f09138b6",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Let's try it\n",
"\n",
" - use $\\chi^2$ p.d.f. and c.d.f. from [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)\n",
" - what is the value of `df`\n",
" \n",
" - draw the $\\chi^2$ distribution in the interval $[200, 425]$\n",
" \n",
" - compute the $p$-value for our data\n",
" \n",
" - does it work? Add the histgram for the simulated $\\chi^2=2d$ values."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e609ea07",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "9d667a3b-6a89-419a-8be4-79cb7de96151",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"outputs": [],
"source": [
"plt.hist(2*d, bins=50, density=True)\n",
"plt.plot([2*dobs, 2*dobs], [0, 0.02], linestyle = 'dotted')\n",
"ds = np.linspace(200, 425, 100)\n",
"plt.plot(ds,scipy.stats.chi2.pdf(ds, 305))\n",
"\n",
"\n",
"\n",
"print(\"p-Wert via Chi2:\", scipy.stats.chi2.sf(2*dobs, 305))\n",
"\n",
"plt.grid()\n",
"plt.xlabel(\"$-2\\ln(P(H)/P(A))$\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9f258f33",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### How are the $p$-values using Wilk's theorem from the simulation distributed?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cda0957a-414f-4bea-ab33-e61218a4a4f5",
"metadata": {
"cell_style": "center"
},
"outputs": [],
"source": [
"plt.hist( scipy.stats.chi2.sf(2*d, 305), bins=50, density=True)\n",
"plt.plot([ scipy.stats.chi2.sf(2*dobs, 305), scipy.stats.chi2.sf(2*dobs, 305)], [0, 12], linestyle = 'dotted')\n",
"\n",
"plt.grid()\n",
"plt.xlabel(\"p\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "9378e3ba",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### What distribution should we get if it works?"
]
},
{
"cell_type": "markdown",
"id": "59be1788",
"metadata": {
"cell_style": "split"
},
"source": [
"For the simulation:"
]
},
{
"cell_type": "markdown",
"id": "5d172ac9",
"metadata": {
"cell_style": "split"
},
"source": [
"For the $chi^2$ distribution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a096afa1",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"pvalues = [np.sum(d > dsim)/len(d) for dsim in d]\n",
"\n",
"plt.hist(pvalues, bins=50, density=True)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "648b37eb",
"metadata": {
"cell_style": "split"
},
"outputs": [],
"source": [
"d2 = scipy.stats.chi2.rvs(305, size=10000)\n",
"pvalues = [np.sum(d2 > d2sim)/len(d2) for d2sim in d2]\n",
"\n",
"plt.hist(pvalues, bins=50, density=True)\n",
"plt.grid()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "557559e1-8e4a-412a-9486-a2fb9b97de90",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Does is work better for large $\\mu$ (hand ball?\n",
"\n",
"- simulate 1000 season for $\\mu_\\text{obs} = 35$\n",
"- and rerun the tests"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "770f4472",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "472aac93-e2dd-424a-ba0e-51c10d409f8e",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"#simuliere 5000 Spielzeiten\n",
"muobs = 35\n",
"\n",
"tore = scipy.stats.poisson.rvs(muobs,size=(5000,306))\n",
"\n",
"d = np.zeros(len(tore))\n",
"mu = np.zeros(len(tore))\n",
"for i in range(len(tore)):\n",
" mu[i] = np.mean(tore[i,:])\n",
" d[i] = np.sum(-scipy.stats.poisson.logpmf(tore[i],mu[i]) + scipy.stats.poisson.logpmf(tore[i],tore[i]))\n",
" \n",
"plt.hist(mu, bins=50)\n",
"plt.grid()\n",
"plt.xlabel(\"$\\hat \\mu$\")\n",
"plt.show()\n",
"\n",
"plt.hist(2 * d, bins=50, density=True)\n",
"ds = np.linspace(200, 425, 100)\n",
"plt.plot(ds,scipy.stats.chi2.pdf(ds, 305))\n",
"plt.grid()\n",
"plt.xlabel(\"$-2\\ln(P(H)/P(A))$\")\n",
"plt.show()\n",
"\n",
"plt.hist( scipy.stats.chi2.sf(2*d, 305), bins=50)\n",
"plt.grid()\n",
"plt.xlabel(\"p\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"id": "450edccb-2c13-4400-9613-4a1f3e069ce6",
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"source": [
"### Pearson's $\\chi^2$-Test\n",
"\n",
"What are the confidence regions for $n$ degrees of freedom/dimensions:\n",
"\n",
"data: $\\vec y = y_1,\\dots, y_n$\n",
"predicted values from model: $\\vec{f} = f_1,\\dots,f_n$\n",
"\n",
"$$\\chi^2 = (\\overrightarrow{y\\ } - \\vec{f})^{T} V(\\vec{y}- \\vec{f})$$\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f703d4ed",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"def conv_chi2_nd(z, n):\n",
" return scipy.stats.chi2.cdf(z,n)\n",
"\n",
"\n",
"for z in [1,2,3,4,5]:\n",
" print(z, conv_chi2_nd(z, 1), conv_chi2_nd(z, 2), conv_chi2_nd(z, 3))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2fd5c302-732b-4c6b-b224-5b9f79df57cc",
"metadata": {
"slideshow": {
"slide_type": ""
},
"tags": []
},
"outputs": [],
"source": [
"print(\"critical chi2 values\")\n",
"for k in [1, 2, 3]:\n",
" c = conv_gaus_nd(k, 1)\n",
" print(c, scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 1)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 2)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z,3)-c,0, 30))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23626cec-4198-44f4-a1c8-81e36992268f",
"metadata": {},
"outputs": [],
"source": [
"for c in [0.68, 0.90, 0.95, 0.99]:\n",
" print(c, scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 1)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 2)-c,0, 30),scipy.optimize.brentq(lambda z: conv_chi2_nd(z,3)-c,0, 30))"
]
},
{
"cell_type": "markdown",
"id": "69556eab-a67e-4251-a4b3-0290013aa344",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"# Schätzer\n",
"\n",
"## Fehlerrechnung\n",
"\n",
"### Fehlerrechnung: Beispiel I \n",
"\n",
"Widerstandsmessung: $U = 24$ V und $I = 0{,}6 \\pm 0{,}1$ A \n",
"Annahme: $I$ gaußverteilt $g(I)$ mit $\\mu_I = 0{,}6$ und\n",
"$\\sigma_I = 0{,}1$ \n",
"Was ist $p(R)$? \n",
"$R = U/I$, $I = U/R$ und $|dI/dR| = U/R^2$ \n",
"$$\\begin{aligned}\n",
" p(R) & = & U/R^2 g(I(R)) = \\frac{U}{R^2 \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(U/R-\\mu_I)^2}{2\\sigma_I^2}} \\\\\n",
" & = & \\frac{U}{R^2 \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(R-U/\\mu_I)^2}{2\\sigma_I^2*R^2/\\mu_I^2}} \\\\\n",
" \n",
"\\end{aligned}$$ (show in Jupyter)"
]
},
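{
"cell_type": "markdown",
"id": "resistance-mc-note-0008",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A sketch of what could be shown in Jupyter: the transformed density $p(R)$ compared with a Monte Carlo sample of $R = U/I$, using $U = 24$ V, $\\mu_I = 0.6$ A and $\\sigma_I = 0.1$ A from the example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "resistance-mc-code-0008",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"import matplotlib.pyplot as plt\n",
"\n",
"U, mu_I, sigma_I = 24.0, 0.6, 0.1\n",
"I = scipy.stats.norm.rvs(mu_I, sigma_I, size=100000)\n",
"R = U / I  # Monte Carlo sample of the resistance\n",
"Rs = np.linspace(20, 80, 300)\n",
"pR = U / Rs**2 * scipy.stats.norm.pdf(U / Rs, mu_I, sigma_I)  # transformed density from above\n",
"plt.hist(R, bins=200, range=(20, 80), density=True, label=\"Monte Carlo $R=U/I$\")\n",
"plt.plot(Rs, pR, label=\"$p(R)$\")\n",
"plt.xlabel(\"$R$\")\n",
"plt.legend();"
]
},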
{
"cell_type": "markdown",
"id": "06ff9896-15bc-4404-a789-bbe7557350eb",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Fehlerrechnung: Beispiel II \n",
"\n",
"Spannungsmessung: $R = 40$ $\\Omega$ und $I = 0{,}6 \\pm 0{,}1$ A \n",
"Annahme: $I$ gaußverteilt $g(I)$ mit $\\mu_I = 0{,}6$ und\n",
"$\\sigma_I = 0{,}1$ \n",
"Was ist $p(U)$? \n",
"$U = RI$, $I = U/R$ und $|dI/dU| = 1/R$ \n",
"$$\\begin{aligned}\n",
" p(U) & = &1/R g(I(R)) = \\frac{1}{R \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(U/R-\\mu_I)^2}{2\\sigma_I^2}} \\\\\n",
" & = & \\frac{1}{\\sqrt{2\\pi}R\\sigma_I}e^{-\\frac{(U-R\\mu_I)^2}{2R^2\\sigma_I^2}} \\\\\n",
" & = & g(U) \\text{ mit $\\mu_U = R\\mu_I$ und $\\sigma_U = R\\sigma_I$}\\\\\n",
" \n",
"\\end{aligned}$$"
]
},
{
"cell_type": "markdown",
"id": "0fe93c64-f06b-49ec-addb-22b12cf06ec6",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Fehlerrechung: Allgemein \n",
"\n",
"Gaußsche Fehlerfortpflanzung: Annahme: $x$ gaußverteilt $g(x)$ mit\n",
"$\\mu_x$ und $\\sigma_x$ \n",
"Was ist $p(y(x))$? \n",
"$y(x) \\approx f(\\mu_x) + \\frac{dy}{dx}\\big|_{x=\\mu_x}(x-\\mu_x)$ \n",
"$x - \\mu_x \\approx (y-y(\\mu_x))(\\frac{dy}{dx}\\big|_{\\mu_x})^{-1}$,\n",
"$|dx/dy| = |\\frac{dy}{dx}(\\mu_x)|^{-1}$ $$\\begin{aligned}\n",
" p(y) & \\approx & |\\frac{dy}{dx}(\\mu_x)|^{-1} g(x(y)) = \\frac{1}{\\frac{dy}{dx}\\big|_{\\mu_x}\\sqrt{2\\pi}\\sigma_x}e^{-\\frac{(x(y)-\\mu_x)^2}{2\\sigma_x^2}} \\\\\n",
" & = & \\frac{1}{\\frac{dy}{dx}\\big|_{\\mu_x}\\sqrt{2\\pi}\\sigma_x}e^{-\\frac{(y-y(\\mu_x))^2}{2(\\frac{dy}{dx}\\big|_{\\mu_x})^2\\sigma_x^2}}\\\\\n",
" & = & g(y) \\text{ mit $\\mu_y = y(\\mu_x)$ und $\\sigma_y = \\frac{dy}{dx}\\Big|_{\\mu_x}\\sigma_x$}\\\\\n",
" \n",
"\\end{aligned}$$"
]
},
{
"cell_type": "markdown",
"id": "85f35e43-f5cd-4ccf-acd2-cbb5e963ce00",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Fehlerrechung: Mehrere Dimensionen \n",
"\n",
"Gaußsche Fehlerfortpflanzung: Annahme: $\\vec x$ gaußverteilt $g(\\vec x)$\n",
"mit $\\vec \\mu$ und Kovarianz $C$ \n",
"Was ist $p(\\vec y(\\vec x))$? \n",
"$y_i(\\vec x) \\approx y_i(\\vec \\mu) + \\frac{dy_i}{dx_j}\\big|_{\\vec \\mu} (x_j- \\mu_j)$;\n",
"$$\\begin{aligned}\n",
" E([y_i y_j]) \\approx & y_i(\\vec \\mu)y_j(\\vec \\mu) + y_i(\\vec \\mu)\\frac{dy_j}{dx_k}\\big|_{\\vec \\mu} E[x_k - \\mu_k] + y_j(\\vec \\mu)\\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}E[x_k- \\mu_k] \\\\\n",
" & + \\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}\\frac{dy_j}{dx_l}\\big|_{\\vec \\mu}E[(x_k- \\mu_k) (x_l - \\mu_l)]\\\\\n",
" = & y_i(\\vec \\mu)y_j(\\vec \\mu) + \\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}\\frac{dy_j}{dx_l}\\big|_{\\vec \\mu}C_{kl} \\text{ und }V[y_iy_j] = E[y_iy_j] - E[y_i]E[y_j]\n",
" \n",
"\\end{aligned}$$ mit $A_{ij} = [ \\frac{dy_i}{dx_j}\\big|_{\\vec \\mu}]$ ist\n",
"$\\vec y(\\vec x)$ gaußverteilt mit\n",
"$$\\vec \\mu_y = \\vec y(\\vec \\mu_x) \\text{ und Kovarianz } C_{yy} = A C_{xx} A^T$$"
]
},
{
"cell_type": "markdown",
"id": "8507d724-f9f8-4d41-a3b0-3ddaf0846dde",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"## Stichproben\n",
"\n",
"### Schätzer: Mittelwert aus Stichprobe\n",
"\n",
"Beispiel: Schätze $I$ aus drei Messungen der Stromstärke $I_1$, $I_2$,\n",
"$I_3$ mit gleichem $\\sigma_I = 0.1$ A.\n",
"\n",
"Annahme: Messungen gauß-verteilt um I. \n",
"Schätzer für $\\mu$: Mittelwert: $$\\hat I = 1/3(I_1+I_2+I_3)$$ Fehler für\n",
"unkorrellierte Messungen (keine syst. Fehler): \n",
"$$C = \\left( \n",
" \\begin{array}{rrr} \n",
" \\sigma_I^2 &0 & 0 \\\\ \n",
" 0 & \\sigma_I^2 & 0\\\\\n",
" 0 & 0 & \\sigma_I^2 \n",
" \\end{array} \\right) \\text{ und } A = \\left( \\begin{array}{rrr} \n",
" d\\hat I/dI_1 & d\\hat I/dI_2 & d\\hat I/dI_3\n",
" \\end{array} \\right)$$"
]
},
{
"cell_type": "markdown",
"id": "6463365e-354d-4e98-8f63-6903e2dbd6e6",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Beispiel: Fehlerfortpflanzung \n",
"\n",
"Fehler für unabhängige Messungen (keine syst. Fehler): \n",
"$$C = \\left( \n",
" \\begin{array}{rrr} \n",
" \\sigma_I^2 &0 & 0 \\\\ \n",
" 0 & \\sigma_I^2 & 0\\\\\n",
" 0 & 0 & \\sigma_I^2 \n",
" \\end{array} \\right) \\text{ und } A = \\left( \\begin{array}{rrr} \n",
" 1/3 & 1/3 & 1/3\n",
" \\end{array} \\right)$$ Fehlerfortpflanzung: $$\\begin{aligned}\n",
"C_{\\hat I} & = & ACA^T = \\left( \n",
" \\begin{array}{rrr} \n",
" 1/3 & 1/3 & 1/3 \n",
" \\end{array} \n",
"\\right) \n",
" \\left( \n",
" \\begin{array}{rrr} \n",
" \\sigma_I^2 &0 & 0 \\\\ \n",
" 0 & \\sigma_I^2 & 0\\\\\n",
" 0 & 0 & \\sigma_I^2 \n",
" \\end{array} \\right) \n",
" \\left( \\begin{array}{r} \n",
" 1/3 \\\\ 1/3 \\\\ 1/3 \n",
" \\end{array} \n",
"\\right)\\\\\n",
"& = & \\left( \n",
" \\begin{array}{rrr} \n",
" \\sigma_I^2/3 & \\sigma_I^2/3 & \\sigma_I^2/3 \n",
" \\end{array} \n",
"\\right) \n",
" \\left( \\begin{array}{r} \n",
" 1/3 \\\\ 1/3 \\\\ 1/3 \n",
" \\end{array} \n",
"\\right) = \\sigma_I^2/9 + \\sigma_I^2/9 + \\sigma_I^2/9 = \\sigma_I^2/3 \\\\\n",
"\\hat \\sigma_I & = & \\sigma_I/\\sqrt{3}\n",
"\\end{aligned}$$"
]
},
{
"cell_type": "markdown",
"id": "a038c48f-7143-480d-b0ba-817f0e5211c1",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Schätzer \n",
"\n",
"Schätzer $\\hat a$ für $a$ aus Stichprobe $x_1,\\dots x_n$\n",
"\n",
"Anforderungen:\n",
"\n",
"- erwartungstreu: \n",
" $E[\\hat a]= a$\n",
"\n",
"- konsistent: \n",
" $\\lim_{n\\to \\infty } \\hat a = a$\n",
"\n",
"- effizient: $V[\\hat a]$ möglichst klein\n",
"\n",
"Aufgabe: Schätze Mittelwert $\\mu$ und Varianz $\\sigma^2$ einer\n",
"Wahrscheinlichkeitsdichtefunktion $p(x)$ aus einer Stichprobe\n",
"$x_1,\\dots,x_n$"
]
},
{
"cell_type": "markdown",
"id": "d32672c8-da0b-468b-bf73-df40aef7245a",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Schätzer für Mittelwert \n",
"\n",
"Schätzer für den Mittelwert $\\mu$:\n",
"$$\\hat \\mu = \\bar x = \\frac{1}{n}\\sum_{i=1}^n x_i$$\n",
"\n",
"Tests:\n",
"\n",
"- erwartungstreu:\n",
" $$E[\\hat \\mu]= E[ \\frac{1}{n}\\sum_{i=1}^n x_i ] = \\frac{1}{n} \\sum_{i=1}^n E[x_i] = \\frac{1}{n} \\sum_{i=1}^n \\mu = \\mu$$\n",
"\n",
"- konsistent:\n",
" $$\\lim_{n\\to \\infty } \\hat \\mu = \\lim_{n\\to \\infty } \\frac{1}{n}\\sum_{i=1}^n x_i = \\int x p(x)\\,dx = \\mu$$"
]
},
{
"cell_type": "markdown",
"id": "4021d6cd-150e-4a6d-bdc4-ed049fd983d3",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Schätzer für Mittelwert \n",
"\n",
"Schätzer für den Mittelwert $\\mu$:\n",
"$$\\hat \\mu = \\bar x = \\frac{1}{n}\\sum_{i=1}^n x_i$$\n",
"\n",
"Test:\n",
"\n",
"- effizient: $V[\\hat a]$ möglichst klein\n",
" $$V[\\hat \\mu] = V[ \\frac{1}{n}\\sum_{i=1}^n x_i ] = \\frac{1}{n^2} V[ \\sum_{i=1}^n x_i ] = \\frac{1}{n^2} \\sum_{i=1}^n\\sum_{j=1}^n Cov(x_i, x_j)$$\n",
" $$V[\\hat \\mu] = \\frac{1}{n^2} \\sum_{i=1}^n Cov(x_i, x_i)= \\frac{n\\sigma^2}{n^2} = \\frac{\\sigma^2}{n}$$\n",
" (oder über zentralen Grenzwertsatz)"
]
},
{
"cell_type": "markdown",
"id": "4dfd3266-a5a2-4db7-a96b-ef21118bc85e",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Schätzer für Varianz \n",
"\n",
"Schätzer für die Varianz $\\sigma^2$:\n",
"$$\\hat \\sigma^2 = V[x] = \\frac{1}{n}\\sum_{i=1}^n (x_i - <x>)^2$$\n",
"\n",
"Tests:\n",
"\n",
"- erwartungstreu:\n",
" $$E[\\hat {\\sigma^2}]= E[ \\frac{1}{n}\\sum_{i=1}^n (x_i^2 - \\bar x^2)] = \\frac{1}{n} \\sum_{i=1}^n E[x_i^2-\\bar x^2] = E[x^2] - E[\\bar x^2]$$\n",
" $$E[\\hat{\\sigma^2}]= E[x^2] - E[x]^2 + E[x]^2 - E[\\bar x^2] = E[x^2] - E[x]^2 - (E[\\bar x^2] - E[\\bar x]^2)$$\n",
" $$E[\\hat {\\sigma^2}] = V(x) - V(\\bar x) = \\sigma^2 - \\frac{\\sigma^2}{n} = \\frac{n-1}{n}\\sigma^2 \\ne \\sigma^2$$\n",
"\n",
"- konsistent:\n",
" $\\lim_{n\\to \\infty } \\hat {\\sigma^2} = \\lim_{n\\to \\infty } \\frac{n-1}{n}\\sigma^2 =\\sigma^2$"
]
},
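{
"cell_type": "markdown",
"id": "variance-bias-note-0009",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A small Monte Carlo illustration of the bias, assuming standard-normal samples: compare the naive $1/n$ estimator (`ddof=0`) with the corrected $1/(n-1)$ estimator (`ddof=1`); the sample size and number of repetitions are arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "variance-bias-code-0009",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"n = 5  # small sample size (arbitrary)\n",
"samples = np.random.normal(0.0, 1.0, size=(100000, n))  # true sigma^2 = 1\n",
"print(\"1/n estimator    :\", np.mean(np.var(samples, axis=1, ddof=0)))  # approx (n-1)/n\n",
"print(\"1/(n-1) estimator:\", np.mean(np.var(samples, axis=1, ddof=1)))  # approx 1"
]
},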
{
"cell_type": "markdown",
"id": "1621af0b-8216-4959-9cc9-8c41cdb8cc79",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Stimmt das Modell? \n",
"\n",
"Der angewandte Schätzer basiert immer auf Annahmen über die analysierte\n",
"Stichprobe. \n",
"Der Fehler auf den Schätzwert sagt nichts darüber aus, ob die Annahmen\n",
"stimmen."
]
},
{
"cell_type": "markdown",
"id": "8fdd0fc7-af9c-492c-9c7f-5a195ee58c50",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### Nochmal die Stromstärke \n",
"\n",
"Stromstärke aus zwei Messreihen, $\\sigma_I = 0.1$ A: \n",
"Reihe 1: $[0.64441099 0.63895571 0.73984042 0.62145699 0.66971489]$ \n",
"Reihe 2: $[0.93179978 0.46326547 0.41350898 0.12281948 0.61426579]$ \n",
"Schätzer: $I_1 = 0.66 \\pm 0.04$, $I_2 = 0.51 \\pm \\pm 0.04$ \n",
"Passen die Daten zur Erwartung? \n",
"\n",
"Residuum: $$R_i = \\frac{x_i - \\mu}{\\sigma}$$ ist normal verteilt, wenn\n",
"$\\mu$ und $\\sigma$ stimmen.\n",
"\n",
"(Betrachte Summe der Residuenquadrate $\\sum R_i^2$)"
]
},
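{
"cell_type": "markdown",
"id": "residuals-check-note-0010",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"A sketch computing the estimates and the sum of squared residuals for the two series quoted above, using the sample mean as estimate for $I$ and $\\sigma_I = 0.1$ A; the $\\chi^2$ $p$-value with $n-1$ degrees of freedom is one way to quantify the agreement:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "residuals-check-code-0010",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"outputs": [],
"source": [
"import numpy as np\n",
"import scipy.stats\n",
"\n",
"sigma_I = 0.1\n",
"reihe1 = np.array([0.64441099, 0.63895571, 0.73984042, 0.62145699, 0.66971489])\n",
"reihe2 = np.array([0.93179978, 0.46326547, 0.41350898, 0.12281948, 0.61426579])\n",
"for name, reihe in [(\"series 1\", reihe1), (\"series 2\", reihe2)]:\n",
"    mean = np.mean(reihe)\n",
"    chi2 = np.sum(((reihe - mean) / sigma_I) ** 2)  # sum of squared residuals\n",
"    p = scipy.stats.chi2.sf(chi2, len(reihe) - 1)   # p-value, n-1 degrees of freedom\n",
"    print(name, \"mean:\", round(mean, 3), \"chi2:\", round(chi2, 2), \"p:\", round(p, 5))"
]
},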
{
"cell_type": "markdown",
"id": "0c34b270-4e04-40be-82b1-f305eadf82a9",
"metadata": {
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"### $\\chi^2$-Verteilung\n",
"\n",
"0.5\n",
"\n",
"$$\\chi^2 = \\sum_i \\frac{x_i - \\mu_i}{\\sigma_i}$$ Die $p(\\chi^2, n)$ ist\n",
"die Wahrscheinlichkeitsdichte für die Summe der Quadrate von $n$\n",
"normalverteilten Zufallszahlen. \n",
"Mittelwert: $<\\chi^2> = n$ Zahl der Freiheitsgrade.\n",
"\n",
"0.5 <embed src=\"./figures/10/chi2.pdf\" />\n",
"\n",
"$p$-Wert (Wahrscheinlichkeit für größeres $\\chi^2$):\n",
"$$p = 1- \\int_0^{\\chi 2} p(\\chi^2, n)$$"
]
},
{
"cell_type": "markdown",
"id": "3dbe4a35-771e-4129-9529-76f3d38fddae",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"slideshow": {
"slide_type": "skip"
},
"tags": []
},
"source": [
"## Zusammenfassung und Ausblick\n",
"\n",
"Zusammenfassung\n",
"\n",
"- Wahrscheinlichkeitsdichten\n",
"\n",
"- Fehlerrechnung (lineare Näherung)\n",
"\n",
"- Korrelationen nicht vernachlässigen\n",
"\n",
"- Schätzer\n",
"\n",
"- p-Wert\n",
"\n",
"- Literatur: \n",
"\n",
" - Glen Cowan, Statistical Data Analysis,\n",
" [pdf](https://www.sherrytowers.com/cowan_statistical_data_analysis.pdf)\n",
"\n",
" - Roger John Barlow, Statistics: A Guide to the Use of Statistical\n",
" Methods in the Physical Sciences,\n",
" [Skript](https://arxiv.org/pdf/1905.12362.pdf)\n",
"\n",
" - Volker Blobel, Erich Lohrmann, Statistische und numerische\n",
" Methoden der Datenanalyse,\n",
" [pdf](https://www.desy.de/~sschmitt/blobel/eBuch.pdf)\n",
"\n",
"# Bibliography\n",
"\n",
"Bibliography"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.20"
},
"livereveal": {
"autolaunch": true,
"overlay": "<div class='myfooter'><h2 style='text-align:center;'>Hartmut Stadie - University of Hamburg</h2></div>",
"scroll": true
},
"toc": {
"base_numbering": 43
}
},
"nbformat": 4,
"nbformat_minor": 5
}