{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "74976eec-e9a9-46b1-824d-01dad15478bc",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Lecture 2\n",
    "\n",
    "---\n",
    "\n",
    "## Probability density functions and confidence intervals\n",
    "\n",
    "<br>\n",
    "<br>\n",
    "\n",
    " Hartmut Stadie\n",
    "\n",
    "hartmut.stadie@uni-hamburg.de"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2552c4a9",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Examples of probability density functions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0383c396",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Discrete distributions\n",
    "\n",
    "### Binomial distribution\n",
    "\n",
    "Consider a series of independent tries, each having two possible outcomes where $p$ is the probability for success.<br>\n",
    "The probability for $k$ success with $n$ tries is given by the **binomial distribution**:\n",
    "$$P(k) = {n \\choose k} p^k(1-p)^{n-k} \\text{,  } k = 0,1,2...n$$\n",
    "\n",
    "Expectation value and variance\n",
    "$$<k> = E[k] = \\sum \\limits_{k = 0}^{n} k P(k) = np$$\n",
    "$$V[k] = \\sigma^2 = np(1-p)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9350625b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Example "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2039f885",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Tossing five coins $n = 5$, $p = 0.5$  \n",
    "\n",
    "| k    |  0   |  1   |   2   |   3   |  4   |  5   |\n",
    "|:-----|:----:|:----:|:-----:|:-----:|:----:|:----:|\n",
    "| P(k) | 1/32 | 5/32 | 10/32 | 10/32 | 5/32 | 1/32 |\n",
    "\n",
    "\n",
    "\n",
    "<br>\n",
    "\n",
    "\n",
    "plot using [`scipy.stats.binom`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binom.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "662bb993",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "ks = np.arange(0,5.1)\n",
    "plt.plot(ks, scipy.stats.binom.pmf(ks, 5, 0.5),'+')\n",
    "plt.xlabel(\"$k$\")\n",
    "plt.ylabel(\"$P(k)$\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec58fe0",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise II \n",
    "\n",
    "Statistical uncertainty on efficiency of a device/selection criterion.\n",
    "<p>\n",
    "Estimate the efficiency and its uncertainty from a sample with $n$ events where $k$ events are selected.  \n",
    "\n",
    "Random variable is the measured efficiency: $h_k = \\frac{k}{n}$. \n",
    "<br>\n",
    "How large is the uncertainty?\n",
    "<br>\n",
    "    \n",
    "Assumption (maybe wrong):  $k$ is binomially distributed with $p_k = E[h_k] = E[\\frac{k}{n}]$: \n",
    "    $$\\begin{aligned}\n",
    "      \\sigma(h_k) &= &\\sqrt{V[\\frac{k}{n}]} = \\sqrt{\\frac{1}{n^2} V[k]} = \\sqrt{\\frac{1}{n^2}\\cdot np_k(1-p_k)}\\\\\n",
    "      &=& \\sqrt{\\frac{p_k(1-p_k)}{n}}\\\\   \n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1180ce3f",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Poisson distribution\n",
    "\n",
    "Consider the binomial case for very large $n$, very small $p$ and $Np$ remains equal to some finite value $\\mu$:\n",
    "$$P(k) = \\frac{\\mu^ke^{-\\mu}}{k!}$$\n",
    "\n",
    "Expectation value and variance:\n",
    "$$E[k] = \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^k}{k!}\n",
    "      = \\mu \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^{k-1}}{(k-1)! k}\n",
    "      = \\mu \\sum \\limits_{s = 0}^{\\infty} \\frac{e^{-\\mu}\\mu^{s}}{s!} = \\mu$$\n",
    "$$V[k] = \\sigma^2 = \\mu$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06c5b8a1",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Poisson  and binomial distributions\n",
    "\n",
    "binomial distribution with $n= 1000$ and $p = 0.01$<br>\n",
    "poisson distribution with $\\mu = 10$ (hatched)  \n",
    "\n",
    "<img src=\"./figures/08/bp.jpg\" width=\"55.0%\"\n",
    "alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d43ae1c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Example from old statistic books"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd995e3f",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "Deaths from horse kicks in the Prussian army\n",
    "\n",
    "The deaths from horse kicks were recorded for each year and army corps (?). \n",
    "\n",
    "<br>\n",
    "For 20 years (1875-1894) and 14 army corps, one gets:\n",
    "\n",
    "\n",
    "| number of deaths $k$                |   0 |   1 |   2 |   3 |   4 |   5 |   6 |\n",
    "|:------------------------------------------|----:|----:|----:|----:|----:|----:|----:|\n",
    "| number of corps-years with $k$ death | 144 |  91 |  32 |  11 |   2 |   0 |   0 |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fce70dd5",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"./figures/08/poisson70.png\" width=\"80.0%\" />\n",
    "\n",
    "poisson distribution for $\\mu = \\frac{196}{280} = 0.70$ and data with (wrong) error bars"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3097b69-91ec-413e-935e-e310c96d5c25",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Continuous distributions\n",
    "\n",
    "### Normal distribution\n",
    "\n",
    "Normal- oder Gaussian distribution\n",
    "$$f(x) = \\frac{1}{\\sqrt{2\\pi}\\sigma}e^{-\\frac{(x-\\mu)^2}{2\\sigma^2}}$$\n",
    "\n",
    "Expectation value and variance:\n",
    "$$<x> = E[x] = \\mu$$ \n",
    "$$V[x] = \\sigma^2$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32da631f",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Standard  Gaussian distribution\n",
    "\n",
    "normal distribution with  $\\mu = 0$ and $\\sigma = 1$\n",
    "\n",
    "<img src=\"./figures/10/gauss_alpha.png\" width=\"60.0%\"\n",
    "alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7050597-71f8-45a4-a4ac-64567d17827a",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "Probability for some intervals:\n",
    "\n",
    "|                       |                                  |             |\n",
    "|:----------------------|---------------------------------:|------------:|\n",
    "| $|x-\\mu| \\ge \\sigma$  | (x außerhalb $\\pm 1\\sigma$) ist: | $31{,}74$ % |\n",
    "| $|x-\\mu| \\ge 2\\sigma$ | (x außerhalb $\\pm 2\\sigma$) ist: |  $4{,}55$ % |\n",
    "| $|x-\\mu| \\ge 3\\sigma$ | (x außerhalb $\\pm 3\\sigma$) ist: |  $0{,}27$ % |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b5910862-3a33-46be-91e0-868a3093cd48",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Multivariate  normal distribution \n",
    "\n",
    "$$f_{X}(\\vec x )= \\frac {1}{\\sqrt {(2\\pi )^{p}\\det({C)}}}\\exp \\left(-{\\frac {1}{2}}(\\vec x- \\vec\\mu)^{\\top}C^{-1}(\\vec x - \\vec \\mu)\\right)$$\n",
    "\n",
    "<img src=\"./figures/10/Multivariate_Gaussian.png\"\n",
    "width=\"60.0%\" alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39087e01",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### 2-D multivariate normal distribution\n",
    "\n",
    "$$\\vec\\mu = (\\bar x, \\bar y) \\text{ und } C = \\left( \n",
    "  \\begin{array}{rr} \n",
    "  \\sigma_x  ^2 & \\rho \\sigma_x \\sigma_y \\\\ \n",
    "   \\rho \\sigma_x \\sigma_y &  \\sigma_y^2\\\\ \n",
    "  \\end{array} \\right)$$\n",
    "\n",
    "<img src=\"./figures/10/error_elipse.png\" width=\"70%\" alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf784648-04e4-4d6b-a6d0-e85f4210ee1d",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Poisson- Binomial- und Gauß-Verteilung \n",
    "\n",
    "binomial distribution with $n= 1000$ and $p = 0.01$<br>  \n",
    "poisson distribution with  $\\mu = 10$(schraffiert)<br>\n",
    "Gaussian distribution with $\\mu = 10$, $\\sigma = \\sqrt{10}$\n",
    "\n",
    "<img src=\"./figures/08/bpg.jpg\" width=\"55.0%\"\n",
    "alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91e4ed52",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Uniform distribution\n",
    "\n",
    "Uniform distribution: the probability density function  f(x) is constant and equal to  \n",
    "$\\frac{1}{b-a}$ for $a \\leq x \\leq b$ and zero outside the interval."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3b105a2-4332-46f7-a423-df435d4b06e8",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "Expectation value and variance:\n",
    "$$<x> = E[x] = \\int_a^b \\frac{x}{b-a}\\, dx = \\frac{1}{2(b-a)} [b^2-a^2] = \\frac{a + b}{2}$$\n",
    "\n",
    "$$\\begin{aligned}\n",
    "      V[x] & = &\\sigma^2 = E[(x-<x>)^2] = E[x^2] - <x>^2 \\\\\n",
    "           & = & \\int_a^b \\frac{x^2}{b-a}\\, dx - <x>^2 = \\frac{b^3 - a^3}{3(b-a)} - \\frac{(a+b)^2}{4}\\\\\n",
    "           & = & \\frac{b^2+ab+a^2}{3}-\\frac{a^2 + 2ab + b^2}{4} = \\frac{(b-a)^2}{12} \\\\ \n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dfb30ccb",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Example:  strip detectors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b00e1a6",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "<img src=\"./figures/10/cms_strip.jpg\" width=\"80.0%\"\n",
    "alt=\"image\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ddd2a24f",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "<img src=\"./figures/10/strip.png\" width=\"80.0%\"\n",
    "alt=\"image\" />  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2acc164e-255b-43f4-93ee-d569d0bb9e50",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "one measurement: signal in strip $i$, $p(x)$ is uniformly distributed between\n",
    "$b_i$ und $a_i$  \n",
    "\n",
    "How is a combination of measurements distributed?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e4d5dad2-983e-4f8f-b305-fb0a787282d6",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Zentraler Grenzwertsatz \n",
    "\n",
    "Zentraler Grenzwertsatz: Die Wahrscheinlichkeitsdichte der Summe\n",
    "$\\sum_{i=0}^{n} x_i$ einer Stichprobe aus $n$ unabhängigen\n",
    "Zufallsvariablen $x_i$ mit einer beliebigen Wahrscheinlichkeitsdichte\n",
    "mit Mittelwert $<x>$ und Varianz $\\sigma^2$ geht in der Grenze\n",
    "$n \\to \\infty$ gegen eine Gau\"s-Wahrscheinlichkeitsdichte mit Mittelwert\n",
    "$\\mu = n \\cdot <x>$ und Varianz $V[w] = n \\cdot \\sigma^2$.\n",
    "\n",
    "<embed src=\"./figures/08/Summe-von-Gleichverteilungen3.pdf\"\n",
    "style=\"width:60.0%\" />  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb2a4c3c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## What is meant with error/uncertainty on a measured quantity?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffd68924",
   "metadata": {},
   "source": [
    "If we quote $a = 1 \\pm 0.5$, we usually mean that the probability for the *true* value of $a$ is Gaussian $G(a, \\mu, \\sigma)$ distributed with $\\mu = 1$ and $\\sigma = 0.5$.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8037279",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Exercise: How often can/should the measurement be outside one sigma?\n",
    "\n",
    "Let's use pseudo-experiments/Monte Carlo:\n",
    "\n",
    " * generate 10.000 Gaussian distributed measurements\n",
    " * count how ofter they differ by more than one sigma\n",
    " \n",
    " Relatively easy with *scipy* and *numpy*:\n",
    " * use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)\n",
    " * use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) class\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b4557d9",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "a92fa7d1",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "import scipy.stats as stats\n",
    "import numpy as np\n",
    "\n",
    "pseudo_a = stats.norm.rvs(1, 0.5, 10000)\n",
    "print(pseudo_a)\n",
    "is_outside = abs(pseudo_a - 1) > 0.5\n",
    "print(is_outside)\n",
    "print(\"fraction outside one sigma:\", sum(is_outside)/len(pseudo_a)) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54d58609",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Why is it a Gaussian?\n",
    "\n",
    "Central limit theorem:\n",
    "\n",
    "\"let $X_{1},X_{2},\\dots ,X_{n}$ denote a statistical sample of size $n$  from a population with expected value (average) $\\mu$ and finite positive variance $\\sigma ^{2}$, and let $\\bar {X_{n}}$ denote the sample mean (which is itself a random variable). Then the limit as $n\\to \\infty$ of the distribution of $\\frac {({\\bar {X}}_{n}-\\mu )}{\\frac {\\sigma }{\\sqrt {n}}}$, is a normal distribution with mean 0  and variance 1.\"\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "977b88f5-3cb7-445b-adac-f44be4d69c90",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Back to the Bundesliga"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "641ec6cc-6f1a-4399-b652-7e3da333782a",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "data = np.loadtxt(\"./exercises/09_data.txt\")\n",
    "summe = data[:,0] + data[:,1]\n",
    "freq = plt.hist(summe, bins=10, range=(-0.5,9.5))\n",
    "plt.xlabel(\"k\")\n",
    "print(freq)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6bcf92b-386d-492b-9f09-e29118e93505",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "In 18 matches exactly six goals were scored.<br>\n",
    "How large is the error on this 18?<br>\n",
    "What is a 68,27\\%-confidence interval for the 6-goals bin?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c4a1016-7fc9-4eb6-b515-b51c0d16fed8",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "naively: $k=18$, poisson distribution with $\\hat \\mu = 18$ and $\\sigma = \\sqrt{\\hat \\mu} = \\sqrt{18}$: $18\\pm 4.12$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b26cd06e-7700-4011-b5d8-6bf56e22489f",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Confidence: \n",
    "$P(x_- \\le x \\le x_+) = \\int_{x_-}^{x^+} P(x) dx$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1903f299-4d16-41b0-bb48-5f17a07a9006",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "### However, what is  $x$? $\\mu$ or $k$???"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b54b11f9-e897-4d2a-a5df-961c16cdbb3f",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "source": [
    "Confidence for $P(k=18) = 1$ (our measurement!) \n",
    "\n",
    "We need the confidence interval for $\\mu$.\n",
    "\n",
    "\n",
    "Hence: $P(\\mu_- \\le \\mu \\le\\mu_+) = \\frac{\\int_{\\mu_-}^{\\mu_+} P(k, \\mu) d\\mu}{\\int_{0}^{\\infty} P(k, \\mu) d\\mu}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5052bdb2-6e4b-4ea6-b08e-ba22f2a4a3d2",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import scipy\n",
    "\n",
    "mus = np.linspace(0,50,1000)\n",
    "plt.plot(mus, scipy.stats.poisson.pmf(18,mus))\n",
    "plt.xlabel(r\"$\\mu$\")\n",
    "scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 0, 100)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d278b736-1f45-4b6c-8e1a-c851bb618aeb",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Test of naive interval"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "71c0df91-c1e0-4fbf-a94b-fd88f839ef07",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "$P(\\mu_- \\le \\mu \\le\\mu_+) = \\int_{\\mu_-}^{\\mu_+} P(k, \\mu) d\\mu$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c595233d",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7c461ddf-0cb8-49b0-adc6-a7ae6f11212b",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 18-np.sqrt(18), 18+np.sqrt(18))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e34f6d9d",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "...Ok-ish..."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88f176b2-deb2-4212-b4ba-d86b7964e545",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Example from Publication:\n",
    "\"Search for flavor changing neutral currents in decays of top quarks\" (D0)\n",
    "\n",
    "Physics Letters B 701 (2011), pp. 313-320\n",
    "[https://arxiv.org/abs/1103.4574]\n",
    "\n",
    "<img src=\"./figures/12/ht_bq_1jets_comb_5pc.png width=\"100%\" \\>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5135b8e6-711b-440e-8507-f049d017c137",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Let's try another bin\n",
    "\n",
    "1 match had exactly seven goals: \n",
    "\n",
    "Naive confidence interval: $1\\pm 1$\n",
    "\n",
    "Test it:\n",
    " - draw the Poisson probability $P(1,\\mu$ vs. \\mu\n",
    " - compute the confidence for the interval $[0, 2]$ numerically with [scipy.integrate.quad](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "06885d9b",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6102fec-1275-439a-b6b4-8cb1ee4960fc",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "mus = np.linspace(0,10,1000)\n",
    "plt.plot(mus, scipy.stats.poisson.pmf(1,mus))\n",
    "plt.xlabel(r\"$\\mu$\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a0595e97",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(1,x), 1-np.sqrt(1), 1+np.sqrt(1))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c24eace",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "too small!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3139be1a-8a4d-4ae9-9fa7-aa99d7efffe6",
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Let's find a better interval: $[\\mu_-, \\mu_+]$\n",
    "\n",
    "here:\n",
    "$\\mu_- = 0$\n",
    "\n",
    "\n",
    "find $\\mu_+$, so that the integral from $[0,\\mu_+]$ is $0.6827$:\n",
    " - define a function that returns the integral for a given $mu_+$\n",
    " - plot the function \n",
    " - use root finding with [scipy.optimize.brentq](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brentq.html) to find the value of $\\mu_+$ where the integral minus 0.6827 is 0."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d187a99",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "630c83e0-2be4-41dd-a124-2ac672a2c507",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "def intervall(mu_plus):\n",
    "    return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(1,x), 0, mu_plus)[0]\n",
    "\n",
    "mus = np.linspace(0,10,100)\n",
    "plt.plot(mus, np.vectorize(intervall)(mus))\n",
    "plt.grid()\n",
    "plt.xlabel(r\"$\\mu$\")\n",
    "\n",
    "scipy.optimize.brentq(lambda x: intervall(x)-0.6827, 0,10)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f20e18c3",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "Korrektes Intervall: $[0; 2.36]$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c4ad7b4-bbdb-4f9d-a795-0e2dae39a181",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### What is now the correct interval for the six-goals bin?\n",
    "\n",
    "Which one?\n",
    "\n",
    "symmetric around $\\mu$ or...\n",
    "\n",
    "e.g. symmetric in the confidence:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fffd76e-14da-4854-8af0-5262e0c3c9a3",
   "metadata": {
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def intervall_minus(mu):\n",
    "    return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), mu, 18)[0]\n",
    "\n",
    "def intervall_plus(mu):\n",
    "    return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), 18, mu)[0]\n",
    "\n",
    "print(\"naive:\", 18-np.sqrt(18), 18+np.sqrt(18))\n",
    "mu_minus = scipy.optimize.brentq(lambda x: intervall_minus(x)-0.6827/2, 0,18)\n",
    "mu_plus = scipy.optimize.brentq(lambda x: intervall_plus(x)-0.6827/2, 18,40)\n",
    "print(\"symmetric in P:\", mu_minus, mu_plus) \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52f8f412-cb94-4c32-95b6-03f178cbcdb6",
   "metadata": {},
   "outputs": [],
   "source": [
    "scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(18,x), mu_minus, mu_plus)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7ef04e8-1dc9-4e8b-b597-39d277bde591",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Confidence intervals for  Poisson"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0a6084d-dd5d-4ffc-8849-a61a8327fb4e",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def intervall_minus(k, mu):\n",
    "    return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(k,x), mu, k)[0]\n",
    "\n",
    "def intervall_plus(k, mu):\n",
    "    return scipy.integrate.quad(lambda x: scipy.stats.poisson.pmf(k,x), k, mu)[0]\n",
    "\n",
    "def conv_limits(k, c):\n",
    "    c_low = intervall_minus(k, 0)\n",
    "    c_up = c / 2\n",
    "    if c_low < c/2:\n",
    "        mu_minus = 0\n",
    "    else:\n",
    "        c_low = c / 2\n",
    "        mu_minus = scipy.optimize.brentq(lambda x: intervall_minus(k, x)-c_low, 0,k)\n",
    "    c_up = c - c_low\n",
    "    mu_plus = scipy.optimize.brentq(lambda x: intervall_plus(k, x)-c_up, k,  2*k + 20)\n",
    "    return mu_minus, mu_plus\n",
    "\n",
    "conv_limits(3, 0.6827)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "382e5a13-5f5f-44e5-90dd-7272fbdb171a",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "ks = np.arange(0, 20)\n",
    "\n",
    "limits = np.zeros((len(ks), 2))\n",
    "for i,k in enumerate(ks):\n",
    "    limits[i] = conv_limits(k, 0.6827)\n",
    "#print(limits)\n",
    "plt.plot(ks, limits[:,0], 'b_')\n",
    "plt.plot(ks, limits[:,1], 'r_')\n",
    "plt.plot(ks, ks-np.sqrt(ks), 'b.')\n",
    "plt.plot(ks, ks+np.sqrt(ks), 'r.')\n",
    "plt.xlabel(\"$k$\")\n",
    "plt.ylabel(r\"$\\mu$\")\n",
    "plt.grid()\n",
    "\n",
    "print(limits[2])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "61d94d51-fc1c-4ddb-9cbc-c416a28158f8",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Confidence intervals for Gaussian distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb5826ae-29df-41c3-ad8f-dde61b5d90ba",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "def intervall_minus(x, mu, sigma):\n",
    "    return scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(x,y, sigma), mu, x)[0]\n",
    "\n",
    "def intervall_plus(x, mu, sigma):\n",
    "    return scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(x, y, sigma), x, mu)[0]\n",
    "\n",
    "def conv_limits(x, c, sigma):\n",
    "    c_low = c / 2\n",
    "    mu_minus = scipy.optimize.brentq(lambda mu: intervall_minus(x, mu, sigma)-c_low,  x - 10*sigma,x)\n",
    "    c_up = c - c_low\n",
    "    mu_plus = scipy.optimize.brentq(lambda mu: intervall_plus(x, mu, sigma)-c_up, x,  x + 10*sigma)\n",
    "    return mu_minus, mu_plus\n",
    "\n",
    "conv_limits(3, 0.6827, 1)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e87da118-0c11-4345-a656-096de6f980e9",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "xs = np.linspace(0,20,100)\n",
    "sigma = 1\n",
    "limits = np.zeros((len(xs), 2))\n",
    "for i,k in enumerate(xs):\n",
    "    limits[i] = conv_limits(k, 0.6827, sigma)\n",
    "#print(limits)\n",
    "plt.plot(xs, limits[:,0], 'b')\n",
    "plt.plot(xs, limits[:,1], 'r')\n",
    "plt.plot(xs, xs-sigma, 'b.')\n",
    "plt.plot(xs, xs+sigma, 'r.')\n",
    "plt.xlabel(\"$x$\")\n",
    "plt.ylabel(r\"$\\mu$\")\n",
    "plt.grid()\n",
    "\n",
    "print(limits[2])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "202144ae-1057-41fc-8757-7f6f40c75868",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(scipy.integrate.quad(lambda y: scipy.stats.norm.pdf(3, y, 1), 2, 3)[0], scipy.integrate.quad(lambda x: scipy.stats.norm.pdf(x, 3, 1), 2,3)[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24359dd4-7289-4235-872e-1138d5fe0b0b",
   "metadata": {},
   "source": [
    "For Gaussian: $G(x, \\mu, \\sigma) = G(\\mu, x, \\sigma)$: $\\int_{\\mu_-}^{\\mu_+} G(x, \\mu, \\sigma) d\\mu = \\int_{\\mu_-}^{\\mu_+} G(\\mu, x, \\sigma) d\\mu$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10f482d3-6667-4250-aefa-5df896e46936",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Confidence regions\n",
    "\n",
    "Confidence values for Gaussian distribution and intervals: $[\\mu - z\\sigma, \\mu - z\\sigma]$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "580b83dc-3cb0-40df-bc77-8ad48d90fce5",
   "metadata": {},
   "outputs": [],
   "source": [
    "def conv_gaus(z):\n",
    "    return scipy.stats.norm.cdf(z) - scipy.stats.norm.cdf(-z)\n",
    "\n",
    "for z in [1,2,3,4,5]:\n",
    "    print(z, conv_gaus(z))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3904ad68-2b4d-4207-a7e8-35def0372238",
   "metadata": {},
   "source": [
    "$z$ for 68\\%, 90\\%, 95\\% und 99\\%: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a71ee06-db32-4349-a3da-ae0e394fbd74",
   "metadata": {},
   "outputs": [],
   "source": [
    "for c in [0.68, 0.90, 0.95, 0.99]:\n",
    "    print(c, scipy.optimize.brentq(lambda z: conv_gaus(z)-c,0, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d5f1dab-cebd-45fe-934e-36cd06ba2782",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Regions in two or three dimensions:\n",
    "\n",
    "$\\vec x^T = (x_1, x_2, \\dots, x_n)$ \n",
    "\n",
    "Let $x_i$ be independent random variables and Gaussian distributed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73c11f8d-c8e4-4dc7-ad37-4ca4a33c55ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "def conv_gaus_nd(z, n):\n",
    "    return np.power(scipy.stats.norm.cdf(z) - scipy.stats.norm.cdf(-z),n)\n",
    "\n",
    "for z in [1,2,3,4,5]:\n",
    "    print(z, conv_gaus_nd(z, 1), conv_gaus_nd(z, 2), conv_gaus_nd(z, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dd1036c-edf9-40c4-be16-56025f49134b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Regions in two or three dimensions:\n",
    "\n",
    "$z$ for 68\\%, 90\\%, 95\\% und 99\\%: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f14d38e-1e41-460e-9876-2afb62bed08b",
   "metadata": {},
   "outputs": [],
   "source": [
    "for c in [0.68, 0.90, 0.95, 0.99]:\n",
    "    print(c, scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 1)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 2)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z,3)-c,0, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9aca662e-bdd6-408e-8f28-852614e51577",
   "metadata": {},
   "source": [
    "1,2,3-$\\sigma$-equivalents in two or three dimensions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "702daac8-81e8-406c-b70f-c3146c58a493",
   "metadata": {},
   "outputs": [],
   "source": [
    "for k in [1, 2, 3]:\n",
    "    c = conv_gaus_nd(k, 1)\n",
    "    print(c, scipy.optimize.brentq(lambda z: conv_gaus_nd(z, 2)-c,0, 10), scipy.optimize.brentq(lambda z: conv_gaus_nd(z,3)-c,0, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bba9c7fc-9e11-4441-90e0-f1d186a1726e",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "# Hypothesis tests\n",
    "\n",
    "**Bundesliga**:\n",
    "Hypothesis: \"The $k_i$ goals in each Bundesliga match $i$ are Poisson distributed with a common parameter $\\mu = <k>$.\"\n",
    "\n",
    "We need an alternative Hypothesis for the test: \"The goals in each Bundesliga match $k_i$ are Poisson distributed with parameter $\\mu_i = ki$ for each match.\"\n",
    "\n",
    "Errors of first and second kind:\n",
    "* error of first kind: The hypothesis is true, but is rejected.\n",
    "  \n",
    "  significance: $\\alpha$ \n",
    "  \n",
    "  specificity: $1-\\alpha$ (efficiency) \n",
    "\n",
    "\n",
    "* error of the second kind: The hypothesis is accepted, but is wrong (false positive).\n",
    "\n",
    "  probability for error: $\\beta$\n",
    "  \n",
    "  power: $1-\\beta$ "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "644cb69d",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Example\n",
    "Gaussian distributed random variable $x$ ($\\sigma = 1$)\n",
    "\n",
    "Hypothesis: $\\mu = 0$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e67fff2-211b-4eef-95e2-44e3b363098c",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    },
    "tags": []
   },
   "source": [
    "For which region $x_- < x < x_+$ shall we accept the hypothesis for an error of the first kind of $\\alpha = 5\\%$?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "85e25147-c902-4434-b3dc-908486408c5b",
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "fragment"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "x = np.linspace(-5,5,100)\n",
    "xout1 = np.linspace(-5, -1.9599639845401602)\n",
    "xout2 = np.linspace(1.9599639845401602, 5)\n",
    "plt.plot(x, scipy.stats.norm.pdf(x))\n",
    "plt.fill_between(xout1, scipy.stats.norm.pdf(xout1), np.zeros_like(xout1),color=\"c\")\n",
    "plt.fill_between(xout2, scipy.stats.norm.pdf(xout2), np.zeros_like(xout2),color=\"c\")\n",
    "\n",
    "plt.xlabel(\"$x$\")\n",
    "\n",
    "plt.grid()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cbb0d56-981d-4dfa-9a7a-0d6d4a51a357",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### What is the error of the second kind?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cb79afd6-ba74-448b-9096-05cedeb842a6",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "Needs an alternative hypothesis!\n",
    "\n",
    "Example: true $\\mu = 3$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "603a31e1-1425-4eb6-9b12-c44ba31d06a6",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "x = np.linspace(-5,8,200)\n",
    "xin = np.linspace(-1.9599639845401602, 1.9599639845401602)\n",
    "xout1 = np.linspace(-5, -1.9599639845401602)\n",
    "xout2 = np.linspace(1.9599639845401602, 5)\n",
    "plt.plot(x, scipy.stats.norm.pdf(x))\n",
    "plt.fill_between(xout1, scipy.stats.norm.pdf(xout1), np.zeros_like(xout1),color=\"c\")\n",
    "plt.fill_between(xout2, scipy.stats.norm.pdf(xout2), np.zeros_like(xout2),color=\"c\")\n",
    "plt.plot(x, scipy.stats.norm.pdf(x, 3))\n",
    "plt.fill_between(xin, scipy.stats.norm.pdf(xin, 3), np.zeros_like(xin),color=\"orange\")\n",
    "plt.xlabel(\"$x$\")\n",
    "plt.grid()\n",
    "print(\"error of second kind: beta:\", scipy.stats.norm.cdf(1.9599639845401602, 3) - scipy.stats.norm.cdf(-1.9599639845401602, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a454b7-fce4-4dca-ad3c-9e220956706b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "## Neyman-Pearson Lemma\n",
    "\n",
    "Hypothesis: $H$\n",
    "\n",
    "Alternative hypothesis: $A$\n",
    "\n",
    "Goal: a criterion for an acceptance region giving the highest power (signal purity).\n",
    "\n",
    "Assume $H$ is accepted for $x$ in range $V$:\n",
    "\n",
    "$\\int_{V} P_{H}(x) dx = 1 - \\alpha$ (large)\n",
    "\n",
    "$\\int_{V} P_{A}(x) dx = \\beta$ (small)\n",
    "\n",
    "In $V$ are the values of $x$, for which $\\frac{P_{H}(x)}{P_{A}(x)}$ is large.\n",
    "\n",
    "Neyman-Pearson lemma: best range selected with  $\\frac{P_{H}(x)}{P_{A}(x)} > c$ (likelihood ratio)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b636ff7e-5ba1-4fc5-bb40-fb29b63ca88c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Finally: Does are model for the Bundesliga work?\n",
    "\n",
    "**Bundesliga**:\n",
    "Hypothesis: \"The $k_i$ goals in each Bundesliga match $i$ are Poisson distributed with a common parameter $\\mu = <k>$.\"\n",
    "\n",
    "Alternative hypothesis: \"The goals in each Bundesliga match $k_i$ are Poisson distributed with parameter $\\mu_i = ki$ for each match.\"\n",
    "\n",
    "Neyman-Pearson: likelihood ratio: $\\frac{P_{A}(x)}{P_{H}(x)} = \\frac{\\hat L(\\vec k; \\vec k)}{L(\\mu; \\vec k)} > c$\n",
    "\n",
    "We look at the $log$ of the likelihood ratio for our model $-lnL(\\mu; \\vec k)$  and the alternative (saturated model) $-ln\\hat L(\\vec k; \\vec k)$.\n",
    "\n",
    "$P_{H} (\\vec k) = \\prod_{i} P(k_i,\\mu)$; $P_{A} (\\vec k) = \\prod_{i} P(k_i, k_i)$\n",
    "\n",
    "\n",
    "$d = \\ln \\frac{P_{A}(\\vec k)}{P_{H}(\\vec k)} = -\\ln \\frac{P_{H}(\\vec k)}{P_{A}(\\vec k)} = -\\ln L(\\mu; \\vec k)-(-\\ln\\hat L(\\vec k; \\vec k))$\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75234801-341f-4474-a2bf-93aafe9add77",
   "metadata": {},
   "outputs": [],
   "source": [
    "mu = np.mean(summe)\n",
    "print(\"mu:\", mu)\n",
    "print(\"P(H):\", np.prod(scipy.stats.poisson.pmf(summe,mu)))\n",
    "print(\"P(A):\",  np.prod(scipy.stats.poisson.pmf(summe,summe)))\n",
    "print(\"-ln P(H):\", -np.sum(scipy.stats.poisson.logpmf(summe,mu)))\n",
    "print(\"-ln P(A):\", -np.sum(scipy.stats.poisson.logpmf(summe,summe)))\n",
    "print(\"d:\", -np.sum(scipy.stats.poisson.logpmf(summe,mu)) + np.sum(scipy.stats.poisson.logpmf(summe,summe)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e840a509-4f5b-48ef-9594-eb1dc0bbdc4a",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### What does this mean?\n",
    "\n",
    "Which distribution of our test statistic $d$ do we expect for pseudo-data from our model and $\\mu=<k>$?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d1227f",
   "metadata": {
    "cell_style": "split",
    "jupyter": {
     "source_hidden": true
    },
    "scrolled": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#simuliere 1000 Spielzeiten\n",
    "muobs = np.mean(summe)\n",
    "tore = scipy.stats.poisson.rvs(muobs,size=(1000,306))\n",
    "d = np.zeros(len(tore))\n",
    "mu = np.zeros(len(tore))\n",
    "for i in range(len(tore)):\n",
    "    mu[i] = np.mean(tore[i,:])\n",
    "    d[i] = np.sum(-scipy.stats.poisson.logpmf(tore[i],mu[i]) + scipy.stats.poisson.logpmf(tore[i],tore[i]))\n",
    "    \n",
    "plt.hist(mu, bins=50)\n",
    "muobs = np.mean(summe)\n",
    "plt.plot([muobs, muobs], [0, 100], linestyle = 'dotted')\n",
    "plt.grid()\n",
    "plt.xlabel(\"$\\hat \\mu$\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90e27f87-36fd-4b11-add1-c82a4eeb79dd",
   "metadata": {
    "cell_style": "split",
    "jupyter": {
     "source_hidden": true
    },
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "dobs = -np.sum(scipy.stats.poisson.logpmf(summe,muobs)) + np.sum(scipy.stats.poisson.logpmf(summe,summe))\n",
    "plt.hist(d, bins=50)\n",
    "plt.plot([dobs, dobs], [0, 100], linestyle = 'dotted')\n",
    "\n",
    "plt.grid()\n",
    "plt.xlabel(\"$-\\ln(P(H)/P(A))$\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6db4bd4-de67-48e1-add9-4fdc05dbbcf7",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### The $p$-value \n",
    "\n",
    "Fraction of expected outcomes that have a higher value than we get for data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18f65b9d",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "plt.hist(mu, bins=50)\n",
    "plt.plot([muobs, muobs], [0, 100], linestyle = 'dotted')\n",
    "plt.grid()\n",
    "plt.xlabel(\"$\\hat \\mu$\")\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b82090eb",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "dobs = -np.sum(scipy.stats.poisson.logpmf(summe,muobs)) + np.sum(scipy.stats.poisson.logpmf(summe,summe))\n",
    "plt.hist(d, bins=50)\n",
    "plt.plot([dobs, dobs], [0, 100], linestyle = 'dotted')\n",
    "\n",
    "plt.grid()\n",
    "plt.xlabel(\"$-\\ln(P(H)/P(A))$\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6647f14e-2282-4080-9139-c332db4af23a",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"p:\", np.sum(d > dobs)/len(d))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c835358c-1291-431c-b0ab-792a4b940d68",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Can we do this analytically, without Monte Carlo?\n",
    "\n",
    "Simple case: Gaussian distributions:\n",
    "\n",
    "$$d =  -\\ln L(\\mu; \\vec x)-(-\\ln\\hat L(\\vec x; \\vec x) = -\\ln \\prod_{i} \\frac{G(x_i,\\mu, \\sigma)}{G(x_i, x_i, \\sigma)}=-\\ln \\prod_{i} \\frac{\\exp(-\\frac{1}{2}(\\frac{x_i-\\mu}{\\sigma})^2)}{\\exp(-\\frac{1}{2}(\\frac{x_i-x_i}{\\sigma})^2)} = -\\ln \\exp(-\\frac{1}{2}(\\frac{x_i-\\mu}{\\sigma})^2) = \\frac{1}{2} \\sum_i\\left(\\frac{x_i - \\mu}{\\sigma}\\right)^2 = \\frac{1}{2}\\chi^2$$\n",
    "\n",
    "For Gaussian: $-2 \\ln\\frac{P(H}{P(A)}$ follows $\\chi^2$ \n",
    "\n",
    "\n",
    "Wilk's Theorem:\n",
    "$-2 \\ln\\frac{P(H}{P(A)}$ approaches a  $\\chi^2$ distribution (asymptotic limit) with $n$ degrees of freedom $n = \\text{number of data points} - \\text{number of model parameters}$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f09138b6",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Let's try it\n",
    "\n",
    " - use $\\chi^2$ p.d.f. and c.d.f. from [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)\n",
    " - what is the value of `df`\n",
    " \n",
    " - draw the $\\chi^2$ distribution in the interval $[200, 425]$\n",
    " \n",
    " - compute the $p$-value for our data\n",
    " \n",
    " - does it work? Add the histgram for the simulated $\\chi^2=2d$ values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e609ea07",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d667a3b-6a89-419a-8be4-79cb7de96151",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [],
   "source": [
    "plt.hist(2*d, bins=50, density=True)\n",
    "plt.plot([2*dobs, 2*dobs], [0, 0.02], linestyle = 'dotted')\n",
    "ds = np.linspace(200, 425, 100)\n",
    "plt.plot(ds,scipy.stats.chi2.pdf(ds, 305))\n",
    "\n",
    "\n",
    "\n",
    "print(\"p-Wert via Chi2:\", scipy.stats.chi2.sf(2*dobs, 305))\n",
    "\n",
    "plt.grid()\n",
    "plt.xlabel(\"$-2\\ln(P(H)/P(A))$\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f258f33",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### How are the  $p$-values using Wilk's theorem from the simulation distributed?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cda0957a-414f-4bea-ab33-e61218a4a4f5",
   "metadata": {
    "cell_style": "center"
   },
   "outputs": [],
   "source": [
    "plt.hist( scipy.stats.chi2.sf(2*d, 305), bins=50, density=True)\n",
    "plt.plot([ scipy.stats.chi2.sf(2*dobs, 305),  scipy.stats.chi2.sf(2*dobs, 305)], [0, 12], linestyle = 'dotted')\n",
    "\n",
    "plt.grid()\n",
    "plt.xlabel(\"p\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9378e3ba",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### What distribution should we get if it works?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59be1788",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "For the simulation:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d172ac9",
   "metadata": {
    "cell_style": "split"
   },
   "source": [
    "For the $chi^2$ distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a096afa1",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "pvalues = [np.sum(d > dsim)/len(d) for dsim in d]\n",
    "\n",
    "plt.hist(pvalues, bins=50, density=True)\n",
    "plt.grid()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "648b37eb",
   "metadata": {
    "cell_style": "split"
   },
   "outputs": [],
   "source": [
    "d2 = scipy.stats.chi2.rvs(305, size=10000)\n",
    "pvalues = [np.sum(d2 > d2sim)/len(d2) for d2sim in d2]\n",
    "\n",
    "plt.hist(pvalues, bins=50, density=True)\n",
    "plt.grid()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "557559e1-8e4a-412a-9486-a2fb9b97de90",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Does is work better for large $\\mu$ (hand ball?\n",
    "\n",
    "- simulate 1000 season for $\\mu_\\text{obs} = 35$\n",
    "- and rerun the tests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "770f4472",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "472aac93-e2dd-424a-ba0e-51c10d409f8e",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#simuliere 5000 Spielzeiten\n",
    "muobs = 35\n",
    "\n",
    "tore = scipy.stats.poisson.rvs(muobs,size=(5000,306))\n",
    "\n",
    "d = np.zeros(len(tore))\n",
    "mu = np.zeros(len(tore))\n",
    "for i in range(len(tore)):\n",
    "    mu[i] = np.mean(tore[i,:])\n",
    "    d[i] = np.sum(-scipy.stats.poisson.logpmf(tore[i],mu[i]) + scipy.stats.poisson.logpmf(tore[i],tore[i]))\n",
    "    \n",
    "plt.hist(mu, bins=50)\n",
    "plt.grid()\n",
    "plt.xlabel(\"$\\hat \\mu$\")\n",
    "plt.show()\n",
    "\n",
    "plt.hist(2 * d, bins=50, density=True)\n",
    "ds = np.linspace(200, 425, 100)\n",
    "plt.plot(ds,scipy.stats.chi2.pdf(ds, 305))\n",
    "plt.grid()\n",
    "plt.xlabel(\"$-2\\ln(P(H)/P(A))$\")\n",
    "plt.show()\n",
    "\n",
    "plt.hist( scipy.stats.chi2.sf(2*d, 305), bins=50)\n",
    "plt.grid()\n",
    "plt.xlabel(\"p\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "450edccb-2c13-4400-9613-4a1f3e069ce6",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "### Pearson's $\\chi^2$-Test\n",
    "\n",
    "What are the confidence regions for $n$ degrees of freedom/dimensions:\n",
    "\n",
    "data: $\\vec y = y_1,\\dots, y_n$\n",
    "predicted values from model: $\\vec{f} = f_1,\\dots,f_n$\n",
    "\n",
    "$$\\chi^2 = (\\overrightarrow{y\\ } - \\vec{f})^{T} V(\\vec{y}- \\vec{f})$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f703d4ed",
   "metadata": {
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "def conv_chi2_nd(z, n):\n",
    "    return scipy.stats.chi2.cdf(z,n)\n",
    "\n",
    "\n",
    "for z in [1,2,3,4,5]:\n",
    "    print(z, conv_chi2_nd(z, 1), conv_chi2_nd(z, 2), conv_chi2_nd(z, 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2fd5c302-732b-4c6b-b224-5b9f79df57cc",
   "metadata": {
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(\"critical chi2 values\")\n",
    "for k in [1, 2, 3]:\n",
    "    c = conv_gaus_nd(k, 1)\n",
    "    print(c, scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 1)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 2)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z,3)-c,0, 30))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23626cec-4198-44f4-a1c8-81e36992268f",
   "metadata": {},
   "outputs": [],
   "source": [
    "for c in [0.68, 0.90, 0.95, 0.99]:\n",
    "    print(c, scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 1)-c,0, 30), scipy.optimize.brentq(lambda z: conv_chi2_nd(z, 2)-c,0, 30),scipy.optimize.brentq(lambda z: conv_chi2_nd(z,3)-c,0, 30))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69556eab-a67e-4251-a4b3-0290013aa344",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "# Schätzer\n",
    "\n",
    "## Fehlerrechnung\n",
    "\n",
    "### Fehlerrechnung: Beispiel I \n",
    "\n",
    "Widerstandsmessung: $U = 24$ V und $I = 0{,}6 \\pm 0{,}1$ A  \n",
    "Annahme: $I$ gaußverteilt $g(I)$ mit $\\mu_I = 0{,}6$ und\n",
    "$\\sigma_I = 0{,}1$  \n",
    "Was ist $p(R)$?  \n",
    "$R = U/I$, $I = U/R$ und $|dI/dR| = U/R^2$  \n",
    "$$\\begin{aligned}\n",
    " p(R) & = & U/R^2 g(I(R)) = \\frac{U}{R^2 \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(U/R-\\mu_I)^2}{2\\sigma_I^2}} \\\\\n",
    "         & = & \\frac{U}{R^2 \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(R-U/\\mu_I)^2}{2\\sigma_I^2*R^2/\\mu_I^2}} \\\\\n",
    "   \n",
    "\\end{aligned}$$ (show in Jupyter)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "06ff9896-15bc-4404-a789-bbe7557350eb",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Fehlerrechnung: Beispiel II \n",
    "\n",
    "Spannungsmessung: $R = 40$ $\\Omega$ und $I = 0{,}6 \\pm 0{,}1$ A  \n",
    "Annahme: $I$ gaußverteilt $g(I)$ mit $\\mu_I = 0{,}6$ und\n",
    "$\\sigma_I = 0{,}1$  \n",
    "Was ist $p(U)$?  \n",
    "$U = RI$, $I = U/R$ und $|dI/dU| = 1/R$  \n",
    "$$\\begin{aligned}\n",
    "  p(U) & =  &1/R g(I(R)) = \\frac{1}{R \\sqrt{2\\pi}\\sigma_I}e^{-\\frac{(U/R-\\mu_I)^2}{2\\sigma_I^2}} \\\\\n",
    "          & = & \\frac{1}{\\sqrt{2\\pi}R\\sigma_I}e^{-\\frac{(U-R\\mu_I)^2}{2R^2\\sigma_I^2}} \\\\\n",
    "          & = & g(U) \\text{ mit $\\mu_U = R\\mu_I$ und $\\sigma_U = R\\sigma_I$}\\\\\n",
    "  \n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0fe93c64-f06b-49ec-addb-22b12cf06ec6",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Fehlerrechung: Allgemein \n",
    "\n",
    "Gaußsche Fehlerfortpflanzung: Annahme: $x$ gaußverteilt $g(x)$ mit\n",
    "$\\mu_x$ und $\\sigma_x$  \n",
    "Was ist $p(y(x))$?  \n",
    "$y(x) \\approx f(\\mu_x) + \\frac{dy}{dx}\\big|_{x=\\mu_x}(x-\\mu_x)$  \n",
    "$x - \\mu_x \\approx (y-y(\\mu_x))(\\frac{dy}{dx}\\big|_{\\mu_x})^{-1}$,\n",
    "$|dx/dy| = |\\frac{dy}{dx}(\\mu_x)|^{-1}$ $$\\begin{aligned}\n",
    "  p(y) & \\approx  &  |\\frac{dy}{dx}(\\mu_x)|^{-1} g(x(y)) = \\frac{1}{\\frac{dy}{dx}\\big|_{\\mu_x}\\sqrt{2\\pi}\\sigma_x}e^{-\\frac{(x(y)-\\mu_x)^2}{2\\sigma_x^2}} \\\\\n",
    "          & = & \\frac{1}{\\frac{dy}{dx}\\big|_{\\mu_x}\\sqrt{2\\pi}\\sigma_x}e^{-\\frac{(y-y(\\mu_x))^2}{2(\\frac{dy}{dx}\\big|_{\\mu_x})^2\\sigma_x^2}}\\\\\n",
    "          & = & g(y) \\text{ mit $\\mu_y = y(\\mu_x)$ und $\\sigma_y = \\frac{dy}{dx}\\Big|_{\\mu_x}\\sigma_x$}\\\\\n",
    "  \n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85f35e43-f5cd-4ccf-acd2-cbb5e963ce00",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Fehlerrechung: Mehrere Dimensionen \n",
    "\n",
    "Gaußsche Fehlerfortpflanzung: Annahme: $\\vec x$ gaußverteilt $g(\\vec x)$\n",
    "mit $\\vec \\mu$ und Kovarianz $C$  \n",
    "Was ist $p(\\vec y(\\vec x))$?  \n",
    "$y_i(\\vec x) \\approx y_i(\\vec \\mu) + \\frac{dy_i}{dx_j}\\big|_{\\vec \\mu} (x_j- \\mu_j)$;\n",
    "$$\\begin{aligned}\n",
    "  E([y_i y_j]) \\approx  & y_i(\\vec \\mu)y_j(\\vec \\mu) + y_i(\\vec \\mu)\\frac{dy_j}{dx_k}\\big|_{\\vec \\mu} E[x_k - \\mu_k] + y_j(\\vec \\mu)\\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}E[x_k- \\mu_k] \\\\\n",
    "  & + \\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}\\frac{dy_j}{dx_l}\\big|_{\\vec \\mu}E[(x_k- \\mu_k) (x_l - \\mu_l)]\\\\\n",
    "  = & y_i(\\vec \\mu)y_j(\\vec \\mu)  + \\frac{dy_i}{dx_k}\\big|_{\\vec \\mu}\\frac{dy_j}{dx_l}\\big|_{\\vec \\mu}C_{kl} \\text{ und  }V[y_iy_j] = E[y_iy_j] - E[y_i]E[y_j]\n",
    "    \n",
    "\\end{aligned}$$ mit $A_{ij} = [ \\frac{dy_i}{dx_j}\\big|_{\\vec \\mu}]$ ist\n",
    "$\\vec y(\\vec x)$ gaußverteilt mit\n",
    "$$\\vec \\mu_y = \\vec y(\\vec \\mu_x) \\text{ und Kovarianz } C_{yy} = A C_{xx} A^T$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8507d724-f9f8-4d41-a3b0-3ddaf0846dde",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "## Stichproben\n",
    "\n",
    "### Schätzer: Mittelwert aus Stichprobe\n",
    "\n",
    "Beispiel: Schätze $I$ aus drei Messungen der Stromstärke $I_1$, $I_2$,\n",
    "$I_3$ mit gleichem $\\sigma_I = 0.1$  A.\n",
    "\n",
    "Annahme: Messungen gauß-verteilt um I.  \n",
    "Schätzer für $\\mu$: Mittelwert: $$\\hat I = 1/3(I_1+I_2+I_3)$$ Fehler für\n",
    "unkorrellierte Messungen (keine syst. Fehler):  \n",
    "$$C = \\left( \n",
    "  \\begin{array}{rrr} \n",
    "  \\sigma_I^2 &0 & 0 \\\\ \n",
    "   0 &  \\sigma_I^2 & 0\\\\\n",
    "   0 & 0 & \\sigma_I^2 \n",
    "  \\end{array} \\right) \\text{ und } A = \\left(  \\begin{array}{rrr} \n",
    "  d\\hat I/dI_1 &  d\\hat I/dI_2  & d\\hat I/dI_3\n",
    "  \\end{array} \\right)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6463365e-354d-4e98-8f63-6903e2dbd6e6",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Beispiel: Fehlerfortpflanzung \n",
    "\n",
    "Fehler für unabhängige Messungen (keine syst. Fehler):  \n",
    "$$C = \\left( \n",
    "  \\begin{array}{rrr} \n",
    "  \\sigma_I^2 &0 & 0 \\\\ \n",
    "   0 &  \\sigma_I^2 & 0\\\\\n",
    "   0 & 0 & \\sigma_I^2 \n",
    "  \\end{array} \\right) \\text{ und } A = \\left(  \\begin{array}{rrr} \n",
    "  1/3 & 1/3 & 1/3\n",
    "  \\end{array} \\right)$$ Fehlerfortpflanzung: $$\\begin{aligned}\n",
    "C_{\\hat I} & = & ACA^T = \\left( \n",
    " \\begin{array}{rrr} \n",
    "  1/3 & 1/3 & 1/3 \n",
    "   \\end{array} \n",
    "\\right) \n",
    " \\left( \n",
    "  \\begin{array}{rrr} \n",
    "  \\sigma_I^2 &0 & 0 \\\\ \n",
    "   0 &  \\sigma_I^2 & 0\\\\\n",
    "   0 & 0 & \\sigma_I^2 \n",
    "  \\end{array} \\right) \n",
    "   \\left(  \\begin{array}{r} \n",
    "  1/3 \\\\ 1/3 \\\\ 1/3 \n",
    "   \\end{array} \n",
    "\\right)\\\\\n",
    "& =  & \\left( \n",
    " \\begin{array}{rrr} \n",
    "   \\sigma_I^2/3 &  \\sigma_I^2/3 &  \\sigma_I^2/3 \n",
    "   \\end{array} \n",
    "\\right) \n",
    " \\left(  \\begin{array}{r} \n",
    "  1/3 \\\\ 1/3 \\\\ 1/3 \n",
    "   \\end{array} \n",
    "\\right) =  \\sigma_I^2/9 +  \\sigma_I^2/9 +  \\sigma_I^2/9  =  \\sigma_I^2/3 \\\\\n",
    "\\hat \\sigma_I & =  & \\sigma_I/\\sqrt{3}\n",
    "\\end{aligned}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a038c48f-7143-480d-b0ba-817f0e5211c1",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Schätzer \n",
    "\n",
    "Schätzer $\\hat a$ für $a$ aus Stichprobe $x_1,\\dots x_n$\n",
    "\n",
    "Anforderungen:\n",
    "\n",
    "-   erwartungstreu:  \n",
    "    $E[\\hat a]= a$\n",
    "\n",
    "-   konsistent:  \n",
    "    $\\lim_{n\\to \\infty } \\hat a = a$\n",
    "\n",
    "-   effizient: $V[\\hat a]$ möglichst klein\n",
    "\n",
    "Aufgabe: Schätze Mittelwert $\\mu$ und Varianz $\\sigma^2$ einer\n",
    "Wahrscheinlichkeitsdichtefunktion $p(x)$ aus einer Stichprobe\n",
    "$x_1,\\dots,x_n$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d32672c8-da0b-468b-bf73-df40aef7245a",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Schätzer für Mittelwert \n",
    "\n",
    "Schätzer für den Mittelwert $\\mu$:\n",
    "$$\\hat \\mu = \\bar x = \\frac{1}{n}\\sum_{i=1}^n x_i$$\n",
    "\n",
    "Tests:\n",
    "\n",
    "-   erwartungstreu:\n",
    "    $$E[\\hat \\mu]= E[ \\frac{1}{n}\\sum_{i=1}^n x_i ] =  \\frac{1}{n} \\sum_{i=1}^n E[x_i] = \\frac{1}{n} \\sum_{i=1}^n \\mu = \\mu$$\n",
    "\n",
    "-   konsistent:\n",
    "    $$\\lim_{n\\to \\infty } \\hat \\mu = \\lim_{n\\to \\infty }  \\frac{1}{n}\\sum_{i=1}^n x_i = \\int x p(x)\\,dx = \\mu$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4021d6cd-150e-4a6d-bdc4-ed049fd983d3",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Schätzer für Mittelwert \n",
    "\n",
    "Schätzer für den Mittelwert $\\mu$:\n",
    "$$\\hat \\mu = \\bar x = \\frac{1}{n}\\sum_{i=1}^n x_i$$\n",
    "\n",
    "Test:\n",
    "\n",
    "-   effizient: $V[\\hat a]$ möglichst klein\n",
    "    $$V[\\hat \\mu] = V[ \\frac{1}{n}\\sum_{i=1}^n x_i ] =   \\frac{1}{n^2} V[ \\sum_{i=1}^n x_i ]  =   \\frac{1}{n^2}  \\sum_{i=1}^n\\sum_{j=1}^n Cov(x_i, x_j)$$\n",
    "    $$V[\\hat \\mu] =   \\frac{1}{n^2}  \\sum_{i=1}^n Cov(x_i, x_i)=  \\frac{n\\sigma^2}{n^2} =  \\frac{\\sigma^2}{n}$$\n",
    "    (oder über zentralen Grenzwertsatz)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4dfd3266-a5a2-4db7-a96b-ef21118bc85e",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Schätzer für Varianz \n",
    "\n",
    "Schätzer für die Varianz $\\sigma^2$:\n",
    "$$\\hat \\sigma^2  = V[x] = \\frac{1}{n}\\sum_{i=1}^n (x_i  - <x>)^2$$\n",
    "\n",
    "Tests:\n",
    "\n",
    "-   erwartungstreu:\n",
    "    $$E[\\hat {\\sigma^2}]= E[ \\frac{1}{n}\\sum_{i=1}^n (x_i^2 - \\bar x^2)] =  \\frac{1}{n} \\sum_{i=1}^n E[x_i^2-\\bar x^2] = E[x^2] - E[\\bar x^2]$$\n",
    "    $$E[\\hat{\\sigma^2}]=  E[x^2] - E[x]^2 + E[x]^2  - E[\\bar x^2] =  E[x^2] - E[x]^2   - (E[\\bar x^2] - E[\\bar x]^2)$$\n",
    "    $$E[\\hat {\\sigma^2}] =  V(x) - V(\\bar x) = \\sigma^2 - \\frac{\\sigma^2}{n} = \\frac{n-1}{n}\\sigma^2 \\ne  \\sigma^2$$\n",
    "\n",
    "-   konsistent:\n",
    "    $\\lim_{n\\to \\infty } \\hat {\\sigma^2} = \\lim_{n\\to \\infty }   \\frac{n-1}{n}\\sigma^2 =\\sigma^2$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1621af0b-8216-4959-9cc9-8c41cdb8cc79",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Stimmt das Modell? \n",
    "\n",
    "Der angewandte Schätzer basiert immer auf Annahmen über die analysierte\n",
    "Stichprobe.  \n",
    "Der Fehler auf den Schätzwert sagt nichts darüber aus, ob die Annahmen\n",
    "stimmen."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fdd0fc7-af9c-492c-9c7f-5a195ee58c50",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### Nochmal die Stromstärke \n",
    "\n",
    "Stromstärke aus zwei Messreihen, $\\sigma_I = 0.1$  A:  \n",
    "Reihe 1: $[0.64441099 0.63895571 0.73984042 0.62145699 0.66971489]$  \n",
    "Reihe 2: $[0.93179978 0.46326547 0.41350898 0.12281948 0.61426579]$  \n",
    "Schätzer: $I_1 = 0.66 \\pm 0.04$, $I_2 = 0.51 \\pm \\pm 0.04$  \n",
    "Passen die Daten zur Erwartung?  \n",
    "\n",
    "Residuum: $$R_i = \\frac{x_i - \\mu}{\\sigma}$$ ist normal verteilt, wenn\n",
    "$\\mu$ und $\\sigma$ stimmen.\n",
    "\n",
    "(Betrachte Summe der Residuenquadrate $\\sum R_i^2$)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c34b270-4e04-40be-82b1-f305eadf82a9",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "### $\\chi^2$-Verteilung\n",
    "\n",
    "0.5\n",
    "\n",
    "$$\\chi^2 = \\sum_i \\frac{x_i - \\mu_i}{\\sigma_i}$$ Die $p(\\chi^2, n)$ ist\n",
    "die Wahrscheinlichkeitsdichte für die Summe der Quadrate von $n$\n",
    "normalverteilten Zufallszahlen.  \n",
    "Mittelwert: $<\\chi^2> =  n$ Zahl der Freiheitsgrade.\n",
    "\n",
    "0.5 <embed src=\"./figures/10/chi2.pdf\" />\n",
    "\n",
    "$p$-Wert (Wahrscheinlichkeit für größeres $\\chi^2$):\n",
    "$$p  = 1- \\int_0^{\\chi   2} p(\\chi^2, n)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dbe4a35-771e-4129-9529-76f3d38fddae",
   "metadata": {
    "jp-MarkdownHeadingCollapsed": true,
    "slideshow": {
     "slide_type": "skip"
    },
    "tags": []
   },
   "source": [
    "## Zusammenfassung und Ausblick\n",
    "\n",
    "Zusammenfassung\n",
    "\n",
    "-   Wahrscheinlichkeitsdichten\n",
    "\n",
    "-   Fehlerrechnung (lineare Näherung)\n",
    "\n",
    "-   Korrelationen nicht vernachlässigen\n",
    "\n",
    "-   Schätzer\n",
    "\n",
    "-   p-Wert\n",
    "\n",
    "-   Literatur:  \n",
    "\n",
    "    -   Glen Cowan, Statistical Data Analysis,\n",
    "        [pdf](https://www.sherrytowers.com/cowan_statistical_data_analysis.pdf)\n",
    "\n",
    "    -   Roger John Barlow, Statistics: A Guide to the Use of Statistical\n",
    "        Methods in the Physical Sciences,\n",
    "        [Skript](https://arxiv.org/pdf/1905.12362.pdf)\n",
    "\n",
    "    -   Volker Blobel, Erich Lohrmann, Statistische und numerische\n",
    "        Methoden der Datenanalyse,\n",
    "        [pdf](https://www.desy.de/~sschmitt/blobel/eBuch.pdf)\n",
    "\n",
    "# Bibliography\n",
    "\n",
    "Bibliography"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.20"
  },
  "livereveal": {
   "autolaunch": true,
   "overlay": "<div class='myfooter'><h2 style='text-align:center;'>Hartmut Stadie - University of Hamburg</h2></div>",
   "scroll": true
  },
  "toc": {
   "base_numbering": 43
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}