change metadata

f2557c32 · Hartmut Stadie · 266f2ea5 · f2557c32
Commit f2557c32 authored 9 months ago by Hartmut Stadie
--- a/lecture_1.ipynb
+++ b/lecture_1.ipynb
@@ -54,7 +54,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "37cae6fd",
+   "id": "f9a9306c",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
@@ -198,7 +198,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "0ba7c3f0",
+   "id": "6dcfe3e6",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
@@ -244,7 +244,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "dcc75692",
+   "id": "fa1054a6",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
@@ -288,7 +288,7 @@
  {
   "cell_type": "code",
   "execution_count": 33,
-   "id": "e30c1224",
+   "id": "765d2de4",
   "metadata": {
    "cell_style": "split"
   },
@@ -395,7 +395,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "3f0e1262",
+   "id": "694119ec",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
@@ -432,7 +432,7 @@
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "2ea96198",
+   "id": "05e7e4b9",
   "metadata": {},
   "outputs": [],
   "source": []
@@ -508,7 +508,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "539b2083",
+   "id": "b776b008",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
@@ -527,7 +527,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "27a53f23",
+   "id": "d39a9f7b",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
@@ -540,7 +540,7 @@
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "ef09b3b5",
+   "id": "2f063c17",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
@@ -552,7 +552,7 @@
  {
   "cell_type": "code",
   "execution_count": 23,
-   "id": "de7df23f",
+   "id": "f09b7bc1",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
@@ -575,7 +575,7 @@
  {
   "cell_type": "code",
   "execution_count": 26,
-   "id": "1edc9156",
+   "id": "2c8e6805",
   "metadata": {},
   "outputs": [
    {
@@ -638,7 +638,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "f087bd39",
+   "id": "8e1a7f44",
   "metadata": {
    "cell_style": "split"
   },
@@ -648,7 +648,7 @@
  },
  {
   "cell_type": "markdown",
-   "id": "3f86b608",
+   "id": "66c8b422",
   "metadata": {
    "cell_style": "split"
   },
@@ -1272,11 +1272,12 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.9.20"
+   "version": "3.10.2"
  },
-  "rise": {
+  "livereveal": {
   "autolaunch": true,
-   "overlay": "<div class='myfooter'><h2 style='text-align:center;'>Hartmut Stadie - University of Hamburg</h2></div>"
+   "overlay": "<div class='myfooter'><h2 style='text-align:center;'>Hartmut Stadie - University of Hamburg</h2></div>",
+   "scroll": true
  },
  "toc": {
   "base_numbering": 1

 %% Cell type:markdown id:87c9d786-6d5c-4366-9701-5fa127727caa tags:

 # Lecture 1

 ---

 ## Basic statistics

 #

 <br>
 <br>

 Hartmut Stadie

 hartmut.stadie@uni-hamburg.de

 %% Cell type:markdown id:9eae9199-9401-40c4-b212-ae57f1ccab38 tags:

 ## Samples

 ---

 Sample: $X = x_1, x_2,\dots, x_N$

 Expectation value:
 $$<f(x)> = E[f(x)] = \frac{1}{N}\sum_{i=1}^N f(x_i)$$

 The expectation value is a linear operator:
 $$E[a f(x) + b g(x)] = aE[f(x)] + bE[g(x)]$$

 Describing samples: minimum, maximum, frequency/histogram, means, variance, standard deviation, ...


-%% Cell type:markdown id:37cae6fd tags:
+%% Cell type:markdown id:f9a9306c tags:

 ### Describing samples

 minimum, maximum, frequency/histogram, means, variance, standard deviation, ...

 %% Cell type:code id:b44e356e-b829-4879-b5fc-9706fffe873d tags:

 ``` python
 import numpy as np
 data = np.loadtxt('./exercises/09_data.txt')

 data[0:9]
 ```

 %% Output

    array([[5., 0.],
           [2., 1.],
           [0., 1.],
           [0., 3.],
           [0., 1.],
           [2., 2.],
           [4., 0.],
           [2., 1.],
           [1., 3.]])

 %% Cell type:code id:2aeacb94-518d-464f-a7ab-5282b97bc225 tags:

 ``` python
 data[0:9,0]
 ```

 %% Output

    array([5., 2., 0., 0., 0., 2., 4., 2., 1.])

 %% Cell type:code id:1829fa78-7d7b-4bac-9f24-5129dca63629 tags:

 ``` python
 np.min(data), np.max(data)
 ```

 %% Output

    (np.float64(0.0), np.float64(6.0))

 %% Cell type:markdown id:caa12e94-0b72-4f60-875d-5a306a09d036 tags:

 ### Histograms

 %% Cell type:code id:871f915d-09f8-4d14-a79a-8fbd2f16ab75 tags:

 ``` python
 import matplotlib.pyplot as plt

 plt.hist(data[:, 0])
 #plt.hist(data[:, 0], bins=np.arange(-0.25,6.25,0.5))
 #plt.xlabel("k")
 ```

 %% Output

    (array([74., 96.,  0., 67.,  0., 43., 13.,  0., 10.,  3.]),
     array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4, 6. ]),
     <BarContainer object of 10 artists>)



-%% Cell type:markdown id:0ba7c3f0 tags:
+%% Cell type:markdown id:6dcfe3e6 tags:

 ### Histograms

 %% Cell type:code id:f8263279-48b5-4421-be05-604ddbfd8d6f tags:

 ``` python
 plt.hist(data[:, 0], bins=np.arange(-0.25,6.25,0.5))
 plt.xlabel("k")
 ```

 %% Output

    Text(0.5, 0, 'k')
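A hedged aside on the bin edges used above: `np.arange` excludes its stop value, so `np.arange(-0.25, 6.25, 0.5)` ends at 5.75 and entries with $k = 6$ fall outside the last bin, whereas a stop value of 6.26 adds the edge 6.25. The sketch below checks this and compares the histogram counts of integer data with `np.bincount`; the small array `k` is made up for illustration and is not the lecture's data file.

```python
import numpy as np

# Bin edges as used in the histogram cells
edges_short = np.arange(-0.25, 6.25, 0.5)  # stop value 6.25 is excluded, last edge 5.75
edges_full = np.arange(-0.25, 6.26, 0.5)   # last edge is now 6.25

print(edges_short[-1], edges_full[-1])

# Hypothetical integer sample (not the lecture data)
k = np.array([0, 1, 1, 2, 3, 3, 3, 6])

counts_short, _ = np.histogram(k, bins=edges_short)
counts_full, _ = np.histogram(k, bins=edges_full)
print(counts_short.sum(), counts_full.sum())  # the entry k = 6 is lost with the short edges

# Integer j lands in the bin [j - 0.25, j + 0.25), i.e. every second bin;
# these bins reproduce a plain count per integer value
per_value = counts_full[::2]
print(per_value, np.bincount(k, minlength=7))
```

With half-integer edges of width 0.5, every second bin stays empty by construction, which is what the plots above show.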



-%% Cell type:markdown id:dcc75692 tags:
+%% Cell type:markdown id:fa1054a6 tags:

 ### Histograms

 %% Cell type:code id:5056a717-6561-41ec-be39-1757984863a9 tags:

 ``` python

 plt.hist(data[:, 0], bins=np.arange(-0.25,6.26,0.5))
 plt.xlabel("k")
 #plt.savefig("hist.pdf")
 plt.show()
 ```

 %% Output



-%% Cell type:code id:e30c1224 tags:
+%% Cell type:code id:765d2de4 tags:

 ``` python
 plt.hist(data[:, 1], bins=np.arange(-0.25,6.26,0.5))
 plt.xlabel("l")
 plt.show()
 ```

 %% Output



 %% Cell type:markdown id:4183331e-8a9f-4af5-829f-3fcb9b2abb31 tags:

 ### Cumulative Distribution

 %% Cell type:code id:4ecfb198-c621-4ee4-9d24-9f9a8cb8bec7 tags:

 ``` python
 plt.hist(data[:, 0], bins=100, cumulative=True, density=True, label="cumulative")
 plt.xlabel("k")
 #plt.savefig("hist2.pdf")
 print("median", np.median(data[:, 0]))
 ```

 %% Output

    median 1.0
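The normalised cumulative histogram approximates the empirical cumulative distribution function (ECDF). A minimal sketch of computing it directly, assuming a small made-up integer sample `x` in place of `data[:, 0]`: each distinct value gets the fraction of entries less than or equal to it, and the median is the smallest value at which the ECDF reaches 0.5.

```python
import numpy as np

# Hypothetical integer sample (stand-in for data[:, 0]), odd length N = 9
x = np.array([3, 0, 1, 6, 1, 0, 2, 5, 1])

# Empirical CDF: F(v) = fraction of sample values <= v
values, counts = np.unique(x, return_counts=True)
ecdf = np.cumsum(counts) / x.size

# First index where the ECDF is >= 0.5
median_from_ecdf = values[np.searchsorted(ecdf, 0.5)]

print(dict(zip(values.tolist(), ecdf.round(3).tolist())))
print("median from ECDF:", median_from_ecdf, " np.median:", np.median(x))
```

For an even sample size `np.median` averages the two middle values, so it can differ from this "smallest value with $F \geq 0.5$" definition.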



 %% Cell type:code id:d00cadf3-0cfa-4f59-9962-4b9e6b707e0b tags:

 ``` python
 ```

 %% Cell type:markdown id:ceb5ca27-a96a-4ee3-967b-5f02efe66540 tags:

 ### Means

 ---

 Different means:
 -  arithmetic mean: $$ \overline{x} = E[x] = <x> = \frac{1}{n}\sum\limits_{i=1}^n x_i (= \mu)$$
 -  geometric mean: $$ \overline{{x}}_\mathrm {geom} = \sqrt[n]{\prod\limits_{i=1}^{n}{x_i}}$$
 -  quadratic mean: $$ \overline{{x}}_\mathrm{quadr} = \sqrt{E[x^2]} = \sqrt {\frac {1}{n} \sum\limits_{i=1}^{n}x_i^2} = \sqrt{\overline{x^2}} $$
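A short sketch evaluating the three means for the small example sample $x = (1, 2, 4)$ (chosen for illustration); `scipy.stats.gmean` computes the geometric mean.

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 4.0])  # small example sample

arithmetic = np.mean(x)             # (1 + 2 + 4) / 3
geometric = stats.gmean(x)          # (1 * 2 * 4) ** (1/3)
quadratic = np.sqrt(np.mean(x**2))  # sqrt((1 + 4 + 16) / 3)

print(arithmetic, geometric, quadratic)
```

For positive values the ordering geometric $\leq$ arithmetic $\leq$ quadratic always holds, with equality only if all $x_i$ are equal.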


-%% Cell type:markdown id:3f0e1262 tags:
+%% Cell type:markdown id:694119ec tags:

 ### Variance

 -   variance
    $$V = E[(x - \mu)^2]$$
    with $\mu = E[x] = \overline x$:
    $$V = E[(x - \overline x)^2] = E[x^2 - 2x\overline x + {\overline x}^2] = E[x^2] - 2\overline x E[x] + {\overline x}^2 = E[x^2] - 2{\overline x}^2 + {\overline x}^2 = E[x^2] - (E[x])^2$$
    for a sample $X = x_1, x_2,\dots, x_N$:
    $$V = \frac{1}{N}\sum\limits_{i=1}^N (x_i - \overline x)^2$$

 -   standard deviation:
    $$\sigma = \sqrt{V}$$
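A quick numerical check of $V = E[x^2] - (E[x])^2$ on a made-up sample. `np.var` uses the $1/N$ normalisation of the formula above by default (`ddof=0`); `ddof=1` gives the $1/(N-1)$ version, the unbiased estimator of the population variance.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # example sample, N = 8

mean = np.mean(x)
v_def = np.mean((x - mean) ** 2)    # (1/N) * sum (x_i - mean)^2
v_short = np.mean(x**2) - mean**2   # E[x^2] - (E[x])^2

print(mean, v_def, v_short, np.var(x))  # np.var defaults to ddof=0 (1/N)
print(np.var(x, ddof=1))                # 1/(N-1) normalisation
print(np.std(x), np.sqrt(v_def))        # sigma = sqrt(V)
```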


 %% Cell type:markdown id:68cea392-8d4f-48d7-aacc-0f10d46af3af tags:

 ### Exercise: Compute mean and variance of $X$

-%% Cell type:code id:2ea96198 tags:
+%% Cell type:code id:05e7e4b9 tags:

 ``` python
 ```

 %% Cell type:code id:76d92d18-1d77-40d2-a910-592183635d3b tags:

 ``` python
 print("mean", np.mean(data, axis=0))
 ```

 %% Output

    mean [1.56535948 1.26470588]

 %% Cell type:code id:5f0f34b4-5bbe-439b-8b4e-fbb05794b790 tags:

 ``` python
 print("variance", np.var(data, axis=0))
 ```

 %% Output

    variance [1.85357128 1.27306805]

 %% Cell type:code id:70a1920f-beda-4154-ad77-22a9ffe2e39f tags:

 ``` python
 print("standard deviation:", np.std(data, axis=0))
 ```

 %% Output

    standard deviation: [1.36145925 1.12830317]

-%% Cell type:markdown id:539b2083 tags:
+%% Cell type:markdown id:b776b008 tags:

 ### Covariance

 - covariance: $$\text{cov}(X,Y) = E[(X-E[X])(Y-E[Y])] = E[XY] - E[X]E[Y]$$

 for samples $X = x_1, x_2,\dots, x_N$ and $Y = y_1, y_2,\dots, y_N$:
 $$\text{cov}(X,Y) = \frac{1}{N}\sum\limits_{i=1}^N (x_i - \overline x)(y_i - \overline y)$$

 - correlation $$\rho_{xy} = \frac{\text{cov}(X,Y)}{\sqrt{\text{cov}(X,X)\text{cov}(Y,Y)}}$$
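A sketch that evaluates the sample formulas directly and compares them with NumPy, using made-up paired samples `x` and `y`. Note that `np.cov` normalises with $1/(N-1)$ by default, while `bias=True` reproduces the $1/N$ definition above; this is why the diagonal of `np.cov` in the exercise below is slightly larger than the `np.var` output above. The correlation coefficient does not depend on this choice because the factor cancels.

```python
import numpy as np

# Hypothetical paired sample
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
n = x.size

# cov(X, Y) = (1/N) * sum (x_i - mean_x)(y_i - mean_y)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / n
rho = cov_xy / np.sqrt(np.var(x) * np.var(y))

C_biased = np.cov(x, y, bias=True)  # 1/N, matches the formula above
C_default = np.cov(x, y)            # 1/(N-1), NumPy's default

print(cov_xy, C_biased[0, 1], C_default[0, 1])
print(rho, np.corrcoef(x, y)[0, 1])
```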

-%% Cell type:markdown id:27a53f23 tags:
+%% Cell type:markdown id:d39a9f7b tags:

 ### Exercise: compute the covariance and correlation of columns 1 and 2

-%% Cell type:code id:ef09b3b5 tags:
+%% Cell type:code id:2f063c17 tags:

 ``` python
 ```

-%% Cell type:code id:de7df23f tags:
+%% Cell type:code id:f09b7bc1 tags:

 ``` python
 print(np.cov(data, rowvar=False))
 ```

 %% Output

    [[ 1.85964856 -0.1927676 ]
     [-0.1927676   1.27724204]]

-%% Cell type:code id:1edc9156 tags:
+%% Cell type:code id:2c8e6805 tags:

 ``` python
 print(np.corrcoef(data, rowvar=False))
 ```

 %% Output

    [[ 1.         -0.12507831]
     [-0.12507831  1.        ]]

 %% Cell type:markdown id:b8ab42a5-b7af-4942-9a7a-91b73e0ce625 tags:

 # Probability Density Functions


 Let $x$ be a real number that describes the outcome of a random
 experiment.

 Probability density $f(x)$ (probability density function, pdf):

 -   probability that $x$ lies in the interval $[x, x + dx]$:
    $f(x)\,dx$

 -   normalization over the sample space $S$: $$\int_S  f(x)\,dx = 1$$

 Cumulative distribution $F(x)$ (cumulative distribution function, cdf;
 in mathematics simply called the distribution function):
 probability that the outcome $x^\prime$ is smaller than $x$:
 $$F(x) = \int_{-\infty}^x  f(x^\prime)\,dx^\prime$$

 %% Cell type:markdown id:13563524-0626-4a69-a6ab-f87d7f19a016 tags:

 ### Example

 $$P(a \le x \le b) =  \int_a^b  f(x)\,dx = F(b) - F(a)$$
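A sketch checking $P(a \le x \le b) = F(b) - F(a)$ numerically, assuming a log-normal density like the one in the figures (the shape parameter `s=0.5` and the interval are arbitrary choices): integrate the pdf with `scipy.integrate.quad` and compare with the difference of the cdf.

```python
import numpy as np
from scipy import integrate, stats

dist = stats.lognorm(s=0.5)  # log-normal with shape 0.5 (assumed example)
a, b = 0.5, 2.0

p_integral, _ = integrate.quad(dist.pdf, a, b)  # integral of f(x) from a to b
p_cdf = dist.cdf(b) - dist.cdf(a)               # F(b) - F(a)
norm, _ = integrate.quad(dist.pdf, 0, np.inf)   # normalization, should be 1

print(p_integral, p_cdf, norm)
```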


-%% Cell type:markdown id:f087bd39 tags:
+%% Cell type:markdown id:8e1a7f44 tags:

 <img src="./figures/08/Lognormal_distribution_PDF.svg" style="width:100%" />

-%% Cell type:markdown id:3f86b608 tags:
+%% Cell type:markdown id:66c8b422 tags:

 <img src="./figures/08/CDF-log_normal_distributions.svg" style="width:200%" />

 %% Cell type:markdown id:14e4b782-9573-4e87-89d0-901dade25538 tags:

 ### Quantiles


 Quantile $x_\alpha$ is the value of the random variable $x$ with
 $$F(x_\alpha) = \int_{-\infty}^{x_\alpha} f(x)\,dx = \alpha$$
 Hence: $$x_\alpha = F^{-1}(\alpha)$$

 Median: $x_{\frac{1}{2}}$
 $$F(x_{\frac{1}{2}}) = 0.5$$ $$x_{\frac{1}{2}} = F^{-1}(0.5)$$


 <img src="./figures/09/Normalverteilung.png" width=38% alt="image" />
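In `scipy.stats`, $F^{-1}$ is available as the percent-point function `ppf`. A sketch for the standard normal distribution (an assumed example), together with the sample counterpart `np.quantile` applied to pseudo-random draws (seed and sample size arbitrary):

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=0.0, scale=1.0)

median = dist.ppf(0.5)               # x_{1/2} = F^{-1}(0.5)
x_90 = dist.ppf(0.9)                 # 90% quantile
print(median, x_90, dist.cdf(x_90))  # F(x_alpha) gives back alpha

# Sample quantiles of pseudo-random draws approach the true quantiles
rng = np.random.default_rng(1)
sample = dist.rvs(size=100_000, random_state=rng)
sample_q = np.quantile(sample, [0.5, 0.9])
print(sample_q)
```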


 %% Cell type:markdown id:8e74c511-b730-4781-a751-e1895983cc4c tags:

 ### Multidimensional probability densities

 Example: two quantities are measured at once, described by the random vector
 $(x, y)$.
 Event A: $x$ within $[x, x + dx]$, $y$ arbitrary
 Event B: $y$ within $[y, y + dy]$, $x$ arbitrary
 $$P(A \cap B) = \text{prob. for $x$ in $[x, x + dx]$ and $y$ in $[y, y + dy]$} = f(x, y)\,dx\,dy$$


 Marginal distributions: $$f_x(x) =  \int_{-\infty}^\infty f(x,y)\,dy$$
 $$f_y(y) =  \int_{-\infty}^\infty f(x,y)\,dx$$

 <img src="./figures/09/W_top.png" alt="image" />
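A numerical sketch of the marginal distribution, assuming a correlated bivariate normal as the joint density $f(x, y)$ (unit variances, correlation 0.6, chosen for illustration). Summing $f(x, y)\,\Delta y$ over a fine grid in $y$ approximates $\int f(x,y)\,dy$; for this density the marginal in $x$ is a standard normal.

```python
import numpy as np
from scipy import stats

# Assumed joint density: bivariate normal, unit variances, correlation 0.6
joint = stats.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]])

x = np.linspace(-3, 3, 7)
y = np.linspace(-8, 8, 4001)
dy = y[1] - y[0]

# Grid of (x, y) points with shape (len(x), len(y), 2); pdf returns (len(x), len(y))
X, Y = np.meshgrid(x, y, indexing="ij")
f_xy = joint.pdf(np.dstack([X, Y]))

f_x = f_xy.sum(axis=1) * dy            # Riemann sum approximating the integral over y
print(np.round(f_x, 5))
print(np.round(stats.norm.pdf(x), 5))  # expected marginal: standard normal
```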

 %% Cell type:markdown id:7624d391-9fd5-4567-bf84-632eead1bb20 tags:

 ### Marginal distributions

 <img src="./figures/09/W.png" style="width:47.0%"
 alt="image" />
 <img src="./figures/09/top.png" style="width:47.0%"
 alt="image" />


 Conditional distributions: $$g(x|y) = \frac{f(x,y)}{f_y(y)}$$
 $$h(y|x) = \frac{f(x,y)}{f_x(x)}$$

 <img src="./figures/09/top_cond.png" style="width:96.0%"
 alt="image" />

 %% Cell type:markdown id:ba8fae9b-636b-4b7e-a36d-3be582c000f5 tags:

 ### Bayes' theorem

 $g(x|y) = \frac{f(x,y)}{f_y(y)}$ and
 $h(y|x) = \frac{f(x,y)}{f_x(x)}$

 Theorem: $$g(x|y) = \frac{h(y|x) f_x(x)}{f_y(y)}$$

 With $f(x,y) = h(y|x) f_x(x) = g(x|y) f_y(y)$:
 $$f_x(x) =  \int_{-\infty}^\infty g(x|y) f_y(y)\,dy$$
 $$f_y(y) =  \int_{-\infty}^\infty h(y|x) f_x(x)\,dx$$

 <img src="./figures/08/bayes.gif" style="width:80.0%" />
 most likely not Bayes
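A discrete sketch of the same relations, using a made-up joint probability table $P(x, y)$ for two binary variables in place of the densities: marginals are row and column sums, conditionals are ratios to the marginals, and Bayes' theorem recovers $g(x|y)$ from $h(y|x)$.

```python
import numpy as np

# Hypothetical joint probabilities P(x, y); rows: x = 0, 1; columns: y = 0, 1
P = np.array([[0.30, 0.10],
              [0.20, 0.40]])

f_x = P.sum(axis=1)  # marginal of x (sum over y)
f_y = P.sum(axis=0)  # marginal of y (sum over x)

g = P / f_y           # g(x|y) = f(x, y) / f_y(y); each column sums to 1
h = P / f_x[:, None]  # h(y|x) = f(x, y) / f_x(x); each row sums to 1

g_bayes = h * f_x[:, None] / f_y  # Bayes: g(x|y) = h(y|x) f_x(x) / f_y(y)
print(g)
print(np.allclose(g, g_bayes))

# Marginal from the conditional: f_y(y) = sum over x of h(y|x) f_x(x)
print(f_y, (h * f_x[:, None]).sum(axis=0))
```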

 %% Cell type:markdown id:fdd11743-1e7b-48b6-9e9b-41bee7d4f23b tags:

 ## Functions of random variables

 ### Functions of random variables

 Let $x$ be a random variable, $f(x)$ its probability density and
 $a(x)$ a continuous function.

 What is the probability density $g(a)$? The probability for $x$ in $[x, x+dx]$
 must equal the probability for $a$ in $[a, a+da]$:
 $$g(a)\,da = \int_{dS} f(x)\,dx,$$ where $dS$ is the region of $x$ mapped into $[a, a+da]$. If the inverse function $x(a)$
 exists:
 $$g(a)\,da = \left| \int_{x(a)}^{x(a +da)} f(x^\prime)\,dx^\prime \right| = \int_{x(a)}^{x(a) + |\frac{dx}{da}|da} f(x^\prime)\,dx^\prime$$
 or $$g(a) = f(x(a)) \left|\frac{dx}{da}\right|$$

 %% Cell type:markdown id:525847ad-2ddf-468c-883f-35bc853bac8c tags:

 ### Examples

 Example 1: $a(x) = \sqrt{x}$, $x(a) = a^2$. For $x$ uniformly distributed
 between 0 and 1, i.e. $u(x) = 1$, the probability density
 $g(a)$ is:
 $$g(a) =  u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot   \left|\frac{da^2}{da}\right| = 2a \text{ (linearly distributed)}$$
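A Monte Carlo check of Example 1 (seed and sample size are arbitrary): draw $x$ uniformly in $[0, 1]$, take $a = \sqrt{x}$, and compare the normalised histogram with $g(a) = 2a$.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=200_000)  # u(x) = 1 on [0, 1]
a = np.sqrt(x)                           # transformed variable

counts, edges = np.histogram(a, bins=10, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

print(np.round(counts, 3))
print(np.round(2 * centers, 3))  # g(a) = 2a evaluated at the bin centres
```

Because $g$ is linear, its average over each bin equals its value at the bin centre, so the two rows should agree up to statistical fluctuations.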

 Example 2: $a(x) = F^{-1}(x)$, $x(a) = F(a)$. For $x$ uniformly distributed
 between 0 and 1, i.e. $u(x) = 1$, the probability density
 $g(a)$ is:
 $$g(a) =  u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot   \left|\frac{dF(a)}{da}\right| = f(a) \text{ (qed).}$$
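Example 2 is the basis of inverse-transform sampling: applying $F^{-1}$ to uniform random numbers yields numbers distributed according to $f$. A sketch, assuming an exponential distribution with $F(a) = 1 - e^{-a/\tau}$, hence $F^{-1}(u) = -\tau \ln(1 - u)$, with $\tau = 2$ chosen for illustration:

```python
import numpy as np
from scipy import stats

tau = 2.0
rng = np.random.default_rng(7)
u = rng.uniform(0.0, 1.0, size=100_000)

a = -tau * np.log1p(-u)  # F^{-1}(u) = -tau * ln(1 - u) for the exponential

print(a.mean(), a.std())  # both should be close to tau
# Kolmogorov-Smirnov test against the exponential cdf with scale tau
ks = stats.kstest(a, stats.expon(scale=tau).cdf)
print(ks)
```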

 %% Cell type:markdown id:0c669bab-6c81-45fb-a506-8ef709cd4687 tags:

 ### Functions of random vectors

 Let $\vec x$ be a random vector, $f(\vec x)$ its
 probability density and $\vec a(\vec x)$ a continuous, invertible function of the same dimension $n$.

 What is the probability density $g(\vec a)$?
 $$g(\vec a) = f(\vec x(\vec a)) \left| \det J \right| \quad \text{with } J = \left(
 \begin{array}{cccc}
 \frac{\partial x_1}{\partial a_1} &   \frac{\partial x_1}{\partial a_2}  & \dots  & \frac{\partial x_1}{\partial a_n} \\[6pt]
 \frac{\partial x_2}{\partial a_1} &   \frac{\partial x_2}{\partial a_2}  & \dots &  \frac{\partial x_2}{\partial a_n} \\[6pt]
 \vdots                & \vdots & \ddots & \vdots \\[6pt]
 \frac{\partial x_n}{\partial a_1} &    \frac{\partial x_n}{\partial a_2}  &  \dots &  \frac{\partial x_n}{\partial a_n}
 \end{array}\right)$$
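A sketch that checks the Jacobian formula for a linear transformation (an assumed example): if $\vec x$ follows a two-dimensional standard normal and $\vec a = A\vec x$ with an invertible matrix $A$, then $\vec x(\vec a) = A^{-1}\vec a$, $J = A^{-1}$ and $g(\vec a) = f(A^{-1}\vec a)\,|\det A^{-1}|$. The result should agree with the known distribution of $\vec a$, a normal with covariance $AA^T$.

```python
import numpy as np
from scipy import stats

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])  # invertible transformation a = A x
A_inv = np.linalg.inv(A)    # x(a) = A^{-1} a, so J = A^{-1}

f = stats.multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))  # density of x

def g(a):
    """Density of a via g(a) = f(x(a)) |det J|."""
    return f.pdf(A_inv @ a) * abs(np.linalg.det(A_inv))

# Known result for comparison: a is normal with covariance A A^T
g_exact = stats.multivariate_normal(mean=[0.0, 0.0], cov=A @ A.T)

for point in ([0.0, 0.0], [1.0, -0.5], [2.0, 3.0]):
    a = np.array(point)
    print(a, g(a), g_exact.pdf(a))
```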

 %% Cell type:markdown id:4b7f6667-c019-4b23-83fd-a1758d748762 tags:

 ## Basic concepts

 ### Basic concepts

 Discrete random variable, mean:
 $$<r> =  \bar r = \sum _{i=1}^N r_i P(r_i)$$

 Continuous random variable with probability density $f(x)$, where

 -   $P(a \leq x \leq b) = \int_a^b f(x)\,dx$

 -   $f(x) \geq 0$

 -   $\int_{-\infty}^{\infty} f(x)\,dx = 1$

 Mean:
 $$<x> = \bar x = \int_{-\infty}^{\infty} x \, f(x)\,dx = \mu_x$$
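A sketch of both definitions, using a fair die as the discrete example and the exponential density $f(x) = e^{-x/\tau}/\tau$ for $x \geq 0$ with $\tau = 2$ as the continuous one (both chosen for illustration); the integrals are evaluated with `scipy.integrate.quad`.

```python
import numpy as np
from scipy import integrate

# Discrete: fair die, r_i = 1..6 with P(r_i) = 1/6
r = np.arange(1, 7)
P = np.full(6, 1 / 6)
die_mean = np.sum(r * P)  # mathematically 21/6 = 3.5

# Continuous: exponential density for x >= 0
tau = 2.0

def f(x):
    return np.exp(-x / tau) / tau

norm, _ = integrate.quad(f, 0, np.inf)                   # mathematically 1
mean, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)  # mathematically tau

print("die mean:", die_mean)
print("normalization:", norm, " mean:", mean)
```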

 %% Cell type:markdown id:2907584f-bb17-463e-b084-749f2011bd4c tags:

 ## Expectation values and moments

 ### Expectation value

 Definition: the expectation value of the function $h(x)$ for the
 probability density $f(x)$ is
 $$E[h] = \int_{-\infty}^{\infty} h(x) \, f(x)\,dx$$

 Special case $h(x) = x$:
 $$E[x] = \int_{-\infty}^{\infty} x \, f(x)\,dx = <x>$$

 The expectation value is a linear operator:
 $$E[a\cdot g(x) + b \cdot h(x)] = a\cdot E[g(x)] + b\cdot E[h(x)]$$

 %% Cell type:markdown id:5cd273e7-3129-454b-a31c-4e91c4316bf4 tags:

 ### Variance and standard deviation

 Variance $V[x]$

 -   a measure of the width of a probability density

 -   second central moment

 -   definition:
    $$V[x] =  E[(x - \mu_x)^2] = \int_{-\infty}^{\infty} (x-\mu_x)^2 \, f(x)\,dx$$

 -   useful formulas:
    $V[x] = E[x^2] - <x>^2$ and
    $V[ax] = a^2 V[x]$
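A sketch verifying both formulas for a uniform density on $[0, 1]$ (an assumed example), where $V[x] = 1/12$: the integrals are computed with `scipy.integrate.quad`, and scaling by $a = 3$ uses `scipy.stats.uniform`, which is uniform on `[loc, loc + scale]`.

```python
from scipy import integrate, stats

# Uniform density on [0, 1]: f(x) = 1
mu, _ = integrate.quad(lambda x: x, 0, 1)               # E[x]
m2, _ = integrate.quad(lambda x: x**2, 0, 1)            # E[x^2]
v_def, _ = integrate.quad(lambda x: (x - mu)**2, 0, 1)  # E[(x - mu)^2]

print(v_def, m2 - mu**2, 1 / 12)

# V[ax] = a^2 V[x]: scaling x by a = 3 gives a uniform density on [0, 3]
a = 3.0
print(stats.uniform(loc=0, scale=a).var(), a**2 * v_def)
```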

 %% Cell type:markdown id:30d01c8b-50a5-4d90-8e0b-89a68f52f9bd tags:

 ### Variance and standard deviation

 Standard deviation $\sigma$

 -   a measure of the size of the statistical fluctuations of the
    random variable around its mean

 -   in physics often “the error”

 -   definition: $$\sigma = \sqrt{V[x]}$$

 %% Cell type:markdown id:ae67472e-f9b2-46fa-9679-15553d31aaaf tags:

 ### Covariance

 Covariance $V_{xy}$ of two random variables $x$ and $y$:
 $$V_{xy} =  E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y$$
 $$V_{xy} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\,dx \,dy - \mu_x\mu_y$$

 Covariance $V_{ab} = \text{cov}[a, b]$ of two functions $a$ and $b$ of the
 random vector $\vec x$:
 $$\text{cov}[a, b] =  E[(a - \mu_a)(b - \mu_b)] = E[ab] - \mu_a \mu_b$$
 $$\text{cov}[a, b]  = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} a(\vec x)\, b(\vec x)\, f(\vec x)\,dx_1 \dots \,dx_n - \mu_a\mu_b$$

 %% Cell type:markdown id:576b4570-8a06-4224-b867-31459483e6bb tags:

 ### Covariance matrix

 $$C = \left(
  \begin{array}{rr}
  V_{xx} & V_{xy} \\
  V_{yx} & V_{yy}\\
  \end{array}
  \right)$$

 Remarks:

 -   also called the error matrix

 -   $V_{xy} = V_{yx}$: the matrix is symmetric

 -   $V_{ii} \geq 0$ (variances on the diagonal); the matrix is positive semidefinite

 -   correlation matrix: $$C^\prime = \left(
      \begin{array}{rr}
      V_{xx}/V_{xx} & V_{xy}/\sqrt{V_{xx}V_{yy}} \\
      V_{yx}/\sqrt{V_{xx}V_{yy}} & V_{yy}/V_{yy}\\
      \end{array}
      \right) = \left(
      \begin{array}{rr}
      1 & \rho_{xy} \\
      \rho_{xy} & 1\\
      \end{array}
      \right)$$

 -   correlation coefficient:
    $\rho_{xy} = \frac{V_{xy}}{\sqrt{V_{xx}V_{yy}}}$
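A sketch with simulated data (the covariance matrix `C_true`, sample size and seed are assumptions): estimate $C$ from a sample, check that it is symmetric and positive semidefinite via its eigenvalues, and form the correlation matrix by dividing by $\sqrt{V_{ii}V_{jj}}$, which reproduces `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(3)
C_true = np.array([[2.0, 0.8],
                   [0.8, 1.0]])
xy = rng.multivariate_normal([0.0, 0.0], C_true, size=50_000)

C = np.cov(xy, rowvar=False)         # columns are the variables x and y
sigma = np.sqrt(np.diag(C))          # standard deviations sqrt(V_ii)
C_corr = C / np.outer(sigma, sigma)  # V_ij / sqrt(V_ii V_jj)

print(C)
print(np.allclose(C, C.T), np.linalg.eigvalsh(C))  # symmetric, eigenvalues >= 0
print(C_corr)
print(np.corrcoef(xy, rowvar=False))
```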

 %% Cell type:markdown id:ba08bfd1-4c17-4a1d-bdf6-9c517edffdd8 tags:

 # Probability densities

 ## Discrete distributions

 ### Binomial distribution

 If $p$ is the probability for an event to occur, the probability that it
 occurs $k$ times in $n$ trials is given by the binomial distribution:
 $$P(k) = {n \choose k} p^k(1-p)^{n-k} \text{,  } k = 0,1,2,\dots,n$$

 Expectation value and variance:
 $$<k> = E[k] = \sum \limits_{k = 0}^{n} k P(k) = np$$
 $$V[k] = \sigma^2 = np(1-p)$$
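A numerical check of these formulas by summing over the probabilities, assuming `scipy.stats.binom` for $P(k)$ and the arbitrary values $n = 20$, $p = 0.3$:

```python
import numpy as np
from scipy import stats

n, p = 20, 0.3
k = np.arange(n + 1)
P = stats.binom.pmf(k, n, p)

mean = np.sum(k * P)               # sum_k k P(k)
var = np.sum((k - mean) ** 2 * P)  # sum_k (k - <k>)^2 P(k)

print(P.sum(), mean, n * p, var, n * p * (1 - p))
```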

 %% Cell type:markdown id:c1dbdd5b-c1f4-40b3-bad6-d6639a9a635c tags:

 ### Example

 Tossing five coins: $n = 5$, $p = 0.5$

 | k    |  0   |  1   |   2   |   3   |  4   |  5   |
 |:-----|:----:|:----:|:-----:|:-----:|:----:|:----:|
 | P(k) | 1/32 | 5/32 | 10/32 | 10/32 | 5/32 | 1/32 |

 <img src="./figures/08/binom5.pdf" style="width:75.0%" />
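The table can be reproduced with `scipy.stats.binom` (a sketch); multiplying by $2^5 = 32$ gives the binomial coefficients $\binom{5}{k}$ in the numerators.

```python
import numpy as np
from scipy import stats
from scipy.special import comb

k = np.arange(6)
P = stats.binom.pmf(k, 5, 0.5)

print(np.round(P * 32, 10))  # numerators of the table
print(comb(5, k))            # binomial coefficients "5 choose k"
```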

 ### Example II

 Error on the efficiency of a selection cut: the efficiency of a selection
 cut and its error are to be determined when $k$ out of $n$ data points
 in a sample pass this cut.
 The random variable is the measured efficiency $h_k = \frac{k}{n}$.
 How large is its error?
 The numbers $k$ follow a binomial distribution with the
 probability $p_k = E[h_k] = E[\frac{k}{n}]$: $$\begin{aligned}
      \sigma(h_k) &= \sqrt{V[\frac{k}{n}]} = \sqrt{\frac{1}{n^2} V[k]} = \sqrt{\frac{1}{n^2}\cdot np_k(1-p_k)}\\
      &= \sqrt{\frac{p_k(1-p_k)}{n}}
 \end{aligned}$$
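A sketch with made-up numbers ($k = 90$ of $n = 100$ events pass the cut), using the measured efficiency as the estimate of $p_k$, plus a pseudo-experiment check of the spread of $k/n$ (seed and number of pseudo-experiments arbitrary):

```python
import numpy as np

n, k = 100, 90
eff = k / n                               # measured efficiency h_k
sigma_eff = np.sqrt(eff * (1 - eff) / n)  # sqrt(p (1 - p) / n), p estimated by h_k
print(f"efficiency = {eff:.2f} +- {sigma_eff:.3f}")

# Pseudo-experiments: binomial counts with p = eff, spread of k/n
rng = np.random.default_rng(0)
h = rng.binomial(n, eff, size=100_000) / n
print("spread from pseudo-experiments:", h.std())
```

With $p_k$ estimated by $k/n$, the formula gives $\sigma = 0$ for $k = 0$ or $k = n$, so it should not be relied on in those limiting cases.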

 %% Cell type:markdown id:60849c4a-dad2-4acd-9780-f9560da2ed9b tags:

 ### Poisson distribution

 The Poisson distribution gives the probability of obtaining
 exactly $k$ events when the number of trials $n$ is very
 large and the probability $p$ is very small. With $\mu = np$:
 $$P(k) = \frac{\mu^ke^{-\mu}}{k!}$$

 Expectation value and variance (the $k = 0$ term vanishes; substitute $s = k - 1$):
 $$E[k] = \sum \limits_{k = 0}^{\infty} k \frac{e^{-\mu}\mu^k}{k!}
      = \mu \sum \limits_{k = 1}^{\infty} \frac{e^{-\mu}\mu^{k-1}}{(k-1)!}
      = \mu \sum \limits_{s = 0}^{\infty} \frac{e^{-\mu}\mu^{s}}{s!} = \mu$$
 $$V[k] = \sigma^2 = \mu$$
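A numerical check, assuming `scipy.stats.poisson` and $\mu = 3.5$ (arbitrary); the sums are truncated at $k = 100$, where the remaining terms are negligible.

```python
import numpy as np
from scipy import stats

mu = 3.5
k = np.arange(0, 101)
P = stats.poisson.pmf(k, mu)

mean = np.sum(k * P)
var = np.sum((k - mean) ** 2 * P)
print(P.sum(), mean, var)  # close to 1, mu and mu
```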

 ### Poisson and binomial distributions

 Binomial distribution with $n = 1000$ and $p = 0.01$,
 Poisson distribution with $\mu = 10$ (hatched)

 <img src="./figures/08//bp.jpg" style="width:85.0%"
 alt="image" />
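The comparison in the figure can be sketched numerically with `scipy.stats` (the range of $k$ values printed is an arbitrary choice):

```python
import numpy as np
from scipy import stats

k = np.arange(0, 31)
p_binom = stats.binom.pmf(k, 1000, 0.01)
p_poisson = stats.poisson.pmf(k, 10)  # mu = n p = 10

for ki in (5, 10, 15):
    print(ki, round(p_binom[ki], 4), round(p_poisson[ki], 4))
print("largest difference:", np.max(np.abs(p_binom - p_poisson)))
```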

 %% Cell type:markdown id:7e80723a-ac12-4af1-a43a-70593ef791b8 tags:

 ### Example from many old statistics textbooks

 Deaths by horse kicks in the Prussian army

 In the Prussian army, the number of deaths caused by horse kicks was recorded
 for every year and every army corps. For 20 years
 (1875–1894) and 14 army corps one obtains:

 | Number of deaths $k$                  |   0 |   1 |   2 |   3 |   4 |   5 |   6 |
 |:--------------------------------------|----:|----:|----:|----:|----:|----:|----:|
 | Number of corps-years with $k$ deaths | 144 |  91 |  32 |  11 |   2 |   0 |   0 |

 <img src="./figures/08/poisson70.png" style="width:55.0%" />

 Poisson distribution for $\mu = \frac{196}{280} = 0.70$
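A sketch comparing the table with the Poisson expectation $280 \cdot P(k; \mu)$, where $\mu$ is estimated as the mean number of deaths per corps-year; the observed counts are copied from the table.

```python
import numpy as np
from scipy import stats

k = np.arange(7)
observed = np.array([144, 91, 32, 11, 2, 0, 0])  # corps-years with k deaths

n_corps_years = observed.sum()                # 20 years x 14 corps = 280
mu = np.sum(k * observed) / n_corps_years     # 196 / 280 = 0.70
expected = n_corps_years * stats.poisson.pmf(k, mu)

for ki, o, e in zip(k, observed, expected):
    print(f"k={ki}: observed {o:3d}, expected {e:6.1f}")
```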

 %% Cell type:markdown id:b9772cfd-74fb-4a8b-9c0c-fd2c3554a986 tags:

 # What is meant with error/uncertainty on a measured quantity?

 %% Cell type:markdown id:42e65c7a-4636-4319-b21a-acc95140c2de tags:

 If we quote $a = 1 \pm 0.5$, we usually mean that the probability for the *true* value of $a$ follows a Gaussian distribution $G(a, \mu, \sigma)$ with $\mu = 1$ and $\sigma = 0.5$.

 %% Cell type:markdown id:4b8a5b72-aa82-4acf-9499-8736ed6246f8 tags:

 # How often can/should the measurement be outside one sigma?

 %% Cell type:markdown id:1f03bf12-f17b-409d-8932-4e3e24023445 tags:

 Let's use pseudo-experiments/Monte Carlo:

 * generate 10,000 Gaussian distributed measurements
 * count how often they differ from the central value by more than one sigma

 %% Cell type:markdown id:40541a16-abc8-4f5b-b504-71951aa891f5 tags:

 Relatively easy with *scipy* and *numpy*:
 * use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
 * use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) class

 %% Cell type:code id:5e55929e-7028-4ae2-9e05-29812e933733 tags:

 ``` python
 import scipy.stats as stats
 import numpy as np

 pseudo_a = stats.norm.rvs(loc=1, scale=0.5, size=10000)  # 10,000 pseudo-measurements
 print(pseudo_a)
 is_outside = np.abs(pseudo_a - 1) > 0.5  # more than one sigma away from mu = 1
 print(is_outside)
 print("fraction outside one sigma:", np.mean(is_outside))
 ```

 %% Output

    [0.62483765 1.14886134 0.26443731 ... 1.71203625 1.6365167  1.52227868]
    [False False  True ...  True  True  True]
    fraction outside one sigma: 0.3178
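For comparison, the exact probability for a Gaussian variable to lie more than one standard deviation from its mean is $1 - [\Phi(1) - \Phi(-1)] \approx 0.317$, with $\Phi$ the standard normal cdf. The Monte Carlo fraction fluctuates around it with a binomial spread of about $\sqrt{p(1-p)/N}$, so the value 0.3178 above is well within the expected fluctuation. A sketch:

```python
import numpy as np
from scipy import stats

p_exact = 2 * stats.norm.sf(1.0)  # P(|x - mu| > sigma) = 1 - (Phi(1) - Phi(-1))
n = 10_000
spread = np.sqrt(p_exact * (1 - p_exact) / n)  # binomial uncertainty of the MC fraction
print(f"exact: {p_exact:.4f}, expected MC spread for N = {n}: {spread:.4f}")
```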

 %% Cell type:markdown id:8abb14b4-80fd-494a-89a0-310bceb277dc tags:

 # Why is it a Gaussian?

 %% Cell type:markdown id:85185cef-6b18-4c03-8040-437f1fd40b9e tags:

 Central limit theorem:

 "let $X_{1},X_{2},\dots ,X_{n}$ denote a statistical sample of size $n$  from a population with expected value (average) $\mu$ and finite positive variance $\sigma ^{2}$, and let $\bar {X_{n}}$ denote the sample mean (which is itself a random variable). Then the limit as $n\to \infty$ of the distribution of $\frac {({\bar {X}}_{n}-\mu )}{\frac {\sigma }{\sqrt {n}}}$, is a normal distribution with mean 0  and variance 1."

 %% Cell type:code id:ac354d95-cede-4215-8138-b0d7c6ae9a5e tags:

 ``` python
 ```
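A Monte Carlo sketch of the central limit theorem, assuming uniformly distributed $X_i$ on $[0, 1]$ ($\mu = 1/2$, $\sigma^2 = 1/12$), samples of size $n = 12$ and 100,000 repetitions (all arbitrary choices): the standardised sample means should have mean close to 0, standard deviation close to 1, and about 68% of them should lie within $\pm 1$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, n_repeat = 12, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)  # mean and std of the uniform distribution on [0, 1]

samples = rng.uniform(0.0, 1.0, size=(n_repeat, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))  # standardised sample means

print(z.mean(), z.std())
print("fraction within 1 sigma:", np.mean(np.abs(z) < 1), " normal:", 1 - 2 * stats.norm.sf(1))
print(stats.kstest(z, "norm"))  # compare with the standard normal distribution
```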