diff --git a/lecture_1.ipynb b/lecture_1.ipynb index 46c180dd727f5bf2107fd8f7b5436cd1706a7b1f..25b8417bfbe8900a7cb8ecb52acfa510fde758da 100644 --- a/lecture_1.ipynb +++ b/lecture_1.ipynb @@ -28,7 +28,7 @@ }, { "cell_type": "markdown", - "id": "48dfeb27", + "id": "a3347273", "metadata": { "slideshow": { "slide_type": "slide" @@ -63,7 +63,7 @@ }, { "cell_type": "markdown", - "id": "15bc8aec", + "id": "a7de85eb", "metadata": { "slideshow": { "slide_type": "slide" @@ -75,7 +75,7 @@ }, { "cell_type": "markdown", - "id": "3679ec79", + "id": "67a031e2", "metadata": { "cell_style": "split" }, @@ -91,13 +91,13 @@ "$P(A \\cap B) = P(A) P(B)$\n", "\n", "If $A$ and $B$ are independent:\n", - "$P(A|B) = P(A)$ und $P(B|A) = P(B)$ \n", + "$P(A|B) = P(A)$ and $P(B|A) = P(B)$ \n", "\n" ] }, { "cell_type": "markdown", - "id": "d9d3eca5", + "id": "eb310ab8", "metadata": { "cell_style": "split" }, @@ -108,7 +108,7 @@ }, { "cell_type": "markdown", - "id": "316055dc", + "id": "8a49042a", "metadata": { "slideshow": { "slide_type": "slide" @@ -120,7 +120,7 @@ }, { "cell_type": "markdown", - "id": "dab653bc", + "id": "a6fd8465", "metadata": { "cell_style": "split" }, @@ -133,7 +133,7 @@ }, { "cell_type": "markdown", - "id": "3014a200", + "id": "72effe84", "metadata": { "cell_style": "split" }, @@ -144,7 +144,7 @@ }, { "cell_type": "markdown", - "id": "bf095444", + "id": "deb1f10f", "metadata": { "slideshow": { "slide_type": "slide" @@ -169,7 +169,7 @@ }, { "cell_type": "markdown", - "id": "41a16773", + "id": "c9c11bcb", "metadata": { "slideshow": { "slide_type": "slide" @@ -183,7 +183,7 @@ }, { "cell_type": "markdown", - "id": "0b5c2af0", + "id": "5104a1d9", "metadata": { "cell_style": "center", "slideshow": { @@ -193,14 +193,14 @@ "source": [ "**Objective interpretation**:\n", "\n", - "Probabaility as a relative frequency: \n", - "$$P(A) = \\lim_{n\\to\\infty} \\dfrac{\\text{number of occurences of outcome $A$ in $n$ measurments}}{n}$$\n", + "Probability as a relative frequency: \n", + 
"$$P(A) = \\lim_{n\\to\\infty} \\dfrac{\\text{number of occurrences of outcome $A$ in $n$ measurements}}{n}$$\n", "(*frequentist)*" ] }, { "cell_type": "markdown", - "id": "174a2f0d", + "id": "389d7d78", "metadata": { "cell_style": "center", "slideshow": { @@ -210,7 +210,7 @@ "source": [ "**Subjective interpretation**:\n", "\n", - "$$P(A) = \\text{degree of belief that hyptheses $A$ is true}$$ \n", + "$$P(A) = \\text{degree of belief that hypotheses $A$ is true}$$ \n", "Typical example: \n", "$$P(\\text{theory}|\\text{data}) \\propto P(\\text{data}|\\text{theory}) P(\\text{theory})$$\n", "(*Bayesian*)" @@ -233,7 +233,7 @@ "\n", "probability density function (p.d.f.) $f(x)$:\n", "\n", - "- probabiity to observe $x$ in the interval $[x, x + dx]$:\n", + "- probability to observe $x$ in the interval $[x, x + dx]$:\n", " $f(x)\\,dx$ \n", "\n", "- normalization $$\\int_S f(x)\\,dx = 1$$\n", @@ -263,7 +263,7 @@ }, { "cell_type": "markdown", - "id": "ccc9fb4b", + "id": "99c06502", "metadata": { "slideshow": { "slide_type": "slide" @@ -275,7 +275,7 @@ }, { "cell_type": "markdown", - "id": "db26cebe", + "id": "165ca1d8", "metadata": { "cell_style": "split" }, @@ -285,7 +285,7 @@ }, { "cell_type": "markdown", - "id": "0859ee3d", + "id": "7b97cfc3", "metadata": { "cell_style": "split" }, @@ -295,7 +295,7 @@ }, { "cell_type": "markdown", - "id": "ad619992", + "id": "8eb9fa5b", "metadata": { "slideshow": { "slide_type": "slide" @@ -308,7 +308,7 @@ }, { "cell_type": "markdown", - "id": "de012fe6", + "id": "8e7b412c", "metadata": { "cell_style": "split", "slideshow": { @@ -341,7 +341,7 @@ }, { "cell_type": "markdown", - "id": "35ed307f", + "id": "1258f494", "metadata": { "slideshow": { "slide_type": "slide" @@ -354,7 +354,7 @@ }, { "cell_type": "markdown", - "id": "b430d1ef", + "id": "783ab1f3", "metadata": { "cell_style": "split", "slideshow": { @@ -390,7 +390,7 @@ }, { "cell_type": "markdown", - "id": "a1125793", + "id": "8f2cf729", "metadata": { "slideshow": { "slide_type": 
"slide" @@ -403,7 +403,7 @@ }, { "cell_type": "markdown", - "id": "4492a30e", + "id": "438662b0", "metadata": { "cell_style": "split", "slideshow": { @@ -418,7 +418,7 @@ }, { "cell_type": "markdown", - "id": "1d383725", + "id": "55461265", "metadata": { "cell_style": "split", "slideshow": { @@ -433,7 +433,7 @@ }, { "cell_type": "markdown", - "id": "67fbe91e", + "id": "c88542a8", "metadata": { "slideshow": { "slide_type": "slide" @@ -446,7 +446,7 @@ }, { "cell_type": "markdown", - "id": "b14e6085", + "id": "ec5c20d4", "metadata": { "cell_style": "split", "slideshow": { @@ -476,7 +476,7 @@ }, { "cell_type": "markdown", - "id": "b4722262", + "id": "2e0275c5", "metadata": { "slideshow": { "slide_type": "slide" @@ -489,7 +489,7 @@ }, { "cell_type": "markdown", - "id": "ae64bfa0", + "id": "76ce2f83", "metadata": { "cell_style": "split", "slideshow": { @@ -535,7 +535,7 @@ "source": [ "## Functions of random variables\n", "\n", - "### functions of randam variables\n", + "### functions of random variables\n", "\n", "Let $x$ be a random variable, $f(x)$ its p.d.f. and\n", "$a(x)$ a continuous function: \n", @@ -589,7 +589,7 @@ "source": [ "### Functions of vectors of random variables\n", "\n", - "Let $\\vec x$ be vector of random variables, $f(\\vec x)$ the p.d.f. and $\\vec a(\\vec x)$ a continous function: \n", + "Let $\\vec x$ be vector of random variables, $f(\\vec x)$ the p.d.f. and $\\vec a(\\vec x)$ a continuous function: \n", "\n", "What is the p.d.f. $g(\\vec a)$?\n", "$$g(\\vec a) = f(\\vec x) \\left| J \\right| \\text{, where $\\left| J \\right|$ is the absolute value of Jacobian determinant of } J = \n", @@ -611,7 +611,7 @@ "tags": [] }, "source": [ - "### Expection value and moments\n", + "### Expectation value and moments\n", "\n", "- **Definition:**\n", "expectation value of the function $h(x)$ for a p.d.f. 
$f(x)$:\n", @@ -623,7 +623,7 @@ "$E[x]$ is called the population mean or just mean, $\\bar x$ or $\\mu$.\n", "\n", "\n", - "- Expectation value is a linear operatur:\n", + "- Expectation value is a linear operator:\n", "$$E[a\\cdot g(x) + b \\cdot h(x)] = a\\cdot E[g(x)] + b\\cdot E[h(x)]$$\n", "\n", "- $n$th moment:\n", @@ -724,7 +724,7 @@ "source": [ "### Covariance \n", "\n", - "- covariane $V_{xy}$ for two random variables $x$ and $y$ with p.d.f. $f(x,y)$:\n", + "- covariance $V_{xy}$ for two random variables $x$ and $y$ with p.d.f. $f(x,y)$:\n", "$$V_{xy} = E[(x - \\mu_x)(y - \\mu_y)] = E[xy] - \\mu_x \\mu_y$$\n", "$$V_{xy} = \\int_{-\\infty}^{\\infty} \\int_{-\\infty}^{\\infty} xy\\, f(x, y)\\,dx \\,dy - \\mu_x\\mu_y$$\n", "\n", @@ -735,7 +735,7 @@ }, { "cell_type": "markdown", - "id": "a93e852b", + "id": "e9c61f18", "metadata": { "slideshow": { "slide_type": "skip" @@ -800,7 +800,65 @@ }, { "cell_type": "markdown", - "id": "2e027b6b", + "id": "c3b63f0e", + "metadata": {}, + "source": [ + "### Error propagation\n", + "\n", + "Suppose we have a random vector $\\vec x$ distributed according to the joint p.d.f. 
$f(\\vec x)$ with mean values $\\vec \\mu$ and covariance matrix $V$:\n", + "\n", + "What is the variance of the function $y(\\vec x)$?\n", + "\n", + "Expand $y$ around $\\vec x = \\vec \\mu$:\n", + "$$y(\\vec x) \\approx y(\\vec \\mu) + \\sum_{i = 1}^{N} \\frac{\\partial y}{\\partial x_i}\\big|_{\\vec \\mu}(x_i-\\mu_i)$$ \n", + "\n", + "Expectation value of $y$:\n", + "$$E[y] \\approx y(\\vec \\mu)$$\n", + "\n", + "Expectation value of $y^2$:\n", + "$$E[y^2] \\approx E[(y(\\vec \\mu) + \\sum_{i = 1}^{N} \\frac{\\partial y}{\\partial x_i}\\big|_{\\vec \\mu}(x_i-\\mu_i))(y(\\vec \\mu) + \\sum_{j = 1}^{N} \\frac{\\partial y}{\\partial x_j}\\big|_{\\vec \\mu}(x_j-\\mu_j))] = y^2(\\vec \\mu) + \\sum_{i = 1}^{N} \\sum_{j = 1}^{N} \\frac{\\partial y}{\\partial x_i}\\big|_{\\vec \\mu} \\frac{\\partial y}{\\partial x_j}\\big|_{\\vec \\mu}E[(x_i-\\mu_i)(x_j- \\mu_j)]$$\n", + "$$E[y^2] \\approx y^2(\\vec \\mu) + \\sum_{i = 1}^{N} \\sum_{j = 1}^{N} \\frac{\\partial y}{\\partial x_i}\\big|_{\\vec \\mu} \\frac{\\partial y}{\\partial x_j}\\big|_{\\vec \\mu} V_{ij}$$\n", + "\n", + "Variance of $y$:\n", + "$$\\sigma^2_y = E[y^2] - E[y]^2 \\approx \\sum_{i = 1}^{N} \\sum_{j = 1}^{N} \\frac{\\partial y}{\\partial x_i}\\big|_{\\vec \\mu} \\frac{\\partial y}{\\partial x_j}\\big|_{\\vec \\mu} V_{ij} $$\n" + ] + }, + { + "cell_type": "markdown", + "id": "38fedec1", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Error propagation in more dimensions\n", + "\n", + "Now assume a vector function $\\vec y(\\vec x) = (y_1(\\vec x),\\dots,y_M(\\vec x))$:\n", + "\n", + "Covariance $U_{kl}$ for $y_k$ and $y_l$:\n", + "$$U_{kl} = \\text{cov}[y_k, y_l] \\approx \\sum_{i = 1}^{N} \\sum_{j = 1}^{N} \\frac{\\partial y_k}{\\partial x_i}\\big|_{\\vec \\mu} \\frac{\\partial y_l}{\\partial x_j}\\big|_{\\vec \\mu} V_{ij}$$ \n", + "\n", + "\n", + "With the matrix of derivatives $A$, where $A_{ij} = \\frac{\\partial y_i}{\\partial x_j}\\big|_{\\vec \\mu}$:\n", + "$$ U = A V A^{T}$$\n", + "\n", + 
"Example: $y = x_1 + x_2$ and, hence, $A = (1, 1)$\n", + "$$U = \\left(\\begin{array}{rr}1 & 1\\\\ \\end{array}\\right)\n", + "\\left(\n", + "\\begin{array}{rr}\\sigma_1^2 & V_{12} \\\\ V_{12} & \\sigma_2^2\\\\ \\end{array}\n", + "\\right)\n", + "\\left(\\begin{array}{r}1 \\\\ 1\\\\ \\end{array}\\right) =\n", + "\\left(\\begin{array}{rr}\\sigma_1^2 + V_{12} & V_{12}+ \\sigma_2^2\\\\ \\end{array}\n", + "\\right) \\left(\\begin{array}{r}1 \\\\ 1\\\\ \\end{array}\\right) = \\sigma_1^2 + \\sigma_2^2 + 2V_{12}$$\n", + "\n", + "Example: $y = x_1 x_2$ and, hence, $A = (x_2, x_1)$\n", + "$$\\frac{\\sigma^2_y}{y^2} = \\frac{\\sigma^2_1}{x_1^2} + \\frac{\\sigma^2_2}{x_2^2} + 2 \\frac{V_{12}}{x_1 x_2}$$" + ] + }, + { + "cell_type": "markdown", + "id": "9f09c066", "metadata": { "slideshow": { "slide_type": "slide" @@ -840,7 +898,7 @@ }, { "cell_type": "markdown", - "id": "130d2811", + "id": "a0bccc47", "metadata": { "slideshow": { "slide_type": "slide" @@ -849,12 +907,15 @@ "source": [ "### Describing samples\n", "\n", - "minimum, maximum, frequency/histogram, means, variance, standard deviation,....\n" + "minimum, maximum, frequency/histogram, means, variance, standard deviation,....\n", + "\n", + "\n", + "Here: home and away goals in Bundesliga matches" ] }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 61, "id": "b44e356e-b829-4879-b5fc-9706fffe873d", "metadata": {}, "outputs": [ @@ -872,7 +933,7 @@ " [1., 3.]])" ] }, - "execution_count": 29, + "execution_count": 61, "metadata": {}, "output_type": "execute_result" } @@ -886,7 +947,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 62, "id": "2aeacb94-518d-464f-a7ab-5282b97bc225", "metadata": {}, "outputs": [ @@ -896,7 +957,7 @@ "array([5., 2., 0., 0., 0., 2., 4., 2., 1.])" ] }, - "execution_count": 30, + "execution_count": 62, "metadata": {}, "output_type": "execute_result" } @@ -984,7 +1045,7 @@ }, { "cell_type": "markdown", - "id": "ac4a0d55", + "id": "e9327372", 
"metadata": { "slideshow": { "slide_type": "subslide" @@ -1030,7 +1091,7 @@ }, { "cell_type": "markdown", - "id": "0061793a", + "id": "61619361", "metadata": { "slideshow": { "slide_type": "slide" @@ -1074,7 +1135,7 @@ { "cell_type": "code", "execution_count": 33, - "id": "1c35006a", + "id": "3f64d35c", "metadata": { "cell_style": "split" }, @@ -1173,7 +1234,7 @@ "---\n", "\n", "different means:\n", - " - arithmetric mean: $$ \\overline{x} = E[x] = <x> = \\frac{1}{n}\\sum\\limits_{i=1}^n x_i (= \\mu)$$\n", + " - arithmetic mean: $$ \\overline{x} = E[x] = <x> = \\frac{1}{n}\\sum\\limits_{i=1}^n x_i (= \\mu)$$\n", " - geometric mean: $$ \\overline{{x}}_\\mathrm {geom} = \\sqrt[n]{\\prod\\limits_{i=1}^{n}{x_i}}$$\n", " - quadratic mean: $$ \\overline{{x}}_\\mathrm{quadr} = \\sqrt{E[x^2]} = \\sqrt {\\frac {1}{n} \\sum\\limits_{i=1}^{n}x_i^2} = \\sqrt{\\overline{x^2}} $$\n", "\n" @@ -1181,7 +1242,7 @@ }, { "cell_type": "markdown", - "id": "e53ae979", + "id": "2503d92a", "metadata": { "slideshow": { "slide_type": "skip" @@ -1218,7 +1279,7 @@ { "cell_type": "code", "execution_count": null, - "id": "6404bfe5", + "id": "5f98b25d", "metadata": { "slideshow": { "slide_type": "-" @@ -1298,7 +1359,7 @@ }, { "cell_type": "markdown", - "id": "8142e1a9", + "id": "20d86d18", "metadata": { "slideshow": { "slide_type": "slide" @@ -1311,7 +1372,7 @@ { "cell_type": "code", "execution_count": 40, - "id": "c18fdbb0", + "id": "bf609514", "metadata": {}, "outputs": [ { @@ -1343,7 +1404,7 @@ { "cell_type": "code", "execution_count": null, - "id": "abc2a8ab", + "id": "6fff5415", "metadata": {}, "outputs": [], "source": [] @@ -1351,7 +1412,7 @@ { "cell_type": "code", "execution_count": 23, - "id": "b7891912", + "id": "751f9384", "metadata": { "slideshow": { "slide_type": "notes" @@ -1374,7 +1435,7 @@ { "cell_type": "code", "execution_count": 26, - "id": "479931df", + "id": "abda1c9f", "metadata": { "slideshow": { "slide_type": "notes" @@ -1396,7 +1457,84 @@ }, { "cell_type": "markdown", - 
"id": "de9cd02f", + "id": "eed39982", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "### Exercise: Compute variance of goals per match\n", + "\n", + "Compute the variance of the sum of the home and away goals per match in three ways, where $V$ is the covariance matrix on home,away goals from before:\n", + "- wrong error propagation $U = \\sigma_1^2 + \\sigma_2^2 = V_{11} + V_{22}$\n", + "<br>\n", + "- correct error propagation $U = \\left(\\begin{array}{rr}1 & 1\\\\ \\end{array}\\right)V\\left(\\begin{array}{r}1 \\\\ 1\\\\ \\end{array}\\right)$\n", + "\n", + " > You can use: `A = np.array([[1, 1]])` to define the matrix of derivatives and <br> `U=A@V@A.T`for the matrix transformation \n", + "<br>\n", + "- directly with`np.var`\n", + "<br>" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6b0966b7", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "id": "404bffbd", + "metadata": {}, + "source": [ + "What changes when you look at the goal difference?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "f88ff1a1", + "metadata": { + "slideshow": { + "slide_type": "notes" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3.136890603235832\n", + "[[2.75135541]]\n", + "2.7423640480157205\n", + "3.510914605493613\n" + ] + } + ], + "source": [ + "A = np.array([[1, 1]])\n", + "V = np.cov(data, rowvar=False)\n", + "\n", + "print(V[0,0] + V[1,1])\n", + "\n", + "U = A@V@A.T\n", + "\n", + "\n", + "\n", + "print(U)\n", + "\n", + "print(np.var(data[:,0] + data[:,1]))\n", + "\n", + "\n", + "print(np.var(data[:,0] - data[:,1]))\n" + ] + }, + { + "cell_type": "markdown", + "id": "f826b603", "metadata": {}, "source": [ "### Exercise: Check \"functions of random variables\"" @@ -1404,18 +1542,18 @@ }, { "cell_type": "markdown", - "id": "90569b20", + "id": "ac750c83", "metadata": {}, "source": [ "Let's use pseudo-experiments/Monte Carlo:\n", "\n", " * generate 100.000 uniformly distributed values $u$\n", - " * make a histgram of $u$ and of $\\sqrt(u)$\n" + " * make a histogram of $u$ and of $\\sqrt(u)$\n" ] }, { "cell_type": "markdown", - "id": "d3c6a3a9", + "id": "0d93b2f0", "metadata": {}, "source": [ "Relatively easy with *scipy* and *numpy*:\n", @@ -1423,14 +1561,14 @@ "<br>\n", "or\n", "<br>\n", - " * use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)\n", - " * use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html) class\n" + " * use [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html)\n", + " * use [`scipy.stats.norm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html) class\n" ] }, { "cell_type": "code", - "execution_count": 56, - "id": "a58583f0", + "execution_count": 72, + "id": "b424b5b0", "metadata": {}, "outputs": [ { @@ -1453,7 +1591,7 @@ { "cell_type": "code", "execution_count": 59, - "id": "faad7477", + "id": "785734fb", "metadata": { "slideshow": 
{ "slide_type": "notes" @@ -1498,130 +1636,10 @@ }, { "cell_type": "markdown", - "id": "ba08bfd1-4c17-4a1d-bdf6-9c517edffdd8", + "id": "fd133d10", "metadata": { "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "# Wahrscheinlichkeitsdichten\n", - "\n", - "## Diskrete Verteilungen\n", - "\n", - "### Binomialverteilung \n", - "\n", - "Binomialverteilung Ist $p$ die Wahrscheinlichkeit f\"ur das Auftreten\n", - "eines Ereignisses, so ist die Wahrscheinlichkeit, dass es bei $n$\n", - "Versuchen $k$-mal auftritt, gegeben durch die Binomialverteilung:\n", - "$$P(k) = {n \\choose k} p^k(1-p)^{n-k} \\text{, } k = 0,1,2...n$$\n", - "\n", - "Erwartungswert und Varianz\n", - "$$<k> = E[k] = \\sum \\limits_{k = 0}^{n} k P(k) = np$$\n", - "$$V[k] = \\sigma^2 = np(1-p)$$" - ] - }, - { - "cell_type": "markdown", - "id": "c1dbdd5b-c1f4-40b3-bad6-d6639a9a635c", - "metadata": { - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "### Beispiel \n", - "\n", - "Werfen von fünf Münzen $n = 5$, $p = 0.5$ \n", - "\n", - "| k | 0 | 1 | 2 | 3 | 4 | 5 |\n", - "|:-----|:----:|:----:|:-----:|:-----:|:----:|:----:|\n", - "| P(k) | 1/32 | 5/32 | 10/32 | 10/32 | 5/32 | 1/32 |\n", - "\n", - "<img src=\"./figures/08/binom5.pdf\" style=\"width:75.0%\" />\n", - "\n", - "### Beispiel II \n", - "\n", - "Fehler in der Effizienzbestimmung eines Selektionsschittes Es soll die\n", - "Effizienz eines Selektionschnittes und ihr Fehler bestimmt werden, wenn\n", - "in einer Stichprobe von $n$ Datenpunkten $k$ Punkte diesen Schnitt\n", - "überleben. \n", - "Die Zufallsvariable ist die gefundene Effizienz $h_k = \\frac{k}{n}$. \n", - "Wie groß ist der Fehler? 
\n", - "Die Zahlen $k$ folgen einer Binomialverteilung mit der\n", - "Wahrscheinlichkeit $p_k = E[h_k] = E[\\frac{k}{n}]$: $$\\begin{aligned}\n", - " \\sigma(h_k) &= &\\sqrt{V[\\frac{k}{n}]} = \\sqrt{\\frac{1}{n^2} V[k]} = \\sqrt{\\frac{1}{n^2}\\cdot np_k(1-p_k)}\\\\\n", - " &=& \\sqrt{\\frac{p_k(1-p_k)}{n}}\\\\ \n", - "\\end{aligned}$$" - ] - }, - { - "cell_type": "markdown", - "id": "60849c4a-dad2-4acd-9780-f9560da2ed9b", - "metadata": { - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "### Poisson-Verteilung \n", - "\n", - "Poisson-Verteilung Die Possionverteilung gibt die Wahrscheinlichkeit an,\n", - "genau $k$ Ereignisse zu erhalten, wenn die Zahl der Versuche $n$ sehr\n", - "groß und die Wahrscheinlichkeit $p$ sehr klein ist. Mit $\\mu = np$\n", - "$$P(k) = \\frac{\\mu^ke^{-\\mu}}{k!}$$\n", - "\n", - "Erwartungswert und Varianz\n", - "$$E[k] = \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^k}{k!}\n", - " = \\mu \\sum \\limits_{k = 1}^{\\infty} k \\frac{e^{-\\mu}\\mu^{k-1}}{(k-1)! k}\n", - " = \\mu \\sum \\limits_{s = 0}^{\\infty} \\frac{e^{-\\mu}\\mu^{s}}{s!} = \\mu$$\n", - "$$V[k] = \\sigma^2 = \\mu$$\n", - "\n", - "### Poisson- und Binomialverteilung \n", - "\n", - "Binomialverteilung mit $n= 1000$ und $p = 0.01$ \n", - "Poisson-Verteilung mit $\\mu = 10$(schraffiert) \n", - "\n", - "<img src=\"./figures/08//bp.jpg\" style=\"width:85.0%\"\n", - "alt=\"image\" />" - ] - }, - { - "cell_type": "markdown", - "id": "7e80723a-ac12-4af1-a43a-70593ef791b8", - "metadata": { - "slideshow": { - "slide_type": "slide" - }, - "tags": [] - }, - "source": [ - "### Beispiel aus vielen alten Statistikbüchern \n", - "\n", - "Tod durch Pferdetritte in der preußischen Armee\n", - "\n", - "In der preußischen Armee wurde f\"ur jedes Jahr und jedes Armeekorps die\n", - "Anzahl der Todesfälle durch Huftritte registriert. 
Für 20 Jahre\n", - "(1875–1894) und 14 Armeekorps ergibt sich:\n", - "\n", - "| Anzahl des Todesf\"alle $k$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 |\n", - "|:------------------------------------------|----:|----:|----:|----:|----:|----:|----:|\n", - "| Zahl der Korps-Jahre mit $k$ Todesf\"allen | 144 | 91 | 32 | 11 | 2 | 0 | 0 |\n", - "\n", - "<img src=\"./figures/08/poisson70.png\" style=\"width:55.0%\" />\n", - "\n", - "Poisson-Verteilung f\"ur $\\mu = \\frac{196}{280} = 0.70$" - ] - }, - { - "cell_type": "markdown", - "id": "b9772cfd-74fb-4a8b-9c0c-fd2c3554a986", - "metadata": { - "slideshow": { - "slide_type": "slide" + "slide_type": "skip" }, "tags": [] }, @@ -1631,10 +1649,10 @@ }, { "cell_type": "markdown", - "id": "42e65c7a-4636-4319-b21a-acc95140c2de", + "id": "d369a48b", "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "skip" }, "tags": [] }, @@ -1644,10 +1662,10 @@ }, { "cell_type": "markdown", - "id": "4b8a5b72-aa82-4acf-9499-8736ed6246f8", + "id": "a56aa4e8", "metadata": { "slideshow": { - "slide_type": "slide" + "slide_type": "skip" }, "tags": [] }, @@ -1657,10 +1675,10 @@ }, { "cell_type": "markdown", - "id": "1f03bf12-f17b-409d-8932-4e3e24023445", + "id": "cf26407a", "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "skip" }, "tags": [] }, @@ -1673,10 +1691,10 @@ }, { "cell_type": "markdown", - "id": "40541a16-abc8-4f5b-b504-71951aa891f5", + "id": "0ab4af7a", "metadata": { "slideshow": { - "slide_type": "subslide" + "slide_type": "skip" }, "tags": [] }, @@ -1689,10 +1707,10 @@ { "cell_type": "code", "execution_count": 11, - "id": "5e55929e-7028-4ae2-9e05-29812e933733", + "id": "d0adc358", "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "skip" }, "tags": [] }, @@ -1720,10 +1738,10 @@ }, { "cell_type": "markdown", - "id": "8abb14b4-80fd-494a-89a0-310bceb277dc", + "id": "fa7c75ae", "metadata": { "slideshow": { - "slide_type": "slide" + "slide_type": "skip" }, "tags": [] }, @@ -1733,10 +1751,10 
@@ }, { "cell_type": "markdown", - "id": "85185cef-6b18-4c03-8040-437f1fd40b9e", + "id": "bf5c16d9", "metadata": { "slideshow": { - "slide_type": "fragment" + "slide_type": "skip" }, "tags": [] }, @@ -1749,8 +1767,12 @@ { "cell_type": "code", "execution_count": null, - "id": "ac354d95-cede-4215-8138-b0d7c6ae9a5e", - "metadata": {}, + "id": "4823f38f", + "metadata": { + "slideshow": { + "slide_type": "skip" + } + }, "outputs": [], "source": [] } @@ -1772,7 +1794,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.2" + "version": "3.9.20" }, "livereveal": { "autolaunch": true,