%% Cell type:markdown id:87c9d786-6d5c-4366-9701-5fa127727caa tags:
# Lecture 1
---
## Basic statistics
#
<br>
<br>
Hartmut Stadie
hartmut.stadie@uni-hamburg.de
%% Cell type:markdown id:a3347273 tags:
## Probability
### Probability
When is a system *random*?
The degree of randomness can be quantified with the concept of *probability*.
Consider the sample space $S$:
Axioms by Kolmogorov:
- for every subset $A$ in $S$,
$P(A) \ge 0$
- for two disjoint subsets $A$ and $B$
($A \cap B = \emptyset$),
$P(A \cup B) = P(A) + P(B)$
- for the whole sample space $S$,
$P(S) = 1$
*random variable*: a variable that takes on a specific value for each element of $S$
%% Cell type:markdown id:a7de85eb tags:
### Conditional probability
%% Cell type:markdown id:67a031e2 tags:
Definition: **Conditional probability**
Conditional probability for two subsets $A$ and $B$ in $S$:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
Definition: **independence**
Two subsets $A$ and $B$ in $S$ are independent if
$P(A \cap B) = P(A) P(B)$
If $A$ and $B$ are independent:
$P(A|B) = P(A)$ and $P(B|A) = P(B)$
%% Cell type:markdown id:eb310ab8 tags:
<img src="./figures/08/Conditional_probability.png" />
CC0, via Wikimedia Commons
%% Cell type:markdown id:8a49042a tags:
### Bayes' theorem
%% Cell type:markdown id:a6fd8465 tags:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
$P(B|A) = \dfrac{P(B \cap A)}{P(A)}$
Theorem: $$P(A|B) = \dfrac{P(B|A) P(A)}{P(B)}$$
%% Cell type:markdown id:72effe84 tags:
<img src="./figures/08/bayes.gif" style="width:80.0%" />
most likely not Bayes
%% Cell type:markdown id:deb1f10f tags:
### Classic example
Test for a disease: Prior: $P(\text{disease}) = 0.001$;
$P(\text{no disease}) = 0.999$
Test: $P(+|\text{disease}) = 0.98$;
$P(+|\text{no disease}) = 0.03$
What is the implication of a positive test result?
$$\begin{aligned}
P(\text{disease}|+) &= \dfrac{P(+|\text{disease}) P(\text{disease})}{P(+)} \\
&= \dfrac{P(+|\text{disease}) P(\text{disease})}{P(+|\text{disease})P(\text{disease})+P(+|\text{no disease})P(\text{no disease})}\\
&= \dfrac{0.98\cdot 0.001}{0.98\cdot 0.001 + 0.03\cdot 0.999}\\
&\approx 0.032\\
\end{aligned}$$
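%% Cell type:markdown tags:
The calculation above can be reproduced in a few lines of plain Python, with the numbers taken from the slide; despite the high test sensitivity, the posterior comes out at only about 3%, because the disease is rare:
%% Cell type:code tags:
``` python
p_disease = 0.001           # prior P(disease)
p_no_disease = 0.999        # prior P(no disease)
p_pos_disease = 0.98        # P(+|disease)
p_pos_no_disease = 0.03     # P(+|no disease)

# law of total probability: P(+)
p_pos = p_pos_disease * p_disease + p_pos_no_disease * p_no_disease
# Bayes' theorem: P(disease|+)
p_disease_pos = p_pos_disease * p_disease / p_pos
print(p_disease_pos)
```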
%% Cell type:markdown id:c9c11bcb tags:
## Interpretation of probability
### Interpretations of probability
%% Cell type:markdown id:5104a1d9 tags:
**Objective interpretation**:
Probability as a relative frequency:
$$P(A) = \lim_{n\to\infty} \dfrac{\text{number of occurrences of outcome $A$ in $n$ measurements}}{n}$$
(*frequentist*)
%% Cell type:markdown id:389d7d78 tags:
**Subjective interpretation**:
$$P(A) = \text{degree of belief that hypothesis $A$ is true}$$
Typical example:
$$P(\text{theory}|\text{data}) \propto P(\text{data}|\text{theory}) P(\text{theory})$$
(*Bayesian*)
%% Cell type:markdown id:b8ab42a5-b7af-4942-9a7a-91b73e0ce625 tags:
# Probability Density Functions
Single continuous variable $x$ which describes the outcome of an experiment:
probability density function (p.d.f.) $f(x)$:
- probability to observe $x$ in the interval $[x, x + dx]$:
$f(x)\,dx$
- normalization $$\int_S f(x)\,dx = 1$$
Cumulative distribution (cumulative distribution function (cdf)) $F(x)$:<br>
(mathematics: distribution function)
- probability to observe values less than or equal to $x$:
$$F(x) = \int_{-\infty}^x f(x^\prime)\,dx^\prime$$
%% Cell type:markdown id:13563524-0626-4a69-a6ab-f87d7f19a016 tags:
### Example
$$P(a \le x \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$$
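%% Cell type:markdown tags:
As a concrete check with the standard normal distribution, using only the standard library ($F$ below is the normal cdf written via the error function): $P(-1 \le x \le 1) = F(1) - F(-1) \approx 0.683$.
%% Cell type:code tags:
``` python
from math import erf, sqrt

def F(x):
    # cdf of the standard normal distribution
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(-1 <= x <= 1) = F(1) - F(-1)
p = F(1.0) - F(-1.0)
print(p)
```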
%% Cell type:markdown id:99c06502 tags:
# Probability Density Functions
%% Cell type:markdown id:165ca1d8 tags:
<img src="./figures/08/Lognormal_distribution_PDF.svg" width="150%" />
%% Cell type:markdown id:7b97cfc3 tags:
<img src="./figures/08/CDF-log_normal_distributions.svg" width="75%" />
%% Cell type:markdown id:8eb9fa5b tags:
### Quantiles
%% Cell type:markdown id:8e7b412c tags:
The quantile $x_\alpha$ is the value of the random variable $x$ with
$$F(x_\alpha) = \int_{-\infty}^{x_\alpha} f(x)\,dx = \alpha$$
Hence: $$x_\alpha = F^{-1}(\alpha)$$
Median: $x_{\frac{1}{2}}$
$$F(x_{\frac{1}{2}}) = 0.5$$ $$x_{\frac{1}{2}} = F^{-1}(0.5)$$
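%% Cell type:markdown tags:
Numerically, a quantile is obtained by inverting the cdf. A minimal sketch for the standard normal, inverting $F$ by bisection (in practice `scipy.stats.norm.ppf` does this directly):
%% Cell type:code tags:
``` python
from math import erf, sqrt

def F(x):
    # cdf of the standard normal distribution
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def quantile(alpha, lo=-10.0, hi=10.0):
    # find x_alpha with F(x_alpha) = alpha by bisection
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if F(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median = quantile(0.5)   # x_{1/2} = F^{-1}(0.5) = 0 for the standard normal
print(median)
```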
%% Cell type:markdown id:14e4b782-9573-4e87-89d0-901dade25538 tags:
<img src="./figures/09/Normalverteilung.png" width=80% alt="image" />
%% Cell type:markdown id:1258f494 tags:
### Joint probability density function
%% Cell type:markdown id:783ab1f3 tags:
Example: outcome of a measurement characterized by two continuous variables $x,y$. <br>
Event $A$: $x$ observed in $[x, x + dx]$, $y$ anywhere<br>
Event $B$: $y$ observed in $[y, y + dy]$, $x$ anywhere
$$P(A \cap B) = \text{probability for $x$ in $[x, x + dx]$ and $y$ in $[y, y + dy]$} = f(x, y)\,dx\,dy$$
Marginal p.d.f.s: $$f_x(x) = \int_{-\infty}^\infty f(x,y)\,dy$$
$$f_y(y) = \int_{-\infty}^\infty f(x,y)\,dx$$
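%% Cell type:markdown tags:
The marginalization can be carried out numerically. The joint p.d.f. below is an assumed pair of independent standard normal variables, for which $f_x(0) = 1/\sqrt{2\pi} \approx 0.399$:
%% Cell type:code tags:
``` python
import numpy as np

def f(x, y):
    # joint p.d.f. of two independent standard normal variables
    return np.exp(-(x**2 + y**2) / 2.0) / (2.0 * np.pi)

# approximate f_x(0) = integral of f(0, y) dy by a Riemann sum
y = np.linspace(-8.0, 8.0, 4001)
dy = y[1] - y[0]
fx_at_0 = np.sum(f(0.0, y)) * dy
print(fx_at_0)
```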
%% Cell type:markdown id:8e74c511-b730-4781-a751-e1895983cc4c tags:
<img src="./figures/09/W_top.png" width="150%" alt="image" />
%% Cell type:markdown id:8f2cf729 tags:
### Marginal p.d.f.s
%% Cell type:markdown id:438662b0 tags:
<img src="./figures/09/W.png" style="width:47.0%" alt="image" />
%% Cell type:markdown id:55461265 tags:
<img src="./figures/09/top.png" style="width:47.0%" alt="image" />
%% Cell type:markdown id:c88542a8 tags:
### Conditional p.d.f.
%% Cell type:markdown id:ec5c20d4 tags:
Conditional p.d.f.s: $$g(x|y) = \frac{f(x,y)}{f_y(y)}$$
$$h(y|x) = \frac{f(x,y)}{f_x(x)}$$
%% Cell type:markdown id:7624d391-9fd5-4567-bf84-632eead1bb20 tags:
<img src="./figures/09/top_cond.png" width="150%" alt="image" />
%% Cell type:markdown id:2e0275c5 tags:
### Bayes' theorem
%% Cell type:markdown id:76ce2f83 tags:
$g(x|y) = \frac{f(x,y)}{f_y(y)}$ and
$h(y|x) = \frac{f(x,y)}{f_x(x)}$
Theorem: $$g(x|y) = \frac{h(y|x) f_x(x)}{f_y(y)}$$
With $f(x,y) = h(y|x) f_x(x) = g(x|y) f_y(y)$:
$$f_x(x) = \int_{-\infty}^\infty g(x|y) f_y(y)\,dy$$
$$f_y(y) = \int_{-\infty}^\infty h(y|x) f_x(x)\,dx$$
%% Cell type:markdown id:ba8fae9b-636b-4b7e-a36d-3be582c000f5 tags:
<img src="./figures/08/bayes.gif" style="width:80.0%" />
most likely not Bayes
%% Cell type:markdown id:fdd11743-1e7b-48b6-9e9b-41bee7d4f23b tags:
## Functions of random variables
### Functions of random variables
Let $x$ be a random variable, $f(x)$ its p.d.f. and
$a(x)$ a continuous function:
What is the p.d.f. $g(a)$?
Equal probability for $x$ in $[x, x+dx]$ and $a$ in $[a, a+da]$:
$$g(a)\,da = \int_{dS} f(x)\,dx$$
If $x(a)$ (the inverse of $a(x)$) exists:
$$g(a)\,da = \left| \int_{x(a)}^{x(a +da)} f(x^\prime)\,dx^\prime \right| = \int_{x(a)}^{x(a) + |\frac{dx}{da}|da} f(x^\prime)\,dx^\prime$$
or
$$g(a) = f(x(a)) \left|\frac{dx}{da}\right|$$
%% Cell type:markdown id:525847ad-2ddf-468c-883f-35bc853bac8c tags:
### Examples
- example 1:
For $x$ uniformly distributed between 0 and 1, p.d.f. of $x$: $u(x) = 1$ and $a(x) = \sqrt{x}$, $x(a) = a^2$<br>
p.d.f. $g(a)$:
$$g(a) = u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot \left|\frac{da^2}{da}\right| = 2a \text{ (linearly distributed)}$$
<br>
- example 2:
For $x$ uniformly distributed between 0 and 1, p.d.f. of $x$: $u(x) = 1$ and $a(x) = F^{-1}(x)$, $x(a) = F(a)$<br>
p.d.f. $g(a)$:
$$g(a) = u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot \left|\frac{dF(a)}{da}\right| = f(a)$$
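%% Cell type:markdown tags:
Example 1 can be verified by simulation: draw uniform $x$, transform with $a(x) = \sqrt{x}$, and compare the sample mean with $E[a] = \int_0^1 a \cdot 2a\,da = 2/3$:
%% Cell type:code tags:
``` python
import numpy as np

rng = np.random.default_rng(42)
x = rng.random(100_000)   # uniform on [0, 1], u(x) = 1
a = np.sqrt(x)            # transformed variable, expected p.d.f. g(a) = 2a

mean_a = a.mean()         # should approach E[a] = 2/3
print(mean_a)
```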
%% Cell type:markdown id:0c669bab-6c81-45fb-a506-8ef709cd4687 tags:
### Functions of vectors of random variables
Let $\vec x$ be a vector of random variables, $f(\vec x)$ the p.d.f. and $\vec a(\vec x)$ a continuous function:
What is the p.d.f. $g(\vec a)$?
$$g(\vec a) = f(\vec x) \left| J \right| \text{, where $\left| J \right|$ is the absolute value of the Jacobian determinant of } J =
\begin{array}{rrrr}
\frac{\partial x_1}{\partial a_1} & \frac{\partial x_1}{\partial a_2} & \dots & \frac{\partial x_1}{\partial a_n} \\[6pt]
\frac{\partial x_2}{\partial a_1} & \frac{\partial x_2}{\partial a_2} & \dots & \frac{\partial x_2}{\partial a_n} \\[6pt]
\vdots & \vdots & \ddots & \vdots \\[6pt]
\frac{\partial x_n}{\partial a_1} & \frac{\partial x_n}{\partial a_2} & \dots & \frac{\partial x_n}{\partial a_n} \\[6pt]
\end{array}$$
%% Cell type:markdown id:2907584f-bb17-463e-b084-749f2011bd4c tags:
### Expectation value and moments
- **Definition:**
expectation value of the function $h(x)$ for a p.d.f. $f(x)$:
$$E[h] = \int_{-\infty}^{\infty} h(x) \, f(x)\,dx$$
- **special case:** $h(x) = x$:
$$E[x] = \int_{-\infty}^{\infty} x \, f(x)\,dx = <x>$$
$E[x]$ is called the population mean or just mean, $\bar x$ or $\mu$.
- Expectation value is a linear operator:
$$E[a\cdot g(x) + b \cdot h(x)] = a\cdot E[g(x)] + b\cdot E[h(x)]$$
- $n$th moment:
$$E[x^n] = \int_{-\infty}^{\infty} x^n \, f(x)\,dx$$
- $n$th central moment:
$$E[(x - E[x])^n] = E[(x-\mu)^n] = \int_{-\infty}^{\infty} (x-\mu)^n \, f(x)\,dx$$
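%% Cell type:markdown tags:
For a discrete distribution the integrals become sums over $P(x_i)$. A quick check with a fair die, where $E[x] = 3.5$ and the second central moment is $35/12$:
%% Cell type:code tags:
``` python
import numpy as np

faces = np.arange(1, 7)        # outcomes of a fair die
prob = np.full(6, 1.0 / 6.0)   # P(x = k) = 1/6

mean = np.sum(faces * prob)                  # E[x] = 3.5
second_moment = np.sum(faces**2 * prob)      # E[x^2]
central2 = np.sum((faces - mean)**2 * prob)  # 2nd central moment (variance)
print(mean, second_moment, central2)
```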
%% Cell type:markdown id:4b7f6667-c019-4b23-83fd-a1758d748762 tags:
## Basic concepts
### Basic concepts
Discrete random variable; mean:
$$<r> = \bar r = \sum _{i=1}^N r_i P(r_i)$$
Continuous random variable; probability density $f(x)$ with
- $P(a \leq x \leq b) = \int_a^b f(x)\,dx$
- $f(x) \geq 0$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Mean:
$$<x> = \bar x = \int_{-\infty}^{\infty} x \, f(x)\,dx = \mu_x$$
%% Cell type:markdown id:5cd273e7-3129-454b-a31c-4e91c4316bf4 tags:
### Variance and standard deviation
variance $V[x]$:
- measure for the width of a p.d.f.
- second central moment
- definition:
$$V[x] = E[(x - \mu_x)^2] = \int_{-\infty}^{\infty} (x-\mu_x)^2 \, f(x)\,dx$$
- useful relations:
$$V = E[(x - \mu)^2] = E[x^2 - 2x\mu + \mu^2] = E[x^2] - 2\mu E[x] + \mu^2 = E[x^2] - 2 \mu^2 + \mu^2 = E[x^2] - (E[x])^2$$
$$V[ax] = a^2 V[x]$$
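%% Cell type:markdown tags:
Both relations can be checked on a simulated sample (the Gaussian parameters below are arbitrary):
%% Cell type:code tags:
``` python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=200_000)

# V = E[x^2] - (E[x])^2
v_from_moments = np.mean(x**2) - np.mean(x)**2
v_direct = np.var(x)

# V[ax] = a^2 V[x]
a = 5.0
v_scaled = np.var(a * x)
print(v_from_moments, v_direct, v_scaled)
```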
%% Cell type:markdown id:30d01c8b-50a5-4d90-8e0b-89a68f52f9bd tags:
### Variance and standard deviation
standard deviation $\sigma$:
- measure for the variation of a random variable around its mean
<br>
- in physics: “the error”
<br>
- definition $$\sigma = \sqrt{V[x]}$$<br>
%% Cell type:markdown id:ae67472e-f9b2-46fa-9679-15553d31aaaf tags:
### Covariance
- covariance $V_{xy}$ for two random variables $x$ and $y$ with p.d.f. $f(x,y)$:
$$V_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y$$
$$V_{xy} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\,dx \,dy - \mu_x\mu_y$$
- covariance $V_{ab} = \text{cov}[a, b]$ of two functions $a$ and $b$ of the random vector $\vec x$:
$$\text{cov}[a, b] = E[(a - \mu_a)(b - \mu_b)] = E[ab] - \mu_a \mu_b$$
$$\text{cov}[a, b] = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} a(\vec x) b(\vec x)\, f(\vec x)\,dx_1 \dots \,dx_n - \mu_a\mu_b$$
%% Cell type:markdown id:e9c61f18 tags:
### Covariance
- covariance: $$\text{cov}(x,y) = E[(x-E[x])(y-E[y])] = E[xy] - E[x]E[y]$$
for samples $X = x_1, x_2,\dots, x_n$ and $Y = y_1, y_2,\dots, y_n$:
$$\text{cov}(X,Y) = \frac{1}{n}\sum\limits_{i=1}^n (x_i - \overline x)(y_i - \overline y)$$
- correlation $$\rho_{xy} = \frac{\text{cov}(X,Y)}{\sqrt{\text{cov}(X,X)\text{cov}(Y,Y)}}$$
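%% Cell type:markdown tags:
The sample formulas map directly onto numpy; the sample below is an arbitrary correlated pair constructed for illustration, and the hand-written correlation agrees with `np.corrcoef`:
%% Cell type:code tags:
``` python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = 0.5 * x + rng.normal(size=50_000)   # built to correlate with x

# cov(X, Y) = (1/n) sum (x_i - xbar)(y_i - ybar)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
rho = cov_xy / (x.std() * y.std())      # correlation coefficient
print(cov_xy, rho)
```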
%% Cell type:markdown id:576b4570-8a06-4224-b867-31459483e6bb tags:
### Covariance matrix
$$C = \left(
\begin{array}{rr}
V_{xx} & V_{xy} \\
V_{yx} & V_{yy}\\
\end{array}
\right)$$
Remarks:
- sometimes called error matrix
<br>
- $V_{xy} = V_{yx}$, matrix is symmetric
<br>
- $V_{ii} \ge 0$, matrix is positive (semi)definite
<br>
- correlation matrix: $$C^\prime = \left(
\begin{array}{rr}
V_{xx}/V_{xx} & V_{xy}/\sqrt{V_{xx}V_{yy}} \\
V_{xy}/\sqrt{V_{xx}V_{yy}} & V_{yy}/V_{yy}\\
\end{array}
\right) = \left(
\begin{array}{rr}
1 & \rho_{xy} \\
\rho_{xy} & 1\\
\end{array}
\right)$$
- correlation coefficient:
$$\rho_{xy} = \frac{V_{xy}}{\sqrt{V_{xx}V_{yy}}}$$
%% Cell type:markdown id:c3b63f0e tags:
### Error propagation
Suppose we have a random vector $\vec x$ distributed according to the joint p.d.f. $f(\vec x)$ with mean values $\vec \mu$ and covariance matrix $V$:
What is the variance of the function $y(\vec x)$?
Expand $y$ around $\vec x = \vec \mu$:
$$y(\vec x) \approx y(\vec \mu) + \sum_{i = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu}(x_i-\mu_i)$$
Expectation value of $y$:
$$E[y] \approx y(\vec \mu)$$
Expectation value of $y^2$:
$$E[y^2] \approx E\left[\left(y(\vec \mu) + \sum_{i = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu}(x_i-\mu_i)\right)\left(y(\vec \mu) + \sum_{j = 1}^{N} \frac{\partial y}{\partial x_j}\big|_{\vec \mu}(x_j-\mu_j)\right)\right] = y^2(\vec \mu) + \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu}E[(x_i-\mu_i)(x_j- \mu_j)]$$
(the terms linear in $(x_i-\mu_i)$ vanish because $E[x_i-\mu_i] = 0$)
$$E[y^2] = y^2(\vec \mu) + \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu} V_{ij}$$
variance of $y$:
$$\sigma^2_y = E[y^2] - E[y]^2 \approx \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu} V_{ij} $$
%% Cell type:markdown id:38fedec1 tags:
### Error propagation in more dimensions
Now assume a vector function $\vec y(\vec x) = (y_1(\vec x),\dots,y_M(\vec x))$:
Covariance $U_{kl}$ for $y_k$ and $y_l$:
$$U_{kl} = \text{cov}[y_k, y_l] = \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y_k}{\partial x_i}\big|_{\vec \mu} \frac{\partial y_l}{\partial x_j}\big|_{\vec \mu} V_{ij}$$
With the matrix of derivatives $A$ with $A_{ij} = \frac{\partial y_i}{\partial x_j}\big|_{\vec \mu}$:
$$ U = A V A^{T}$$
Example: $y = x_1 + x_2$ and, hence, $A = (1, 1)$
$$U = \left(\begin{array}{rr}1 & 1\\ \end{array}\right)
\left(
\begin{array}{rr}\sigma_1^2 & V_{12} \\ V_{12} & \sigma_2^2\\ \end{array}
\right)
\left(\begin{array}{r}1 \\ 1\\ \end{array}\right) =
\left(\begin{array}{rr}\sigma_1^2 + V_{12} & V_{12}+ \sigma_2^2\\ \end{array}
\right) \left(\begin{array}{r}1 \\ 1\\ \end{array}\right) = \sigma_1^2 + \sigma_2^2 + 2V_{12}$$
Example: $y = x_1 x_2$ and, hence, $A = (x_2, x_1)$
$$\frac{\sigma^2_y}{y^2} = \frac{\sigma^2_1}{x_1^2} + \frac{\sigma^2_2}{x_2^2} + 2 \frac{V_{12}}{x_1 x_2}$$
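%% Cell type:markdown tags:
The first example can be cross-checked by Monte Carlo: sample correlated Gaussians and compare the sample variance of $y = x_1 + x_2$ with $\sigma_1^2 + \sigma_2^2 + 2V_{12}$ (the numbers below are arbitrary):
%% Cell type:code tags:
``` python
import numpy as np

rng = np.random.default_rng(2)
s1, s2, v12 = 1.0, 2.0, 0.6           # sigma_1, sigma_2, covariance V_12
V = np.array([[s1**2, v12],
              [v12,   s2**2]])        # covariance matrix of (x1, x2)

x = rng.multivariate_normal([0.0, 0.0], V, size=500_000)
y = x[:, 0] + x[:, 1]                 # y = x1 + x2, i.e. A = (1, 1)

var_pred = s1**2 + s2**2 + 2 * v12    # A V A^T
var_mc = y.var()
print(var_pred, var_mc)
```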
%% Cell type:markdown id:9f09c066 tags:
Now let's try a few things!!!
<br>
Any questions so far?
%% Cell type:markdown id:9eae9199-9401-40c4-b212-ae57f1ccab38 tags:
## Samples
---
Sample: $X = x_1, x_2,\dots, x_N$
Expectation value:
$$E[f(x)] = \frac{1}{N}\sum_{i=1}^N f(x_i)$$
Describing samples: minimum, maximum, frequency/histogram, means, variance, standard deviation, ...
%% Cell type:markdown id:a0bccc47 tags:
### Describing samples
minimum, maximum, frequency/histogram, means, variance, standard deviation, ...
Here: home and away goals in Bundesliga matches
%% Cell type:code id:b44e356e-b829-4879-b5fc-9706fffe873d tags:
``` python
import numpy as np
data = np.loadtxt('./exercises/09_data.txt')
data[0:9]
```
%% Cell type:code id:2aeacb94-518d-464f-a7ab-5282b97bc225 tags:
``` python
data[0:9, 0]
```
%% Cell type:code id:1829fa78-7d7b-4bac-9f24-5129dca63629 tags:
``` python
np.min(data), np.max(data)
```
%% Cell type:markdown id:caa12e94-0b72-4f60-875d-5a306a09d036 tags:
### Histograms
%% Cell type:code id:871f915d-09f8-4d14-a79a-8fbd2f16ab75 tags:
``` python
import matplotlib.pyplot as plt
plt.hist(data[:, 0])
#plt.hist(data[:, 0], bins=np.arange(-0.25, 6.25, 0.5))
#plt.xlabel("k")
```
%% Cell type:markdown id:e9327372 tags:
### Histograms
%% Cell type:code id:f8263279-48b5-4421-be05-604ddbfd8d6f tags:
``` python
plt.hist(data[:, 0], bins=np.arange(-0.25, 6.25, 0.5))
plt.xlabel("k")
```
%% Cell type:markdown id:61619361 tags:
### Histograms
%% Cell type:code id:5056a717-6561-41ec-be39-1757984863a9 tags:
``` python
plt.hist(data[:, 0], bins=np.arange(-0.25, 6.26, 0.5))
plt.xlabel("k")
#plt.savefig("hist.pdf")
plt.show()
```
%% Cell type:code id:3f64d35c tags:
``` python
plt.hist(data[:, 1], bins=np.arange(-0.25, 6.26, 0.5))
plt.xlabel("l")
plt.show()
```
%% Cell type:markdown id:4183331e-8a9f-4af5-829f-3fcb9b2abb31 tags:
### Cumulative Distribution
%% Cell type:code id:4ecfb198-c621-4ee4-9d24-9f9a8cb8bec7 tags:
``` python
plt.hist(data[:, 0], bins=100, cumulative=True, density=True, label="cumulative")
plt.xlabel("k")
#plt.savefig("hist2.pdf")
print("median", np.median(data[:, 0]))
```
%% Cell type:code id:d00cadf3-0cfa-4f59-9962-4b9e6b707e0b tags:
``` python
```
%% Cell type:markdown id:ceb5ca27-a96a-4ee3-967b-5f02efe66540 tags:
### Means
---
different means:
- arithmetic mean: $$ \overline{x} = E[x] = \langle x \rangle = \frac{1}{n}\sum\limits_{i=1}^n x_i (= \mu)$$
- geometric mean: $$ \overline{x}_\mathrm{geom} = \sqrt[n]{\prod\limits_{i=1}^{n}{x_i}}$$
- quadratic mean: $$ \overline{x}_\mathrm{quadr} = \sqrt{E[x^2]} = \sqrt{\frac{1}{n} \sum\limits_{i=1}^{n}x_i^2} = \sqrt{\overline{x^2}} $$
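%% Cell type:markdown id:means-demo-md tags:
The three means can be compared on a small made-up sample (the numbers below are only for illustration):
%% Cell type:code id:means-demo-code tags:
``` python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])

arithmetic = np.mean(x)               # (1 + 2 + 4 + 8) / 4
geometric = x.prod() ** (1 / len(x))  # fourth root of the product
quadratic = np.sqrt(np.mean(x**2))    # sqrt of the mean of the squares

print(arithmetic, geometric, quadratic)
```
For positive values the three always order as $\overline{x}_\mathrm{geom} \le \overline{x} \le \overline{x}_\mathrm{quadr}$.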
%% Cell type:markdown id:2503d92a tags:
### Variance
- variance
$$V = E[(x - \mu)^2]$$
with $\mu = E[x] = \overline x$:
$$V = E[(x - \overline x)^2] = E[x^2 - 2x{\overline x} + {\overline x}^2] = E[x^2] - 2{\overline x}E[x] + {\overline x}^2 = E[x^2] - 2 {\overline x}^2 + {\overline x}^2 = E[x^2] - (E[x])^2$$
for a sample $X = x_1, x_2, \dots, x_n$:
$$V = \frac{1}{n}\sum\limits_{i=1}^n (x_i - \overline x)^2$$
- standard deviation:
$$\sigma = \sqrt{V}$$
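%% Cell type:markdown id:variance-check-md tags:
The identity $V = E[x^2] - (E[x])^2$ can be checked numerically on any sample (here a made-up uniform sample, not the football data):
%% Cell type:code id:variance-check-code tags:
``` python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)

v_def = np.mean((x - np.mean(x))**2)     # V = E[(x - mu)^2]
v_short = np.mean(x**2) - np.mean(x)**2  # V = E[x^2] - (E[x])^2

print(v_def, v_short, np.var(x))         # all three agree
```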
%% Cell type:markdown id:68cea392-8d4f-48d7-aacc-0f10d46af3af tags:
### Exercise: Compute mean and variance of $X$
%% Cell type:code id:5f98b25d tags:
``` python
```
%% Cell type:code id:76d92d18-1d77-40d2-a910-592183635d3b tags:
``` python
print("mean", np.mean(data, axis=0))
```
%% Cell type:code id:5f0f34b4-5bbe-439b-8b4e-fbb05794b790 tags:
``` python
print("variance", np.var(data, axis=0))
```
%% Cell type:code id:70a1920f-beda-4154-ad77-22a9ffe2e39f tags:
``` python
print("standard deviation:", np.std(data, axis=0))
```
%% Cell type:markdown id:20d86d18 tags:
### Exercise: compute covariance and correlation of columns 1 and 2
%% Cell type:code id:bf609514 tags:
``` python
plt.hist2d(data[:,0], data[:,1], bins=np.arange(-0.5,6.1,1));
```
%% Cell type:code id:6fff5415 tags:
``` python
```
%% Cell type:code id:751f9384 tags:
``` python
print(np.cov(data, rowvar=False))
```
%% Cell type:code id:abda1c9f tags:
``` python
print(np.corrcoef(data, rowvar=False))
```
%% Cell type:markdown id:eed39982 tags:
### Exercise: Compute variance of goals per match
Compute the variance of the sum of the home and away goals per match in three ways, where $V$ is the covariance matrix of the home and away goals from before:
- wrong error propagation: $U = \sigma_1^2 + \sigma_2^2 = V_{11} + V_{22}$
<br>
- correct error propagation: $U = \left(\begin{array}{rr}1 & 1\\ \end{array}\right)V\left(\begin{array}{r}1 \\ 1\\ \end{array}\right)$
> You can use `A = np.array([[1, 1]])` to define the matrix of derivatives and <br> `U = A@V@A.T` for the matrix transformation.
<br>
- directly with `np.var`
<br>
%% Cell type:code id:6b0966b7 tags:
``` python
```
%% Cell type:markdown id:404bffbd tags:
What changes when you look at the goal difference?
%% Cell type:code id:f88ff1a1 tags:
``` python
A = np.array([[1, 1]])
V = np.cov(data, rowvar=False)
print(V[0,0] + V[1,1])
U = A@V@A.T
print(U)
print(np.var(data[:,0] + data[:,1]))
print(np.var(data[:,0] - data[:,1]))
```
%% Cell type:markdown id:f826b603 tags:
### Exercise: Check "functions of random variables"
%% Cell type:markdown id:ac750c83 tags:
Let's use pseudo-experiments/Monte Carlo:
* generate 100,000 uniformly distributed values $u$
* make a histogram of $u$ and of $\sqrt{u}$
%% Cell type:markdown id:0d93b2f0 tags:
Relatively easy with *scipy* and *numpy*:
* use the [numpy random generator](https://numpy.org/doc/stable/reference/random/generator.html)
<br>
or
<br>
* use [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html)
* use the [`scipy.stats.uniform`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html) class
%% Cell type:code id:b424b5b0 tags:
``` python
import numpy as np
rng = np.random.default_rng(12345)
rfloat = rng.random()
print(rfloat)
```
%% Cell type:code id:785734fb tags:
``` python
u = rng.random(100000)
print(u)
plt.hist(u, bins=100, histtype='step')
plt.hist(np.sqrt(u), bins=100, histtype='step');
```
%% Cell type:markdown id:fd133d10 tags:
# What is meant by the error/uncertainty on a measured quantity?
%% Cell type:markdown id:d369a48b tags:
If we quote $a = 1 \pm 0.5$, we usually mean that the probability density for the *true* value of $a$ is a Gaussian $G(a; \mu, \sigma)$ with $\mu = 1$ and $\sigma = 0.5$.
%% Cell type:markdown id:a56aa4e8 tags:
# How often can/should the measurement be outside one sigma?
%% Cell type:markdown id:cf26407a tags:
Let's use pseudo-experiments/Monte Carlo:
* generate 10,000 Gaussian-distributed measurements
* count how often they differ from the true value by more than one sigma
%% Cell type:markdown id:0ab4af7a tags:
Relatively easy with *scipy* and *numpy*:
* use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
* use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) class
%% Cell type:code id:d0adc358 tags:
``` python
import scipy.stats as stats
import numpy as np
pseudo_a = stats.norm.rvs(1, 0.5, 10000)
print(pseudo_a)
is_outside = abs(pseudo_a - 1) > 0.5
print(is_outside)
print("fraction outside one sigma:", sum(is_outside)/len(pseudo_a))
```
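%% Cell type:markdown id:sigma-fraction-md tags:
The simulated fraction can be compared with the exact value from the Gaussian cumulative distribution function (a quick check using `scipy.stats.norm`):
%% Cell type:code id:sigma-fraction-code tags:
``` python
import scipy.stats as stats

# probability for a Gaussian variable to lie within one sigma of the mean
p_inside = stats.norm.cdf(1) - stats.norm.cdf(-1)
# by symmetry, the probability to lie outside one sigma
p_outside = 2 * stats.norm.cdf(-1)

print(p_inside, p_outside)  # about 0.683 and 0.317
```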
%% Cell type:markdown id:fa7c75ae tags:
# Why is it a Gaussian?
%% Cell type:markdown id:bf5c16d9 tags:
Central limit theorem:
"let $X_1, X_2, \dots, X_n$ denote a statistical sample of size $n$ from a population with expected value (average) $\mu$ and finite positive variance $\sigma^2$, and let $\bar{X}_n$ denote the sample mean (which is itself a random variable). Then the limit as $n \to \infty$ of the distribution of $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ is a normal distribution with mean 0 and variance 1."
%% Cell type:code id:4823f38f tags:
``` python
```