Commit d46368a3 authored by Hartmut Stadie

move code blocks
%% Cell type:markdown id:a374c69f tags:
# Lecture 1
%% Cell type:code id:76d92d18-1d77-40d2-a910-592183635d3b tags:
``` python
import numpy as np
data = np.loadtxt('./09_data.txt')
data[0:9]
print("mean", np.mean(data, axis=0))
```
%% Output
mean [1.56535948 1.26470588]
%% Cell type:code id:5f0f34b4-5bbe-439b-8b4e-fbb05794b790 tags:
``` python
print("variance", np.var(data, axis=0))
```
%% Output
variance [1.85357128 1.27306805]
%% Cell type:code id:70a1920f-beda-4154-ad77-22a9ffe2e39f tags:
``` python
print("standard deviation:", np.std(data, axis=0))
```
%% Output
standard deviation: [1.36145925 1.12830317]
%% Cell type:markdown id:b70576a2 tags:
### Covariance
%% Cell type:code id:8643c0a4 tags:
``` python
print(np.cov(data, rowvar=False))
```
%% Output
[[ 1.85964856 -0.1927676 ]
[-0.1927676 1.27724204]]
%% Cell type:code id:83699a61 tags:
``` python
print(np.corrcoef(data, rowvar=False))
```
%% Output
[[ 1. -0.12507831]
[-0.12507831 1. ]]
%% Cell type:markdown id:844b9baa tags:
### Error propagation
%% Cell type:code id:e7bd0366 tags:
``` python
A = np.array([[1, 1]])
V = np.cov(data, rowvar=False)
print(V[0,0] + V[1,1])
U = A@V@A.T
print(U)
print(np.var(data[:,0] + data[:,1]))
print(np.var(data[:,0] - data[:,1]))
```
%% Output
3.136890603235832
[[2.75135541]]
2.7423640480157205
3.510914605493613
%% Cell type:markdown id:7fe1a83d tags:
### Transformation
%% Cell type:code id:545cd923 tags:
``` python
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(12345)
rfloat = rng.random()
print(rfloat)
u = rng.random(100000)
print(u)
plt.hist(u, bins=100, histtype='step')
plt.hist(np.sqrt(u), bins=100, histtype='step');  # trailing ';' suppresses the cell output
```
%% Output
0.22733602246716966
[0.31675834 0.79736546 0.67625467 ... 0.7802251 0.2300369 0.88856197]
%% Cell type:code id:03c8dd26 tags:
``` python
```
%% Cell type:markdown id:87c9d786-6d5c-4366-9701-5fa127727caa tags:
# Lecture 1
---
## Basic statistics
<br>
Material: [https://gitlab.rrz.uni-hamburg.de/BAN1966/statlecture](https://gitlab.rrz.uni-hamburg.de/BAN1966/statlecture)
<br>
Hartmut Stadie
hartmut.stadie@uni-hamburg.de
%% Cell type:markdown id:754b7855 tags:
### Bibliography
<br>
* Glen Cowan, Statistical Data Analysis,
[pdf](https://www.sherrytowers.com/cowan_statistical_data_analysis.pdf)
<br>
* Roger John Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, [lecture notes](https://arxiv.org/pdf/1905.12362.pdf)
<br>
* Volker Blobel, Erich Lohrmann, Statistische und numerische Methoden der Datenanalyse,[pdf](https://www.desy.de/~sschmitt/blobel/eBuch.pdf)
%% Cell type:markdown id:a3347273 tags:
## Probability
### Probability
When is a system *random*?
The degree of randomness can be quantified with the concept of *probability*.
Consider the sample space $S$:
Axioms by Kolmogorov:
- for every subset $A$ in $S$,
$P(A) \ge 0$
- for two disjoint subsets $A$ and $B$
($A \cap B = \emptyset$),
$P(A \cup B) = P(A) + P(B)$
- for the whole sample space $S$,
$P(S) = 1$
*random variable*: a variable that takes on a specific value for each element of $S$
%% Cell type:markdown id:a7de85eb tags:
### Conditional probability
%% Cell type:markdown id:67a031e2 tags:
Definition: **Conditional probability**
Conditional probability for two subsets $A$ and $B$ in $S$:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
Definition: **independence**
Two subsets $A$ and $B$ in $S$ are independent, if
$P(A \cap B) = P(A) P(B)$
If $A$ and $B$ are independent:
$P(A|B) = P(A)$ and $P(B|A) = P(B)$
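These definitions are easy to check numerically. A minimal sketch with a simulated fair die (the events and the seed are chosen for illustration): rolling even, $A = \{2,4,6\}$, and rolling at most two, $B = \{1,2\}$, are independent, since $A \cap B = \{2\}$ and $\frac{1}{6} = \frac{1}{2}\cdot\frac{1}{3}$.

``` python
import numpy as np

rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die

A = rolls % 2 == 0   # event A: roll is even, P(A) = 1/2
B = rolls <= 2       # event B: roll is 1 or 2, P(B) = 1/3

p_a = A.mean()
p_b = B.mean()
p_ab = (A & B).mean()          # estimate of P(A ∩ B)

print(p_ab, p_a * p_b)         # independence: P(A ∩ B) ≈ P(A) P(B)
print(p_ab / p_b, p_a)         # hence P(A|B) = P(A ∩ B)/P(B) ≈ P(A)
```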
%% Cell type:markdown id:eb310ab8 tags:
<img src="./figures/08/Conditional_probability.png" />
CC0, via Wikimedia Commons
%% Cell type:markdown id:8a49042a tags:
### Bayes' theorem
%% Cell type:markdown id:a6fd8465 tags:
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$
$P(B|A) = \dfrac{P(B \cap A)}{P(A)}$
Theorem: $$P(A|B) = \dfrac{P(B|A) P(A)}{P(B)}$$
%% Cell type:markdown id:72effe84 tags:
<img src="./figures/08/bayes.gif" style="width:80.0%" />
most likely not Bayes
%% Cell type:markdown id:deb1f10f tags:
### Classic example
Test for disease: Prior: $P(\text{disease}) = 0.001$;
$P(\text{no disease}) = 0.999$
Test: $P(+|\text{disease}) = 0.98$;
$P(+|\text{no disease}) = 0.03$
What is the implication of a positive test result?
$$\begin{aligned}
P(\text{disease}|+) &= & \dfrac{P(+|\text{disease}) P(\text{disease})}{P(+)} \\
& = &\dfrac{P(+|\text{disease}) P(\text{disease})}{P(+|\text{disease})P(\text{disease})+P(+|\text{no disease})P(\text{no disease})}\\
& = &\dfrac{0.98\cdot 0.001}{0.98\cdot 0.001 + 0.03\cdot 0.999}\\
& = &0.032\\
\end{aligned}$$
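The calculation above can be reproduced in a few lines; the denominator is the total probability $P(+)$:

``` python
# Posterior P(disease | +) via Bayes' theorem
p_d = 0.001          # prior P(disease)
p_nd = 0.999         # prior P(no disease)
p_pos_d = 0.98       # P(+ | disease)
p_pos_nd = 0.03      # P(+ | no disease)

p_pos = p_pos_d * p_d + p_pos_nd * p_nd   # total probability P(+)
p_d_pos = p_pos_d * p_d / p_pos
print(p_d_pos)   # ≈ 0.032: even after a positive test, disease is unlikely
```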
%% Cell type:markdown id:c9c11bcb tags:
## Interpretation of probability
### Interpretations of probability
%% Cell type:markdown id:5104a1d9 tags:
**Objective interpretation**:
Probability as a relative frequency:
$$P(A) = \lim_{n\to\infty} \dfrac{\text{number of occurrences of outcome $A$ in $n$ measurements}}{n}$$
(*frequentist*)
%% Cell type:markdown id:389d7d78 tags:
**Subjective interpretation**:
$$P(A) = \text{degree of belief that hypothesis $A$ is true}$$
Typical example:
$$P(\text{theory}|\text{data}) \propto P(\text{data}|\text{theory}) P(\text{theory})$$
(*Bayesian*)
%% Cell type:markdown id:b8ab42a5-b7af-4942-9a7a-91b73e0ce625 tags:
# Probability Density Functions
Single continuous variable $x$ which describes the outcome of an experiment:
probability density function (p.d.f.) $f(x)$:
- probability to observe $x$ in the interval $[x, x + dx]$:
$f(x)\,dx$
- normalization $$\int_S f(x)\,dx = 1$$
Cumulative distribution (cumulative density function (cdf)) $F(x)$:<br>
(mathematics: distribution function)
- probability to observe a value less than or equal to $x$:
$$F(x) = \int_{-\infty}^x f(x^\prime)\,dx^\prime$$
%% Cell type:markdown id:13563524-0626-4a69-a6ab-f87d7f19a016 tags:
### Example
$$P(a \le x \le b) = \int_a^b f(x)\,dx = F(b) - F(a)$$
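This relation can be checked numerically, e.g. for a standard normal p.d.f. (distribution and interval chosen for illustration), comparing $F(b) - F(a)$ with the direct integral of $f(x)$:

``` python
from scipy.stats import norm
from scipy.integrate import quad

# P(-1 <= x <= 1) for a standard normal, computed two ways
a, b = -1.0, 1.0
p_cdf = norm.cdf(b) - norm.cdf(a)     # F(b) - F(a)
p_int, _ = quad(norm.pdf, a, b)       # direct integral of f(x)

print(p_cdf, p_int)  # both ≈ 0.6827, the "1 sigma" probability
```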
%% Cell type:markdown id:99c06502 tags:
# Probability Density Functions
%% Cell type:markdown id:165ca1d8 tags:
<img src="./figures/08/Lognormal_distribution_PDF.svg" width="150%" />
%% Cell type:markdown id:7b97cfc3 tags:
<img src="./figures/08/CDF-log_normal_distributions.svg" width="75%" />
%% Cell type:markdown id:8eb9fa5b tags:
### Quantiles
%% Cell type:markdown id:8e7b412c tags:
Quantile $x_\alpha$ is the value of the random variable $x$ with
$$F(x_\alpha) = \int_{-\infty}^{x_\alpha} f(x)\,dx = \alpha$$
Hence: $$x_\alpha = F^{-1}(\alpha)$$
Median: $x_{\frac{1}{2}}$
$$F(x_{\frac{1}{2}}) = 0.5$$ $$x_{\frac{1}{2}} = F^{-1}(0.5)$$
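In `scipy.stats` the inverse c.d.f. $F^{-1}$ is called `ppf` (percent point function); a short sketch for the standard normal (the quantile levels are chosen for illustration):

``` python
from scipy.stats import norm

# x_alpha = F^{-1}(alpha); scipy calls the inverse cdf "ppf"
median = norm.ppf(0.5)
q16, q84 = norm.ppf([0.16, 0.84])

print(median)     # 0.0: for a symmetric pdf the median equals the mean
print(q16, q84)   # ≈ -1 and +1 sigma for the standard normal
```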
%% Cell type:markdown id:14e4b782-9573-4e87-89d0-901dade25538 tags:
<img src="./figures/09/Normalverteilung.png" width=80% alt="image" />
%% Cell type:markdown id:1258f494 tags:
### Joint probability density function
%% Cell type:markdown id:783ab1f3 tags:
Example: outcome of a measurement characterized by two continuous variables $x,y$. <br>
Event $A$: $x$ observed in $[x, x + dx]$, $y$ anywhere<br>
Event $B$: $y$ observed in $[y, y + dy]$, $x$ anywhere
$$P(A \cap B) = \text{probability for $x$ in $[x, x + dx]$ and $y$ in $[y, y + dy]$} = f(x, y)\,dxdy$$
Marginal p.d.f.: $$f_x(x) = \int_{-\infty}^\infty f(x,y)\,dy$$
$$f_y(y) = \int_{-\infty}^\infty f(x,y)\,dx$$
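As a numerical sketch of marginalization (the bivariate normal and the evaluation point are assumed for illustration): integrate the joint p.d.f. over $y$ and compare with the known analytic marginal, which for a bivariate normal is again normal with variance $V_{xx}$.

``` python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# joint p.d.f. f(x, y): a correlated bivariate normal (toy example)
cov = [[1.0, 0.5], [0.5, 2.0]]
f = multivariate_normal(mean=[0.0, 0.0], cov=cov)

# marginal f_x(x0) = integral of f(x0, y) over y
x0 = 0.7
fx_numeric, _ = quad(lambda y: f.pdf([x0, y]), -np.inf, np.inf)

# analytic marginal: normal with variance V_xx
fx_exact = norm.pdf(x0, scale=np.sqrt(cov[0][0]))
print(fx_numeric, fx_exact)
```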
%% Cell type:markdown id:8e74c511-b730-4781-a751-e1895983cc4c tags:
<img src="./figures/09/W_top.png" width="150%" alt="image" />
%% Cell type:markdown id:8f2cf729 tags:
### Marginal pdfs
%% Cell type:markdown id:438662b0 tags:
<img src="./figures/09/W.png" style="width:47.0%"
alt="image" />
%% Cell type:markdown id:55461265 tags:
<img src="./figures/09/top.png" style="width:47.0%"
alt="image" />
%% Cell type:markdown id:c88542a8 tags:
### Conditional p.d.f.
%% Cell type:markdown id:ec5c20d4 tags:
Conditional p.d.f.: $$g(x|y) = \frac{f(x,y)}{f_y(y)}$$
$$h(y|x) = \frac{f(x,y)}{f_x(x)}$$
%% Cell type:markdown id:7624d391-9fd5-4567-bf84-632eead1bb20 tags:
<img src="./figures/09/top_cond.png" width="150%"
alt="image" />
%% Cell type:markdown id:2e0275c5 tags:
### Bayes' theorem
%% Cell type:markdown id:76ce2f83 tags:
$g(x|y) = \frac{f(x,y)}{f_y(y)}$ and
$h(y|x) = \frac{f(x,y)}{f_x(x)}$
Theorem: $$g(x|y) = \frac{h(y|x) f_x(x)}{f_y(y)}$$
With $f(x,y) = h(y|x) f_x(x) = g(x|y) f_y(y)$:
$$f_x(x) = \int_{-\infty}^\infty g(x|y) f_y(y)\,dy$$
$$f_y(y) = \int_{-\infty}^\infty h(y|x) f_x(x)\,dx$$
%% Cell type:markdown id:ba8fae9b-636b-4b7e-a36d-3be582c000f5 tags:
<img src="./figures/08/bayes.gif" style="width:80.0%" />
most likely not Bayes
%% Cell type:markdown id:fdd11743-1e7b-48b6-9e9b-41bee7d4f23b tags:
## Functions of random variables
### functions of random variables
Let $x$ be a random variable, $f(x)$ its p.d.f. and
$a(x)$ a continuous function:
What is the p.d.f. $g(a)$?
Equal probability for $x$ in $[x, x+dx]$ and $a$ in $[a, a+da]$:
$$g(a) da = \int_{dS} f(x)\,dx$$
if $x(a)$ (the inverse of $a(x)$) exists:
$$g(a) da = \left| \int_{x(a)}^{x(a +da)} f(x^\prime)\,dx^\prime \right| = \int_{x(a)}^{x(a) + |\frac{dx}{da}|da} f(x^\prime)\,dx^\prime$$
or
$$g(a) = f(x(a)) \left|\frac{dx}{da}\right|$$
%% Cell type:markdown id:525847ad-2ddf-468c-883f-35bc853bac8c tags:
### Examples
- example 1:
For $x$ equally distributed between 0 and 1, p.d.f. of $x$: $u(x) = 1$ and $a(x) = \sqrt{x}$, $x(a) = a^2$<br>
p.d.f. $g(a)$:
$$g(a) = u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot \left|\frac{da^2}{da}\right| = 2a \text{ (linearly distributed)}$$
<br>
- example 2:
For $x$ equally distributed between 0 and 1, p.d.f. of $x$: $u(x) = 1$ and $a(x) = F^{-1}(x)$, $x(a) = F(a)$<br>
p.d.f. $g(a)$:
$$g(a) = u(x(a)) \left|\frac{dx}{da}\right| = 1 \cdot \left|\frac{dF(a)}{da}\right| = f(a)$$
%% Cell type:markdown id:0c669bab-6c81-45fb-a506-8ef709cd4687 tags:
### Functions of vectors of random variables
Let $\vec x$ be vector of random variables, $f(\vec x)$ the p.d.f. and $\vec a(\vec x)$ a continuous function:
What is the p.d.f. $g(\vec a)$?
$$g(\vec a) = f(\vec x) \left| J \right| \text{, where $\left| J \right|$ is the absolute value of the Jacobian determinant of } J =
\begin{pmatrix}
\frac{\partial x_1}{\partial a_1} & \frac{\partial x_1}{\partial a_2} & \dots & \frac{\partial x_1}{\partial a_m} \\[6pt]
\frac{\partial x_2}{\partial a_1} & \frac{\partial x_2}{\partial a_2} & \dots & \frac{\partial x_2}{\partial a_m} \\[6pt]
\vdots & \vdots & \ddots & \vdots \\[6pt]
\frac{\partial x_n}{\partial a_1} & \frac{\partial x_n}{\partial a_2} & \dots & \frac{\partial x_n}{\partial a_m}
\end{pmatrix}$$
%% Cell type:markdown id:2907584f-bb17-463e-b084-749f2011bd4c tags:
### Expectation value and moments
- **Definition:**
expectation value of the function $h(x)$ for a p.d.f. $f(x)$:
$$E[h] = \int_{-\infty}^{\infty} h(x) \, f(x)\,dx$$
- **special case:** $h(x) = x$:
$$E[x] = \int_{-\infty}^{\infty} x \, f(x)\,dx = <x>$$
$E[x]$ is called the population mean or just mean, $\bar x$ or $\mu$.
- Expectation value is a linear operator:
$$E[a\cdot g(x) + b \cdot h(x)] = a\cdot E[g(x)] + b\cdot E[h(x)]$$
- $n$th moment:
$$E[x^n] = \int_{-\infty}^{\infty} x^n \, f(x)\,dx$$
- $n$th central moment:
$$E[(x - E[x])^n] = E[(x-\mu)^n] = \int_{-\infty}^{\infty} (x-\mu)^n \, f(x)\,dx$$
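As a worked check of these definitions (the uniform p.d.f. on $[0,1]$ is chosen for illustration), the moments can be computed by direct integration:

``` python
from scipy.integrate import quad

# E[h] = integral of h(x) f(x) dx for the uniform pdf u(x) = 1 on [0, 1]
f = lambda x: 1.0

mean, _ = quad(lambda x: x * f(x), 0, 1)        # E[x] = 1/2
second, _ = quad(lambda x: x**2 * f(x), 0, 1)   # E[x^2] = 1/3 (2nd moment)
var = second - mean**2                          # V[x] = E[x^2] - E[x]^2 = 1/12

print(mean, second, var)
```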
%% Cell type:markdown id:4b7f6667-c019-4b23-83fd-a1758d748762 tags:
## Basic concepts
### Basic concepts
Discrete random variable, mean:
$$<r> = \bar r = \sum _{i=1}^N r_i P(r_i)$$
Continuous random variable, probability density $f(x)$ with
- $P(a \leq x \leq b) = \int_a^b f(x)\,dx$
- $f(x) \geq 0$
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Mean:
$$<x> = \bar x = \int_{-\infty}^{\infty} x \, f(x)\,dx = \mu_x$$
%% Cell type:markdown id:5cd273e7-3129-454b-a31c-4e91c4316bf4 tags:
### Variance and standard deviation
variance $V[x]$:
- measure for the width of a p.d.f.
- second central moment
- definition:
$$V[x] = E[(x - \mu_x)^2] = \int_{-\infty}^{\infty} (x-\mu_x)^2 \, f(x)\,dx$$
- useful relations:
$$V = E[(x - \mu)^2] = E[x^2 - 2x\mu + \mu^2] = E[x^2] - 2\mu E[x] + \mu^2 = E[x^2] - 2 \mu^2 + \mu^2 = E[x^2] - (E[x])^2$$
$$V[ax] = a^2 V[x]$$
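Both relations are easy to verify on a sample (the normal parameters below are assumed for illustration):

``` python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1_000_000)

# V[x] = E[x^2] - (E[x])^2
print(np.var(x), np.mean(x**2) - np.mean(x)**2)

# V[ax] = a^2 V[x]
a = 3.0
print(np.var(a * x), a**2 * np.var(x))
```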
%% Cell type:markdown id:30d01c8b-50a5-4d90-8e0b-89a68f52f9bd tags:
### Variance and standard deviation
standard deviation $\sigma$:
- measure for the variation of a random variable around its mean
<br>
- in physics: “the error”
<br>
- definition $$\sigma = \sqrt{V[x]}$$<br>
%% Cell type:markdown id:ae67472e-f9b2-46fa-9679-15553d31aaaf tags:
### Covariance
- covariance $V_{xy}$ for two random variables $x$ and $y$ with p.d.f. $f(x,y)$:
$$V_{xy} = E[(x - \mu_x)(y - \mu_y)] = E[xy] - \mu_x \mu_y$$
$$V_{xy} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\,dx \,dy - \mu_x\mu_y$$
- Covariance $V_{ab} = \text{cov}[a, b]$ of two functions $a$ and $b$ of the random vector $\vec x$:
$$\text{cov}[a, b] = E[(a - \mu_a)(b - \mu_b)] = E[ab] - \mu_a \mu_b$$
$$\text{cov}[a, b] = \int_{-\infty}^{\infty} \dots \int_{-\infty}^{\infty} a(\vec x) b(\vec x)\, f(\vec x)\,dx_1 \dots \,dx_n - \mu_a\mu_b$$
%% Cell type:markdown id:e9c61f18 tags:
### Covariance
- covariance: $$\text{cov}(x,y) = E[(x-E[x])(y-E[y])] = E[xy] - E[x]E[y]$$
for samples $X = x_1, x_2,\dots, x_N$ and $Y = y_1, y_2,\dots, y_N$:
$$\text{cov}(X,Y) = \frac{1}{N}\sum\limits_{i=1}^N (x_i - \overline x)(y_i - \overline y)$$
- correlation $$\rho_{xy} = \frac{\text{cov}(X,Y)}{\sqrt{\text{cov}(X,X)\text{cov}(Y,Y)}}$$
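The sample formulas can be checked against the NumPy implementations (the correlated toy sample is assumed for illustration); note that `np.cov` uses a $1/(N-1)$ normalization by default, so `bias=True` selects the $1/N$ convention used above:

``` python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)   # correlated toy sample

# sample covariance and correlation from the definitions (1/N normalization)
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
rho_manual = cov_manual / (x.std() * y.std())

print(cov_manual, np.cov(x, y, bias=True)[0, 1])
print(rho_manual, np.corrcoef(x, y)[0, 1])
```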
%% Cell type:markdown id:576b4570-8a06-4224-b867-31459483e6bb tags:
### Covariance matrix
$$C = \left(
\begin{array}{rr}
V_{xx} & V_{xy} \\
V_{yx} & V_{yy}\\
\end{array}
\right)$$
Remarks:
- sometimes called error matrix
<br>
- $V_{xy} = V_{yx}$, matrix is symmetric
<br>
- $V_{ii} \ge 0$, matrix is positive semidefinite
<br>
- correlation matrix: $$C^\prime = \left(
\begin{array}{rr}
V_{xx}/V_{xx} & V_{xy}/\sqrt{V_{xx}V_{yy}} \\
V_{xy}/\sqrt{V_{xx}V_{yy}} & V_{yy}/V_{yy}\\
\end{array}
\right) = \left(
\begin{array}{rr}
1 & \rho_{xy} \\
\rho_{xy} & 1\\
\end{array}
\right)$$
- correlation coefficient:
$$\rho_{xy} = \frac{V_{xy}}{\sqrt{V_{xx}V_{yy}}}$$
%% Cell type:markdown id:c3b63f0e tags:
### Error propagation
Suppose we have a random vector $\vec x$ distributed according to joint p.d.f. $f(\vec x)$ with mean values $\vec \mu$ and covariance matrix $V$:
What is the variance of the function $y(\vec x)$?
Expand $y$ around $\vec x = \vec \mu$:
$$y(\vec x) \approx y(\vec \mu) + \sum_{i = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu}(x_i-\mu_i)$$
Expectation value of $y$:
$$E[y] \approx y(\vec \mu)$$
Expectation value of $y^2$:
$$E[y^2] \approx E[(y(\vec \mu) + \sum_{i = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu}(x_i-\mu_i))(y(\vec \mu) + \sum_{j = 1}^{N} \frac{\partial y}{\partial x_j}\big|_{\vec \mu}(x_j-\mu_j))] = y^2(\vec \mu) + \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu}E[(x_i-\mu_i)(x_j- \mu_j)]$$
$$E[y^2] = y^2(\vec \mu) + \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu} V_{ij}$$
variance of $y$:
$$\sigma^2_y = E[y^2] - E[y]^2 \approx \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y}{\partial x_i}\big|_{\vec \mu} \frac{\partial y}{\partial x_j}\big|_{\vec \mu} V_{ij} $$
%% Cell type:markdown id:38fedec1 tags:
### Error propagation in more dimensions
Now assume a vector function $\vec y(\vec x) = (y_1(\vec x),\dots,y_M(\vec x))$:
Covariance $U_{kl}$ for $y_k$ and $y_l$:
$$U_{kl} = \text{cov}[y_k, y_l] = \sum_{i = 1}^{N} \sum_{j = 1}^{N} \frac{\partial y_k}{\partial x_i}\big|_{\vec \mu} \frac{\partial y_l}{\partial x_j}\big|_{\vec \mu} V_{ij}$$
With the matrix of derivatives $A$, where $A_{ij} = \frac{\partial y_i}{\partial x_j}\big|_{\vec \mu}$:
$$ U = A V A^{T}$$
Example: $y = x_1 + x_2$ and, hence, $A = (1, 1)$
$$U = \left(\begin{array}{rr}1 & 1\\ \end{array}\right)
\left(
\begin{array}{rr}\sigma_1^2 & V_{12} \\ V_{12} & \sigma_2^2\\ \end{array}
\right)
\left(\begin{array}{r}1 \\ 1\\ \end{array}\right) =
\left(\begin{array}{rr}\sigma_1^2 + V_{12} & V_{12}+ \sigma_2^2\\ \end{array}
\right) \left(\begin{array}{r}1 \\ 1\\ \end{array}\right) = \sigma_1^2 + \sigma_2^2 + 2V_{12}$$
Example: $y = x_1 x_2$ and, hence, $A = (x_2, x_1)$
$$\frac{\sigma^2_y}{y^2} = \frac{\sigma^2_1}{x_1^2} + \frac{\sigma^2_2}{x_2^2} + 2 \frac{V_{12}}{x_1 x_2}$$
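The product formula can be tested with pseudo-experiments (the means, spreads, and independence, i.e. $V_{12} = 0$, are assumed for illustration); linear error propagation is a good approximation here because the relative spreads are small:

``` python
import numpy as np

rng = np.random.default_rng(7)
# independent toy measurements (so V_12 = 0) with small relative spreads
x1 = rng.normal(10.0, 0.1, size=1_000_000)
x2 = rng.normal(5.0, 0.2, size=1_000_000)

y = x1 * x2
# predicted relative error: (sigma_y/y)^2 = (sigma_1/x_1)^2 + (sigma_2/x_2)^2
rel_pred = np.sqrt((0.1 / 10.0)**2 + (0.2 / 5.0)**2)
rel_mc = np.std(y) / np.mean(y)
print(rel_pred, rel_mc)   # agree well since the spreads are small
```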
%% Cell type:markdown id:9f09c066 tags:
Now let's try a few things!
<br>
Any questions so far?
%% Cell type:markdown id:9eae9199-9401-40c4-b212-ae57f1ccab38 tags:
## Samples
---
Sample: $X = x_1, x_2,\dots, x_N$
Expectation value (estimated from the sample):
$$E[f(x)] \approx \frac{1}{N}\sum_{i=1}^N f(x_i)$$
Describing samples: minimum, maximum, frequency/histogram, means, variance, standard deviation,....
%% Cell type:markdown id:a0bccc47 tags:
### Describing samples
minimum, maximum, frequency/histogram, means, variance, standard deviation,....
Here: home and away goals in Bundesliga matches
%% Cell type:code id:b44e356e-b829-4879-b5fc-9706fffe873d tags:
``` python
import numpy as np
data = np.loadtxt('./exercises/09_data.txt')
data[0:9]
```
%% Cell type:code id:2aeacb94-518d-464f-a7ab-5282b97bc225 tags:
``` python
data[0:9,0]
```
%% Cell type:code id:1829fa78-7d7b-4bac-9f24-5129dca63629 tags:
``` python
np.min(data), np.max(data)
```
%% Cell type:markdown id:caa12e94-0b72-4f60-875d-5a306a09d036 tags:
### Histograms
%% Cell type:code id:871f915d-09f8-4d14-a79a-8fbd2f16ab75 tags:
``` python
import matplotlib.pyplot as plt
plt.hist(data[:, 0])
#plt.hist(data[:, 0], bins=np.arange(-0.25,6.25,0.5))
#plt.xlabel("k")
```
%% Cell type:markdown id:e9327372 tags:
### Histograms
%% Cell type:code id:f8263279-48b5-4421-be05-604ddbfd8d6f tags:
``` python
plt.hist(data[:, 0], bins=np.arange(-0.25,6.25,0.5))
plt.xlabel("k")
```
%% Cell type:markdown id:61619361 tags:
### Histograms
%% Cell type:code id:5056a717-6561-41ec-be39-1757984863a9 tags:
``` python
plt.hist(data[:, 0], bins=np.arange(-0.25,6.26,0.5))
plt.xlabel("k")
#plt.savefig("hist.pdf")
plt.show()
```
%% Cell type:code id:3f64d35c tags:
``` python
plt.hist(data[:, 1], bins=np.arange(-0.25,6.26,0.5))
plt.xlabel("l")
plt.show()
```
%% Cell type:markdown id:4183331e-8a9f-4af5-829f-3fcb9b2abb31 tags:
### Cumulative Distribution
%% Cell type:code id:4ecfb198-c621-4ee4-9d24-9f9a8cb8bec7 tags:
``` python
plt.hist(data[:, 0], bins=100, cumulative=True, density = True, label="cumulative")
plt.xlabel("k")
#plt.savefig("hist2.pdf")
print("median", np.median(data[:, 0]))
```
%% Cell type:code id:d00cadf3-0cfa-4f59-9962-4b9e6b707e0b tags:
``` python
```
%% Cell type:markdown id:ceb5ca27-a96a-4ee3-967b-5f02efe66540 tags:
### Means
---
different means:
- arithmetic mean: $$ \overline{x} = E[x] = <x> = \frac{1}{n}\sum\limits_{i=1}^n x_i (= \mu)$$
- geometric mean: $$ \overline{{x}}_\mathrm {geom} = \sqrt[n]{\prod\limits_{i=1}^{n}{x_i}}$$
- quadratic mean: $$ \overline{{x}}_\mathrm{quadr} = \sqrt{E[x^2]} = \sqrt {\frac {1}{n} \sum\limits_{i=1}^{n}x_i^2} = \sqrt{\overline{x^2}} $$
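A quick comparison of the three means on a small toy sample (the values are chosen for illustration); for positive data they always satisfy $\overline{x}_\mathrm{geom} \le \overline{x} \le \overline{x}_\mathrm{quadr}$:

``` python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 8.0])

arith = np.mean(x)                   # (1+2+4+8)/4 = 3.75
geom = np.prod(x) ** (1 / len(x))    # (1*2*4*8)^(1/4) ≈ 2.83
quadr = np.sqrt(np.mean(x**2))       # sqrt((1+4+16+64)/4) ≈ 4.61

print(geom, arith, quadr)            # geom <= arith <= quadr
```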
%% Cell type:markdown id:2503d92a tags:
### Variance
- variance
$$V = E[(x - \mu)^2]$$
with $\mu = E[x] = \overline x$:
$$V = E[(x - \overline x)^2] = E[x^2 - 2x{\overline x} + {\overline x}^2] = E[x^2] - 2{\overline x}E[x] + {\overline x}^2 = E[x^2] - 2 {\overline x}^2 + {\overline x}^2 = E[x^2] - (E[x])^2$$
for a sample $X = x_1, x_2,\dots, x_N$:
$$V = \frac{1}{N}\sum\limits_{i=1}^N (x_i - \overline x)^2$$
- standard deviation:
$$\sigma = \sqrt{V}$$
%% Cell type:markdown id:68cea392-8d4f-48d7-aacc-0f10d46af3af tags:
### Exercise: Compute mean and variance of $X$
%% Cell type:code id:5f98b25d tags:
``` python
```
%% Cell type:code id:76d92d18-1d77-40d2-a910-592183635d3b tags:
``` python
print("mean", np.mean(data, axis=0))
```
%% Cell type:code id:5f0f34b4-5bbe-439b-8b4e-fbb05794b790 tags:
``` python
print("variance", np.var(data, axis=0))
```
%% Cell type:code id:70a1920f-beda-4154-ad77-22a9ffe2e39f tags:
``` python
print("standard deviation:", np.std(data, axis=0))
```
%% Cell type:markdown id:20d86d18 tags:
### Exercise: compute covariance and correlation column 1 and 2
use `np.cov` and `np.corrcoef`
%% Cell type:code id:bf609514 tags:
``` python
plt.hist2d(data[:,0], data[:,1], bins=np.arange(-0.5,6.1,1));  # trailing ';' suppresses the cell output
```
%% Cell type:code id:6fff5415 tags:
``` python
```
%% Cell type:code id:751f9384 tags:
``` python
print(np.cov(data, rowvar=False))
```
%% Cell type:code id:abda1c9f tags:
``` python
print(np.corrcoef(data, rowvar=False))
```
%% Cell type:markdown id:eed39982 tags:
### Exercise: Compute variance of goals per match
Compute the variance of the sum of the home and away goals per match in three ways, where $V$ is the covariance matrix of the home and away goals from before:
- wrong error propagation $U = \sigma_1^2 + \sigma_2^2 = V_{11} + V_{22}$
<br>
- correct error propagation $U = \left(\begin{array}{rr}1 & 1\\ \end{array}\right)V\left(\begin{array}{r}1 \\ 1\\ \end{array}\right)$
> You can use: `A = np.array([[1, 1]])` to define the matrix of derivatives and <br> `U = A@V@A.T` for the matrix transformation
<br>
- directly with `np.var`
<br>
%% Cell type:code id:6b0966b7 tags:
``` python
```
%% Cell type:markdown id:404bffbd tags:
What changes when you look at the goal difference?
%% Cell type:code id:f88ff1a1 tags:
``` python
A = np.array([[1, 1]])
V = np.cov(data, rowvar=False)
print(V[0,0] + V[1,1])
U = A@V@A.T
print(U)
print(np.var(data[:,0] + data[:,1]))
print(np.var(data[:,0] - data[:,1]))
```
%% Cell type:markdown id:f826b603 tags:
### Exercise: Check "functions of random variables"
%% Cell type:markdown id:ac750c83 tags:
Let's use pseudo-experiments/Monte Carlo:
* generate 100,000 uniformly distributed values $u$
* make a histogram of $u$ and of $\sqrt{u}$
%% Cell type:markdown id:0d93b2f0 tags:
Relatively easy with *scipy* and *numpy*:
* use [numpy random generator](https://numpy.org/doc/stable/reference/random/generator.html)
<br>
or
<br>
* use [`scipy.stats`](https://docs.scipy.org/doc/scipy/reference/stats.html)
* use the [`scipy.stats.uniform`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.uniform.html) class
%% Cell type:code id:b424b5b0 tags:
``` python
import numpy as np
rng = np.random.default_rng(12345)
rfloat = rng.random()
print(rfloat)
```
%% Cell type:code id:785734fb tags:
``` python
u = rng.random(100000)
print(u)
plt.hist(u, bins=100, histtype='step')
plt.hist(np.sqrt(u), bins=100, histtype='step');  # trailing ';' suppresses the cell output
```
%% Cell type:code id:4823f38f tags:
``` python
```