## What is meant with error/uncertainty on a measured quantity?
%% Cell type:markdown id:ffd68924 tags:
If we quote $a = 1 \pm 0.5$, we usually mean that the probability for the *true* value of $a$ is Gaussian distributed, $G(a; \mu, \sigma)$, with $\mu = 1$ and $\sigma = 0.5$.
%% Cell type:markdown id:f8037279 tags:
### Exercise: How often can/should the measurement be outside one sigma?
* count how often they differ by more than one sigma
Relatively easy with *scipy* and *numpy*:
* use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
* use [scipy.stats.norm](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) class
%% Cell type:code id:9b4557d9 tags:
``` python
```
%% Cell type:code id:a92fa7d1 tags:
``` python
import scipy.stats as stats
import numpy as np

# 10000 pseudo-measurements drawn from G(a; mu=1, sigma=0.5)
pseudo_a = stats.norm.rvs(1, 0.5, 10000)
print(pseudo_a)
# outside one sigma means |a - mu| > sigma
is_outside = abs(pseudo_a - 1) > 0.5
print(is_outside)
print("fraction outside one sigma:", sum(is_outside) / len(pseudo_a))
```
%% Cell type:markdown id:54d58609 tags:
### Why is it a Gaussian?
Central limit theorem:
"let $X_1, X_2, \dots, X_n$ denote a statistical sample of size $n$ from a population with expected value (average) $\mu$ and finite positive variance $\sigma^2$, and let $\bar{X}_n$ denote the sample mean (which is itself a random variable). Then the limit as $n \to \infty$ of the distribution of $\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$ is a normal distribution with mean 0 and variance 1."
- compute the confidence for the interval $[0, 2]$ numerically with [scipy.integrate.quad](https://docs.scipy.org/doc/scipy/reference/generated/scipy.integrate.quad.html)
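The numerical integration can be sketched like this (a minimal example, using the Gaussian with $\mu = 1$, $\sigma = 0.5$ from above):

``` python
import scipy.integrate as integrate
import scipy.stats as stats

# probability content of the interval [0, 2] for G(a; mu=1, sigma=0.5);
# [0, 2] is the +-2 sigma interval around mu = 1
conf, abserr = integrate.quad(lambda x: stats.norm.pdf(x, 1, 0.5), 0, 2)
print(conf)
```

`quad` returns the integral estimate together with an error bound; the result should come out close to the familiar two-sigma coverage of about 95.45 %.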
### Let's find a better interval: $[\mu_-, \mu_+]$
here:
$\mu_- = 0$
find $\mu_+$ such that the integral over $[0, \mu_+]$ equals $0.6827$:
- define a function that returns the integral for a given $\mu_+$
- plot the function
- use root finding with [scipy.optimize.brentq](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brentq.html) to find the value of $\mu_+$ where the integral minus 0.6827 is 0.
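The integral function and the root finding can be combined as follows (a sketch; the plotting step is left out, and the bracket $[1, 3]$ passed to `brentq` is an assumption chosen so that the function changes sign inside it):

``` python
import scipy.integrate as integrate
import scipy.optimize as optimize
import scipy.stats as stats

def interval_prob(mu_plus):
    """Integral of the Gaussian pdf G(x; mu=1, sigma=0.5) over [0, mu_plus]."""
    val, _ = integrate.quad(lambda x: stats.norm.pdf(x, 1, 0.5), 0, mu_plus)
    return val

# root of interval_prob(mu_plus) - 0.6827 = 0, bracketed by [1, 3]
mu_plus = optimize.brentq(lambda m: interval_prob(m) - 0.6827, 1, 3)
print(mu_plus)
```

`brentq` requires the function to have opposite signs at the two bracket ends; here the integral is about $0.48$ at $\mu_+ = 1$ and about $0.98$ at $\mu_+ = 3$, so the root lies safely inside.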
Hypothesis: "The $k_i$ goals in each Bundesliga match $i$ are Poisson distributed with a common parameter $\mu = \langle k \rangle$."
We need an alternative hypothesis for the test: "The goals $k_i$ in each Bundesliga match are Poisson distributed with an individual parameter $\mu_i = k_i$ for each match."
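With no real match data at hand, the two hypotheses can be compared on simulated scores (a sketch; the assumed mean of 3 goals per match and the 306 matches per season are illustrative assumptions):

``` python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(42)
# simulated stand-in for real scores: 306 matches (one Bundesliga season),
# total goals per match drawn with an assumed mean of 3
k = rng.poisson(3.0, size=306)

mu = k.mean()  # common parameter <k> under the hypothesis
lnP_H = stats.poisson.logpmf(k, mu).sum()   # hypothesis: one common mu
lnP_A = stats.poisson.logpmf(k, k).sum()    # alternative: mu_i = k_i per match
print(lnP_H, lnP_A)
```

Since the alternative fits every match exactly, $\ln P(A) \ge \ln P(H)$ always holds; this pair of log-likelihoods is what enters the ratio used for the test below.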
Errors of first and second kind:
* error of the first kind: The hypothesis is true, but is rejected (false positive).
significance: $\alpha$
specificity: $1-\alpha$ (efficiency)
* error of the second kind: The hypothesis is wrong, but is accepted (false negative).
probability for error: $\beta$
power: $1-\beta$
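A small numeric illustration of $\alpha$ and $\beta$ (an assumed setup, not from the exercise: hypothesis $x \sim G(0, 1)$, alternative $x \sim G(2, 1)$, and the hypothesis is accepted whenever $x < 1$):

``` python
import scipy.stats as stats

cut = 1.0
# error of the first kind: H is true (x ~ G(0,1)) but x falls above the cut
alpha = stats.norm.sf(cut, 0, 1)
# error of the second kind: A is true (x ~ G(2,1)) but x falls below the cut
beta = stats.norm.cdf(cut, 2, 1)
print("alpha:", alpha, "beta:", beta, "power:", 1 - beta)
```

Moving the cut trades one error against the other: a harder cut lowers $\alpha$ but raises $\beta$.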
%% Cell type:markdown id:644cb69d tags:
### Example
Gaussian distributed random variable $x$ ($\sigma = 1$)
For a Gaussian: $-2 \ln\frac{P(H)}{P(A)}$ follows a $\chi^2$ distribution.
Wilks' theorem:
$-2 \ln\frac{P(H)}{P(A)}$ approaches a $\chi^2$ distribution (in the asymptotic limit) with $n$ degrees of freedom, where $n = \text{number of data points} - \text{number of model parameters}$.
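For the Gaussian case this can be checked directly: with a measurement $x$ and $\sigma = 1$, the best-fit alternative is $\mu = x$, and $-2\ln(P(H)/P(A))$ reduces to $x^2$ (a sketch with simulated data; $H\colon \mu = 0$ is an assumed hypothesis):

``` python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=100000)   # measurements under H: mu = 0, sigma = 1

# -2 ln(P(H)/P(A)) with the alternative fitted to each measurement (mu = x)
t = -2 * (stats.norm.logpdf(x, 0, 1) - stats.norm.logpdf(x, x, 1))
print(np.allclose(t, x**2))
```

The normalization terms cancel in the ratio, leaving exactly $x^2$, and the square of a standard normal variable is $\chi^2$ distributed with one degree of freedom.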
%% Cell type:markdown id:fb443d99 tags:
### Let's try it
- use $\chi^2$ p.d.f. and c.d.f. from [scipy](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2.html)
- what is the value of `df`?
- draw the $\chi^2$ distribution in the interval $[200, 425]$
- compute the $p$-value for our data
- does it work? Add the histogram of the simulated $\chi^2$ values.
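A sketch of the first steps (assumptions: `df = 305` from 306 matches minus the one fitted parameter $\mu$, and `chi2_obs` is a placeholder for the value obtained from the data, not a real result):

``` python
import numpy as np
import scipy.stats as stats

df = 305                 # assumed: 306 matches minus 1 fitted parameter
chi2_obs = 330.0         # placeholder for the value obtained from the data

x = np.linspace(200, 425, 500)
pdf = stats.chi2.pdf(x, df)      # chi2 p.d.f. on the requested interval

# p-value: probability to observe a chi2 at least this large under H
p_value = stats.chi2.sf(chi2_obs, df)     # sf(x) = 1 - cdf(x)
print("p-value:", p_value)
```

Using `sf` instead of `1 - cdf` avoids the loss of precision that occurs when the cdf is very close to 1.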