Hello.
I am bewildered by so many different notions of probability distribution percentages, i.e. the proportion of values that lie within certain standard deviations from the mean.
(1) Chebyshev's inequality:
- for any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k² for all k > 1. Below, X is the mean and s is the standard deviation.
k = 1.25 => X ± 1.25s => proportion 1 − 1/1.25² = 36% => at least 36% of observations lie within 1.25 standard deviations of the mean (hence 18% below and 18% above it, if the distribution is symmetric)
k = 1.50 => X ± 1.50s => proportion 1 − 1/1.5² ≈ 56% => at least 56% of observations lie within 1.5 standard deviations of the mean (28% on each side)
k = 2.00 => X ± 2.00s => proportion 1 − 1/2² = 75% => at least 75% of observations lie within 2 standard deviations of the mean (37.5% on each side)
k = 2.50 => X ± 2.50s => proportion 1 − 1/2.5² = 84% => at least 84% of observations lie within 2.5 standard deviations of the mean (42% on each side)
k = 3.00 => X ± 3.00s => proportion 1 − 1/3² ≈ 89% => at least 89% of observations lie within 3 standard deviations of the mean (44.5% on each side)
k = 4.00 => X ± 4.00s => proportion 1 − 1/4² ≈ 94% => at least 94% of observations lie within 4 standard deviations of the mean (47% on each side)
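To make sure I computed the bound correctly, here is a short check in plain Python (just a sketch for illustration, nothing beyond the formula above):

```python
# Chebyshev's inequality: for ANY distribution with finite variance,
# at least 1 - 1/k^2 of the observations lie within k standard
# deviations of the mean, for every k > 1.
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of observations within k standard deviations."""
    return 1 - 1 / k**2

for k in (1.25, 1.5, 2.0, 2.5, 3.0, 4.0):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%} within ±{k}s")
```

This reproduces the 36%, 56%, 75%, 84%, 89%, and 94% figures listed above.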
(2) Confidence intervals:
A confidence interval is a range of values around the expected outcome within which we expect the actual outcome to lie some specified percentage of the time. For example, a 95% confidence interval is a range in which we expect the random variable to fall 95% of the time.
μ ± 1.65σ for 90 percent of the observations
μ ± 1.96σ for 95 percent of the observations
μ ± 2.58σ for 99 percent of the observations.
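In case it helps to see where these particular z-values come from, they can be reproduced with Python's standard library alone; the proportion of a normal distribution within ±z standard deviations is erf(z/√2). (This snippet is only my own check, not from the textbook.)

```python
from math import erf, sqrt

def normal_coverage(z: float) -> float:
    """Proportion of a normal distribution within ±z standard deviations."""
    return erf(z / sqrt(2))

for z in (1.65, 1.96, 2.58):
    print(f"±{z} sigma covers {normal_coverage(z):.1%} of a normal distribution")
```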
The two approaches give completely different percentages of observations within a given number of standard deviations of the mean. Under Chebyshev's inequality, at least 94% of observations lie within ±4 standard deviations, while under the confidence-interval approach 99% lie within just ±2.58 standard deviations.
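To show exactly what puzzles me, here is the discrepancy side by side for the same k (again only my own sketch, using the two formulas quoted above):

```python
from math import erf, sqrt

def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev: minimum proportion within ±k standard deviations (any distribution)."""
    return 1 - 1 / k**2

def normal_coverage(k: float) -> float:
    """Proportion of a normal distribution within ±k standard deviations."""
    return erf(k / sqrt(2))

for k in (2.0, 2.58, 3.0, 4.0):
    print(f"k = {k}: Chebyshev guarantees >= {chebyshev_lower_bound(k):.1%}, "
          f"normal gives {normal_coverage(k):.1%}")
```

For every k the Chebyshev percentage is much lower than the normal-distribution percentage, which is exactly the gap I would like explained.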
Please help me understand how these two concepts differ from each other, and why they give such different percentages. And please do me a favour and don't go too deep down the rabbit hole with complicated math formulas in your explanation.
Thank you very much.