Understanding 2 equivalent formulations of both data set measures

fab13 · Dec 27, 2020

I have two independent experiments that have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.

From these two measurements, assuming the errors are Gaussian, we want to get the estimate of ##\tau## and its error (i.e. from a combination of the two measurements).

We choose the maximum likelihood method with the pdf of 2 measures:
$$
f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)
$$
One has to maximize the likelihood function:
$$
\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)
$$
by imposing the following condition:
$$
\frac{\partial(-\log \mathcal{L})}{\partial \hat{\tau}}=0
$$
We get :
$$
\Rightarrow \hat{\tau}=\frac{\tau_{1} / \sigma_{1}^{2}+\tau_{2} / \sigma_{2}^{2}}{1 / \sigma_{1}^{2}+1 / \sigma_{2}^{2}}\quad(1)
$$
##\sigma_{\hat{\tau}}## is deduced from the second derivative of ##\log{\mathcal{L}}##:
$$
\frac{1}{\sigma_{\hat{\tau}}^{2}}=\frac{1}{\sigma_{1}^{2}}+\frac{1}{\sigma_{2}^{2}}\quad(2)
$$
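
For readers who want the intermediate steps between the likelihood and eqs ##(1)## and ##(2)##, here is a short sketch, assuming the ##\sigma_i## are known constants that do not depend on ##\hat{\tau}##:
$$
-\log \mathcal{L}=\sum_{i=1}^{2}\left[\log \left(\sqrt{2 \pi}\, \sigma_{i}\right)+\frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{2 \sigma_{i}^{2}}\right]
$$
$$
\frac{\partial(-\log \mathcal{L})}{\partial \hat{\tau}}=-\sum_{i=1}^{2} \frac{\tau_{i}-\hat{\tau}}{\sigma_{i}^{2}}=0 \quad \Rightarrow \quad \hat{\tau} \sum_{i=1}^{2} \frac{1}{\sigma_{i}^{2}}=\sum_{i=1}^{2} \frac{\tau_{i}}{\sigma_{i}^{2}}
$$
$$
\frac{\partial^{2}(-\log \mathcal{L})}{\partial \hat{\tau}^{2}}=\sum_{i=1}^{2} \frac{1}{\sigma_{i}^{2}}=\frac{1}{\sigma_{\hat{\tau}}^{2}}
$$
The second line rearranges to eq##(1)## and the third line is eq##(2)##.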
For both measurements, an equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$

- Question 1) Why do we call the quantity ##\tilde{N}## in eq##(3)## an "equivalent number"?

- Question 2) Eq##(3)## is said to express the relative error of a measurement as the statistical error due to the number of events. Where does this definition of the relative error come from? I mean, how can it be justified?

Then we can write:
$$
\hat{\tau}=\frac{\tilde{N}_{1} \tau_{1}+\tilde{N}_{2} \tau_{2}}{\tilde{N}_{1}+\tilde{N}_{2}}
$$
Finally, we have :
$$
\hat{\tau}=\frac{\tau_{1} /\left(\sigma_{1} / \tau_{1}\right)^{2}+\tau_{2} /\left(\sigma_{2} / \tau_{2}\right)^{2}}{1 /\left(\sigma_{1} / \tau_{1}\right)^{2}+1 /\left(\sigma_{2} / \tau_{2}\right)^{2}}\quad(4)
$$

In conclusion, we can say that in one case :

- case (1) : weighted by the square of inverse error (eq##(1)##)

and in another case :

- case (2): weighted by the inverse square of the relative error (eq##(4)##)

Question 3) Are these 2 cases, or rather formulations, equivalent? Are they 2 interpretations of the same quantity ##\hat{\tau}##? If not, what is the link between the two expressions eq##(1)## and eq##(4)##?

At the same time, on [Wikipedia], it is said that:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:
$$
N_{X \cup Y}=N_{X}+N_{Y}-N_{X \cap Y}
$$
The populations of sets, which do not overlap, can be calculated simply as follows:
$$
\begin{aligned}
X \cap Y=\varnothing \Rightarrow & N_{X \cap Y}=0 \\
\Rightarrow & N_{X \cup Y}=N_{X}+N_{Y}
\end{aligned}
$$
Standard deviations of non-overlapping ##(X \cap Y=\varnothing)## sub-populations can be aggregated as follows if the size (actual or relative to one another) and means of each are known:
$$
\begin{aligned}
\mu_{X \cup Y} &=\frac{N_{X} \mu_{X}+N_{Y} \mu_{Y}}{N_{X}+N_{Y}} \\
\sigma_{X \cup Y} &=\sqrt{\frac{N_{X} \sigma_{X}^{2}+N_{Y} \sigma_{Y}^{2}}{N_{X}+N_{Y}}+\frac{N_{X} N_{Y}}{\left(N_{X}+N_{Y}\right)^{2}}\left(\mu_{X}-\mu_{Y}\right)^{2}}\quad(5)
\end{aligned}
$$
For example, suppose it is known that the average American man has a mean height of 70 inches with a standard deviation of three inches and that the average American woman has a mean height of 65 inches with a standard deviation of two inches. Also assume that the number of men, ##N##, is equal to the number of women. Then the mean and standard deviation of heights of American adults could be calculated as
$$
\begin{array}{l}
\mu=\frac{N \cdot 70+N \cdot 65}{N+N}=\frac{70+65}{2}=67.5 \\
\sigma=\sqrt{\frac{3^{2}+2^{2}}{2}+\frac{(70-65)^{2}}{2^{2}}}=\sqrt{12.75} \approx 3.57
\end{array}
$$

- Question 4) Considering that the expectations ##\mu_X## and ##\mu_Y## are not the same, as with my 2 measurements at the beginning (corresponding to ##\tau_1## and ##\tau_2##), can we say that eq##(2)## and eq##(5)## are equivalent? That is, in the case where I have 2 measurements, as at the beginning of my post.

Any help is welcome

Stephen Tashi · Jan 1, 2021

fab13 said:

I have two independent experiments that have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.

We choose the maximum likelihood method with the pdf of 2 measures:
$$
f(\tau, \sigma)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{1}{2} \frac{(\tau-\hat{\tau})^{2}}{\sigma^{2}}\right)
$$
One has to maximize the likelihood function:
$$
\mathcal{L}=\prod_{i=1}^{2} \frac{1}{\sqrt{2 \pi} \sigma_{i}} \exp \left(-\frac{1}{2} \frac{\left(\tau_{i}-\hat{\tau}\right)^{2}}{\sigma_{i}^{2}}\right)
$$

That is a likelihood function for two samples, one sample taken from each Gaussian distribution. So it isn't clear how we would compute ##\sigma_1, \sigma_2## if we have only one sample from each Gaussian distribution.

For both measurements, an equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$

- Question 1) Why do we call the quantity ##\tilde{N}## in eq##(3)## an "equivalent number"?

Where did you read about "equivalent number"? - in what text or article?

In conclusion, we can say that in one case :
- case (1) : weighted by the square of inverse error (eq##(1)##)
and in another case :
- case (2): weighted by the inverse square of the relative error (eq##(4)##)

Question 3) Are these 2 cases, or rather formulations, equivalent?

No.

Are they 2 interpretations of the same quantity ##\hat{\tau}##?

No.

If not, what is the link between the two expressions eq##(1)## and eq##(4)##?

I don't know. The statistic ##\sigma/\tau## is called the "coefficient of variation" (see https://en.wikipedia.org/wiki/Coefficient_of_variation). The only place I have seen a relation like ##\sigma^2/\tau^2 = z/N^2## is in equations for confidence intervals for that statistic.

At the same time, on [Wikipedia], it is said that:

Population-based statistics
The populations of sets, which may overlap, can be calculated simply as follows:

That article deals with calculating the sample mean and sample standard deviation of a whole sample when we know those statistics for subsets of the numerical data. This is a different topic from how to use the sample means and sample standard deviations of subsets of the data to estimate the mean and standard deviation of the population from which the sample was drawn.

Of course it is possible that the maximum likelihood estimator, or another type of estimator, for a parameter uses only the sample mean and standard deviation of the entire data set, instead of using different calculations on various subsets of the data and combining the results. The thing to understand is that the above article is about a calculation whose correct answer is fixed by the definitions of sample mean and sample standard deviation for an entire data set. By contrast, an article about how to formulate estimators may deal with maximum likelihood estimators, minimum variance estimators, unbiased estimators, etc., which have different definitions.
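
To illustrate the first point: eq##(5)## is an identity about the combined data, so it can be checked against a direct computation on the merged set. The following is only a minimal sketch; the helper name `pooled_mean_std` and the two small data sets are made up for illustration and do not come from the thread or from the Wikipedia article.

```python
import math
import statistics


def pooled_mean_std(n_x, mu_x, sigma_x, n_y, mu_y, sigma_y):
    """Eq (5): mean and population std of the union of two disjoint sets."""
    n = n_x + n_y
    mu = (n_x * mu_x + n_y * mu_y) / n
    var = (n_x * sigma_x**2 + n_y * sigma_y**2) / n \
        + n_x * n_y / n**2 * (mu_x - mu_y) ** 2
    return mu, math.sqrt(var)


# Hypothetical, non-overlapping data sets (illustration only).
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 12.0]

# Combine the per-subset statistics with eq (5) ...
mu, sigma = pooled_mean_std(
    len(x), statistics.mean(x), statistics.pstdev(x),
    len(y), statistics.mean(y), statistics.pstdev(y),
)

# ... and compare with the statistics of the merged data.
direct_mu = statistics.mean(x + y)
direct_sigma = statistics.pstdev(x + y)
print(math.isclose(mu, direct_mu), math.isclose(sigma, direct_sigma))

# The height example quoted above (equal group sizes, so N cancels).
mu_h, sigma_h = pooled_mean_std(1, 70.0, 3.0, 1, 65.0, 2.0)
print(mu_h, round(sigma_h, 2))
```

Note that the check uses the population standard deviation (`statistics.pstdev`, dividing by ##N##), which is the convention eq##(5)## assumes. Nothing here estimates a parameter; it only re-expresses the statistics of the merged data.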

fab13 · Jan 2, 2021

Are they 2 interpretations of the same quantity ##\hat{\tau}##?

No.

Could you justify this, please? I think there is the notion of confidence between both interpretations, but maybe I am wrong.

What is the point of having these 2 different equations, eq(1) and eq(4)?

fab13 · Jan 3, 2021

Sorry, I meant to say: "I think there is the notion of confidence level (C.L.) between both interpretations", and this is also related to what is called the "equivalent number".

Stephen Tashi · Jan 3, 2021

fab13 said:

Could you justify this, please?

Is it clear that the two equations can produce different results for ##\hat{\tau}##?
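
A quick numerical check makes this concrete. The sketch below uses made-up values for the two measurements (they are not from the thread), and the function names are purely illustrative.

```python
def combine_inverse_variance(tau1, sigma1, tau2, sigma2):
    """Eq (1) for the estimate, eq (2) for its error: weights 1/sigma_i^2."""
    w1, w2 = 1.0 / sigma1**2, 1.0 / sigma2**2
    tau_hat = (w1 * tau1 + w2 * tau2) / (w1 + w2)
    sigma_hat = (w1 + w2) ** -0.5
    return tau_hat, sigma_hat


def combine_relative_error(tau1, sigma1, tau2, sigma2):
    """Eq (4): weights N_i = (tau_i / sigma_i)^2, as defined by eq (3)."""
    n1, n2 = (tau1 / sigma1) ** 2, (tau2 / sigma2) ** 2
    return (n1 * tau1 + n2 * tau2) / (n1 + n2)


# Hypothetical measurements with equal errors (illustration only).
tau1, sigma1 = 10.0, 1.0
tau2, sigma2 = 20.0, 1.0

tau_eq1, sigma_eq1 = combine_inverse_variance(tau1, sigma1, tau2, sigma2)
tau_eq4 = combine_relative_error(tau1, sigma1, tau2, sigma2)
print(tau_eq1, tau_eq4)  # 15.0 18.0
```

With equal errors, eq(1) gives the plain average, while eq(4) is pulled towards the larger measurement, because its weights are the eq(1) weights multiplied by ##\tau_i^2##. The two results coincide only in special cases, such as ##\tau_1 = \tau_2##.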

What is the point of having these 2 different equations, eq(1) and eq(4)?

I've never seen equation 4 before. Where did you see it?

BvU · Jan 3, 2021

fab13 said:

equivalent number ##\tilde{N}## is defined by:
$$
\frac{\sigma_{1}}{\tau_{1}}=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)
$$

This can only be done for Poisson statistics where expected value and variance are the same.

For a Gaussian with mean zero (or negative) you get nonsense.

fab13 · Jan 5, 2021

BvU said:

This can only be done for Poisson statistics where expected value and variance are the same.

For a Gaussian with mean zero (or negative) you get nonsense.

I don't quite understand what you mean when you say this can only be done for Poisson statistics. Indeed, if ##\sigma_1=\tau_1## and ##\sigma_2=\tau_2##, then we would have:

$$\frac{\sigma_{1}}{\tau_{1}}=1=\frac{1}{\sqrt{\tilde{N}_{1}}} \quad \frac{\sigma_{2}}{\tau_{2}}=1=\frac{1}{\sqrt{\tilde{N}_{2}}}\quad(3)$$

So ##\tilde{N}_{1}=\tilde{N}_{2}=1##? That makes no sense. What is the meaning of this "equivalent number"? I thought there was a link with confidence interval expressions.

BvU · Jan 5, 2021

Variance is the square of ##\sigma##
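
Spelled out, assuming the measured quantity is itself a Poisson count ##N## (so that the expected value and the variance are both ##N##):
$$
\sigma^{2}=N \quad \Rightarrow \quad \sigma=\sqrt{N} \quad \Rightarrow \quad \frac{\sigma}{N}=\frac{1}{\sqrt{N}}
$$
This has the same form as eq##(3)## with ##\tau \to N## and ##\tilde{N} \to N##. One possible reading is therefore that ##\tilde{N}## is the number of counted events that would give the same relative error as the measurement.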

fab13 · Jan 5, 2021

So what should I conclude? Would I get:

##\dfrac{\sigma^2}{\tau}=\dfrac{\sigma}{\sqrt{\tilde{N}}}## ?

I have difficulty grasping the subtleties of this quantity. While I can understand the notion of relative dispersion given by the ratio ##\dfrac{\sigma}{\tau}##, I do not understand this notion of "equivalent number".

Any clarifications are welcome

BvU · Jan 5, 2021

fab13 said:

I don't quite understand what you mean when you say this can only be done for Poisson statistics.

What meaning can you possibly attach to ##\ \sigma/\tau\ ## when ##\tau ## is zero or negative ?
Except, of course, to designate ##\ \sigma/|\tau| \ ## as the relative error.

fab13 said:

I have two independent experiments that have measured ##\tau_{1},\sigma_{1}## and ##\tau_{2},\sigma_{2}##, with ##\sigma_{i}## representing the errors on the measurements.

Usually, there is a story attached to such a statement:

How does one 'measure' ##\sigma_i## (as opposed to estimating) and is there a valid reason for them to be different ?

Are the ##\tau_i## based on independent sample sets taken from one and the same population ?

I seem to remember

If the ##\tau_i## are averages, the corresponding estimate ##\sigma_m## of the error of the average is ##\sigma_i/\sqrt{N-1}\approx \sigma_i/\sqrt N## (with ##\sigma_i## the estimate of the standard deviation of the sample), so in that case you could make a case for ##\sigma_m \propto 1/\sqrt{N_i}##.

The relative accuracy of an estimate of the standard deviation is approximately ##1/\sqrt N##, where ##N## is the sample size. This puts a severe limit on the significance of differences between the ##\sigma_i##!
