Statistical modeling and relationship between random variables

  • #1
fog37
TL;DR Summary
Statistical modeling and the relationship between three random variables versus two random variables
In statistical modeling, the goal is to come up with a model that describes the relationship between random variables. A function of random variables is also a random variable.
We could have three random variables, ##Y##, ##X##, ##\epsilon##, with the r.v. ##Y## given by ##Y = b_1 X + b_2 + \epsilon##, where ##b_1, b_2## are constants. The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X] + b_2 + E[\epsilon]## with ##E[\epsilon] = 0##. This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...

But in most textbooks, the variable ##X## is generally said not to be a random variable but a deterministic one... Why? Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X + b_2##.

On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed. How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people out of a population their height ##Y## and age ##X##, both age and height would be r.v.s, correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## be deterministic? Maybe if we searched from the beginning for people of specific ages and then asked them their height? In that case, we planned what the values of the variable ##X## would be... But in many other cases, it seems that both variables would commonly be random. How would we then handle that?
 
  • #2
fog37 said:
The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X]+ b_2 + E[\epsilon]## with ##E[\epsilon]=0##.
Be careful about this. ##E[Y|X]## is a function of ##X## whereas the right side is just a single number.
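To spell that out with the model from post #1 (assuming, as there, that ##E[\epsilon] = 0## and that ##\epsilon## is independent of ##X##, so ##E[\epsilon|X] = 0##): conditioning on ##X## gives the random variable $$E[Y|X] = b_1 X + b_2,$$ which still varies with ##X##, whereas the unconditional expectation is the single number $$E[Y] = b_1 E[X] + b_2.$$ The two are linked by the tower property ##E[E[Y|X]] = E[Y]##, i.e., you recover the number by averaging the function of ##X## over the distribution of ##X##.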
fog37 said:
This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...
What author? If I am going to say something that is contradicted by the author, then I would like to know what all the surrounding text said.
fog37 said:
But in most textbooks, the variable ##X## is generally said to not be a random variable but a deterministic one...Why?
Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X+ b_2##.
That equation is to be used with a particular value of ##X##. How ##X## got to have that value, whether deterministic or random, does not matter.
fog37 said:
On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed.
Suppose that ##X_1, X_2, \dots, X_n## are random variables and more statistical analysis needs to be done with them. That is a more complicated situation. Some variables might be correlated, others might be independent. (Added: actually, you need to check this even if the ##X_i##s are all deterministic.)
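A quick numerical sketch of that check (with made-up numbers and nothing beyond plain NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictor columns; x2 is deliberately built to be correlated with x1.
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=100)
x3 = rng.normal(size=100)

# Sample correlation matrix of the predictors: large off-diagonal entries
# flag correlated variables that complicate a multi-variable regression.
print(np.corrcoef(np.vstack([x1, x2, x3])))
```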
fog37 said:
How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people, out of a population, their height ##Y## and age ##X##, both age and height would be r.v. , correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## to be deterministic?
You would normally collect a sample, ##(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)##. Would it matter whether the ##x_i##s came from a random variable? If you have several input variables ##X_i## and want to do a more detailed analysis of their relationship with ##Y##, then you would first need to address the issue of correlated ##X## variables. (Added: actually, you need to do this even if the ##X_i##s are all deterministic.)
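As a minimal sketch of that point (ordinary least squares on made-up (age, height) pairs, not data from this thread): the fitted slope and intercept use only the observed sample, and the same formulas apply whether the ##x_i## were sampled at random or fixed in advance by the study design.

```python
import numpy as np

# Hypothetical sample of (age, height) pairs; the values are invented for illustration.
x = np.array([21.0, 34.0, 45.0, 52.0, 63.0])       # ages (random or chosen by design)
y = np.array([172.0, 175.0, 170.0, 168.0, 165.0])  # heights in cm

# Ordinary least-squares estimates of the slope b1 and intercept b2 depend only
# on sample means and sample (co)variances of the observed pairs.
x_bar, y_bar = x.mean(), y.mean()
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b2_hat = y_bar - b1_hat * x_bar

print(f"fitted line: y = {b1_hat:.3f} * x + {b2_hat:.3f}")
```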
 
  • #3
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
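A rough numerical illustration of that idea (a prior-predictive sketch with arbitrary priors chosen only for the example, not a full Bayesian fit): every quantity on the right-hand side of the model, not just ##y##, is drawn from a distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed priors, for illustration only: b0, b1 ~ Normal(0, 10), sigma ~ |Normal(0, 1)|.
n_draws = 1000
b0 = rng.normal(0.0, 10.0, size=n_draws)
b1 = rng.normal(0.0, 10.0, size=n_draws)
sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))

x = 2.5  # one fixed value of the predictor for this illustration

# Prior-predictive draws of y from N(b1*x + b0, sigma): parameters and y are all random.
y = rng.normal(b1 * x + b0, sigma)

print("prior-predictive mean and sd of y at x = 2.5:", y.mean(), y.std())
```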
 
  • #4
I believe a "deterministic" variable is just one that is not random, such as the date, say, year by year. You can control your choice of dates when plotting, say, inflation vs. year, or the record high jump vs. year. Notice you can regress a random variable against a deterministic one, but a correlation is then not defined.
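One way to read that last point (my gloss, not part of the post above): the population correlation $$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$ needs ##X## to have a distribution with a standard deviation ##\sigma_X##. If the years are fixed by the analyst rather than drawn from any distribution, ##\sigma_X## is not a defined population quantity, even though the sample correlation of the observed pairs can still be computed as a purely descriptive number.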
 
  • #5
Dale said:
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
Some authors, maybe frequentists, define correlation in terms of the conditional expectation ##E(Y|X)## when regressing ##Y## against ##X##. Is this done with the Bayesian approach?
 
  • #6
WWGD said:
Some authors, maybe frequentists, define correlation in terms of the conditional expectation ##E(Y|X)## when regressing ##Y## against ##X##. Is this done with the Bayesian approach?
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
 
  • #7
Dale said:
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
Thanks, I was also wondering whether regressing ##Y## on ##X## is seen/described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
 
  • #8
WWGD said:
Thanks, I was also wondering whether regressing ##Y## on ##X## is seen/described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
Sort of. It is not just the conditional expectation, but you get the entire conditional distribution. So you can get the expectation of the conditional distribution, but you can also get any other measure such as the variance or anything else you like.
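For instance, taking the normal model written in post #3, the conditional distribution is $$Y \mid X = x \;\sim\; \mathcal N(b_1 x + b_0, \sigma),$$ so the regression gives not only the conditional mean ##E[Y|X=x] = b_1 x + b_0## but also, for example, the conditional variance ##\operatorname{Var}(Y|X=x) = \sigma^2## or any conditional quantile.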
 
