Statistical modeling and relationship between random variables

  • #1
fog37
TL;DR Summary
Statistical modeling and the relationship between three random variables versus two random variables
In statistical modeling, the goal is to come up with a model that describes the relationship between random variables. A function of random variables is also a random variable.
We could have three random variables, ##Y##, ##X##, ##\epsilon##, with the r.v. ##Y## given by ##Y = b_1 X + b_2 + \epsilon##, where ##b_1, b_2## are constants. The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X] + b_2 + E[\epsilon]## with ##E[\epsilon] = 0##. This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...

But in most textbooks, the variable ##X## is generally said not to be a random variable but a deterministic one... Why? Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X + b_2##.

On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed. How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people out of a population their height ##Y## and age ##X##, both age and height would be r.v.s, correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## be deterministic? Maybe if we searched from the beginning for people of specific ages and then asked them their height? In that case, we planned what the values of the variable ##X## would be... But in many other cases, it seems that both variables would commonly be random. How would we then handle that?
 
  • #2
fog37 said:
The expectation value of ##Y## is simply ##E[Y|X] = b_1 E[X]+ b_2 + E[\epsilon]## with ##E[\epsilon]=0##.
Be careful about this. ##E[Y|X]## is a function of ##X## whereas the right side is just a single number.
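To spell that out with the model from post #1 (assuming, as there, that ##E[\epsilon] = 0## and that ##\epsilon## is independent of ##X##, so ##E[\epsilon|X] = 0##): conditioning on ##X## gives the random variable $$E[Y|X] = b_1 X + b_2,$$ which still varies with ##X##, whereas the unconditional expectation is the single number $$E[Y] = b_1 E[X] + b_2.$$ The two are linked by the tower property ##E[E[Y|X]] = E[Y]##, i.e., you recover the number by averaging the function of ##X## over the distribution of ##X##.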
fog37 said:
This is what simple linear regression is about. A note: an author wrote ##E[Y;X]## instead of ##E[Y|X]##, stating that it is not really a conditional expectation value, but I am not sure about the difference...
What author? If I am going to say something that is contradicted by the author, then I would like to know what all the surrounding text said.
fog37 said:
But in most textbooks, the variable ##X## is generally said to not be a random variable but a deterministic one...Why?
Clearly, that would simplify the expectation value of ##Y## to ##E[Y|X] = b_1 X+ b_2##.
That equation is to be used with a particular value of ##X##. How ##X## got to have that value, whether deterministic or random, does not matter.
fog37 said:
On the other hand, when ##X## is also a r.v., we need to know its expectation value ##E[X]## in order to proceed.
Suppose that ##X_1, X_2, \dots, X_n## are random variables and more statistical analysis needs to be done with them. That is a more complicated situation. Some variables might be correlated, others might be independent. (Added: actually, you need to check this even if the ##X_i##s are all deterministic.)
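A quick numerical sketch of that check (with made-up numbers and nothing beyond plain NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical predictor columns; x2 is deliberately built to be correlated with x1.
x1 = rng.normal(size=100)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=100)
x3 = rng.normal(size=100)

# Sample correlation matrix of the predictors: large off-diagonal entries
# flag correlated variables that complicate a multi-variable regression.
print(np.corrcoef(np.vstack([x1, x2, x3])))
```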
fog37 said:
How would we get ##E[X]## from the sample data?

For example, in practice, if we asked 50 random people, out of a population, their height ##Y## and age ##X##, both age and height would be r.v. , correct? That seems the most common scenario for linear regression. What kind of situation would instead have ##X## to be deterministic?
You would normally collect a sample, ##(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)##. Would it matter whether the ##x_i##s came from a random variable? If you have several input variables ##X_i## and want to do a more detailed analysis of their relationship with ##Y##, then you would first need to address the issue of correlated ##X## variables. (Added: actually, you need to do this even if the ##X_i##s are all deterministic.)
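As a minimal sketch of that point (ordinary least squares on made-up (age, height) pairs, not data from this thread): the fitted slope and intercept use only the observed sample, and the same formulas apply whether the ##x_i## were sampled at random or fixed in advance by the study design.

```python
import numpy as np

# Hypothetical sample of (age, height) pairs; the values are invented for illustration.
x = np.array([21.0, 34.0, 45.0, 52.0, 63.0])       # ages (random or chosen by design)
y = np.array([172.0, 175.0, 170.0, 168.0, 165.0])  # heights in cm

# Ordinary least-squares estimates of the slope b1 and intercept b2 depend only
# on sample means and sample (co)variances of the observed pairs.
x_bar, y_bar = x.mean(), y.mean()
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b2_hat = y_bar - b1_hat * x_bar

print(f"fitted line: y = {b1_hat:.3f} * x + {b2_hat:.3f}")
```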
 
  • #3
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
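A rough numerical illustration of that idea (a prior-predictive sketch with arbitrary priors chosen only for the example, not a full Bayesian fit): every quantity on the right-hand side of the model, not just ##y##, is drawn from a distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed priors, for illustration only: b0, b1 ~ Normal(0, 10), sigma ~ |Normal(0, 1)|.
n_draws = 1000
b0 = rng.normal(0.0, 10.0, size=n_draws)
b1 = rng.normal(0.0, 10.0, size=n_draws)
sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))

x = 2.5  # one fixed value of the predictor for this illustration

# Prior-predictive draws of y from N(b1*x + b0, sigma): parameters and y are all random.
y = rng.normal(b1 * x + b0, sigma)

print("prior-predictive mean and sd of y at x = 2.5:", y.mean(), y.std())
```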
 
  • #4
I believe a "deterministic" variable is just one that is not random, such as the date, say, year by year. You can control your choice of dates when plotting, say, inflation vs. year, or the record high jump vs. year. Notice you can regress a random variable against a deterministic one, but a correlation is then not defined.
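One way to read that last point (my gloss, not part of the post above): the population correlation $$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$ needs ##X## to have a distribution with a standard deviation ##\sigma_X##. If the years are fixed by the analyst rather than drawn from any distribution, ##\sigma_X## is not a defined population quantity, even though the sample correlation of the observed pairs can still be computed as a purely descriptive number.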
 
  • #5
Dale said:
I think this is more a matter of convention than anything. Bayesian statistics tends to treat everything as a random variable and assign it prior probability distributions. So you would model $$y\sim \mathcal N (b_1 X + b_0, \sigma)$$ You would not just treat ##y## as a random variable, but everything else too. Each one would have their own prior distribution.
Some authors, maybe frequentists, define correlation in terms of the conditional expectation ##E(Y|X)## when regressing ##Y## against ##X##. Is this done with the Bayesian approach?
 
  • #6
WWGD said:
Some authors, maybe frequentists, define correlation in terms of the conditional expectation ##E(Y|X)## when regressing ##Y## against ##X##. Is this done with the Bayesian approach?
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
 
  • #7
Dale said:
I haven’t seen that as a definition, but Bayesians use the same computations to actually calculate correlations as frequentists do.
Thanks, I was also wondering whether regressing ##Y## on ##X## is seen/described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
 
  • #8
WWGD said:
Thanks, I was also wondering whether regressing ##Y## on ##X## is seen/described as the conditional expectation of ##Y## given ##X##, i.e., ##E[Y|X]##.
Sort of. It is not just the conditional expectation, but you get the entire conditional distribution. So you can get the expectation of the conditional distribution, but you can also get any other measure such as the variance or anything else you like.
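For instance, taking the normal model written in post #3, the conditional distribution is $$Y \mid X = x \;\sim\; \mathcal N(b_1 x + b_0, \sigma),$$ so the regression gives not only the conditional mean ##E[Y|X=x] = b_1 x + b_0## but also, for example, the conditional variance ##\operatorname{Var}(Y|X=x) = \sigma^2## or any conditional quantile.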
 
