Time series: why would we remove seasonality and trend?

  • #1
fog37
TL;DR Summary
Why remove seasonality and trend from a time series?
Hello,
I understand a few things about time series but I am unclear on other main concepts. Hope you can help me get on the right track.
  • A time series is simply a 1D signal with the time variable ##t## on the horizontal axis and another variable of choice ##X## on the vertical axis. Time imposes a precise order on the samples of ##X## (a sequence).
  • I understand that the time signal ##X(t)## can be viewed as the sum of three components: a) seasonality, b) trend, c) a random component. Seasonality means there is a periodic component (no matter its functional shape: sine, etc.). Trend is another functional shape (linear, curvilinear, etc.). The random component is obvious.
  • Signals can be stationary or not. Stationarity simply means that if we take a segment of ##X(t)##, say from 5s to 8s, and another segment from 10s to 13s, the two segments are not identical but are statistically similar (mean, correlation, etc.): the statistical properties of ##X(t)## don't change over time.
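As a sketch of this decomposition, here is a minimal numpy example (all numbers hypothetical) that builds such a signal and recovers the three components by hand:

```python
import numpy as np

# Hypothetical data: 120 "monthly" samples of trend + seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

# Classical additive decomposition, done by hand:
# 1) estimate the trend with a 12-point moving average,
# 2) estimate seasonality as the mean deviation at each point of the cycle,
# 3) whatever is left is the random component.
trend_hat = np.convolve(x, np.ones(12) / 12, mode="same")
detrended = x - trend_hat
seasonal_hat = np.array([detrended[i::12].mean() for i in range(12)])
residual = detrended - np.tile(seasonal_hat, t.size // 12)

print(x.std(), residual.std())  # the residual varies less than the raw series
```

Libraries like statsmodels provide this decomposition ready-made; the by-hand version above just makes the three-component picture concrete.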
My question:

The goal of time series analysis is generally to come up with a model that predicts future values from past values. Why would we want to remove seasonality and/or trend from ##X(t)##? That would seem to change the identity of the signal... I get that removing them can make the signal stationary if it is not... But I am thinking that two different signals ##X(t)## are indeed different because they differ holistically in their seasonality, trend, etc.

If a signal is truly ##X(t) = \text{seasonality} + \text{trend} + \text{random component}##, removing the first two leaves us with only the random part...

I see how removing seasonality may make sense sometimes. For example, the earnings of a company may go up and down over the course of a year simply due to what generally happens during a specific month. That is useful to know even if it makes the time series not stationary....

Thank you!
 
  • #2
Trend and seasonality are removed by differencing, which does not lose information.

Take trend: once you difference (for example after taking the log of the values), it's easy to recreate the original series by reversing the process.
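The point about differencing being reversible can be shown in a few lines of numpy (hypothetical numbers):

```python
import numpy as np

# A hypothetical series with a linear trend.
x = 3.0 + 0.5 * np.arange(10)

# First difference: the trend becomes a constant (0.5), i.e. it is removed.
dx = np.diff(x)

# Reverse the process: the first value plus a cumulative sum restores x exactly.
x_back = np.concatenate(([x[0]], x[0] + np.cumsum(dx)))

print(np.allclose(x, x_back))  # True: differencing lost no information
```

For a multiplicative trend one would difference the logs of the values instead; exponentiating the reversed cumulative sum undoes that as well.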
 
  • #3
fog37 said:
TL;DR Summary: Why remove seasonality and trend from a time series?

Why would we want to remove seasonality and/or trend
Usually because we have some specific application in mind and for that application we are uninterested in the variation due to seasonality or trend.

For example, right now in my location the temperatures are dropping. A time series analysis shows strong seasonal effects and a smaller trend.

My neighbor, a meteorologist, wants to include both the trend and the seasonal variation in his advice whether to wear a jacket tomorrow.

My other neighbor, a climate scientist, wants to remove the seasonal variation to show that the planet is warming despite the fact that it is colder today than yesterday.

My other other neighbor, a store owner, wants to remove the trend to figure out when to place an order for a bunch of swim suits based on the seasonal variation.

The decision about removing one thing or another is based on the application rather than the data. All three had the same data and model.
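The climate-scientist and store-owner views can be sketched with the same synthetic data (all numbers hypothetical, an exaggerated warming trend for visibility):

```python
import numpy as np

# Hypothetical monthly temperatures: 20 years with a warming trend of
# 0.01 degrees/month plus a strong seasonal cycle and noise.
rng = np.random.default_rng(1)
t = np.arange(240)
temps = 0.01 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

month = t % 12
monthly_mean = np.array([temps[month == m].mean() for m in range(12)])

# Climate-scientist view: subtract each month's average to expose the trend.
deseasonalized = temps - monthly_mean[month]

# Store-owner view: subtract a moving average to expose the seasonal cycle.
detrended = temps - np.convolve(temps, np.ones(12) / 12, mode="same")

# Warming is visible once seasonality is gone, even when winter is colder.
print(deseasonalized[-60:].mean() - deseasonalized[:60].mean())
```

Same data, same model: which component you subtract depends entirely on the question you are asking.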
 
  • #4
I see. Thank you, Dale. That makes a lot of sense... I guess the material I am reading is all about removing seasonality, and that confused me.

What about the drive to make a time series stationary if it is not?

Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?

Thank you again!
 
  • #5
fog37 said:
What about the drive to make a time series stationary if it is not?
That I don’t know enough about to give solid advice. Maybe different statistical methods need to be used for non-stationary series?
 
  • #6
fog37 said:
What about the drive to make a time series stationary if it is not?

Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?
How would you measure the volatility of the S&P 500 over the past 30 years, when the index value has gone up 10x? You cannot simply take the standard deviation of the price.
 
  • #7
fog37 said:
Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?
1) Suppose you had decades of data of the daily high temperatures at a location. If you were interested in the long-term temperature change (Global warming?), you would not want to have to always consider if you were looking at summer or winter data. So you would want to remove the effects of seasons. On the other hand, if you were interested in seasonal variation, you would want to remove the long-term effects of global warming so that you can compare summers to winters without having to consider the general upward trend over time.

2) Suppose you were looking at the daily reported deaths from COVID-19 over a year. In general, deaths are not well reported over the weekends, and then the numbers catch up on Monday and Tuesday. If you are interested in the long-term spread of COVID, you would want to remove the weekend/Monday-Tuesday cycles so that you can see the long-term growth rate. On the other hand, if you are interested in how the reporting is done, you might want to do the opposite: remove the long-term trend to get stationary data and compare weekends to the Monday and Tuesday numbers.
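The weekly reporting cycle can be sketched like this (a hypothetical exponential spread and made-up weekday reporting factors):

```python
import numpy as np

days = np.arange(364)
growth = np.exp(0.01 * days)  # hypothetical long-term spread (~7%/week)
# Made-up reporting pattern: weekends under-report, Mon/Tue catch up.
weekday_factor = np.array([1.2, 1.3, 1.0, 1.0, 1.0, 0.3, 0.2])
reported = growth * weekday_factor[days % 7]

# A trailing 7-day sum removes the weekly reporting cycle entirely,
# leaving only the long-term growth.
weekly = np.convolve(reported, np.ones(7), mode="valid")
growth_rate = weekly[7:] / weekly[:-7]  # week-over-week ratio

print(np.allclose(growth_rate, np.exp(0.07)))  # steady weekly growth recovered
```

Any cycle whose period you know can be removed the same way, by aggregating or differencing over exactly one period.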
 
  • #8
Reading back over all your comments, I was thinking about the fundamental question of why time series are so special.
For example, given two continuous variables ##X## and ##Y##, we can do a scatter plot ##Y## vs ##X## and determine the Pearson correlation as well as come up with the linear regression model ##Y= a X +b##.

In the case of a time variable ##t## and a variable ##Z(t)##, we can also do a scatter plot and determine the correlation coefficient and a linear predictive model ##Z(t) = a t + b##...

So what makes ##t## so different from the variable ##X##? We plot both ##t## and ##X##, in increasing order, on the horizontal axis and the other variable on the vertical axis. For each ##X## there is an associated ##Y## and for each ##t## there is an associated ##Z## value...

Maybe the difference really shows up when we get to autoregressive-type models, where ##Z(t)## can depend on the current value of ##t##, on previous values of ##t##, and on previous values of ##Z##. None of this happens for the variable pair ##Y## and ##X##. Also, autocorrelation would not make sense on the values of ##X## or ##Y## alone, while we can compute it for ##Z##, i.e. ##\operatorname{corr}(\text{lag}) = E[Z(t)\, Z(t+\text{lag})]##
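That autocorrelation can be estimated in a few lines (hypothetical data; this sample estimator subtracts the mean and divides by the overall variance):

```python
import numpy as np

def autocorr(z, lag):
    """Sample autocorrelation of Z(t) with Z(t+lag), mean removed."""
    z = z - z.mean()
    return np.sum(z[:-lag] * z[lag:]) / np.sum(z * z)

# Hypothetical periodic series: period-50 sine plus noise.
rng = np.random.default_rng(4)
t = np.arange(1000)
z = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, t.size)

print(autocorr(z, 50))  # strongly positive: the series repeats every 50 steps
print(autocorr(z, 25))  # strongly negative: half a period out of phase
```

For the cross-sectional pair ##(X, Y)## there is no ordering, so "lag" has no meaning, which is exactly the asymmetry the post is pointing at.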

Is my understanding correct?
 
  • #9
fog37 said:
Is my understanding correct?
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
 
  • #10
FactChecker said:
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
You might be right.

Yes, "the best predictor of a future value is the current value and/or some combination of prior values", but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data, which is not time data. For example, ##X## = weight and ##Y## = height. We usually don't see autoregressive models for cross-sectional data. On the other hand, autocorrelation is used in regression analysis with cross-sectional data to determine, for example, whether the residuals are independent.
 
  • #11
fog37 said:
but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data, which is not time data.
You make an interesting point. IMO, the concept still applies, but the implementation is not common. Often the best predictor of the value at one location is the values around it, where "around it" could mean time, position, or some other dimension. I have personally dealt only with the standard time series setting, where earlier data is known, a future value is being estimated, and nothing is known for times beyond that. If data afterward (or around) were known and we were just estimating an unknown intermediate value, would that be completely different? It seems very similar to me.
 
