Time series: why would we remove seasonality and trend?

  • #1
fog37
TL;DR Summary
Why remove seasonality and trend from a time series?
Hello,
I understand a few things about time series but I am unclear on other main concepts. Hope you can help me get on the right track.
  • A time series is simply a 1D signal with the time variable ##t## on the horizontal axis and another variable of choice ##X## on the vertical axis. Time imposes a precise order on the samples of ##X## (a sequence).
  • I understand that the time signal ##X(t)## can be viewed as the sum of three components: a) seasonality, b) trend, c) a random component. Seasonality means there is a periodic component (no matter its functional shape: sine, etc.). Trend is another functional shape (linear, curvilinear, etc.). The random component is obvious.
  • Signals can be stationary or not. Stationarity simply means that if we take a segment of ##X(t)##, say from 5s to 8s, and another segment from 10s to 13s, the two segments are not identical but are statistically similar (mean, correlation, etc.): the statistical properties of ##X(t)## don't change over time.
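As a sketch of this decomposition, here is a minimal numpy example (all numbers hypothetical) that builds such a signal and recovers the three components by hand:

```python
import numpy as np

# Hypothetical data: 120 "monthly" samples of trend + seasonality + noise
rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

# Classical additive decomposition, done by hand:
# 1) estimate the trend with a 12-point moving average,
# 2) estimate seasonality as the mean deviation at each point of the cycle,
# 3) whatever is left is the random component.
trend_hat = np.convolve(x, np.ones(12) / 12, mode="same")
detrended = x - trend_hat
seasonal_hat = np.array([detrended[i::12].mean() for i in range(12)])
residual = detrended - np.tile(seasonal_hat, t.size // 12)

print(x.std(), residual.std())  # the residual varies less than the raw series
```

Libraries like statsmodels provide this decomposition ready-made; the by-hand version above just makes the three-component picture concrete.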
My question:

The goal of time series analysis is generally to come up with a model that predicts future values from past values. Why would we want to remove seasonality and/or trend from ##X(t)##? That would seem to change the identity of the signal... I get that removing them can make the signal stationary if it is not... But I am thinking that two different signals ##X(t)## are indeed different because they differ holistically in their seasonality, trend, etc.

If a signal is truly ##X(t) = \text{seasonality} + \text{trend} + \text{random component}##, removing the first two leaves us with only the random part...

I see how removing seasonality may make sense sometimes. For example, the earnings of a company may go up and down over the course of a year simply due to what generally happens during a specific month. That is useful to know even if it makes the time series not stationary....

Thank you!
 
  • #2
Trend and seasonality are removed by differencing, which does not lose information.

Take trend: once you difference (for example after taking the log of the values), it's easy to recreate the original series by reversing the process.
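The point about differencing being reversible can be shown in a few lines of numpy (hypothetical numbers):

```python
import numpy as np

# A hypothetical series with a linear trend.
x = 3.0 + 0.5 * np.arange(10)

# First difference: the trend becomes a constant (0.5), i.e. it is removed.
dx = np.diff(x)

# Reverse the process: the first value plus a cumulative sum restores x exactly.
x_back = np.concatenate(([x[0]], x[0] + np.cumsum(dx)))

print(np.allclose(x, x_back))  # True: differencing lost no information
```

For a multiplicative trend one would difference the logs of the values instead; exponentiating the reversed cumulative sum undoes that as well.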
 
  • #3
fog37 said:
TL;DR Summary: Why remove seasonality and trend from a time series?

Why would we want to remove seasonality and/or trend
Usually because we have some specific application in mind and for that application we are uninterested in the variation due to seasonality or trend.

For example, right now in my location the temperatures are dropping. A time series analysis shows strong seasonal effects and a smaller trend.

My neighbor, a meteorologist, wants to include both the trend and the seasonal variation in his advice whether to wear a jacket tomorrow.

My other neighbor, a climate scientist, wants to remove the seasonal variation to show that the planet is warming despite the fact that it is colder today than yesterday.

My other other neighbor, a store owner, wants to remove the trend to figure out when to place an order for a bunch of swim suits based on the seasonal variation.

The decision about removing one thing or another is based on the application rather than the data. All three had the same data and model.
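The climate-scientist and store-owner views can be sketched with the same synthetic data (all numbers hypothetical, an exaggerated warming trend for visibility):

```python
import numpy as np

# Hypothetical monthly temperatures: 20 years with a warming trend of
# 0.01 degrees/month plus a strong seasonal cycle and noise.
rng = np.random.default_rng(1)
t = np.arange(240)
temps = 0.01 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)

month = t % 12
monthly_mean = np.array([temps[month == m].mean() for m in range(12)])

# Climate-scientist view: subtract each month's average to expose the trend.
deseasonalized = temps - monthly_mean[month]

# Store-owner view: subtract a moving average to expose the seasonal cycle.
detrended = temps - np.convolve(temps, np.ones(12) / 12, mode="same")

# Warming is visible once seasonality is gone, even when winter is colder.
print(deseasonalized[-60:].mean() - deseasonalized[:60].mean())
```

Same data, same model: which component you subtract depends entirely on the question you are asking.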
 
  • #4
I see. Thank you, Dale. That makes a lot of sense... I guess the material I am reading is all about removing seasonality, and that confused me.

What about the drive to make a time series stationary if it is not?

Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?

Thank you again!
 
  • #5
fog37 said:
What about the drive to make a time series stationary if it is not?
That I don’t know enough about to give solid advice. Maybe different statistical methods need to be used for non-stationary series?
 
  • #6
fog37 said:
What about the drive to make a time series stationary if it is not?

Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?
How would you measure the volatility of the S&P 500 over the past 30 years, when the index value has gone up 10x? You cannot simply take the standard deviation of the price.
 
  • #7
fog37 said:
Do you have a simple example, like the ones above, of when removing or keeping non-stationarity may be application dependent?
1) Suppose you had decades of data of the daily high temperatures at a location. If you were interested in the long-term temperature change (Global warming?), you would not want to have to always consider if you were looking at summer or winter data. So you would want to remove the effects of seasons. On the other hand, if you were interested in seasonal variation, you would want to remove the long-term effects of global warming so that you can compare summers to winters without having to consider the general upward trend over time.

2) Suppose you were looking at the daily reported deaths from COVID-19 over a year. In general, deaths are not well reported over the weekends, and then the numbers catch up on Monday and Tuesday. If you are interested in the long-term spread of COVID, you would want to remove the weekend/Monday-Tuesday cycles so that you can see the long-term growth rate. On the other hand, if you are interested in how the reporting is done, you might want to do the opposite: remove the long-term trend to get stationary data and compare weekends to the Monday and Tuesday numbers.
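The weekly reporting cycle can be sketched like this (a hypothetical exponential spread and made-up weekday reporting factors):

```python
import numpy as np

days = np.arange(364)
growth = np.exp(0.01 * days)  # hypothetical long-term spread (~7%/week)
# Made-up reporting pattern: weekends under-report, Mon/Tue catch up.
weekday_factor = np.array([1.2, 1.3, 1.0, 1.0, 1.0, 0.3, 0.2])
reported = growth * weekday_factor[days % 7]

# A trailing 7-day sum removes the weekly reporting cycle entirely,
# leaving only the long-term growth.
weekly = np.convolve(reported, np.ones(7), mode="valid")
growth_rate = weekly[7:] / weekly[:-7]  # week-over-week ratio

print(np.allclose(growth_rate, np.exp(0.07)))  # steady weekly growth recovered
```

Any cycle whose period you know can be removed the same way, by aggregating or differencing over exactly one period.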
 
  • #8
Reading back over all your comments, I was thinking about the fundamental question of why time series are so special.
For example, given two continuous variables ##X## and ##Y##, we can do a scatter plot ##Y## vs ##X## and determine the Pearson correlation as well as come up with the linear regression model ##Y= a X +b##.

In the case of a time variable ##t## and a variable ##Z(t)##, we can also do a scatter plot and determine the correlation coefficient and a linear predictive model ##Z(t) = a t + b##...

So what makes ##t## so different from the variable ##X##? We plot both ##t## and ##X##, in increasing order, on the horizontal axis and the other variable on the vertical axis. For each ##X## there is an associated ##Y## and for each ##t## there is an associated ##Z## value...

Maybe the difference really shows up when we get to autoregressive-type models, where ##Z(t)## can depend on the current value of ##t##, on previous values of ##t##, and on previous values of ##Z##. None of this happens for the variable pair ##Y## and ##X##. Also, autocorrelation would not make sense on the values of ##X## or ##Y## alone, while we can compute it for ##Z##, i.e. ##\operatorname{corr}(\text{lag}) = E[Z(t)\, Z(t+\text{lag})]##
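That autocorrelation can be estimated in a few lines (hypothetical data; this sample estimator subtracts the mean and divides by the overall variance):

```python
import numpy as np

def autocorr(z, lag):
    """Sample autocorrelation of Z(t) with Z(t+lag), mean removed."""
    z = z - z.mean()
    return np.sum(z[:-lag] * z[lag:]) / np.sum(z * z)

# Hypothetical periodic series: period-50 sine plus noise.
rng = np.random.default_rng(4)
t = np.arange(1000)
z = np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.2, t.size)

print(autocorr(z, 50))  # strongly positive: the series repeats every 50 steps
print(autocorr(z, 25))  # strongly negative: half a period out of phase
```

For the cross-sectional pair ##(X, Y)## there is no ordering, so "lag" has no meaning, which is exactly the asymmetry the post is pointing at.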

Is my understanding correct?
 
  • #9
fog37 said:
Is my understanding correct?
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
 
  • #10
FactChecker said:
You may be over-thinking it. There are a great many things where the best predictor of a future value is the current value and/or some combination of prior values. It's as simple as that.
You might be right.

Yes, "the best predictor of a future value is the current value and/or some combination of prior values", but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data, which is not time data. For example, ##X## = weight and ##Y## = height. We usually don't see autoregressive models for cross-sectional data. On the other hand, autocorrelation is used in regression analysis with cross-sectional data to determine, for example, whether the residuals are independent.
 
  • #11
fog37 said:
but that happens with time-sequenced data (future, prior, and current apply to time data).

That concept does not apply to cross-sectional data, which is not time data.
You make an interesting point. IMO, the concept still applies, but the implementation is not common. Often the best predictor of the value at one location is the values around it, where "around it" could mean time, position, or some other dimension. I have personally dealt only with the standard time series setting, where earlier data is known, a future value is being estimated, and nothing is known for times beyond that. If data afterward (or around) were known and we were just estimating an unknown intermediate value, would that be completely different? It seems very similar to me.
 
