Time Series Profiling
Data Granularity
Let's look at the ASHRAE data used before.
from pandas import read_csv, DataFrame, Series
from matplotlib.pyplot import figure, show
from dslabs_functions import plot_line_chart, HEIGHT
file_tag = "ASHRAE"
target = "meter_reading"
data: DataFrame = read_csv(
    "data/time_series/ashrae.csv",
    index_col="timestamp",
    sep=",",
    decimal=".",
    parse_dates=True,  # infer_datetime_format is deprecated since pandas 2.0, so it is omitted
)
series: Series = data[target]
figure(figsize=(3 * HEIGHT, HEIGHT / 2))
plot_line_chart(
    series.index.to_list(),
    series.to_list(),
    xlabel=series.index.name,
    ylabel=target,
    title=f"{file_tag} hourly {target}",
)
show()
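One way to check the recording granularity, rather than reading it off the chart, is to look at the gaps between consecutive timestamps. The sketch below is only an illustration: `toy` is a synthetic hourly series standing in for the real data, and the same steps would apply to `series` if its index is a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real series: one week of hourly readings (assumption).
index = pd.date_range(
    "2016-01-01", periods=7 * 24, freq=pd.Timedelta(hours=1), name="timestamp"
)
toy = pd.Series(np.arange(len(index), dtype=float), index=index, name="meter_reading")

# Gaps between consecutive timestamps; the first gap is undefined and dropped.
steps = toy.index.to_series().diff().dropna()

# The most frequent gap is a reasonable guess for the recording granularity.
most_common_step = steps.mode().iloc[0]
print("most common step:", most_common_step)
print("distinct step sizes:", steps.nunique())
```

More than one distinct step size would hint at missing or irregular records, which is worth knowing before aggregating.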
We've already seen that the data is recorded hourly, but we can try other aggregations.
We use the ts_aggregation_by function, defined next, to do that.
from pandas import Index, Period
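def ts_aggregation_by(
    data: Series | DataFrame,
    gran_level: str = "D",
    agg_func: str = "mean",
) -> Series | DataFrame:
    df: Series | DataFrame = data.copy()
    # Map each timestamp to the period (day, week, month, ...) that contains it
    index: Index[Period] = df.index.to_period(gran_level)
    # Aggregate all records that fall in the same period; the result has one row per period
    df = df.groupby(by=index, dropna=True, sort=True).agg(agg_func)
    # Convert the period index back to timestamps (the start of each period)
    df.index = df.index.to_timestamp()
    return df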
ss_days: Series = ts_aggregation_by(series, "D")
figure(figsize=(3 * HEIGHT, HEIGHT / 2))
plot_line_chart(
    ss_days.index.to_list(),
    ss_days.to_list(),
    xlabel="days",
    ylabel=target,
    title=f"{file_tag} daily mean {target}",
)
show()
Aggregating by day acts as a kind of smoothing, since we use the mean as the aggregation function. The result is a smoother version of the original time series, with less noise.
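To make the smoothing effect concrete, here is a minimal sketch on synthetic data (not the ASHRAE series): an hourly signal with a daily cycle plus random noise, aggregated by day with the same period-based grouping used above. The names `hourly`, `daily` and `resampled` are illustrative assumptions. The sketch also compares the result with pandas' `resample`, which is an alternative route to the same daily means.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic hourly series: 14 days with a daily sinusoidal cycle plus noise.
n_hours = 14 * 24
index = pd.date_range(
    "2016-01-01", periods=n_hours, freq=pd.Timedelta(hours=1), name="timestamp"
)
hours = np.arange(n_hours)
values = 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, n_hours)
hourly = pd.Series(values, index=index, name="meter_reading")

# Daily mean via period grouping, as in ts_aggregation_by.
daily = hourly.groupby(hourly.index.to_period("D")).mean()
daily.index = daily.index.to_timestamp()

# The same aggregation expressed with resample.
resampled = hourly.resample("D").mean()

print("hourly std:", round(hourly.std(), 2))
print("daily std: ", round(daily.std(), 2))
print("same values as resample:", np.allclose(daily.to_numpy(), resampled.to_numpy()))
```

The daily series varies much less than the hourly one because averaging cancels both the within-day cycle and most of the noise. The price is that any pattern shorter than a day is no longer visible.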
In this new version, we can still identify a cyclic behavior, which seems to repeat weekly.
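A weekly cycle like this can be checked beyond visual inspection. The sketch below uses a synthetic daily series (an assumption, with higher values on weekdays than on weekends) and looks at two simple indicators: the mean per day of the week, and the autocorrelation at a lag of 7 days. With the real data, `ss_days` would take the place of `daily`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily series: 8 weeks starting on a Monday (2016-01-04).
index = pd.date_range("2016-01-04", periods=8 * 7, freq="D", name="timestamp")
is_weekend = index.dayofweek >= 5  # Monday=0, ..., Saturday=5, Sunday=6
values = np.where(is_weekend, 60.0, 100.0) + rng.normal(0, 1, len(index))
daily = pd.Series(values, index=index, name="meter_reading")

# Mean value for each day of the week: a weekly cycle shows up as a clear profile.
weekday_profile = daily.groupby(daily.index.dayofweek).mean()
print(weekday_profile.round(1))

# Autocorrelation: high at lag 7 if the pattern repeats every week.
lag7 = daily.autocorr(lag=7)
lag3 = daily.autocorr(lag=3)
print("autocorrelation at lag 7:", round(lag7, 2))
print("autocorrelation at lag 3:", round(lag3, 2))
```

A lag-7 autocorrelation close to 1, together with a much lower value at other lags, is consistent with a weekly cycle.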
from matplotlib.pyplot import subplots
from matplotlib.axes import Axes
from matplotlib.figure import Figure
grans: list[str] = ["D", "W", "M"]
fig: Figure
axs: list[Axes]
fig, axs = subplots(len(grans), 1, figsize=(3 * HEIGHT, HEIGHT / 2 * len(grans)))
fig.suptitle(f"{file_tag} {target} aggregation study")
for i in range(len(grans)):
    ss: Series = ts_aggregation_by(series, grans[i])
    plot_line_chart(
        ss.index.to_list(),
        ss.to_list(),
        ax=axs[i],
        xlabel=f"{ss.index.name} ({grans[i]})",
        ylabel=target,
        title=f"granularity={grans[i]}",
    )
show()
The chart for weekly consumption is quite different from the previous ones: it no longer shows the cyclic behavior seen before. Indeed, despite the downward trend in the second semester, weekly consumption is almost constant in the first quarter.
The chart for monthly consumption confirms those trends and confirms any suspicion about the lack of stationarity in the time series: its mean is not constant over time. In particular, consumption differs considerably from month to month, but there are more formal ways to deal with that!
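As a rough numeric counterpart to reading the monthly chart, the sketch below computes monthly means and compares the two halves of the year. The data is synthetic (an assumption: a flat level in the first semester followed by a linear decline), so the numbers say nothing about ASHRAE itself. This is only an informal check, not a statistical test of stationarity.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily series for 2016: flat at 100 until the end of June,
# then declining linearly to 50 by the end of the year, plus noise.
index = pd.date_range("2016-01-01", "2016-12-31", freq="D", name="timestamp")
day = np.arange(len(index))
first_semester_days = 182  # January to June in a leap year
decline = 50.0 * (day - first_semester_days) / (len(index) - first_semester_days)
level = np.where(day < first_semester_days, 100.0, 100.0 - decline)
toy = pd.Series(level + rng.normal(0, 2, len(index)), index=index, name="meter_reading")

# Monthly means, using the same period-based grouping as ts_aggregation_by.
monthly = toy.groupby(toy.index.to_period("M")).mean()
print(monthly.round(1))

# If the mean were constant over time, both halves would be close to each other.
first_half = monthly.iloc[:6].mean()
second_half = monthly.iloc[6:].mean()
print("first semester mean: ", round(first_half, 1))
print("second semester mean:", round(second_half, 1))
print("range of monthly means:", round(monthly.max() - monthly.min(), 1))
```

A large gap between the two semesters, or a wide range of monthly means, points to a mean that changes over time.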