What is the most common kind of forecasting model?

If there are no data available, or if the data available are not relevant to the forecasts, then qualitative forecasting methods must be used. These methods are not purely guesswork—there are well-developed structured approaches to obtaining good forecasts without using historical data. These methods are discussed in Chapter .

Quantitative forecasting can be applied when two conditions are satisfied:

numerical information about the past is available;
it is reasonable to assume that some aspects of the past patterns will continue into the future.

There is a wide range of quantitative forecasting methods, often developed within specific disciplines for specific purposes. Each method has its own properties, accuracies, and costs that must be considered when choosing a specific method.

Most quantitative prediction problems use either time series data (collected at regular intervals over time) or cross-sectional data (collected at a single point in time). In this book we are concerned with forecasting future data, and we concentrate on the time series domain.

Predictor variables and time series forecasting

Predictor variables are often useful in time series forecasting. For example, suppose we wish to forecast the hourly electricity demand (ED) of a hot region during the summer period. A model with predictor variables might be of the form \[\begin{align*} \text{ED} = & f(\text{current temperature, strength of economy, population,}\\ & \qquad\text{time of day, day of week, error}). \end{align*}\] The relationship is not exact — there will always be changes in electricity demand that cannot be accounted for by the predictor variables. The “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model. We call this an explanatory model because it helps explain what causes the variation in electricity demand.

Because the electricity demand data form a time series, we could also use a time series model for forecasting. In this case, a suitable time series forecasting equation is of the form \[ \text{ED}_{t+1} = f(\text{ED}_{t}, \text{ED}_{t-1}, \text{ED}_{t-2}, \text{ED}_{t-3},\dots, \text{error}), \] where \(t\) is the present hour, \(t+1\) is the next hour, \(t-1\) is the previous hour, \(t-2\) is two hours ago, and so on. Here, prediction of the future is based on past values of a variable, but not on external variables which may affect the system. Again, the “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model.

There is also a third type of model which combines the features of the above two models. For example, it might be given by \[ \text{ED}_{t+1} = f(\text{ED}_{t}, \text{current temperature, time of day, day of week, error}). \] These types of mixed models have been given various names in different disciplines. They are known as dynamic regression models, panel data models, longitudinal models, transfer function models, and linear system models (assuming that \(f\) is linear). These models are discussed in Chapter .

An explanatory model is useful because it incorporates information about other variables, rather than only historical values of the variable to be forecast. However, there are several reasons a forecaster might select a time series model rather than an explanatory or mixed model. First, the system may not be understood, and even if it was understood it may be extremely difficult to measure the relationships that are assumed to govern its behaviour. Second, it is necessary to know or forecast the future values of the various predictors in order to be able to forecast the variable of interest, and this may be too difficult. Third, the main concern may be only to predict what will happen, not to know why it happens. Finally, the time series model may give more accurate forecasts than an explanatory or mixed model.

The model to be used in forecasting depends on the resources and data available, the accuracy of the competing models, and the way in which the forecasting model is to be used.

Time series forecasting is one of the most widely applied data science techniques in business, finance, supply chain management, and production and inventory planning. Many prediction problems involve a time component and thus require extrapolation of time series data, or time series forecasting. Time series forecasting is also an important area of machine learning (ML) and can be cast as a supervised learning problem. ML methods such as regression, neural networks, support vector machines, random forests and XGBoost can be applied to it. Forecasting involves taking models fit on historical data and using them to predict future observations.
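As a rough illustration of what "cast as a supervised learning problem" can mean, the sketch below turns a series into input/target pairs, where each input is a window of past values and the target is the next value. The function name make_supervised and the demand figures are made up for this example; any regression method could then be fit on the resulting pairs.

```python
def make_supervised(series, n_lags):
    """Turn a series into (inputs, target) pairs using the previous n_lags values."""
    rows = []
    for t in range(n_lags, len(series)):
        lags = series[t - n_lags:t]      # y(t - n_lags), ..., y(t - 1)
        rows.append((lags, series[t]))   # target is y(t)
    return rows


# Made-up hourly demand values, for illustration only.
demand = [10, 12, 13, 15, 18, 21]
pairs = make_supervised(demand, n_lags=2)
for lags, target in pairs:
    print(lags, "->", target)
```

Each row uses only values that precede its target, so the time order of the data is preserved in the training set.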

Time series forecasting means predicting future values over a period of time. It entails developing models based on previous data and applying them to make predictions and guide future strategic decisions.

The future is forecast or estimated based on what has already happened. Time series adds a time order dependence between observations. This dependence is both a constraint and a structure that provides a source of additional information. Before we discuss time series forecasting methods, let’s define time series forecasting more closely.

Time series forecasting is a technique for predicting events through a sequence of time. It predicts future events by analyzing past trends, on the assumption that future trends will resemble historical ones. It is used across many fields of study in various applications including:

Astronomy
Business planning
Control engineering
Earthquake prediction
Econometrics
Mathematical finance
Pattern recognition
Resource allocation
Signal processing
Statistics
Weather forecasting

Time series forecasting starts with a historical time series. Analysts examine the historical data and check for patterns of time decomposition, such as trends, seasonal patterns, cyclic patterns and regularity. Many areas within organizations including marketing, finance and sales use some form of time series forecasting to evaluate probable technical costs and consumer demand. Models for time series data can have many forms and represent different stochastic processes.

Time series models

Time series models are used to forecast events based on verified historical data. Common types include ARIMA, smoothing-based, and moving-average models. Different models can yield different results for the same dataset, so it's critical to determine which one works best for the individual time series.

When forecasting, it is important to understand your goal. To narrow down the specifics of your predictive modeling problem, ask questions about:

Volume of data available: more data is often more helpful, offering greater opportunity for exploratory data analysis, model testing and tuning, and model fidelity.
Required time horizon of predictions: shorter time horizons are often easier to predict, with higher confidence, than longer ones.
Forecast update frequency: forecasts might need to be updated frequently over time or might need to be made once and remain static (updating forecasts as new information becomes available often results in more accurate predictions).
Forecast temporal frequency: forecasts can often be made at lower or higher frequencies, which allows downsampling and upsampling of the data (this in turn can offer benefits while modeling).

Time series analysis vs. time series forecasting

Time series analysis is about understanding the dataset, while forecasting is about predicting it. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.

The three aspects of predictive modeling are:

Sample data: the data that we collect that describes our problem with known relationships between inputs and outputs.
Learning a model: the algorithm that we use on the sample data to create a model that we can later use over and over again.
Making predictions: the use of our learned model on new data for which we don’t know the output.

Validating and testing a time series model

Among the factors that make time series forecasting challenging are:

Time dependence of a time series - The basic assumption of a linear regression model that the observations are independent doesn’t hold in this case. Due to the temporal dependencies in time series data, time series forecasting cannot rely on usual validation techniques. To avoid biased evaluations, training data sets should contain observations that occurred prior to the ones in validation sets. Once we have chosen the best model, we can fit it on the entire training set and evaluate its performance on a separate test set subsequent in time.
Seasonality in a time series - Along with an increasing or decreasing trend, most time series have some form of seasonal trends, i.e. variations specific to a particular time frame.
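The time-dependence point above can be sketched as an expanding-window split, in which every training index comes before every validation index. This is one common way to respect time order, not the only one; the function name expanding_window_splits and its parameters are assumptions made for this illustration.

```python
def expanding_window_splits(n_obs, n_folds, val_size):
    """Yield (train_indices, val_indices) pairs that respect time order."""
    first_val_start = n_obs - n_folds * val_size
    if first_val_start <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        val_start = first_val_start + k * val_size
        train_idx = list(range(0, val_start))                   # everything before the fold
        val_idx = list(range(val_start, val_start + val_size))  # the next val_size points
        yield train_idx, val_idx


splits = list(expanding_window_splits(n_obs=10, n_folds=3, val_size=2))
for train_idx, val_idx in splits:
    print("train:", train_idx, "validate:", val_idx)
```

A shuffled k-fold split would let the model train on observations that come after the ones it is validated on, which is the bias the ordering above avoids.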

No single time series model outperforms all others on every dataset: a model that performs best on one type of dataset may not perform as well on others.

Types of forecasting methods

Model | Use
Decompositional | Deconstruction of time series
Smooth-based | Removal of anomalies for clear patterns
Moving-Average | Tracking a single type of data
Exponential Smoothing | Smooth-based model + exponential window function

Examples of time series forecasting

Examples of time series forecasting include: predicting consumer demand for a particular product across seasons; the price of home heating fuel sources; hotel occupancy rate; hospital inpatient treatment; fraud detection; stock prices. You can perform forecasting either via storage or machine learning models.

Let's explore forecasting examples using InfluxDB, the open source time series database.

Storage forecasting

Here is a use case example of storage forecasting (at Veritas Technologies):

[Screenshot: Storage Usage Forecast at Veritas Predictive Insights]

Machine learning

Here is a use case example of machine learning (at Playtech):

[Screenshot: Moving statistics]

Overview of time series forecasting methods

Decompositional models

Time series data can exhibit a variety of patterns, so it is often helpful to split a time series into components, each representing an underlying pattern category. This is what decompositional models do.

The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns. When we decompose a time series into components, we think of a time series as comprising three components: a trend component, a seasonal component, and residuals or “noise” (containing anything else in the time series).

There are two main types of decomposition: decomposition based on rates of change and decomposition based on predictability.

Decomposition based on rates of change

This is an important time series analysis technique, especially for seasonal adjustment. It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original by additions or multiplications) where each of these has a certain characteristic or type of behavior.

If data shows some seasonality (e.g. daily, weekly, quarterly, yearly) it may be useful to decompose the original time series into the sum of three components:

Y(t) = S(t) + T(t) + R(t)

where S(t) is the seasonal component, T(t) is the trend-cycle component, and R(t) is the remainder component.

There are several techniques to estimate such a decomposition. The most basic one is called classical decomposition and consists of:

Estimating trend T(t) through a rolling mean
Computing S(t) as the average detrended series Y(t)-T(t) for each season (e.g. for each month)
Computing the remainder series as R(t)=Y(t)-T(t)-S(t)
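The three steps above can be sketched in plain Python as follows. This is a minimal illustration, assuming an additive series with at least two full seasonal cycles; the function names and the synthetic data are made up for the example. For an even period it uses a centered moving average with half weight on the two end points, which is a common choice for the rolling mean but is not specified in the text.

```python
def centered_moving_average(y, period):
    """Step 1: estimate the trend T(t) with a centered rolling mean."""
    n = len(y)
    half = period // 2
    trend = [None] * n                       # undefined near both ends
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    return trend


def classical_additive_decomposition(y, period):
    n = len(y)
    trend = centered_moving_average(y, period)
    # Step 2: average the detrended values Y(t) - T(t) for each season.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += y[t] - trend[t]
            counts[t % period] += 1
    seasonal_means = [sums[s] / counts[s] for s in range(period)]
    seasonal = [seasonal_means[t % period] for t in range(n)]
    # Step 3: remainder R(t) = Y(t) - T(t) - S(t), where the trend is defined.
    remainder = [None if trend[t] is None else y[t] - trend[t] - seasonal[t]
                 for t in range(n)]
    return trend, seasonal, remainder


# Synthetic quarterly-style data: linear trend plus a repeating pattern.
pattern = [2.0, -1.0, -3.0, 2.0]
y = [10.0 + 0.5 * t + pattern[t % 4] for t in range(16)]
trend, seasonal, remainder = classical_additive_decomposition(y, period=4)
print(seasonal[:4])
```

On this noise-free synthetic series the estimated seasonal component matches the pattern that was put in, and the remainder is zero wherever the trend is defined; real data will leave a non-zero remainder.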

Time series can also be decomposed into:

T(t), the trend component at time t, which reflects the long-term progression of the series. A trend exists when there is a persistent increasing or decreasing direction in the data.
C(t), the cyclical component at time t, which reflects repeated but non-periodic fluctuations. The duration of these fluctuations is usually at least two years.
S(t), the seasonal component at time t, reflecting seasonality (seasonal variation). Seasonality occurs over a fixed and known time period (e.g., the quarter of the year, the month, or day of the week).
I(t), the irregular component (“residuals” or “noise”) at time t, which describes random, irregular influences.

Additive vs. multiplicative decomposition

In an additive time series, the components add together to make the time series. In a multiplicative time series, the components multiply together to make the time series.

An additive model is used when the variations around the trend do not vary with the level of the time series. To learn more about forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects, see the “Forecasting with FB Prophet and InfluxDB” tutorial, which shows how to make a univariate time series prediction. (Facebook Prophet is an open source library published by Facebook that is based on decomposable models: trend + seasonality + holidays.)

A multiplicative model is appropriate if the variation around the trend, such as the size of the seasonal swings, is proportional to the level of the time series.
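Written in the S(t), T(t), R(t) notation introduced earlier, the two forms are:

\[ Y(t) = S(t) + T(t) + R(t) \quad \text{(additive)}, \qquad Y(t) = S(t) \times T(t) \times R(t) \quad \text{(multiplicative)}. \]

Assuming every component is strictly positive, taking logarithms turns a multiplicative series into an additive one, so an additive method can be applied to the logged data:

\[ \log Y(t) = \log S(t) + \log T(t) + \log R(t). \]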

Decomposition based on predictability

The theory of time series analysis makes use of the idea of decomposing a time series into deterministic and non-deterministic components (or predictable and unpredictable components).

In statistics, Wold's decomposition or the Wold representation theorem, named after Herman Wold, says that every covariance-stationary time series can be written as the sum of two time series, one deterministic and one stochastic.
Formally:

\[ Y_t = \sum_{j=0}^{\infty} b_j \varepsilon_{t-j} + \eta_t \]

Where:

\(Y_t\) is the time series being considered,
\(\varepsilon_t\) is an uncorrelated sequence which is the innovation process to the process \(Y_t\), that is, a white noise process that is input to the linear filter \(\{b_j\}\),
\(b\) is the possibly infinite vector of moving average weights (coefficients or parameters),
\(\eta_t\) is a deterministic time series, such as one represented by a sine wave.

Types of time series methods used for forecasting

Time series methods refer to different ways to model time-indexed data. Common types include: Autoregression (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving-Average (SARIMA).

The important thing is to select the appropriate forecasting method based on the characteristics of the time series data.

Smoothing-based models

In time series forecasting, data smoothing is a statistical technique that involves removing noise from a time series data set to make a pattern more visible. Inherent in the collection of data taken over time is some form of random variation. Smoothing data removes or reduces random variation and shows underlying trends and cyclic components.

Moving-average model

In time series analysis, the moving-average model (MA model), also known as moving-average process, is a common approach for modeling univariate time series. The moving-average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term.

Together with the autoregressive (AR) model (covered below), the moving-average model is a special case and key component of the more general ARMA and ARIMA models of time series, which have a more complicated stochastic structure.

Contrary to the AR model, the finite MA model is always stationary.
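To make the "current and various past values of a stochastic term" concrete, here is a small sketch that builds an MA(q) series from a given noise sequence, using the standard form x(t) = mean + e(t) + theta_1 e(t-1) + ... + theta_q e(t-q). The function name ma_process and the numbers are illustrative assumptions; in practice the noise terms are unobserved and the weights are estimated from data.

```python
def ma_process(noise, thetas, mean=0.0):
    """Build an MA(q) series from a noise sequence, where q = len(thetas)."""
    q = len(thetas)
    series = []
    for t in range(q, len(noise)):
        value = mean + noise[t]                  # current noise term
        for j, theta in enumerate(thetas, start=1):
            value += theta * noise[t - j]        # weighted past noise terms
        series.append(value)
    return series


noise = [1.0, -2.0, 0.5, 3.0]        # fixed values so the arithmetic is easy to follow
x = ma_process(noise, thetas=[0.5], mean=10.0)
print(x)   # [8.5, 9.5, 13.25]
```

Each output value depends on only a finite number of noise terms, which is why a finite MA model stays stationary whatever the weights are.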

Exponential Smoothing model

Exponential smoothing is a rule-of-thumb technique for smoothing time series data using the exponential window function. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Different types of exponential smoothing include single exponential smoothing, double exponential smoothing, and triple exponential smoothing (also known as the Holt-Winters method). For tutorials on how to use Holt-Winters out of the box with InfluxDB, see “When You Want Holt-Winters Instead of Machine Learning” and “Using InfluxDB to Predict The Next Extinction Event”.

Holt-Winters can be used with InfluxQL in InfluxDB 1.7.6. HoltWinters() can also be applied with Flux, InfluxData’s functional query and scripting language.

In single exponential smoothing, forecasts are given by:

Ŷ(t+h|t) = αy(t) + α(1-α)y(t-1) + α(1-α)²y(t-2) + …

with 0<α<1.
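The weighted sum above is usually computed recursively: the new level is the smoothing parameter times the latest observation plus one minus the smoothing parameter times the previous level. The sketch below assumes the level is initialised to the first observation, which is one common convention rather than something the text specifies; the function name ses_forecast is made up for the example.

```python
def ses_forecast(y, alpha):
    """Single exponential smoothing forecast for the next period."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    level = y[0]                       # assumed initialisation: first observation
    for obs in y[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * obs + (1 - alpha) * level
    return level                       # the same value is used for every horizon h


print(ses_forecast([10.0, 12.0, 11.0], alpha=0.5))   # 11.0
```

With the smoothing parameter close to 1 the forecast tracks the latest observation closely; close to 0 it changes slowly and leans on older data.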

Triple Exponential Smoothing or Holt Winters is mathematically similar to Single Exponential Smoothing except that the seasonality and trend are included in the forecast.

Moving-Average model vs. Exponential Smoothing model

Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time (recent observations are given relatively more weight in forecasting than the older observations).
In the case of moving averages, the weights assigned to the observations are the same and are equal to 1/N. In exponential smoothing, however, there are one or more smoothing parameters to be determined (or estimated) and these choices determine the weights assigned to the observations.

Forecasting models including seasonality

ARIMA and SARIMA

To define ARIMA and SARIMA, it’s helpful to first define autoregression. Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. (“Autoregression Models for Time Series Forecasting With Python” is a good tutorial on how to implement an autoregressive model for time series forecasting with Python.)

AutoRegressive Integrated Moving Average (ARIMA) models are among the most widely used time series forecasting techniques:

In an Autoregressive model, the forecasts correspond to a linear combination of past values of the variable.
In a Moving Average model the forecasts correspond to a linear combination of past forecast errors.

The ARIMA models combine the above two approaches. Since they require the time series to be stationary, differencing the time series (the “Integrated” part of the name) may be a necessary step, i.e. considering the time series of the differences instead of the original one.
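A minimal sketch of the differencing step, and of undoing it to get back to the original scale, is shown below. The helper names difference and undifference are illustrative assumptions, and the series is made up.

```python
def difference(series, lag=1):
    """Return the series of differences y(t) - y(t - lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def undifference(first_values, diffs, lag=1):
    """Rebuild a series from its first `lag` values and its differences."""
    restored = list(first_values)
    for d in diffs:
        restored.append(restored[-lag] + d)   # y(t) = y(t - lag) + difference
    return restored


y = [3, 5, 8, 12, 17]
d1 = difference(y)          # [2, 3, 4, 5]: still trending upward
d2 = difference(d1)         # [1, 1, 1]: constant after a second difference
print(d1, d2)
print(undifference(y[:1], d1))   # [3, 5, 8, 12, 17]
```

A model fitted to the differenced series produces forecasts of differences, so the same cumulative step is needed to turn them back into forecasts of the original variable.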

The SARIMA model (Seasonal ARIMA) extends the ARIMA by adding a linear combination of seasonal past values and/or forecast errors.

TBATS

The TBATS model is a forecasting model based on exponential smoothing. The name is an acronym for Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal components.

The TBATS model’s main feature is its capability to deal with multiple seasonalities by modelling each seasonality with a trigonometric representation based on Fourier series. A classic example of complex seasonality is given by daily observations of sales volumes which often have both weekly and yearly seasonality.
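The trigonometric idea can be illustrated by generating sine and cosine terms for each seasonal period. This sketch only builds such Fourier terms for given time indices; it is not an implementation of TBATS itself, and the function names, the periods (7 and 365.25 days) and the numbers of harmonics are assumptions chosen to match the weekly-plus-yearly example.

```python
import math


def fourier_terms(t, period, n_harmonics):
    """Sine/cosine pairs describing one seasonal pattern at time index t."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


def seasonal_features(t, periods_and_harmonics):
    """Concatenate Fourier terms for several seasonal periods."""
    features = []
    for period, n_harmonics in periods_and_harmonics:
        features.extend(fourier_terms(t, period, n_harmonics))
    return features


# Daily data with weekly and yearly seasonality (illustrative settings).
row = seasonal_features(t=10, periods_and_harmonics=[(7, 2), (365.25, 3)])
print(len(row))   # 10 values: 2*2 weekly terms + 2*3 yearly terms
```

More harmonics allow a more detailed seasonal shape at the cost of more parameters to estimate.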

What model is best for forecasting?

A causal model is the most sophisticated kind of forecasting tool. It expresses mathematically the relevant causal relationships, and may include pipeline considerations (i.e., inventories) and market survey information. It may also directly incorporate the results of a time series analysis.

What are the 2 main methods of forecasting?

There are two types of forecasting methods: qualitative and quantitative.

Which is the most accurate forecasting method and why?

Exponential smoothing can often result in a more accurate forecast. It is an easy method that enables forecasts to quickly react to new trends or changes. A benefit of exponential smoothing is that it does not require a large amount of historical data.