What is the most common kind of forecasting model?
If there are no data available, or if the data available are not relevant to the forecasts, then qualitative forecasting methods must be used. These methods are not purely guesswork: there are well-developed structured approaches to obtaining good forecasts without using historical data. These methods are discussed in Chapter .
Quantitative forecasting can be applied when two conditions are satisfied:
There is a wide range of quantitative forecasting methods, often developed within specific disciplines for specific purposes. Each method has its own properties, accuracies, and costs that must be considered when choosing a specific method.
Most quantitative prediction problems use either time series data (collected at regular intervals over time) or cross-sectional data (collected at a single point in time). In this book we are concerned with forecasting future data, and we concentrate on the time series domain.
Predictor variables and time series forecasting
Predictor variables are often useful in time series forecasting. For example, suppose we wish to forecast the hourly electricity demand (ED) of a hot region during the summer period. A model with predictor variables might be of the form
\[\begin{align*} \text{ED} = & f(\text{current temperature, strength of economy, population,}\\ & \qquad\text{time of day, day of week, error}). \end{align*}\]
The relationship is not exact: there will always be changes in electricity demand that cannot be accounted for by the predictor variables. The “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model. We call this an explanatory model because it helps explain what causes the variation in electricity demand.
Because the electricity demand data form a time series, we could also use a time series model for forecasting. In this case, a suitable time series forecasting equation is of the form
\[ \text{ED}_{t+1} = f(\text{ED}_{t}, \text{ED}_{t-1}, \text{ED}_{t-2}, \text{ED}_{t-3},\dots, \text{error}), \]
where \(t\) is the present hour, \(t+1\) is the next hour, \(t-1\) is the previous hour, \(t-2\) is two hours ago, and so on. Here, prediction of the future is based on past values of a variable, but not on external variables which may affect the system.
Again, the “error” term on the right allows for random variation and the effects of relevant variables that are not included in the model.
There is also a third type of model which combines the features of the above two models. For example, it might be given by
\[ \text{ED}_{t+1} = f(\text{ED}_{t}, \text{current temperature, time of day, day of week, error}). \]
These types of mixed models have been given various names in different disciplines. They are known as dynamic regression models, panel data models, longitudinal models, transfer function models, and linear system models (assuming that \(f\) is linear). These models are discussed in Chapter .
An explanatory model is useful because it incorporates information about other variables, rather than only historical values of the variable to be forecast. However, there are several reasons a forecaster might select a time series model rather than an explanatory or mixed model. First, the system may not be understood, and even if it were understood it may be extremely difficult to measure the relationships that are assumed to govern its behaviour. Second, it is necessary to know or forecast the future values of the various predictors in order to be able to forecast the variable of interest, and this may be too difficult. Third, the main concern may be only to predict what will happen, not to know why it happens. Finally, the time series model may give more accurate forecasts than an explanatory or mixed model.
The model to be used in forecasting depends on the resources and data available, the accuracy of the competing models, and the way in which the forecasting model is to be used.
Time series forecasting is one of the most widely applied data science techniques in business, finance, supply chain management, production and inventory planning. Many prediction problems involve a time component and thus require extrapolation of time series data, or time series forecasting.
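The time series form described above, in which the next value is a function of past values only, can be illustrated with a very small sketch. It assumes the simplest possible case: one lag and a linear \(f\), fitted by ordinary least squares. The function names and the demand numbers are made up for illustration and do not come from the original text.

```python
def make_lagged_pairs(series, lag=1):
    """Turn a series into (past value, next value) training pairs."""
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]


def fit_line(pairs):
    """Ordinary least squares for y = a + b * x on a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx            # slope: how strongly the next value follows the last
    a = mean_y - b * mean_x  # intercept
    return a, b


def forecast_next(series, a, b):
    """One-step-ahead forecast from the most recent observation."""
    return a + b * series[-1]


# Hypothetical hourly demand readings (illustrative numbers only).
demand = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
a, b = fit_line(make_lagged_pairs(demand))
next_hour = forecast_next(demand, a, b)
print(a, b, next_hour)
```

A mixed model of the kind described above would add further columns (temperature, time of day) to the same regression; that requires a multi-variable solver and is left out of this sketch.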
Time series forecasting is also an important area of machine learning (ML) and can be cast as a supervised learning problem. ML methods such as regression, neural networks, support vector machines, random forests and XGBoost can be applied to it. Forecasting involves taking models fit on historical data and using them to predict future observations.
Time series forecasting means predicting future values over a period of time. It entails developing models based on previous data and applying them to make observations and guide future strategic decisions. The future is forecast or estimated based on what has already happened. A time series adds a time-order dependence between observations. This dependence is both a constraint and a structure that provides a source of additional information.
Before we discuss time series forecasting methods, let’s define time series forecasting more closely. Time series forecasting is a technique for predicting events through a sequence of time. It predicts future events by analyzing the trends of the past, on the assumption that future trends will resemble historical trends. It is used across many fields of study in various applications including:
Time series forecasting starts with a historical time series. Analysts examine the historical data and check for patterns that a time series can be decomposed into, such as trends, seasonal patterns, cyclic patterns and regularity. Many areas within organizations, including marketing, finance and sales, use some form of time series forecasting to evaluate probable technical costs and consumer demand. Models for time series data can have many forms and represent different stochastic processes.
Time series models
Time series models are used to forecast events based on verified historical data. Common types include ARIMA, smoothing-based, and moving average. Not all models will yield the same results for the same dataset, so it's critical to determine which one works best based on the individual time series.
When forecasting, it is important to understand your goal. To narrow down the specifics of your predictive modeling problem, ask questions about:
Time series analysis vs. time series forecasting
While time series analysis is all about understanding the dataset, forecasting is all about predicting it. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values.
The three aspects of predictive modeling are:
Validating and testing a time series model
Among the factors that make time series forecasting challenging are:
A time series model may outperform others on a particular dataset, but a model that performs best on one type of dataset may not perform as well on others.
Types of forecasting methods
Model | Use
Decompositional | Deconstruction of time series
Smooth-based | Removal of anomalies for clear patterns
Moving-Average | Tracking a single type of data
Exponential Smoothing | Smooth-based model + exponential window function
Examples of time series forecasting
Examples of time series forecasting include: predicting consumer demand for a particular product across seasons; the price of home heating fuel sources; hotel occupancy rate; hospital inpatient treatment; fraud detection; stock prices.
You can perform forecasting either via storage or machine learning models. Let's explore forecasting examples using InfluxDB, the open source time series database.
Storage forecasting
Here is a use case example of storage forecasting (at Veritas Technologies), from which the below screenshot is taken:
[Screenshot: Storage Usage Forecast at Veritas Predictive Insights]
Machine learning
Here is a use case example of machine learning (at Playtech), from which the below screenshot is taken:
[Screenshot: Moving statistics]
Overview of time series forecasting methods
Decompositional models
Time series data can exhibit a variety of patterns, so it is often helpful to split a time series into components, each representing an underlying pattern category. This is what decompositional models do.
The decomposition of time series is a statistical task that deconstructs a time series into several components, each representing one of the underlying categories of patterns. When we decompose a time series into components, we think of a time series as comprising three components: a trend component, a seasonal component, and residuals or “noise” (containing anything else in the time series).
There are two main types of decomposition: decomposition based on rates of change and decomposition based on predictability.
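The three-component view described above (trend, seasonal component, and residual) can be made concrete with a short sketch. It assumes the components simply add up, that the seasonal period is odd and known in advance, and that the series covers at least two full cycles. The function names and the toy series are hypothetical and chosen only so the result is easy to check by hand.

```python
def moving_average_trend(y, period):
    """Centered moving average over one full cycle (assumes an odd period).

    Positions where the window does not fit are left as None.
    """
    half = period // 2
    trend = [None] * len(y)
    for i in range(half, len(y) - half):
        window = y[i - half : i + half + 1]
        trend[i] = sum(window) / period
    return trend


def decompose_additive(y, period):
    """Split y into trend + seasonal + remainder (additive form)."""
    trend = moving_average_trend(y, period)
    # Remove the trend wherever it is defined.
    detrended = [v - t if t is not None else None for v, t in zip(y, trend)]
    # Average the detrended values at each position within the cycle.
    seasonal_means = []
    for pos in range(period):
        vals = [d for i, d in enumerate(detrended)
                if d is not None and i % period == pos]
        seasonal_means.append(sum(vals) / len(vals))
    # Center the seasonal effects so they sum to zero over one cycle.
    offset = sum(seasonal_means) / period
    seasonal_means = [s - offset for s in seasonal_means]
    seasonal = [seasonal_means[i % period] for i in range(len(y))]
    remainder = [v - t - s if t is not None else None
                 for v, t, s in zip(y, trend, seasonal)]
    return trend, seasonal, remainder


# Toy series: a linear trend 0, 1, 2, ... plus a repeating pattern (+1, 0, -1).
series = [1.0, 1.0, 1.0, 4.0, 4.0, 4.0, 7.0, 7.0, 7.0]
trend, seasonal, remainder = decompose_additive(series, period=3)
print(trend)
print(seasonal)
print(remainder)
```

Even periods (for example 12 for monthly data) need a slightly different centered average, which this sketch deliberately omits.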
Decomposition based on rates of change
This is an important time series analysis technique, especially for seasonal adjustment. It seeks to construct, from an observed time series, a number of component series (that could be used to reconstruct the original by additions or multiplications) where each of these has a certain characteristic or type of behavior.
If data shows some seasonality (e.g. daily, weekly, quarterly, yearly) it may be useful to decompose the original time series into the sum of three components:
Y(t) = S(t) + T(t) + R(t)
where S(t) is the seasonal component, T(t) is the trend-cycle component, and R(t) is the remainder component.
There are several techniques to estimate such a decomposition. The most basic one is called classical decomposition and consists of:
Time series can also be decomposed into:
Additive vs. multiplicative decomposition
In an additive time series, the components add together to make the time series. In a multiplicative time series, the components multiply together to make the time series.
Here is an example of a time series using an additive model:
[Figure: example of a time series using an additive model]
An additive model is used when the variations around the trend do not vary with the level of the time series. To learn more about forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects, see the “Forecasting with FB Prophet and InfluxDB” tutorial, which shows how to make a univariate time series prediction. Facebook Prophet is an open source library published by Facebook that is based on decomposable (trend + seasonality + holidays) models.
Here is an example of a time series using a multiplicative model:
[Figure: example of a time series using a multiplicative model]
A multiplicative model is appropriate if the seasonal variation, or the variation around the trend, is proportional to the level of the time series.
Decomposition based on predictability
The theory of time series analysis makes use of the idea of decomposing a time series into deterministic and non-deterministic components (or predictable and unpredictable components).
In statistics, Wold's decomposition or the Wold representation theorem, named after Herman Wold, says that every covariance-stationary time series can be written as the sum of two time series, one deterministic and one stochastic. Where:
Types of time series methods used for forecasting
Time series methods refer to different ways of modeling data recorded over time. Common types include: Autoregression (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving-Average (SARIMA). The important thing is to select the appropriate forecasting method based on the characteristics of the time series data.
Smoothing-based models
In time series forecasting, data smoothing is a statistical technique that involves removing outliers from a time series data set to make a pattern more visible. Inherent in the collection of data taken over time is some form of random variation. Smoothing data removes or reduces random variation and shows underlying trends and cyclic components.
Moving-average model
In time series analysis, the moving-average model (MA model), also known as moving-average process, is a common approach for modeling univariate time series. The moving-average model specifies that the output variable depends linearly on the current and various past values of a stochastic (imperfectly predictable) term. Together with the autoregressive (AR) model (covered below), the moving-average model is a special case and key component of the more general ARMA and ARIMA models of time series, which have a more complicated stochastic structure. Contrary to the AR model, the finite MA model is always stationary.
Exponential Smoothing model
Exponential smoothing is a rule of thumb technique for smoothing time series data using the exponential window function. Exponential smoothing is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Different types of exponential smoothing include single exponential smoothing, double exponential smoothing, and triple exponential smoothing (also known as the Holt-Winters method).
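Single exponential smoothing, the simplest of these variants, can be sketched in a few lines. The version below is a minimal illustration, assuming the smoothed level is initialised at the first observation (other initialisations are common) and that the smoothing weight `alpha` lies strictly between 0 and 1. The function name and sample values are hypothetical.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the one-step-ahead forecast after smoothing all of y.

    Each update blends the newest observation with the previous level:
        level = alpha * observation + (1 - alpha) * level
    Unrolling this recursion puts geometrically decaying weights on
    older observations.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    level = y[0]  # assumption: start the level at the first observation
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Illustrative numbers only.
observations = [10.0, 20.0, 30.0]
forecast = simple_exponential_smoothing(observations, alpha=0.5)
print(forecast)
```

A larger `alpha` makes the forecast react faster to recent changes, while a smaller one smooths more heavily. Double and triple exponential smoothing add separate trend and seasonal updates on top of this level update.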
For tutorials on how to use Holt-Winters out of the box with InfluxDB, see “When You Want Holt-Winters Instead of Machine Learning” and “Using InfluxDB to Predict The Next Extinction Event”.
[Figure: an example of using Holt-Winters with InfluxQL in 1.7.6]
HoltWinters() can also be applied with Flux, InfluxData’s functional query and scripting language.
In single exponential smoothing, forecasts are given by:
Ŷ(t+h|t) = αy(t) + α(1-α)y(t-1) + α(1-α)²y(t-2) + …
with 0 < α < 1.
Triple exponential smoothing, or Holt-Winters, is mathematically similar to single exponential smoothing except that the seasonality and trend are included in the forecast.
Moving-Average model vs. Exponential Smoothing model
Forecasting models including seasonality
ARIMA and SARIMA
To define ARIMA and SARIMA, it’s helpful to first define autoregression. Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. (“Autoregression Models for Time Series Forecasting With Python” is a good tutorial on how to implement an autoregressive model for time series forecasting with Python.)
AutoRegressive Integrated Moving Average (ARIMA) models are among the most widely used time series forecasting techniques:
The ARIMA models combine the above two approaches. Since they require the time series to be stationary, differencing the time series (the “Integrated” part of the name) may be a necessary step, i.e. considering the time series of the differences instead of the original one.
The SARIMA model (Seasonal ARIMA) extends ARIMA by adding a linear combination of seasonal past values and/or forecast errors.
TBATS
The TBATS model is a forecasting model based on exponential smoothing. The name is an acronym for Trigonometric, Box-Cox transform, ARMA errors, Trend and Seasonal components.
The TBATS model’s main feature is its capability to deal with multiple seasonalities by modelling each seasonality with a trigonometric representation based on Fourier series. A classic example of complex seasonality is given by daily observations of sales volumes, which often have both weekly and yearly seasonality.
What model is best for forecasting?
A causal model is the most sophisticated kind of forecasting tool. It expresses mathematically the relevant causal relationships, and may include pipeline considerations (i.e., inventories) and market survey information. It may also directly incorporate the results of a time series analysis.
What are the 2 main methods of forecasting?
There are two types of forecasting methods: qualitative and quantitative.
Which is the most accurate forecasting method and why?
Exponential smoothing. It can often result in a more accurate forecast. It is an easy method that enables forecasts to quickly react to new trends or changes. A benefit of exponential smoothing is that it does not require a large amount of historical data.