Time Series Forecasting Real-World Challenges

If you work as a data specialist, forecasting is likely to be a common request across business problems in many industries. After all, regardless of the sector or business category, organizations need forecasts for planning, informed decision-making, and identifying potential risks and opportunities. Whether it’s a supply and demand problem like predicting sales in e-commerce, energy demand in the utility sector, or the prescribing behavior of healthcare providers in pharmaceuticals, forecasting is challenging, and its errors can translate directly into additional business costs.

Let’s consider a couple of examples. Say you are in e-commerce and you get this question from an internal or vendor development team: “Can we forecast how many units of product P we are able to sell in the next X months?” Or, if you’re in pharma, the question might be something like: “How many units of drug D are going to be prescribed by HCP (Health Care Provider) H in the next X months, and what does the same forecast say for competitor drugs in the market?”

Although the above questions come from two distinct markets, a data specialist would go about solving them in similar ways, as both require predicting future values from historical time series observations. Time series data refers to a sequence of data points listed in order of their occurrence in time.

Despite the questions in the examples appearing relatively simple and straightforward, building a feasible and scalable solution to these real-world forecasting questions is quite a challenging undertaking.

“People don’t have a lot of time series training. This is a plot of Stack Overflow tags for questions with different topics [Fig. 1] and you can see everybody in machine learning wants to build classifiers and regression models, but no one is working on time series [because they are extremely challenging].”
Sean Taylor (former Research Scientist Manager at Meta), at the 2018 New York R Conference.

The monthly share of all Stack Overflow questions carrying each tag (Stack Overflow Trends data), 2009 to 2023, the plot Sean Taylor showed at the 2018 New York R Conference. Deep learning sits near zero until 2015, machine learning spikes past 0.55% around 2020, and time series never leaves the floor.

The main challenge in making time series forecasts coincides with the first step of the process: research! Much of the existing research on time series models uses very clean data. These studies either explain a mathematical theory or demonstrate the performance of a new algorithm or library on a dataset that does not suffer from the practical challenges and problems of a business unit (e.g., Fig. 2). When data scientists fail to acknowledge and adjust for the differences between the clean data used to develop a particular forecasting model and the real-world (often messy) data that they have access to, errors are magnified and the model performs poorly. In the remainder of this article, we will cover some of the real-world data challenges by sharing our own experiences as well as some from industry leaders.

With the “reality” dial at zero, you get the clean, regular series every tutorial shows. Turn the dial up and the same signal gathers what real data carries: noise, missing stretches, outlier spikes, and a level shift partway through. The dashed line is the ideal you thought you were modelling.
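The gap between the clean series and the messy one can be reproduced with a small simulation. The following is only a minimal sketch: the function name `make_messy`, the `reality` parameter, and every magnitude in it (noise scale, size of the level shift, number of spikes, length of the gap) are illustrative assumptions, not values taken from the figure.

```python
import numpy as np

def make_messy(n_days=365, reality=0.0, seed=0):
    """Return (clean, messy) daily series; `reality` in [0, 1] scales every defect."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)

    # Ideal signal: a linear trend plus a weekly cycle.
    clean = 100 + 0.05 * t + 10 * np.sin(2 * np.pi * t / 7)
    messy = clean.copy()

    # Noise: Gaussian, with a standard deviation that grows with the dial.
    messy += rng.normal(0, 8 * reality, n_days)

    # Level shift: a permanent jump from the midpoint onward.
    messy[n_days // 2:] += 30 * reality

    # Outlier spikes: a few isolated days multiplied up.
    n_spikes = int(round(5 * reality))
    spikes = rng.choice(n_days, size=n_spikes, replace=False)
    messy[spikes] *= 1 + 2 * reality

    # Missing stretch: one contiguous block of days set to NaN.
    gap = int(round(30 * reality))
    start = n_days // 4
    messy[start:start + gap] = np.nan

    return clean, messy

# Example: a moderately messy version of the same underlying signal.
clean, messy = make_messy(reality=0.6)
```

With `reality=0.0` the two arrays are identical, which is the tutorial case. Any model validated only on `clean` has never seen the gap, the spikes, or the shift that appear in `messy`.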

Before Forecasting

One of the main industry challenges is that data scientists and engineers start building solutions before they deeply understand the business problem. This is a common mistake that happens when we assume that there are more similarities between time series forecasting problems than there really are. Although we can leverage certain consistencies across different industries when building a model, we must also have a deep understanding of how our industry or problem is different. By adapting models to the uniqueness of each problem, we can produce forecasts that better reflect our particular market. To highlight this, I would like to share advice from Inbal Tadesky, Data Science Manager at Anodot, given at the Data & AI Summit (July 2022). Before you put the effort into doing forecasting:

Understand exactly how you want to use it. The process of consuming the forecasts must be defined from the beginning. You need to understand whether you have a workflow to integrate forecasting into, and which actions you plan to take based on the outcomes.
Define the requirements of the system. For example, what is the horizon you need the forecast for (how far out)? And at which time scale (time intervals)?
Understand how to measure the value and the success criteria of the forecasting model. The accuracy of a particular Machine Learning model is definitely not a measure of success for the business case. Back to our e-commerce example, if we do the forecasting for Product P, is it going to correct/optimize our inventory? Alternatively, at ODAIA, we forecast the volume of prescribed drugs by HCPs. If we achieve reasonably correct forecasting, does it optimize/help the process for sales reps to target the right HCPs for their specific drugs?
Define and communicate the expectations based on:
1. The quality of the historical data you have.
2. The level of uncertainty of the particular data set you are dealing with.

Data Challenges

The data is not clean, and this is a bitter reality! Real-world data also carries complexities that amplify many of the problems facing forecasting models (see Fig. 3):

Missing Data: In practice, data is riddled with missing values, and how difficult they are to handle depends on how they are spread throughout the dataset. For example, if the missing values are single data points scattered through a dense signal, they can be resolved relatively easily using well-known imputation techniques (techniques that fill in a missing data point based on the points and signals around it). However, this becomes a more challenging problem if the missing data occurs in a large chunk (a missing period) in an otherwise sparse and volatile dataset. The complexity also increases when the missing data points fall at the start and/or the end of a signal, where there are neighboring observations on only one side.
Multiple Time Series: In real-world data, most of the time we are not forecasting for a single isolated time series. Often, we need to forecast for a bundle of signals that may or may not have an impact on each other.
Time Series Length and Period Overlap: In the real world, the length of the time series might vary for different reasons:
1. Newly introduced signals: for example, in the pharma industry, this could represent new doctors who have just started their career/prescribing under their own license.
2. Discontinued signals: an example of this in the pharma industry could be those HCPs who are retiring.
Sparse Data / Not Enough Data: Generally, sparse time series are much harder to deal with when attempting to create models and provide meaningful forecasts.
Volatility: It is common, especially in data pertaining to human behavior, that a large portion of the dataset is volatile and has lots of noise. The stronger the noise is, the weaker the ability to extract useful patterns from the data.
External Drivers / Special Events: The patterns in real-world time series data are sometimes driven by external factors. For example, in an e-commerce setting, multiple factors could change the behavior of the sales signal for a particular product, such as special dates (Black Friday, Cyber Monday, etc.), promotions, and marketing spend. When this happens, the biggest challenge is that we know the external drivers exist, but we either don’t have access to information about them or it is very complex to add them to the system (for example, the impact of news on a sales or stock market forecast).
Multiple Seasonality: Time series that have multiple repeating patterns that occur at different frequencies and/or timescales are more challenging to address using predictive models. For example, a time series of daily sales data may exhibit both a weekly pattern (with higher sales during the weekends) and an annual pattern (with higher sales during holiday seasons or cold seasons like the fall and winter). These different patterns can interact with each other, making it difficult to model the series accurately. In particular, the seasonal patterns may not be orthogonal, meaning that they can overlap and interfere with each other, leading to complex interactions and dependencies that are difficult to accurately capture.
Effect of Hierarchy: In the real world, forecasting does not focus on only one level of aggregation in the data; it has to provide a coherent forecast across different aggregation levels. A common mistake is to look at each data level individually, so that predictions are made completely independently. This leads to inconsistencies between the aggregated signal (higher level) and the corresponding granular signals (lower levels). Logically, the historical data of the granular signals sums to the aggregated signal, so the sum of their forecasts should equal the aggregated forecast. However, when each level is forecast independently, the lower-level forecasts do not always add up to the higher-level forecast, and the discrepancy can be considerable. To address this, we need to make sure that while the lower-level time series go through their individual predictions, they still preserve the appropriate relationships within the established hierarchy. There are two types of hierarchy:

Spatial Hierarchy: This could be various geographical levels or different product-category hierarchies in the system. In pharma, HCPs can be categorized based on their geographical information such as Postal Code/Zip or Provinces/States (Fig. 4).

The signal lives at the prescriber, but the business asks about postal codes and provinces. For any node in the hierarchy, the rolled-up daily prescriptions sit on top of the faint component series beneath it. The aggregate is smooth and confident; the parts it is built from are spiky and sparse.

Temporal Hierarchy: The timestamp of the signal can also introduce a hierarchy, i.e., daily signals must match monthly, quarterly, and yearly aggregates (Fig. 5).

One prescriber's daily count (faint) and its average over each day, week, month, or quarter. Coarsen the grain and the same series turns smooth and almost trend-like, which is why a forecast made at one level rarely agrees with one made at another.
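One simple way to guarantee coherence across both kinds of hierarchy is a bottom-up approach: forecast only at the most granular level and derive every higher level by summation. It is not the only option (top-down and reconciliation-based methods also exist), and the sketch below is not a description of any production system. The prescriber IDs, postal codes, provinces, the synthetic Poisson data, and the `naive_forecast` placeholder model are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical spatial hierarchy: prescriber -> postal code -> province.
hierarchy = pd.DataFrame({
    "hcp": ["H1", "H2", "H3", "H4"],
    "postal_code": ["M5V", "M5V", "K1A", "H2X"],
    "province": ["ON", "ON", "ON", "QC"],
}).set_index("hcp")

# Synthetic daily prescription counts, one column per prescriber.
rng = np.random.default_rng(0)
days = pd.date_range("2023-01-01", periods=90, freq="D")
history = pd.DataFrame(
    rng.poisson(3, size=(90, 4)), index=days, columns=hierarchy.index
)

def naive_forecast(series, horizon=28):
    """Placeholder model: repeat the mean of the last 7 observations."""
    future = pd.date_range(
        series.index[-1] + pd.Timedelta(days=1), periods=horizon, freq="D"
    )
    return pd.Series(series.tail(7).mean(), index=future)

# Forecast only the bottom level (individual prescribers).
bottom = pd.DataFrame(
    {hcp: naive_forecast(history[hcp]) for hcp in history.columns}
)

# Spatial roll-up: sum prescriber forecasts within each parent node.
by_postal = bottom.T.groupby(hierarchy["postal_code"]).sum().T
by_province = bottom.T.groupby(hierarchy["province"]).sum().T
total = bottom.sum(axis=1)

# Temporal roll-up: sum the daily total into weekly totals.
weekly_total = total.resample("W").sum()

# Every level adds up by construction.
assert np.allclose(by_postal.sum(axis=1), total)
assert np.allclose(by_province.sum(axis=1), total)
assert np.isclose(weekly_total.sum(), total.sum())
```

The trade-off is that the bottom level is exactly where the data is sparsest and noisiest, so bottom-up forecasts inherit that noise at every level above. That is one reason the choice of reconciliation strategy deserves as much attention as the choice of model.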

While real-world forecasting challenges are inevitable, organizations that develop effective forecasting processes, or integrate platforms that provide end-to-end forecasting services focused on their particular business niche, can gain a significant competitive advantage. It is crucial for businesses to understand that the main goal of forecasting is not merely to predict future values of specific metrics and KPIs. Rather, forecasting is a critical component of data-driven decision-making that can transform the way a business operates.