post · ODAIA · ML engineering · 2023

Time Series Forecasting Real-World Challenges

Forecasting sounds simple, but real business data is messy. A tour of the real-world challenges that make time series forecasting hard: missing data, sparsity, volatility, external drivers, multiple seasonality, and hierarchy.

If you work as a data specialist, forecasting is a common request across business problems in many industries. After all, regardless of the sector or business category, organizations need forecasts for planning, informed decision-making, and identifying potential risks and opportunities. Whether it’s a supply and demand problem like predicting sales in e-commerce, energy demand in the utility sector, or the prescribing behavior of healthcare providers in pharmaceuticals, forecasting is challenging, and its errors can translate directly into additional business costs.

Let’s consider a couple of examples. Say you are in e-commerce and you get this question from an internal or vendor development team: “Can we forecast how many units of product P we will be able to sell in the next X months?” Or, if you’re in pharma, the question might be something like: “How many units of drug D are going to be prescribed by HCP (Health Care Provider) H in the next X months, and what does the same forecast say for competitor drugs in the market?”

Although these questions come from two distinct markets, a data specialist would go about solving them in similar ways, because both require predicting future values from historical time series observations. Time series data is a sequence of data points listed in order of their occurrence in time.

Despite the questions in the examples appearing relatively simple and straightforward, building a feasible and scalable solution to these real-world forecasting questions is quite a challenging undertaking.

“People don’t have a lot of time series training. This is a plot of Stack Overflow tags for questions with different topics [Fig. 1] and you can see everybody in machine learning wants to build classifiers and regression models, but no one is working on time series [because they are extremely challenging].” Sean Taylor (former Research Scientist Manager at Meta), at the 2018 New York R Conference.

The monthly share of all Stack Overflow questions carrying each tag (Stack Overflow Trends data), 2009 to 2023. This is the plot Sean Taylor showed at the 2018 New York R Conference. Deep learning sits near zero until 2015, machine learning spikes past 0.55% around 2020, and time series never leaves the floor.

The main challenge in making time series forecasts coincides with the first step of the process: research. Much of the existing research on time series models uses very clean data. It either explains a mathematical theory or demonstrates the performance of a new algorithm or library on a dataset that does not suffer from the practical problems of a business unit (e.g., Fig. 2). When data scientists fail to acknowledge and adjust for the differences between the clean data used to develop a particular forecasting model and the real-world (often messy) data that they have access to, errors are magnified and the model performs poorly. In the remainder of this article, we will cover some of the real-world data challenges by sharing our own experiences as well as some from industry leaders.

The clean, regular series every tutorial shows, set against what real data carries on top of the same signal: noise, missing stretches, outlier spikes, and a level shift partway through. The dashed line is the ideal you thought you were modelling.
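To make the gap between tutorial data and business data concrete, here is a minimal sketch that degrades a clean series with the four effects named in the caption above. The function name `make_messy_series`, the `messiness` parameter, and every magnitude in it (noise scale, gap length, spike size and rate, shift size) are assumptions chosen for illustration, not values taken from any real dataset.

```python
import math
import random


def make_messy_series(n_days=120, messiness=1.0, seed=0):
    """Return (clean, messy) daily series. Gaps in `messy` are None.

    All magnitudes below are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Clean signal: a constant level plus a weekly cycle.
    clean = [100.0 + 10.0 * math.sin(2 * math.pi * t / 7) for t in range(n_days)]

    shift_start = n_days // 2                   # level shift partway through
    gap = range(30, 30 + int(10 * messiness))   # one missing stretch

    messy = []
    for t, value in enumerate(clean):
        if t in gap:
            messy.append(None)                  # missing observation
            continue
        value += rng.gauss(0.0, 5.0 * messiness)    # noise
        if t >= shift_start:
            value += 20.0 * messiness               # level shift
        if rng.random() < 0.03 * messiness:
            value += 60.0                           # outlier spike
        messy.append(value)
    return clean, messy


clean, messy = make_messy_series(messiness=1.0)
n_missing = sum(v is None for v in messy)
print(f"{n_missing} of {len(messy)} days are missing")
```

With `messiness=0.0` the function returns the clean series unchanged, which makes it a convenient way to check how a model developed on tidy data behaves as each kind of mess is added back in.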

Before Forecasting

One of the main industry challenges is that data scientists and engineers start generating solutions before they deeply understand the business problem. This is a common mistake that happens when we assume that time series forecasting problems are more alike than they really are. Although we can leverage certain consistencies across industries when building a model, we must also understand how our industry or problem is different. By adapting models to the specifics of each problem, we produce forecasts that better reflect our particular market. To highlight this, I would like to share advice from Inbal Tadesky, Data Science Manager at Anodot, given at the Data & AI Summit (July 2022). Before you put the effort into doing forecasting:

Data Challenges

The data is not clean, and this is a bitter reality. Real-world data also has structural complexities that amplify many of the problems facing forecasting models (see Fig. 3):

  1. Spatial Hierarchy: This could be various geographical levels or different product-category hierarchies in the system. In pharma, HCPs can be categorized based on their geographical information such as Postal Code/Zip or Provinces/States (Fig. 4).
The signal lives at the prescriber, but the business asks about postal codes and provinces. For any node, its rolled-up daily prescriptions sit over the faint component series beneath it. The aggregate is smooth and confident; the parts it is built from are spiky and sparse.
  2. Temporal Hierarchy: The timestamp of the signal can also introduce a hierarchy, i.e., daily signals must be consistent with the monthly, quarterly, and yearly aggregates (Fig. 5).
One prescriber's daily count (faint) and its average over each day, week, month, or quarter. Coarsen the grain and the same series turns smooth and almost trend-like, which is why a forecast made at one level rarely agrees with one made at another.
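The two hierarchies above can be sketched with a toy example. Everything in it is hypothetical: the prescriber IDs, postal codes, provinces, and counts are invented, and the helpers `roll_up` and `to_weekly` are illustrative names rather than part of any library. Plain Python is used so the sketch runs without dependencies.

```python
# Hypothetical daily prescription counts for four prescribers over 14 days.
DAILY = {
    "hcp_a": [0, 2, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0, 1, 0],
    "hcp_b": [1, 0, 0, 4, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1],
    "hcp_c": [0, 0, 5, 0, 0, 0, 1, 0, 0, 0, 0, 2, 0, 0],
    "hcp_d": [2, 0, 0, 0, 0, 3, 0, 1, 0, 0, 0, 0, 0, 2],
}
# Spatial hierarchy (invented): prescriber -> postal code -> province.
POSTAL_OF = {"hcp_a": "M5V", "hcp_b": "M5V", "hcp_c": "K1A", "hcp_d": "H2X"}
PROVINCE_OF = {"M5V": "ON", "K1A": "ON", "H2X": "QC"}


def roll_up(series_by_child, parent_of):
    """Sum child series element-wise into their parent nodes."""
    parents = {}
    for child, series in series_by_child.items():
        parent = parent_of[child]
        if parent not in parents:
            parents[parent] = list(series)
        else:
            parents[parent] = [a + b for a, b in zip(parents[parent], series)]
    return parents


def to_weekly(series, days_per_week=7):
    """Temporal roll-up: sum consecutive blocks of seven daily values."""
    return [sum(series[i:i + days_per_week])
            for i in range(0, len(series), days_per_week)]


by_postal = roll_up(DAILY, POSTAL_OF)            # spatial level 1
by_province = roll_up(by_postal, PROVINCE_OF)    # spatial level 2
weekly_by_province = {k: to_weekly(v) for k, v in by_province.items()}

# Coherence check: every level must account for the same total volume.
total_bottom = sum(sum(s) for s in DAILY.values())
total_top = sum(sum(s) for s in weekly_by_province.values())
assert total_bottom == total_top
print(weekly_by_province)
```

The historical data is coherent by construction, because each level is a sum of the one below it. Forecasts produced independently at each level carry no such guarantee, which is why hierarchy has to be handled explicitly rather than assumed.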

While real-world forecasting challenges are inevitable, organizations that develop effective forecasting processes, or that integrate platforms providing end-to-end forecasting services focused on their particular business niche, can gain a significant competitive advantage. It is crucial for businesses to understand that the main goal of forecasting is not simply to predict future values of specific metrics and KPIs. Rather, forecasting is a critical component of data-driven decision-making that can transform the way a business operates.