
How does Flieber calculate the Forecast?

Learn the details of Flieber’s complex calculations to ensure high-quality forecasts for your products.

INTRODUCTION

The Data team at Flieber has developed an end-to-end forecasting pipeline that processes sales information and delivers high-quality forecasts to the Flieber platform. There are three main stages in the forecasting pipeline: preprocessing, prediction, and postprocessing.

Preprocessing covers the manipulation of the raw sales history to ensure that it represents actual demand. Prediction covers the production of the forecasts and the measurement of their accuracy. Postprocessing covers any final manipulations performed on the forecast to increase the accuracy of the final predicted values.

After our pipeline has run, we quantify the performance of the final predictions and go through a formal process to publish the forecasts to the platform.

PREPROCESSING

The best forecast is one that predicts the true demand for a product. The goal of preprocessing the sales history is to produce historical data that represents true demand more accurately than the raw sales history.

False Zero (Stockout) Substitution

Retail sales can be disrupted by a variety of events: stockouts, marketplace suspensions, etc. Sales during the days of a disruption are zero (or near zero) even though demand still exists on those days. The days that precede and follow a stockout event may also be impacted. The intention of the False Zero Substitution preprocessing step is to interpolate actual demand when one of these events is detected.

The Flieber Approach:

The False Zero Substitution is broken into two steps: identifying periods with disruption and estimating true demand for these periods. 

Using a product’s sales history, the likelihood of selling a particular number of units on a given day can be estimated with a Poisson distribution. From there, the likelihood of a sequence of days with low sales can also be estimated. Sequences that are improbable are identified as periods of disruption.

If a product sells consistently, even a single day with no sales can be regarded as a stockout, whereas for products that sell intermittently, only lengthy sequences of no sales are regarded as stockouts. To also cover cases where sales do not drop to near zero but are still significantly below regular levels (for example, periods with keyword-search issues), a changepoint detection algorithm identifies such periods; if a period’s mean sales are less than one-third of the mean sales of the surrounding periods, it is flagged as a period of disruption.
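
To make the sequence-probability idea concrete, here is a minimal Python sketch. The function name, threshold, and rate estimate are illustrative assumptions, not Flieber’s production code:

    import numpy as np
    from scipy.stats import poisson

    def flag_false_zeros(sales, p_threshold=0.01):
        """Flag runs of zero-sales days that are improbable under a Poisson model."""
        sales = np.asarray(sales, dtype=float)
        lam = sales[sales > 0].mean()            # naive estimate of the demand rate
        p_zero = poisson.pmf(0, lam)             # chance of a single zero-sales day
        flags = np.zeros(len(sales), dtype=bool)
        run_start = None
        for i, units in enumerate(np.append(sales, np.nan)):  # sentinel closes last run
            if units == 0:
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and p_zero ** (i - run_start) < p_threshold:
                    flags[run_start:i] = True    # run too unlikely to be real zero demand
                run_start = None
        return flags

Under this sketch, a product averaging five units a day has even a single zero-sales day flagged (e⁻⁵ ≈ 0.007 < 0.01), while a product averaging 0.3 units a day needs roughly sixteen consecutive zero days before the run is flagged, mirroring the consistent-versus-intermittent behavior described above.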

Again, this process takes the demand classification of the sales history into account and treats smooth, consistently selling products differently from intermittent ones.

The data points in periods of disruption are interpolated either linearly or with the recent sales pace adjusted for seasonality, depending on the demand classification. Additionally, the potential peak of increased sales once a product is back in stock (very common behavior due to backed-up demand) is also adjusted so that a similar peak is not predicted in the forecasts.

Price Normalization

The price of a product may fluctuate over time. These price changes may be due to external factors, but could also be due to internal levers, such as lowering the price during a sales promotion or increasing the price to decelerate sales when heading towards a stockout. Understanding the relationship between price and demand aids demand planning strategies and, in this case, helps estimate what demand would have been if a past price shift had not occurred. Normalizing the sales volume in the sales history after accounting for this relationship prevents changes in sales volume caused by price adjustments from being mistakenly identified by our forecasting algorithms as seasonal patterns.

The Flieber Approach:

Using the median price of each year as the base price, sales are adjusted based on the difference between the price on a given day and the base price. The result is a normalized sales history in which volume is scaled proportionally to the price difference, estimating what sales would have been at the base price.
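
As a rough illustration, the sketch below assumes the simplest possible price-demand relationship, scaling sales linearly by the ratio of the day’s price to the yearly median base price. Flieber’s actual adjustment may use a different elasticity, and the column names are hypothetical:

    import pandas as pd

    def normalize_for_price(df: pd.DataFrame) -> pd.DataFrame:
        """Adjust sales toward what they might have been at the base price."""
        df = df.copy()
        # the yearly median price serves as the base price
        base = df.groupby(df["date"].dt.year)["price"].transform("median")
        # discounted days (price < base) get scaled down, expensive days up
        df["normalized_sales"] = df["sales"] * (df["price"] / base)
        return df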

Anomaly Detection

The data used to train a forecasting model should be as clean as possible to yield the best results. A product may sell very well on a particular day due to a holiday (e.g., Amazon Prime Day, Black Friday) or a one-day promotion. But when a day’s sales are much greater or lesser than those of neighboring days without a known cause (for example, due to a competitor being stocked out or, conversely, a competitor running a major promotion), it is often beneficial to replace that data point with a more reasonable value so that the same event does not materialize in future forecasts.

The Flieber Approach:

A common approach to outlier detection is to compute the interquartile range (IQR) of historical sales data. Data points more than 1.5× the IQR beyond the quartiles are labeled as outliers/anomalies, and those data points are then replaced with values similar to neighboring data points. As the sales history of a product may not be stationary, we need to ensure that the outlier detection is not confounded by expected patterns such as holidays. Simply running this summary on raw sales would not work for sales histories with trends or seasonal patterns. Instead, a smoothed representation of the sales history is laid over the sales history, the percentage difference between the actual and smoothed values is recorded as a residual, and the outlier detection is computed on the residuals.
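
A simplified sketch of this residual-based detection, using a rolling median as a stand-in for whatever smoother the production pipeline uses (window size and replacement rule are illustrative):

    import numpy as np
    import pandas as pd

    def replace_anomalies(sales, window=28, k=1.5):
        """IQR outlier replacement on residuals against a smoothed series."""
        s = pd.Series(sales, dtype=float)
        smooth = s.rolling(window, center=True, min_periods=1).median()
        resid = (s - smooth) / smooth.clip(lower=1.0)   # percentage-style residuals
        q1, q3 = resid.quantile(0.25), resid.quantile(0.75)
        iqr = q3 - q1
        outlier = (resid < q1 - k * iqr) | (resid > q3 + k * iqr)
        s[outlier] = smooth[outlier]    # fall back to a neighborhood-like value
        return s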

Moving Holiday

Some holidays, like Christmas and Valentine’s Day, may have a high seasonal impact on demand, and simple models that understand seasonality can pick this up easily, since these holidays occur on the same day of the year every year.

However, some events, like Amazon Prime Day, Black Friday, and Cyber Monday, may have a significant impact on demand even though their dates change yearly. More sophisticated, multivariate models that accept holidays as an additional feature can use them in their forecasts and handle moving holidays. This step aids univariate models, which accept no features other than the sales history, in capturing these moving events/holidays better.

The Flieber Approach:

For every year in the sales history, we determine the unit lift of each moving holiday/event and represent it on the upcoming date of the event, so univariate models can treat its seasonality as regular day-of-year seasonality.
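
A schematic sketch of the re-dating idea. The baseline window, helper name, and lift estimate are illustrative; the real lift estimation is more involved:

    import pandas as pd

    def redate_event_lift(s: pd.Series, past_dates, upcoming_date):
        """Move each past event's unit lift onto the upcoming event's day of year."""
        s = s.copy()  # daily sales indexed by pd.Timestamp
        for d in past_dates:
            week_before = s.loc[d - pd.Timedelta(days=7): d - pd.Timedelta(days=1)]
            lift = s.loc[d] - week_before.mean()       # unit lift over a local baseline
            s.loc[d] -= lift                           # flatten the original event date
            aligned = d.replace(month=upcoming_date.month, day=upcoming_date.day)
            if aligned in s.index:
                s.loc[aligned] += lift                 # re-dated to match the next occurrence
        return s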

Level Shift

A simple decomposition of a time series isolates level, trend, seasonality, and some error/noise. In some instances, an unexpected major shift in level may be observed without any meaningful change in seasonality, pricing, or trend. This could be due to a viral campaign, the acquisition of a product by a new seller with a bigger marketing budget, etc. Even though the initial sales at this new level may look like outliers, the new level sticks as time passes. For the best forecast accuracy, these level shifts should be identified as soon as possible so the forecast reflects the more recent sales pace. Additionally, identifying the level shift prevents the forecast from assuming a similar shift will happen again when that is unlikely.

The Flieber Approach:

Not all products would benefit from this preprocessing step, so it is reserved for the products that are better forecasted when it is applied, a decision based on a group of internal metrics. The first step is to detect changepoints in the sales history: days on which the sales level shifted greatly, either up or down. These changepoints can be detected by comparing the rate of change within a window of data points against the rate of change of the overall cumulative sum. After the latest changepoint is identified, all previous history is disregarded and everything after the changepoint is fed into the forecast models.
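
For illustration, the sketch below uses the off-the-shelf PELT detector from the ruptures package as a stand-in for the cumulative-sum comparison described above; the penalty value is an arbitrary assumption:

    import numpy as np
    import ruptures as rpt

    def trim_to_latest_level(sales, penalty=10.0):
        """Keep only the history after the most recent level shift."""
        signal = np.asarray(sales, dtype=float).reshape(-1, 1)
        # predict() returns change indices, with len(signal) as the final entry
        breakpoints = rpt.Pelt(model="l2").fit(signal).predict(pen=penalty)
        last_change = breakpoints[-2] if len(breakpoints) > 1 else 0
        return sales[last_change:]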

PREDICTION

Once the sales history has been preprocessed to better represent the true demand history, the data is processed by our state-of-the-art forecasting machine. 

Flieber operated for its first five years with a forecasting model portfolio that included both industry-standard models, such as ADIDA, ARIMA, and Croston, among many others, and specialized proprietary models.

At the end of 2024, we signed an exclusive partnership with Nixtla (https://www.nixtla.io), one of the most important time-series boutiques in the world, to develop a state-of-the-art custom AI transformer model based on their TimeGPT model, a generative AI (GenAI) model specifically designed to learn temporal patterns from diverse datasets and generate accurate forecasts for a wide range of products. This new model presented a 36% improvement in accuracy compared to Flieber’s old portfolio of 16 models, and was fully deployed to all customers by mid-2025.

Model Description

TimeGPT is a proprietary foundation model for time series forecasting developed and owned by Nixtla. It was first released in October 2023 by Garza, Challu, and Mergenthaler-Canseco (2023).
The model is trained to generalize across domains and temporal structures, enabling it to deliver high-quality forecasts even in zero-shot settings (i.e., without retraining on domain-specific data).

Architecture

TimeGPT’s architecture follows a Generative Pretrained Transformer (GPT) design for time series forecasting. It builds on the self-attention mechanism introduced by Vaswani et al. (2017), and uses a full encoder-decoder structure. The encoder captures temporal dependencies and patterns from past observations, while the decoder uses this information to generate future values in an autoregressive manner.

Input Processing and Encoding

Time series data are first converted into vector representations that capture temporal dynamics. Positional information is incorporated to preserve the order of observations, and the encoder layers identify key structures such as trends and seasonality across time.

Prediction and Generative Output

The decoder generates forecasts sequentially, ensuring each step depends only on past information. Each prediction is fed back into the model to produce the next, and a final projection layer transforms these internal representations into forecasted values. 
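
Schematically, the autoregressive loop looks like the sketch below. This is a generic illustration, not TimeGPT’s actual code (which is proprietary); the predict_next method is a hypothetical one-step model interface:

    def autoregressive_forecast(model, history, horizon):
        """Generic sketch: each prediction is appended and fed back as context."""
        context = list(history)
        forecasts = []
        for _ in range(horizon):
            next_value = model.predict_next(context)   # hypothetical one-step predictor
            forecasts.append(next_value)
            context.append(next_value)                 # feed the prediction back in
        return forecasts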

Unlike large language models trained on text, TimeGPT is purpose-built for temporal data — an original transformer optimized specifically for minimizing forecasting errors.

Customized Solution for Flieber

For Flieber, TimeGPT has been customized as a temporal hierarchical ensemble model that uses hierarchical reconciliation and seasonal adjustments to better capture the demand patterns in retail sales data.

Data Preprocessing


First, the model applies a set of preprocessing transformations, as described in the “Preprocessing” section above.

Before entering the temporal hierarchical ensemble, the input data undergoes the following additional preprocessing steps:

  • Missing dates: Missing dates are identified and filled using the fill_gaps() function from Nixtla’s utilsforecast library.

  • Missing values: Missing values in the target variable (sales) are filled with zeros, assuming no sales occurred on those days.

  • Leading zeros: Leading zeros are removed to avoid distorting the forecast with inactive periods.

  • Outlier removal: An outlier removal step is applied using the 0.05 and 0.95 quantiles. Specifically, all values in the target variable below or above these quantiles are replaced with the respective quantile values.

These preprocessing steps ensure that the model receives clean and consistent time series data before generating forecasts.
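
Put together, these steps look roughly like the sketch below, assuming Nixtla’s long data format with unique_id, ds, and y columns and fill_gaps’s default behavior; prepare_series is an illustrative name, not the actual pipeline function:

    import pandas as pd
    from utilsforecast.preprocessing import fill_gaps

    def prepare_series(df: pd.DataFrame) -> pd.DataFrame:
        """Apply the additional preprocessing steps listed above."""
        # 1. Missing dates: insert a row for any date absent from a series
        df = fill_gaps(df, freq="D")
        # 2. Missing values: assume no record means no sales that day
        df["y"] = df["y"].fillna(0)
        # 3. Leading zeros: drop the inactive period before each first sale
        first_sale = (df[df["y"] > 0].groupby("unique_id")["ds"].min()
                      .rename("first_sale").reset_index())
        df = df.merge(first_sale, on="unique_id")
        df = df[df["ds"] >= df["first_sale"]].drop(columns="first_sale")
        # 4. Outliers: winsorize at the 0.05 and 0.95 quantiles per series
        lo = df.groupby("unique_id")["y"].transform(lambda y: y.quantile(0.05))
        hi = df.groupby("unique_id")["y"].transform(lambda y: y.quantile(0.95))
        df["y"] = df["y"].clip(lower=lo, upper=hi)
        return df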

Ensemble and Hierarchical Reconciliation

The customized Flieber model incorporates both hierarchical reconciliation and ensemble modeling:

  • Hierarchical reconciliation: The model performs temporal reconciliation using forecast proportions between daily and weekly forecasts. This ensures coherence across temporal levels, so that the sum of the daily forecasts corresponds to the weekly forecast.

  • Ensemble modeling: Each temporal level (daily and weekly) combines TimeGPT with a seasonal model designed to capture recurring patterns such as holiday-driven demand spikes.

The resulting forecasts are produced at the target level: tenant-seller-product. Although the model uses a hierarchical ensemble, the final outputs correspond to this specific target level.
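
In spirit, the daily-weekly coherence works like this minimal sketch of forecast-proportions reconciliation (the array shapes assume aligned, week-complete horizons; in production this sits alongside the seasonal-model ensemble):

    import numpy as np

    def reconcile_daily_to_weekly(daily_fcst, weekly_fcst):
        """Rescale daily forecasts so each week sums to the weekly forecast."""
        daily = np.asarray(daily_fcst, dtype=float).reshape(-1, 7)  # one row per week
        weekly = np.asarray(weekly_fcst, dtype=float)
        shares = daily / daily.sum(axis=1, keepdims=True)  # each day's share of its week
        return (shares * weekly[:, None]).ravel()          # coherent daily forecasts

After this step, the seven reconciled daily values in any week sum exactly to that week’s forecast, while each day keeps its original share of the week.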

Holiday Calendar Integration

This customized version of TimeGPT uses a holiday calendar to account for the impact of holidays and special events on sales. Past holidays help the model learn the historical effects of these events on demand, while future holidays are used to anticipate expected sales spikes.

The model supports moving holidays (those that fall on different dates each year) and can model multi-day holiday effects, reflecting the fact that holiday-related demand changes typically extend over several days.

When the Flieber forecast is unable to accurately predict a product due to context or factors that cannot currently be fed to the models, the Flieber platform provides multiple levers a user can use to adjust the forecast, including managed events and forecast models other than the Flieber forecast.

If a user wishes to upload their own forecast for certain products, either letting Flieber predict the tail end or simply using their forecast while benefiting from the rest of Flieber’s offerings, the Custom Import Forecast feature makes that easy. The feature accepts forecasts at the daily or monthly level and at the product level, with or without the associated selling account; Flieber will then help disperse the product’s prediction between its different selling accounts for the user.

POSTPROCESSING

After a forecast has been created, Flieber runs processes to further increase its accuracy and performance.

Disaggregation

Forecasting at a higher level of aggregation tends to boost forecast accuracy and also helps with computational efficiency, as it condenses the number of inputs considered when crafting the forecast. Flieber’s model relies on aggregated data, so this postprocessing step converts the forecasts back into daily sales predictions.

The Flieber Approach:

To convert weekly data into daily data points, “day of week” seasonality factors are calculated from each day’s average sales level. For example, Monday’s average sales level is computed and then compared to those of the other days of the week. The aggregated values are then dispersed to the daily level, and the seasonality factors are applied to reproduce the typical fluctuation of the sales history.
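
A minimal sketch of this day-of-week disaggregation. The factor computation is simplified, and daily_history is a hypothetical pd.Series of daily sales indexed by date:

    import numpy as np
    import pandas as pd

    def disaggregate_to_daily(weekly_fcst, daily_history: pd.Series):
        """Split weekly forecasts into days using day-of-week seasonality factors."""
        avg = daily_history.groupby(daily_history.index.dayofweek).mean()
        shares = avg / avg.sum()        # e.g. Monday might carry 12% of a typical week
        return np.concatenate([week * shares.to_numpy() for week in weekly_fcst])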

First Weeks’ Smoothing

A good forecast is often not too sensitive to the current sales pace; otherwise, the forecast could change greatly every day. On the other hand, a forecast should also reflect the recent sales level to be accurate and believable. Using a smoothing process that connects the latest sales pace to the forecast, the forecast can maintain its level when recent sales increase or decrease but are expected to rebound to the forecast’s level. Smoothing the forecast values to reflect the eventual convergence of the sales pace and the forecast level tends to increase accuracy.

The Flieber Approach:

The smoothing process uses the average fluctuation observed in the last 30 days along with the ending level of the sales history. From these two statistics, an acceptable range for the forecast can be determined, along with the date when the forecast first enters this range. Up to that date, a weighted average connects the ending sales level to the forecast in a smoothed fashion. This process respects seasonal periods and other elements that impact the variance of the sales.
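
A simplified sketch of the idea. Here the “acceptable range” is just the last level plus or minus the recent average fluctuation, and the weighting is a plain linear ramp; the production logic also respects seasonality:

    import numpy as np

    def smooth_first_weeks(forecast, last_level, avg_fluctuation):
        """Blend the forecast's opening days toward the recent sales level."""
        forecast = np.asarray(forecast, dtype=float)
        lo, hi = last_level - avg_fluctuation, last_level + avg_fluctuation
        inside = np.nonzero((forecast >= lo) & (forecast <= hi))[0]
        join = inside[0] if inside.size else forecast.size   # first in-range day
        smoothed = forecast.copy()
        for i in range(join):
            w = (i + 1) / (join + 1)          # weight ramps from level to forecast
            smoothed[i] = (1 - w) * last_level + w * forecast[i]
        return smoothed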

PERFORMANCE

When a forecast is produced in the Flieber pipeline, forecast error metrics are recorded to allow the data team to monitor its performance. Additionally, the Flieber team measures the forecast value add (FVA) of each preprocessing step to ensure that it improves the accuracy of our models. This also makes it possible to quantify the accuracy impact of improvements to each step’s algorithm.

Flieber’s model is capable of generating a test set to see how it would have performed on historical data. The test set is defined using the following parameters: 

  • train_end_date (str): Final date of the training window.
  • test_start_date (str): First date of the test period.
  • test_num_days (int): Number of forecast days in the test period.

This allows the model to measure accuracy for specific periods depending on the parameters of the product. For example, a product with a lead time of 60 days and a coverage period of 30 days could have its accuracy metric measured for the period between 60 and 90 days from today, making the metric much more connected to the reality of that product.
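
For example, a hypothetical backtest configuration for that 60-day lead time and 30-day coverage period might look like:

    from datetime import date, timedelta

    cutoff = date(2025, 1, 1)                           # illustrative training cutoff
    train_end_date = str(cutoff)                        # last day the model may see
    test_start_date = str(cutoff + timedelta(days=60))  # skip one lead time ahead
    test_num_days = 30                                  # score one coverage period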

Accuracy Metrics

The model evaluation includes two primary accuracy metrics: Mean Absolute Error Percentage (MAE%) and Mean Bias Percentage (Bias%).

MAE% = 100 × Σᵢ |yᵢ − ŷᵢ| / Σᵢ yᵢ

Bias% = 100 × Σᵢ (ŷᵢ − yᵢ) / Σᵢ yᵢ

where yᵢ represents the actual observed value, and ŷᵢ represents the forecasted value.
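
As a minimal sketch, the two metrics can be computed over a test window like this:

    import numpy as np

    def accuracy_metrics(actual, forecast):
        """MAE% and Bias% as defined above."""
        y = np.asarray(actual, dtype=float)
        y_hat = np.asarray(forecast, dtype=float)
        mae_pct = np.abs(y - y_hat).sum() / y.sum() * 100   # magnitude of the error
        bias_pct = (y_hat - y).sum() / y.sum() * 100        # over- vs under-forecasting
        return mae_pct, bias_pct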

Qualitative Classification

To complement quantitative metrics, forecasts in the test set are qualitatively categorized, as follows:

  • Excellent
  • Very Good
  • Good
  • Fair
  • Poor

This categorization is based on predefined benchmarks for each product tier (A, B, C, and D), which are configurable at the tenant level. By separating the top sellers (A tier) from the lower sellers (D tier), Flieber can appropriately prioritize importance and develop processes with a product’s tier in mind.

PROMOTION

Following the creation of the forecast and the recording of its performance, Flieber has a formal process in place to guarantee the quality of what is promoted to production. This includes gating some of the processes to avoid promoting forecasts that perform worse than their previous version. Forecasts that are already performing well are either left in place or extended to ensure they persist for over a year out.

REPORTING

To aid in monitoring and getting more value out of the Flieber forecast and its byproducts (such as the stockout substitution), Flieber is currently producing reports that surface to customers the metrics we use to evaluate the results of the forecasts. These reports will include:

  • Current Forecast Accuracy: This report will contain the accuracy of the forecast of each product, according to the description of the “Performance” section above.

 

  • Forecast Historical Accuracy: Every month, Flieber saves the current version of the forecast of each product, both with and without manual adjustments performed by the user. This report will allow users to follow how each version actually performed in reality.

 

  • Forecast Comparison: This report will compare the accuracy of different models, such as the user’s own custom models, Flieber’s AI model, and a 30-day moving average. This will allow users to better understand which model performs best for each product.

 

  • Lost Sales Report: This report gives users visibility into the stockout rate and total lost sales (both in units and in cash) over past periods by comparing actual sales against our stockout-substituted demand estimation.

As soon as each of these reports is deployed, Flieber’s team will notify all customers.


If you have any questions or concerns, feel free to reach out to us by clicking on the chat in the lower right corner of Flieber.