Contents

  1. Abstract
  2. Learnings from mid-term project
  3. Data description and data summary
  4. Croston’s Method
  5. Classification of time series
  6. Forecasting using Croston’s method
  7. Cost function and optimization in Croston’s method
  8. Zero-inflated time series models
  9. Zero-inflated Poisson autoregression
  10. Zero-inflated negative binomial autoregression
  11. Dynamic zero-inflated models
  12. Conclusions
  13. Challenges faced and learnings from the project
  14. References

Abstract

SKU forecasting has been used extensively by retail chains since well before the emergence of analytics as a field, and it remains an important aspect of the retail business. Forecasting and inventory optimization are especially successful for products with high demand and long shelf life. Over time, the range of products available to customers has grown exponentially. Some products are fast moving while others are relatively slow moving, and forecasting sales for slow-moving SKUs has proved quite difficult. The problem is further aggravated when SKUs have a short shelf life (perishable goods such as fruits and vegetables) and are slow moving (goods that serve a niche customer segment or are bought only occasionally).

Learnings from the mid-term project

In the mid-term project we observed that linear state space models as well as the Holt-Winters smoothing method worked well for fast-moving SKUs. However, a large fraction of the SKUs in our dataset have demand that is intermittent to varying degrees. Using conventional time series methods to forecast series with intermittent demand introduces a bias in estimation; the extent of the bias depends on the number of zeroes, the pattern of their occurrence, and the variance of the series. We therefore need to specify our models differently for such time series.

This part of the project was more experimental in nature, as we did not know in advance what would work. We may still not be in a position to say what the best way to forecast such time series is, but the journey of exploration was both interesting and insightful.

Dataset

For this part of the project we have selected SKUs with intermittent demand at different levels, and analysing these time series is the major focus of this part of the work. We select a subset of 15 SKUs with differing degrees of intermittency.

Data Summary

##   AMERICAN.SHELLED.CORN AMERICAN.SWEET.CORN APPLE.PACK BABY.CORN.PACK
## 1                     0                   2         15              0
## 2                     1                   3         16              4
## 3                     0                   1         28              2
## 4                     0                   3         26              5
## 5                     0                   2         36              1
## 6                     0                   2         26              3
##   BABY.CORN.PEELED.PACK BANANA.FLOWER.UNIT BANANA.RAW.PC BEANS.BUSH
## 1                     2                  0             4         15
## 2                     1                  0             3          0
## 3                     0                  1             3         13
## 4                     1                  1             3          5
## 5                     6                  3             2          8
## 6                     0                  0             3         10
##   BEANS.CLUSTER BRINJAL.NAGPUR.250GM BRINJAL.STRIPPED.VARI....250.GMS.PK
## 1             0                    6                                   2
## 2             0                    1                                   1
## 3             1                    2                                   7
## 4             2                    4                                   0
## 5             4                    7                                   5
## 6             5                    2                                   6
##   CARROT.PLAIN.500GM CAULIFLOWER.MEDIUM.UNIT CHILLI.GREEN
## 1                 12                       5           11
## 2                  7                       9           11
## 3                 10                       7           12
## 4                  9                       4           10
## 5                 12                       3           12
## 6                 14                       5           23
##   CORIANDER.LEAF.PC.KOL
## 1                    11
## 2                    17
## 3                     0
## 4                    11
## 5                    17
## 6                    14
##  AMERICAN.SHELLED.CORN AMERICAN.SWEET.CORN   APPLE.PACK    
##  Min.   :0.000         Min.   : 0.000      Min.   : 0.000  
##  1st Qu.:0.000         1st Qu.: 1.000      1st Qu.: 0.000  
##  Median :0.000         Median : 3.000      Median : 3.500  
##  Mean   :0.163         Mean   : 4.011      Mean   : 7.283  
##  3rd Qu.:0.000         3rd Qu.: 5.000      3rd Qu.:13.000  
##  Max.   :3.000         Max.   :25.000      Max.   :36.000  
##  BABY.CORN.PACK  BABY.CORN.PEELED.PACK BANANA.FLOWER.UNIT BANANA.RAW.PC   
##  Min.   :0.000   Min.   :0.000         Min.   :0.0000     Min.   : 0.000  
##  1st Qu.:0.000   1st Qu.:0.000         1st Qu.:0.0000     1st Qu.: 1.000  
##  Median :1.000   Median :1.000         Median :0.0000     Median : 3.000  
##  Mean   :1.109   Mean   :1.457         Mean   :0.3478     Mean   : 2.783  
##  3rd Qu.:2.000   3rd Qu.:2.000         3rd Qu.:0.0000     3rd Qu.: 4.000  
##  Max.   :5.000   Max.   :6.000         Max.   :3.0000     Max.   :12.000  
##    BEANS.BUSH     BEANS.CLUSTER   BRINJAL.NAGPUR.250GM
##  Min.   : 0.000   Min.   :0.000   Min.   : 0.000      
##  1st Qu.: 2.750   1st Qu.:1.000   1st Qu.: 1.000      
##  Median : 5.000   Median :1.000   Median : 2.000      
##  Mean   : 4.946   Mean   :1.967   Mean   : 2.717      
##  3rd Qu.: 7.000   3rd Qu.:3.000   3rd Qu.: 4.000      
##  Max.   :15.000   Max.   :8.000   Max.   :11.000      
##  BRINJAL.STRIPPED.VARI....250.GMS.PK CARROT.PLAIN.500GM
##  Min.   : 0.000                      Min.   : 0.000    
##  1st Qu.: 1.000                      1st Qu.: 5.000    
##  Median : 3.000                      Median : 8.000    
##  Mean   : 4.098                      Mean   : 7.989    
##  3rd Qu.: 5.000                      3rd Qu.:10.000    
##  Max.   :26.000                      Max.   :16.000    
##  CAULIFLOWER.MEDIUM.UNIT  CHILLI.GREEN CORIANDER.LEAF.PC.KOL
##  Min.   : 0.000          Min.   : 0    Min.   : 0.00        
##  1st Qu.: 3.000          1st Qu.: 6    1st Qu.: 8.75        
##  Median : 4.000          Median : 9    Median :10.00        
##  Mean   : 5.228          Mean   : 9    Mean   :10.17        
##  3rd Qu.: 7.000          3rd Qu.:11    3rd Qu.:12.00        
##  Max.   :18.000          Max.   :23    Max.   :22.00

We now look at the distribution of each of these SKUs.
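The histograms referred to below come from the plotting chunks of the report; as a minimal sketch of the kind of code that produces them (assuming the 15 SKUs are columns of a data frame named sku.data, a hypothetical name), one could use:

```r
# Sketch: one histogram per SKU to inspect how often zero and low demand occur.
# 'sku.data' is a hypothetical name for the data frame holding the 15 SKU columns.
op <- par(mfrow = c(4, 4), mar = c(2, 2, 2, 1))
for (sku in colnames(sku.data)) {
  hist(sku.data[[sku]], breaks = 20, main = sku, xlab = "Daily demand", col = "grey")
}
par(op)
```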

From the above histograms we observe that the frequency of 0s and 1s is high for most of the SKUs we have chosen.

Croston’s method

One of the earliest and most novel methods for forecasting intermittent demand time series was suggested by Croston in 1972, and various modifications of the method have been introduced since.

Instead of treating an intermittent demand time series as a single series, it splits the series into two parts:

  1. The non-zero demand sizes \(z_t\)
  2. The inter-demand intervals \(x_t\) (these can also be encoded as a binary occurrence flag)

Both series are modelled and their parameters estimated independently. Some papers suggest that the independence assumption does not hold in many cases; however, it works fine for the majority of time series.

The forecast \(\hat{y}_t\) is the ratio of the smoothed demand size to the smoothed inter-demand interval.
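Concretely, using a single smoothing parameter \(\alpha\) for both series (a simplification; Croston's original formulation permits separate parameters), the estimates are updated only in periods where a non-zero demand occurs and are carried forward otherwise:

\[\begin{equation} \hat{z}_t = \alpha z_t + (1-\alpha)\hat{z}_{t-1}, \qquad \hat{x}_t = \alpha x_t + (1-\alpha)\hat{x}_{t-1} \end{equation}\]

\[\begin{equation} \hat{y}_t = \hat{z}_t / \hat{x}_t \end{equation}\]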

An important point to make here is that the forecasted values we obtain are not actual demand values; rather, they can be called a “demand rate”. The forecasted value is the average expected demand in each future period. For example, a forecast of 0.2 should be read as a demand of 2 units over 10 periods, i.e. a demand rate of 0.2.

Why does a demand rate make more sense?

  1. It is hard to obtain a point forecast for each period due to the intermittent nature of the data. A demand rate gives the consumer of the forecasts more confidence than individual point forecasts.

  2. Demand forecasts are usually used for inventory optimization, and inventory is optimized over a period longer than a single day, so not having a great point forecast is not fatal. In fact, Croston's method derives its popularity from this idea: it does a poor job of providing accurate point estimates but does a good job once the forecasts are aggregated over the lead time (a week or a month), as sketched in the code below. We will see in the plots that the point forecasts are far from the actual data; the reason goes back to the optimization and cost function of Croston's method, which we discuss later. (N. Kourentzes, 2014, International Journal of Production Economics, 156: 180-190.)
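As a sketch of how such forecasts can be produced in R, the crost() function in the tsintermittent package implements Croston's method and its variants; the call below is illustrative (the SKU, the lead time, and the output element names follow assumptions and the package documentation, not output echoed in this report) and shows how per-period demand-rate forecasts are aggregated over a lead time:

```r
library(tsintermittent)

# Hypothetical example: forecast one intermittent SKU over a 7-day lead time.
y   <- sku.data$BANANA.FLOWER.UNIT         # 'sku.data' is the hypothetical data frame of SKUs
fit <- crost(y, h = 7, type = "croston")   # type = "sba" gives the Syntetos-Boylan approximation

fit$frc.out        # per-period demand-rate forecasts, not point demands
sum(fit$frc.out)   # expected total demand aggregated over the 7-period lead time
```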

Classification of demand series based on the structure of time series

All the series we analyse have intermittent demand to some extent, but the degree varies. In forecasting, a “one size fits all” approach does not work well; different forecasting approaches suit different kinds of time series.

We classify our time series based on the approach discussed by G. Heinecke, A.A. Syntetos and W. Wang in the paper “Forecasting-based SKU classification” (link available in the references).

## $idx.croston
## [1] 12 14 15
## 
## $idx.sba
##  [1]  1  2  3  4  5  6  7  8  9 10 11 13
## 
## $idx.ses
## integer(0)
## 
## $cv2
##  [1] 0.2444444 1.0509506 0.6386768 0.3903141 0.3560606 0.2578125 0.3891416
##  [8] 0.3178600 0.4606871 0.4012476 0.9897410 0.2207788 0.4301667 0.2014065
## [15] 0.1079016
## 
## $p
##  [1] 8.000000 1.277778 1.475410 1.542373 1.343284 3.454545 1.150000
##  [8] 1.139241 1.295775 1.197368 1.194805 1.022222 1.071429 1.045455
## [15] 1.033708
## 
## $summary
##         Series
## Croston      3
## SBA         12
## SES          0

## $idx.croston
## [1]  7  8  9 10 12 13 14 15
## 
## $idx.sba
## [1]  1  2  3  4  5  6 11
## 
## $idx.ses
## NULL
## 
## $cv2
##  [1] 0.2444444 1.0509506 0.6386768 0.3903141 0.3560606 0.2578125 0.3891416
##  [8] 0.3178600 0.4606871 0.4012476 0.9897410 0.2207788 0.4301667 0.2014065
## [15] 0.1079016
## 
## $p
##  [1] 8.000000 1.277778 1.475410 1.542373 1.343284 3.454545 1.150000
##  [8] 1.139241 1.295775 1.197368 1.194805 1.022222 1.071429 1.045455
## [15] 1.033708
## 
## $summary
##         Series
## Croston      8
## SBA          7

We see above that the 15 series we analyse are assigned to two different methods, Croston and SBA. The underlying scheme classifies time series into four categories:

  1. Smooth
  2. Erratic
  3. Lumpy
  4. Intermittent

We see that most of our time series fall into the lumpy or smooth category. So we now know which of the methods within the Croston family of models is best for each series.

As can be seen from the figure above, the classification is based on two factors (an illustrative classification call is sketched after this list):

  1. \(CV^2\), the squared coefficient of variation of the demand series (the cv2 values in the output above)
  2. \(p\), the average inter-demand interval of the series (the p values in the output above)
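The classification output shown above has the structure returned by idclass() from the tsintermittent package; a hedged sketch of such a call follows (the exact arguments used for the two outputs above are not echoed in the report, so the type shown here is an assumption):

```r
library(tsintermittent)

# Sketch: classify the SKU series by squared coefficient of variation (cv2) and
# mean inter-demand interval (p). 'sku.data' is the hypothetical data frame of SKUs.
cls <- idclass(as.matrix(sku.data), type = "PKa", outplot = "none")
cls$summary       # how many series are assigned to Croston / SBA / SES
cls$idx.croston   # column indices of the series for which plain Croston is recommended
```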

Forecasts for each SKU

Forecasts based on Croston’s method

## [1] "BANANA.RAW.PC"

## [1] "BEANS.BUSH"

## [1] "BEANS.CLUSTER"

## [1] "BRINJAL.NAGPUR.250GM"

## [1] "CARROT.PLAIN.500GM"

## [1] "CAULIFLOWER.MEDIUM.UNIT"

## [1] "CHILLI.GREEN"

## [1] "CORIANDER.LEAF.PC.KOL"

Forecasts based on the SBA method (a modification of Croston's method)

## [1] "AMERICAN.SHELLED.CORN"

## [1] "AMERICAN.SWEET.CORN"

## [1] "APPLE.PACK"

## [1] "BABY.CORN.PACK"

## [1] "BABY.CORN.PEELED.PACK"

## [1] "BANANA.FLOWER.UNIT"

## [1] "BRINJAL.STRIPPED.VARI....250.GMS.PK"

Optimization and cost function

A discussion of the optimization and the cost function is important here because the forecasts we obtain are not directly interpretable in the same way as forecasts from other methods.

The commonly used cost functions are no holy grail, and this is an interesting case where the cost function is chosen based on the problem we are actually trying to solve, i.e. inventory optimization.

Mean squared error is the most commonly used cost function but is heavily influenced by extreme points. Mean absolute error is robust to extreme points but is influenced by the zeroes in the data, so it is again not suitable for our case.

To address these issues, Wallström and Segerstedt (2010) introduced a metric called Periods In Stock (PIS), the total number of periods a unit is in stock or out of stock. This can be treated as a measure of forecast error and hence used in optimization. (N. Kourentzes, 2014, International Journal of Production Economics, 156: 180-190.)

\[\begin{equation} PIS = n|\sum_{i=1}^{h} \sum_{j=1}^{i} (y_i - \hat{y_j})|/ \sum_{k=1}^{n} y_k \end{equation}\]
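A direct transcription of this expression into R reads as follows (a sketch only; several scaled variants of PIS appear in the literature, and this follows the formula exactly as written above, with the denominator assumed to sum over the full demand series of length \(n\)):

```r
# Sketch: PIS-style error, transcribing the formula above.
# y    : actual demand over the h-period forecast horizon
# yhat : forecasts over the same horizon
# yall : full demand series of length n used for scaling (assumption about the denominator)
pis <- function(y, yhat, yall = y) {
  h <- length(y)
  inner <- sapply(seq_len(h), function(i) sum(y[i] - yhat[seq_len(i)]))
  length(yall) * abs(sum(inner)) / sum(yall)
}
```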

This is why this cost function may not lead to the best point estimates, but it does lead to estimates that are optimal over the inventory lead time.

Models for zero-inflated data

As discussed, zero inflation in data is not uncommon in industry. Not accounting for it may lead to biased estimates, spurious associations, and in turn poor forecasts.

We first discuss the zero-inflated Poisson autoregression.

Zero-inflated Poisson autoregression

As in Croston's method, we model our time series as two series: the demand series and the series of inter-demand intervals.

Building on a similar idea, the zero-inflated Poisson (ZIP) model is a hierarchical time series model.

It is a mixture model with one component being a Poisson distribution (to model the actual demand counts) and the other a degenerate distribution with point mass at zero (to model the zeroes).

\(u_t\) is a binary variable indicating whether a value is a positive demand value or not; it is treated as unobserved.

\(Y_t\) is the observed value and is Poisson distributed given the value of \(u_t\):

\[\begin{equation} u_t \sim \mathrm{Bernoulli}(w_t) \end{equation}\]

\[\begin{equation} Y_t \mid u_t \sim \mathrm{Poisson}(\lambda_t u_t) \end{equation}\]
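As a simplified sketch of this idea in R (an approximation, not necessarily the estimation routine used for the results below), a zero-inflated Poisson regression with lagged demand as a covariate can be fitted with zeroinfl() from the pscl package; the SKU, the single lag, and the data frame name are assumptions:

```r
library(pscl)

# Hypothetical sketch: zero-inflated Poisson "autoregression" via a lagged count covariate.
y    <- sku.data$BABY.CORN.PACK        # 'sku.data' is the hypothetical data frame of SKUs
ylag <- c(NA, head(y, -1))             # one-period lag of demand
dat  <- na.omit(data.frame(y = y, ylag = ylag))

# Count component: Poisson with log link, driven by lagged demand.
# Zero-inflation component: logistic model for the probability of a structural zero.
fit.zip <- zeroinfl(y ~ ylag | ylag, data = dat, dist = "poisson")
summary(fit.zip)

# One-step-ahead mean forecast: (1 - P(structural zero)) * Poisson mean.
predict(fit.zip, newdata = tail(dat, 1), type = "response")
```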

The table below shows the errors.

| SKU | err |
|---|---|
| AMERICAN.SHELLED.CORN | 0.281683925861301 |
| AMERICAN.SWEET.CORN | 3.26473051753271 |
| APPLE.PACK | 4.73904500651733 |
| BABY.CORN.PACK | 0.890507323420779 |
| BABY.CORN.PEELED.PACK | 1.09124484548887 |
| BANANA.FLOWER.UNIT | 0.488125907832427 |
| BANANA.RAW.PC | 1.65537448535313 |
| BEANS.BUSH | 2.73543341444909 |
| BEANS.CLUSTER | 1.4590977770939 |
| BRINJAL.NAGPUR.250GM | 1.65479961910149 |
| BRINJAL.STRIPPED.VARI....250.GMS.PK | 3.16645103458673 |
| CARROT.PLAIN.500GM | 2.73454981368135 |
| CAULIFLOWER.MEDIUM.UNIT | 2.87215367086459 |
| CHILLI.GREEN | 3.03075814989506 |
| CORIANDER.LEAF.PC.KOL | 2.58821108955083 |

Observations from the model

We observe that the errors are on the higher side. The inclusion of the zero-inflation component helps reduce the effect of the zeroes on the estimates, but the data suffer from overdispersion and the Poisson model tends to underestimate the observations. We will try the same class of models with a hierarchical negative binomial component.

Zero-inflated negative binomial autoregression

It is a mixture model with one component being a negative binomial distribution (to model the actual demand counts) and the other a degenerate distribution with point mass at zero (to model the zeroes).

\(u_t\) is a binary variable indicating whether a value is a positive demand value or not; it is treated as unobserved.

The negative binomial distribution helps in modelling overdispersion, and since our data are overdispersed in many cases, we hope it gives better results than the Poisson error distribution.

\(Y_t\) is the observed value and is negative binomially distributed given the value of \(u_t\):

\[\begin{equation} u_t \sim \mathrm{Bernoulli}(w_t) \end{equation}\]

\[\begin{equation} Y_t \mid u_t \sim \mathrm{NegBin}(\lambda_t u_t) \end{equation}\]
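Continuing the same hedged sketch as for the ZIP model, only the dist argument changes; the dispersion parameter of the negative binomial is estimated by the package:

```r
# Hypothetical sketch: same lagged-count specification with a negative binomial count component.
fit.zinb <- zeroinfl(y ~ ylag | ylag, data = dat, dist = "negbin")
summary(fit.zinb)        # 'Log(theta)' is the estimated dispersion parameter
AIC(fit.zip, fit.zinb)   # quick comparison of the two mixture models
```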

Let us look at the errors from this model.

| SKU | err |
|---|---|
| AMERICAN.SHELLED.CORN | 0.2816839260673 |
| AMERICAN.SWEET.CORN | 3.24783129571089 |
| APPLE.PACK | 4.66968893580904 |
| BABY.CORN.PACK | 0.889816225547805 |
| BABY.CORN.PEELED.PACK | 1.09062725523836 |
| BANANA.FLOWER.UNIT | 1.66607141260766 |
| BANANA.RAW.PC | 2.7359110141486 |
| BEANS.BUSH | 1.45713362974754 |
| BEANS.CLUSTER | 1.6574921797012 |
| BRINJAL.NAGPUR.250GM | 2.74464893418113 |
| BRINJAL.STRIPPED.VARI....250.GMS.PK | 2.87578181751534 |
| CARROT.PLAIN.500GM | 3.02484894310362 |
| CAULIFLOWER.MEDIUM.UNIT | 2.5882960465169 |
| CHILLI.GREEN | 0.2816839260673 |
| CORIANDER.LEAF.PC.KOL | 3.24783129571089 |

Observations from the negative binomial model

We see that this model accounts for the overdispersion in the data and does slightly better than the Poisson model. However, these models still do not adaptively learn sudden changes in the series, unlike the structural time series models analysed in the mid-term project, although they do a good job of penalizing the effect of the zeroes on the model fit.

This motivates building a family of models that account for zero inflation, are dynamic in nature, and are able to learn abrupt changes in the series.

We will analyze dynamic zero-inflated Poisson and negative binomial models. These use plug-and-play methods to estimate the parameters via sequential Monte Carlo, so it is not feasible to run them for all the series given the enormous computation time required.

Dynamic zero-inflated models

I tried fitting a combination of a state space model and a zero-inflated model in the non-linear pomp framework, but the code broke multiple times right up to the last moment. A single run took almost 7 hours for one SKU, so it was not feasible to run and analyse these models here. I plan to work on these models over the summer and report on the progress. I also found that this can be done in R with the ZIM package, but that package is currently very slow and unstable.

Conclusion

We conclude that a combination of zero-inflated models with a state space modelling framework can yield the best results. A very important learning is that in such problems collaboration with domain experts (supply chain, in this case) can help build better models. Another important conclusion is that it is hard to comment on forecast error in such a problem without prior knowledge of the lead time: a model with poor point forecasts may still do well once forecasts are aggregated over the lead time, and a model with good point forecasts may be inferior at the lead-time level. This leads us to conclude that forecasting and inventory optimization have to go hand in hand in such problems.

Challenges faced and learnings from the project

  1. The non-linear methods are computationally intensive, which does not allow much experimentation; running a single model takes about 7 hours. Secondly, a limited understanding of the non-linear pomp package and sequential Monte Carlo techniques restricts trying new ideas. It is easier to build models similar to those presented in case studies, but it is difficult to build a model on a dataset like mine without a thorough understanding of pomp internals.

  2. This also serves as motivation to make an effort to thoroughly understand likelihood evaluation through sequential Monte Carlo techniques.

  3. This project taught me that conventional statistical diagnostics and error functions may not always yield the desired results, and in such cases it becomes essential to collaborate with domain experts to select the right model and produce outputs that the end users can actually use.