SKU forecasting has been used extensively by retail chains even before the emergence of the field of analytics and is an important aspect of the retail business. Forecasting and Inventory optimization is successful for especially products with high demand and high shelf life. Over the period of times, the type of products available to the customer has increased exponentially . Few products are fast moving and other are relatively slow moving .Forecasting sales for slow moving SKU’s has been found to be quite difficult. The problem is furthur aggravted if the SKU’s have lower shelf life (perishable goods like fruits and vegetables) and are slow moving ( goods that serve a niche segment of customer or are bought occasionally)
In the midterm project we observed that , linear state space models as well as holt winter’s smoothing method worked fine with the SKU’s which were fast moving. However, a large fraction of SKU’s in our dataset has demand which is intermittent at various levels. Using conventional time series methods to forecast time series with intermittent demand introduces a bias in estimation . The extent of bias depends on the number of zeroes, pattern of their occurance and variance of the entire series . Therefore, we need to specify our model in a different way for such time series .
This part of the project was more experimental in nature as we did not know exactly what may work . We may still not be in a situation to coment that what is the best way to forecast such time series but the journey of exploration was quite interesting and insightful at the same time.
For this part of project we have selected those SKU’s with intermittent demand at different levels and analysing those time series would be the major focus of this part of project . We select a subset of 15 SKU’s with different degree of intermittance.
## AMERICAN.SHELLED.CORN AMERICAN.SWEET.CORN APPLE.PACK BABY.CORN.PACK
## 1 0 2 15 0
## 2 1 3 16 4
## 3 0 1 28 2
## 4 0 3 26 5
## 5 0 2 36 1
## 6 0 2 26 3
## BABY.CORN.PEELED.PACK BANANA.FLOWER.UNIT BANANA.RAW.PC BEANS.BUSH
## 1 2 0 4 15
## 2 1 0 3 0
## 3 0 1 3 13
## 4 1 1 3 5
## 5 6 3 2 8
## 6 0 0 3 10
## BEANS.CLUSTER BRINJAL.NAGPUR.250GM BRINJAL.STRIPPED.VARI....250.GMS.PK
## 1 0 6 2
## 2 0 1 1
## 3 1 2 7
## 4 2 4 0
## 5 4 7 5
## 6 5 2 6
## CARROT.PLAIN.500GM CAULIFLOWER.MEDIUM.UNIT CHILLI.GREEN
## 1 12 5 11
## 2 7 9 11
## 3 10 7 12
## 4 9 4 10
## 5 12 3 12
## 6 14 5 23
## CORIANDER.LEAF.PC.KOL
## 1 11
## 2 17
## 3 0
## 4 11
## 5 17
## 6 14
## AMERICAN.SHELLED.CORN AMERICAN.SWEET.CORN APPLE.PACK
## Min. :0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.:0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median :0.000 Median : 3.000 Median : 3.500
## Mean :0.163 Mean : 4.011 Mean : 7.283
## 3rd Qu.:0.000 3rd Qu.: 5.000 3rd Qu.:13.000
## Max. :3.000 Max. :25.000 Max. :36.000
## BABY.CORN.PACK BABY.CORN.PEELED.PACK BANANA.FLOWER.UNIT BANANA.RAW.PC
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.: 1.000
## Median :1.000 Median :1.000 Median :0.0000 Median : 3.000
## Mean :1.109 Mean :1.457 Mean :0.3478 Mean : 2.783
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:0.0000 3rd Qu.: 4.000
## Max. :5.000 Max. :6.000 Max. :3.0000 Max. :12.000
## BEANS.BUSH BEANS.CLUSTER BRINJAL.NAGPUR.250GM
## Min. : 0.000 Min. :0.000 Min. : 0.000
## 1st Qu.: 2.750 1st Qu.:1.000 1st Qu.: 1.000
## Median : 5.000 Median :1.000 Median : 2.000
## Mean : 4.946 Mean :1.967 Mean : 2.717
## 3rd Qu.: 7.000 3rd Qu.:3.000 3rd Qu.: 4.000
## Max. :15.000 Max. :8.000 Max. :11.000
## BRINJAL.STRIPPED.VARI....250.GMS.PK CARROT.PLAIN.500GM
## Min. : 0.000 Min. : 0.000
## 1st Qu.: 1.000 1st Qu.: 5.000
## Median : 3.000 Median : 8.000
## Mean : 4.098 Mean : 7.989
## 3rd Qu.: 5.000 3rd Qu.:10.000
## Max. :26.000 Max. :16.000
## CAULIFLOWER.MEDIUM.UNIT CHILLI.GREEN CORIANDER.LEAF.PC.KOL
## Min. : 0.000 Min. : 0 Min. : 0.00
## 1st Qu.: 3.000 1st Qu.: 6 1st Qu.: 8.75
## Median : 4.000 Median : 9 Median :10.00
## Mean : 5.228 Mean : 9 Mean :10.17
## 3rd Qu.: 7.000 3rd Qu.:11 3rd Qu.:12.00
## Max. :18.000 Max. :23 Max. :22.00
We now see the distribution of each of these SKU’s
From the above histograms we observe that the frequency of 0’s and 1’s is high for most of the SKU’s we have chosen.
One of the novel and earliest methods for forecasting intermittent demand time series was suggested by Croston in 1972 and various modification of the method have been introduced later.
Instead of treating an intermittent demand time series as a single series, it divides the series into two parts
Both these series are independently modelled and respective parameters are estimated independently . Though there have been papers suggesting that the independence assumptions does not hold true in many case. However, for majority of time series this works fine.
The forecasted value \(y_t\) is given by \(z_t\) / \(x_t\)
An import point to make here is the forecasted values we obtain are not actual demand values , rather they can be called “Demand rate” . The forecasted value is the average expected demand in each future period. For example a forecast of 0.2 should be seen as a demand of 2 unit over 10 periods, a demand rate of 0.2.
Why demand rate makes more sense ?
It is hard to obtain a point forecast for each point due to the intermittent nature of data . Demand rate provides more confidence to the consumer of forecast results as compared to the point foreasts.
Demand forecasts are usually used for inventory optimization . Inventory is optimized over a longer period than just a day (so not having a great point forecast doesn’t make us terrible ). In fact, Croston’ method derives it’s popularity from the same idea. It does a really poor job in providing accurate point estimates but does a good job after the forecasts are aggregated over lead time (a week or a month). We will see in the plots that point forecasts are very off from the actual data . The reason goes back to the optimization and cost function of Croston method. We will discuss about it later. (N. Kourentzes, 2014, International Journal of Production Economics, 156: 180-190.)
Even though we observe that all series we analyse have intermittent demand to some extent. However , the degree varies. We know that in forecasting “One size fits all approach” doesn’t work too well. Different forecasting approaches work well for different kinds of time series.
We classify our time series based on approach discussed by (G. Heineckea, A.A. Syntetosc, W. Wangd) in the paper “Forecasting-based SKU classification” (Link to paper avialable in the references)
## $idx.croston
## [1] 12 14 15
##
## $idx.sba
## [1] 1 2 3 4 5 6 7 8 9 10 11 13
##
## $idx.ses
## integer(0)
##
## $cv2
## [1] 0.2444444 1.0509506 0.6386768 0.3903141 0.3560606 0.2578125 0.3891416
## [8] 0.3178600 0.4606871 0.4012476 0.9897410 0.2207788 0.4301667 0.2014065
## [15] 0.1079016
##
## $p
## [1] 8.000000 1.277778 1.475410 1.542373 1.343284 3.454545 1.150000
## [8] 1.139241 1.295775 1.197368 1.194805 1.022222 1.071429 1.045455
## [15] 1.033708
##
## $summary
## Series
## Croston 3
## SBA 12
## SES 0
## $idx.croston
## [1] 7 8 9 10 12 13 14 15
##
## $idx.sba
## [1] 1 2 3 4 5 6 11
##
## $idx.ses
## NULL
##
## $cv2
## [1] 0.2444444 1.0509506 0.6386768 0.3903141 0.3560606 0.2578125 0.3891416
## [8] 0.3178600 0.4606871 0.4012476 0.9897410 0.2207788 0.4301667 0.2014065
## [15] 0.1079016
##
## $p
## [1] 8.000000 1.277778 1.475410 1.542373 1.343284 3.454545 1.150000
## [8] 1.139241 1.295775 1.197368 1.194805 1.022222 1.071429 1.045455
## [15] 1.033708
##
## $summary
## Series
## Croston 8
## SBA 7
We see above that the 15 series we analyse are classified for analysis by two different methods Croston and SBA . The method classifies time series into 4 categories
We see that most of our time series lie in lumy or smooth category .So, now we know which of the methods within the Croston family of models is best for each .
The way this classification works is based on two factors as it can be seen from the above figure.
Forecasts based on Croston’s method
## [1] "BANANA.RAW.PC"
## [1] "BEANS.BUSH"
## [1] "BEANS.CLUSTER"
## [1] "BRINJAL.NAGPUR.250GM"
## [1] "CARROT.PLAIN.500GM"
## [1] "CAULIFLOWER.MEDIUM.UNIT"
## [1] "CHILLI.GREEN"
## [1] "CORIANDER.LEAF.PC.KOL"
Forecast based on SBA method (a modificaton of Croston’s method)
## [1] "AMERICAN.SHELLED.CORN"
## [1] "AMERICAN.SWEET.CORN"
## [1] "APPLE.PACK"
## [1] "BABY.CORN.PACK"
## [1] "BABY.CORN.PEELED.PACK"
## [1] "BANANA.FLOWER.UNIT"
## [1] "BRINJAL.STRIPPED.VARI....250.GMS.PK"
Discussion on optimization and cost function is important in this case as the forecasts we get are not directly interpretable as forecasts by any other method.
As we know that the commonly used cost functions are not holy grail and this is one interesting problem where the cost function is decided based on the problem we are trying to address, i.e , inventory optimization.
Means square error is the most commonly used cost function but is susceptible to be influenced by extreme points. Mean absolute error is another which is robust to extreme points but is influenced by the zeroes in the data and so again is not suitable for our case
In order to address these issues Wallstrom and Segerstedt (2010) introduced a new metric called Period of Stock (PIS) which is total number of periods a unit is instock or out of stock . This can be treated as a measure of error in forecast and hence could be used in optimization. (N. Kourentzes, 2014, International Journal of Production Economics, 156: 180-190.)
\[\begin{equation} PIS = n|\sum_{i=1}^{h} \sum_{j=1}^{i} (y_i - \hat{y_j})|/ \sum_{k=1}^{n} y_k \end{equation}\]
This is the reason that this cost function might not lead to the best point estimate but an optimum estimate over the lead time of inventory .
As we discussed that having zero inflation in data is not uncommon in industry . Not accounting for zero inflation in data may lead to biased estimates and spurious associations and poor forecasts in turn.
We first discuss Zero inflated poisson autoregression
As done in Croston’s method we model our time series as two series , one is the demand series and the other the series of inter demand intervels .
Building on similar idea , ZIP is a hierarchical time series model
It is a mixture model with one of the component being poisson distribution ( to model actual demand counts) and other component being a degenerate distribution with point mass at zero ( to model zeroes)
\(u_t\) is the binary variable which indicates if a value is a positive demand value or not. This is considered as unobserved.
\(Y_t\) is the observed value and is poisson distributed given the value of \(u_t\)
\[\begin{equation} u_t \sim Bernoulli(w_t) \end{equation}\]
\[\begin{equation} Y_t \sim Poisson(\lambda_t, u_t) \end{equation}\]
The below is the table showing the errors
V1 | err |
---|---|
AMERICAN.SHELLED.CORN | 0.281683925861301 |
AMERICAN.SWEET.CORN | 3.26473051753271 |
APPLE.PACK | 4.73904500651733 |
BABY.CORN.PACK | 0.890507323420779 |
BABY.CORN.PEELED.PACK | 1.09124484548887 |
BANANA.FLOWER.UNIT | 0.488125907832427 |
BANANA.RAW.PC | 1.65537448535313 |
BEANS.BUSH | 2.73543341444909 |
BEANS.CLUSTER | 1.4590977770939 |
BRINJAL.NAGPUR.250GM | 1.65479961910149 |
BRINJAL.STRIPPED.VARI….250.GMS.PK | 3.16645103458673 |
CARROT.PLAIN.500GM | 2.73454981368135 |
CAULIFLOWER.MEDIUM.UNIT | 2.87215367086459 |
CHILLI.GREEN | 3.03075814989506 |
CORIANDER.LEAF.PC.KOL | 2.58821108955083 |
We observe that the errors are on the higher side, though the inclusion of zero inflation distribution helps to reduce effect of zeroes on the estimates but we see that the data has a problem of over dispersion and the poisson model kind of underestimates the value of observation in general. We will try the same class of models with hierarchical negative binomial models
It is a mixture model with one of the component being Negative binomial distribution ( to model actual demand counts) and other component being a degenerate distribution with point mass at zero ( to model zeroes)
\(u_t\) is the binary variable which indicates if a value is a positive demand value or not. This is considered as unobserved.
Negative binomial distribution helps in modeling the over dispersion and we have observed that our data is over dispersed in many cases so we hope that it shows better results as compared to poisson error distribution
\(Y_t\) is the observed value and is Negative binomial distributed given the value of \(u_t\)
\[\begin{equation} u_t \sim Bernoulli(w_t) \end{equation}\]
\[\begin{equation} Y_t \sim NegBinom(\lambda_t, u_t) \end{equation}\]
Let us see the results from this model
Let us see the errors for this model
V1 | err |
---|---|
AMERICAN.SHELLED.CORN | 0.2816839260673 |
AMERICAN.SWEET.CORN | 3.24783129571089 |
APPLE.PACK | 4.66968893580904 |
BABY.CORN.PACK | 0.889816225547805 |
BABY.CORN.PEELED.PACK | 1.09062725523836 |
BANANA.FLOWER.UNIT | 1.66607141260766 |
BANANA.RAW.PC | 2.7359110141486 |
BEANS.BUSH | 1.45713362974754 |
BEANS.CLUSTER | 1.6574921797012 |
BRINJAL.NAGPUR.250GM | 2.74464893418113 |
BRINJAL.STRIPPED.VARI….250.GMS.PK | 2.87578181751534 |
CARROT.PLAIN.500GM | 3.02484894310362 |
CAULIFLOWER.MEDIUM.UNIT | 2.5882960465169 |
CHILLI.GREEN | 0.2816839260673 |
CORIANDER.LEAF.PC.KOL | 3.24783129571089 |
We see that this model accounts for the over dispersion in the data and does slightly better than the poisson model . However, we see that the these models are still not adaptively learn the sudden changes in th series unlike the structural time series models analysed in the mid term project though these models do a good job in penalizing the effect of zeroes on model fit.
This provides us motivation to build a family of models that considers zero inflation as well as are dynamic in nature and are able to learn abrupt changes in th series .
We will analyze dynamic zero inflated poisson and negative binomial models. These use plug and play methods to estimate the parameters using sequential monte carlo . Therefore, it won’t be possible to run them for all the series due to enormous amount of computation time required.
I tried fitting a combination of state space model and zero inflated model in non-linear pomp framework but the code broke multiple times till the last moment . It took almost 7 hours for the iteration to run on a single SKU so I did not find it feasible to run and analyse them . However, I am going to work on these models during summers and update you on the progress. However, I found that this can be done in R with ZIM package but the package is incredibly slow and unstable as of now.
We conclude that a combination of zero inflated models along with state space modeling framework can yield best results. A very importatnt learning is that in such problems collaboaration with domain experts (supply chain in this case) can help build better models. Another very important thing to conclude was , it is hard to comment in such a problem about the forecast error without prior knowledge of lead time as it may be possible that even with a poor point forecast in general a model may do well on forecast over the lead time (aggregrated forecast) and a model with good point forecast may be inferior at the level of lead time . This makes us conclude that forecasting and inventory optimization has to go hand in hand in such problems.
The non linear methods are computationally intensive which does not allow a lot of experimentation . Running a single model takes about 7 hours . Secondly,Less understnading of non linear POMP package and sequential Monte carlo techniques restricts trying new ideas. It is easier to build models similar to present in case studies , However , it is difficult to build a model with a dataset like mine with not so thourough understanding of pomp internally.
This also serves as a motivation to make an effort to thoroughly understand liklihood evaluation through sequential monte carlo techniques.
This project taught me that sometimes conventional statistical diagnostics/error functions may not yield desired results and it becomes imminent in such cases to collaboare with domain experts for right model selection and getting outputs that can be used by the end users .
http://ir.uiowa.edu/cgi/viewcontent.cgi?article=3166&context=etd https://cran.r-project.org/web/packages/ZIM/ZIM.pdf http://www.bauer.uh.edu/gardner/docs/pdf/Forecasting%20intermittent%20demand%20R2%20(JBR)%203.pdf http://www.sciencedirect.com/science/article/pii/S092552731400190X http://kourentzes.com/forecasting/2014/06/11/on-intermittent-demand-model-optimisation-and-selection/