I used a dataset from a large charity in Europe. Due to a non-disclosure agreement, I could not include any data with the submission of this project, and I would prefer that this work not be posted online.
Donation behavior has been of keen academic interest in marketing. In particular, much of the customer relationship management (CRM) literature has focused on the donor-charity relationship, which is most often treated as a non-contractual relationship. By a non-contractual relationship, researchers mean a relationship between a firm and its customers in which the termination of the relationship is not observed by the firm. To take an extreme example, suppose a donor who has been giving for years dies and therefore stops giving; to the charity, this simply appears as a long period of non-giving. Charities can infer whether a donor is active or inactive from their giving history, but they can never be completely sure that a donor has decided to permanently stop giving, because they do not observe their donors’ latent interests.
This feature of the donor-charity relationship makes it natural to assume that a latent state governs donation behavior, which makes the setting well suited for POMP analysis.
For example, Fader, Hardie, and Shang (2010) proposed the BG/BB (beta-geometric / beta-binomial) model, which assumes that, while “alive”, a donor gives at each donation opportunity with some probability; that after each opportunity the donor may “die” (become permanently inactive) with some probability; and that these probabilities vary across donors according to independent beta distributions.
They show that the model predictions successfully trace the trajectory of donation behavior using summary variables that describe the frequency (how many times one gave) and recency (when the last donation occurred) of donations, together with the number of donation opportunities each donor had.
Whereas Fader, Hardie, and Shang (2010) assumed that the likelihood of turning permanently inactive was independent at each period (geometric), Netzer, Lattin, and Srinivasan (2008) used a hidden Markov model for the giving histories of alumni of a large university. The hidden states were underlying relationship states of each alumnus/alumna; donors transitioned across these states at each period (discrete \(t\)), and the current state governed the decision of whether or not to give in that period.
However, unlike the panel data structure employed by Netzer, Lattin, and Srinivasan (2008) and Fader, Hardie, and Shang (2010) (to be precise, Fader, Hardie, and Shang (2010) summarized history at the individual level and aggregated the summary variables across the population, but this still requires panel data to construct), researchers often face a situation where the only available data are aggregated, monthly-level donation histories. With such an aggregated dataset, it is much more difficult to infer the heterogeneity across donors, which is often of significant interest to marketers, necessarily restricting the complexity of the model.
We can also test some of the assumptions made in the works mentioned above. For example, it is not clear whether the assumption in Fader, Hardie, and Shang (2010) that the donation probability \(p\) and the dropout probability \(q\) are independent is reasonable: it is possible that people who are more likely to donate while alive are less likely to drop out of the donor pool.
In this project, I suggest a POMP model of donation behavior at the aggregate level for such cases. It is a simplified version of Fader, Hardie, and Shang (2010) in which heterogeneity is ignored. However, the data requirement is much less severe than for panel data, and the result provides a good fit despite its simplicity, suggesting the usefulness of the simplified model.
The data comes from a large charity in Europe, which records individual-level donation histories at the daily level. I aggregated the data in the following manner:
First, I focused on people who joined in the same month. As briefly explained above, donors tend to drop out over time; that is, their interest in the cause and in giving to the charity declines, and they eventually enter the so-called ‘death’ state. If we were to include people joining in different time periods, the different dynamics of the different cohorts would be overlaid, making it difficult to identify the dynamic pattern at the cohort level. Combining all cohorts, I would expect an established charity to show a relatively stable amount and count of donations every month, masking the dynamics within cohorts. I therefore focus on the cohort who first gave in July 1998: 91 donors for whom we have 205 months of history.
Second, I counted the number of people who gave in a particular month, instead of the count of donations. This focuses the analysis on the active vs. inactive (death) states of donors. Little is lost by this simplification: 98.16% of all observations were either 0 or 1, with the maximum donation count per month being 5 (1 out of 18,655 observations).
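A minimal sketch of this aggregation step is shown below, assuming a hypothetical raw data frame raw_gifts with one row per donation and hypothetical columns donor_id and gift_month (months elapsed since July 1998); the actual raw data cannot be shared.
library(dplyr)
library(tidyr)
# raw_gifts, donor_id and gift_month are placeholder names for the restricted raw data
monthly <- raw_gifts %>%
  distinct(donor_id, gift_month) %>%                      # at most one giving event per donor-month
  count(gift_month, name = "G") %>%                       # number of donors giving in each month
  complete(gift_month = 0:204, fill = list(G = 0)) %>%    # include months in which nobody gave
  rename(month = gift_month)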
Here, I show the time plot of the number of donors giving in each month.
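A minimal sketch of how such a plot can be produced from the aggregated data frame monthly (with columns month and G, as used when constructing the pomp object below):
library(ggplot2)
ggplot(monthly, aes(x = month, y = G)) +
  geom_line() +
  labs(x = "Months since joining", y = "No. of donors giving per month",
       title = "Number of donors giving per month, July 1998 cohort")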
At time 0, which is when people first joined the charity, everybody (all 91 donors in the cohort) made a donation. As the plot above clearly shows, donors drop out over time, leading to fewer and fewer donations per month. In the following section, I propose a POMP model for this setting.
As described above, previous research has conceptually modeled the customer-firm relationship as having discrete states, which then govern the behaviors of interest, such as transactions, usage, and donations. Fader, Hardie, and Shang (2010) assumed two latent states: alive vs. dead (from the firm’s point of view). Netzer, Lattin, and Srinivasan (2008) found three states: active vs. occasional vs. dormant. In both cases, the dead/dormant state was assumed to be absorbing, so that once a donor reaches it, they never leave.
Similar to the SIR model dealt with in class, I suggest that we model the process as a compartment model with two underlying states: alive vs. dead. Both Fader, Hardie, and Shang (2010) and Netzer, Lattin, and Srinivasan (2008) employ a discrete-time framework, which is not an extreme assumption for many marketing settings, especially for donation settings, given the low frequency of giving in this context. However, customer relationships have also been modeled in continuous time (Schmittlein, Morrison, and Colombo (1987); Fader, Hardie, and Lee (2005)), and it is not unreasonable to imagine that people gradually lose interest in the charity over time and stop giving. Hence, following Note 11, I model the donor-charity relationship as a compartment model in a continuous-time framework.
where \[\begin{gather} A= \text{Alive/active} \\ D = \text{Dead} \end{gather}\]
Moreover, the constant flow rate is modeled as
\[\mu_{AD} = q \text{, where } q\in[0,1]\] Because we are modeling a fixed cohort, there’s no inflow or outflow. This means that \(N = A+D\) is always constant at 91 people.
\[\begin{aligned} A(t) &= A(0) - N_{AD}(t) \\ D(t) &= D(0) + N_{AD}(t) \end{aligned}\]
Conditional on a donor being alive, following previous work (Fader, Hardie, and Shang (2010); Netzer, Lattin, and Srinivasan (2008)), I model the data as \[ G_t \sim \mathrm{Binomial}(A_t, p) \] where \(G_t\) is the number of donors who gave at month \(t\) and \(A_t\) is the number of donors alive at \(t\).
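The rprocess snippet below implements the death transitions with an Euler-binomial approximation, as in the course notes: over each small time step \(\Delta t\), the number of newly dead donors is drawn as \[ \Delta N_{AD} \sim \mathrm{Binomial}\!\left(A,\ 1 - e^{-q\,\Delta t}\right). \]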
AD_rprocess <- "
  double dN_AD = rbinom(A, 1 - exp(-q*dt));  // donors dying during this Euler step
  A -= dN_AD;
  D += dN_AD;
"
AD_init <- "
A = 91;
D = 0;
"
AD_dmeas <- "
  // the small offset keeps the binomial probability strictly positive
  lik = dbinom(G, A, p + 1e-6, give_log);
"
AD_rmeas <- "
G = rbinom(A, p);
"
AD_fromEstimationScale <- "
Tp = expit(p);
Tq = expit(q);
"
AD_toEstimationScale <- "
Tp = logit(p);
Tq = logit(q);
"
AD_statenames <- c("A", "D")
AD_paramnames <- c("p", "q")
AD_obsnames <- "G"
ad_pomp <- pomp(monthly, time = "month", t0 = 0,
rprocess = euler.sim(step.fun = Csnippet(AD_rprocess), delta.t = 1/4),
initializer = Csnippet(AD_init),
paramnames = AD_paramnames,
statenames = AD_statenames,
obsnames = AD_obsnames,
fromEstimationScale = Csnippet(AD_fromEstimationScale),
toEstimationScale = Csnippet(AD_toEstimationScale),
dmeasure = Csnippet(AD_dmeas), rmeasure = Csnippet(AD_rmeas))
Here, I run a few simple simulations from the pomp object to check how it behaves before going into a deeper analysis. A priori, it is not clear what good ballpark values of the parameters \(p\) and \(q\) would be. Since the model is relatively simple, we can explore a grid of values to find a reasonable starting point.
set.seed(2018)
sim <- NULL
for(p in seq(0.1, 0.9, by = 0.2)){
  for(q in c(0.2, 0.1, 0.05, 0.01, 0.005)){
    sim <- rbind(sim,
                 cbind(simulate(ad_pomp, params = c(p = p, q = q), nsim = 10,
                                as.data.frame = TRUE, include.data = TRUE)[ , c("time", "G", "sim")],
                       p = p, q = q))
  }
}
ggplot(sim, aes(x = time, y = G, group = sim, color = sim=="data")) +
geom_line() + guides(color = F) + facet_grid(p ~ q) +
labs(y = "No. of Donations per Month",
title = "Simulated Number of Donations per Month (vs. observed)")
From the simulations, we can infer that \(p\) values in \([0.5, 0.9]\) and \(q\) values in \([0.005, 0.01]\) are a good ballpark range for the parameter values.
library(foreach)        # %dopar% loops used below
library(doParallel)     # parallel backend
set.seed(2018, kind = "L'Ecuyer")
cl <- makeCluster(2)
registerDoParallel(cl)
mcopts <- list(set.seed = TRUE)
Because we have only two parameters of interest, \(p\) and \(q\), we can visualize the likelihood surface on the \((p, q)\) plane.
gridPQ <- expand.grid(p = seq(0.5, 0.8, length = 30),
q = seq(0.015, 0.001, length = 30))
foreach(theta = iter(gridPQ, "row"),
.combine = rbind, .inorder = F, .options.multicore = mcopts,
.packages = c("pomp")) %dopar% {
pfilter(ad_pomp, params = unlist(theta), Np = 500) -> pf
theta$logLik <- logLik(pf)
theta
} -> gridPQ
pq <- mutate(gridPQ, logLik = ifelse(logLik > max(logLik) -100, logLik, NA))
ggplot(data = pq, aes(x = p, y = q, z = logLik, fill = logLik)) +
  geom_tile(color = NA) + geom_contour(color = 'black', binwidth = 3) +
  scale_fill_gradient() + labs(x = "p", y = "q")
From the log-likelihood surface, \(p \approx 0.73\) and \(q \approx 0.005\) seem to maximize the log likelihood of the model. Next, based on this, we refine the estimates by maximizing the likelihood with iterated filtering (IF2).
run_level <- 6
switch(run_level,
{ad_Np=100; ad_Nmif=10; ad_Neval=10; ad_Nglobal=10; ad_Nlocal=10},
{ad_Np=1000; ad_Nmif=100; ad_Neval=10; ad_Nglobal=10; ad_Nlocal=10},
{ad_Np=5000; ad_Nmif=100; ad_Neval=10; ad_Nglobal=10; ad_Nlocal=10},
{ad_Np=10000; ad_Nmif=100; ad_Neval=10; ad_Nglobal=10; ad_Nlocal=10},
{ad_Np=10000; ad_Nmif=200; ad_Neval=10; ad_Nglobal=10; ad_Nlocal=10},
{ad_Np=20000; ad_Nmif=200; ad_Neval=10; ad_Nglobal=50; ad_Nlocal=10}
)
ad_box <- rbind(
p = c(0.4, 0.9),
q = c(0.003, 0.05)
)
ad.cooling.fraction50 = 0.5
prw.sd = 0.005
qrw.sd = 0.0001
ad.start = c(p = 0.7, q = 0.005)
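# Sketch: randomized starting values could be drawn uniformly from ad_box
# (one draw per parameter); the iterated-filtering run below instead starts
# from the single value ad.start.
draw_start <- function(box) apply(box, 1, function(x) runif(1, min = x[1], max = x[2]))
draw_start(ad_box)   # returns a named vector, e.g. c(p = ..., q = ...)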
stew(file = sprintf("box_eval-%d.rda", run_level), {
t_global <- system.time({
mifs_global <- foreach(i = 1:ad_Nglobal, .packages = 'pomp', .export = ls(globalenv()),
.combine = c, .options.multicore = mcopts) %dopar% {
mif2(ad_pomp,
start = ad.start, Np = ad_Np, Nmif = ad_Nmif,
cooling.type = "geometric", cooling.fraction.50 = ad.cooling.fraction50,
transform = TRUE,
rw.sd = rw.sd(
p = prw.sd,
q = qrw.sd
))}
})}, seed = 22156498, kind = "L'Ecuyer")
# t_global
# user system elapsed
# 9.292 15.351 1545.457
# evaluate likelihood
stew(file = sprintf("lik_global_eval-%d.rda", run_level), {
t_global_eval <- system.time({
liks_global <- foreach(i = 1:ad_Nglobal, .packages = 'pomp', .export = ls(globalenv()),
.combine = rbind, .options.multicore = mcopts) %dopar% {
evals <- replicate(ad_Neval,
logLik(pfilter(ad_pomp, params = coef(mifs_global[[i]]), Np = ad_Np)))
logmeanexp(evals, se = T)}
})
}, seed = 3349478, kind = "L'Ecuyer")
# t_global_eval
# user system elapsed
# 0.286 0.487 50.733
results_global <- data.frame(logLik = liks_global[ , 1],
logLik_se = liks_global[ , 2],
t(sapply(mifs_global, coef)))
summary(results_global$logLik, digits = 5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -590.7 -590.6 -590.6 -590.6 -590.5 -590.4
pairs(~logLik + p + q, data = subset(results_global, logLik > max(logLik) -250))
plot(mifs_global)
Lastly, diagnostic plots reveal that the effective sample sizes are large enough, with a minimum of around 4,000. The convergence plots show that the log likelihood converges by around the 100th iteration and the values of \(p\) by around the 150th iteration. On the other hand, \(q\) seems to show less convergence, but a closer look reveals that the difference between the maximum and minimum values of \(q\) at the 150th iteration is less than 0.0002, suggesting that it too has very likely converged.
From these diagnostics, the model seems to have converged successfully. Moreover, we also get (partial) support for the assumption that the probability of donating (\(p\)) and the probability of death in each time interval (i.e., the death rate \(q\)) can be treated as independent. The estimated values of \(p\) and \(q\) are approximately 0.73 and 0.005, respectively.
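As a rough back-of-the-envelope reading of these point estimates (a sketch using the rounded values above, not part of the fitting procedure): under a constant death rate \(q\), the expected fraction of the cohort still alive at month \(t\) is \(e^{-qt}\), so the expected number of donors giving in month \(t\) is approximately \(91\,p\,e^{-qt}\). (At \(t = 0\) every cohort member gave by construction, so the formula is most meaningful for \(t > 0\).)
# expected number of donors giving per month, under the rounded point estimates
p_hat <- 0.73
q_hat <- 0.005
t_months <- c(50, 100, 200)
round(91 * p_hat * exp(-q_hat * t_months), 1)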
I simplified the models of Fader, Hardie, and Shang (2010) and Netzer, Lattin, and Srinivasan (2008) to a two-latent-state POMP model without heterogeneity. The two latent states represent the alive vs. dead states often employed in the CRM literature. In the alive state, people can choose whether or not to give, whereas once a donor is in the dead state, s/he is considered to have permanently lost interest in the charity and will not come back. The IF2 algorithm seems to have converged successfully, and the model appears to be a good description of the donor-charity relationship, despite its simplicity and lack of heterogeneity.
The advantage of a two-parameter model is that we could visualize the likelihood over the \([0, 1] \times [0, 1]\) parameter space, which made it easy to narrow down a good region in which to refine the estimates.
There are several possible extensions. First, the state space could be enlarged; for example, similar to Netzer, Lattin, and Srinivasan (2008), there could be three states. However, it is not clear to me, a priori, whether a model with three states (active/occasional/dead) could be identified, especially without individual-level data. Second, although not present in this dataset, many charities run campaigns that do not necessarily require donations (e.g., petitions to save the Arctic region). It would be interesting to build a multivariate measurement model for donations and such activities.
In the appendix, I directly simulate data from the behavioral assumptions, using estimated parameters.
set.seed(1210)
N = 91; q = 0.005; p = 0.73; nper = 205
death.time <- data.frame(d.time = (rgeom(91, q)-1))
ggplot(death.time, aes(d.time)) +
geom_histogram(color = "black", fill = "white") +
theme_bw() + labs(title = "Distribution for Time of Death", x = "Months")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
donation <- rbinom(nper*N, 1, p)
dat <- data.frame(donation, d.time = rep(death.time$d.time, each = nper),
t = rep(0:204, N)) %>%
mutate(donation = ifelse(t == 0, 1, donation))
dat <- dat %>% mutate(number = rep(1:N, each = nper))
dat <- dat %>% mutate(time.after.death = ifelse(t > d.time, 1, 0),
donation = ifelse(time.after.death == 0, donation, 0)) %>%
arrange(d.time, number) %>% select(donation, t)
grouped_data <- dat %>% group_by(t) %>%
mutate(nDonation_at_t = sum(donation)) %>% select(-donation) %>% unique() %>%
mutate(sim = "sim")
monthly <- monthly %>% rename(nDonation_at_t = G,
t = month) %>% mutate(sim = "data")
combined_data = bind_rows(grouped_data, monthly)
ggplot(combined_data,
aes(x = t, y = nDonation_at_t, group = sim, color = sim)) +
geom_line() + labs(title = "Simulated vs. Data using estimated parameter values",
y = "No. of Donations made")
Ionides, Edward. 2018. Stats 531 (Winter 2018) Course Notes.
King, Aaron A. “Getting Started with pomp.”
Fader, Peter S, Bruce GS Hardie, and Ka Lok Lee. 2005. “‘Counting Your Customers’ the Easy Way: An Alternative to the Pareto/NBD Model.” Marketing Science 24 (2). INFORMS: 275–84.
Fader, Peter S, Bruce GS Hardie, and Jen Shang. 2010. “Customer-Base Analysis in a Discrete-Time Noncontractual Setting.” Marketing Science 29 (6). INFORMS: 1086–1108.
Netzer, Oded, James M. Lattin, and V. Srinivasan. 2008. “A Hidden Markov Model of Customer Relationship Dynamics.” Marketing Science 27 (2). INFORMS: 185–204.
Schmittlein, David C, Donald G Morrison, and Richard Colombo. 1987. “Counting Your Customers: Who Are They and What Will They Do Next?” Management Science 33 (1). INFORMS: 1–24.