NBA POMP-ELO: A Stochastic System Approach to Modeling Team Strength and Predicting Games

Rockets SLAM Cover (Slam 254, 2025)

Introduction

Basketball, like all sports, is wildly unpredictable. A game filled with runs of hot and cold streaks, any team can win on a given night. However, in all this chaos is there is there an underlying truth? Are some teams just flat out better than another? Can we measure by how much, and even predict games? Luckily there is a traditional method to try and capture such a state, ELO. This simple yet beautiful statistic, introduced by Arpad Elo, sets all teams or players at the same base rating and over time adjusts their score depending on the result of their game and how strong their opponent was (Wikipedia contributors 2025). This baseline statistic even allows for win probabilities to be calculated, and sets it as a solid foundation for the questions we’re set to answer. However, this simple approach fails to account for other factors that go into the game of basketball. A a crucial player could get injured, and teams tend to play better on their home court, all of which would alter their ELO rating on that given day. In this report we’ll introduce what we’ll call POMP-ELO which will try and remedy these concerns by focusing on modeling the team strength and test its predict power for the Houston Rockets. We’ll walk you through our data preparation, our model selection as well as comparison to other baseline models, and finally a conclusion on our results.

More formally we’ll try and answer the questions:

Can we improve ELO’s predictive power by introducing dynamic covariates?

Can we learn what variables are important in understanding team strength?

Data Preparation

While ELO is a wildly known metric there is no readily available data for tracking a teams and their opponents ELO throughout a season. For this reason we had to calculate this ourselves by first pulling all matchups in the 2023-24, and 2024-25 season (164 games) from the database Basketball Reference(“Basketball Statistics & History of Every Team & NBA and WNBA Players” 2025). The data to gather attendance numbers for the games was sourced using ‘nba_api’,(“Nba_api” 2025) a useful Python wrapper which collects and statistical figures related to the NBA. Using the scores and results of each game we then use the formula of ELO to calculate this metric for every team (Wikipedia contributors 2025):

\[ E_{S} = \frac{1}{1+10^{\frac{TS - OP}{400}}} \] \[ TS = TS \pm K\cdot E_{S} \] Where \(TS\) is team strength and \(OP\) opponent strength. Then we updated \(TS\) depending on the result of the game, adding if they won and subtracting if they lost. Furthermore the K-factor, how sensitive ELO adjusts to new results, can be adjusted but we opted to use the results found by Australia Sports Tipping of 20 being the optimal value for basketball (“NBA - Elo Ratings” 2025).

Code for ELO

update_elo <- function(rating1, rating2, result, K = 20) {
  E1 <- 1 / (1 + 10^((rating2 - rating1) / 400))  # Expected score for team 1
  E2 <- 1 / (1 + 10^((rating1 - rating2) / 400))  # Expected score for team 2
  
  if (result == 1) {  # Team 1 wins
    rating1_new <- rating1 + K * (1 - E1)
    rating2_new <- rating2 + K * (0 - E2)
  } else {  # Team 2 wins
    rating1_new <- rating1 + K * (0 - E1)
    rating2_new <- rating2 + K * (1 - E2)
    
  } 
  
  return(c(rating1_new, rating2_new, E1, E2))
}

The following ELO over time came out to look like

We’ll be using the graph of ELO over win probabilities throughout this report as it provides a more comprehensive view on the teams performance throughout the season along with still showing the outcomes of the games. Now have the data for our latent space which we’ll try to model using POMP. To do this we’ll use the following covariates: Average Last 5 game Total BPM, Home, Opponent ELO, and Average Last 5 Opponent Total BPM (depending on the POMP model we chose). These stats were similarly pulled from Basketball Reference using the Box Score statistics for each game.

Here BPM is a statistics that tries to measure how impactful a certain player was in a given game. For the purposes of our model we added the BPM’s for all the starters (5 players that started the game) to get a measure of Total BPM. The rational behind this is that the starters are generally the best players for each team. Thus this measure would give us the best representation of how a team performed on a given night as it’ll be drawn from the most impactful players. To take one step further we then took the average of these Total BPM’s over their last 5 games to get a measure of team momentum. In the case the number of games were less than 5 we then took the average with respect to how many games they played up until that point. This is more clearly seen when we take a look at the data:

## # A tibble: 10 × 3
##    Date                `Total BPM` `Last 5 Games BPM`
##    <dttm>                    <dbl>              <dbl>
##  1 2023-10-25 00:00:00       -10.7               0   
##  2 2023-10-27 00:00:00        -8.2             -10.7 
##  3 2023-10-29 00:00:00        -2.4              -9.45
##  4 2023-11-01 00:00:00        16.3              -7.1 
##  5 2023-11-04 00:00:00        18.3              -1.25
##  6 2023-11-06 00:00:00        18.1               2.66
##  7 2023-11-08 00:00:00        28.5               8.42
##  8 2023-11-10 00:00:00        12.9              15.8 
##  9 2023-11-12 00:00:00        -7.9              18.8 
## 10 2023-11-17 00:00:00        -0.8              14.0

We’ll introduce three POMP models:

Opp ELO as a covariate
Opp ELO as a state itself
Attendance effect on home_court_avd

This will become more clear in our POMP section but under the 2nd condition we’ll use the Average Last 5 BPM for the opponent to then adjust the Opponent strength, while under the 1st it’ll not fluctuate as it’s not a random process like team strength.

Baseline Models

Since we’ll be using POMP-ELO to test its accuracy in predictive the outcome of games we’ll need a couple of baseline models to test. To start…

Logisitic Regression

We’ll define a logistic regression model to serve as a baseline of our understanding how ELO ratings relate to team performance indicators, independent of any latent process. The model is specified as:

\[ \log \left( \frac{P(Win_n = 1)}{P(Win_n = 0)} \right) = \gamma_0 + \gamma_1 \cdot LVBPM_n + \gamma_2 \cdot OLVBPM_n + \gamma_3 \cdot Home_n + \varepsilon_n. \]

where:

\(Win_n\) is the predicted result of the game \(n\) as a binary.
\(LVBPM_n\) is the team’s average Total BPM over the last 5 games.
\(OLVBPM_n\) is the opponent’s average Total BPM over their last 5 games.
\(Home_n\) is a binary indicator for whether the team played at home.
\(\varepsilon_n\) is the residual error term.

This model allows us to assess the direct contribution of observable team performance indicators to the probability of winning a game, without assuming latent states or time dependence.

log_elo <- glm(Win ~ `Last 5 Games BPM` + `Opp Last5  BPM` + Home, data = bpm, family = "binomial")
summary(log_elo)

## 
## Call:
## glm(formula = Win ~ `Last 5 Games BPM` + `Opp Last5  BPM` + Home, 
##     family = "binomial", data = bpm)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)   
## (Intercept)        -0.16112    0.25003  -0.644  0.51933   
## `Last 5 Games BPM`  0.02411    0.02525   0.955  0.33978   
## `Opp Last5  BPM`   -0.04587    0.01715  -2.675  0.00748 **
## Home                0.94051    0.33416   2.815  0.00488 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 224.39  on 163  degrees of freedom
## Residual deviance: 206.54  on 160  degrees of freedom
## AIC: 214.54
## 
## Number of Fisher Scoring iterations: 4

The logistic regression model estimates the probability of winning using observable team performance indicators:

Home-court advantage is a statistically significant predictor (\(p = 0.0049\)), increasing the log-odds of winning by 0.94.
Opponent recent performance (Opp Last5 BPM) is also significant and negative (\(p = 0.0075\)), suggesting stronger opponents reduce win probability.
The team’s own recent performance (Last 5 Games BPM) is not statistically significant.
The model achieves a prediction accuracy of 64.02%, indicating that simple features explain some variation, but likely omit important latent or temporal structure.

## [1] "Pred Acc: 64.02 %"

Base ELO

Predictions using ELO are in the form as defined in our Data Preparation section.

\[ E_{S} = \frac{1}{1+10^{\frac{TS - OP}{400}}} \]

Yet how does it perform?

## [1] "Pred Acc: 57.93 %"

Can see it did worse than logistic Regression by a fair margin, but now we have two metrics to test POMP-ELO against!

POMP

Defining Model

As mentioned, three different approaches we’re taken, but both models followed the same general POMP structure. That being Team Strength (TS) was first adjusted by how well they performed up until the game \(n\) with “LVBPM” being the average of a teams last 5 Total BPM scores. In addition, some noise was added to act as some random event like a player injury that could occur during the game. However, if we’re not careful, TS can grow to always be bigger than Opponent Strength as it can grow to be larger than the scale of Opponent Strength (1800-1300). Under these cases, the the Team will always be predicted to win as their TS is inflated and not actually representative of how good they are. To combat this we added a regulating term to TS which as adjusted using the parameter \(\alpha\)

Pre-adjustment \[ TS_{n} = TS_{n} + \beta_{1}LVBPM_{n} - \alpha(TS_{n}-1500) + \epsilon \]

Following this pre-adjustment phase, TS was further adjusted using the same ELO logic, adding or subtracting TS by the a metric of how strong their opponent was.

The prediction for the winner for each matchup was found using the p in the Bradley Terry Model(“The Bradley-Terry Model” 2016) where \(hca\) was the parameter for home court advantage. This further level of complexity was added as it’s widely known that the home team has a slight advantage in winning a basketball game, thus a boost was given to the home side.
\[ p = \frac{e^{hca\cdot I(Home=1) + team_1}}{e^{hca\cdot I(Home=1)+team_1}+e^{hca\cdot I(Home=0)+team_2}} \]

Post-prediction \[ TS_{n+1} = TS_{n} \pm I(Win=[1,0])(20 \cdot(1 - E)) \] Where E is: \[ E = \frac{1}{1 + 10^{\frac{OPP-TS}{ 400}}} \] The only difference in both models came in how OPP was represented.

OPP as covariate

Here OPP was provided by the data and didn’t change for every simulation while…

OPP as a state

In this case, we had to represent OPP as a noisy measurement drawn using a similar process as TS. Specifically, we adjusted OPP using Opponent Last 5 AVG BPM (OLVBPM) along with some noise as a separate sate variable and the same regulating factor. The intuition behind this is that in the real world both teams are open to the possibilities of a random event occurring. Either a player going down in injury or a team catching fire in the game. Thus it might make more sense when both are random states.

\[ OPP_{n} = OPP_{n} + \beta_{2}OLVBPM + - \alpha(OPP_{n}-1500) + \epsilon \]

We also defined p_win to be a separate state which will be used to store the respective probabilities for each simulation

Code

Code under 1.

rproc <- Csnippet("
  team_strength += beta1 *last5_bpm - alpha * (team_strength - 1500)  + rnorm(0, sigma);
  
  p_win = 1.0 / (1.0 + pow(10, (opp_strength - team_strength) / 400));
  int sim_win = rbinom(1, p_win); 
  
  if (sim_win == 1) {
    team_strength += 20 * (1 - 1/(1+pow(10, (opp_strength - team_strength)/400)));  
  } else {
    team_strength -= 20 * (1 - 1/(1+pow(10, (opp_strength - team_strength)/400)));  
  }
")

dmeas <- Csnippet("
  double p;
  
  double team_score = team_strength / 100.0;
  double opp_score = opp_strength / 100.0;
  double hca = home_court_avd / 100.0;

  double max_val = fmax(team_score, opp_score);
  
  if (home == 1){
    team_score += hca;
  }
  
  p = exp(team_score - max_val) / (exp(team_score - max_val) + exp(opp_score - max_val));

  lik = dbinom(Win, 1, p, give_log);
")

rmeas <- Csnippet("
double p;
  if (home == 1){
  p = exp(home_court_avd + team_strength - opp_strength) / (1 + exp(home_court_avd + team_strength - opp_strength));
  }
  else{
  p = exp(team_strength - (opp_strength + home_court_avd) ) / (1 + exp(team_strength - (opp_strength + home_court_avd)));
  }
  Win = rbinom(1, p);
")

init <- Csnippet("
  team_strength = 1500;
  p_win = .5;
")

bpm %>% select(time,Win,Home,`Last 5 Games BPM`,opp_elo,elo) -> red_bpm

nba_pomp <- pomp(
  data = red_bpm,
  times = "time",
  t0 = 1,
  rprocess = euler(step.fun = rproc, delta.t = 1),
  rmeasure = rmeas,
  dmeasure = dmeas,
  rinit = init,
  statenames = c("team_strength","p_win"),
  paramnames = c("beta1", "sigma", "home_court_avd","alpha"),
  partrans = parameter_trans(
    log = c("alpha")
  ),
  covar = covariate_table(
    times = red_bpm$time,
    last5_bpm = red_bpm$`Last 5 Games BPM`,
    opp_strength = red_bpm$opp_elo,
    home = red_bpm$Home
  ),
  covarnames = c("last5_bpm","opp_strength","home")
)

Change under 2.

rproc2 <- Csnippet("
  team_strength += beta1 *last5_bpm  - alpha * (team_strength - 1500) + rnorm(0, sigma);
  opp_strength += beta2 * opp5_bpm - alpha * (opp_strength - 1500) + rnorm(0, sigma);
  
  p_win = 1.0 / (1.0 + pow(10, (opp_strength - team_strength) / 400));
  int sim_win = rbinom(1, p_win); 
  
  if (sim_win == 1) {
    team_strength += 20 * (1 - 1/(1+pow(10, (opp_strength - team_strength)/400)));  
  } else {
    team_strength -= 20 * (1 - 1/(1+pow(10, (opp_strength - team_strength)/400)));  
  }
")

init2 <- Csnippet("
  team_strength = 1500;
  opp_strength = 1500;
  p_win = .5;
")

bpm %>% select(time,Win,Home,`Last 5 Games BPM`,`Opp Last5  BPM`,elo) -> red_bpm2

nba_pomp2 <- pomp(
  data = red_bpm2,
  times = "time",
  t0 = 1,
  rprocess = euler(step.fun = rproc2, delta.t = 1),
  rmeasure = rmeas,
  dmeasure = dmeas,
  rinit = init2,
  statenames = c("team_strength","opp_strength","p_win"),
  paramnames = c("beta1", "beta2","sigma", "home_court_avd","alpha"),
  covar = covariate_table(
    times = red_bpm2$time,
    last5_bpm = red_bpm2$`Last 5 Games BPM`,
    opp5_bpm = red_bpm2$`Opp Last5  BPM`,
    home = red_bpm2$Home
  ),
  covarnames = c("last5_bpm","opp5_bpm","home")
)

Simulations

The following are simulations for both models under some set parameter values.

Under Model 1

Looking at predictive performance:

## [1] "Pred Acc: 67 %"

Model 2

We can see that both models seem to capture the trend of the ELO with Model 1 having less variance. This is to be expected as in Model 2 we’ve introduced another level of randomness.

Looking at this models predictive performance

## [1] "Pred Acc: 53 %"

Can see the extra randomness which might make sense in theory is far too noisy to improve in this predictive performance. :(

Local Search

Hopefully, under better model parameters we can see an improvement in its prediction accuracy.The local searches for the maximum likelihood for each model is conducted building upon the iterated filtering code discussed in the lectures(Ionides 2025)

Model 1

Model 2

From just these local searches we can see that Model 2 have more noisy estimates of the parameters which makes sense due to the extra randomness in its process. However, it’s interesting to note that the alpha parameter still converges in a smooth manner like in Model 1.

Global Search

Model 1

## # A tibble: 1 × 6
##   beta1 home_court_avd alpha sigma loglik loglik.se
##   <dbl>          <dbl> <dbl> <dbl>  <dbl>     <dbl>
## 1  1.85           89.3 0.571  5.00  -97.3   0.00692

Running simulations on these parameters

## [1] "Pred Acc: 71 %"

Model 2

## # A tibble: 1 × 7
##   beta1 home_court_avd beta2 alpha sigma loglik loglik.se
##   <dbl>          <dbl> <dbl> <dbl> <dbl>  <dbl>     <dbl>
## 1 0.896           75.8  2.48 0.833  5.00  -106.   0.00690

## [1] "Pred Acc: 72 %"

Attendence Effect

While the previous models account for the home court advantage not all teams have the same support from their fans as under our current assumption. This motivates the use of fan attendance to hopefully capture the difference in a teams home court advantage.

Let us visualize how the Home and Attendance are related to each other and give a sense of team support

Similarly, let us observe how the win can be affected using an interaction between the home and attendance through both ARMA and a linear regression model

model <- glm(Win ~`Last 5 Games BPM` + `Opp Last5  BPM` + Home * log(Attendance), data = bpm_att, family = "binomial" )
summary(model)

## 
## Call:
## glm(formula = Win ~ `Last 5 Games BPM` + `Opp Last5  BPM` + Home * 
##     log(Attendance), family = "binomial", data = bpm_att)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)  
## (Intercept)           27.49480   24.39478   1.127   0.2597  
## `Last 5 Games BPM`     0.02597    0.02614   0.993   0.3205  
## `Opp Last5  BPM`      -0.03598    0.01772  -2.031   0.0423 *
## Home                 202.83544   92.74581   2.187   0.0287 *
## log(Attendance)       -2.82580    2.49077  -1.135   0.2566  
## Home:log(Attendance) -20.64451    9.47232  -2.179   0.0293 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 224.39  on 163  degrees of freedom
## Residual deviance: 194.93  on 158  degrees of freedom
## AIC: 206.93
## 
## Number of Fisher Scoring iterations: 5

## [1] "Pred Acc: 65.85 %"

As we can see the inclusion of Attedance increased our models accuracy compared to the logistic regression without it! Let’s see if we see similarly improvements in our POMP.

Now let us finally update the rmeas and the pomp model accordingly. Building off of Model 1 as its a simpler model with less noise in this process. Here we’ll simply adjust home_court_avd using home attendance numbers as that is a good measure in how many people actually follow and support the team. We ended up taking the log of this number as if adding without this rescaling the predicted TS would be much larger than what our actual ELO is measured under, due to Attendance being in the range of \(10^4\)s.

rmeas_att <- Csnippet("
double p;
  if (home == 1){
  p = exp(home_court_avd + team_strength +log(attendance) - opp_strength) / (1 + exp(home_court_avd + log(attendance) + team_strength - opp_strength));
  }
  else{
  p = exp(team_strength - (opp_strength + home_court_avd + log(attendance)) ) / (1 + exp(team_strength - (opp_strength + home_court_avd + log(attendance))));
  }
  Win = rbinom(1, p);
")

bpm_att %>% select(time,Win,Home,`Last 5 Games BPM`,opp_elo,elo, Attendance) -> red_bpm_att

nba_pomp_att <- pomp(
    data = red_bpm_att,
    times = "time",
    t0 = 1,
    rprocess = euler(step.fun = rproc, delta.t = 1),
    rmeasure = rmeas_att,
    dmeasure = dmeas,
    rinit = init,
    statenames = c("team_strength","p_win"),
    paramnames = c("beta1", "sigma", "home_court_avd","alpha"),
    partrans = parameter_trans(
        log = c("alpha")
    ),
    covar = covariate_table(
        times = red_bpm_att$time,
        last5_bpm = red_bpm_att$`Last 5 Games BPM`,
        opp_strength = red_bpm_att$opp_elo,
        home = red_bpm_att$Home,
        attendance = red_bpm_att$Attendance
    ),
    covarnames = c("last5_bpm","opp_strength","home", "attendance")
)

Simulation

## [1] "Pred Acc: 69 %"

Conducting Local Search:

Global search

## # A tibble: 1 × 6
##   beta1 home_court_avd alpha sigma loglik loglik.se
##   <dbl>          <dbl> <dbl> <dbl>  <dbl>     <dbl>
## 1  1.85           89.3 0.571  5.00  -97.3   0.00692

## [1] "Pred Acc: 71 %"

Model Comparison

How did these POMP models due compared to our baselines?

Model	Accuracy (%)
Prediction Accuracy Across Models
Logisitc	64.02
Logisitc_att	65.85
Base_ELO	66.46
POMP_Mod1	71.10
POMP_Mod2	72.50
POMP_att	71.10

To our surprise POMP has resulted in far better performance compared to our baseline models and far out performed base ELO! What’s even more surprising is that Model 2 resulted in our best performing model and was able to stabilize despite the extra randomness added to the procedure.

Let’s see how the simulated predictions change over time compared to ELO.

Can see that the underlying predictions follows very closely with ELO for all models. This isn’t surprising as our POMP-ELO is utilizing the ELO structure just with a different measure for team strength and the use of the Bradley Terry p instead of the ELO version.

What about Type 1 and 2 errors

This plot shows that Model 2 is the best in terms of Type 1 error by a fair margin followed by Model 1 and Base ELO. While Model 1 and Model Attendance make up for in the Type 2 error with Model Attendance actually coming on top!

Conclusion

In such a competitive and results driven world we are often enamored with the concept of ability. How am I doing compared to my peers and where do I rank? How good am I at performing my job; at my craft? Basketball is no exception. With the introduction of ELO we are able to grasp at this measure of team strength in all competitive sports. However, the world is chaotic, with many interacting parts in a complex system, something ELO fails to fully recognize. While POMP-ELO is by no measure a perfect solution, we feel from this report we’ve made a case that it offers an interesting yet still intuitive and effective result in filling in the gaps were ELO falls short.

In fact, from our Model Comparison section we feel we’ve shown that our approach to ELO has drastically improved its predictive power. From the reduction in Type 1 and Type 2 errors and in overall accuracy POMP-ELO shows some encouraging signs.

In terms of variable importance towards team strength it seems any measure of team performance is our best bet in modeling this state as seen in the summary charts from our baseline models. Other outside factors such as fan attendance also seems to show some sign of importance towards modeling this state. Granted, this part of the report could have been more thoroughly analysized, especially in terms of our POMP, but due to time constraints we we’re unable to.

References

“Basketball Statistics & History of Every Team & NBA and WNBA Players.” 2025. https://www.basketball-reference.com/.

Ionides, Edward L. 2025. “Lesson 4. Iterated Filtering: Principles and Practice.” Iterated Filtering Models Lecture, Ann Arbor, Michigan.

“NBA - Elo Ratings.” 2025. https://www.aussportstipping.com/sports/nba/elo_ratings/#:~:text=If%20k%20is%20set%20too,%20optimal%20k%20is%2020.&text=The%20same%20website%20also%20applied,%20their%20NFL%20Elo%20rating%20system.

“Nba_api.” 2025. https://github.com/swar/nba_api.

“The Bradley-Terry Model.” 2016. https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture24.pdf.

Wikipedia contributors. 2025. “Elo Rating System.” https://en.wikipedia.org/wiki/Elo_rating_system.

NBA POMP-ELO: A Stochastic System Approach to Modeling Team Strength and Predicting Games

blinded

2025-04-18

Introduction

Data Preparation

Baseline Models

Logisitic Regression

Base ELO

POMP

Defining Model

Code

Simulations

Local Search

Global Search

Attendence Effect

Model Comparison

Conclusion

References