A reasonable selection of three questions.
A new source of time series data from the Bureau of Transportation Statistics, with careful documentation of how the dataset was derived.
A good decision to put some extra technical details in an appendix.
The results are reproducible from the provided code.
The project’s contribution is placed in the context of previous 531 projects.
“The p-value of the ADF test for the weekly delays is 0.0484, indicating that the weekly time series data is stationary, and that we do not need to apply differencing before proceeding with time series modeling.” This assertion is problematic on several levels: (i) the p-value is only marginally below 0.05, the test size stated in the previous paragraph, so the rejection is borderline; (ii) rejecting the unit-root null does not establish stationarity, since the data may be better explained by a model with a trend than by a random walk model.
“To more formally test for periodicity, we use the stl function.” However, stl decomposes the series under an assumed seasonal period; it assumes seasonality rather than testing for it.
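One way to check for periodicity rather than assume it is to look at the periodogram at the candidate frequency. The sketch below is only an illustration on simulated data, not the project's analysis: the helper name `relative_power`, the period of 52 weeks, the series length and the noise level are all assumptions chosen for the example.

```python
import numpy as np

def relative_power(x, period):
    """Periodogram power at frequency 1/period, relative to the median power.

    A large ratio suggests a cycle at that period; a ratio of order 1
    is what pure noise would typically give.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                          # remove the mean before the FFT
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n   # raw periodogram
    freqs = np.fft.rfftfreq(n, d=1.0)         # cycles per observation
    k = int(np.argmin(np.abs(freqs - 1.0 / period)))
    return power[k] / np.median(power[1:])    # skip the zero frequency

# Simulated (hypothetical) weekly data: ten years, with and without a yearly cycle.
rng = np.random.default_rng(0)
t = np.arange(520)
noise = rng.normal(size=t.size)
seasonal = 2.0 * np.sin(2 * np.pi * t / 52) + noise

ratio_seasonal = relative_power(seasonal, 52)
ratio_noise = relative_power(noise, 52)
print(ratio_seasonal, ratio_noise)
```

A smoothed periodogram, or a comparison of models with and without a seasonal term, would serve the same purpose; the point is only that the evidence for a cycle should come from the data.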
“We see both positive and negative fluctuations around the trend line.” This is a mathematical necessity, not an insight. A fitted trend line always passes through the center of the data.
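The necessity can be seen numerically. The sketch below assumes the trend line is a least-squares fit with an intercept (the review does not say how the project fitted its trend) and uses an arbitrary simulated series: the residuals sum to zero up to rounding, so they must take both signs unless the fit is perfect.

```python
import numpy as np

# Arbitrary simulated series (a random walk), used only for illustration.
rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
y = rng.normal(size=t.size).cumsum()

# Least-squares line with intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(t, y, deg=1)
residuals = y - (intercept + slope * t)

# The normal equations force the residuals to sum to zero.
residual_sum = residuals.sum()
has_both_signs = bool((residuals > 0).any() and (residuals < 0).any())
print(residual_sum, has_both_signs)
```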
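“Difference of 2.11 could be due to model complexity”. Not quite; it is mathematically impossible for this to be explained by nested parameters. Adding a parameter to a nested model cannot lower the maximized log likelihood, so the AIC can rise by at most 2 per added parameter. A larger increase points to imperfect numerical maximization of the likelihood.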
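This check can be automated over an AIC table. The sketch below assumes a table indexed by (p, q) for ARMA(p, q) models, where moving one step down or right adds exactly one parameter; the function name `aic_inconsistencies` and the AIC values are made up for illustration and are not the project's numbers.

```python
import numpy as np

def aic_inconsistencies(aic, tol=1e-8):
    """Flag neighbouring nested models whose AIC rises by more than 2.

    aic[p, q] is the AIC of ARMA(p, q). Each returned item is
    (smaller_model, larger_model, aic_increase).
    """
    aic = np.asarray(aic, dtype=float)
    n_p, n_q = aic.shape
    flagged = []
    for p in range(n_p):
        for q in range(n_q):
            # One extra AR parameter.
            if p + 1 < n_p and aic[p + 1, q] - aic[p, q] > 2 + tol:
                flagged.append(((p, q), (p + 1, q), aic[p + 1, q] - aic[p, q]))
            # One extra MA parameter.
            if q + 1 < n_q and aic[p, q + 1] - aic[p, q] > 2 + tol:
                flagged.append(((p, q), (p, q + 1), aic[p, q + 1] - aic[p, q]))
    return flagged

# Hypothetical AIC table: rows p = 0, 1, 2 and columns q = 0, 1.
example = np.array([[510.00, 505.20],
                    [504.10, 503.00],
                    [506.21, 504.50]])
flagged = aic_inconsistencies(example)
print(flagged)
```

In this made-up table only the step from ARMA(1,0) to ARMA(2,0) is flagged, with an increase of about 2.11.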
The choice of ARMA model is not sufficiently defended. They chose ARMA(1,1) since it had a reasonable AIC, despite the fact that ARMA(3,1) had a better AIC. Their preference for a simpler model makes some sense but would have been strengthened by more explanation. For instance, why didn’t they do a likelihood ratio test? They also could have plotted the roots of the fitted ARMA(3,1) model. If one of them were close to the unit circle, that would have been a stronger argument in favor of the ARMA(1,1) model.
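Both suggested checks are short to carry out. The sketch below is illustrative only: the log likelihoods and AR coefficients are hypothetical placeholders, and the function names are invented here. It assumes the usual chi-squared approximation for the likelihood ratio statistic; ARMA(3,1) has two more parameters than ARMA(1,1), and for 2 degrees of freedom the chi-squared tail probability is exactly exp(-x/2). The AR polynomial is written as 1 - phi_1 z - ... - phi_p z^p, so a causal model has all roots outside the unit circle.

```python
import math
import numpy as np

def lrt_pvalue_df2(loglik_small, loglik_big):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p-value) under the chi-squared(2) approximation,
    whose survival function is exp(-x / 2).
    """
    stat = 2.0 * (loglik_big - loglik_small)
    return stat, math.exp(-max(stat, 0.0) / 2.0)

def ar_root_moduli(phi):
    """Moduli of the roots of 1 - phi_1 z - ... - phi_p z^p."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant.
    coeffs = np.r_[-phi[::-1], 1.0]
    return np.abs(np.roots(coeffs))

# Hypothetical maximized log likelihoods for ARMA(1,1) and ARMA(3,1).
stat, pval = lrt_pvalue_df2(-250.0, -247.0)
print(stat, pval)

# Hypothetical ARMA(3,1) AR coefficients; a modulus near 1 would signal
# a root close to the unit circle.
moduli = ar_root_moduli([0.5, 0.2, 0.1])
print(moduli)
```

The same root calculation applies to the MA polynomial; an AR root nearly equal to an MA root would suggest that the larger model contains a near-cancelling pair.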
Since forecasting was a stated goal of the project, it would be good to include forecasts in the project.