Flight delays are very common in the airline industry. According to the Bureau of Transportation Statistics, approximately 18% of flights were delayed by more than 15 minutes from 2015 to 2017. Although airlines may compensate delayed passengers, there are no federal requirements regarding compensation. It is therefore in the passenger's interest to choose an airline and an airport with a good on-time record.
This study focuses on two airports in New York City: LaGuardia Airport (LGA) and John F. Kennedy International Airport (JFK). The goal is first to build a prediction model for the delay time at each airport given some covariates, and then to recommend which airport is better for avoiding flight delays based on historical data.
We use a dataset from Kaggle, which was originally provided by the Bureau of Transportation Statistics of the U.S. Department of Transportation (DOT). The dataset covers 1,936,758 domestic flights in 2008, with information on minutes of delay, cancellation (yes/no), cancellation reason, date (year, month, week, day), scheduled departure time, carrier, origin, destination, and so on.
We include only flights that originate at JFK or LGA and whose carrier is one of the following airlines:
(1). American Airlines (AA)
(2). Delta Air Lines (DL)
(3). United Airlines (UA)
(4). Comair (OH)
(5). American Eagle Airlines (MQ)
Since the two airports have different distributions of airlines, we pick these five major airlines, which have similar numbers of flights at the two airports in 2008. The table below gives the number of flights per carrier at each airport.
Airport | 9E | AA | B6 | CO | DL | EV | F9 | FL | MQ | NW | OH | UA | US | XE | YV |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
JFK | 0 | 4030 | 14828 | 285 | 5324 | 111 | 0 | 0 | 2252 | 558 | 5269 | 1287 | 587 | 0 | 175 |
LGA | 618 | 5962 | 989 | 1432 | 4016 | 399 | 297 | 2367 | 4900 | 1725 | 4246 | 2328 | 1574 | 168 | 164 |
Since our goal is a general picture of departure punctuality, we create a daily delay time (DDT) for JFK and LGA, defined as the average delay in minutes over all flights departing on a given day. For example, DDT = 30 at JFK means that flights departing from JFK on that date were delayed by 30 minutes on average. (In the code, this quantity is stored in the column DDS.)
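The filtering and daily averaging described above could be sketched as follows. This is only an illustration on a toy data frame: the column names (`Origin`, `UniqueCarrier`, `DepDelay`, `Year`, `Month`, `DayofMonth`) are assumptions about the flight-level data, and the output columns `DDS` and `Fl_num` are named to match the model-fitting code later in this report.

```r
# Toy flight-level data; the column names are assumptions, not taken from the report.
flights <- data.frame(
  Year = 2008, Month = 1,
  DayofMonth    = c(1, 1, 1, 1, 2, 2, 2),
  Origin        = c("JFK", "JFK", "LGA", "BOS", "JFK", "LGA", "LGA"),
  UniqueCarrier = c("AA", "B6", "DL", "AA", "UA", "MQ", "OH"),
  DepDelay      = c(20, 90, 30, 15, 40, 10, 50)
)

# Keep the two airports and the five selected carriers.
carriers <- c("AA", "DL", "UA", "OH", "MQ")
keep <- flights$Origin %in% c("JFK", "LGA") & flights$UniqueCarrier %in% carriers
sub <- flights[keep, ]
sub$Date <- as.Date(sprintf("%d-%02d-%02d", sub$Year, sub$Month, sub$DayofMonth))

# Daily delay time: mean departure delay per airport and day.
ddt <- aggregate(DepDelay ~ Origin + Date, data = sub, FUN = mean)
names(ddt)[names(ddt) == "DepDelay"] <- "DDS"

# Number of flights per airport and day (same grouping, hence same row order).
fl_num <- aggregate(DepDelay ~ Origin + Date, data = sub, FUN = length)
ddt$Fl_num <- fl_num$DepDelay

ddt
```

Splitting `ddt` by `Origin` would then give one daily series per airport.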
The above figure shows the average delay in minutes at JFK and LGA from 2008-01-01 to 2008-12-31. We also show a log-transformed version for comparison with the original plot. This simple plot suggests several interesting remarks.
First, some spikes at LGA and JFK do not overlap, which indicates opportunities for passengers to choose one airport over the other.
Moreover, the general trend of the daily delay time appears to change around September, which might be due to chance or to a specific cause that we need to explore. This is much more obvious in the log-transformed plot.
Lastly, we might want to model the log-transformed DDT, which looks more “stationary” than the series on the original scale. Further exploration is performed in the following sections.
It is natural to suspect that the number of flights on a given day is correlated with the departure delay in minutes. More flights mean a higher chance of congestion, of problems in coordinating departures, and of operational errors.
The above figure shows the time trend of the total number of flights (for the five major airlines we picked) at JFK and LGA. The number of flights appears to be highly correlated with DDT.
During May-July, when the number of flights surges, we see a clear corresponding increase in the spikes in DDT.
During Sep-Nov, when the number of flights declines, DDT is clearly more stable and lower.
Because of the above observations, we need to assess how much the total number of flights improves the prediction of DDT.
spec$freq[which.max(spec$spec[,1])] # JFK
## [1] 0.002777778
spec$freq[which.max(spec$spec[,2])] # LGA
## [1] 0.002777778
From the smoothed periodogram, we find that the dominant frequency is 0.00278 cycles per day for both JFK and LGA. This corresponds to a period of 1/0.00278 ≈ 360 days, i.e. an annual cycle (not an interesting finding, since we only have one year of data). Besides that, the time series of the daily delay time is very “noisy”, implying that it might not have any seasonal component.
However, we do notice opposite behavior between JFK and LGA. For example, at a frequency of 0.9 (cycles per day), we see high power in the spectrum for LGA but low power for JFK. We can observe many such “opposite direction” instances in the dataset, implying opportunities to choose one airport over the other.
From the observations in Section 2, we noticed a correlation between the daily delay time and the number of daily flights. This is further verified by the above periodogram (Figure 4). The red-shaded line corresponds to the number of flights at the same airport; it shows a very good correspondence with DDT, as also shown in Figure 2.
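As a minimal sketch of how a `spec` object like the one used above might be built and read, the following example applies `spectrum()` to two synthetic series. The series, the 30-day cycle, and the smoothing `spans` are all assumptions chosen for illustration; they are not the settings used for the figures in this report.

```r
set.seed(1)
n <- 360
day <- seq_len(n)

# Two synthetic daily series sharing a 30-day cycle
# (hypothetical stand-ins for log-DDT at JFK and LGA).
jfk <- 2 * sin(2 * pi * day / 30) + rnorm(n, sd = 0.5)
lga <- 2 * cos(2 * pi * day / 30) + rnorm(n, sd = 0.5)

# Smoothed periodogram of both series; frequencies are in cycles per day.
spec <- spectrum(cbind(jfk, lga), spans = c(3, 5, 3), plot = FALSE)

# Dominant frequency per series and the period it corresponds to.
dom_freq <- c(JFK = spec$freq[which.max(spec$spec[, 1])],
              LGA = spec$freq[which.max(spec$spec[, 2])])
dom_period <- 1 / dom_freq
dom_period
```

The conversion `1 / dom_freq` is the same step that turns the reported frequency of 0.002777778 into a period of 360 days.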
In conclusion,
from the frequency domain analysis, we did not observe any dominant cycle within one year.
Also, based on the simple time trend plot, the DDT looks “stationary” except for several outlying points.
Therefore, we will try to fit some ARMA models in the next section.
Since the log-DDT looks “stationary” in Figure 1, we first try to fit a simple \(ARMA(p,q)\) model without any predictor. The values of \(p\) and \(q\) are chosen based on both the AIC and the requirement that the fitted ARMA model be causal and invertible.
Let \(\log(Y_{1:N})\) be the log daily delay time at JFK at times \(t_{1:N}\). We first fit an \(ARMA(p,q)\) model. The same model form is used for LGA.
Specifically, \[\phi(B)(\log(Y_n)-\mu)=\psi(B)\epsilon_n\] where \(\epsilon_n \sim N(0,\sigma^2)\), \(B\) is the backshift operator, and \[\phi(x) = 1-\phi_1x-\phi_2x^2 -\dots-\phi_px^p\] \[\psi(x) = 1+\psi_1x+\psi_2x^2 +\dots+\psi_qx^q\]
JFK | MA0 | MA1 | MA2 | MA3 | MA4 |
---|---|---|---|---|---|
AR0 | 209.1 | 188.0 | 189.9 | 191.9 | 192.3 |
AR1 | 188.7 | 189.9 | 184.7 | 185.6 | 186.4 |
AR2 | 190.2 | 191.9 | 186.0 | 186.9 | 188.4 |
AR3 | 191.0 | 184.6 | 186.7 | 188.5 | 189.8 |
LGA | MA0 | MA1 | MA2 | MA3 | MA4 |
---|---|---|---|---|---|
AR0 | 330.7 | 311.9 | 313.8 | 315.3 | 316.7 |
AR1 | 311.8 | 313.7 | 306.8 | 308.3 | 318.4 |
AR2 | 313.7 | 315.5 | 308.4 | 309.4 | 311.8 |
AR3 | 314.7 | 308.1 | 309.5 | 311.8 | 309.3 |
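AIC tables like the two above could be produced with a small helper such as the one below. This is a sketch, not the code behind the reported numbers: the function name `aic_table`, its optional `xreg` argument, and the synthetic AR(1) series are assumptions made for illustration.

```r
# Table of AIC values for ARMA(p, q) fits with p = 0..P and q = 0..Q.
# Fits that fail are recorded as NA instead of stopping the loop.
aic_table <- function(y, P, Q, xreg = NULL) {
  tab <- matrix(NA_real_, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      fit <- tryCatch(arima(y, order = c(p, 0, q), xreg = xreg),
                      error = function(e) NULL)
      if (!is.null(fit)) tab[p + 1, q + 1] <- fit$aic
    }
  }
  tab
}

set.seed(531)
y <- arima.sim(list(ar = 0.6), n = 300)  # synthetic stand-in for a log-DDT series
tab <- aic_table(y, P = 3, Q = 4)
round(tab, 1)

# Position of the smallest AIC in the table.
which(tab == min(tab, na.rm = TRUE), arr.ind = TRUE)
```

A low AIC alone does not settle the choice: larger models in such a table can fit with roots near the unit circle, which is why causality and invertibility are checked separately.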
From the AIC tables, we select several (p,q) combinations with low AIC and check whether the fitted models are causal and invertible.
For JFK, we try ARMA(3,1), ARMA(1,2), and ARMA(1,3).
For LGA, we try ARMA(1,2), ARMA(3,1), and ARMA(1,3).
However, none of the fitted ARMA models turns out to be both causal and invertible (calculations are hidden here but can be found in the code).
Moreover, the autocorrelation shows a significant repetitive pattern, implying that the DDT is definitely not a white noise process.
Therefore, a simple ARMA model does not capture the dependency between adjacent time points well.
We repeat a process similar to that of Section 3.2, this time adding the number of flights as a predictor.
Let \(\log(Y_{1:N})\) be the log daily delay time at JFK at times \(t_{1:N}\), and let \(\log(X_{1:N})\) be the log number of flights. We fit a linear regression with ARMA errors, taking ARMA(1,1) errors as a simple starting point.
Specifically, \[ \log(Y_n) = \alpha + \beta \log(X_n) + \epsilon_n\] where \[\phi(B)\epsilon_n=\psi(B)w_n\] with \(w_n \sim N(0,\sigma^2)\).
JFK | MA0 | MA1 | MA2 | MA3 | MA4 |
---|---|---|---|---|---|
AR0 | 93.8 | 90.5 | 90.0 | 91.4 | 92.9 |
AR1 | 91.6 | 91.2 | 91.7 | 93.0 | 94.9 |
AR2 | 89.3 | 91.3 | 91.7 | 84.2 | 94.7 |
AR3 | 91.2 | 92.9 | 85.3 | 86.0 | 86.7 |
LGA | MA0 | MA1 | MA2 | MA3 | MA4 |
---|---|---|---|---|---|
AR0 | 259.9 | 250.5 | 252.4 | 254.1 | 256.0 |
AR1 | 251.3 | 252.4 | 254.2 | 254.9 | 256.7 |
AR2 | 252.2 | 254.1 | 256.1 | 248.8 | 258.2 |
AR3 | 254.2 | 256.1 | 257.6 | 258.2 | 256.6 |
We first notice that all the values in the AIC tables are much smaller than those in Section 3.2.
For JFK, we try ARMA(2,3), ARMA(3,2), and ARMA(3,3).
For LGA, we try ARMA(2,3), AR(3), and MA(3).
Based on the roots of the AR and MA polynomials of the fitted models above (calculations are hidden here but can be found in the code), we select the simplest model that is causal and invertible. We therefore select ARMA(2,3) for JFK and AR(3) for LGA.
Since we did not find any causal and invertible ARMA model in Section 3.2, we do not consider those models further, and we believe that adding the number of flights is key to a precise prediction of the daily delay time.
In conclusion, we regress log-DDT on log(number of flights) with ARMA errors:
For the JFK data, ARMA(2,3) errors fit best.
For the LGA data, AR(3) errors fit best.
fit1 = arima(log(daily_JFK$DDS),order = c(2,0,3),xreg = log(daily_JFK$Fl_num))
fit1
##
## Call:
## arima(x = log(daily_JFK$DDS), order = c(2, 0, 3), xreg = log(daily_JFK$Fl_num))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 intercept
## 0.5777 -0.9632 -0.4635 0.8895 0.1734 2.4627
## s.e. 0.0159 0.0169 0.0555 0.0376 0.0560 0.1237
## log(daily_JFK$Fl_num)
## 0.3695
## s.e. 0.0332
##
## sigma^2 estimated as 0.06985: log likelihood = -34.12, aic = 84.24
fit2 = arima(log(daily_LGA$DDS),order = c(3,0,0),xreg = log(daily_LGA$Fl_num))
fit2
##
## Call:
## arima(x = log(daily_LGA$DDS), order = c(3, 0, 0), xreg = log(daily_LGA$Fl_num))
##
## Coefficients:
## ar1 ar2 ar3 intercept log(daily_LGA$Fl_num)
## 0.1797 -0.0537 -0.011 2.9003 0.2402
## s.e. 0.0530 0.0535 0.053 0.1053 0.0285
##
## sigma^2 estimated as 0.1147: log likelihood = -121.08, aic = 254.16
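The causality and invertibility check mentioned earlier can be sketched from the printed JFK coefficients. This is an illustration rather than the hidden calculation itself; it relies on the sign convention of `arima()`, where the AR polynomial is \(1-\phi_1x-\dots\) and the MA polynomial is \(1+\psi_1x+\dots\), and on coefficients rounded to four decimals.

```r
# Coefficients copied from the printed JFK fit (rounded to 4 decimals).
ar <- c(0.5777, -0.9632)
ma <- c(-0.4635, 0.8895, 0.1734)

# Roots of phi(x) = 1 - ar1*x - ar2*x^2 and psi(x) = 1 + ma1*x + ma2*x^2 + ma3*x^3.
ar_roots <- polyroot(c(1, -ar))
ma_roots <- polyroot(c(1, ma))

Mod(ar_roots)  # causal if every modulus is greater than 1
Mod(ma_roots)  # invertible if every modulus is greater than 1
```

Moduli very close to 1 should be read with care, since the printed coefficients are rounded. With the fitted object at hand, the unrounded values from `coef(fit1)` would be the better input.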
Diagnostics
From the Q-Q plots of the residuals of the final models, we see that the residuals are approximately normal in the middle of the distribution but not at the extremes. This is expected, since extreme delay times are probably due to extreme weather, which cannot be explained by a stationary causal model. Our recommendations below should therefore be taken with caution.
The autocorrelation plots show that most residual autocorrelations are well within the 95% bounds for a white noise process, implying a reasonably good fit.
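Residual diagnostics of this kind could be produced along the following lines. The example uses simulated data and an AR(1) error model purely for illustration; the variable names and parameter values are assumptions, and the Ljung-Box test is an extra check that is not part of the analysis above.

```r
set.seed(2008)
n <- 300
x <- runif(n, 10, 100)                              # hypothetical daily flight counts
eps <- arima.sim(list(ar = 0.3), n = n, sd = 0.3)   # AR(1) errors
log_y <- 2.5 + 0.3 * log(x) + as.numeric(eps)       # simulated log-DDT

fit <- arima(log_y, order = c(1, 0, 0), xreg = log(x))
res <- residuals(fit)

# Normality: points should follow the line, at least in the middle.
qqnorm(res)
qqline(res)

# Remaining dependence: bars should mostly stay inside the dashed bounds.
acf(res)

# Portmanteau test on the residuals; fitdf = 1 accounts for the AR(1) term.
Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
```

For the models in this report, `fit` would be replaced by `fit1` or `fit2`.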
Now that we have built two time series models for the daily delay time (DDT), two questions are of interest:
Which airport is more sensitive to the number of flights? What are the implications?
If the two airports have the same number of flights on a given day, is there a better choice for avoiding flight delays?
95% profile confidence interval for the slope on the number of flights:
To understand whether the number of flights is a significant predictor in the ARMA model, and how its effect differs between the airports, we compute a profile confidence interval for the slope.
## [1] "The estiamted slope (JFK) is 0.37 with 95% profile CI [0.3,0.43]"
## [1] "The estiamted slope (LGA) is 0.24 with 95% profile CI [0.18,0.3]"
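One way such a profile interval might be computed is sketched below on simulated data. The approach shown (fixing the slope at each grid value, refitting the ARMA part to `y - b * x`, and keeping slopes whose log-likelihood is within `qchisq(0.95, 1) / 2` of the maximum) is a standard construction and is an assumption about the method, not the code used for the intervals above. The AR(1) error model, grid width, and parameter values are likewise illustrative.

```r
set.seed(42)
n <- 300
x <- log(runif(n, 10, 100))                 # hypothetical log flight counts
y <- 2.5 + 0.3 * x + as.numeric(arima.sim(list(ar = 0.3), n = n, sd = 0.3))

# Full fit; coefficients are ordered ar1, intercept, slope.
fit <- arima(y, order = c(1, 0, 0), xreg = x, method = "ML")
beta_hat <- unname(coef(fit)[3])

# Profile log-likelihood: fix the slope at b, maximise over all other parameters.
profile_loglik <- function(b) {
  arima(y - b * x, order = c(1, 0, 0), method = "ML")$loglik
}
grid <- seq(beta_hat - 0.3, beta_hat + 0.3, length.out = 121)
ll <- sapply(grid, profile_loglik)

# 95% interval: slopes within qchisq(0.95, 1) / 2 (about 1.92) of the maximum.
ci <- range(grid[ll > max(ll) - qchisq(0.95, df = 1) / 2])
round(c(estimate = beta_hat, lower = ci[1], upper = ci[2]), 2)
```

The grid must be wide enough to contain both interval endpoints; if an endpoint coincides with the edge of the grid, the grid should be widened.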
Both intervals exclude zero, so the number of flights is clearly informative. The number of flights also seems to have a stronger influence on the daily delay time at JFK than at LGA. Although the slope relates log(number of flights) to log(DDT), we can interpret it in a more meaningful way through simulations (see the next section).
Quantile | 10% | 25% | 50% | 75% | 90% |
---|---|---|---|---|---|
JFK daily delay time (minutes) | 31.5 | 37 | 44.9 | 53.9 | 69.9 |
LGA daily delay time (minutes) | 27.4 | 33 | 42.4 | 55.4 | 70.7 |
JFK number of flights | 22.9 | 29 | 39.5 | 55.0 | 71.1 |
LGA number of flights | 15.9 | 24 | 41.0 | 62.2 | 85.0 |
For simplicity, let us compare the daily delay time when the number of flights is:
10 (“perfect” time)
50 (“average” time)
100 (“worst” time)
Above are two simulated ARMA series, at the “perfect” and “worst” times. We expect the long-run mean of the delay time to be constant, since the model is stationary. However, it is more interesting to know how many times the series hits “high peaks” within one year. We therefore repeat the simulation 10,000 times and count the days within a one-year period on which the daily delay time exceeds 5, 10, 15, and 30 minutes.
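The counting procedure described above could be organised as in the following sketch. It is not the simulation code behind Figure 8: the function name `simulate_exceedances`, the AR-only error structure, the back-transformation with `exp()`, and all numeric parameter values are placeholders chosen for illustration, and far fewer replications are used than the 10,000 mentioned in the text.

```r
# For each simulated year, count the days on which the simulated daily delay
# exceeds each threshold, then average the counts over replications.
simulate_exceedances <- function(alpha, beta, ar, sigma, n_flights,
                                 thresholds = c(5, 10, 15, 30),
                                 n_days = 366, n_rep = 200) {
  mean_log <- alpha + beta * log(n_flights)      # regression part on the log scale
  counts <- replicate(n_rep, {
    eps <- arima.sim(list(ar = ar), n = n_days, sd = sigma)
    ddt <- exp(mean_log + as.numeric(eps))       # back to minutes (assumption)
    sapply(thresholds, function(th) sum(ddt > th))
  })
  setNames(rowMeans(counts), paste0(">", thresholds, "min"))
}

set.seed(1)
# Placeholder parameter values for illustration only (not the fitted estimates).
res10  <- simulate_exceedances(alpha = 1, beta = 0.3, ar = 0.2, sigma = 0.5,
                               n_flights = 10)
res100 <- simulate_exceedances(alpha = 1, beta = 0.3, ar = 0.2, sigma = 0.5,
                               n_flights = 100)
rbind(flights_10 = res10, flights_100 = res100)
```

Keeping the regression part and the error simulation as separate arguments makes it easy to plug in the estimates of either airport and any number of flights.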
From Figure 8, we can see that
Most delays are actually around 10-15 minutes (not 5-10 minutes).
JFK has more days of departure delays than LGA in all four categories (>5, >10, >15, >30).
Little difference is observed between different numbers of flights, implying that the predictor is statistically significant but might not have a large enough practical effect on the delay time.
In total, there are around 6 days of delayed departures at JFK and 5 days at LGA, which is far more optimistic than the 18% figure quoted in the Introduction.
However, remember that we took the mean delay of all flights during one day. Therefore, a Daily Delay Time (DDT) of 5 minutes means that flights at the airport on that date were delayed by 5 minutes on average, which implies either successive delays for most flights or extremely long delays for some flights.
DDT:
Before we provide the final conclusion, we would like to discuss some pros and cons of creating a daily delay time (DDT), which is the mean delay of all flights in one day.
Selection of 5 airlines:
We selected only 5 major airlines at both airports to make the results comparable. However, departure delays should be expected to depend not only on the airport but also on the carrier and the destination. In the future, finer segmentation should be carried out, with a detailed analysis of each case. Nevertheless, this study aims to provide a general comparison between the two airports, and we therefore think our procedure is reasonable.
ARMA Model:
Every model is wrong. If we could find a model that fits the data perfectly, it would be overfitted and would not generalize. Our ARMA model describes the time series of DDT assuming that everything is as usual (no extremely bad weather, no rare events). It is important to remember this assumption and not to over-generalize our final conclusion.
We found that the number of flights is a statistically significant predictor but does not have a practically large influence on the daily delay time (DDT).
JFK seems to have more days of departure delays than LGA.
We recommend that passengers book flights departing from LaGuardia Airport rather than John F. Kennedy International Airport on a normal day (to avoid possible flight delays).
[1] Statistics Source: Bureau of Transportation Statistics. https://www.transtats.bts.gov/HomeDrillChart.asp
[2] Statistics Source: Delayed and Cancelled Flights. U.S. Department of Transportation. https://www.transportation.gov/airconsumer/fly-rights
[3] Data Source: Kaggle. https://www.kaggle.com/giovamata/airlinedelaycauses
[4] Knowledge Source: Winter 2016 Midterm Exam. https://ionides.github.io/531w18/exam/w16/mt531w16.pdf
[5] Knowledge Source: Previous Midterm Project (“Midterm Project - Monthly Fatal Crashes in Michigan”). https://ionides.github.io/531w16/midterm_project/project17/midterm_project_-_montyly_fatal_crashes_in_michigan.html
[6] Knowledge Source: Previous Midterm Project (“A Study on Crude Oil Price and CPI Value”). https://ionides.github.io/531w16/midterm_project/project1/Stats_531_Midterm_Project.html
[7] Knowledge Source: Lecture Notes 3. https://ionides.github.io/531w18/03/notes03.html
[8] Knowledge Source: Lecture Notes 5. https://ionides.github.io/531w18/05/notes05.html
[9] Knowledge Source: Lecture Notes 7. https://ionides.github.io/531w18/07/notes07.html