1. Introduction
2. Data Preparation & Overview
3. Build Prediction Model on each airport separately
4. JFK or LGA ?
- 4.1 Sensitivity to number of flights
- 4.2 Comparison with same number of flights
5. Discussion & Conclusion
- 5.1 Discussion
- 5.2 Conclusion
6. Reference & Source

1. Introduction

Flight delays are very common in the airline industry. According to the Bureau of Transportation Statistics, approximately 18% of flights were delayed by more than 15 minutes from 2015 to 2017. Although airlines may compensate delayed passengers, there are no federal requirements regarding compensation. It is therefore in the passenger's interest to choose an airline and airport with good on-time records.

This study focuses on two airports in New York City, LaGuardia Airport (LGA) and John F. Kennedy International Airport (JFK). The goal is first to build a prediction model of delay time for each airport given some covariates, and then to recommend which airport is better for avoiding flight delays based on historical data.

2. Data Preparation & Overview

2.1 Data Description

We use a dataset from Kaggle, originally provided by the Bureau of Transportation Statistics of the DOT. It contains 1,936,758 domestic flights in 2008, with information on minutes of delay, cancellation (yes/no), cancellation reason, date (year, month, week, day), scheduled departure minute, carrier, origin, destination, and so on.

2.2 Exclusion/Inclusion

We only include flights with origin in JFK and LGA whose carriers are one of the following airlines:

(1). American Airlines (AA)

(2). Delta Airlines (DL)

(3). United Airlines (UA)

(4). Comair Airline (OH)

(5). American Eagle Airlines (MQ)

Since the two airports have different mixes of airlines, we pick these five major airlines, which have broadly similar numbers of flights at both airports in 2008.

Total number of flights by airline in 2008
	9E	AA	B6	CO	DL	EV	F9	FL	MQ	NW	OH	UA	US	XE	YV
JFK	0	4030	14828	285	5324	111	0	0	2252	558	5269	1287	587	0	175
LGA	618	5962	989	1432	4016	399	297	2367	4900	1725	4246	2328	1574	168	164

2.3 Daily Delay Time

To obtain a general measure of departure punctuality, we create a daily delay time (DDT) for JFK and LGA, defined as the average delay in minutes over all flights on a given day. For example, DDT = 30 at JFK means that flights departing from JFK on that date were delayed by 30 minutes on average.
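The construction of DDT can be sketched in R as follows. This is a minimal illustration on a toy data frame, not the code used for the report: the column names `Origin`, `Date` and `DepDelay` are assumptions and may not match the Kaggle file, while `Fl_num` follows the name used in the model-fitting code later in this report.

```r
# Toy flight-level data; column names are assumed for illustration.
flights <- data.frame(
  Origin   = c("JFK", "JFK", "JFK", "LGA", "LGA", "LGA"),
  Date     = as.Date(c("2008-01-01", "2008-01-01", "2008-01-02",
                       "2008-01-01", "2008-01-02", "2008-01-02")),
  DepDelay = c(20, 40, 15, 30, 10, 50)
)

# DDT: mean departure delay (minutes) per airport and day.
ddt <- aggregate(DepDelay ~ Origin + Date, data = flights, FUN = mean)
names(ddt)[names(ddt) == "DepDelay"] <- "DDT"

# Number of flights per airport and day.
n_flights <- aggregate(DepDelay ~ Origin + Date, data = flights, FUN = length)
names(n_flights)[names(n_flights) == "DepDelay"] <- "Fl_num"

# One row per airport and day, with both summaries side by side.
daily <- merge(ddt, n_flights, by = c("Origin", "Date"))
daily <- daily[order(daily$Origin, daily$Date), ]
print(daily)
```

Splitting `daily` by `Origin` would then give one daily series per airport.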

Fig1: Time trend of Daily Delayed Time (DDT) for JFK and LGA

The above figure shows the average delay in minutes at JFK and LGA from 2008-01-01 to 2008-12-31. We also show a log-transformed version for comparison with the original scale. Some interesting remarks follow from this simple observation.

First, some spikes at LGA and JFK do not overlap, indicating opportunities for passengers to choose one airport over the other.
Moreover, the general trend of DDT appears to change around September, which might be due to chance or to some specific reason that we need to explore. This is much more obvious in the log-transformed time trend plot.
Lastly, we might want to model the log-transformed DDT, which looks more “stationary” than the original scale. Further exploration is performed in the following sections.

2.4 Number of Flights

It is natural to suspect that the number of flights on a given day is correlated with the departure delay in minutes. More flights mean a higher chance of congestion in departure coordination and of operational errors.

Fig 2: Total number of flights per day vs log-DDT in JFK and LGA

The above figure shows the time trend of the total number of flights (for the five major airlines we picked) at JFK and LGA. The number of flights appears to be highly correlated with DDT.

During May-July, when there is a surge in the number of flights, we see a clear corresponding increase in DDT spikes.
During Sep-Nov, when the number of flights declines, we see a clearly more stable and lower level of DDT.

Because of the above observations, we need to assess how much the total number of flights improves the prediction of DDT.
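A quick first check of this relationship, before any time series modelling, is the correlation and least-squares slope between log number of flights and log-DDT. The sketch below uses simulated data with an assumed positive slope of 0.3 (a made-up value); it ignores autocorrelation, which the models in Section 3 account for.

```r
set.seed(1)
n <- 200
fl_num <- round(runif(n, 10, 100))   # simulated daily flight counts

# Simulated log-DDT with an assumed positive dependence on log flights.
log_ddt <- 2.5 + 0.3 * log(fl_num) + rnorm(n, sd = 0.3)

r <- cor(log(fl_num), log_ddt)       # Pearson correlation on the log scale
ols <- lm(log_ddt ~ log(fl_num))     # least-squares slope, ignoring dependence
print(r)
print(coef(ols))
```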

3. Build Prediction Model on each airport separately

3.1 Frequency domain analysis for outcome and predictor

Fig 3. Smoothed periodogram for DDT in JFK and LGA

spec$freq[which.max(spec$spec[,1])] # JFK

## [1] 0.002777778

spec$freq[which.max(spec$spec[,2])] # LGA

## [1] 0.002777778

By plotting the smoothed periodogram, we find that the dominant frequency is 0.00278 cycles per day for both JFK and LGA, corresponding to a period of about 360 days, i.e. an annual cycle (not an interesting finding, since we only have one year of data). Besides that, the time series of daily delayed time is very “noisy”, implying that it might not have any seasonal component.

However, we do notice opposite behaviour between JFK and LGA. For example, at a frequency of 0.9 (cycles per day), we see high spectral power for LGA but low power for JFK. We can observe many such “opposite direction” instances in the data, implying opportunities to choose one airport over the other.
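The conversion from dominant frequency to period can be sketched on a simulated series. The series length of 360, the smoothing `spans` and the simulated signal are all assumptions made for illustration. With a (padded) length of 360, the lowest frequency the periodogram can resolve is 1/360 = 0.002777778, the same value reported above.

```r
set.seed(2)
n <- 360
t <- seq_len(n)

# Simulated noisy daily series with one slow cycle over the whole record.
x <- 3.8 + 0.3 * sin(2 * pi * t / n) + rnorm(n, sd = 0.3)

# Smoothed periodogram; the spans are an assumed smoothing choice.
sp <- spectrum(x, spans = c(3, 5, 3), plot = FALSE)

lowest_freq   <- sp$freq[1]                   # 1 / (padded series length)
dominant_freq <- sp$freq[which.max(sp$spec)]  # frequency with highest power
period_days   <- 1 / dominant_freq            # cycles per day -> days per cycle
print(c(lowest = lowest_freq, dominant = dominant_freq, period = period_days))
```

A peak at the lowest resolvable frequency mostly reflects slow variation over the whole record, so one year of data cannot tell whether it is a true annual cycle.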

Fig 4. Smoothed periodogram for DDT and number of daily flights in JFK and LGA

From the observations in Section 2, we noticed a correlation between daily delayed time and the number of daily flights. This is further verified by the above periodogram (Figure 4). The red-shaded line corresponds to the number of flights at the same airport, and it corresponds closely with DDT, as shown in Figure 2.

In conclusion:

From the frequency domain analysis, we did not observe any dominant cycle within one year.
Based on the simple time trend plot, the DDT looks “stationary” except for several outlier points.
Therefore, we will try to fit some ARMA models in the next section.

3.2 Fitting a simple ARMA Model without any predictor

Since the log-DDT looks “stationary” in Figure 1, we first try to fit a simple \(ARMA(p,q)\) model without any predictor. The values of \(p\) and \(q\) are chosen based on AIC and on whether the fitted ARMA model is causal and invertible.

Let \(\log(Y_{1:N})\) be the log daily delayed time at JFK at times \(t_{1:N}\). We first fit an \(ARMA(p,q)\) model.

Specifically, \[\phi(B)(\log(Y_n)-\mu)=\psi(B)\epsilon_n\] where \(\epsilon_n \sim N(0,\sigma^2)\), \(B\) is the backshift operator, and \[\phi(x) = 1-\phi_1x-\phi_2x^2 -\dots-\phi_px^p\] \[\psi(x) = 1+\psi_1x+\psi_2x^2 +\dots+\psi_qx^q\]

AIC for ARMA Model without predictor (JFK)
	MA0	MA1	MA2	MA3	MA4
AR0	209.1	188.0	189.9	191.9	192.3
AR1	188.7	189.9	184.7	185.6	186.4
AR2	190.2	191.9	186.0	186.9	188.4
AR3	191.0	184.6	186.7	188.5	189.8

AIC for ARMA Model without predictor (LGA)
	MA0	MA1	MA2	MA3	MA4
AR0	330.7	311.9	313.8	315.3	316.7
AR1	311.8	313.7	306.8	308.3	318.4
AR2	313.7	315.5	308.4	309.4	311.8
AR3	314.7	308.1	309.5	311.8	309.3

From the AIC tables, we select several (p,q) combinations with low AIC and check whether they are causal and invertible:

For JFK, we decided to try ARMA(3,1), ARMA (1,2), ARMA(1,3)
For LGA, we decided to try ARMA(1,2), ARMA(3,1), ARMA(1,3)
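AIC tables of this kind can be produced with a small helper that loops over p and q. The sketch below is one way to write it, shown on a simulated AR(1) series rather than the flight data. The helper name `aic_table` and the optional `xreg` argument (for fits with a regressor) are our own choices here.

```r
# Tabulate AIC for ARMA(p, q) fits with p = 0..P and q = 0..Q.
aic_table <- function(x, P, Q, xreg = NULL) {
  tab <- matrix(NA_real_, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # Some fits may fail numerically; leave NA in that case.
      tab[p + 1, q + 1] <- tryCatch(
        arima(x, order = c(p, 0, q), xreg = xreg)$aic,
        error = function(e) NA_real_
      )
    }
  }
  tab
}

set.seed(3)
x <- arima.sim(model = list(ar = 0.6), n = 300)  # simulated AR(1) series
tab <- aic_table(x, P = 2, Q = 2)
print(round(tab, 1))
```

Low AIC only shortlists candidates; each shortlisted fit still needs the causality and invertibility check.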

Fig5. ACF plot of DDT and log-DDT

However, it seems that none of the fitted ARMA models is both causal and invertible (calculation hidden; it can be found in the code).
Moreover, the autocorrelation shows a significant repetitive pattern, implying that the DDT is definitely not a white noise process.
Therefore, a simple ARMA model does not capture the dependency between adjacent time points well.
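The causality and invertibility check amounts to computing the roots of the fitted AR and MA polynomials and verifying that they all lie outside the unit circle. Since the original calculation is hidden, the following is only a sketch of the standard check, run on a simulated ARMA(1,1) series with assumed coefficients.

```r
set.seed(4)
x <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500)
fit <- arima(x, order = c(1, 0, 1))

ar_coef <- fit$coef[grep("^ar", names(fit$coef))]
ma_coef <- fit$coef[grep("^ma", names(fit$coef))]

# Roots of phi(x) = 1 - phi_1 x - ... and psi(x) = 1 + psi_1 x + ...
ar_roots <- polyroot(c(1, -ar_coef))
ma_roots <- polyroot(c(1, ma_coef))

# Causal / invertible if every root has modulus greater than 1.
is_causal     <- all(Mod(ar_roots) > 1)
is_invertible <- all(Mod(ma_roots) > 1)
print(c(causal = is_causal, invertible = is_invertible))
print(Mod(c(ar_roots, ma_roots)))
```

In practice, roots with modulus only slightly above 1 are also a warning sign, since the fit then sits close to the boundary.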

3.3 Fitting an ARMA Model with regression on number of flights

We repeat the process of Section 3.2, now adding the number of flights as a predictor.

Let \(\log(Y_{1:N})\) be the log daily delayed time at JFK at times \(t_{1:N}\), and \(\log(X_{1:N})\) be the log number of flights. We first fit ARMA(1,1) errors for simplicity (just as a start).

Specifically, \[ \log(Y_n) = \alpha + \beta \ \log(X_n) + \epsilon_n\] where \[\phi(B)\epsilon_n=\psi(B)w_n\] with \(w_n \sim N(0,\sigma^2)\).
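To see what this regression with ARMA errors does, it helps to fit it to simulated data where the true slope is known. In the sketch below all parameter values (intercept 2.5, slope 0.35, ARMA(1,1) errors) are assumptions chosen for illustration, and `arima()` with `xreg` should recover a slope close to 0.35.

```r
set.seed(5)
n <- 365
log_x <- log(round(runif(n, 10, 100)))          # simulated log number of flights

# ARMA(1,1) errors with assumed coefficients and innovation sd.
eps <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = n, sd = 0.3)
log_y <- 2.5 + 0.35 * log_x + as.numeric(eps)   # assumed alpha and beta

# Regression with ARMA(1,1) errors; the xreg coefficient is listed last.
fit <- arima(log_y, order = c(1, 0, 1), xreg = log_x)
beta_hat <- unname(tail(fit$coef, 1))
print(fit)
print(beta_hat)
```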

AIC for ARMA Model with regressor on number of flights (JFK)
	MA0	MA1	MA2	MA3	MA4
AR0	93.8	90.5	90.0	91.4	92.9
AR1	91.6	91.2	91.7	93.0	94.9
AR2	89.3	91.3	91.7	84.2	94.7
AR3	91.2	92.9	85.3	86.0	86.7

AIC for ARMA Model with regressor on number of flights (LGA)
	MA0	MA1	MA2	MA3	MA4
AR0	259.9	250.5	252.4	254.1	256.0
AR1	251.3	252.4	254.2	254.9	256.7
AR2	252.2	254.1	256.1	248.8	258.2
AR3	254.2	256.1	257.6	258.2	256.6

We first notice that all values in the AIC tables are much smaller than in Section 3.2.

For JFK, we decided to try ARMA(2,3), ARMA(3,2), ARMA(3,3)
For LGA, we decided to try ARMA(2,3), AR(3), MA(3).

Based on the roots of the polynomials of the fitted models above (calculation hidden; it can be found in the code), we select the simplest model that is both causal and invertible. We therefore select ARMA(2,3) for JFK and AR(3) for LGA.

Since we did not find any causal and invertible ARMA model in Section 3.2, we do not consider those models further, and we believe that adding the number of flights is key to a precise prediction of daily delayed time.

3.4 Conclusions

In conclusion, we model log-DDT regressed on log(number of flights) with ARMA errors:

For the JFK data, we found that ARMA(2,3) fits best.
For the LGA data, we found that AR(3) fits best.

fit1 = arima(log(daily_JFK$DDS),order = c(2,0,3),xreg = log(daily_JFK$Fl_num))
fit1

## 
## Call:
## arima(x = log(daily_JFK$DDS), order = c(2, 0, 3), xreg = log(daily_JFK$Fl_num))
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     ma3  intercept
##       0.5777  -0.9632  -0.4635  0.8895  0.1734     2.4627
## s.e.  0.0159   0.0169   0.0555  0.0376  0.0560     0.1237
##       log(daily_JFK$Fl_num)
##                      0.3695
## s.e.                 0.0332
## 
## sigma^2 estimated as 0.06985:  log likelihood = -34.12,  aic = 84.24

fit2 = arima(log(daily_LGA$DDS),order = c(3,0,0),xreg = log(daily_LGA$Fl_num))
fit2

## 
## Call:
## arima(x = log(daily_LGA$DDS), order = c(3, 0, 0), xreg = log(daily_LGA$Fl_num))
## 
## Coefficients:
##          ar1      ar2     ar3  intercept  log(daily_LGA$Fl_num)
##       0.1797  -0.0537  -0.011     2.9003                 0.2402
## s.e.  0.0530   0.0535   0.053     0.1053                 0.0285
## 
## sigma^2 estimated as 0.1147:  log likelihood = -121.08,  aic = 254.16

Fig7. Final Model’s Diagnostic

Diagnostic

From the QQ plots of the residuals of the final models, we can see that the residuals are approximately normal in the middle of the distribution but not at the extremes. This is expected, since extreme delays are probably due to extreme weather, which cannot be explained by a stationary causal model. Therefore, our recommendations should be taken with caution.
The autocorrelation plot shows that most residual autocorrelations are well within the 95% bounds of a white noise process, implying a reasonably good fit overall.
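These two diagnostics can be reproduced along the following lines. The sketch uses a simulated AR(2) series with assumed coefficients in place of the fitted models, and computes the share of residual autocorrelations falling inside the approximate 95% white-noise bound.

```r
set.seed(6)
x <- arima.sim(model = list(ar = c(0.2, -0.05)), n = 365)
fit <- arima(x, order = c(2, 0, 0))
res <- residuals(fit)

# Residual ACF at lags 1..20 and the approximate 95% bound +/- 1.96/sqrt(n).
res_acf <- acf(res, lag.max = 20, plot = FALSE)$acf[-1]  # drop lag 0
bound <- 1.96 / sqrt(length(res))
share_inside <- mean(abs(res_acf) < bound)
print(share_inside)

# Normal QQ plot of the residuals (drawn only in interactive sessions).
if (interactive()) {
  qqnorm(res)
  qqline(res)
}
```

For white-noise residuals, roughly 95% of the lags are expected to fall inside the bound, so a few isolated exceedances are not by themselves evidence of misfit.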

4. JFK or LGA ?

Now that we have built two time series models for daily delayed time (DDT), two questions are of interest:

Which airport is more sensitive to the number of flights? What are the implications?
If the two airports have the same number of flights on a given day, is there a better choice to avoid flight delays?

4.1 Sensitivity to number of flights

95% profile confidence interval for the slope on number of flights:

To understand whether the number of flights is a significant predictor in the ARMA model, and how its effect differs between airports, we compute a profile confidence interval for the slope.
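A profile confidence interval for the slope can be obtained by fixing the slope at each value on a grid, maximising the likelihood over the remaining parameters, and keeping the values whose log likelihood is within about 1.92 of the maximum. The sketch below illustrates the idea on simulated data with AR(1) errors; the true slope of 0.35, the grid and the error model are assumptions and not the settings used for the figures.

```r
set.seed(7)
n <- 365
log_x <- log(round(runif(n, 10, 100)))
eps <- arima.sim(model = list(ar = 0.4), n = n, sd = 0.3)
log_y <- 2.5 + 0.35 * log_x + as.numeric(eps)

# Profile log likelihood: fix beta, maximise over intercept, AR and sigma^2.
profile_loglik <- function(beta) {
  arima(log_y - beta * log_x, order = c(1, 0, 0))$loglik
}
beta_grid <- seq(0.1, 0.6, by = 0.005)
ll <- sapply(beta_grid, profile_loglik)

# 95% interval: betas within qchisq(0.95, 1) / 2 (about 1.92) of the maximum.
cutoff <- max(ll) - qchisq(0.95, df = 1) / 2
ci <- range(beta_grid[ll > cutoff])
beta_hat <- beta_grid[which.max(ll)]
print(c(estimate = beta_hat, lower = ci[1], upper = ci[2]))
```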

Fig6. Profile Likelihood for slope

## [1] "The estiamted slope (JFK) is 0.37 with 95% profile CI [0.3,0.43]"

Fig6. Profile Likelihood for slope

## [1] "The estiamted slope (LGA) is 0.24 with 95% profile CI [0.18,0.3]"

Therefore, it is clear that the number of flights carries important information: both confidence intervals exclude zero. The number of flights also seems to have a stronger influence on daily delayed time at JFK than at LGA. Although the slope relates log(number of flights) to log(DDT), we can interpret it in a more meaningful way through simulations (see the next section).
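Because both variables are on the log scale, the slope can also be read as an elasticity: multiplying the number of flights by k multiplies the typical DDT by k raised to the slope. The short calculation below uses the estimates and standard errors printed in Section 3.4; the Wald intervals are an additional cross-check and not part of the original analysis.

```r
# Estimates and standard errors copied from the fitted models in Section 3.4.
slope <- c(JFK = 0.3695, LGA = 0.2402)
se    <- c(JFK = 0.0332, LGA = 0.0285)

# In a log-log model, multiplying flights by k multiplies the typical DDT
# by k^slope, all else being equal.
effect_10pct  <- 1.10^slope   # effect of 10% more flights
effect_double <- 2^slope      # effect of doubling the number of flights

# Wald intervals (estimate +/- 1.96 s.e.) as a cross-check on the profile CIs.
wald_ci <- cbind(lower = slope - 1.96 * se, upper = slope + 1.96 * se)
print(round(effect_10pct, 3))
print(round(effect_double, 3))
print(round(wald_ci, 2))
```

Under these estimates, 10% more flights corresponds to roughly 3.6% longer DDT at JFK and 2.3% at LGA, while doubling the number of flights corresponds to about 29% and 18% respectively. The Wald intervals agree with the profile intervals to two decimal places.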

4.2 Comparison with same number of flights

Summary table of the quantiles of daily delayed time and number of flights in 2008
	10%	25%	50%	75%	90%
JFK Daily Delayed Time	31.5	37	44.9	53.9	69.9
LGA Daily Delayed Time	27.4	33	42.4	55.4	70.7
JFK Number of flights	22.9	29	39.5	55.0	71.1
LGA Number of flights	15.9	24	41.0	62.2	85.0

For simplicity, let us compare the daily delayed time when the number of flights is:

10 (“perfect” time)
50 (“average” time)
100 (“worst” time)

Fig8. Simulated Daily Delayed Time under perfect and worst time

Above are two simulated ARMA series at the “perfect” and “worst” times. We expect the long-run mean of delayed time to be constant, since the model is stationary. However, it is more interesting to know how many times the series hits “high peaks” within one year. Therefore, we repeat the simulation 10000 times and count the days in a one-year period whose daily delayed time is greater than 5 min, 10 min, 15 min, and 30 min.
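The original simulation code is not shown, so the following is only one plausible reading of the procedure: simulate ARMA errors for a year, add the regression mean for a fixed number of flights, transform back from the log scale, and count the days above each threshold. The function name `simulate_exceedances` is our own, and every parameter value in the example call is an illustrative assumption, not a fitted estimate.

```r
# Simulate one year of DDT and count the days above each threshold.
# All parameter values in the example call are illustrative assumptions.
simulate_exceedances <- function(n_flights, alpha, beta, ar, ma, sigma,
                                 thresholds = c(5, 10, 15, 30),
                                 n_days = 365, n_sim = 200) {
  counts <- replicate(n_sim, {
    # ARMA errors on the log scale.
    eps <- arima.sim(model = list(ar = ar, ma = ma), n = n_days, sd = sigma)
    # Back-transform log-DDT to minutes.
    ddt <- exp(alpha + beta * log(n_flights) + as.numeric(eps))
    sapply(thresholds, function(th) sum(ddt > th))
  })
  rownames(counts) <- paste0(">", thresholds, "min")
  rowMeans(counts)  # average number of days per simulated year
}

set.seed(8)
res <- simulate_exceedances(n_flights = 50, alpha = 1.0, beta = 0.3,
                            ar = 0.5, ma = 0.3, sigma = 0.3)
print(res)
```

Calling the function with `n_flights` set to 10, 50 and 100 for each airport's coefficients gives the comparison across scenarios; `n_sim` is kept small here only so that the sketch runs quickly.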

Fig9. Number of Days that have delayed departure based on flight number = 10,50,100

From Figure 9, we can see that:

Most delays are actually around 10-15 minutes (not 5-10 minutes).
JFK has more days of departure delays than LGA in all four categories (>5, >10, >15, >30).
Little difference is observed across different numbers of flights, implying that the predictor is statistically significant but might not have a large enough practical effect on delayed time.
In total, there are around 6 days of delayed departure at JFK and 5 days of delayed departure at LGA, which is far more optimistic than the 18% statistic cited in the Introduction.
However, remember that we took the mean of the delay times of all flights in one day. A DDT of 5 minutes therefore means that flights at the airport on that date were delayed by 5 minutes on average, which implies either a sequence of delays across most flights or extremely long delays for some flights.

5. Discussion & Conclusion

5.1 Discussion

DDT:

Before we provide the final conclusion, we would like to discuss some pros and cons of creating a daily delay time (DDT), which is the mean delay time of all flights in one day.

Pros:
- Since we are interested in comparing the general delay performance of the two airports, DDT summarizes the information from all flights and thus creates one time series for each airport.
- DDT is easy to analyze.
Cons:
- DDT is a highly condensed score, so DDT = 5 could correspond to many scenarios (all flights delayed by 5 minutes, or 5% of flights delayed by 100 minutes). DDT = 5 should therefore be interpreted carefully: it in fact reflects a moderate level of delay at an airport.
- DDT is tricky to interpret.

Selection of 5 airlines:

We selected only 5 major airlines at both airports to make the results comparable. However, departure delays should be expected to depend not only on the airport but also on the carrier and the destination. In the future, more segmentation should be carried out, with a detailed analysis of each such case. However, this study aims to provide a general comparison between the two airports, and thus we think our procedure is reasonable.

ARMA Model:

Every model is wrong. If we could find a model that fits the data perfectly, it would be overfitting and would not generalize. Our ARMA model describes the time series of DDT assuming everything is as usual (no extreme bad weather, no rare events). It is important to remember this assumption and not to over-generalize our final conclusion.

5.2 Conclusion

We found that the number of flights is a statistically significant predictor but does not have a practically large influence on daily delayed time (DDT).
JFK seems to have more days of departure delays than LGA.
We recommend that passengers book flights departing from LaGuardia Airport rather than John F. Kennedy International Airport on a normal day (to avoid possible flight delays).

6. Reference & Source

[1] Statistics Source: Bureau of Transportation Statistics. https://www.transtats.bts.gov/HomeDrillChart.asp

[2] Statistics Source: Delayed and Cancelled Flights. U.S. Department of Transportation. https://www.transportation.gov/airconsumer/fly-rights

[3] Data Source: Kaggle. https://www.kaggle.com/giovamata/airlinedelaycauses

[4] Knowledge Source: Winter 2016 Midterm Exam. https://ionides.github.io/531w18/exam/w16/mt531w16.pdf

[5] Knowledge Source: Previous Midterm project (“Midterm Project - Monthly Fatal Crashes in Michigan”) https://ionides.github.io/531w16/midterm_project/project17/midterm_project_-_montyly_fatal_crashes_in_michigan.html

[6] Knowledge Source: Previous Midterm project (“A Study on Crude Oil Price and CPI Value”) https://ionides.github.io/531w16/midterm_project/project1/Stats_531_Midterm_Project.html

[7] Knowledge Source: Lecture Notes 3 https://ionides.github.io/531w18/03/notes03.html

[8] Knowledge Source: Lecture Notes 5 https://ionides.github.io/531w18/05/notes05.html

[9] Knowledge Source: Lecture Notes 7 https://ionides.github.io/531w18/07/notes07.html

Which Airport Should You Choose to Avoid Flight Delays?

March 8, 2018