Introduction

Motivation

Fire alarm can be triggered by lots of reason. As a student living in student apartment, I sometimes can hear fire alarm several time a day. Most of them are false alarm, and people who hear these alarms everyday get tired and stop evacuating from the building in time. This can be dangerous if real fire breaks out. So I want to explore and fit a time series model of fire alarm emergency.

Goals

In this project, my goal is to explore the following questions:

  1. What time series model fits fire alarm daily data well?

  2. Is there any significant seasonality or cycle?

If we find a pattern of the fire alarms, we can be more prepared to the fire related emergency and prevent possible waste of public resource.

exploratory data analysis

data description and selection

The data we are using is 2016-2018 daily Emergency (911) Calls dataset from Kaggle, including Fire, Traffic, EMS for Montgomery County, PA. The raw data includes 141 different kinds of emergency and more than 400,000 data points. So we first need to select the cases that relates to our topic. In the following analysis, we only focus on the category “Fire: FIRE ALARM”.

After selecting the data we need, there are 23945 cases of fire alarms in total, and we aggregate the datapoints by date.

data visualization

First, we visualize the fire alarm time series data.

The blue line is the mean of all cases. The mean of our data is 22.78, which indicates that there are around 23 reported cases of fire alarm in average.

DF$date[which.max(DF$count)]
## [1] "2018-03-03"
max(DF$count)
## [1] 91

On 2018-03-03, there were 91 cases of fire alarm in Montgomery County, which is the maximum of the data we are using. Other data points are mostly under 60.

As we can see from this plot, there is no obvious violation of mean stationary. So it is possible to fit an AR model using the data.

ACF

We can plot the autocorrelation function (ACF) of the dataset.

As we can see, almost all the autocorrelation is greater than 0. It shows weakly decreasing trend and no sign of sinusoidal pattern, so AR(1) might be a good fit to this time series.

Model fitting

ARMA table

We assume our data can fit the model \(ARMA(p,q)\), which has the form: \[X_n=\mu+\phi_1 (X_{n-1}-\mu)+\dots+\phi_p (X_{n-p}-\mu)+\epsilon_{n}+\psi_1\epsilon_{n-1}+\dots+\psi_q\epsilon_{n-q}\] {\(X_n\)} is the time series data. \(\mu\) is the mean of all emergency calls (the intercept in model description). {\(\phi_p\)} and {\(\psi_q\)} are coefficients to be determined. {\(\epsilon_n\)} is Gaussian white noise with variance \(\sigma^2\). We need to choose the values of p and q. So we tabulate AIC values for a range of different choices of p and q.

MA0 MA1 MA2 MA3
AR0 7308.87 7189.67 7173.46 7164.75
AR1 7161.80 7148.42 7139.36 7141.34
AR2 7156.67 7140.28 7141.35 7143.33
AR3 7149.16 7141.25 7143.27 7144.62

In this table, we can see that the AIC of ARMA(1,2), 7139.36 is the least of all. The differences of AIC among ARMA(1,3), ARMA(2,2) and ARMA(1,2) are no larger than 2, so we need to consider all these three models and choose one from them.

ARMA(1,2)

arma12=arima(DF$count,order=c(1,0,2))
arma12
## 
## Call:
## arima(x = DF$count, order = c(1, 0, 2))
## 
## Coefficients:
##          ar1      ma1      ma2  intercept
##       0.8718  -0.5586  -0.1316    22.7654
## s.e.  0.0422   0.0537   0.0380     0.5340
## 
## sigma^2 estimated as 51.69:  log likelihood = -3564.68,  aic = 7139.36
ARMA_roots <- polyroot(c(1,-coef(arma12)[c("ar1","ma1","ma2")]))
ARMA_roots
## [1]  0.7047401+0.920562i  0.7047401-0.920562i -5.6547738-0.000000i

ARMA(1,3)

arma13=arima(DF$count,order=c(1,0,3))
arma13
## 
## Call:
## arima(x = DF$count, order = c(1, 0, 3))
## 
## Coefficients:
##          ar1      ma1      ma2     ma3  intercept
##       0.8688  -0.5562  -0.1325  0.0048    22.7659
## s.e.  0.0483   0.0571   0.0385  0.0344     0.5323
## 
## sigma^2 estimated as 51.69:  log likelihood = -3564.67,  aic = 7141.34
ARMA_roots <- polyroot(c(1,-coef(arma13)[c("ar1","ma1","ma2","ma3")]))
ARMA_roots
## [1]  0.7025237+0.9267792i  0.7025237-0.9267792i -4.9443765-0.0000000i
## [4] 31.0807053+0.0000000i

ARMA(2,2)

arma22=arima(DF$count,order=c(2,0,2))
arma22
## 
## Call:
## arima(x = DF$count, order = c(2, 0, 2))
## 
## Coefficients:
##          ar1     ar2      ma1      ma2  intercept
##       0.8272  0.0354  -0.5147  -0.1549    22.7527
## s.e.  0.2590  0.2043   0.2566   0.1402     0.5313
## 
## sigma^2 estimated as 51.69:  log likelihood = -3564.68,  aic = 7141.35
ARMA_roots <- polyroot(c(1,-coef(arma22)[c("ar1","ar2","ma1","ma2")]))
ARMA_roots
## [1]  0.7872347+0.612690i  0.7872347-0.612690i -2.4480545-0.701687i
## [4] -2.4480545+0.701687i

The roots of all models are outside the unit circle and all of the coefficients are less than 1, suggesting we have stationary invertible causal fitted ARMA.

Since these three models do not have discriminative difference, we choose the simplest model \(ARMA(1,2)\) to avoid computational inefficiency and risk of overfitting.

So our model equation is \[X_n=\mu+\phi_1 (X_{n-1}-\mu)+\psi_1\epsilon_{n-1}+\psi_2\epsilon_{n-2}\] With \(\mu=22.7654, \phi_1=0.8718, \psi_1=-0.5586, \psi_2=-0.1316, \sigma^2=51.69\).

Seasonality

To further improve the model, we try to fit a SARMA model. The possible numbers of period are 7 (as a week) and 30 (as a month).

This is the model description of period = 7:

## 
## Call:
## arima(x = DF$count, order = c(1, 0, 2), seasonal = list(order = c(1, 0, 0), 
##     period = 7))
## 
## Coefficients:
##          ar1      ma1      ma2    sar1  intercept
##       0.8505  -0.5381  -0.1229  0.0459    22.7627
## s.e.  0.0526   0.0623   0.0404  0.0330     0.5248
## 
## sigma^2 estimated as 51.6:  log likelihood = -3563.71,  aic = 7139.41

This is the model description of period = 30:

## 
## Call:
## arima(x = DF$count, order = c(1, 0, 2), seasonal = list(order = c(1, 0, 0), 
##     period = 30))
## 
## Coefficients:
##          ar1      ma1      ma2    sar1  intercept
##       0.8709  -0.5576  -0.1310  0.0147    22.7659
## s.e.  0.0422   0.0537   0.0379  0.0311     0.5405
## 
## sigma^2 estimated as 51.68:  log likelihood = -3564.57,  aic = 7141.14

The ARMA(1,2) model without seasonality:

## 
## Call:
## arima(x = DF$count, order = c(1, 0, 2))
## 
## Coefficients:
##          ar1      ma1      ma2  intercept
##       0.8718  -0.5586  -0.1316    22.7654
## s.e.  0.0422   0.0537   0.0380     0.5340
## 
## sigma^2 estimated as 51.69:  log likelihood = -3564.68,  aic = 7139.36

We can see that the log likelihoods are improved by less than 1 in both SARMA. So there is no obvious seasonality in the fire alarm daily data.

Frequency Domain Analysis

We plot the spectral density plot:

s=h$freq[which.max(h$spec)]
s
## [1] 0.0009259259
1/s
## [1] 1080

The unit of periodogram frequency is cycles per day. The highest frequency is at 0.0009. This maximum point is significant, which can be seen by sliding the confidence interval bar on the right.

However, 0.0009 corresponds to a period of 1080 days. 1080 days is almost three years. We only have three-year data, which means we do not have enough data to discover the cycle.

Model Diagnostics

r<-arma12$residuals
acf(r)

qqnorm(r)
qqline(r)

We can see the ACF plot is within the dashed lines, so we can say that the residuals are uncorrelated. But the Q-Q plot shows that the residual is not gaussian. It has a lone tail on the right. To have the residual act like normal, we can try log transformation.

Conclusion

our model is ARMA(1,2) \[X_n=\mu+\phi_1 (X_{n-1}-\mu)+\psi_1\epsilon_{n-1}+\psi_2\epsilon_{n-2}\] {\(\epsilon_n\)} is Gaussian white noise with variance \(\sigma^2\). \(\mu=22.7654, \phi_1=0.8718, \psi_1=-0.5586, \psi_2=-0.1316, \sigma^2=51.69\).

There is no obvious seasonality or cycle.

This result suggests that the fire alarm emergency can happen anytime in the year.

Discussion

The residual is right-skewed, so we can try log transformation to the fire alarm data in the future. If we want to find a cycle, we will need to find more data. Ths data we are using is just data in one county. We may need dataset of more places to have a more general model.

Reference

data from: https://www.kaggle.com/mchirico/montcoalert

notes and codes from: https://ionides.github.io/531w20/