As one of the earliest cities in the U.S. to adopt a modern crime records management system, the Seattle Police Department has maintained one of the best crime databases since 1982. Thanks to this rich dataset, multiple studies have been conducted on crime in Seattle. Pamela Rountree, Kenneth Land, and Terance Miethe investigated individuals' risks of violent crime and burglary victimization as a function of both individual crime opportunity factors and contextual indicators of neighborhood social disorganization, using hierarchical methods (Rountree, Land, and Miethe 1994). David Weisburd, Nancy Morris, and Elizabeth Groff performed a longitudinal study of juvenile crime at micro-level units of geography (Weisburd, Morris, and Groff 2009). David Weisburd and John Eck went on to study micro-level crime incidents over a 14-year period and found that crime on street segments in Seattle is stable (Weisburd and Eck 2004). There is much more research on different aspects of Seattle crime, thanks to the detailed crime data provided by the Seattle Police Department.
Unfortunately, a better database has not led to a lower crime rate or lower administrative costs for the Seattle Police Department. There has been plenty of research relating time, weather, and crime: for example, a well-known study by Matthew Ranson analyzed a 30-year panel of monthly crime data across 2,997 US counties (Ranson 2014), setting a great example of how to relate time trends, weather, and crime. However, few studies have investigated the relationship between time, weather, and crime for Seattle specifically, especially given that the Seattle Police Department's budget increased by 9.7% in the past year. It would be valuable to use time series analysis to extract insights from the full crime data of the past decade, which could help the Seattle Police Department allocate seasonal resources for crime control and save part of its budget.
Following our discussion above, I would like to raise two questions that I want to solve in my report:
As mentioned above, the data were collected from the Seattle Police Department. They cover reported offenses, with offense categorization coded to simulate the standard reported to the FBI under the National Incident Based Reporting System (NIBRS), in Seattle from January 1974 to May 2019. In its original form, the dataset has detailed variables including incident ID, location, crime type, and so on. Each row contains the record of a unique event in which at least one criminal offense was reported by a member of the community or detected by an officer in the field. I extracted the crime counts by month from the original data, covering all sources and types of crime. There is no missing data in the original dataset, so all of the data were used in my analysis. There are 132 months in scope. Here is a first look at the data:
## # A tibble: 6 x 3
## # Groups: SEcrime$YearNum [1]
## `SEcrime$YearNum` `SEcrime$MonthNum` CrimeCounts
## <chr> <chr> <int>
## 1 2008 01 3322
## 2 2008 02 3124
## 3 2008 03 3367
## 4 2008 04 3327
## 5 2008 05 3637
## 6 2008 06 3615
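For reference, the monthly aggregation above could be produced along the following lines. This is only a sketch: the raw file name and the `Occurred.Date` column and its format are assumptions, since only `YearNum`, `MonthNum`, and `CrimeCounts` appear in the printed tibble.

```r
# Sketch of the monthly aggregation (file and date-column names are assumed)
library(dplyr)

SEcrime <- read.csv("Crime_Data.csv", stringsAsFactors = FALSE) %>%
  mutate(YearNum  = format(as.Date(Occurred.Date, format = "%m/%d/%Y"), "%Y"),
         MonthNum = format(as.Date(Occurred.Date, format = "%m/%d/%Y"), "%m")) %>%
  filter(as.numeric(YearNum) >= 2008, as.numeric(YearNum) <= 2018)

SEmonthly <- SEcrime %>%
  group_by(YearNum, MonthNum) %>%
  summarise(CrimeCounts = n())

head(SEmonthly)
```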
First, we can take a look at the overall distribution through the summary statistics and the boxplot (Figure 1):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2829 3545 3904 3840 4145 4552
From the statistics and the boxplot, we can see that the monthly crime counts range from 2829 to 4552, with the majority of months falling between roughly 3500 and 4200. The mean (3840) and median (3904) are very close to each other, a good indication that there is no substantial outlier that might cause problems in our analysis.
Then we could look at the plot of the time series data (Figure 2):
Generally, the number of crimes per month shows an increasing trend from 2008 to 2018, and the increase is more obvious from 2011 to 2018. Also, the variance does not appear constant over the whole time span.
We would like to confirm our guess about the trend through spectrum analysis (Figure 3). Spectrum analysis lets us analyze the time series in the frequency domain: we represent the covariance of the time series as a spectral density and estimate it with the periodogram, which is the squared correlation between the time series and sine/cosine waves at the different frequencies spanned by the series. Detailed information can be found in class note 8. I smooth the periodogram with spans of (5, 3, 5), which applies three successive moving-average smoothers of different widths, a traditional choice introduced in class.
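The smoothed periodogram in Figure 3 could be produced along these lines; the counts are passed as a plain numeric vector so that the frequency axis is in cycles per month (object names are carried over from the aggregation sketch above).

```r
# Smoothed periodogram of the monthly crime counts, spans = c(5, 3, 5)
smoothed_spec <- spectrum(SEmonthly$CrimeCounts, spans = c(5, 3, 5),
                          main = "Smoothed periodogram of monthly crime counts")
dominant_freq <- smoothed_spec$freq[which.max(smoothed_spec$spec)]
dominant_freq        # dominant frequency, in cycles per month
1 / dominant_freq    # corresponding period, in months
```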
From Figure 3 we observe that the dominant frequency is 0.081481481. Since the frequency axis of the spectrum is in cycles per month, the period is 1/0.081481481 = 12.27 months, approximately one year. We can also check the secondary peaks: the spectrum also peaks at a frequency of about 0.414814815 (a period of roughly 2.4 months). In general, the spectrum provides evidence of seasonal structure in the data. We would therefore like to detrend the data before fitting the models we learned, and also stabilize the variance to address the heteroskedasticity.
Following class note 8's explanation of loess smoothing, we use loess fits with different spans to extract the high- and low-frequency components from the original data (Figure 4). The high-frequency component uses a span of 0.05, i.e., a local linear regression with a weighting function over roughly the 5% of data points nearest each time t (with the boundary behavior handled by the loess implementation in R). The low-frequency component uses a span of 0.4, looking at roughly 40% of the data points around each time t. After removing the high- and low-frequency components from the original time series, we are left with the mid-range frequency component. The low-frequency component corresponds to the long-term "increasing trend along the time series" (removing it gives us a roughly constant mean), while the high-frequency component captures the small local variation (removing it helps reduce the heteroskedasticity). "Cycles" in the figure denotes this detrended series, which looks weakly stationary with a roughly constant mean and variance.
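A sketch of the decomposition behind Figure 4 (the object names `crime` and `date` are assumptions; the spans follow the text above):

```r
# Band-pass decomposition via loess, using the spans discussed above
crime  <- SEmonthly$CrimeCounts
date   <- seq(from = 2008, by = 1 / 12, length.out = length(crime))
low    <- loess(crime ~ date, span = 0.4)$fitted             # low-frequency trend
high   <- crime - loess(crime ~ date, span = 0.05)$fitted    # high-frequency noise
cycles <- crime - low - high                                 # mid-range "Cycles" component
plot(ts(cbind(crime, low, high, cycles), start = 2008, frequency = 12),
     main = "Decomposition of monthly crime counts")
```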
To support our argument about weak stationarity, we also check the histogram (Figure 5), the QQ plot (Figure 6), and the Shapiro–Wilk test for the mid-range frequency series. Not only does it look stationary, but the histogram also shows a clear bell curve. Following the guidance on normality tests, I use the Shapiro–Wilk test to confirm this result.
The Shapiro–Wilk test statistic is
\[ W = \frac{\left(\sum_{i=1}^{n} a_{i} x_{(i)}\right)^{2}}{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}, \]
where \(x_{(i)}\) is the \(i\)th order statistic and \(\bar{x}\) is the sample mean. The coefficients \(a_{i}\) are given by \(\left(a_{1}, \ldots, a_{n}\right)=\frac{m^{\top} V^{-1}}{C}\), where \(C=\left\|V^{-1} m\right\|=\left(m^{\top} V^{-1} V^{-1} m\right)^{1 / 2}\), \(m=\left(m_{1}, \ldots, m_{n}\right)^{\top}\) is the vector of expected values of the order statistics of an i.i.d. standard normal sample, and \(V\) is the covariance matrix of those order statistics.
Hence, when the p-value of the Shapiro–Wilk test is less than the chosen alpha level, the null hypothesis of normality can be rejected.
Conducting the Shapiro–Wilk test gives a p-value of 0.5381, which is larger than 0.05, so we cannot reject the null hypothesis that the detrended data points are normally distributed. We can also see from the QQ plot that most of the data points fall within the confidence band, which serves as further evidence of the normality of the detrended time series.
##
## Shapiro-Wilk normality test
##
## data: Cycles
## W = 0.99081, p-value = 0.5381
## [1] "Mean of the detrended time series is: 2.71813198702524"
Most points in the QQ plot fall along a straight line as well.
We can see that after detrending, the mid-range frequency component behaves like white noise. Together with the low-frequency (trend) panel of Figure 4, this shows that there is a long-term trend underlying the crime time series. From 2008 to 2013, the monthly crime counts fluctuate around 3600, while from 2013 to 2018 they show a clear upward trend (the low-frequency component). This answers Question 1 from Section 1.1.
So far, we have used frequency decomposition to demonstrate the existence of a trend. This motivates transforming the raw time series to achieve weak stationarity before fitting a model.
Taking the log of the raw data can help move toward constant variance. Let's look at the data after taking the log once (Figure 7).
We can see that, even though the variance is a little more stable, the last two years still have apparently less variance than the first few years. Hence, we take the log one more time (Figure 8):
We can see that even though the pattern does not change much, the magnitude of the y-axis becomes very small, spanning only about 0.05, so the variance is now very small. Next we need a constant mean for our time series, so I take the first difference of log(log(Crimes per Month)) (Figure 9).
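A minimal sketch of this transformation (the `crime_ts` object is built here from the monthly counts; names are assumptions):

```r
# Variance-stabilizing double log (Figures 7-8) followed by a first
# difference to remove the remaining trend (Figure 9)
crime_ts    <- ts(SEmonthly$CrimeCounts, start = c(2008, 1), frequency = 12)
loglog_ts   <- log(log(crime_ts))
diff_loglog <- diff(loglog_ts)
plot(diff_loglog, ylab = "Diff(Log(Log(Crimes per Month)))")
```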
We can clearly observe that after taking the difference of log(log(Crimes per Month)), the time series fluctuates around 0. We can also check the spectrum (Figure 10) of the transformed data to get some sense of its dominant frequencies.
Now the x-axis has units of cycles per month, and we can see that the dominant frequency is 0.41814, corresponding to a period of about 2.4 months.
Before we move on to model the data, we would like to check the ACF (Figure 11) for the detrended time series too.
The ACF is defined as
\[ \rho_{h} = \frac{\gamma_{h}}{\gamma_{0}}, \]
where \(\gamma_h\) and \(\gamma_0\) denote \(\gamma_{n,n+h}\) and \(\gamma_{n,n}\) respectively; these are the autocovariances, and \(h\) is the lag between time points.
For random variables \(Y_{1:N}\), the autocovariances are \(\gamma_{m, n}=\mathbb{E}\left[\left(Y_{m}-\mu_{m}\right)\left(Y_{n}-\mu_{n}\right)\right]\).
By plotting the autocorrelation against the lag, we get the ACF plot for the detrended data:
Here 1 lag means 1 month. We can clearly observe some periodic patterns in the ACF. There are significant spikes at lags 1 and 6, and local spikes at roughly every 12 lags, which suggests a moving-average component of order 12 or a seasonal term if we fit an ARIMA model. The ACF gives us a good indication of the main mid-range frequency features we care about.
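The ACF in Figure 11 could be produced with a call like the following; the series is passed as a plain vector so that one lag corresponds to one month.

```r
# ACF of the transformed series, with lag measured in months
acf(as.numeric(diff_loglog), lag.max = 48,
    main = "ACF of Diff(Log(Log(Crimes per Month)))")
```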
In conclusion, we observe several indicators of correlation between time points for Diff(Log(Log(Monthly Crimes))).
We therefore consider a stationary \(ARMA(p,q)\) model with mean \(\mu\):
\[ Y_{n}=\mu+\phi_{1}\left(Y_{n-1}-\mu\right)+\cdots+\phi_{p}\left(Y_{n-p}-\mu\right)+\epsilon_{n}+\psi_{1} \epsilon_{n-1}+\cdots+\psi_{q} \epsilon_{n-q}, \]
where \(\{\epsilon_{n}\}\) is a Gaussian white noise process with variance \(\sigma^{2}\).
Using the backshift operator \(B\), the AR part can be written as \(\phi(B)\) applied to \(Y_n\): \(\phi(B)Y_n = Y_n - \phi_1 BY_n - \phi_2 B^2Y_n - \cdots - \phi_p B^pY_n = Y_n - \phi_1 Y_{n-1} - \phi_2 Y_{n-2} - \cdots - \phi_p Y_{n-p}\).
After fitting the \(ARMA(p,q)\) model, we estimate the parameter vector \(\theta = (\phi_{1:p}, \psi_{1:q}, \mu, \sigma^2)\), following the description in class note 5.
To find the \(p\) and \(q\) combination, we can use a likelihood ratio test to compare nested models and the Akaike Information Criterion (AIC) to compare across a grid of models; in both cases, extra parameters (more complexity) are penalized. The AIC formula is
\[ AIC = -2 \times \ell(\hat{\theta}) + 2D, \]
where \(\ell\) is the maximized log likelihood and \(D\) is the number of estimated parameters.
In more detail, \(\ell(\theta)\) is the log likelihood of the parameter vector \(\theta\) defined in Section 2.2. Suppose we have two nested hypotheses
\[ H_{0}: \theta \in \Theta_{0}, \qquad H_{1}: \theta \in \Theta_{1}, \]
where \(\Theta_0 \subseteq \Theta_1\), with respective dimensions \(D_0 < D_1 \le D\). In our case, the null hypothesis \(H_0\) corresponds to the simpler model, for example \(ARMA(2,1)\), while \(H_1\) corresponds to the more complex model, for example \(ARMA(4,5)\), which involves more AR and MA components.
We then consider the log likelihood maximized over each of the hypotheses,
\[ \ell_{i}=\sup _{\theta \in \Theta_{i}} \ell(\theta), \quad i=0,1, \]
and \(\ell_1 - \ell_0\) is approximately distributed as \(\frac{1}{2}\chi^2_{D_1-D_0}\), where \(\chi^2_{d}\) is a chi-squared random variable with \(d\) degrees of freedom.
Given these, we can understand AIC as a way of comparing models as they become more complex: AIC penalizes the number of parameters, aiming to minimize prediction error, since simply adding parameters leads to overfitting. So the lower the AIC value, the better the expected prediction performance relative to the other models.
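The AIC table below could be generated with a sketch like the following, mirroring the aic_table idea from the class notes; `CyclesUse` denotes the transformed series modeled throughout this section (the name is taken from the `arima` calls shown below).

```r
# Tabulate AIC over a grid of ARMA(p, q) models fitted to CyclesUse
aic_table <- function(data, P, Q) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(p, 0, q))$aic
    }
  }
  dimnames(table) <- list(paste0("AR", 0:P), paste0("MA", 0:Q))
  table
}
crime_aic <- aic_table(CyclesUse, 5, 6)
```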
|     | MA0     | MA1     | MA2     | MA3     | MA4     | MA5     | MA6     |
|-----|---------|---------|---------|---------|---------|---------|---------|
| AR0 | -845.67 | -884.71 | -883.07 | -885.82 | -884.92 | -884.41 | -888.84 |
| AR1 | -882.06 | -883.61 | -881.90 | -889.15 | -887.17 | -886.41 | -890.90 |
| AR2 | -882.90 | -890.59 | -888.80 | -893.71 | -885.18 | -893.25 | -892.96 |
| AR3 | -881.14 | -888.87 | -887.58 | -885.92 | -902.62 | -900.67 | -891.26 |
| AR4 | -880.54 | -887.18 | -885.77 | -902.24 | -896.01 | -910.62 | -885.88 |
| AR5 | -878.73 | -885.22 | -883.94 | -900.76 | -887.32 | -909.31 | -888.83 |
Given the table above, we can see that \(AIC\) is minimized at the model \(ARMA(4,5)\) with a value of -910.62. However, we can also observe inconsistencies in the table: for example, moving from \(ARMA(3,4)\) to \(ARMA(4,4)\) adds one parameter yet increases the AIC by more than 2, which should be impossible if the likelihood were maximized exactly, so by the AIC definition above these values are not fully trustworthy. Combining this with the model output below:
##
## Call:
## arima(x = CyclesUse, order = c(4, 0, 5))
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 ma4
## -0.0122 0.9996 -0.0072 -0.9969 -0.5911 -1.0048 0.5833 0.9811
## s.e. 0.0081 0.0142 0.0081 0.0006 0.0886 0.0458 0.0882 0.0594
## ma5 intercept
## -0.6021 2e-04
## s.e. 0.0836 2e-04
##
## sigma^2 estimated as 4.426e-05: log likelihood = 466.31, aic = -910.62
We also plotted the AR and MA roots relative to the unit circle (Figure 12). We can clearly see that both sets of roots lie essentially on the boundary of the unit circle. We can check the root magnitudes below too:
## [1] 1.001488 1.000058 1.000058 1.001488
## [1] 1.020207 1.000004 1.000004 1.020207 1.595730
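These magnitudes could be computed as follows (`AA_Low45` is an assumed name for the \(ARMA(4,5)\) fit printed above; the same `polyroot` pattern is used later for the \(ARMA(0,1)\) fit):

```r
# Magnitudes of the AR and MA polynomial roots of the ARMA(4,5) fit
AR_roots <- polyroot(c(1, -coef(AA_Low45)[c("ar1", "ar2", "ar3", "ar4")]))
MA_roots <- polyroot(c(1,  coef(AA_Low45)[c("ma1", "ma2", "ma3", "ma4", "ma5")]))
abs(AR_roots)  # causality requires these to lie outside the unit circle
abs(MA_roots)  # invertibility requires these to lie outside the unit circle
```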
According to class note 4's explanation of causality and invertibility, the AR roots should lie outside the unit circle for causality and the MA roots outside the unit circle for invertibility; otherwise, the model is at risk of instability. Here, the roots of the \(ARMA(4,5)\) model sit essentially on the unit circle, so the fit lacks causality and invertibility and is unstable. Hence, we prefer a simpler model when one works: since the nearly equal AR and MA roots approximately cancel, we try the \(ARMA(0,1)\) model.
##
## Call:
## arima(x = CyclesUse, order = c(0, 0, 1))
##
## Coefficients:
## ma1 intercept
## -0.5857 3e-04
## s.e. 0.0855 3e-04
##
## sigma^2 estimated as 6.505e-05: log likelihood = 445.36, aic = -884.71
## [1] "The p-value for likelihood test is : 1.41009820242566e-06"
Having chosen \(ARMA(0,1)\) as the candidate, we conduct a likelihood ratio test between \(ARMA(4,5)\) and \(ARMA(0,1)\).
The null hypothesis is that \(ARMA(0,1)\) is adequate;
the alternative hypothesis is that \(ARMA(4,5)\) (the bigger model) is the better model.
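The p-value printed above could be computed along these lines (`AA_Low45` is again the assumed name for the \(ARMA(4,5)\) fit, and `AA_Low001` is the \(ARMA(0,1)\) fit used below):

```r
# Likelihood ratio test between the nested ARMA(0,1) and ARMA(4,5) models
lrt_stat <- 2 * (AA_Low45$loglik - AA_Low001$loglik)
df_diff  <- (4 + 5) - (0 + 1)                # difference in number of ARMA coefficients
p_value  <- 1 - pchisq(lrt_stat, df = df_diff)
p_value
```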
As shown above, the p-value for the likelihood ratio test is 1.41e-06 < 0.05 (significance level), so the test formally rejects the null and favors the larger model. However, given the instability of the \(ARMA(4,5)\) fit (roots on the unit circle) and the consistency of the simpler model with the raw data's ACF, we still prefer \(ARMA(0,1)\). Let's also check the root of \(ARMA(0,1)\).
MA_roots <- polyroot(c(1,coef(AA_Low001)[c("ma1")]))
abs(MA_roots)
## [1] 1.707461
We can see that the MA root has magnitude 1.707461, which lies outside the unit circle, so the model is invertible. Hence, we proceed with the \(ARMA(0,1)\).
The model function can be written as
\[ Y_{n}=\mu+\epsilon_{n}+\psi_{1} \epsilon_{n-1}, \]
where, from the fit above, \(\hat{\psi}_{1}=-0.5857\) and \(\hat{\mu}=3\times 10^{-4}\).
We need to diagnose our model before drawing conclusions from it or deciding on further modifications. Performing diagnostics on the residuals of the fitted model helps us reach that decision: we expect the residuals to behave like independently and identically distributed white noise.
The first thing to check is whether the residuals are independently distributed after fitting \(ARMA(0,1)\). We plot the ACF of the residuals (Figure 13).
We can see from Figure 13 that, after fitting \(ARMA(0,1)\), there are still prominent spikes around lag 12 and smaller spikes with a period of about 6. This indicates that it may be good to include a seasonal term with period 12 to capture the remaining correlation between data points.
The second thing to check is whether the residuals look like white noise. We plot the residuals over time (Figure 14) and the QQ plot (Figure 15), and run the Shapiro–Wilk test.
From the residual plot we can see that the residuals are distributed around 0, but with variance that is not quite constant. At some time points (the beginning of 2010, the beginning of 2014, and so on) the variance becomes very large, and the variance is generally larger from 2008 to 2012 than in later years. The residual plot indicates that more work is needed to find a better fit.
We can also check the QQ plot and the Shapiro–Wilk test. The QQ plot shows that the residuals have tails that are not normally distributed as expected, and the p-value from the Shapiro–Wilk test on the residuals is 3.942e-05, far below 0.05, so the residuals are clearly not Gaussian. Hence, \(ARMA(0,1)\) does not meet our expectations of a good fit, and we proceed by adding a seasonal component.
##
## Shapiro-Wilk normality test
##
## data: AA_Low001$residuals
## W = 0.94426, p-value = 3.942e-05
From Section 3.2, we noticed a significant seasonal pattern at lag 12 in the residual ACF. However, we are not sure whether it comes from a seasonal AR or a seasonal MA term, so it is useful to tabulate the AIC again, with the non-seasonal part fixed at \(ARMA(0,1)\), to help choose the model.
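The table below could be produced with a sketch like the following, varying the seasonal orders with period 12 (the function name is only illustrative):

```r
# Tabulate AIC over seasonal (P, Q) orders with the non-seasonal part fixed at ARMA(0,1)
sarma_aic_table <- function(data, P, Q) {
  table <- matrix(NA, P + 1, Q + 1)
  for (p in 0:P) {
    for (q in 0:Q) {
      table[p + 1, q + 1] <- arima(data, order = c(0, 0, 1),
                                   seasonal = list(order = c(p, 0, q),
                                                   period = 12))$aic
    }
  }
  dimnames(table) <- list(paste0("SAR", 0:P), paste0("SMA", 0:Q))
  table
}
crime_sarma_aic <- sarma_aic_table(CyclesUse, 4, 5)
```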
|      | SMA0    | SMA1    | SMA2    | SMA3    | SMA4    | SMA5    |
|------|---------|---------|---------|---------|---------|---------|
| SAR0 | -884.71 | -909.80 | -915.65 | -919.94 | -926.97 | -925.56 |
| SAR1 | -924.56 | -948.66 | -946.67 | -945.93 | -944.44 | -942.48 |
| SAR2 | -931.08 | -946.67 | -944.80 | -944.58 | -944.70 | -942.64 |
| SAR3 | -938.40 | -945.91 | -944.52 | -943.81 | -943.13 | -941.15 |
| SAR4 | -943.28 | -944.13 | -943.96 | -943.07 | -940.94 | -938.98 |
Given the output above, we can see that \(AIC\) is minimized at the model \(SARMA(0,1)\times(1,1)_{12}\) with a value of -948.66. Also, unlike the previous table, adding a single parameter never increases the AIC by more than 2, so there is no sign of the numerical inconsistencies we saw before. Combining this with the model output below:
##
## Call:
## arima(x = CyclesUse, order = c(0, 0, 1), seasonal = list(order = c(1L, 0L,
## 1L), period = 12))
##
## Coefficients:
## ma1 sar1 sma1 intercept
## -0.4943 0.9979 -0.9327 2e-04
## s.e. 0.0806 0.0127 0.2008 9e-04
##
## sigma^2 estimated as 3.238e-05: log likelihood = 479.33, aic = -948.66
Therefore, the model selected by the AIC comparison can be written as
\[ \left(1-\Phi_{1} B^{12}\right)\left(Y_{n}-\mu\right)=\left(1+\psi_{1} B\right)\left(1+\Psi_{1} B^{12}\right) \epsilon_{n}, \]
where, from the fit above, \(\hat{\psi}_{1}=-0.4943\), \(\hat{\Phi}_{1}=0.9979\), \(\hat{\Psi}_{1}=-0.9327\), and \(\hat{\mu}=2\times 10^{-4}\).
Now let's apply the same diagnostics to the \(SARMA(0,1)\times(1,1)_{12}\) fit.
First, we check the ACF of the residuals (Figure 16): there is no longer any significant spike exceeding the confidence bounds, which indicates that the \(SARMA(0,1)\times(1,1)_{12}\) model has captured the autocorrelation across lags.
Second, we check the residual plot (Figure 17): the \(SARMA(0,1)\times(1,1)_{12}\) residuals look much more like white noise than those of \(ARMA(0,1)\). We also confirm this with the QQ plot and the Shapiro–Wilk test.
Here is the output from the QQ plot (Figure 18) and the Shapiro–Wilk test. The majority of the residuals fall within the confidence band in the QQ plot, and the Shapiro–Wilk test yields a p-value of 0.5712, much larger than the significance level of 0.05, so we cannot reject normality of the residuals.
##
## Shapiro-Wilk normality test
##
## data: AA_LowS$residuals
## W = 0.99109, p-value = 0.5712
According to the model diagnostics in Section 3.4, we arrive at our final model, the \(SARMA(0,1)\times(1,1)_{12}\) specified above.
Moreover, we can compare the fitted model with the original time series (Figure 19). Here the red line denotes the original time series and the blue line denotes the fitted values. We can see that our model fits the original data quite well.
It might not be very informative to look at the SARMA fitted values alone, so we also compare the fit with a simple linear regression. Again, the red line is the original data, while the blue straight line is the linear regression fit. Our SARMA model extracts much more information from the time series than a simple statistical model like linear regression: the mean squared error of the linear model is much larger than that of the SARMA model.
## [1] "The mean squared error for SARMA Model is: 59914.4237014559"
## [1] "The mean squared error for Simple Linear Regression Model is: 81208.3027734006"
In a nutshell, we arrive at the following conclusions from our analysis, answering the questions raised in Section 1.1:
By conducting spectral analysis in the frequency domain, we extracted the long-term trend (the low-frequency component) of Seattle's monthly crime counts. There is an increasing trend overall, especially from 2013 to 2018, while from 2008 to 2012 the crime numbers are fairly stable with no clear upward or downward trend. Hence, it is important for the Seattle Police Department to put more resources toward controlling the increase in crime, which may carry on into the current year.
By fitting the seasonal ARMA model \(SARMA(0,1)\times(1,1)_{12}\), we derived a good model (as indicated in Section 3.5) that captures the seasonality, autoregressive, and moving-average structure of the time series and its residuals. Again, the final model is the \(SARMA(0,1)\times(1,1)_{12}\) specified above.
From the model, we can see that (a) the non-seasonal moving-average component captures the dependence between an observation and the residual error from one month earlier; (b) the seasonal AR part indicates that the crime count is correlated with the crime count 12 months earlier; and (c) similarly, the residual error is correlated with the residual error 12 months earlier. The seasonal part of the \(SARMA(0,1)\times(1,1)_{12}\) model highlights a one-year periodic pattern that the Seattle Police Department can pay attention to. In other words, we can see from the original data that crime counts in the warm months (around summer) are higher than in the cold months (around winter), and this pattern repeats year after year. (It is not necessarily caused by the weather, but the two are certainly correlated.) The 12-month pattern corresponds to the result from the smoothed periodogram, which suggests a cycle of about 12.27 months in the crime counts. All in all, the \(SARMA(0,1)\times(1,1)_{12}\) model does a good job of explaining the information carried by the crime count time series.
The data collected by the Seattle Police Department may change periodically with yearly updates, so data from 2019 are not included in our scope in order to keep a consistent version of the dataset. We might learn more once the 2019 data are available.
The \(SARMA(0,1)\times(1,1)_{12}\) model might suffer from overfitting: even though it looks like a good fit for the data we have, it may not generalize to future data.
The spectral analysis and the ARIMA modeling point to somewhat different seasonal patterns. Even though they need not coincide, it would be interesting to investigate the relationship between them in the future.
A future investigation of the correlation between climate/weather and crime counts would also be interesting to perform.
Ranson, Matthew. 2014. “Crime, Weather, and Climate Change.” Journal of Environmental Economics and Management 67 (3). Elsevier BV: 274–302. https://doi.org/10.1016/j.jeem.2013.11.008.
Rountree, Pamela Wilcox, Kenneth C. Land, and Terance D. Miethe. 1994. "Macro-Micro Integration in the Study of Victimization: A Hierarchical Logistic Model Analysis Across Seattle Neighborhoods." Criminology 32 (3). Wiley: 387–414. https://doi.org/10.1111/j.1745-9125.1994.tb01159.x.
Weisburd, David, and John E. Eck. 2004. “What Can Police Do to Reduce Crime, Disorder, and Fear?” The ANNALS of the American Academy of Political and Social Science 593 (1). SAGE Publications: 42–65. https://doi.org/10.1177/0002716203262548.
Weisburd, David, Nancy A. Morris, and Elizabeth R. Groff. 2009. “Hot Spots of Juvenile Crime: A Longitudinal Study of Arrest Incidents at Street Segments in Seattle, Washington.” Journal of Quantitative Criminology 25 (4). Springer Science; Business Media LLC: 443–67. https://doi.org/10.1007/s10940-009-9075-9.