Introduction

Electricity is an important source of energy in daily life. In this project I use household electricity consumption data from 2016/1/12 0:00 to 2016/1/23 23:50 to analyze the patterns and trends in people’s daily electricity use. The data are recorded at 10-minute intervals.

From these data, I want to identify the periodic patterns and trends in household electricity consumption and to find a suitable model for them. For example, do people use more energy at certain times of the day or on certain days of the week? Is the ARMA model we learned appropriate for household electricity consumption data? Is there a periodic pattern in household electricity consumption?

Data overview

We first look at an overview and a plot of the data: appliance energy consumption from 2016/1/12 0:00 to 2016/1/23 23:50, recorded at 10-minute intervals.

e <- read.csv(file="energydata_complete.csv", header = TRUE)
head(e)
##             date Appliances
## 1 2016/1/12 0:00         40
## 2 2016/1/12 0:10         30
## 3 2016/1/12 0:20         40
## 4 2016/1/12 0:30         50
## 5 2016/1/12 0:40        310
## 6 2016/1/12 0:50        380
app <- e$Appliances
plot(app,type = "l")

The plot suggests a periodic pattern, and the data look suitable for ARMA analysis.

Next, we look at the ACF plot of the data:

acf(app, na.action = na.pass)

The ACF plot supports trying an ARMA model: the sample autocorrelation decays and falls within the confidence bounds after some lag.
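As a rough illustration of where the dashed bounds in an ACF plot come from, the sketch below computes the default 95% band that `acf()` draws under a white-noise null. The value `n_obs` assumes 12 complete days of 10-minute readings with no gaps, which is an assumption about the data rather than something checked here.

```r
# Half-width of the default 95% band in an acf() plot under a white-noise
# null hypothesis: qnorm((1 + ci) / 2) / sqrt(n).
n_obs <- 12 * 144   # assumed: 12 days x 144 readings per day, no gaps
ci_level <- 0.95
band <- qnorm((1 + ci_level) / 2) / sqrt(n_obs)
band
```

Sample autocorrelations inside plus or minus `band` are consistent with zero correlation at that lag, taken one lag at a time.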

Frequency domain analysis

Next, we move to the frequency domain and analyze the spectral properties of the data.

spec=spectrum(app,spans=c(10,10))

fre <- spec$freq[which.max(spec$spec)]
fre
## [1] 0.006944444

The dominant frequency is 0.00694 cycles per observation, as shown above, and we can convert it into a period.

Since the data are recorded at 10-minute intervals, we convert the period into hours to see the pattern:

((1/fre)*10)/60
## [1] 24

This shows a 24-hour cycle in energy consumption, which is reasonable.

Next, we estimate the trend in the data:
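The conversion from spectral peak to period can be checked on a series whose cycle is known in advance. The sketch below uses a synthetic 10-minute series with a built-in 24-hour sine wave (illustrative data, not the appliance readings) and the same `spectrum()` call pattern as above; the names `x`, `sp` and `period_hours` are introduced only for this example.

```r
# Synthetic 10-minute series with a known 24-hour cycle (illustrative only).
set.seed(531)
n_per_day <- 144                      # readings per day at 10-minute spacing
n <- 12 * n_per_day                   # 12 days
t_idx <- seq_len(n)
x <- 100 + 50 * sin(2 * pi * t_idx / n_per_day) + rnorm(n, sd = 20)

# Smoothed periodogram; plot suppressed so the sketch runs non-interactively.
sp <- spectrum(x, spans = c(10, 10), plot = FALSE)
peak_freq <- sp$freq[which.max(sp$spec)]   # cycles per observation

period_obs <- 1 / peak_freq                # observations per cycle
period_hours <- period_obs * 10 / 60       # 10 minutes per observation
period_hours
```

If the recovered period is close to 24 hours here, the same arithmetic applied to the real data can be trusted to mean what the text says it means.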

time <- seq(from=12, length = length(app), by = 1/144)
plot(time, app, type = "l")
appt_l <- loess(app~time, span = 0.5)
lines(appt_l$x,appt_l$fitted,type="l")

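# Low-frequency component (trend): wide loess span
u_low <- ts(loess(app~time,span=0.5)$fitted,
            start=12,frequency=144)
# High-frequency component (noise): residual from a narrow loess span
u_hi <- ts(app - loess(app~time,span=0.1)$fitted,
           start=12,frequency=144)
# Mid-frequency component (cycles): what remains
u_cycles <- app - u_hi - u_low
plot(ts.union(app, u_low,u_hi,u_cycles),
     main="Decomposition of appliance energy use as trend + noise + cycles")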

The trend shows higher electricity consumption on January 15, 16 and 17, 2016, which the calendar shows to be a Friday, Saturday and Sunday. The trend is therefore some evidence that people consume more electricity around the weekend, which is reasonable.
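The calendar lookup and a per-day summary can be sketched in R as follows. The helper `daily_means()` and the stand-in series `x_demo` are hypothetical names introduced for this example; with the real data, `x_demo` would be replaced by `app`, assuming it holds 12 complete days.

```r
# Calendar check for the study window. "%u" gives the ISO weekday number
# (1 = Monday, ..., 7 = Sunday), so the result does not depend on locale.
days <- seq(as.Date("2016-01-12"), as.Date("2016-01-23"), by = "day")
iso_wday <- as.integer(format(days, "%u"))
calendar <- data.frame(date = days, iso_wday = iso_wday,
                       weekend = iso_wday >= 6)
calendar

# Hypothetical helper: mean of a series for each block of `per_day`
# consecutive observations (144 for 10-minute data).
daily_means <- function(x, per_day = 144) {
  stopifnot(length(x) %% per_day == 0)
  day_index <- rep(seq_len(length(x) / per_day), each = per_day)
  as.numeric(tapply(x, day_index, mean))
}

# Stand-in data so the sketch runs without the CSV; replace with `app`.
set.seed(1)
x_demo <- rnorm(12 * 144, mean = 100, sd = 30)
data.frame(calendar, mean_use = daily_means(x_demo))
```

Comparing `mean_use` between rows with `weekend` TRUE and FALSE gives a quick numeric counterpart to reading the trend off the plot.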

Fit with ARMA model

First I fit ARMA(p, q) models, that is ARIMA(p, 0, q), to the data. The AIC table is shown below:

         MA0      MA1      MA2      MA3      MA4      MA5
AR0 21651.88 20415.98 20147.03 19987.30 19947.05 19922.51
AR1 19951.08 19899.37 19879.34 19880.95 19871.15 19871.48
AR2 19915.11 19878.31 19879.67 19880.98 19870.54 19874.42
AR3 19889.82 19879.79 19871.27 19872.20 19872.43 19874.50
AR4 19888.92 19881.18 19871.72 19884.85 19874.38 19876.43
AR5 19879.40 19873.55 19871.40 19873.92 19863.03 19865.00

I finally choose ARMA(2, 4). It does not have the lowest AIC in the table (that is ARMA(5, 4)), but its AIC is close to the lowest and it has fewer parameters.

So I fit ARIMA(2, 0, 4) to the data:
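The code that produced the AIC table is not shown in this report. A minimal sketch of how such a table can be built is given below; the function name `aic_table` and the simulated series `x_demo` are assumptions for illustration, and the `tryCatch` wrapper is an addition that records `NA` when a fit fails.

```r
# Sketch: AIC for every ARMA(p, q) with 0 <= p <= P and 0 <= q <= Q.
aic_table <- function(data, P, Q) {
  tab <- matrix(NA_real_, P + 1, Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # Record NA instead of stopping if a particular fit errors out.
      tab[p + 1, q + 1] <- tryCatch(
        arima(data, order = c(p, 0, q))$aic,
        error = function(e) NA_real_
      )
    }
  }
  tab
}

# Small simulated AR(1) series so the sketch runs without the CSV.
set.seed(531)
x_demo <- arima.sim(model = list(ar = 0.6), n = 300)
demo_table <- aic_table(x_demo, P = 1, Q = 1)
demo_table
```

For the table in the text the call would presumably be `aic_table(app, 5, 5)`, which fits 36 models and can take a while on 10-minute data.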

arma204 <- arima(app, order = c(2, 0, 4))
arma204
## 
## Call:
## arima(x = app, order = c(2, 0, 4))
## 
## Coefficients:
##          ar1      ar2      ma1      ma2     ma3      ma4  intercept
##       1.2650  -0.3167  -0.3329  -0.2873  0.0219  -0.0899   104.3127
## s.e.  0.1836   0.1643   0.1838   0.0288  0.0625   0.0322    10.9115
## 
## sigma^2 estimated as 5718:  log likelihood = -9927.27,  aic = 19870.54

Next, we examine the AR and MA roots of the ARMA(2, 4) model.

AR_roots <- polyroot(c(1,-coef(arma204)[c("ar1","ar2")]))
MA_roots <- polyroot(c(1,coef(arma204)[c("ma1","ma2","ma3","ma4")]))

AR roots:

AR_roots
## [1] 1.085568+0i 2.908459-0i

MA roots:

MA_roots
## [1]  1.224250+0.000000i -1.651014+0.000000i  0.335137-2.321834i
## [4]  0.335137+2.321834i

All the roots lie outside the unit circle, so the fitted model is causal and invertible. Note, however, that one AR root (about 1.09) and one MA root (about 1.22) are fairly close to the unit circle.
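The root check can be made explicit by computing the modulus of each root. The sketch below re-enters the rounded coefficients from the printed ARMA(2, 4) output by hand, so the roots may differ from those above in the last digits; with the fitted object available, `coef(arma204)` would be used instead.

```r
# Rounded coefficients copied from the ARMA(2, 4) output above.
ar_coef <- c(1.2650, -0.3167)
ma_coef <- c(-0.3329, -0.2873, 0.0219, -0.0899)

# AR polynomial: 1 - ar1*z - ar2*z^2; MA polynomial: 1 + ma1*z + ... + ma4*z^4
ar_roots <- polyroot(c(1, -ar_coef))
ma_roots <- polyroot(c(1, ma_coef))

# A root is outside the unit circle when its modulus exceeds 1.
Mod(ar_roots)
Mod(ma_roots)
causal <- all(Mod(ar_roots) > 1)
invertible <- all(Mod(ma_roots) > 1)
c(causal = causal, invertible = invertible)
```

Working with `Mod()` avoids having to judge complex roots by eye from their real and imaginary parts.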

Try SARIMA model

Next, I try adding a seasonal part to the model. In the previous section we found a 24-hour cycle in the data. Since the data are recorded every 10 minutes, one cycle spans 144 observations, so we use a seasonal period of 144. The new SARIMA model is:

arma204101 <- arima(app,
                    order = c(2,0,4),
                    seasonal=list(order=c(0,1,0),period=144))
arma204101
## 
## Call:
## arima(x = app, order = c(2, 0, 4), seasonal = list(order = c(0, 1, 0), period = 144))
## 
## Coefficients:
##           ar1     ar2     ma1     ma2      ma3      ma4
##       -0.0833  0.5370  0.9471  0.0376  -0.0345  -0.0003
## s.e.   0.1603  0.0954  0.1616  0.1031   0.0707   0.0432
## 
## sigma^2 estimated as 10799:  log likelihood = -9603.52,  aic = 19221.04

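The AIC decreases, which may indicate that the SARIMA model fits the data better than the ARIMA model. This comparison should be read with caution: the seasonally differenced model is fitted to 144 fewer observations, so the two likelihoods are not computed on the same data.

I finally use \(SARIMA(2, 0, 4) \times (0, 1, 0)_{144}\) as the model for the energy consumption data.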
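Written out with the backshift operator \(B\), the chosen model can be sketched as below. The symbols \(Y_n\) (consumption at time \(n\)), \(\phi_i\), \(\psi_j\) and the white-noise term \(\epsilon_n\) are notation assumed here for illustration; there is no intercept because `arima` omits the mean once the series is differenced.

\[
(1 - \phi_1 B - \phi_2 B^2)\,(1 - B^{144})\,Y_n = (1 + \psi_1 B + \psi_2 B^2 + \psi_3 B^3 + \psi_4 B^4)\,\epsilon_n
\]

The factor \((1 - B^{144})\) means the ARMA(2, 4) structure is applied to \(Y_n - Y_{n-144}\), the change relative to the same time on the previous day.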

Diagnostic Analysis

We first look at the residuals of the fitted model. The residual plot and its ACF plot are shown below:

plot(arma204101$residuals)

acf(arma204101$residuals)

The ACF plot of the residuals shows that the sample autocorrelations all lie within the dashed lines after lag 1. This is consistent with uncorrelated residuals and is evidence that the model captures the dependence in the data well.
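A portmanteau test can complement the visual ACF check. The sketch below runs a Ljung-Box test on the residuals of a small simulated AR(1) fit, so that it runs without the data file; the series `x_demo`, the lag of 20 and the object names are choices made for this illustration, not part of the original analysis.

```r
# Simulated AR(1) series and fit, standing in for the real model.
set.seed(1)
x_demo <- arima.sim(model = list(ar = 0.5), n = 500)
fit_demo <- arima(x_demo, order = c(1, 0, 0))

# Ljung-Box test of the first 20 residual autocorrelations.
# fitdf = number of estimated ARMA coefficients (1 here).
lb <- Box.test(residuals(fit_demo), lag = 20,
               type = "Ljung-Box", fitdf = 1)
lb$p.value
```

For the SARIMA fit the corresponding call would be `Box.test(residuals(arma204101), lag = 20, type = "Ljung-Box", fitdf = 6)`, since that model has six ARMA coefficients; a large p-value would agree with the reading of the ACF plot.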

qqnorm(arma204101$residuals)
qqline(arma204101$residuals)

The QQ plot departs from the reference line in the tail, and this departure cannot be neglected, so the residuals are not well described by a normal distribution. We conclude that the model captures the main structure of the data, but it could still be improved, either by adding more parameters or by trying more flexible models.

Conclusion

The frequency analysis finds a cycle of 1440 minutes, that is a 24-hour cycle in electricity consumption, which is reasonable. We use this information when fitting the SARIMA model. The trend also suggests higher energy use around the weekend, which is interesting. Overall, the \(SARIMA(2, 0, 4) \times (0, 1, 0)_{144}\) model fits the energy consumption data reasonably well. However, the diagnostics show that the model can still be improved, which may require other models and further study.

Reference

[1] Dataset: https://archive.ics.uci.edu/ml/machine-learning-databases/00374/

[2] STATS 531 course notes. Some R code is adapted from the course notes.