Introduction

Consumer sentiment is a statistical measurement and economic indicator of the overall health of the economy as determined by consumer opinion. Consumer sentiment takes into account an individual’s feelings toward his or her current financial health, the health of the economy in the short-term and the prospects for longer-term economic growth.[1]

Consumer sentiment changes over time. If people are confident about the future, they are likely to shop more, boosting the economy. In contrast, when consumers are uncertain about what lies ahead, they tend to save money and make fewer discretionary purchases. Gloomy sentiment weakens demand for goods and services, affecting corporate investment, the stock market, and employment opportunities, among other things. [1]

In this project, we will analyze the monthly consumer sentiment over the past twenty-five years. Our goals are as follows:

  1. Although consumer sentiment changes over time, we want to explore whether it tends to move back toward its mean value in the midst of the fluctuations. We will also use spectral analysis to check for periods and cycles in consumer sentiment. Finally, we will fit models to predict future consumer sentiment.

  2. We will use various methods and models learned in STATS 531 to explore the fluctuations.


Exploratory Data Analysis

The data used for this project are the monthly consumer sentiment values for the past 68 years, from 1952 to 2020, taken from https://fred.stlouisfed.org/series/UMCSENT [2]. Each monthly value is recorded with the date of the first of the month.

First, we read in the data. There are 2 variables with 807 observations each: the variable “DATE” gives the date on which the consumer sentiment was recorded, and the variable “UMCSENT” gives the consumer sentiment. However, we only need the last 25 years of data, since there are some missing values earlier in the series and the older data are less relevant to our question. This leaves 301 observations to analyze.
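A minimal sketch of the read-and-subset step is shown below. The helper name `subset_from` and the inline excerpt are illustrative assumptions: in the project the full series would be read from the CSV file downloaded from FRED (for example with `read.csv`), and the cutoff would be "1995-01-01". The six values in the excerpt are copied from the printout in this report.

```r
# Illustrative excerpt (values copied from the printout in this report);
# the full FRED download would be read from a file with read.csv() instead.
csv_text <- "DATE,UMCSENT
1995-01-01,97.6
1995-02-01,95.1
1995-03-01,90.3
1995-04-01,92.5
1995-05-01,89.8
1995-06-01,92.7"
dat_all <- read.csv(text = csv_text, stringsAsFactors = FALSE)
dat_all$DATE <- as.Date(dat_all$DATE)

# Hypothetical helper: keep the observations dated on or after `start`.
subset_from <- function(df, start) {
  df[df$DATE >= as.Date(start), , drop = FALSE]
}

# A later cutoff is used here only so that the small excerpt is actually filtered.
dat <- subset_from(dat_all, "1995-04-01")
nrow(dat)
summary(dat$UMCSENT)
```

With the full file, `nrow(subset_from(dat_all, "1995-01-01"))` should give the 301 observations mentioned above.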

##           DATE UMCSENT
## 507 1995-01-01    97.6
## 508 1995-02-01    95.1
## 509 1995-03-01    90.3
## 510 1995-04-01    92.5
## 511 1995-05-01    89.8
## 512 1995-06-01    92.7
## 513 1995-07-01    94.4
## 514 1995-08-01    96.2
## 515 1995-09-01    88.9
## 516 1995-10-01    90.2
## 517 1995-11-01    88.2
## 518 1995-12-01    91.0
## 519 1996-01-01    89.3
## 520 1996-02-01    88.5
## 521 1996-03-01    93.7
## 522 1996-04-01    92.7
## 523 1996-05-01    89.4
## 524 1996-06-01    92.4
## 525 1996-07-01    94.7
## 526 1996-08-01    95.3
## 527 1996-09-01    94.7
## 528 1996-10-01    96.5
## 529 1996-11-01    99.2
## 530 1996-12-01    96.9
## 531 1997-01-01    97.4
## 532 1997-02-01    99.7
## 533 1997-03-01   100.0
## 534 1997-04-01   101.4
## 535 1997-05-01   103.2
## 536 1997-06-01   104.5
## 537 1997-07-01   107.1
## 538 1997-08-01   104.4
## 539 1997-09-01   106.0
## 540 1997-10-01   105.6
## 541 1997-11-01   107.2
## 542 1997-12-01   102.1
## 543 1998-01-01   106.6
## 544 1998-02-01   110.4
## 545 1998-03-01   106.5
## 546 1998-04-01   108.7
## 547 1998-05-01   106.5
## 548 1998-06-01   105.6
## 549 1998-07-01   105.2
## 550 1998-08-01   104.4
## 551 1998-09-01   100.9
## 552 1998-10-01    97.4
## 553 1998-11-01   102.7
## 554 1998-12-01   100.5
## 555 1999-01-01   103.9
## 556 1999-02-01   108.1
## 557 1999-03-01   105.7
## 558 1999-04-01   104.6
## 559 1999-05-01   106.8
## 560 1999-06-01   107.3
## 561 1999-07-01   106.0
## 562 1999-08-01   104.5
## 563 1999-09-01   107.2
## 564 1999-10-01   103.2
## 565 1999-11-01   107.2
## 566 1999-12-01   105.4
## 567 2000-01-01   112.0
## 568 2000-02-01   111.3
## 569 2000-03-01   107.1
## 570 2000-04-01   109.2
## 571 2000-05-01   110.7
## 572 2000-06-01   106.4
## 573 2000-07-01   108.3
## 574 2000-08-01   107.3
## 575 2000-09-01   106.8
## 576 2000-10-01   105.8
## 577 2000-11-01   107.6
## 578 2000-12-01    98.4
## 579 2001-01-01    94.7
## 580 2001-02-01    90.6
## 581 2001-03-01    91.5
## 582 2001-04-01    88.4
## 583 2001-05-01    92.0
## 584 2001-06-01    92.6
## 585 2001-07-01    92.4
## 586 2001-08-01    91.5
## 587 2001-09-01    81.8
## 588 2001-10-01    82.7
## 589 2001-11-01    83.9
## 590 2001-12-01    88.8
## 591 2002-01-01    93.0
## 592 2002-02-01    90.7
## 593 2002-03-01    95.7
## 594 2002-04-01    93.0
## 595 2002-05-01    96.9
## 596 2002-06-01    92.4
## 597 2002-07-01    88.1
## 598 2002-08-01    87.6
## 599 2002-09-01    86.1
## 600 2002-10-01    80.6
## 601 2002-11-01    84.2
## 602 2002-12-01    86.7
## 603 2003-01-01    82.4
## 604 2003-02-01    79.9
## 605 2003-03-01    77.6
## 606 2003-04-01    86.0
## 607 2003-05-01    92.1
## 608 2003-06-01    89.7
## 609 2003-07-01    90.9
## 610 2003-08-01    89.3
## 611 2003-09-01    87.7
## 612 2003-10-01    89.6
## 613 2003-11-01    93.7
## 614 2003-12-01    92.6
## 615 2004-01-01   103.8
## 616 2004-02-01    94.4
## 617 2004-03-01    95.8
## 618 2004-04-01    94.2
## 619 2004-05-01    90.2
## 620 2004-06-01    95.6
## 621 2004-07-01    96.7
## 622 2004-08-01    95.9
## 623 2004-09-01    94.2
## 624 2004-10-01    91.7
## 625 2004-11-01    92.8
## 626 2004-12-01    97.1
## 627 2005-01-01    95.5
## 628 2005-02-01    94.1
## 629 2005-03-01    92.6
## 630 2005-04-01    87.7
## 631 2005-05-01    86.9
## 632 2005-06-01    96.0
## 633 2005-07-01    96.5
## 634 2005-08-01    89.1
## 635 2005-09-01    76.9
## 636 2005-10-01    74.2
## 637 2005-11-01    81.6
## 638 2005-12-01    91.5
## 639 2006-01-01    91.2
## 640 2006-02-01    86.7
## 641 2006-03-01    88.9
## 642 2006-04-01    87.4
## 643 2006-05-01    79.1
## 644 2006-06-01    84.9
## 645 2006-07-01    84.7
## 646 2006-08-01    82.0
## 647 2006-09-01    85.4
## 648 2006-10-01    93.6
## 649 2006-11-01    92.1
## 650 2006-12-01    91.7
## 651 2007-01-01    96.9
## 652 2007-02-01    91.3
## 653 2007-03-01    88.4
## 654 2007-04-01    87.1
## 655 2007-05-01    88.3
## 656 2007-06-01    85.3
## 657 2007-07-01    90.4
## 658 2007-08-01    83.4
## 659 2007-09-01    83.4
## 660 2007-10-01    80.9
## 661 2007-11-01    76.1
## 662 2007-12-01    75.5
## 663 2008-01-01    78.4
## 664 2008-02-01    70.8
## 665 2008-03-01    69.5
## 666 2008-04-01    62.6
## 667 2008-05-01    59.8
## 668 2008-06-01    56.4
## 669 2008-07-01    61.2
## 670 2008-08-01    63.0
## 671 2008-09-01    70.3
## 672 2008-10-01    57.6
## 673 2008-11-01    55.3
## 674 2008-12-01    60.1
## 675 2009-01-01    61.2
## 676 2009-02-01    56.3
## 677 2009-03-01    57.3
## 678 2009-04-01    65.1
## 679 2009-05-01    68.7
## 680 2009-06-01    70.8
## 681 2009-07-01    66.0
## 682 2009-08-01    65.7
## 683 2009-09-01    73.5
## 684 2009-10-01    70.6
## 685 2009-11-01    67.4
## 686 2009-12-01    72.5
## 687 2010-01-01    74.4
## 688 2010-02-01    73.6
## 689 2010-03-01    73.6
## 690 2010-04-01    72.2
## 691 2010-05-01    73.6
## 692 2010-06-01    76.0
## 693 2010-07-01    67.8
## 694 2010-08-01    68.9
## 695 2010-09-01    68.2
## 696 2010-10-01    67.7
## 697 2010-11-01    71.6
## 698 2010-12-01    74.5
## 699 2011-01-01    74.2
## 700 2011-02-01    77.5
## 701 2011-03-01    67.5
## 702 2011-04-01    69.8
## 703 2011-05-01    74.3
## 704 2011-06-01    71.5
## 705 2011-07-01    63.7
## 706 2011-08-01    55.8
## 707 2011-09-01    59.5
## 708 2011-10-01    60.8
## 709 2011-11-01    63.7
## 710 2011-12-01    69.9
## 711 2012-01-01    75.0
## 712 2012-02-01    75.3
## 713 2012-03-01    76.2
## 714 2012-04-01    76.4
## 715 2012-05-01    79.3
## 716 2012-06-01    73.2
## 717 2012-07-01    72.3
## 718 2012-08-01    74.3
## 719 2012-09-01    78.3
## 720 2012-10-01    82.6
## 721 2012-11-01    82.7
## 722 2012-12-01    72.9
## 723 2013-01-01    73.8
## 724 2013-02-01    77.6
## 725 2013-03-01    78.6
## 726 2013-04-01    76.4
## 727 2013-05-01    84.5
## 728 2013-06-01    84.1
## 729 2013-07-01    85.1
## 730 2013-08-01    82.1
## 731 2013-09-01    77.5
## 732 2013-10-01    73.2
## 733 2013-11-01    75.1
## 734 2013-12-01    82.5
## 735 2014-01-01    81.2
## 736 2014-02-01    81.6
## 737 2014-03-01    80.0
## 738 2014-04-01    84.1
## 739 2014-05-01    81.9
## 740 2014-06-01    82.5
## 741 2014-07-01    81.8
## 742 2014-08-01    82.5
## 743 2014-09-01    84.6
## 744 2014-10-01    86.9
## 745 2014-11-01    88.8
## 746 2014-12-01    93.6
## 747 2015-01-01    98.1
## 748 2015-02-01    95.4
## 749 2015-03-01    93.0
## 750 2015-04-01    95.9
## 751 2015-05-01    90.7
## 752 2015-06-01    96.1
## 753 2015-07-01    93.1
## 754 2015-08-01    91.9
## 755 2015-09-01    87.2
## 756 2015-10-01    90.0
## 757 2015-11-01    91.3
## 758 2015-12-01    92.6
## 759 2016-01-01    92.0
## 760 2016-02-01    91.7
## 761 2016-03-01    91.0
## 762 2016-04-01    89.0
## 763 2016-05-01    94.7
## 764 2016-06-01    93.5
## 765 2016-07-01    90.0
## 766 2016-08-01    89.8
## 767 2016-09-01    91.2
## 768 2016-10-01    87.2
## 769 2016-11-01    93.8
## 770 2016-12-01    98.2
## 771 2017-01-01    98.5
## 772 2017-02-01    96.3
## 773 2017-03-01    96.9
## 774 2017-04-01    97.0
## 775 2017-05-01    97.1
## 776 2017-06-01    95.0
## 777 2017-07-01    93.4
## 778 2017-08-01    96.8
## 779 2017-09-01    95.1
## 780 2017-10-01   100.7
## 781 2017-11-01    98.5
## 782 2017-12-01    95.9
## 783 2018-01-01    95.7
## 784 2018-02-01    99.7
## 785 2018-03-01   101.4
## 786 2018-04-01    98.8
## 787 2018-05-01    98.0
## 788 2018-06-01    98.2
## 789 2018-07-01    97.9
## 790 2018-08-01    96.2
## 791 2018-09-01   100.1
## 792 2018-10-01    98.6
## 793 2018-11-01    97.5
## 794 2018-12-01    98.3
## 795 2019-01-01    91.2
## 796 2019-02-01    93.8
## 797 2019-03-01    98.4
## 798 2019-04-01    97.2
## 799 2019-05-01   100.0
## 800 2019-06-01    98.2
## 801 2019-07-01    98.4
## 802 2019-08-01    89.8
## 803 2019-09-01    93.2
## 804 2019-10-01    95.5
## 805 2019-11-01    96.8
## 806 2019-12-01    99.3
## 807 2020-01-01    99.8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   55.30   81.20   91.20   88.64   97.00  112.00

Let’s look at the format of the data. The first column is the date, and the second column is the consumer sentiment on that date. From the summary, the mean is 88.64, the maximum is 112.0, and the minimum is 55.30.

Then, we plot the data to observe the trend. The red line is the mean value of the consumer sentiment.

From the plot, we can see that consumer sentiment fluctuates around the mean overall. Around the year 2000, consumer sentiment reached its maximum value (112.0 in January 2000), and between 2008 and 2011 it was at its lowest (55.3 in November 2008). After both the highest and the lowest points, it gradually returned toward the mean value. The plot shows no obvious seasonality in consumer sentiment.


Data Smoothing

In this section, we will use Loess smoothing to take a look at the general trend of the consumer sentiment.

From the smoothing plot, we find that the trend is increasing at first. Consumer sentiment is highest around 2000 and lowest around 2010, but in both cases it tends back toward the mean value.
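The Loess smoothing step can be sketched as follows. The series here is simulated as a stand-in for the real data, and the span of 0.5 is an assumed smoothing parameter, not necessarily the one used for the plot in this report.

```r
set.seed(531)
# Simulated stand-in for the 301 monthly values (NOT the real data)
y <- as.numeric(arima.sim(list(ar = 0.9), n = 301)) + 90
t <- seq(1995, by = 1 / 12, length.out = 301)

# Local regression of the series on time; span = 0.5 is an assumption
fit <- loess(y ~ t, span = 0.5)
trend <- fit$fitted

plot(t, y, type = "l", xlab = "Year", ylab = "Consumer sentiment")
lines(t, trend, col = "red", lwd = 2)   # smoothed trend
abline(h = mean(y), lty = 2)            # overall mean for comparison
```

A larger span gives a smoother trend; a smaller span follows the data more closely.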

We now look at the dataset in terms of the low-frequency component, which acts as the trend; the high-frequency component, which acts as noise; and the cycles component. We decompose the time series by frequency.

The first plot is the time plot of our dataset. The second plot shows the trend of the data. The third plot shows the high-frequency component, which looks like white noise. The last plot shows the periods/cycles of the consumer sentiment. [3]
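One way to produce such a decomposition is with two Loess fits of different spans. This is a sketch on simulated stand-in data; the spans 0.5 and 0.1 and the variable names are assumptions.

```r
set.seed(531)
u <- as.numeric(arima.sim(list(ar = 0.9), n = 301)) + 90  # simulated stand-in
t <- seq(1995, by = 1 / 12, length.out = 301)
raw <- ts(u, start = c(1995, 1), frequency = 12)

# Low frequency (trend): heavy smoothing
low <- ts(loess(u ~ t, span = 0.5)$fitted, start = c(1995, 1), frequency = 12)
# High frequency (noise): what is left after light smoothing
high <- ts(u - loess(u ~ t, span = 0.1)$fitted, start = c(1995, 1), frequency = 12)
# Mid-range frequencies (cycles): the remainder
cycles <- raw - low - high

parts <- ts.union(raw, low, high, cycles)
colnames(parts) <- c("Raw", "Trend", "Noise", "Cycles")
plot(parts, main = "Decomposition into trend + noise + cycles")
```

By construction the three components add back up to the raw series, so the decomposition only separates frequency bands and does not discard information.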


Fitting an ARMA(p,q) model

From the analysis of the data, we find that although the data fluctuate, they tend to return to the mean value of the consumer sentiment, so the series looks stationary. From the spectrum and the plot, we cannot find an obvious periodic pattern in the data, so we first try an ARMA model to analyze the data. Because there is some nonstationary behavior around 2010, later on we will use other models to analyze the data better.

An ARMA(p,q) equation is written as[3]

\[\phi(B)(Y_n-\mu)=\psi(B)\epsilon_n\]

Then we will use the AIC to choose the best values of p and q. The AIC of a model is:

\[AIC=-2 \times \ell(\theta^*) + 2D\] where \(\ell(\theta^*)\) is the maximized log likelihood, with likelihood function \(\mathscr{L}(\theta)=f_{Y_{1:N}}(y^*_{1:N};\theta)\), and \(D\) is the number of parameters. The second term \(2D\) serves as a penalty for models with more parameters, in order to deal with the problem of overfitting. [3]

We will construct a table that displays the AIC values for the different ARMA(p,q) models.
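A table of this kind can be built with a small helper such as the one below. The function name `aic_table` and the `tryCatch` guard are assumptions; the call `arima(data, order = c(p, 0, q))` is the one that appears in the warnings printed in this report. The example runs on simulated stand-in data with a small grid to keep it fast.

```r
# AIC values for ARMA(p,q) with p = 0..P and q = 0..Q.
aic_table <- function(data, P, Q) {
  table <- matrix(NA_real_, P + 1, Q + 1,
                  dimnames = list(paste("AR", 0:P), paste("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # A failed fit leaves NA in the cell instead of stopping the loop
      table[p + 1, q + 1] <- tryCatch(
        arima(data, order = c(p, 0, q))$aic,
        error = function(e) NA_real_
      )
    }
  }
  table
}

set.seed(531)
x <- arima.sim(list(ar = 0.9), n = 301) + 90  # simulated stand-in, not the real data
tab <- aic_table(x, 2, 2)
round(tab, 2)
# Row and column of the smallest AIC
which(tab == min(tab, na.rm = TRUE), arr.ind = TRUE)
```

For the grid shown in this report, the call would be `aic_table(data, 6, 6)`. Convergence warnings like the ones printed here suggest treating the affected cells with some caution.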

## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1

## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1

## Warning in arima(data, order = c(p, 0, q)): possible convergence problem:
## optim gave code = 1
## Loading required package: knitr
       MA 0     MA 1     MA 2     MA 3     MA 4     MA 5     MA 6
AR 0   2385.49  2096.46  1930.72  1859.43  1803.34  1781.56  1763.26
AR 1   1666.59  1667.31  1662.33  1661.57  1663.57  1665.48  1663.46
AR 2   1667.67  1662.67  1662.21  1663.17  1664.71  1666.37  1664.39
AR 3   1664.70  1661.70  1660.16  1664.56  1666.55  1658.43  1667.69
AR 4   1662.88  1665.86  1664.39  1657.84  1667.32  1665.82  1667.87
AR 5   1663.70  1664.88  1666.35  1665.05  1658.49  1662.46  1664.47
AR 6   1663.27  1664.36  1665.98  1668.26  1658.55  1660.42  1669.86

From the AIC table, we can see that ARMA(4,3) has the lowest AIC value, 1657.84, and ARMA(3,5) has the second lowest, 1658.43. ARMA(4,3) is also simpler than ARMA(3,5): it has only 7 ARMA coefficients, compared with 8.

Then we fit the ARMA(4,3) model to the data and analyze it.

## 
## Call:
## arima(x = umcs, order = c(4, 0, 3))
## 
## Coefficients:
##          ar1     ar2     ar3      ar4     ma1      ma2      ma3  intercept
##       0.0424  0.7544  0.6174  -0.4483  0.8538  -0.0766  -0.6951    90.6628
## s.e.  0.1440  0.0807  0.0766   0.1330  0.1221   0.1872   0.1208     5.8940
## 
## sigma^2 estimated as 13.28:  log likelihood = -819.92,  aic = 1657.84
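As a check on the AIC formula, the reported value can be reproduced from the log likelihood in this output. This assumes that R counts \(D=9\) parameters here: four AR coefficients, three MA coefficients, the intercept, and the error variance.

\[AIC=-2\times(-819.92)+2\times 9=1639.84+18=1657.84\]

This matches the aic value printed by arima.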

The equation is written as [3] \[ (X_n-\mu)-\phi_1(X_{n-1}-\mu)-\phi_2(X_{n-2}-\mu)-\phi_3(X_{n-3}-\mu)-\phi_4(X_{n-4}-\mu)=\epsilon_n+\theta_1\epsilon_{n-1}+\theta_2\epsilon_{n-2}+\theta_3\epsilon_{n-3} \] We see that the mean \(\mu=90.6628\), \(\phi_1=0.0424\), \(\phi_2=0.7544\), \(\phi_3=0.6174\), \(\phi_4=-0.4483\), \(\theta_1=0.8538\), \(\theta_2=-0.0766\), and \(\theta_3=-0.6951\). The estimated variance of the error is 13.28.

Checking causality and invertibility of the model

We can check the causality and invertibility of our model to see whether the ARMA(4,3) model is suitable for the data.

For the model to be causal, the roots of the AR polynomial \(1-\phi_1z-\phi_2z^2-\phi_3z^3-\phi_4z^4\) must all lie outside the unit circle. We check the moduli of the roots.

abs(polyroot(c(1,0.0424,0.7544,0.6174,-0.4483)))
## [1] 0.8941717 1.2096297 0.8941717 2.3064123

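We can see that some of the reported roots are not outside the unit circle, which would mean the model is not causal. Note, however, that the coefficient vector passed to polyroot above has the opposite signs to the AR polynomial \(1-\phi_1z-\phi_2z^2-\phi_3z^3-\phi_4z^4\), so the check should be repeated with c(1,-0.0424,-0.7544,-0.6174,0.4483) before this conclusion is relied on.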

For the model to be invertible, the roots of the MA polynomial \(1+\theta_1z+\theta_2z^2+\theta_3z^3\) must all lie outside the unit circle. We check the moduli of the roots.

abs(polyroot(c(1,0.8538,-0.0766,-0.6951)))
## [1] 0.9999921 0.9999921 1.4386646

We can see that two of the roots have modulus 0.9999921, which is essentially on the unit circle rather than outside it, so the model is not invertible. The ARMA(4,3) model therefore has some problems, and it is not reliable to use this model for prediction.
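Both root checks can be sketched with one small helper. The helper name `root_moduli` is an assumption. The AR coefficients are negated because the AR polynomial is \(1-\phi_1z-\dots-\phi_4z^4\), while the MA coefficients enter \(1+\theta_1z+\theta_2z^2+\theta_3z^3\) with their own signs.

```r
# Moduli of the roots of a polynomial whose coefficients are given in
# increasing order of power (the convention used by polyroot()).
root_moduli <- function(coefs) Mod(polyroot(coefs))

ar <- c(0.0424, 0.7544, 0.6174, -0.4483)  # ar1..ar4 from the fitted model
ma <- c(0.8538, -0.0766, -0.6951)         # ma1..ma3 from the fitted model

# AR polynomial: 1 - phi_1 z - ... - phi_4 z^4
ar_mod <- root_moduli(c(1, -ar))
# MA polynomial: 1 + theta_1 z + theta_2 z^2 + theta_3 z^3
ma_mod <- root_moduli(c(1, ma))

ar_mod
all(ar_mod > 1)   # TRUE would indicate a causal model
ma_mod
all(ma_mod > 1)   # TRUE would indicate an invertible model
```

Roots with modulus very close to 1 sit on the boundary, so a strict comparison with 1 is sensitive to rounding in the reported coefficients.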

Diagnostics

We now run diagnostics on the model assumptions to see if the ARMA model is appropriate. First, we check whether the residuals look like white noise.

The mean of the residuals is around 0, and the variance appears constant, so the model seems to satisfy the white noise requirement.

Then we check the ACF to verify whether the residuals are uncorrelated. We see no significant autocorrelation between residuals at different lags, which meets our expectation.

Next, we assume the residuals follow a normal distribution with mean 0 and variance 13.28. We can use a QQ-plot to verify this.

We find that most of the residuals lie on the QQ line, so they seem to follow the normal distribution.
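The three diagnostic plots described above can be produced along the following lines. This is a sketch on simulated stand-in data with a placeholder AR(1) fit; for the report the fitted ARMA(4,3) object would be used instead.

```r
set.seed(531)
x <- arima.sim(list(ar = 0.9), n = 301) + 90  # simulated stand-in, not the real data
fit <- arima(x, order = c(1, 0, 0))           # placeholder model for illustration
res <- resid(fit)

par(mfrow = c(1, 3))
# 1. Residuals over time: look for mean near 0 and constant spread
plot(res, ylab = "Residuals")
abline(h = 0, col = "red")
# 2. Sample ACF: bars inside the dashed bands suggest uncorrelated residuals
acf(res, main = "ACF of residuals")
# 3. Normal QQ-plot: points near the line suggest approximate normality
qqnorm(res)
qqline(res)
```

These plots are visual checks only; they can suggest a problem but do not prove that the assumptions hold.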

Spectrum Analysis

In this section, we use spectral analysis to look for the period of the cycles component. In the unsmoothed periodogram, the spectrum is highest near frequency 0, so we cannot find an obvious cycle from it.

Then we use the smoothed periodogram to check whether there is a cycle.

smoothed$freq[which.max(smoothed$spec)]
## [1] 0.003125

So the dominant frequency is 0.003125 cycles per month, which is 320 months per cycle, or about 26.7 years per cycle. But we only use 25 years of data, so this period is longer than the series itself, and there is no obvious cycle in consumer sentiment. Nevertheless, we will fit a SARIMA model to the data and observe whether it brings improvements.
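The conversion from peak frequency to period can be sketched as follows, again on simulated stand-in data. The smoothing spans are an assumption. With 301 observations, `spectrum` appears to pad the series to 320 points by default, so 1/320 = 0.003125 is the lowest frequency on its grid; a peak there mainly reflects very slow movement rather than a cycle that is observed repeatedly.

```r
set.seed(531)
x <- arima.sim(list(ar = 0.9), n = 301) + 90  # simulated stand-in, 301 monthly values

# Smoothed periodogram; the spans are an assumed choice of smoothing window
smoothed <- spectrum(x, spans = c(3, 5, 3), plot = FALSE)

peak_freq <- smoothed$freq[which.max(smoothed$spec)]  # cycles per month
period_months <- 1 / peak_freq
period_years <- period_months / 12
c(peak_freq, period_months, period_years)

# Lowest frequency on the grid used by spectrum()
smoothed$freq[1]

# Period implied by the frequency reported in this section
1 / 0.003125 / 12   # years per cycle
```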

Fitting a SARIMA model

According to notes06, the SARIMA\((0,1,1)\times(0,1,1)_{12}\) model is often used for monthly economic data. We can use this model for our data. [3]

## 
## Call:
## arima(x = dat$UMCSENT, order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), 
##     period = 12))
## 
## Coefficients:
##           ma1     sma1
##       -0.1485  -0.9538
## s.e.   0.0721   0.0791
## 
## sigma^2 estimated as 14.88:  log likelihood = -811.28,  aic = 1628.57
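Written out, the SARIMA\((0,1,1)\times(0,1,1)_{12}\) model has the following form. The symbols \(\psi_1\) for the MA coefficient and \(\Psi_1\) for the seasonal MA coefficient are notation assumed here, following the \(\psi(B)\) used for the MA polynomial earlier.

\[(1-B)(1-B^{12})Y_n=(1+\psi_1B)(1+\Psi_1B^{12})\epsilon_n\]

Substituting the estimates from this output, \(\psi_1=-0.1485\) and \(\Psi_1=-0.9538\), gives

\[(1-B)(1-B^{12})Y_n=(1-0.1485B)(1-0.9538B^{12})\epsilon_n\]

with the variance of \(\epsilon_n\) estimated as 14.88.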

Diagnostics

We now run diagnostics on the model assumptions to see if the SARIMA model is appropriate. First, we check whether the residuals look like white noise.

The mean of the residuals is around 0, and the variance appears constant, so the model seems to satisfy the white noise requirement.

From the ACF plot, we see no significant autocorrelation between residuals at different lags, which meets our expectation.

Next, we assume the residuals follow a normal distribution with mean 0 and variance 14.88. We can use a QQ-plot to verify this.

We find that most of the residuals lie on the QQ line, so they seem to follow the normal distribution.


Forecasting future consumer sentiment using the fitted models

We will now use our ARMA(4,3), ARIMA(2,1,2), and SARIMA\((0,1,1)\times(0,1,1)_{12}\) models to predict the consumer sentiment in the near future. [4]

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

We see that for the ARIMA(2,1,2) and SARIMA\((0,1,1)\times(0,1,1)_{12}\) models, the predicted future consumer sentiment tends toward the mean value, with some fluctuations. When we use ARMA(4,3) to predict future consumer sentiment, however, there is a lot of volatility and the forecasts are very different from the mean value of past consumer sentiment. So we will use ARIMA(2,1,2) and SARIMA\((0,1,1)\times(0,1,1)_{12}\) for prediction.
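A forecast of this kind can be sketched with base R's `predict` on a fitted `arima` object. The data below are simulated as a stand-in, and the 12-month horizon and the approximate 95% band are assumptions; the forecasts in this report may have been produced with a different package.

```r
set.seed(531)
# Simulated stand-in for the monthly series (NOT the real data)
x <- ts(as.numeric(arima.sim(list(ar = 0.9), n = 301)) + 90,
        start = c(1995, 1), frequency = 12)

# Same model specification as the SARIMA fit reported above
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and standard errors for the next 12 months
fc <- predict(fit, n.ahead = 12)
lower <- fc$pred - 1.96 * fc$se   # approximate 95% band
upper <- fc$pred + 1.96 * fc$se

ts.plot(x, fc$pred, lower, upper,
        col = c("black", "blue", "grey", "grey"), lty = c(1, 1, 2, 2))
abline(h = mean(x), col = "red")  # historical mean for comparison
```

Comparing the forecast path with the red mean line is one way to see whether a model's predictions move toward the historical mean.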

Conclusion

In this project, we analyzed the monthly consumer sentiment over the past 25 years and tried to find a suitable model to describe and predict the data.

From the data plot, we found that even though consumer sentiment fluctuates over time, it eventually moves back toward the mean value of the data, whether it has become high or low. Through the prediction analysis, we tried to use models to estimate the mean level of consumer sentiment in the future.

We also used spectral analysis to look for the period of a cycle. However, neither the data plot nor the spectrum plot shows an obvious cycle in consumer sentiment. The estimated period is about 26.7 years, but we only used 25 years of data, so our dataset may not be large enough. If we expand the dataset, we may be able to find a period in consumer sentiment.

We first used ARMA(4,3) to fit the data. Even though its residuals look like white noise, our root checks indicated that it is neither causal nor invertible, so it is not stable enough. We therefore fitted the ARIMA(2,1,2) and SARIMA\((0,1,1)\times(0,1,1)_{12}\) models; the latter is recommended in the course for monthly economic data. After differencing, the data become more stable and can be predicted better, which gives us a prediction of the mean level of consumer sentiment, even though some fluctuations remain due to outside factors.

Exploration

We could not find an obvious cycle in consumer sentiment in this project, perhaps because the dataset is too small, so we could explore more data to find the period of consumer sentiment using spectral analysis.

We could also analyze the volatility of consumer sentiment, because this can help enterprises carry out market forecasting and production planning better, which promotes economic development.


References

[1] https://www.investopedia.com/terms/c/consumer-sentiment.asp
[2] https://fred.stlouisfed.org/series/UMCSENT
[3] https://ionides.github.io/531w18/midterm_project/
[4] https://www.datascience.com/blog/introduction-to-forecasting-with-arima-in-r-learn-data-science-tutorials