Introduction

Cryptocurrencies have drawn sustained attention from investors, traders, and researchers worldwide. Bitcoin (BTC) and Dogecoin (DOGE) are two that stand out. Bitcoin, introduced in 2009, is the first and best-known cryptocurrency and is often regarded as digital gold. Dogecoin, on the other hand, started as a joke in 2013 but gained a significant following, partly due to its friendly community and low barrier to entry.

Hence, the main purpose of our report is to examine the relationship between Dogecoin and Bitcoin and construct predictive models capable of forecasting the closing prices of both cryptocurrencies. By doing so, we seek to provide valuable insights into the dynamics between Dogecoin and Bitcoin, aiding investors, traders, and researchers in making informed decisions within the volatile cryptocurrency market.

Hypothesis

  1. Bitcoin’s Long-term Periodicity: We hypothesize that Bitcoin exhibits long-term periodicity in its price movements. Its price is more stable compared to DOGE due to its market dominance.

  2. Dogecoin’s Short-term Fluctuation: In contrast, we hypothesize Dogecoin to show more short-term fluctuation. Its price is likely influenced by social media trends and news events.

Data Information

##         Date    Open    High     Low   Close   Volume Dividends Stock.Splits
## 1 2014-09-17 465.864 468.174 452.422 457.334 21056800         0            0
## 2 2014-09-18 456.860 456.860 413.104 424.440 34483200         0            0
## 3 2014-09-19 424.103 427.835 384.532 394.796 37919700         0            0
## 4 2014-09-20 394.673 423.296 389.883 408.904 36863600         0            0
## 5 2014-09-21 408.085 412.426 393.181 398.821 26580100         0            0
## 6 2014-09-22 399.100 406.916 397.130 402.152 24127600         0            0
##         Date     Open     High      Low    Close  Volume Dividends Stock.Splits
## 1 2017-11-09 0.001207 0.001415 0.001181 0.001415 6259550         0            0
## 2 2017-11-10 0.001421 0.001431 0.001125 0.001163 4246520         0            0
## 3 2017-11-11 0.001146 0.001257 0.001141 0.001201 2231080         0            0
## 4 2017-11-12 0.001189 0.001210 0.001002 0.001038 3288960         0            0
## 5 2017-11-13 0.001046 0.001212 0.001019 0.001211 2481270         0            0
## 6 2017-11-14 0.001201 0.001239 0.001131 0.001184 2660340         0            0
## [1] 3439    8
## [1] 2290    8
##       Date                 Open              High              Low         
##  Min.   :2014-09-17   Min.   :  176.9   Min.   :  211.7   Min.   :  171.5  
##  1st Qu.:2017-01-23   1st Qu.:  948.6   1st Qu.:  974.1   1st Qu.:  920.4  
##  Median :2019-06-02   Median : 8364.4   Median : 8592.0   Median : 8182.7  
##  Mean   :2019-06-02   Mean   :14963.0   Mean   :15311.6   Mean   :14590.9  
##  3rd Qu.:2021-10-08   3rd Qu.:25834.6   3rd Qu.:26116.6   3rd Qu.:25422.0  
##  Max.   :2024-02-15   Max.   :67549.7   Max.   :68789.6   Max.   :66382.1  
##      Close             Volume            Dividends  Stock.Splits
##  Min.   :  178.1   Min.   :5.915e+06   Min.   :0   Min.   :0    
##  1st Qu.:  962.5   1st Qu.:1.822e+08   1st Qu.:0   1st Qu.:0    
##  Median : 8368.8   Median :1.204e+10   Median :0   Median :0    
##  Mean   :14976.6   Mean   :1.668e+10   Mean   :0   Mean   :0    
##  3rd Qu.:25842.3   3rd Qu.:2.699e+10   3rd Qu.:0   3rd Qu.:0    
##  Max.   :67566.8   Max.   :3.510e+11   Max.   :0   Max.   :0
##       Date                 Open               High               Low          
##  Min.   :2017-11-09   Min.   :0.001046   Min.   :0.001210   Min.   :0.001002  
##  1st Qu.:2019-06-04   1st Qu.:0.002697   1st Qu.:0.002779   1st Qu.:0.002624  
##  Median :2020-12-27   Median :0.009047   Median :0.009558   Median :0.008313  
##  Mean   :2020-12-27   Mean   :0.063404   Mean   :0.066622   Mean   :0.060312  
##  3rd Qu.:2022-07-22   3rd Qu.:0.081373   3rd Qu.:0.083784   3rd Qu.:0.079749  
##  Max.   :2024-02-15   Max.   :0.687801   Max.   :0.737567   Max.   :0.608168  
##      Close              Volume            Dividends  Stock.Splits
##  Min.   :0.001038   Min.   :1.432e+06   Min.   :0   Min.   :0    
##  1st Qu.:0.002698   1st Qu.:4.013e+07   1st Qu.:0   1st Qu.:0    
##  Median :0.009064   Median :1.824e+08   Median :0   Median :0    
##  Mean   :0.063449   Mean   :9.151e+08   Mean   :0   Mean   :0    
##  3rd Qu.:0.081366   3rd Qu.:6.332e+08   3rd Qu.:0   3rd Qu.:0    
##  Max.   :0.684777   Max.   :6.941e+10   Max.   :0   Max.   :0


Date: A specific date and time.
Open: The opening price for each corresponding date.
High: The highest price reached during the day.
Low: The lowest price reached during the day.
Close: The closing price for each corresponding date.
Volume: The trading volume for each corresponding date.
Dividends: The dividends, if any, for each corresponding date.
Stock Splits: Stock splits, if any, for each corresponding date.

There are 3439 rows of data in btc_data.csv and 2290 rows in doge_data. Both have 8 variables, and our report focuses on the “Date” and “Close” variables for analysis.

Exploratory Data Analysis (EDA)

Price Plots

BTC Closing Prices Plot Description

The BTC price plot illustrates volatility alongside discernible patterns. The plot reveals that its closing price has experienced several peaks and troughs over the years, reflecting distinct market cycles. Despite dramatic price fluctuations (price increases and decreases), the long-term trend appears relatively stable, aligning with our hypothesis.

DOGE Closing Prices Plot Description

The DOGE price plot looks more unstable compared to BTC’s. It is relatively flat from 2018 to 2021 and then experiences a big spike followed by a sharp decline. This could serve as evidence for the short-term fluctuations hypothesized, where prices are volatile and influenced by social media and news events.


Discussion of Visualization Findings

One significant observation is the remarkable increase in the price of DOGE from February 2021 to April 2021, driven primarily by a series of tweets from Elon Musk endorsing DOGE. We view this event as an outlier that is unlikely to occur frequently [1]. Because this spike was not the result of standard market dynamics and we currently lack appropriate models to account for such anomalies, we exclude the atypical price fluctuation in the first half of 2021 and treat it as an external disruption in the data. Consequently, we segment our analysis into two periods: the first from November 9, 2017 (the start of the DOGE data) to December 31, 2020, before the unusual price movement, and the second from July 1, 2021, to February 2024, after the event. The same approach is applied to the Bitcoin (BTC) data to enable a comparative analysis between the two cryptocurrencies.

Another key observation is that prior to the first half of 2021, both currencies were trading at relatively lower prices but experienced substantial growth post-2021. This underscores the importance of excluding the aforementioned spike from our analysis, as the market dynamics at higher trading volumes may differ significantly.
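The segmentation described above can be sketched in R as follows. This is a minimal illustration on a small made-up data frame: the name `prices` and its values are hypothetical, and only the two cutoff dates are taken from the text.

```r
# Hypothetical daily data spanning the excluded window (values are made up).
prices <- data.frame(
  Date  = seq(as.Date("2020-12-29"), as.Date("2021-07-03"), by = "day"),
  Close = 1
)

# Period 1: up to December 31, 2020. Period 2: from July 1, 2021 onward.
before_spike <- prices[prices$Date <= as.Date("2020-12-31"), ]
after_spike  <- prices[prices$Date >= as.Date("2021-07-01"), ]

nrow(before_spike)
nrow(after_spike)
```

Rows dated between the two cutoffs fall in neither subset, which is how the first half of 2021 is left out.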

Visualization of Split Data

We plot the two sections of data to see the price movement before and after the spike.

Data processing

Log Transform of Prices Plots

Due to the high variability of these two cryptocurrencies identified in the previous section, we apply a logarithmic transform to the closing price. We expect a clearer pattern once the log transformation reduces the difference between the high and low values. Plotting the log-transformed prices makes the contrast more evident: BTC exhibits greater stability in the long term, while DOGE jumps in 2021 and shows more obvious fluctuations.

Because the price surge remains prominent even after the logarithmic transformation, we anticipate that it will interfere with our analysis and time-series models. To investigate the issue, we split the data into two parts, “before 2021” and “after June 2021”, and compare them with the data over the entire duration. After the split, the log-transformed prices look more stable. The first half of the log DOGE price fluctuates, while the second half declines. In contrast, the two splits of the BTC data show a visibly similar pattern.

Return of cryptocurrencies

Analyzing the closing price directly is the more intuitive way to understand the market, but the return is another popular quantity for both stocks and cryptocurrencies. Here, we follow an equation from the lecture slides
\[ \log(r_t)=\Delta \log(Close_t) = \log(Close_t)-\log(Close_{t-1}) \] where \(r_t\) is the return from day t-1 to day t based on the closing price [2]. Although “return” is a term used widely on Wall Street, here it is simply the first difference of a (log-transformed) time series. As seen in the previous sections, the big surge of DOGE leads to a big return in 2021. By contrast, in the BTC price history a negative return stands out more than the other peaks.

Before applying any further models and analysis, we first investigate the stationarity of three different types of data: closing price, log closing price, and return, since we aim to model the return as an ARMA process \[ \log(r_t)=\frac{\psi(B)}{\phi(B)}\epsilon_t \] Using either the closing price or the log closing price, the ACF decreases gradually, which points to non-stationarity (not shown). Splitting the data does not remove this characteristic. Surprisingly, for the return the ACF pattern is close to that of white noise, especially for BTC. Additionally, the two splits of the time series do not evidently improve the ACF plots; thus, we analyze the entire time series in the following sections. More importantly, the ACF of BTC decays quickly to zero without periodicity, while significant spikes are present in the ACF of DOGE. Therefore, the return data are more likely to be stationary.

However, this raises the concern expressed in Burton Malkiel’s “A Random Walk Down Wall Street” [3], which argues that market data are not predictable. To examine this idea, we perform spectral analysis and fit ARIMA models to the data to answer two questions: 1) Is there any periodic or seasonal behavior? 2) Is there any ARIMA model capable of modeling and forecasting the time series of cryptocurrencies?
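As a small worked example of the log-return formula, the sketch below uses the first six BTC closing prices from the data preview in the Data Information section. The variable names are illustrative and not necessarily those used in the original analysis.

```r
# First six BTC closing prices, copied from the data preview.
close <- c(457.334, 424.440, 394.796, 408.904, 398.821, 402.152)

# Log return: first difference of the log closing price.
log_return <- diff(log(close))

# The same quantity written as the log of the day-over-day price ratio.
ratio_form <- log(close[-1] / close[-length(close)])
all.equal(log_return, ratio_form)

# Sample ACF of the returns (plot = FALSE returns the values without drawing).
acf_values <- acf(log_return, plot = FALSE)
```

Differencing shortens the series by one observation, so six prices yield five returns.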

Spectral analysis

Although the ACF patterns look like those of white noise processes, some significant spikes, especially for DOGE, inspire us to investigate the frequency domain.

Unsmoothed Periodogram

Both periodograms show a clear trend of decreasing power with increasing frequency. However, DOGE displays greater variability than BTC, potentially indicating heightened short-term fluctuations. To gain further insights, smoothed periodograms are required.


Smoothed Periodogram

## [1] 0.3168403

## The peak frequency of BTC:
## [1] 0.3168403
## The peak frequency of DOGE:
## [1] 0.3168403

## [1] 0

## [1] 0

Both BTC and DOGE lack clear periodic cycles in their smoothed periodograms, indicating stronger long-term trends. The smoothed periodograms support the view that DOGE’s spectrum shows more variability at higher frequencies, which may suggest relatively more short-term fluctuations. Despite this, the peak frequency in the smoothed BTC and DOGE periodograms is 0.3168 in both cases, which corresponds to a period \(T=1/\omega\approx 3.16\). A period of 3.16 years implies that long-term periodicity may still exist in less obvious forms, such as trends or recurring patterns, which do not produce distinct peaks in the periodogram but influence price movement over longer time periods.
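The conversion from peak frequency to period can be illustrated with a short sketch. It uses a simulated series with a known cycle rather than the report's data, and the `spans` values are an arbitrary smoothing choice, not the ones used above. For a plain numeric vector, `spec.pgram` reports frequency in cycles per observation; for a `ts` object it is expressed in cycles per unit of the series' own time scale, so a period in years presupposes a yearly time unit (an assumption about how the series was set up).

```r
set.seed(1)
n <- 500
time_index <- seq_len(n)

# Hypothetical series: one cycle every 50 observations plus noise.
x <- sin(2 * pi * time_index / 50) + rnorm(n, sd = 0.5)

# Raw periodogram and a smoothed version (modified Daniell kernel).
raw      <- spec.pgram(x, taper = 0, plot = FALSE)
smoothed <- spec.pgram(x, spans = c(5, 5), taper = 0, plot = FALSE)

# Frequency with the largest spectral density, and the implied period T = 1 / omega.
peak_freq <- raw$freq[which.max(raw$spec)]
period    <- 1 / peak_freq
period
```

A peak at frequency 0, by contrast, would correspond to an infinite period, that is, a trend rather than a cycle.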

That being said, when AIC is used to select the best estimator, the highest peak is at frequency 0, as happens when larger span values are used to smooth the data. A frequency of 0 corresponds to an infinite period, which means no periodic behavior. All in all, it is hard to claim that there is any periodic behavior in BTC and DOGE.


Build ARIMA models and select the best parameters

As mentioned in the previous section, we wonder whether it is possible to model the two cryptocurrencies’ data. We first use AIC as an indicator \[AIC=2k-2\log(L)\] to choose the best combination of p and q for the ARIMA models, where \(k\) is the number of parameters and \(L\) is the likelihood [4]. The lower the AIC, the better the model.
Although no obvious periodic behavior was identified in the previous analysis, we assume that a linear trend exists in the time-series datasets, according to the consensus of the market [5]. We then simply scan the two parameters to build an ARIMA model with a linear trend [6, 7]. For example, in the AR(1) case, \[ (1-\phi_1 B)(Y_n-\mu-\beta t_n)=\epsilon_n \]

         MA0      MA1      MA2      MA3      MA4      MA5
AR0  3576.00   646.46 -1589.51 -3143.71 -4288.96 -5027.34
AR1 -8495.03 -8494.49 -8496.59 -8494.65 -8494.09 -8494.34
AR2 -8494.62 -8495.19 -8493.47 -8492.67 -8491.27 -8490.11
AR3 -8496.71 -8497.01 -8497.49 -8494.05 -8497.13 -8489.72
AR4 -8494.89 -8492.72 -8493.06 -8488.36 -8499.20 -8498.07
AR5 -8494.13 -8493.80 -8491.48 -8496.36 -8494.39 -8488.27
## [1] -8499.202
## [1] "Err msg:"
## <simpleError in optim(init[mask], armafn, method = optim.method, hessian = TRUE,     control = optim.control, trans = as.logical(transform.pars)): non-finite finite-difference value [1]>
         MA0      MA1      MA2      MA3      MA4      MA5
AR0  7149.89  4134.32  1798.05   196.49  -952.09 -1847.89
AR1 -5523.13 -5521.49 -5521.28 -5542.64 -5540.64 -5539.89
AR2 -5521.47 -5530.76 -5529.24 -5540.64 -5538.93 -5538.70
AR3 -5520.88 -5529.19 -5546.89 -5547.40 -5540.20 -5550.99
AR4 -5545.04 -5543.07 -5548.12 -5543.21 -5548.54 -5551.34
AR5 -5543.06 -5541.36 -5548.10 -5548.71 -5550.16       NA
## [1] -5551.335
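A grid scan of this kind can be sketched as below. This is an illustrative helper, not necessarily the code that produced the tables: the function name `aic_table`, the simulated series, and the grid size are assumptions. Fits that fail are recorded as NA, which is one way an NA entry like the one in the second table can arise.

```r
# Hypothetical helper: AIC for every ARIMA(p, d, q) with p <= P and q <= Q.
aic_table <- function(x, P, Q, d = 0, xreg = NULL) {
  tab <- matrix(NA_real_, nrow = P + 1, ncol = Q + 1,
                dimnames = list(paste0("AR", 0:P), paste0("MA", 0:Q)))
  for (p in 0:P) {
    for (q in 0:Q) {
      # Record NA instead of stopping when the optimizer fails.
      tab[p + 1, q + 1] <- tryCatch(
        arima(x, order = c(p, d, q), xreg = xreg)$aic,
        error = function(e) NA_real_
      )
    }
  }
  tab
}

# Simulated example: AR(1) noise around a linear trend.
set.seed(1)
n <- 300
trend <- seq_len(n)
y <- 0.01 * trend + arima.sim(list(ar = 0.6), n = n)

tab  <- aic_table(y, P = 2, Q = 2, xreg = trend)
best <- which(tab == min(tab, na.rm = TRUE), arr.ind = TRUE)
round(tab, 2)
```

Passing the time index through `xreg` fits the linear trend jointly with the ARMA errors, and `best` gives the row and column of the lowest AIC.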

The lowest AIC values (-8499.202 for BTC and -5551.335 for DOGE) are obtained from an ARIMA(4,0,4) model and an ARIMA(4,0,5) model, respectively. However, ARMA models with P+Q>5 could be problematic. To verify whether the linear trend is required in the ARIMA model, we apply Likelihood Ratio Tests (LRT) with the null hypothesis that the coefficient of the linear term is zero and the alternative hypothesis that it is not zero [8]. With \(l_0\) and \(l_1\) denoting the maximized log-likelihoods under the null and alternative models, \[ l_1-l_0 \approx \frac{1}{2}\chi_1^2 \]

## [1] 0.01577435
## [1] 0.02340156
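The p-values above follow from comparing twice the log-likelihood difference with a chi-squared distribution. A minimal sketch on simulated data is shown below; the helper name `lrt_p_value`, the simulated series, and the use of `method = "ML"` are illustrative assumptions rather than the report's actual code.

```r
# p-value of a likelihood ratio test for nested models.
# df is the number of extra parameters in the alternative model.
lrt_p_value <- function(loglik_null, loglik_alt, df = 1) {
  stat <- 2 * (loglik_alt - loglik_null)
  pchisq(stat, df = df, lower.tail = FALSE)
}

# Simulated example: AR(1) noise around a linear trend.
set.seed(2)
n <- 300
trend <- seq_len(n)
y <- 0.01 * trend + arima.sim(list(ar = 0.5), n = n)

fit_null <- arima(y, order = c(1, 0, 0), method = "ML")                # no trend
fit_alt  <- arima(y, order = c(1, 0, 0), xreg = trend, method = "ML")  # linear trend

p_value <- lrt_p_value(fit_null$loglik, fit_alt$loglik, df = 1)
p_value
```

The same helper applies to any pair of nested ARIMA fits, with `df` set to the difference in the number of estimated coefficients.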

As a result, both models show significance (p=0.01577435 for BTC and 0.02340156 for DOGE), so we reject the null hypothesis, which means the linear trend is required. The LRT results lead us to difference the data. That being said, coupling ARIMA with a linear model might not be sufficient to represent the time series of cryptocurrencies. We therefore go one step further and fit a quadratic model with ARMA errors to the log prices of both assets to study and visualize the trend of the price movement. \[\log(Close_t)=\beta_0+\beta_1t+\beta_2t^2+ARMA(p,q)\]

##           ma1     intercept             t     t_squared 
##  9.704601e-01 -6.609308e+00  2.327201e-03 -7.252910e-08

##     intercept             t     t_squared 
##  8.472552e+00  1.419576e-03 -2.255522e-07
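A quadratic trend with ARMA errors can be fitted by passing the trend terms as external regressors. The sketch below uses a simulated series and an MA(1) error model purely for illustration; the regressor names `t` and `t_squared` mirror the coefficient names in the output above, while everything else is an assumption.

```r
set.seed(3)
n <- 200
time_index <- seq_len(n)

# Hypothetical log-price series: quadratic trend plus MA(1) noise.
log_close <- 1 + 0.02 * time_index - 0.00005 * time_index^2 +
  arima.sim(list(ma = 0.5), n = n, sd = 0.1)

# Trend terms enter the model as external regressors.
trend_terms <- cbind(t = time_index, t_squared = time_index^2)
fit <- arima(log_close, order = c(0, 0, 1), xreg = trend_terms)
coef(fit)

# Fitted trend curve, for overlaying on a plot of the log prices.
beta <- coef(fit)
fitted_trend <- beta[["intercept"]] +
  as.numeric(trend_terms %*% beta[c("t", "t_squared")])
```

A `t_squared` coefficient that is very small relative to the linear term is one informal sign that the linear trend may be adequate.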

The two plots suggest that upward trends exist. A linear function would be sufficient to represent the trend, although the fitted curve is not entirely linear in the BTC plot. Preferring simpler models, we choose an ARIMA model with a linear trend and decide to difference the data with d=1. In fact, applying d=1 to the log closing prices amounts to turning the data into log returns. Thus, in the following sections, we build models with first-order differencing.

         MA0      MA1      MA2      MA3      MA4      MA5
AR0 -8497.49 -8497.03 -8499.02 -8497.06 -8496.41 -8496.54
AR1 -8497.16 -8497.64 -8498.78 -8497.32 -8495.96 -8494.86
AR2 -8499.13 -8498.88 -8497.80 -8495.98 -8506.35 -8500.50
AR3 -8497.28 -8495.15 -8495.85 -8498.13 -8499.45 -8496.93
AR4 -8496.43 -8495.78 -8493.59 -8491.98 -8489.82 -8488.03
AR5 -8496.34 -8493.83 -8491.70 -8490.02 -8487.59 -8494.54
## [1] -8506.348
         MA0      MA1      MA2      MA3      MA4      MA5
AR0 -5527.88 -5526.21 -5526.04 -5547.22 -5545.22 -5544.53
AR1 -5526.20 -5535.47 -5533.87 -5545.22 -5543.55 -5543.36
AR2 -5525.65 -5533.88 -5535.01 -5552.02 -5544.65 -5555.46
AR3 -5549.58 -5547.60 -5552.77 -5553.56 -5553.08 -5555.58
AR4 -5547.61 -5545.94 -5552.68 -5553.02 -5551.09 -5550.24
AR5 -5547.36 -5546.33 -5555.79 -5550.44 -5550.05 -5557.53
## [1] -5557.53

According to the AIC tables, we acquire the lowest AIC with ARIMA(2,1,4) for BTC and with ARIMA(5,1,5) for DOGE. In addition to scanning parameters with AIC, we are interested in the models selected by the auto.arima method in R.

## Series: doge_data$log_Close 
## ARIMA(1,1,4) 
## 
## Coefficients:
##           ar1     ma1      ma2     ma3     ma4
##       -0.5631  0.5798  -0.0134  0.0843  0.0654
## s.e.   0.3352  0.3341   0.0251  0.0250  0.0325
## 
## sigma^2 = 0.005181:  log likelihood = 2777.78
## AIC=-5543.55   AICc=-5543.52   BIC=-5509.14
## Series: btc_data_filtered$log_Close 
## ARIMA(4,1,1) 
## 
## Coefficients:
##          ar1     ar2      ar3     ar4      ma1
##       0.8461  0.0640  -0.0280  0.0093  -0.8733
## s.e.  0.3006  0.0287   0.0304  0.0307   0.3000
## 
## sigma^2 = 0.001427:  log likelihood = 4253.89
## AIC=-8495.78   AICc=-8495.74   BIC=-8461.36
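As a quick check of the AIC formula against the output above, the reported log-likelihoods can be plugged in directly. Here k = 6 is assumed to count the five ARMA coefficients plus the innovation variance in each model.

```r
# AIC = 2k - 2 log(L)
aic <- function(loglik, k) 2 * k - 2 * loglik

# k = 6: five ARMA coefficients plus the innovation variance (assumed count).
aic_btc  <- aic(4253.89, k = 6)  # reported AIC for BTC:  -8495.78
aic_doge <- aic(2777.78, k = 6)  # reported AIC for DOGE: -5543.55

c(BTC = aic_btc, DOGE = aic_doge)
```

The DOGE value differs from the reported AIC in the last digit only because the printed log-likelihood is rounded to two decimals.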

auto.arima chooses ARIMA(1,1,4) and ARIMA(4,1,1) for DOGE and BTC, respectively. It is a good sign that auto.arima automatically suggests d=1 for both models, which aligns with our findings from the linear and quadratic models. To pick the best model for each dataset, we apply the LRT again: the null hypothesis is that the models generated from the AIC tables and from auto.arima perform equally, and the alternative hypothesis is that they do not [9].

## [1] 1
## [1] 1

Both tests report p-values greater than 0.05, so we cannot reject the null hypothesis. Although auto.arima suggests two less complex models with P+Q=5, we keep all the models for the subsequent diagnostics. More importantly, as mentioned previously, whether crypto data can be modeled at all is doubtful, since they are often considered a random walk process. We therefore compare the selected models with the model for first-order differenced data, ARIMA(0,1,0), using LRT p-values.

## [1] 0.1413314
## [1] 0.0001030596
## [1] 0.001949054
## [1] 3.092019e-07

There is good news and bad news for the auto.arima-based models: we cannot reject the null hypothesis for BTC, while the test is significant for DOGE. That is, the ARIMA model for DOGE captures certain patterns that cannot be modeled by a model of “yesterday’s return”. The auto.arima model of BTC, however, fails to outperform ARIMA(0,1,0). The two AIC-table-based models, albeit more complex than the auto.arima-based models, are significantly better than ARIMA(0,1,0). Given the ambiguous results, we interpret this to mean that the ARIMA models need to be improved or might not be ideal for the crypto data. Despite the significant LRT for the AIC-table-based models, P+Q>5 may imply hidden issues. To examine this further, we proceed with more diagnostics.

Diagnosis of selected models

Compare the original time series to the fitted prices

We simply visualize the original log closing prices and the fitted prices generated by the two different methods.


The fitted values apparently recapitulate the behaviors of the original closing prices.

Diagnosis of residuals

Under the assumptions of time-series models, the residuals should behave like a white noise process, meaning that nothing is left to be modeled. We therefore visualize the ACF of the residuals. In spite of one or two significant spikes, the ACFs drop sharply from lag 0 to lag 1 and resemble the ACF of white noise. That said, more significant spikes are detected for DOGE.
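The residual check can be sketched as follows on a simulated series. The Ljung-Box test is an addition for illustration and is not part of the analysis above; the model order and lag choices are likewise assumptions.

```r
set.seed(4)
x   <- arima.sim(list(ar = 0.5), n = 400)
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

# Sample ACF of the residuals and the approximate 95% white-noise bound.
res_acf <- acf(res, lag.max = 20, plot = FALSE)
bound   <- qnorm(0.975) / sqrt(length(res))

# Number of lags (excluding lag 0) whose autocorrelation exceeds the bound.
n_significant <- sum(abs(res_acf$acf[-1]) > bound)
n_significant

# Portmanteau test of remaining autocorrelation; fitdf = number of ARMA coefficients.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
lb$p.value
```

With 20 lags, about one spike outside the bound is expected by chance even for true white noise, so one or two significant lags are not alarming on their own.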

Diagnosis of invertibility, stationarity, and causality

If the roots of the AR polynomial fall outside the unit circle, the model is stationary. Likewise, if the roots of the MA polynomial are outside the unit circle, it is invertible. However, our analysis suggests that the model is neither stationary nor invertible, as the roots lie almost on the unit circle. This also indicates that causality is not satisfied [10]. To verify whether the models are really unstable, we then apply the bootstrapping method to the ARIMA(5,1,5) model, which has roots close to the unit circle [11].
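The root condition can be checked numerically with `polyroot`. The helper below is a minimal sketch with a hypothetical name, and the coefficients in the examples are made up rather than taken from the fitted models. It assumes R's sign convention, in which the AR polynomial is 1 - phi_1 z - ... and the MA polynomial is 1 + theta_1 z + ....

```r
# Moduli of the AR or MA polynomial roots and whether all lie outside the unit circle.
roots_outside_unit_circle <- function(coefs, type = c("ar", "ma"), tol = 1e-8) {
  type  <- match.arg(type)
  poly  <- if (type == "ar") c(1, -coefs) else c(1, coefs)
  roots <- polyroot(poly)
  list(modulus = Mod(roots), all_outside = all(Mod(roots) > 1 + tol))
}

# Hypothetical AR(2): both roots lie outside the unit circle.
ar2 <- roots_outside_unit_circle(c(0.5, 0.3), "ar")

# Random walk written as AR(1) with phi = 1: the root sits on the circle.
rw <- roots_outside_unit_circle(1, "ar")

# Hypothetical MA(1) with theta = 0.4: root at -2.5.
ma1 <- roots_outside_unit_circle(0.4, "ma")

ar2$modulus
```

Roots with modulus only slightly above 1 pass the check formally but signal a near-unit-root model, which is the situation described above.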


We can find a dominant peak in each plot for all the coefficients with the scale and breaks provided in the lecture slides [5]. The result suggests the stability of the ARIMA model.

Then, we applied the same method to the ARIMA(1,1,4) model generated by auto.arima. Concentrated peaks are still detectable for this model, which suggests that it is stable.
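One way to carry out such a check is a parametric bootstrap: simulate new series from the fitted model, refit, and inspect the distribution of the re-estimated coefficients. The sketch below does this for a simple AR(1) on simulated data. It is an assumption that this resembles the procedure in the lecture slides, and the model order and number of replicates are chosen only to keep the example small.

```r
set.seed(5)
x      <- arima.sim(list(ar = 0.5), n = 300)
fit    <- arima(x, order = c(1, 0, 0), include.mean = FALSE)
ar_hat <- coef(fit)[["ar1"]]

# Simulate from the fitted model and re-estimate the coefficient each time.
n_boot  <- 100
boot_ar <- replicate(n_boot, {
  sim <- arima.sim(list(ar = ar_hat), n = length(x), sd = sqrt(fit$sigma2))
  coef(arima(sim, order = c(1, 0, 0), include.mean = FALSE))[["ar1"]]
})

# A single concentrated peak around ar_hat suggests a stable estimate.
quantile(boot_ar, c(0.025, 0.5, 0.975))
```

Plotting `hist(boot_ar)` or `density(boot_ar)` gives the kind of per-coefficient picture described above; a multimodal or very wide distribution would point to an unstable fit.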

Conclusion

We successfully applied time-series models and analysis to study the closing prices of two cryptocurrencies, BTC and DOGE. Our research indicates that a log transformation can improve time series analysis, although the transformed data are still not stationary. We also recognize linear trends in the time-series data, but no periodic behavior is identified. After de-trending the data, we select models that provide good fitted values and satisfy the assumptions on the residuals.

While the model selected by auto.arima for BTC does not outperform ARIMA(0,1,0), the significance of the remaining models does not support the idea of “A Random Walk Down Wall Street”. In other words, a good combination of P and Q can improve the ability of an ARIMA model to detect and reproduce the time series of cryptocurrencies. We demonstrate the stability of the parameters, but concerns about causality, invertibility, and stationarity still call the models into question.

In addition, we anticipated that the greater fluctuation of DOGE would lead to worse models and less predictable outcomes compared to BTC. However, except for the high frequency of “ups and downs”, both BTC and DOGE share similar characteristics such as trends, ACFs, and models. In the end, while we tend to claim that the time series of the crypto market can be modeled by ARIMA, the results may vary across time windows. In particular, the forecasting and predictive power of the models, which is one of the most important reasons to study time-series modeling, has not yet been tested.

References

  1. Dogecoin price soars more than 100% to new record after Elon Musk tweets. CNN. https://www.cnn.com/2021/04/16/investing/dogecoin-price-elon-musk-int-hk/index.html
  2. Returns of S&P data. Lecture slides Ch01 p.19.
  3. A random walk down Wall Street: the time-tested strategy for successful investing. Burton Malkiel (2011).
  4. Akaike’s information criterion (AIC). Lecture slides Ch05 p.21.
  5. Understanding the crypto-asset phenomenon, its risks and measurement issues. ECB Economic Bulletin (2019). https://www.ecb.europa.eu/pub/economic-bulletin/articles/2019/html/ecb.ebart201905_03~c83aeaa44c.en.html
  6. Project02: Ethereum and Investment. (2022).
  7. Project07: Average Price for Car License. (2022).
  8. Likelihood ratio test. Lecture slides Ch05 p.19.
  9. Likelihood Ratio Test. Peter Roessler-Caram (2018). https://rpubs.com/roes7096/LTR
  10. Fitting ARMA models in R: Choosing p and q. Lecture slides Ch05 p.32.
  11. Fitting ARMA models in R: Bootstrapping test. Lecture slides Ch05 p.36.