Cryptocurrencies have become a major asset class, attracting investors, traders, and researchers worldwide. Bitcoin (BTC) and Dogecoin (DOGE) are two that stand out. Bitcoin, introduced in 2009, is the first and best-known cryptocurrency and is often regarded as digital gold. Dogecoin, on the other hand, started as a joke in 2013 but went on to gain a significant following, partly due to its friendly community and low barrier to entry.
Hence, the main purpose of our report is to examine the relationship between Dogecoin and Bitcoin and construct predictive models capable of forecasting the closing prices of both cryptocurrencies. By doing so, we seek to provide valuable insights into the dynamics between Dogecoin and Bitcoin, aiding investors, traders, and researchers in making informed decisions within the volatile cryptocurrency market.
Bitcoin’s Long-term Periodicity: We hypothesize that Bitcoin exhibits long-term periodicity in its price movements. Its price is more stable compared to DOGE due to its market dominance.
Dogecoin’s Short-term Fluctuation: In contrast, we hypothesize Dogecoin to show more short-term fluctuation. Its price is likely influenced by social media trends and news events.
## Date Open High Low Close Volume Dividends Stock.Splits
## 1 2014-09-17 465.864 468.174 452.422 457.334 21056800 0 0
## 2 2014-09-18 456.860 456.860 413.104 424.440 34483200 0 0
## 3 2014-09-19 424.103 427.835 384.532 394.796 37919700 0 0
## 4 2014-09-20 394.673 423.296 389.883 408.904 36863600 0 0
## 5 2014-09-21 408.085 412.426 393.181 398.821 26580100 0 0
## 6 2014-09-22 399.100 406.916 397.130 402.152 24127600 0 0
## Date Open High Low Close Volume Dividends Stock.Splits
## 1 2017-11-09 0.001207 0.001415 0.001181 0.001415 6259550 0 0
## 2 2017-11-10 0.001421 0.001431 0.001125 0.001163 4246520 0 0
## 3 2017-11-11 0.001146 0.001257 0.001141 0.001201 2231080 0 0
## 4 2017-11-12 0.001189 0.001210 0.001002 0.001038 3288960 0 0
## 5 2017-11-13 0.001046 0.001212 0.001019 0.001211 2481270 0 0
## 6 2017-11-14 0.001201 0.001239 0.001131 0.001184 2660340 0 0
## [1] 3439 8
## [1] 2290 8
## Date Open High Low
## Min. :2014-09-17 Min. : 176.9 Min. : 211.7 Min. : 171.5
## 1st Qu.:2017-01-23 1st Qu.: 948.6 1st Qu.: 974.1 1st Qu.: 920.4
## Median :2019-06-02 Median : 8364.4 Median : 8592.0 Median : 8182.7
## Mean :2019-06-02 Mean :14963.0 Mean :15311.6 Mean :14590.9
## 3rd Qu.:2021-10-08 3rd Qu.:25834.6 3rd Qu.:26116.6 3rd Qu.:25422.0
## Max. :2024-02-15 Max. :67549.7 Max. :68789.6 Max. :66382.1
## Close Volume Dividends Stock.Splits
## Min. : 178.1 Min. :5.915e+06 Min. :0 Min. :0
## 1st Qu.: 962.5 1st Qu.:1.822e+08 1st Qu.:0 1st Qu.:0
## Median : 8368.8 Median :1.204e+10 Median :0 Median :0
## Mean :14976.6 Mean :1.668e+10 Mean :0 Mean :0
## 3rd Qu.:25842.3 3rd Qu.:2.699e+10 3rd Qu.:0 3rd Qu.:0
## Max. :67566.8 Max. :3.510e+11 Max. :0 Max. :0
## Date Open High Low
## Min. :2017-11-09 Min. :0.001046 Min. :0.001210 Min. :0.001002
## 1st Qu.:2019-06-04 1st Qu.:0.002697 1st Qu.:0.002779 1st Qu.:0.002624
## Median :2020-12-27 Median :0.009047 Median :0.009558 Median :0.008313
## Mean :2020-12-27 Mean :0.063404 Mean :0.066622 Mean :0.060312
## 3rd Qu.:2022-07-22 3rd Qu.:0.081373 3rd Qu.:0.083784 3rd Qu.:0.079749
## Max. :2024-02-15 Max. :0.687801 Max. :0.737567 Max. :0.608168
## Close Volume Dividends Stock.Splits
## Min. :0.001038 Min. :1.432e+06 Min. :0 Min. :0
## 1st Qu.:0.002698 1st Qu.:4.013e+07 1st Qu.:0 1st Qu.:0
## Median :0.009064 Median :1.824e+08 Median :0 Median :0
## Mean :0.063449 Mean :9.151e+08 Mean :0 Mean :0
## 3rd Qu.:0.081366 3rd Qu.:6.332e+08 3rd Qu.:0 3rd Qu.:0
## Max. :0.684777 Max. :6.941e+10 Max. :0 Max. :0
Date: A specific date and time.
Open: The opening price for each corresponding date.
High: The highest price reached during the day.
Low: The lowest price reached during the day.
Close: The closing price for each corresponding date.
Volume: The trading volume for each corresponding date.
Dividends: The dividends, if any, for each corresponding date.
Stock Splits: Stock splits, if any, for each corresponding date.
There are 3439 rows of data in btc_data.csv and 2290 rows of data in doge_data. Both have 8 variables, and our report focuses on the “Date” and “Close” variables for analysis.
BTC Closing Prices Plot Description
The BTC price plot illustrates volatility alongside discernible patterns. The plot reveals that its closing price has experienced several peaks and troughs over the years, reflecting distinct market cycles. Despite dramatic price fluctuations (price increases and decreases), the long-term trend appears relatively stable, aligning with our hypothesis.
DOGE Closing Prices Plot Description
The DOGE price plot looks less stable than BTC’s. It is relatively flat from 2018 to 2021 and then shows a large spike followed by a sharp decline. This is consistent with the hypothesized short-term fluctuations, where prices are volatile and influenced by social media and news events.
Discussion of Visualization Findings
One significant observation is the remarkable increase in the price of DOGE from February 2021 to April 2021, driven primarily by a series of tweets from Elon Musk endorsing DOGE. We view this event as an outlier that is unlikely to occur frequently [1]. Given that this spike was not the result of standard market dynamics and that, at present, we lack appropriate models to account for such anomalies, we have chosen to exclude this atypical price fluctuation during the first half of 2021, treating it as an external disruption in the data. Consequently, we segment our analysis into two periods: the first spanning from November 9, 2017, the start of the DOGE data, to December 31, 2020, before the unusual price movement, and the second from July 1, 2021, to February 2024, following the event. A similar approach is applied to the Bitcoin (BTC) data to enable a comparative analysis between the two cryptocurrencies.
Another key observation is that prior to the first half of 2021, both currencies were trading at relatively lower prices but experienced substantial growth post-2021. This underscores the importance of excluding the aforementioned spike from our analysis, as the market dynamics at higher trading volumes may differ significantly.
## Visualization of Split Data
We plot the two segments of data to see the price movement before and after the spike.
# Data processing
## Log Transform of Prices Plots
Due to the high variability of these two cryptocurrencies identified in the previous section, we consider applying a logarithmic transform to the closing price. We expect to see a clearer pattern once the log transformation reduces the difference between the high and low values.
Plotting the prices after the logarithmic transformation makes the contrast more evident: BTC exhibits greater stability in the long term, while DOGE jumps in 2021 and shows more obvious fluctuations. Because the price surge remains prominent even after the logarithmic transformation, we anticipate that it may interfere with our analysis and time-series models. To investigate the issue, we split the data into two parts, “before 2021” and “after June 2021”, and compare them with the data over the entire duration.
After splitting the data, the log-transformed prices look more stable. The first segment of the log DOGE price fluctuates, while the second segment trends downward. In contrast, the two segments of the BTC data show a visually similar pattern.
Analyzing the closing price directly is the more intuitive way to understand the market, but the return is another popular quantity for both stocks and cryptocurrencies. Here, we follow an equation from the lecture slides
\[
\log(r_t)=\Delta \log(Close_t) = \log(Close_t)-\log(Close_{t-1})
\]
where \(r_t\) is the return from day \(t-1\) to day \(t\) based on the closing price, that is, \(r_t = Close_t/Close_{t-1}\) [2]. Although “return” is a term used widely on Wall Street, the log return is simply the first difference of the log price series.
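The log return above can be sketched in a few lines. The report's own analysis was done in R; this is a standalone Python illustration, and the function name `log_returns` is hypothetical. The three prices are the first BTC closing prices from the data preview earlier in the report.

```python
import math

def log_returns(close):
    """Return the first difference of the log closing prices."""
    logs = [math.log(c) for c in close]
    # Pair each log price with the next one and subtract.
    return [b - a for a, b in zip(logs, logs[1:])]

# First three BTC closing prices from the data preview (2014-09-17 to 2014-09-19).
btc_close = [457.334, 424.440, 394.796]
returns = log_returns(btc_close)
print(returns)  # both values are negative because the price fell on both days
```

A series of n prices yields n-1 returns, which is why the differenced series is one observation shorter than the original.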
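As we have already seen in the previous sections, the big surge in DOGE leads to a large return in 2021. By contrast, in the BTC price history a negative return stands out more than any positive peak. Before applying any further models and analysis, we first investigate the stationarity of three types of data: closing price, log closing price, and return. If the return series is stationary, it can be represented as an ARMA process,
\[
\log(r_t)=\frac{\psi(B)}{\phi(B)}\epsilon_t
\]
where \(\phi(B)\) and \(\psi(B)\) are the AR and MA polynomials in the backshift operator \(B\) and \(\epsilon_t\) is white noise.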
Using either the closing price or the log closing price, the ACF decays slowly, which indicates non-stationarity (not shown). Splitting the data does not remove this characteristic. By contrast, the ACF of the returns is much closer to the ACF of white noise, especially for BTC. Additionally, the two splits of the time series do not noticeably improve the ACF plots; thus, we analyze the entire time series in the following sections. More importantly, the ACF of BTC decays quickly to zero without periodicity, while a few significant spikes are present in the ACF of DOGE. The return series is therefore more likely to be stationary. However, this raises the concern described in Burton Malkiel’s “A Random Walk Down Wall Street” [3]: the theory holds that market data are not predictable. To examine this idea, we perform spectral analysis and fit ARIMA models to the data to answer two questions: 1) Is there any periodic or seasonal behavior? 2) Is there any ARIMA model capable of modeling and forecasting the time series of cryptocurrencies?
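As a rough illustration of the ACF comparison described above, the sketch below computes a sample ACF and the usual approximate 95% white-noise band of ±1.96/√n. It is a standalone Python example with hypothetical function names, not the R code behind the report's plots.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    acf = []
    for h in range(max_lag + 1):
        ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
        acf.append(ch / c0)
    return acf

def white_noise_band(n):
    """Approximate 95% band for the ACF of white noise with n observations."""
    return 1.96 / math.sqrt(n)

# A trending toy series has a slowly decaying ACF, the signature of non-stationarity.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_acf(toy, 2))
# Band for a series as long as the BTC data (3439 rows).
print(white_noise_band(3439))
```

Sample autocorrelations that stay inside the band at nearly all lags are what one would expect from white noise; a slow decay from lag 0 is the pattern seen for the price and log price series.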
Although the pattern of the ACFs looks like that of a white noise process, some significant spikes, especially for DOGE, motivate us to investigate the frequency domain.
Both periodograms show a clear trend of decreasing power with increasing frequency. However, DOGE displays greater variability than BTC, potentially indicating heightened short-term fluctuations. To gain further insights, smoothed periodograms are required.
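For readers unfamiliar with the periodogram, here is a minimal sketch of the raw (unsmoothed) estimate and of how a peak frequency maps to a period. It is a standalone Python example on a toy sine wave, assuming a direct O(n²) Fourier sum for clarity rather than an FFT; the function names are hypothetical and this is not the report's R code.

```python
import cmath
import math

def periodogram(x):
    """Raw periodogram at the Fourier frequencies k/n, k = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        s = sum(dev[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / n)
        power.append(abs(s) ** 2 / n)
    return freqs, power

def peak_frequency(freqs, power):
    """Frequency at which the periodogram is largest."""
    i = max(range(len(power)), key=power.__getitem__)
    return freqs[i]

# Toy series with a known period of 8 observations.
toy = [math.sin(2 * math.pi * t / 8) for t in range(64)]
freqs, power = periodogram(toy)
peak = peak_frequency(freqs, power)
period = 1 / peak
print(peak, period)  # frequency 0.125 cycles per observation, period 8
```

The period is expressed in whatever unit the frequency is measured in: cycles per observation here, so the period is in observations.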
## [1] 0.3168403
## The peak frequency of BTC:
## [1] 0.3168403
## The peak frequency of DOGE:
## [1] 0.3168403
## [1] 0
## [1] 0
Neither BTC nor DOGE shows a clear periodic cycle in its smoothed periodogram, which indicates that long-term trends dominate. The smoothed periodograms also support the view that DOGE’s spectrum has more variability at higher frequencies, which may suggest relatively more short-term fluctuation. Even so, the peak frequency in the smoothed BTC and DOGE spectra is 0.3168 in both cases, which corresponds to a period \(T=1/\omega\approx 3.16\). A period of 3.16 years implies that a long-term periodicity may still exist in less obvious forms, such as trends or recurring patterns, which do not produce distinct peaks in the periodogram but influence price movement over longer time periods.
That said, when AIC is used to select the spectral estimator, the highest peak is at frequency 0, because larger span values are used to smooth the data. A peak at frequency 0 corresponds to an infinite period, which means no periodic behavior. All in all, it is hard to claim that there is any periodic behavior in BTC or DOGE.
As mentioned in the previous section, we want to know whether the two cryptocurrencies’ data can be modeled. We first use AIC as an indicator \[AIC=2k-2\log(L)\]
to choose the best combination of p and q for the ARIMA models, where \(k\) is the number of parameters and \(L\) is the likelihood [4]. The lower the AIC, the better the model.
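As a quick check of the formula, the sketch below recomputes the AIC from the log-likelihoods printed in the `auto.arima` summaries later in this report. It is a standalone Python example; the function name `aic` is hypothetical, and k=6 is an assumption that counts the five ARMA coefficients plus the innovation variance.

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 log(L)."""
    return 2 * k - 2 * log_likelihood

# Log-likelihoods as printed in the auto.arima summaries (rounded to 2 decimals).
doge_aic = aic(2777.78, 6)  # 2*6 - 2*2777.78 = -5543.56
btc_aic = aic(4253.89, 6)   # 2*6 - 2*4253.89 = -8495.78
print(doge_aic, btc_aic)
```

The results agree with the reported values of -5543.55 and -8495.78 up to the rounding of the printed log-likelihoods.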
For parameter selection, we use the AIC score as the indicator. Although no obvious periodic behavior was identified in the previous analysis, we assume that a linear trend exists in the time series, in line with the market consensus [5]. We then scan the two parameters to build an ARIMA model with a linear trend [6, 7]. For example, with p=1 and q=0 the model is
\[
(1 - \phi_1 B)(Y_n - \mu - \beta t_n) = \epsilon_n
\]
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | 3576.00 | 646.46 | -1589.51 | -3143.71 | -4288.96 | -5027.34 |
| AR1 | -8495.03 | -8494.49 | -8496.59 | -8494.65 | -8494.09 | -8494.34 |
| AR2 | -8494.62 | -8495.19 | -8493.47 | -8492.67 | -8491.27 | -8490.11 |
| AR3 | -8496.71 | -8497.01 | -8497.49 | -8494.05 | -8497.13 | -8489.72 |
| AR4 | -8494.89 | -8492.72 | -8493.06 | -8488.36 | -8499.20 | -8498.07 |
| AR5 | -8494.13 | -8493.80 | -8491.48 | -8496.36 | -8494.39 | -8488.27 |
## [1] -8499.202
## [1] "Err msg:"
## <simpleError in optim(init[mask], armafn, method = optim.method, hessian = TRUE, control = optim.control, trans = as.logical(transform.pars)): non-finite finite-difference value [1]>
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | 7149.89 | 4134.32 | 1798.05 | 196.49 | -952.09 | -1847.89 |
| AR1 | -5523.13 | -5521.49 | -5521.28 | -5542.64 | -5540.64 | -5539.89 |
| AR2 | -5521.47 | -5530.76 | -5529.24 | -5540.64 | -5538.93 | -5538.70 |
| AR3 | -5520.88 | -5529.19 | -5546.89 | -5547.40 | -5540.20 | -5550.99 |
| AR4 | -5545.04 | -5543.07 | -5548.12 | -5543.21 | -5548.54 | -5551.34 |
| AR5 | -5543.06 | -5541.36 | -5548.10 | -5548.71 | -5550.16 | NA |
## [1] -5551.335
The lowest AIC values (-8499.202 for BTC and -5551.335 for DOGE) are obtained from the ARIMA(4,0,4) and ARIMA(4,0,5) models, respectively. However, ARMA models with P+Q>5 can be problematic. To check whether the linear trend is needed in the ARIMA model, we apply likelihood ratio tests (LRT) with the null hypothesis that the coefficient of the linear trend is zero and the alternative hypothesis that it is not zero [8]. Under the null hypothesis, \[ l_1-l_0 \approx \tfrac{1}{2}\chi_1^2 \] where \(l_1\) and \(l_0\) are the maximized log-likelihoods with and without the linear trend.
## [1] 0.01577435
## [1] 0.02340156
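The p-values above come from comparing twice the log-likelihood difference with a chi-squared distribution with one degree of freedom. A minimal Python sketch of that calculation follows; the function name is hypothetical, and it relies on the identity P(χ²₁ > x) = erfc(√(x/2)), which holds only for one degree of freedom.

```python
import math

def lrt_p_value(loglik_null, loglik_alt):
    """P-value of a likelihood ratio test with one extra parameter (df = 1)."""
    stat = 2 * (loglik_alt - loglik_null)
    if stat <= 0:
        # The larger model fits no better than the smaller one.
        return 1.0
    # Survival function of chi-squared with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))

# A log-likelihood gain of about 1.92 sits at the 5% threshold (statistic 3.84).
print(lrt_p_value(0.0, 1.920729))  # close to 0.05
```

Tests with more than one extra parameter would need the chi-squared survival function for the corresponding degrees of freedom, which this sketch does not implement.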
As a result, both tests are significant (p=0.01577435 for BTC and p=0.02340156 for DOGE), so we reject the null hypothesis, which means the linear trend is required. These LRT results lead us to difference the data. That said, coupling ARIMA with a linear trend might not be sufficient to represent the time series of cryptocurrencies. We therefore also fit a more aggressive quadratic trend with ARMA errors to the log prices of both assets to study and visualize the trend in the price movement. \[\log(Close_t)=\beta_0+\beta_1t+\beta_2t^2+ARMA(p,q)\]
## ma1 intercept t t_squared
## 9.704601e-01 -6.609308e+00 2.327201e-03 -7.252910e-08
## intercept t t_squared
## 8.472552e+00 1.419576e-03 -2.255522e-07
The two plots suggest that upward trends exist. A linear function would be sufficient to represent the trend, although the fitted curve is not perfectly linear in the BTC plot. Preferring simpler models, we choose an ARIMA model with a linear trend and difference the data with d=1. In fact, applying d=1 to the log closing prices is equivalent to working with the log returns. Thus, in the following sections, we build models with first-order differencing.
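To see why first-order differencing is a natural response to a linear trend, consider a short sketch that assumes the log price is a linear trend plus a stationary error \(\eta_t\) (the symbol \(\eta_t\) is introduced here for illustration only):

\[
\log(Close_t)=\beta_0+\beta_1 t+\eta_t
\quad\Longrightarrow\quad
\Delta\log(Close_t)=\log(Close_t)-\log(Close_{t-1})=\beta_1+(\eta_t-\eta_{t-1})
\]

Under this assumption, differencing removes the time-dependent term \(\beta_1 t\) and leaves a constant drift \(\beta_1\) plus a differenced error, and the left-hand side is exactly the log return defined earlier.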
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -8497.49 | -8497.03 | -8499.02 | -8497.06 | -8496.41 | -8496.54 |
| AR1 | -8497.16 | -8497.64 | -8498.78 | -8497.32 | -8495.96 | -8494.86 |
| AR2 | -8499.13 | -8498.88 | -8497.80 | -8495.98 | -8506.35 | -8500.50 |
| AR3 | -8497.28 | -8495.15 | -8495.85 | -8498.13 | -8499.45 | -8496.93 |
| AR4 | -8496.43 | -8495.78 | -8493.59 | -8491.98 | -8489.82 | -8488.03 |
| AR5 | -8496.34 | -8493.83 | -8491.70 | -8490.02 | -8487.59 | -8494.54 |
## [1] -8506.348
| | MA0 | MA1 | MA2 | MA3 | MA4 | MA5 |
|---|---|---|---|---|---|---|
| AR0 | -5527.88 | -5526.21 | -5526.04 | -5547.22 | -5545.22 | -5544.53 |
| AR1 | -5526.20 | -5535.47 | -5533.87 | -5545.22 | -5543.55 | -5543.36 |
| AR2 | -5525.65 | -5533.88 | -5535.01 | -5552.02 | -5544.65 | -5555.46 |
| AR3 | -5549.58 | -5547.60 | -5552.77 | -5553.56 | -5553.08 | -5555.58 |
| AR4 | -5547.61 | -5545.94 | -5552.68 | -5553.02 | -5551.09 | -5550.24 |
| AR5 | -5547.36 | -5546.33 | -5555.79 | -5550.44 | -5550.05 | -5557.53 |
## [1] -5557.53
According to the AIC tables, we obtain the lowest AIC with ARIMA(2,1,4) for BTC and with ARIMA(5,1,5) for DOGE. In addition to scanning parameters with AIC, we are interested in the models selected by the `auto.arima` method in R.
## Series: doge_data$log_Close
## ARIMA(1,1,4)
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4
## -0.5631 0.5798 -0.0134 0.0843 0.0654
## s.e. 0.3352 0.3341 0.0251 0.0250 0.0325
##
## sigma^2 = 0.005181: log likelihood = 2777.78
## AIC=-5543.55 AICc=-5543.52 BIC=-5509.14
## Series: btc_data_filtered$log_Close
## ARIMA(4,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1
## 0.8461 0.0640 -0.0280 0.0093 -0.8733
## s.e. 0.3006 0.0287 0.0304 0.0307 0.3000
##
## sigma^2 = 0.001427: log likelihood = 4253.89
## AIC=-8495.78 AICc=-8495.74 BIC=-8461.36
`auto.arima` chooses ARIMA(1,1,4) and ARIMA(4,1,1) for DOGE and BTC, respectively. It is a good sign that `auto.arima` automatically suggests d=1 for both models, which aligns with our findings from the linear and quadratic trend models. To pick the best models for the two datasets, we apply the LRT again. The null hypothesis is that the models obtained from the AIC tables and from `auto.arima` perform equally well, and the alternative hypothesis is that they do not [9].
## [1] 1
## [1] 1
Both tests report p-values greater than 0.05, so we cannot reject the null hypothesis. Although `auto.arima` suggests two less complex models with P+Q=5, we keep all the models for further diagnosis. More importantly, as mentioned earlier, it is doubtful whether crypto data can be modeled at all, since they are often considered a random walk process. We therefore compare the `auto.arima`-based models with ARIMA(0,1,0), that is, first-order differencing alone, using LRT p-values.
## [1] 0.1413314
## [1] 0.0001030596
## [1] 0.001949054
## [1] 3.092019e-07
There is good news and bad news for the `auto.arima`-based models: we cannot reject the null hypothesis for BTC, while the test is significant for DOGE. That is, the DOGE ARIMA model captures certain patterns that cannot be captured by a model of “yesterday’s return”. The BTC ARIMA model, however, does not outperform ARIMA(0,1,0). The two AIC-table-based models are significantly better than ARIMA(0,1,0), albeit more complex than the `auto.arima`-based models. Given these ambiguous results, we tend to conclude that the ARIMA models need to be improved or might not be ideal for crypto data. Despite the significant LRT results for the AIC-table-based models, P+Q>5 suggests that hidden numerical issues may remain. To examine this further, we proceed with more diagnostics.
We visualize the original log closing prices together with the fitted prices generated by the two different methods.
The fitted values closely track the behavior of the original closing prices.
Based on the assumptions of time-series models, the residuals should behave like a white noise process, which means nothing is left to be modeled. We therefore visualize the ACF of the residuals. Apart from one or two significant spikes, the ACFs drop sharply from lag 0 to lag 1 and resemble the ACF of white noise. There are, however, more significant spikes in DOGE.
If the roots of the AR polynomial fall outside the unit circle, the model is stationary. Likewise, if the roots of the MA polynomial fall outside the unit circle, it is invertible. However, our analysis suggests that the model is neither stationary nor invertible, as the roots lie almost on the unit circle. This also means that causality is not satisfied [10]. To verify whether the models are really unstable, we apply the bootstrap method to the ARIMA(5,1,5) model, which has roots close to the unit circle [11].
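The root condition can be checked numerically. The sketch below is a standalone Python illustration, not the R code used for the report's root plots: it finds the roots of the AR polynomial 1 - φ₁z - ... - φₚzᵖ with the Durand-Kerner iteration (chosen here only to avoid external libraries) and tests whether all of them lie outside the unit circle. The function names are hypothetical; the example coefficients are the BTC AR estimates printed in the `auto.arima` summary.

```python
def poly_roots(coeffs, iterations=500):
    """Roots of c0 + c1*z + ... + cn*z^n via the Durand-Kerner iteration."""
    n = len(coeffs) - 1
    lead = coeffs[-1]  # must be non-zero
    monic = [c / lead for c in coeffs]
    # Standard starting points: powers of a complex number off the real axis.
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iterations):
        updated = []
        for i, r in enumerate(roots):
            value = sum(c * r ** j for j, c in enumerate(monic))
            denom = 1
            for j, other in enumerate(roots):
                if j != i:
                    denom *= r - other
            updated.append(r - value / denom)
        roots = updated
    return roots

def ar_roots(phi):
    """Roots of the AR polynomial 1 - phi_1 z - ... - phi_p z^p."""
    return poly_roots([1.0] + [-p for p in phi])

def is_causal(phi):
    """True if every AR root lies strictly outside the unit circle."""
    return all(abs(r) > 1 for r in ar_roots(phi))

# AR coefficients of the BTC ARIMA(4,1,1) model from the auto.arima output.
btc_phi = [0.8461, 0.0640, -0.0280, 0.0093]
for root in ar_roots(btc_phi):
    print(abs(root))
```

The same function applies to the MA side by passing the MA polynomial's coefficients, and roots whose modulus is only slightly above 1 are the "almost on the unit circle" case discussed above.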
We can find a dominant peak in each plot for all the coefficients with the scale and breaks provided in the lecture slides [5]. The result suggests the stability of the ARIMA model.
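The idea behind this check can be sketched with a much simpler model. The Python example below is an assumption-laden illustration, not the report's procedure: it uses an AR(1) model with a Yule-Walker estimate instead of ARIMA(5,1,5), simulates new series from the fitted model (a parametric bootstrap), and re-estimates the coefficient each time. All function names are hypothetical.

```python
import math
import random

def simulate_ar1(phi, sigma, n, rng, burn_in=100):
    """Simulate n observations of x_t = phi * x_{t-1} + e_t, e_t ~ N(0, sigma^2)."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out

def fit_ar1(x):
    """Yule-Walker estimates of the AR(1) coefficient and innovation sd."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    c1 = sum(a * b for a, b in zip(dev, dev[1:])) / n
    phi = c1 / c0
    return phi, math.sqrt(c0 * (1 - phi * phi))

def bootstrap_ar1(x, replicates, rng):
    """Refit the model on series simulated from the fitted model."""
    phi_hat, sigma_hat = fit_ar1(x)
    return [fit_ar1(simulate_ar1(phi_hat, sigma_hat, len(x), rng))[0]
            for _ in range(replicates)]

rng = random.Random(0)
data = simulate_ar1(0.5, 1.0, 400, rng)  # toy data standing in for a real series
phi_hat, _ = fit_ar1(data)
draws = bootstrap_ar1(data, 100, rng)
print(phi_hat, sum(draws) / len(draws))
```

If a histogram of `draws` shows a single peak concentrated near `phi_hat`, the estimate is stable under resampling, which is the pattern described for the coefficient plots here.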
Then, we applied the same method to investigate the ARIMA(1,1,4) model generated by `auto.arima`.
Concentrated peaks are still detectable for this ARIMA model, which suggests a stable model.
We applied time-series models and analysis to study the closing prices of two cryptocurrencies, BTC and DOGE. Our results indicate that a log transformation improves the time-series analysis, although the transformed data are still not stationary. We also recognize linear trends in the time-series data, but we identify no periodic behavior. After de-trending the data, we select models that provide good fitted values and satisfy the assumptions on the residuals. While the model selected by `auto.arima` for BTC does not outperform ARIMA(0,1,0), the significance of the remaining models does not support the idea of “A Random Walk Down Wall Street”. In other words, a good combination of P and Q can improve the ability of an ARIMA model to detect and reproduce the time series of cryptocurrencies. We demonstrate the stability of the parameters, but concerns about causality, invertibility, and stationarity still call the models into question. In addition, we anticipated that the greater fluctuation of DOGE would lead to worse models and less predictable outcomes than for BTC. However, apart from the higher frequency of “ups and downs”, BTC and DOGE share similar characteristics such as trends, ACFs, and models. In the end, while we tend to claim that the time series of the crypto market can be modeled by ARIMA, the results may vary with the time window. In particular, the forecasting power of the models has not yet been tested, even though forecasting is one of the most important reasons to study time-series modeling.