Crypto-Algos.com

Bitcoin Price Forecasting ARIMA vs NNAR

Bitcoin Price Forecasting in Python using ARIMA and Neural Networks.

This post analyzes forecasts of Bitcoin price using the autoregressive integrated moving average (ARIMA) and neural network autoregression (NNAR) models. Employing the static forecast approach, we forecast the next-day Bitcoin price.

Bitcoin, the world’s first decentralized cryptocurrency and currently the largest, was introduced in 2008. Some argue that it has the same finite economic attributes as gold and label it digital gold. Its price volatility makes it one of the most speculative digital currencies; however, Bitcoin is coming into the mainstream, with large institutional investors
eyeing its potential. Despite its limitations, Bitcoin is the most valuable and popular cryptocurrency to
date (Corbet et al. 2019).
Bitcoin price has been extremely volatile since the inception of the cryptocurrency (Dwyer 2015).
Due to concerns with speculative trading, in January 2018, Facebook banned all ads for Bitcoin and
other cryptocurrencies (Robertson 2018). Additionally, experts foresee another financial crisis in the
near future caused by the cryptocurrency boom (Lam et al. 2018). A major crash of the Bitcoin price can
be triggered by a cyber hack and a government crackdown, and can take weeks or months to bounce
back (Roberts 2017). Typically, investors predict future Bitcoin price based on past trends. But it is
not easy to predict future Bitcoin price with a high level of accuracy. The price of Bitcoin follows a
boom-bust pattern due to its speculative nature (Cheah and Fry 2015). Additionally, Bitcoin investors
are speculative and short term oriented (Salisu et al. 2019).
Source: J. Risk Financial Manag. 2019, 12, 103; doi:10.3390/jrfm12020103; www.mdpi.com/journal/jrfm
Due to the progressive price change and increase in the market cap, the popularity of investment
in Bitcoin has been increasing dramatically. Meanwhile, Urquhart (2016) found that Bitcoin is an
inefficient market. Caporale et al. (2018) also observe inefficiency in the cryptocurrency market. In the
same vein, due to high price volatility, speculators have a generic question of whether the price of
Bitcoin can be forecasted in advance.
Thus far, there have been limited attempts in the literature to forecast the price of Bitcoin.
Katsiampa (2017) scrutinised the in-sample goodness-of-fit of GARCH models for Bitcoin price but did
not perform out-sample forecast. Kristjanpoller and Minutolo (2018) proposed a hybrid-forecasting
model to predict Bitcoin price volatility, integrating artificial neural network (ANN), generalised
autoregressive conditional heteroscedasticity (GARCH) and principal components analysis (PCA).
They found that the accuracy of the hybrid model increases after incorporation of PCA pre-processing.
Similar to Katsiampa (2017), Kristjanpoller and Minutolo (2018) also only examined in-sample forecast
performance. Meanwhile, Aalborg et al. (2018) found that Bitcoin returns cannot be predicted
(R² ≤ 0.01) using explanatory variables (i.e., Google Trends, trading volume, transaction volume, VIX
index, and unique user addresses).
Given that Bitcoin is becoming more popular, yet still volatile and not well explained, a need exists
to study methods to better understand its price fluctuations. Accurately forecasting the daily movements
can increase the returns of day traders and consequently make the market more efficient. The choice of
forecasting models can have a significant effect on performance (Chen et al. 2019). We contribute to the
Bitcoin forecasting literature by testing autoregressive integrated moving average (ARIMA) and neural
network autoregression (NNAR). ARIMA is one of the traditional forecasting methods, and NNAR is a
rather sophisticated and more modern approach to forecasting (Hyndman and Athanasopoulos 2018).
To ensure validation and implementation, as suggested by Adya and Collopy (1998), we evaluate ex-ante
Bitcoin forecast performance using a relatively large sample, and multiple training and testing samples
to demonstrate the stability of the forecast results. By following this procedure, we differentiate the
results of this study from the ones mentioned earlier.
In the next section, we discuss existing literature on Bitcoin price modelling. Section 3 presents
daily Bitcoin price data used in this study. Section 4 presents the adopted forecast methodologies and
performance measures. The analysis and findings are presented in Section 5, and final remarks are
made in Section 6.

Literature Review
Bitcoin is the most popular among the cryptocurrencies (Kyriazis 2019). Recent fluctuations
in Bitcoin price have captured the attention of academic researchers (Beneki et al. 2019). Given the
nascency of this research stream, previous studies on Bitcoin and other digital currencies (for instance,
Ethereum, Litecoin, Ripple) mainly explain the concepts, principles and economics of cryptocurrencies
(Segendorf 2014; Dwyer 2015; Becker et al. 2011). Among the authors, Dwyer (2015) addressed the
principles of Bitcoin and other relevant digital currencies. The author explains the supply and demand
of digital currencies, equilibria of Bitcoin, uses of Bitcoin in exchange for goods and services with a
rivalry to other currencies (Dwyer 2015). Likewise, Brière et al. (2015) investigated the connection of
Bitcoin with other cryptocurrencies.
The total market cap of Bitcoin is approximately USD237 billion (as of 30 March 2018), which
is nearly 42.69% of the entire cryptocurrency market capitalizations (coinmarketcap.com). As such,
some studies consider the price dynamics of Bitcoin (Brandvold et al. 2015; Ciaian et al. 2016).
Brandvold et al. (2015) investigated the price discovery of Bitcoin exchanges and found that two
exchanges, Mt.Gox and BTC-e, led the market with the maximum information share. Besides,
Ciaian et al. (2016) studied the underlying economics of Bitcoin price by taking into account the
traditional determinants of the currency price. Moreover, Shubik (2014) and Rogojanu and Badea (2014)
studied Bitcoin in the setting of alternative monetary systems by considering the challenges of the
economic environment. Meanwhile, Bouoiyour and Selmi (2014); Bouoiyour et al. (2014) and Yermack
(2013) described Bitcoin as a speculative investment or speculative bubble. Similarly, according to
Yermack (2013), Bitcoin behaves more like a speculative investment rather than currency. It fails to
satisfy the features of currency as a medium of exchange, a store of value, and a unit of account.
In the same vein, Molnár et al. (2015) studied the exchange rate risk of Bitcoin by comparing it with
other assets, for instance, gold and the Euro, and found that Bitcoin is more volatile and riskier than
both, which restricts the applicability of Bitcoin as a medium of transaction. Furthermore,
Bouri et al. (2017) investigated the Bitcoin price and its volatility and found persistence in the Bitcoin
price and volatility.
As Bitcoin price volatility is exceptionally high, speculators generally ask whether the future
Bitcoin price can be forecasted. Bitcoin price or return forecasting is getting more attention due to its
boom-bust nature. Speculators are looking for tools and techniques that can forecast Bitcoin price
with higher accuracy, at least better than the naïve forecast, so that they can position their investment
portfolios profitably. The majority of the studies on Bitcoin either focus on price returns and volatility or
consider Bitcoin as a speculative investment or bubble (Bouoiyour and Selmi 2014; Bouoiyour et al.
2014; Yermack 2013). Some studies consider risk, hedge and safe haven attributes of Bitcoin and
Ethereum (Beneki et al. 2019, Bouri et al. 2017). However, to the best of the authors’ knowledge, there
are no studies on the forecasting of test-sample (out-sample) Bitcoin price (Corbet et al. 2019). Thus,
this study presents a novel approach to forecasting daily Bitcoin price, both with and without
model re-estimation at each step, while comparing ARIMA and NNAR models.

Data
Daily Bitcoin exchange rate data (USD per Bitcoin) is collected from the Quandl database (see note 1). Data
from the same source has been used by others, too (Chu et al. 2015). We use daily Bitcoin price
data from 1 January 2012 to 4 October 2018 (see note 2), a total of 2466 daily observations. Figure 1 presents the
(a) original time series along with the (b) log-transformed and (c) first-differenced log series.
For effective forecast validation (Adya and Collopy 1998), we divide the dataset into a
training-sample (in-sample) and test-sample (out-sample). We consider two training-samples and,
subsequently, two test-samples for cross-validation purposes. The first training sample is from 1
January 2012 until 14 May 2013 (500 days), and the second from 1 January 2012 until 25 June 2017 (2000
days). As a consequence, the first test-sample is from 15 May 2013 to 4 October 2018 (1966 days), and
the second from 26 June 2017 to 4 October 2018 (466 days).
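The transformation and the chronological split described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `log_diff` and `split` are hypothetical, the prices are made-up numbers, and the original study used R rather than Python.

```python
import math

def log_diff(prices):
    """First difference of the log series: log(p_t) - log(p_{t-1})."""
    logs = [math.log(p) for p in prices]
    return [b - a for a, b in zip(logs, logs[1:])]

def split(series, n_train):
    """Chronological split: the first n_train points train, the rest test."""
    if not 0 < n_train < len(series):
        raise ValueError("n_train must be between 1 and len(series) - 1")
    return series[:n_train], series[n_train:]

# Made-up prices, for illustration only.
prices = [100.0, 110.0, 99.0, 108.9, 120.0]
returns = log_diff(prices)
train, test = split(prices, 3)

# With 2466 observations, a 500-day training window leaves 1966 test days.
placeholder = list(range(2466))
first_train, first_test = split(placeholder, 500)
```

Splitting by position rather than at random keeps the test sample strictly later in time than the training sample, which is what an out-of-sample forecast evaluation requires.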
At the end of 2014, the price of Bitcoin dropped significantly to USD 302 (www.coindesk.com).
The cause of the price decline was the suspension of Bitcoin trading by Mt. Gox, one of the leading
Bitcoin exchanges, which handled 70% of Bitcoin exchange worldwide at that time. The exchange reported
that around 850,000 Bitcoins belonging to customers, worth around USD 3.5
billion, had been stolen by hackers (Roberts 2017). The incident resulted in a lack of confidence in the security system of Bitcoin;
thus, the price decline continued until 2016. At the beginning of 2017, the Bitcoin price increased
dramatically, and at the end of 2017 the price of Bitcoin surged to USD 19,661.63, but within five
days of 17 December 2017 it dropped again to USD 12,616.64 (www.coindesk.com).
1 www.quandl.com/data/BCHARTS/BITSTAMPUSD-Bitcoin-Markets-bitstampUSD.
2 Bitcoin price data for three days, that is, 6–8 January 2015 was not available.
Figure 1. (a) Original, (b) log-transformed and (c) first difference log operator bitcoin price in USD. Panels: (a) original daily time series of Bitcoin price in USD; (b) log transformed daily Bitcoin price in USD; (c) first difference log operator of daily Bitcoin price in USD.
Stationarity of data is a prerequisite for predictive modelling, particularly when using
autoregressive time series models such as ARIMA. Table 1 shows the results of stationarity tests
of the training-data samples using the Augmented Dickey-Fuller (ADF) test (Dickey and Fuller 1979)
and the Phillips-Perron (PP) test (Phillips and Perron 1988). The data, both in levels and as log-transformed series,
are not stationary but become stationary after first-differencing the log series; thus, the ARIMA modelling
approach is feasible. Note that stationarity of data is not essential for neural network
models (Hyndman and Athanasopoulos 2018).

Table 1. Stationarity test results for the training samples.
Data                           Training sample          ADF test          PP test
First in-sample window (500 days)
Original data                  01/01/2012~14/05/2013    −1.725 (0.695)    −10.597 (0.519)
Log transformed data           01/01/2012~14/05/2013    −1.478 (0.800)    −3.333 (0.919)
1st difference log operator    01/01/2012~14/05/2013    −9.593 (0.01)     −338.72 (0.01)
Second in-sample window (2000 days)
Original data                  01/01/2012~25/06/2017    1.021 (0.99)      6.109 (0.99)
Log transformed data           01/01/2012~25/06/2017    −1.327 (0.86)     −3.321 (0.92)
1st difference log operator    01/01/2012~25/06/2017    −11.19 (0.01)     −1513.50 (0.01)
ADF: Augmented Dickey-Fuller test; PP: Phillips-Perron test. p-values in parentheses; a p-value below 0.05
confirms stationarity.

Methodology
4.1. Forecast Methods
The association of Bitcoin prices with other micro- and macro-economic indicators, such as oil price
and gold price, is still not clear (Aalborg et al. 2018). Thus, the univariate modelling approach, where
data speaks for itself (Gujarati and Porter 2003), becomes an appropriate forecasting tool. Additionally,
a positive association between past and future values of Bitcoin price is evident in the literature
(Caporale et al. 2018). However, the degree of association varies over time (Caporale et al. 2018); thus,
re-estimating the forecast model for each one-step forecast, as each additional daily Bitcoin
price arrives, becomes relevant. This also motivates investigating a non-linear approach.
Thus, we employ two univariate time series models: ARIMA and NNAR. Applications of ARIMA
can be found in many fields, such as finance (Ariyo et al. 2014), shipping (Munim and
Schramm 2017), logistics (Miller 2018), and electric power (Contreras et al. 2003). Meanwhile, NNAR
models have been used to forecast global solar radiation (Benmouiza and Cheknane 2013), river flow
(Abrahart and See 2000), and tourism demand (Álvarez-Díaz et al. 2018). For both ARIMA and NNAR
models, we examine forecasting the next-day Bitcoin price with and without re-estimating the forecast
model at each step. For computation, we used the forecast package (Hyndman and
Khandakar 2007) in R.
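The difference between forecasting with and without re-estimation at each step can be illustrated with a deliberately small model. The sketch below assumes an AR(1) model fitted by ordinary least squares as a stand-in for the ARIMA and NNAR models of the study; the function names `fit_ar1` and `one_step_forecasts` and the sample numbers are hypothetical, not the authors' code.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1}; returns (c, phi).

    Sketch only: assumes the series is not constant (non-zero variance).
    """
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    return my - phi * mx, phi

def one_step_forecasts(series, n_train, reestimate):
    """Next-step forecasts for series[n_train:], each using actual past values."""
    c, phi = fit_ar1(series[:n_train])
    forecasts = []
    for t in range(n_train, len(series)):
        if reestimate:
            # Refit on every observation available before time t.
            c, phi = fit_ar1(series[:t])
        forecasts.append(c + phi * series[t - 1])
    return forecasts

# Made-up series, for illustration only.
series = [1.0, 1.5, 1.2, 1.8, 1.6, 2.1, 1.9, 2.4]
fixed = one_step_forecasts(series, 5, reestimate=False)
rolling = one_step_forecasts(series, 5, reestimate=True)
```

Both variants produce the same first forecast, because the model has seen the same data at that point; they diverge only as new observations change the re-estimated parameters.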
4.1.1. ARIMA
ARIMA is probably the most popular method when it comes to time series forecasting, initially
developed by Box and Jenkins (1976). Typically, an ARIMA model has two components: an
autoregressive (AR) component and a moving average (MA) component. The AR component
models association between the value of a variable at a specified time with its value in previous
time(s), and the MA component models association between values of error term of a variable at a
specified time with its error term value in previous time(s). The integrated (I) component comes into
consideration when the time series becomes stationary after the first (or second) difference.
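For reference, a textbook ARIMA(p,1,q) specification on the first-differenced series can be written as follows. This is the standard form, stated here as an assumption: the lagged terms enter in differenced form, and the notation may differ from the authors' own equation.

```latex
\Delta z_t = \sum_{i=1}^{p} \phi_i \, \Delta z_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \, \varepsilon_{t-i}
```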
Here, $\Delta z_t = z_t - z_{t-1}$; $z_t$ is the Bitcoin price in USD at time $t$, $z_{t-i}$ is the Bitcoin price in USD of all
previous periods until lag $p$, $\phi_i$ is the parameter for $z_{t-i}$, $\varepsilon_t$ is the error term in time $t$, $\varepsilon_{t-i}$ is the error
term of all previous periods until lag $q$ and $\theta_i$ is the parameter for $\varepsilon_{t-i}$.
4.1.2. Neural Network Autoregression (NNAR)
Artificial neural network (ANN) methods rely on mathematical models in a similar pattern
as ‘neurons’ in the brain. ANN models help design complex non-linear associations between the
dependent variable and its predictors (Adya and Collopy 1998; Hyndman and Athanasopoulos 2018).
The simplest ANN models would only have predictors (independent variables or inputs) in the bottom
layer and the dependent variable (output) in the top layer, which would be equivalent to a linear
regression model. After adding the hidden layer(s) in-between bottom and top layers, the ANN
structure becomes non-linear. A sample ANN model is depicted in Figure 2. This type of ANN is
called a multi-layered feed-forward network, where each layer of neurons (nodes) receives inputs from
the previous layer. The inputs to each node are estimated using a weighted linear combination, as in
Equation (2):
$$z_j = \beta_j + \sum_{i=1}^{n} W_{i,j} X_i \quad (2)$$
Figure 2. An ANN model with four inputs and one hidden layer with three neurons.
Here, $z_j$ is the value of output node $j$, $\beta_j$ is the constant for node $j$, $W_{i,j}$ is the weight from the input
node $i$ to output node $j$, $X_i$ represents the inputs, and $n$ is the number of input variables. In the hidden
layer, Equation (2) is transformed into a non-linear function using the sigmoid, as shown in Equation (3).
$$s(z) = \frac{1}{1 + e^{-z}} \quad (3)$$
The parameters $\beta_1, \beta_2, \beta_3, \ldots, \beta_n$ and $W_{1,1}, \ldots, W_{4,3}$ are “learned” from the training data. To
prevent the weights from becoming too large, the values of the weights are usually restricted. The decay
parameter, which restricts the weights, is typically set equal to 0.1 (Hyndman and
Athanasopoulos 2018). With time series data such as daily Bitcoin price, lagged values of the time series
can be used as inputs in an ANN structure, which is known as neural network autoregression (NNAR).
A non-seasonal feed-forward network model with one hidden layer is usually denoted as NNAR (p,k),
where p represents the number of lags and k represents the number of nodes in the hidden layer.
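To make Equations (2) and (3) concrete, the following sketch computes a single one-step forecast from an NNAR(p,k) network whose weights are already known. It is an illustration only: the function `nnar_predict`, the linear output layer, and all weight values are assumptions, and training the weights (including the decay penalty) is not shown.

```python
import math

def sigmoid(z):
    """Equation (3): s(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def nnar_predict(lags, hidden_bias, hidden_weights, out_bias, out_weights):
    """One-step forecast from an NNAR(p, k) network with given weights.

    lags           : the p most recent values, used as inputs X_1..X_p
    hidden_bias    : k constants beta_j, one per hidden node
    hidden_weights : k rows, each holding the p weights W_{i,j} of one node
    out_bias, out_weights : linear output layer (an assumption of this sketch)
    """
    hidden = []
    for beta_j, w_j in zip(hidden_bias, hidden_weights):
        z_j = beta_j + sum(w * x for w, x in zip(w_j, lags))  # Equation (2)
        hidden.append(sigmoid(z_j))                           # Equation (3)
    return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

# NNAR(2, 1): two lags and one hidden node; every number here is made up.
forecast = nnar_predict([0.0, 0.0], [0.0], [[0.5, -0.3]], 1.0, [2.0])
```

With both lags at zero the hidden node receives z = 0, the sigmoid returns 0.5, and the forecast is the output constant plus twice that value.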
4.2. Forecast Accuracy Measures
Forecasting models are evaluated based on their accuracy of the forecast. Typical forecast accuracy
measures such as RMSE (root mean square error) and MAPE (mean absolute percent error) are criticised
for their instability with varying number of test-sample forecast periods. Thus, we adopt three indices
to measure the accuracy of forecast results: RMSE, MAPE, and MASE (mean absolute scaled error).
MASE was proposed by Hyndman and Koehler (2006) as a remedy to overcome the drawbacks of
RMSE and MAPE when dealing with a varying number of test-sample periods. The three adopted
accuracy measures can be expressed as follows:

Here, $e_t$ is the forecast error calculated as $(d_t - z_t)$, $d_t$ is the actual Bitcoin price at time $t$, $z_t$ is the
forecasted price at time $t$, $n$ is the total number of observations and $z_t - z_{t-1}$ is the forecast error of the
naïve forecast.
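The three measures can be sketched in plain Python as follows. The formulas are the commonly used definitions and are an assumption here, since the exact expressions are not reproduced above: MAPE is reported in percent, and MASE scales the mean absolute error by the in-sample mean absolute error of the naïve (previous-value) forecast, following Hyndman and Koehler (2006). The sample numbers are made up.

```python
import math

def rmse(actual, forecast):
    """Root mean square error of the forecast errors e_t = d_t - z_t."""
    return math.sqrt(sum((d - z) ** 2 for d, z in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percent error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((d - z) / d) for d, z in zip(actual, forecast)) / len(actual)

def mase(actual, forecast, train):
    """Mean absolute error scaled by the naive forecast's in-sample error."""
    mae = sum(abs(d - z) for d, z in zip(actual, forecast)) / len(actual)
    naive_mae = sum(abs(b - a) for a, b in zip(train, train[1:])) / (len(train) - 1)
    return mae / naive_mae

# Made-up numbers, for illustration only.
train = [1.0, 2.0, 4.0, 7.0]
actual = [10.0, 20.0]
forecast = [8.0, 23.0]
scores = (rmse(actual, forecast), mape(actual, forecast), mase(actual, forecast, train))
```

A MASE below 1 means the forecast beats the naïve forecast on average, which is why it is convenient for comparing results across test samples of different lengths.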

Empirical Results
First, the appropriate ARIMA and NNAR models are to be selected to forecast next-day Bitcoin
price for the test-sample. ARIMA models are chosen based on the lowest AIC, while considering the
PP test for stationarity using the auto.arima function provided by the Forecast package in R. However,
it is challenging to select the appropriate NNAR model. For the first training-sample period (500
days), 14 different NNAR(p,k) specifications are estimated and evaluated for the forecast (without
re-estimation) performance of the first test-sample period (1966 days). The results are presented
in Figure A1 in Appendix A. Interestingly, training-sample forecast performance gets better with
increasing the numbers of lags and hidden layers (see Figure A1a) but NNAR (2,1) performs best for
test-sample forecast (see Figure A1b). Therefore, NNAR (2,1) is selected for the estimation of the first
training and test samples. The same 14 models are estimated and compared for the second training and
test samples (see Figure A2), and NNAR (1,2) is selected based on test-sample forecast performance. In
the employed NNAR framework, it is noteworthy that test-sample forecast performance is always
better with a lower number of lags and nodes in contrast to the training-sample forecast performance.
For next-day Bitcoin price forecast without re-estimation of the model for next step, the two
selected models for first training and test samples are ARIMA (4,1,0) and NNAR (2,1), and for the
second training and test samples are ARIMA (4,1,1) and NNAR (1,2). We adopt the static forecast
approach, as depicted in Figure 3. When using an autoregressive model in the static forecast approach,
the actual value of the dependent variable in previous periods is used to estimate each step forecast for
the training sample. On the contrary, when forecasting multiple periods, dynamic forecast approach
uses the previously forecasted value (out-sample period) of the dependent variable to compute a
forecast. In Table 2, first, we present the training-sample forecast performance of ARIMA and NNAR
models by means of RMSE, MAPE and MASE. Then, in Table 3, we present the test-sample forecast
performance of the employed models. According to Table 2, NNAR models perform better than ARIMA
in the first training-sample period, but ARIMA is better in the second training-sample. According to
Table 3, for both cases, without and with re-estimation of forecast models for next-day Bitcoin price
forecasting, ARIMA models outperform NNAR in the test-sample forecast. Log-transformed Bitcoin
price series and its forecasted values using ARIMA and NNAR under different estimation approaches
are presented in Figure 4.
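The distinction between the static and dynamic forecast approaches can be sketched with a first-order autoregressive rule. The coefficients `c` and `phi`, the function names, and the data below are hypothetical and chosen only to make the two update rules easy to follow.

```python
def static_forecast(history, actual_future, c, phi):
    """Static approach: every forecast uses the actual previous value."""
    prev = history[-1]
    out = []
    for actual in actual_future:
        out.append(c + phi * prev)
        prev = actual  # feed the observed value forward
    return out

def dynamic_forecast(history, steps, c, phi):
    """Dynamic approach: every forecast uses the previous forecast."""
    prev = history[-1]
    out = []
    for _ in range(steps):
        prev = c + phi * prev  # feed the forecast itself forward
        out.append(prev)
    return out

# Made-up values, for illustration only.
history = [10.0]
actual_future = [12.0, 11.0, 13.0]
static = static_forecast(history, actual_future, c=1.0, phi=0.5)
dynamic = dynamic_forecast(history, steps=3, c=1.0, phi=0.5)
```

The first forecast is the same in both cases; from the second step onward the static forecast stays anchored to observed prices, while the dynamic forecast compounds its own earlier errors.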
Figure 3. Illustration of static forecast approach.
Table 2. Training-sample forecast performance.
Forecast model    Training sample          RMSE     MAPE     MASE
First training-sample window (500 days)
ARIMA (4,1,0)     01/01/2012~14/05/2013    0.053    1.225    0.987
NNAR (2,1)        01/01/2012~14/05/2013    0.055    1.172    0.963
Second training-sample window (2000 days)
ARIMA (4,1,1)     01/01/2012~25/06/2017    0.040    0.565    0.970
NNAR (1,2)        01/01/2012~25/06/2017    0.042    0.567    0.983
Bold numbers indicate best performance.
Table 3. Test-sample static forecast performance.
Forecast model    Test sample              RMSE     MAPE     MASE
First test-sample window (1966 days)
Forecast without re-estimation at each step
ARIMA (4,1,0)     15/05/2013~04/10/2018    0.038    0.379    0.969
NNAR (2,1)        15/05/2013~04/10/2018    0.924    8.289    24.990
Forecast with re-estimation at each step
ARIMA             15/05/2013~04/10/2018    0.037    0.365    0.935
NNAR              15/05/2013~04/10/2018    0.048    0.453    1.186
Second test-sample window (466 days)
Forecast without re-estimation at each step
ARIMA (4,1,1)     26/06/2017~04/10/2018    0.042    0.354    1.207
NNAR (1,2)        26/06/2017~04/10/2018    0.093    0.837    2.911
Forecast with re-estimation at each step
ARIMA             26/06/2017~04/10/2018    0.042    0.354    1.209
NNAR              26/06/2017~04/10/2018    0.069    0.537    1.828
Bold numbers indicate best performance.
To confirm the validity of forecast models, diagnostic checks are conducted. p-values of the
Box-Ljung (BL) test (Ljung and Box 1978) suggest that residuals of all employed models are free from
autocorrelation (p-values > 0.05 considering eight lags). The BL test result of squared residuals of
ARIMA models indicates the presence of conditional heteroscedasticity (p-values < 0.05); thus,
future research on Bitcoin price forecast should consider nested ARIMA models combining ARCH
and GARCH. The Jarque-Bera test (Jarque and Bera 1980) results suggest that residuals are not
normally distributed (p-values < 0.05). Normality of residuals should not be an issue for the NNAR
model as the error series in such models are assumed to be homoscedastic (and normally
distributed) when training the model based on the training-sample (Hyndman and Athanasopoulos
2018).

Figure 4. Bitcoin price forecast. (a,b) refer to the first training and test samples forecast in comprehensive
and concentrated view, respectively, and (c,d) refer to the second training and test samples forecast in
comprehensive and concentrated view, respectively. Plotted series: log transformed Bitcoin price; ARIMA (re-estimation); NNAR (re-estimation); ARIMA (without re-estimation); NNAR (without re-estimation).
Further, we perform the Diebold-Mariano (DM) test (Diebold and Mariano 1995) to compare
test-sample forecast results obtained from the two models used, ARIMA and NNAR. DM test results
are presented in Table 4. In this case, the alternative hypothesis is that the forecast results of the
second method are less accurate than those of the first method. Thus, a p-value of less than 0.05 indicates better
accuracy of the first method. The DM test results agree with Table 3: the ARIMA
model is more accurate than NNAR in forecasting the test-sample Bitcoin price. It is noteworthy that
forecasts of ARIMA models with and without model re-estimation at each step are similar: the difference is not
significant in the second test-sample window (p = 0.217), although it is in the first (p = 3.56 × 10^−6). Meanwhile,
the NNAR model with re-estimation at each step performs considerably better than the without
re-estimation approach.
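The logic of the one-sided DM comparison can be sketched as follows. This is a simplified illustration under stated assumptions: squared-error loss, a one-step horizon, and a normal approximation for the p-value. It is not the routine used in the study, and statistical packages may apply small-sample corrections that this sketch omits. The function name `dm_test` and the data are hypothetical.

```python
import math

def dm_test(actual, forecast1, forecast2):
    """Return (DM statistic, one-sided p-value) under squared-error loss.

    Alternative hypothesis: forecast2 is less accurate than forecast1,
    so a strongly negative statistic gives a small p-value.
    Sketch only: assumes the loss differences are not all identical.
    """
    d = [(a - f1) ** 2 - (a - f2) ** 2
         for a, f1, f2 in zip(actual, forecast1, forecast2)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    stat = mean_d / math.sqrt(var_d / n)
    # Standard normal CDF evaluated at the statistic: P(Z <= stat).
    p_value = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))
    return stat, p_value

# Made-up values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
close_forecast = [1.1, 1.9, 3.1, 3.9, 5.1]
far_forecast = [1.5, 2.6, 3.4, 4.7, 5.5]
stat, p_value = dm_test(actual, close_forecast, far_forecast)
```

Swapping the order of the two forecasts flips the sign of the statistic, which is why the order of the models in each row of Table 4 matters when reading the p-values.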
Table 4. DM test of forecast results.
Models compared                                            DM statistic    p-value
First test-sample window (1966 days)
ARIMA vs. NNAR (re-estimation)                             −7.054          1.20 × 10^−12
ARIMA vs. NNAR (without re-estimation)                     −27.061         2.20 × 10^−16
ARIMA (re-estimation) vs. ARIMA (without re-estimation)    −4.502          3.56 × 10^−6
NNAR (re-estimation) vs. NNAR (without re-estimation)      −27.052         2.20 × 10^−16
Second test-sample window (466 days)
ARIMA vs. NNAR (re-estimation)                             −6.006          1.92 × 10^−9
ARIMA vs. NNAR (without re-estimation)                     −12.348         2.20 × 10^−16
ARIMA (re-estimation) vs. ARIMA (without re-estimation)    −0.785          0.217
NNAR (re-estimation) vs. NNAR (without re-estimation)      −6.172          7.34 × 10^−10
p < 0.05 indicates that the forecast results of the first method are better than those of the second method.

Discussion and Conclusions
This study forecasts the next-day Bitcoin price using two univariate models—ARIMA and NNAR.
Based on the employed forecast accuracy measures (RMSE, MAPE and MASE), while NNAR models
perform better than ARIMA in the first training-sample (500 days) Bitcoin price forecasts, ARIMA
models outperform NNAR models in both the test-samples. In line with this, from Figure 4, one
could argue that NNAR models perform better than ARIMA (see Table 2) in times of less volatility,
but not during extremely volatile test-sample periods of Bitcoin price, particularly in the year 2018.
Furthermore, the DM test suggests the same, that is, ARIMA forecast results are more accurate than
the NNAR forecasts in the test-sample forecasts.
Meanwhile, existing studies offer interesting insights. In a review of neural network models in
forecasting, Adya and Collopy (1998) find that neural networks are not necessarily the best modelling
approach for all types of data. Abrahart and See (2000) and Álvarez-Díaz et al. (2018) find that ARIMA
and NNAR perform similarly. On the other hand, similar to this study, Alon et al. (2001) and Munim
and Schramm (2018) also find that neural networks outperform ARIMA in some training-sample,
but the opposite holds for test-sample. The reason for better accuracy of ARIMA models could be
that we employ the feed-forward NNAR model, which is found to be inferior by Ho et al. (2002) as
well when comparing with ARIMA and recurrent neural network (RNN) models. Thus, future study
should attempt the RNN approach to Bitcoin price forecast. Furthermore, according to the DM test
results, the forecasts of ARIMA models are similar with or without model re-estimation in each step.
However, the NNAR model with re-estimation in each step performs better than without re-estimation.
Thus, this approach of model re-estimation at each step can be adopted in intra-day forecasts,
such as next-hour and next-minute Bitcoin price (also stock price) forecasts. However, the model
re-estimation approach to forecasting the next-day price slightly increases computation time. To this
end, with the growing market-cap of cryptocurrencies and extreme volatility of cryptocurrency prices,
further attention should be paid to modelling their returns.
Examined NNAR model performance. (a,b) The second training and test-sample forecast
performance, respectively.