Exchange Rate Predictability and Dynamic Bayesian Learning

This paper considers how an investor in the foreign exchange market can exploit predictive information by means of flexible Bayesian inference. Using a variety of different vector autoregressive models, the investor is able, each period, to revise past predictive mistakes and learn about important data features. The proposed methodology is developed in order to synthesize a wide array of established approaches for modelling exchange rate dynamics. In a thorough investigation of monthly exchange rate predictability for ten countries, we find that an investor using the proposed flexible methodology for dynamic asset allocation achieves significant economic gains out of sample relative to benchmark strategies. In particular, we find strong evidence for sparsity, fast model switching and exploiting the exchange rate cross-section.


Introduction
Understanding and predicting the evolution of exchange rates has long been a key component of the research agenda in international economics and finance. Yet, the early finding by Meese and Rogoff (1983) that structural models cannot offer predictability superior to that of a random walk has not been convincingly overturned. The voluminous existing literature on exchange rate forecasting, surveyed in Rossi (2013), adopts many different econometric methods. Broadly speaking, these differences fall in the following categories. First, they differ in whether they are multivariate (e.g. building a Vector Autoregressive, VAR, model involving a cross-section of exchange rates for many countries) or univariate. Second, they differ in which predictors they use. Third, they differ in how they treat the fact that there may be many potential predictors, most of which are unimportant. Fourth, they differ in whether they allow for dynamic model change (i.e. whether the best forecasting model can involve different predictors at different points in time) or not. Fifth, they differ in whether they allow for parameter change (both in VAR or regression coefficients and in volatilities) or not.
We develop an econometric approach that allows for a general treatment of each of these five categories. That is, its most flexible specification is a high-dimensional multivariate time series model involving the full cross-section of exchange rates, several exogenous predictors and time-variation in coefficients and volatilities. But our algorithm allows for decisions relating to these categories to be made in a data-based fashion using dynamic model selection methods. That is, the estimation procedure automatically decides whether to set a coefficient on a predictor or a VAR lag to be zero (or not). Most importantly, it does so in a dynamic manner, allowing for different forecasting models to be used at different points in time. Thus, decisions about specification choices (i.e. different predictors, different VARs, different degrees of model switching) are all made automatically in a time-varying fashion.
Our econometric approach is related to papers such as, among others, Koop and Korobilis (2013) and Giannone, Lenza, and Primiceri (2015), which provide strategies for handling prior elicitation and dynamic uncertainty in VAR models. We improve on and extend them in important directions of relevance for our empirical application. These include in particular rich shrinkage patterns provided by a Minnesota-type prior and flexible treatment of exogenous regressors.
Our framework enables us to assess the relative contributions of different modelling aspects in an exchange rate forecasting exercise involving 10 countries and exogenous predictors. We take the view of a Bayesian investor with a broad perspective, accommodating many features inspired by the exchange rate literature. Hence, our focus is on the economic evaluation of the density forecasts generated by our approach. That said, given the wealth of empirical results provided as a byproduct, we explore them to relate our findings to previously documented characteristics and phenomena of exchange rate behaviour.
To preview our empirical results, we do find that model switching has a big role to play. At most points in time only one or a few predictors are relevant for forecasting. In general, the best economic results are achieved when both VAR lags and fundamentals are considered in the candidate models though we find that VAR lags and fundamentals act as substitutes to a certain extent in this regard. But there are also several periods where a simple multivariate random walk with stochastic volatility is the best forecasting model.
We find an investor using our algorithm would experience substantial economic gains out of sample relative to the random walk model with time-varying volatility. A risk-averse mean-variance investor is willing to pay an annualized fee of several hundred basis points (after transaction costs) for switching from the dynamic portfolio strategy implied by the random walk with constant volatility model to the dynamic asset allocation implied by our VAR-based approach. Similarly, we find that the annualized Sharpe ratio after transaction costs increases substantially from adopting our approach.
The remainder of the paper is organized as follows. Section 2 relates our modelling strategy to the literature. Section 3 discusses the data while Section 4 lays out our econometric methods. Section 5 presents and discusses our empirical results and Section 6 concludes. We present technical details of our econometric methods along with many empirical results and further details regarding the underlying data in an online appendix.

Relation to the literature
We provide a selective review of the literature, with a focus on the five econometric modelling issues described in the Introduction; a thorough review of the voluminous literature on exchange rate predictability can be found in Rossi (2013). A large part of the existing literature relies on macroeconomic fundamentals to forecast exchange rates with little success, which is commonly referred to as the exchange rate disconnect puzzle.
The scapegoat approach of Bacchetta and van Wincoop (2004) attributes this failure to the fact that market participants attach excessive weight to observable fundamentals that deviate from their long-run trend. As a result, agents quickly switch between models over time and different fundamentals may be relevant only for short periods. This explanation translates into an econometric model that should allow for the optimal forecasting model to change over time. Next is the issue of whether parameter change and other nonlinearities are beneficial for forecasting. Overall, the evidence is not strong (Rossi, 2013), although several studies find some benefits from allowing for time-variation in parameters; see, e.g., Rossi (2006) or Byrne, Korobilis, and Ribeiro (2016). Our approach accommodates both constant and time-varying parameters.
The question of whether there are benefits in working with a multivariate time series model such as a VAR involving a cross-section of exchange rates is also debated. Such an approach has the advantage that it exploits information in the co-movements and common dynamics in exchange rates. There is some evidence that doing so can improve exchange rate forecasts. Carriero, Kapetanios, and Marcellino (2009) work with a large Bayesian VAR involving a cross-section of exchange rates and find forecast improvements from considering dynamic comovements of exchange rates. Abbate and Marcellino (2018) extend Carriero, Kapetanios, and Marcellino (2009) by allowing for, among other things, time-varying coefficients and volatilities and find the latter to be particularly useful in improving forecast performance. These considerations suggest that working with VARs with time-varying volatilities is potentially important and our modelling approach does so.
Another issue which arises when we have many potential predictors is the need for some method for ensuring parsimony so as to avoid overfitting and poor out-of-sample results.
Indeed, even in univariate models, some papers find parameter estimation error to be substantial and hence use no predictors when building a diversified FX portfolio, focusing instead solely on exploiting volatility timing. The dynamic model switching aspect has also been found to be of crucial importance in the empirical exchange rate literature: Sarno and Valente (2009) discuss how the evidence of a weak link between in-sample fit and out-of-sample predictability complicates the selection of an appropriate model even if fundamentals contain valuable information about the path of the exchange rate.
However, several recent papers have used data reduction methods, priors or model averaging to minimize overfitting concerns. Other techniques have been successfully used, including elastic net shrinkage (Li, Tsiakas, and Wang, 2015), gradient boosting (Berge, 2014) and model averaging/selection (Della Corte, Sarno, and Tsiakas, 2008; Della Corte and Tsiakas, 2012; Kouwenberg, Markiewicz, Verhoeks, and Zwinkels, 2017). All these approaches find sparsity to be an important modelling feature and, in particular, Kouwenberg, Markiewicz, Verhoeks, and Zwinkels (2017) also illustrate the time-varying relevance of regressors in a univariate framework. Our work corroborates these findings in a multivariate approach that allows us to assess the incremental value of fundamentals in addition to VAR lags and vice versa. From an investor's point of view, our multivariate approach allows for directly mapping the (density) forecasts into portfolio weights without having to rely on additional procedures, as is the case for univariate approaches.
Motivated by these considerations, our econometric approach takes the perspective of an investor who learns from past mistakes. We formalize this setting econometrically using the notion of dynamic Bayesian learning. In it, the investor can adapt to a new forecasting environment each time period by switching to a new model. The decision to switch is based on past forecast errors. The result is an extremely flexible framework that learns quickly from recent forecast performance. Our empirical framework has several desirable features.
First, due to the specification of time-varying parameters and dynamic model switching, the VAR forecasting model can adapt to abrupt structural changes or sudden shifts in the investor's information set. Our estimation methods are Bayesian so that the investor's decisions account for parameter uncertainty. At the same time Bayesian methods offer a natural setting for imposing statistical shrinkage which, as discussed above, has been shown to be important for exchange rate predictability when working with large numbers of predictors and a large cross-section of exchange rates. Finally, it is worth mentioning that we allow for model incompleteness; see, e.g., Billio, Casarin, Ravazzolo, and van Dijk (2013). That is, we do not assume that one of our entertained VARs reflects the correct data generating process. The online appendix contains a small simulation experiment which outlines how model incompleteness is accommodated.

Data
All of our individual model configurations are VARs (or extensions thereof) which involve a cross-section of exchange rates as dependent variables. Some models also include additional exogenous predictors. We use the common set of G10 currencies: the Australian dollar (AUD), the Canadian dollar (CAD), the Euro (EUR), the Japanese yen (JPY), the New Zealand dollar (NZD), the Norwegian krone (NOK), the Swedish krona (SWK), the Swiss franc (SWF), the British pound sterling (GBP) and the US dollar (USD). All currencies are expressed in terms of the US dollar and are end-of-month exchange rates which enter the model as discrete returns. Thus, we have nine exchange rates, each relative to the US dollar, entering our VAR. The sample runs from 1986:01 until 2016:12. As additional predictors, we include the Uncovered Interest Parity (UIP), the percentage change in stock prices over the past 12 months (STOCK GROWTH), the difference between long and short term interest rates (INT DIFF) and the percentage change in the nominal oil price (OIL). UIP, STOCK GROWTH and INT DIFF have been widely used in studies such as Wright (2008) and previous research shows that US dollar exchange rates are affected by the price of oil (Lizardo and Mollick, 2010). With regard to interest rates, we use one-month LIBOR and Eurodeposit interest rates as well as 10-year government bond rates.
In the online appendix, we present empirical results with a longer sample of data going back to 1973 and also provide results for additional established predictors, such as purchasing power parity, the monetary model and the Taylor rule approach, which are not available in real time. These results mainly reinforce the findings presented below. We focus on the shorter sample since it covers a period where all exchange rates are largely freely floating and availability of predictors is not a concern. The 1970s included several periods of economic turbulence, such as the oil price shock and changes in exchange rate arrangements for currencies such as those of Sweden and Norway. In addition, perfectly comparable interest rates are not available.
The forecast evaluation period runs from 1996:01 to 2016:12 for a total of 252 observations. The online appendix provides sources, descriptions of the fundamental exchange rate models, other details about the data and results for the long sample.

The VAR
Our starting point is a time-varying parameter VAR with exogenous variables:

$$y_t = x_t \beta_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \Sigma_t), \qquad (1)$$

where y_t is an M × 1 vector containing observations on M time series variables (in our case, discrete exchange-rate returns for nine countries) and x_t is a matrix where each row contains the predetermined variables in each VAR equation, namely an intercept, (lagged) exogenous variables, and p lags of each of the M variables. We divide the set of exogenous variables into two groups: N_x denotes the number of variables which are asset specific and considered relevant only for a specific exchange rate. For instance, in the equation for the UK currency the UIP for the UK belongs in this class. N_xx denotes the number of non asset-specific variables which are potentially relevant for all currencies in the setting (e.g. oil price changes). Thus, we have k = M(1 + p·M + N_x + N_xx) elements in β_t. Following a large literature in economics and finance, we assume that β_t evolves as a multivariate random walk without drift,

$$\beta_t = \beta_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \Omega_t), \qquad (2)$$

with covariance matrix Ω_t of dimension k × k.
Here we outline our methods for estimating and forecasting with a single VAR. Additional details are given in the online appendix. We require a prior for the initial condition for the time-varying VAR coefficients. In the case of the constant-coefficient VAR, this is the prior for the VAR coefficients. We use a variant of the Minnesota prior,

$$\beta_0 \sim N(0, \Omega_0).$$

Hence, model coefficients are initialized with an expected value of 0 and covariance matrix Ω_0. If the diagonal elements of Ω_0 are chosen to be small, the respective coefficients are shrunk to 0. We employ this mechanism to effectively exclude certain exogenous variables in some model configurations. The Minnesota prior assumes the prior covariance matrix Ω_0 to be diagonal. Let Ω_{0,i} denote the block of Ω_0 associated with the coefficients in equation i and Ω_{0,i,jj} its diagonal elements. The shrinkage intensity towards 0 is determined by the hyperparameters γ. We assume a prior covariance matrix of the form

$$\Omega_{0,i,jj} = \begin{cases} \gamma_1 & \text{for the intercept} \\ \gamma_2 / r^2 & \text{for coefficients on own lag } r = 1, \dots, p \\ \gamma_3 / (r^2 s_k^2) & \text{for coefficients on lag } r \text{ of variable } k, \; k \neq i \\ \gamma_{3+l} & \text{for the } l\text{th asset-specific exogenous variable} \\ \gamma_{N_x+3+m} & \text{for the } m\text{th non asset-specific exogenous variable,} \end{cases} \qquad (3)$$

where s_k^2 denotes the residual variance of the respective variable k. We set lag length p = 6. The Minnesota prior is typically controlled by a single shrinkage parameter; see Bańbura, Giannone, and Reichlin (2010) and citations therein. In order to deal with the prior sensitivity associated with selecting a particular value for this shrinkage parameter, Giannone, Lenza, and Primiceri (2015) and Koop and Korobilis (2013) select it in a data-based fashion; we extend this idea by allowing for separate shrinkage parameters on different blocks of coefficients. Assuming Σ_t and Ω_t are known and the prior for β_0 is as above, standard Bayesian methods for state space models involving the Kalman filter can be used to estimate β_t and obtain the predictive distribution of the returns.
In practice, the econometrician/investor does not observe Σ t and Ω t . In small models, these parameters can be estimated with Markov Chain Monte Carlo (MCMC) methods using approaches such as Chib, Nardari, and Shephard (2006). However, when working with larger models MCMC methods become too computationally demanding. Accordingly, we rely on exponential discounting methods. These are filtering methods in which Σ t and Ω t are updated by looking at recent data and discounting more distant observations at a higher rate.
Thus, if an abrupt change occurs, parameter estimates can adapt at a faster rate compared to an investor who tracks parameters based on the whole, equally weighted, sample of data.
The mechanics behind the discounting approach are described in the online appendix. The key point to note here is that they involve the use of discount factors δ and λ to control the dynamics of Σ_t and Ω_t, respectively. These two discount factors control how quickly or slowly investors learn from past forecasting performance. When δ = 1 (similarly for λ), the investor uses all available historical observations, equally weighted, to update volatilities and parameters. For values less than one, older observations are exponentially penalized, giving more weight to recent observations. As we work with monthly data, we set δ = 0.97, following J.P. Morgan/Reuters (1996); discounting methods of this kind have also been applied to stock return predictability, and West and Harrison (1997) provide a general treatment.
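To build intuition for the role of the discount factor, the following sketch shows the relative weights implied by δ = 0.97 together with a RiskMetrics-style exponentially weighted covariance recursion in the spirit of J.P. Morgan/Reuters (1996). This is an intuition-building simplification with placeholder inputs, not the exact Wishart-discounting update described in the online appendix.

```python
import numpy as np

delta = 0.97  # monthly volatility discount factor used in the paper

# Relative weight of an observation k months old under exponential discounting.
for k in (1, 12, 36, 60):
    print(f"{k:2d} months old: weight {delta**k:.3f} relative to today")

def ewma_cov(errors, delta=0.97):
    """RiskMetrics-style exponentially weighted covariance of forecast errors.

    errors: (T, M) array of one-step-ahead forecast errors.
    """
    sigma = np.cov(errors[:12].T)            # initialize on a training window
    for e in errors[12:]:
        e = e[:, None]
        sigma = delta * sigma + (1 - delta) * (e @ e.T)
    return sigma
```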
Regarding λ, the literature provides little evidence in favor of time-varying VAR coefficients (Koop and Korobilis, 2013; Chan and Eisenstat, 2018), and time-varying VAR coefficients may even be detrimental for portfolio performance in the case of FX portfolios (Abbate and Marcellino, 2018). Using Wishart matrix discounting, we rely on a fully Bayesian approach for modelling the uncertainty surrounding point forecasts of volatilities and correlations. In our online appendix we provide point forecasts of volatilities and correlations with credibility intervals.
In sum, Bayesian posterior and predictive inference for a single VAR can be done using standard Kalman filtering and discounting methods. A VAR is defined by setting each of δ, λ, γ_1, ..., γ_7 to a particular value. Our proposed prior structure can also be motivated as a spike-and-slab prior, a perspective we outline in the online appendix.

Dynamic model learning
Our empirical results fix δ and λ and consider a grid of values for each of γ_1, ..., γ_7 to allow for variable exclusion and different degrees of shrinkage intensity. If we consider every possible combination of values taken from all of these grids, we have 512 choices. We interpret each choice as defining a model that the investor has at their disposal at each point in time, upon which they could base their portfolio allocation. In order to allow the investor to make an optimal choice each period t, we use the notion of dynamic model learning (DML).
Dynamic model learning involves selecting, at each point in time, the model specification with the highest discounted joint log predictive likelihood at that time. The predictive likelihood is a measure of out-of-sample forecasting ability that takes into account the entire predictive distribution; see Geweke and Amisano (2012). The individual model configuration with the highest discounted joint log predictive likelihood is used in order to obtain the predictive mean and covariance matrix. These are a crucial input in portfolio optimization.
Our motivation for using learning based on past forecast performance is that it potentially allows for a different model at each point in time. Such a feature is likely particularly useful in times of abrupt change. If we were to use a single VAR, gradual parameter changes would be accommodated if the discount factors δ and λ were below one. But this is not the same as switching between entirely different models as dynamic model learning allows for.
In this dynamic model learning setting, the discounted joint predictive likelihood (DPL) can be calculated as

$$DPL^{(j)}_{t|t-1} = \prod_{i=1}^{t-1} \left[ p_j\!\left(y_{t-i} \mid y^{t-i-1}\right) \right]^{\alpha^i}, \qquad (4)$$

where p_j(y_{t-i} | y^{t-i-1}) denotes the predictive likelihood of model j for period t−i and t|t−1 subscripts refer to estimates made of time-t quantities given information available at time t−1. Hence, model j will receive a higher value at a given point in time if it has forecast well in the recent past, using the predictive likelihood (i.e., the predictive density evaluated at the actual outcome) as the evaluation criterion. The interpretation of "recent past" is controlled by the discount factor α, reflecting exponential decay. For example, if α = 0.95, forecast performance three years ago receives approximately 15% as much weight as the forecast performance last period. If α = 0.90, then forecast performance three years ago receives only about 2% as much weight. The case α = 1 implies no discounting and the discounted predictive likelihood is then proportional to the marginal likelihood. Lower values of α are associated with more rapid switching between models. We consider a range of values for α and, at each point in time, choose the best value for it. In this way, we can allow for times of fast model switching and times of slow model switching.
At time τ, we choose the best value for α as the one which has produced the model with the highest realized product of predictive likelihoods in the past from t = 1, ..., τ. (We stress that we are not using the DPL when choosing between different values for α; the DPL is only used to select the best model for a given value of α.) We consider the following grid of values: α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}.
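The following sketch illustrates the two layers of selection: for each α, models are ranked by their discounted sums of log predictive likelihoods; across α values, the grid point whose ex-ante selected models have accumulated the highest realized (undiscounted) sum of log predictive likelihoods is chosen. The log predictive likelihood inputs here are hypothetical placeholders, not output of the actual VARs.

```python
import numpy as np

def discounted_log_pl(log_pls, alpha):
    """Discounted sum of log predictive likelihoods for one model.

    log_pls: realized log p(y_s | y^{s-1}), ordered oldest to most recent;
    the observation i periods back is weighted by alpha**i.
    """
    lags = np.arange(len(log_pls), 0, -1)
    return float(np.sum(alpha ** lags * log_pls))

def select_model(log_pl_hist, alpha):
    """Index of the model with the highest discounted score."""
    return int(np.argmax([discounted_log_pl(h, alpha) for h in log_pl_hist]))

def choose_alpha(log_pl, alphas):
    """Pick the alpha whose ex-ante selected models accumulated the highest
    realized (undiscounted) sum of log predictive likelihoods."""
    n_models, T = log_pl.shape
    realized = {}
    for a in alphas:
        total = 0.0
        for t in range(1, T):
            j = select_model(log_pl[:, :t], a)  # selection uses data to t-1
            total += log_pl[j, t]               # scored on period t
        realized[a] = total
    return max(realized, key=realized.get)

# Hypothetical log predictive likelihoods for 3 models over 100 periods.
rng = np.random.default_rng(0)
log_pl = rng.normal(size=(3, 100))
print(choose_alpha(log_pl, [0.50, 0.70, 0.80, 0.90, 0.99, 1.00]))
```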

Evidence on model switching and sparsity
Our most flexible approach allows for dynamic model learning over a set of 512 different VAR models and six different values of α using the methods described in Section 4. We use the term "DML with ALL REGRESSORS" to denote the case where DML is carried out over all specification choices including all of the exogenous predictors. "DML without own/cross lags and NO REGRESSORS" is the (heteroskedastic) random walk. We also consider several restricted versions of DML which involve dynamic model learning over only some of the predictors. "DML with OIL", for example, means that OIL is the only exogenous variable which could be chosen. "DML without cross lags" means that the coefficients on the cross lags are set to zero in all VARs. We implement such restrictions by tuning the vector of shrinkage parameters γ_1, ..., γ_7 introduced in equation (3). For instance, to delete the effect of cross lags we set γ_3 to zero. The label DML denotes the VAR which involves only exchange rates (no exogenous predictors). We also consider versions of our approach which set α to a specific value. For instance, DML (α = 0.99) means that α is fixed at 0.99 rather than being selected from a grid of values.
The main focus of this paper is on how well these specifications perform in terms of our dynamic asset allocation problem. However, before doing this, we present a few results illustrating how the dynamic model learning strategy is working using the most flexible specification.
Dynamic model learning is to be preferred over static Bayesian model learning only if the optimal forecasting model is changing over time. Figure 1 shows that it does so in our application. The vertical axis plots the model numbers from 1 to 512 against time for two cases. The set of models begins with model number 1, which is the multivariate random walk without drift, and ends with model number 512, which is one of the most flexible models (i.e. the VAR model with an intercept, own lags with shrinkage parameter γ_2 = 0.9, cross lags with shrinkage parameter γ_3 = 0.9, and inclusion of all exogenous regressors).
The two lines in Figure 1 are for DML with ALL REGRESSORS (with α selected in a time-varying manner) and DML with ALL REGRESSORS (with α fixed to 1). The latter can be interpreted as allowing for model learning, but using conventional Bayesian model averaging methods. Both cases show that different models are selected at different times.
But with our flexible specification where α is chosen in real time, the model change is dramatic, suggesting that a high degree of model switching is a crucial feature. A wide range of different models is selected with none emerging as dominant. Interestingly, the multivariate random walk is selected 30.16% of the time. It is also evident that in certain episodes in time, e.g. during the subprime crisis, flexible models are preferred.
The coloured diamonds in Figure 2 show which blocks of variables are included at each point in time. In contrast, blank spaces in the graph depict the time-varying sparsity induced by DML, that is, periods where a block of variables is not selected. No single block of variables is selected in all periods; however, in most cases, selection of a block persists for several consecutive months before it becomes irrelevant again.

Evaluation of economic utility and forecast performance
The previous sub-section establishes that the DML approach is picking up model change, but we have not yet provided evidence on whether this feature is relevant for dynamic portfolio choice.
To investigate this further, we design an international asset allocation strategy that involves trading the US dollar and nine other currencies. We consider a US investor who builds a portfolio by allocating their wealth between ten bonds: one domestic (US), and the nine foreign bonds. In each period, the foreign bonds yield a riskless return in the local currency and at the same time a risky return that is due to currency fluctuations relative to the US dollar. Therefore, the only risk the US investor is exposed to is foreign exchange risk. Every period the investor takes two steps. First, they use the currently selected model (i.e., the model with the highest discounted sum of predictive likelihoods) to forecast the one-period ahead exchange rate returns and the predictive covariance matrix. Second, using these predictions, they dynamically rebalance their portfolio by calculating the new optimal weights. This setup is designed to assess the economic value of exchange rate predictability and to dissect which sources of information are valuable for asset allocation.
The dynamic asset allocation strategy is described in detail in the online appendix.
It involves choosing the investor's degree of relative risk aversion θ. We set θ = 2 and also consider θ = 6 in the online appendix as an additional robustness check. It also takes into account transaction costs, τ, ex ante (i.e., at the time of the portfolio construction).
Following Della Corte and Tsiakas (2012), we set τ = 0.0008. It also involves choosing a target portfolio volatility, σ*_p, which we set to 10%. We assess the economic value of different forecasting approaches by equating the utility generated by a portfolio strategy which is based on our approach and the utility achieved by a portfolio strategy relying on a simple random walk. The annualized performance fee an investor is willing to pay to switch from a homoskedastic multivariate random walk to our approach is labelled Φ^TC in the table below.
As an additional measure of economic utility, we report the Sharpe ratio before and after transaction costs, SR and SR^TC (benchmarked relative to the random walk).
The statistical criteria we use are the average joint predictive log likelihood (PLL), coverage statistics of interval forecasts and the mean squared forecasting error (MSFE).
We report PLL statistics in Table 1 and statistics of interval forecasts along with the mean squared forecasting error relative to the random walk in our online appendix. Table 1 contains the results using our approach and the various restricted versions of it described above.
Using DML we find that the annualized performance fee after transaction costs is 327 basis points and the annualized Sharpe ratio is 1.01 before transaction costs and 0.82 after transaction costs. Including exogenous regressors in DML leads to substantially stronger improvements under the economic evaluation criteria. For instance, the annualized performance fee after transaction costs increases to 397 basis points when all regressors are considered.
Among the exogenous regressors, including UIP leads to the largest improvements in the economic performance measures. But, with the exception of OIL, the other regressors also lead to improvements. The importance of VAR lags is identified at some points in time, and neglecting own lags or cross lags (i.e., setting γ_2/γ_3 = 0) is detrimental for portfolio performance. These patterns are in line with those in Figure 2. Most of these findings are statistically significant relative to the homoskedastic multivariate random walk benchmark.

Notes to Table 1: The table summarizes the economic and statistical evaluation of our forecasts from different model configurations for the period from 1996:01 to 2016:12. We measure statistical significance for differences in performance fees and log scores using the (one-sided) Diebold and Mariano (1995) t-test with heteroskedasticity and autocorrelation consistent (HAC) standard errors. We evaluate whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility) benchmark using the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistic with a serial correlation-robust variance, using a pre-whitened quadratic spectral estimator. One star indicates significance at the 10% level; two stars at the 5% level; and three stars at the 1% level. Restrictions on α correspond to the specification DML with NO REGRESSORS.
As noted above, we also repeated this exercise using a longer sample going back to 1973.
Using the same economic and statistical criteria, results for this longer sample (reported in the online appendix) are qualitatively similar.
In terms of economic utility gains, our DML models compare very well to results reported in the literature. Given our long evaluation period (252 observations in the "short" sample used in the main text and 324 in the "long" sample reported in the online appendix) and robustness to alternative specifications, this is good news for an investor. However, it is also part of the story that the multivariate models we use involve estimating additional parameters relative to univariate approaches. Enforcing stronger sparsity by narrowing the model space to less parameter-rich configurations, we achieve lower mean squared errors but also lower PLLs and smaller economic gains. This finding aligns with the results of Cenesizoglu and Timmermann (2012), who report broad agreement between density forecast measures and economic performance measures based on the predictive density. At the same time, they note that there is typically a weak link between point forecast evaluation criteria and economic evaluation criteria. Along these lines, this kind of disagreement in forecasting measures is not uncommon and has also been documented for exchange rates (see, e.g., Fratzscher, Rime, Sarno, and Zinna (2015) or Abbate and Marcellino (2018)); an exception in this respect is Kouwenberg, Markiewicz, Verhoeks, and Zwinkels (2017). Assessment based on the results of the long sample (see the online appendix) is comparable.
We next delineate the effect of restrictions on α. In practice, we estimate α to change rapidly over time. The results in Table 1 relating to α show the benefits of this for forecasting.
Fixing α = 1 (or to other high values) rather than choosing the value of α in real time leads to very poor forecasting results. Allowing for lower values of α and, thus, more model switching leads to higher values of the log scores and, in particular, to higher performance fees and Sharpe ratios. In fact, the highest performance fee and Sharpe ratio are obtained with α = 0.70 for all presented model configurations. Note that the restrictions on α in Table 1 are provided for DML with NO REGRESSORS.
Thus, large economic and statistical losses occur if the investor does not emphasize the most recent forecast performance when selecting the forecasting model on which to base their asset allocation decision. Altogether, we find the choice of the discount factor α to be a very important one.
The online appendix contains more empirical evidence that expands on and reinforces the story told above. That is, as an econometric approach DML is performing well and, if used to construct an investment portfolio, would yield higher levels of utility than a simple benchmark. In particular, it presents results which show that the coverage of our predictive densities is good and carries out a variety of robustness checks, including alternative prior specifications and the use of different sets of predictors. We find the modelling approach taken in this paper to be robust and better than other plausible specification or prior choices.
The online appendix also presents evidence against time-variation in the VAR coefficients. In addition, results from the Giacomini and Rossi (2010) fluctuation test are provided, along with several additional analyses and details with respect to portfolio performance.

Market timing in high volatility periods
In this sub-section, we present additional empirical evidence to shed more light on when our DML methods are performing well and provide some context to the existing theories of exchange rate behaviour. All results are for DML with UIP which is (i) the most natural regressor choice in the context of an economic evaluation and (ii) found to perform best in the preceding sub-section. We also note that results using DML with ALL REGRESSORS are very similar.
Brunnermeier, Nagel, and Pedersen (2008) document that carry trade strategies, which go long high-interest-rate currencies, are exposed to crash risk in periods of high FX volatility. To examine whether our strategy loads on this risk, we regress the portfolio weights implied by our strategy on interest rate differentials, allowing the relationship to vary with FX volatility. The message from this regression is clear cut: our DML with UIP strategy leads to portfolios which include fewer of the high-interest-rate currencies in periods of high FX volatility, thus avoiding the crash risk associated with the carry trade strategies discussed in Brunnermeier, Nagel, and Pedersen (2008) and Menkhoff, Sarno, Schmeling, and Schrimpf (2012).
There is also time-variation in the economic utility produced by our DML with UIP approach relative to a random walk. If we regress the utility differences (∆U) on FXVOL (which relates specifically to currency markets), the VIX (which relates to stock markets) and FXDIS (a measure of disagreement among professional forecasters; exact definitions and data sources are given in the online appendix), we find a significantly positive coefficient only on FXVOL. This reinforces the story that DML with UIP is producing gains in utility particularly in times of high volatility in currency markets, rather than in financial markets as a whole or in times of uncertainty for professional forecasters. The pronounced out-performance of DML strategies against the random walk (as a proxy of carry trade strategies) at the time of the subprime crisis aligns with Fratzscher (2009). It is interesting how our findings relate to the scapegoat theory, for which studies such as Fratzscher, Rime, Sarno, and Zinna (2015) and Pozzi and Sadaba (2018) find empirical support. Given our focus on developing and applying a method for out-of-sample forecasting, the suggested approach cannot be used as a direct test of the scapegoat theory. In the spirit of Fratzscher, Rime, Sarno, and Zinna (2015), we note that in times of high volatility in currency markets, our DML approaches tend to include more regressors and VAR lags, which potentially reflects an intensified search for scapegoats. And, as noted above, these are precisely the periods in which our approach delivers its largest utility gains.


1 Technical Appendix

1.1 Filtering
In this sub-section, we provide econometric details of our (TVP-)VARs. Filtered estimates can be obtained using the fact that the form of the state space model implies

$$\beta_t \mid y^{t-1} \sim N\!\left(\beta_{t|t-1},\, \Omega_{t|t-1}\right),$$

where t|t−1 subscripts refer to estimates made of time-t quantities given information available at time t−1. Forecasts can be obtained using the fact that the predictive density is multivariate t:

$$p\!\left(y_t \mid y^{t-1}\right) = t_{\delta n_{t-1}}\!\left(y_{t|t-1},\, Q_{t|t-1}\right),$$

where y_{t|t-1} = x_t β_{t|t-1}. Standard Kalman filtering and Wishart matrix discounting formulas can be used to produce the quantities β_{t|t-1}, Ω_{t|t-1} and Q_{t|t-1} as follows.

Predictive step
The Kalman filter provides, beginning with β_{0|0} = 0 (see below), simple updating formulas for producing β_{t|t-1} and β_{t|t} for t = 1, ..., T which are standard and will not be reproduced here. Given these we can produce point forecasts as

$$y_{t|t-1} = x_t \beta_{t|t-1}.$$

To produce Ω_{t|t-1} we use a discount factor approximation involving a discount factor λ and update as

$$\Omega_{t|t-1} = \frac{1}{\lambda}\, \Omega_{t-1|t-1}.$$

Note that such an approximation is well established (West and Harrison, 1997) and, for example, used by Koop and Korobilis (2013).
We select the discount factors in a data-adaptive fashion in real time. If λ < 1, the VAR coefficients are time varying and a lower value of λ is associated with more rapidly changing coefficients. If λ = 1, the special case of constant coefficients is obtained. An advantage of the discount factor approach is that we do not have to update the entire covariance matrix but instead only have to choose a single discount factor.
To retain conjugacy, Σ_t is modelled as inverse Wishart (IW) with δn_{t-1} degrees of freedom and scale matrix S_{t-1}:

$$\Sigma_t \mid y^{t-1} \sim IW\!\left(\delta n_{t-1},\, S_{t-1}\right).$$

Note that this density reflects the uncertainty about Σ_t and thus accounts for parameter uncertainty. Low values of δ are associated with increasingly rapid changes in the covariance matrix. Values near one are associated with slow adaptation, while δ = 1 represents the case of a constant covariance matrix Σ.

Update step
The error e_t is obtained as the difference between the point forecast y_{t|t-1} and the actual observation y_t:

$$e_t = y_t - y_{t|t-1}.$$
The observational covariance matrix is updated via its scale matrix and degrees of freedom, using approximation results by Triantafyllopoulos (2011) that exploit the expectation invariance of the random walk process for Σ_t: E[Σ_{t|t-1}] = E[Σ_{t-1|t-1}]. The scale matrix is updated as

$$S_t = k_t\, S_{t-1} + e_t e_t', \qquad k_t = \frac{\delta n_{t-1} - M - 1}{n_{t-1} - M - 1}.$$

As is common in the literature, the scale matrix is initialized as

$$S_0 = \mathrm{diag}\!\left(u_1^2, \dots, u_M^2\right),$$

where u_1^2, ..., u_M^2 are the residual variances from OLS estimation of a VAR over an initial training sample. The updated degrees of freedom are obtained as

$$n_t = \delta n_{t-1} + 1.$$

It is a natural choice to initialize the degrees of freedom with n_0 = M + 2, the smallest integer value for which the prior mean of Σ exists. The expected observational covariance is obtained as

$$\Sigma_{t|t-1} = E\!\left[\Sigma_t \mid y^{t-1}\right] = \frac{S_{t-1}}{\delta n_{t-1} - M - 1}.$$

The time-t Kalman gain (KG_t) is obtained as

$$KG_t = \Omega_{t|t-1}\, x_t' \left( x_t\, \Omega_{t|t-1}\, x_t' + \Sigma_{t|t-1} \right)^{-1}.$$

Given the Kalman gain, the coefficients and the system covariance are updated as

$$\beta_{t|t} = \beta_{t|t-1} + KG_t\, e_t \qquad \text{and} \qquad \Omega_{t|t} = \Omega_{t|t-1} - KG_t\, x_t\, \Omega_{t|t-1}.$$
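For concreteness, a compact sketch of one predict/update cycle implementing the recursions above. The scale update uses the standard variance-discounting form given here; the exact constants of the paper's implementation (following Triantafyllopoulos, 2011) may differ, so this is an illustrative sketch rather than a definitive reproduction.

```python
import numpy as np

def dlm_step(y_t, x_t, beta, Omega, S, n, delta=0.97, lam=0.99):
    """One predict/update step of the discounted Kalman filter sketched above.

    y_t   : (M,)   observation vector
    x_t   : (M, k) regressor matrix for period t
    beta  : (k,)   filtered coefficient mean beta_{t-1|t-1}
    Omega : (k, k) filtered coefficient covariance Omega_{t-1|t-1}
    S, n  : scale matrix and degrees of freedom of the IW law for Sigma
    """
    M = len(y_t)

    # --- predictive step ---
    beta_pred = beta                       # random-walk state: mean unchanged
    Omega_pred = Omega / lam               # inflate covariance via discounting
    n_pred = delta * n                     # discounted degrees of freedom
    Sigma_pred = S / (n_pred - M - 1)      # expected observational covariance

    y_pred = x_t @ beta_pred               # point forecast
    Q = x_t @ Omega_pred @ x_t.T + Sigma_pred   # forecast-error covariance

    # --- update step ---
    e = y_t - y_pred                       # one-step-ahead forecast error
    KG = Omega_pred @ x_t.T @ np.linalg.inv(Q)  # Kalman gain
    beta_new = beta_pred + KG @ e
    Omega_new = Omega_pred - KG @ x_t @ Omega_pred

    # Wishart discounting of the observational covariance
    k_t = (n_pred - M - 1) / (n - M - 1)
    S_new = k_t * S + np.outer(e, e)
    n_new = n_pred + 1

    return beta_new, Omega_new, S_new, n_new, y_pred, Q
```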

1.2 Spike-and-slab interpretation of the prior
Here we provide an interpretation of our proposed prior structure in the main paper as a spike-and-slab prior. This is meant as an illustration of our prior structure from a different angle. Note that the notation introduced in this sub-section only applies locally and is not used elsewhere in the main text or the online appendix. Our starting point is the same type of time-varying parameter VAR with exogenous variables we consider in the main paper:

$$y_t = x_t \beta_t + \varepsilon_t, \qquad \beta_t = \beta_{t-1} + \eta_t.$$

As in the main paper, we divide the set of exogenous variables into two groups: N_x denotes the number of variables which are asset specific and considered as relevant only for a specific exchange rate, while N_xx denotes the number of non asset-specific variables. Thus, we have k = M(1 + p·M + N_x + N_xx) elements in β_t. The initial conditions for the time-varying VAR coefficients can be viewed as time t = 0 priors for the parameters β_t. For each coefficient in VAR equation i, i = 1, ..., M, and lag/predictor j, j = 1, ..., k/M, we use a variable selection prior of the form

$$\beta_{0,i,j} \sim k_{i,j,t}\, N\!\left(0,\, V_{i,j}\right) + \left(1 - k_{i,j,t}\right) \delta_0, \qquad k_{i,j,t} \sim DML,$$

where δ_0 denotes the Dirac delta which assigns point mass at zero, and DML denotes dynamic model learning. Each indicator variable k_{i,j,t} can take on a value of zero or one in each time period. When k_{i,j,t} = 1 the prior for β_{0,i,j} is N(0, V_{i,j}) and when k_{i,j,t} = 0 the coefficient is exactly zero (and, hence, covariate j does not enter VAR equation i).
Whether k_{i,j,t} is one or zero is decided probabilistically via the DML procedure. We make the time-dependency of the ks explicit here, using subscript t. To streamline notation, we do not use time subscripts for the γs in the main text, although they are re-selected each period.
We choose V_{i,j}, which contains the prior variances for the included coefficients, using ideas from the Minnesota prior:

$$V_{i,j} = \begin{cases} \gamma_1 & \text{for the intercept} \\ \gamma_2 / r^2 & \text{for coefficients on lag } r \text{ of variable } i \text{ (own lag)} \\ \gamma_3 / (r^2 s_k^2) & \text{for coefficients on lag } r \text{ of endogenous variable } k, \; k \neq i \\ \gamma_{3+l} & \text{for coefficients on the } l\text{th asset-specific exogenous variable} \\ \gamma_{N_x+3+m} & \text{for coefficients on the } m\text{th non asset-specific exogenous variable,} \end{cases}$$

where r = 1, ..., p indexes lag length, k = 1, ..., M indexes VAR equations, l = 1, ..., N_x indexes asset-specific predictors, m = 1, ..., N_xx indexes non asset-specific exogenous predictors, while s_k^2 denotes the OLS estimate of the residual variance of a univariate AR(p) for variable k.

It would be possible to treat the k_{i,j,t} as unknown parameters and include them in the Bayesian posterior. But these parameters are time-varying and directly drawing from them in an MCMC algorithm would be computationally burdensome. This motivates our use of DML, which uses discounting methods to produce a computationally feasible approach. It is also worth noting that the selection indicators are updated online. That is, as new data becomes available the investor only needs to input the latest observation to update from k_{i,j,t} to k_{i,j,t+1}.
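A small sketch of how the diagonal prior variances can be assembled for one VAR equation under this structure. The grid values used below are placeholders; setting a γ to zero implements the spike at zero, i.e., excludes the corresponding block of variables.

```python
import numpy as np

def prior_variances(i, M, p, N_x, N_xx, gammas, s2):
    """Diagonal of the Minnesota-type prior covariance for equation i.

    gammas : [g1, g2, g3, ...] with g1 for the intercept, g2 for own lags,
             g3 for cross lags and the remaining entries for exogenous
             predictors. A value of 0 excludes the block (the 'spike').
    s2     : residual variances s_k^2 from univariate AR(p) fits.
    """
    v = [gammas[0]]                                    # intercept
    for r in range(1, p + 1):                          # lag-r block
        for k in range(M):
            if k == i:
                v.append(gammas[1] / r**2)             # own lag
            else:
                v.append(gammas[2] / (r**2 * s2[k]))   # cross lag
    for l in range(N_x):                               # asset-specific
        v.append(gammas[3 + l])
    for m in range(N_xx):                              # non asset-specific
        v.append(gammas[3 + N_x + m])
    return np.array(v)

# Example: 9 currencies, 6 lags, 3 asset-specific predictors, oil.
s2 = np.ones(9)
g = [100.0, 0.9, 0.9, 0.1, 0.1, 0.1, 0.0]   # placeholder grid; gamma_7 = 0 excludes OIL
print(prior_variances(0, M=9, p=6, N_x=3, N_xx=1, gammas=g, s2=s2).shape)
```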

1.3 Dynamic asset allocation and evaluation of economic utility

1.3.1 Portfolio allocation
We design an international asset allocation strategy that involves trading the US dollar and nine other currencies. Consider a US investor who builds a portfolio by allocating their wealth between ten bonds: one domestic (US), and nine foreign. The US bond return is r_f. Define y_t = (y_{1,t}, ..., y_{9,t})'. At each period, the foreign bonds yield a riskless return in the local currency but a risky return due to currency fluctuations in US dollars. The expected risky return from the investment in country i's bonds therefore combines the local riskless rate with the expected appreciation of currency i against the US dollar. The only risk the US investor is exposed to is foreign exchange (FX) risk. Every period the investor takes two steps.
First, they use the currently selected model (i.e., the model with the highest discounted sum of predictive likelihoods) to forecast the one-period ahead exchange rate returns and the predictive covariance matrix. Second, using these predictions, they dynamically rebalance their portfolio by calculating the new optimal weights. This setup is designed to assess the economic value of exchange rate predictability and to dissect which sources of information are valuable for asset allocation.
We evaluate our models within a dynamic mean-variance framework, implementing a maximum expected return strategy. That is, we consider an investor who tries to find the point on the efficient frontier with the highest possible (ex-ante) return, subject to achieving a target conditional volatility over a given horizon (one month ahead for our main results). Define r_t = (r_{1,t}, ..., r_{9,t})' as the vector of risky returns and μ_{t|t-1} = E_{t-1}(r_t) as its expectation.
The portfolio allocation problem involves choosing the weights w_t = (w_{1,t}, ..., w_{9,t})' attached to each of the 9 foreign bonds (with 1 − Σ_{i=1}^{9} w_{i,t} being the weight attached to the domestic bond):

$$\max_{w_t} \; \mu_{p,t|t-1} = \left(1 - w_t'\iota\right) r_f + w_t'\,\mu_{t|t-1} \qquad \text{s.t.} \qquad w_t'\, Q_{t|t-1}\, w_t = \sigma_p^{*2},$$

where μ_{p,t|t-1} is the conditional expected portfolio return and σ_p^{*2} the target portfolio variance. ι is a vector of ones and the predictive covariance matrix Q_{t|t-1} is produced by our estimation algorithm; see Technical Appendix 1.1 for definitions.
Here and below we use the notation that the portfolio return before transaction costs is

$$R_{p,t} = 1 + r_{p,t} = 1 + \left(1 - w_{t-1}'\iota\right) r_f + w_{t-1}'\, r_t.$$
In addition, we let R^{TC}_{p,t} denote the period-t gross return after transaction costs, τ. Our specification of the portfolio allocation problem takes into account proportional transaction costs ex ante (i.e., at the time of the portfolio construction). Following Della Corte and Tsiakas (2012), we set τ = 0.0008. For our main results, we choose σ*_p = 10% as the target volatility of the conditional portfolio returns.
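Under these assumptions the maximum expected return strategy has a closed-form solution: the weights are proportional to Q^{-1}(μ − r_f ι), scaled to hit the target volatility. A minimal sketch with placeholder forecast inputs:

```python
import numpy as np

def max_return_weights(mu, Q, r_f, sigma_target):
    """Weights of the maximum expected return strategy: maximize the
    conditional portfolio return subject to w' Q w = sigma_target**2."""
    excess = mu - r_f                      # expected excess returns
    Q_inv = np.linalg.inv(Q)
    C = excess @ Q_inv @ excess            # squared information-ratio term
    w = sigma_target / np.sqrt(C) * (Q_inv @ excess)
    return w                               # domestic weight is 1 - w.sum()

# Placeholder one-month-ahead forecasts for the nine currencies.
rng = np.random.default_rng(1)
mu = rng.normal(0.002, 0.01, size=9)
A = rng.normal(size=(9, 9))
Q = A @ A.T / 100 + np.eye(9) * 1e-4       # positive definite covariance
# 10% annualized volatility target translates into 0.10/sqrt(12) monthly.
w = max_return_weights(mu, Q, r_f=0.001, sigma_target=0.10 / np.sqrt(12))
print(w.round(3), 1 - w.sum())
```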

1.3.2 Evaluation of economic utility
Quadratic utility. Our econometric model provides forecasts of the mean vector of returns and the covariance matrix. To assess the economic utility of the forecasts, we employ the method proposed by West, Edison, and Cho (1993). In a mean-variance framework with quadratic utility, we can express the investor's realized utility in period t as

$$U(W_t) = W_t - \frac{\rho}{2} W_t^2, \qquad W_t = W_{t-1} R_{p,t},$$

where W_t is the investor's wealth in t and ρ determines their risk preferences.
The investor's degree of relative risk aversion

$$\theta_t = \frac{\rho W_t}{1 - \rho W_t}$$

is set to a constant value θ. We choose θ = 2 for our main results (and θ = 6 for robustness checks). Then, the average realized utility can be employed to consistently estimate the expected utility achieved at a given level of initial wealth (West, Edison, and Cho, 1993). With initial wealth W_0, the average utility for an investor can be expressed as

$$\bar{U} = \frac{W_0}{T} \sum_{t=0}^{T-1} \left( R_{p,t+1} - \frac{\theta}{2(1+\theta)}\, R_{p,t+1}^2 \right).$$

The advantage of the representation above is that, for a fixed value of θ, relative risk aversion is constant and utility is linearly homogeneous in wealth. In contrast, for standard quadratic utility without restrictions on θ, relative risk aversion would be increasing in wealth, which is not likely to represent a typical investor's preferences. Here, having constant relative risk aversion, we can set W_0 = $1.
Performance measures. Our main evaluation criterion is based on the dynamic mean-variance framework and quadratic utility. Comparing two competing forecasting models involves comparing the average utilities generated by the respective forecasting models. We assess the economic value of different forecasting approaches by equating the average utility generated by a portfolio strategy which is based on (a particular version of) the VAR approach and the average utility achieved by a portfolio strategy relying on a simple random walk. Φ is the maximum (monthly) performance fee an investor is willing to pay to switch from the random walk to the specific VAR configuration. The estimated value of Φ ensures that the following equation holds:

$$\sum_{t=0}^{T-1} \left\{ \left( R^{TC,*}_{p,t+1} - \Phi \right) - \frac{\theta}{2(1+\theta)} \left( R^{TC,*}_{p,t+1} - \Phi \right)^2 \right\} = \sum_{t=0}^{T-1} \left\{ R^{TC}_{p,t+1} - \frac{\theta}{2(1+\theta)} \left( R^{TC}_{p,t+1} \right)^2 \right\},$$

where R^{TC,*}_{p,t+1} is the gross portfolio return constructed using the expected return and covariance forecasts from the dynamically selected best model configuration and R^{TC}_{p,t+1} is implied by the benchmark random walk (without drift) model. The superscript TC indicates that all quantities are computed after adjusting for transaction costs.
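A small sketch of how Φ can be computed from realized gross portfolio returns by numerically solving the equality above; the return series and the root-finding bracket below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

def avg_utility(R, theta):
    """Average realized quadratic utility per West, Edison, and Cho (1993)."""
    a = theta / (2.0 * (1.0 + theta))
    return np.mean(R - a * R**2)

def performance_fee(R_model, R_bench, theta=2.0):
    """Monthly fee Phi equating average utilities (annualize via 12 * Phi)."""
    f = lambda phi: avg_utility(R_model - phi, theta) - avg_utility(R_bench, theta)
    return brentq(f, -0.5, 0.5)   # assumed bracket for the root

# Placeholder gross returns for illustration only.
rng = np.random.default_rng(2)
R_model = 1.0 + rng.normal(0.004, 0.02, size=252)
R_bench = 1.0 + rng.normal(0.002, 0.02, size=252)
phi = performance_fee(R_model, R_bench)
print(f"annualized fee: {1e4 * 12 * phi:.0f} basis points")
```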
As a second measure of economic utility, we report the Sharpe ratio. Despite its popularity as a risk measure, it is well known that the Sharpe ratio comes with a few drawbacks in the context of evaluating dynamic portfolio strategies; see, for example, Marquering and Verbeek (2004) or Han (2006). This is why we primarily rely on performance fees as an evaluation criterion, while Sharpe ratios are reported as a complementary measure.

Fundamental exchange rate models
This section defines the fundamental exchange rate models which are used in the paper.
One of these (UIP) is used in the main results in the body of the paper. The remainder are used in this online appendix.

Fama regression/UIP
The UIP condition is the fundamental parity condition for foreign exchange market efficiency under risk neutrality. This condition postulates that the difference in interest rates between two countries should equal the expected change in the exchange rate between the countries' currencies (Engel, 2013):

$$E_t \Delta s_{t+1} = int_t - int_t^*,$$

where Δs_{t+1} ≡ s_{t+1} − s_t and E_t Δs_{t+1} denotes the expected change (at time t for t+1) of the log exchange rate, denominated as US dollars per unit of foreign currency. int_t (int*_t) is the one-period nominal interest rate on US (foreign) securities. Under the assumption that E_t Δs_{t+1} equals Δs_{t+1}, where s_t denotes the log of the realized exchange rate, the following forecasting equation arises:

$$\Delta s_{t+1} = \alpha + \beta \left( int_t - int_t^* \right) + \eta_{t+1}.$$

We use int_t − int*_t as a predictor.

Purchasing power parity
Throughout the PPP literature, the real exchange rate is usually modelled as

$$q_t = s_t + p_t^* - p_t,$$

where q_t is the log of the real exchange rate and p_t (p*_t) is the log of the US (foreign) price level (Rogoff, 1996). PPP postulates a constant real exchange rate, resulting in the price differential as the fundamental nominal exchange rate,

$$f^{PPP}_t = p_t - p_t^*,$$

and we rely on current deviations from this fundamental as a predictor for Δs_{t+1}; that is, if PPP holds, we expect Δs_{t+1} = (f^{PPP}_t − s_t). Thus, we use f^{PPP}_t − s_t as a predictor.

Monetary fundamentals
The main feature of the monetary approach is that the exchange rate between two countries is determined via the relative development of money supply and industrial production (Dornbusch, 1976; Bilson, 1978). The underlying idea is that an increase in the relative money supply depreciates the US dollar, while the opposite holds for relative industrial production. A simplified version of the monetary approach adopted in previous studies (Mark and Sul, 2001) can be expressed as

$$f^{MON}_t = \left( m_t - m_t^* \right) - \left( ip_t - ip_t^* \right),$$

where m_t − m*_t denotes the (log) money supply differential and ip_t − ip*_t refers to the (log) industrial production differential. This implies Δs_{t+1} = f^{MON}_t − s_t and we use f^{MON}_t − s_t as a predictor.

Taylor rule fundamentals
The Taylor rule states that a central bank adjusts the short-run nominal interest rate in order to respond to inflation (π) and the output gap (ou). Postulating such Taylor rules for two countries and subtracting one from the other, an equation is derived with the interest rate differential on the left-hand side and the inflation and output gap differentials on the right-hand side. Provided that at least one of the two central banks also targets the PPP level of the exchange rate, the real exchange rate also appears on the right-hand side of the equation. The output gap is approximated as the deviation of industrial production from trend output, which is calculated based on the Hodrick-Prescott filter with smoothing parameter λ = 14,400; for estimating the Hodrick-Prescott trend out of sample, we only use data that would have been available at the given point in time. The underlying idea is that both central banks follow a Taylor-rule model and determine the interest rate differential which drives the exchange rate. We rely on a simple baseline specification with ad-hoc weights for inflation and output gap which also incorporates the real exchange rate. That is, we use

$$1.5\left(\pi_t - \pi_t^*\right) + 0.1\left(ou_t - ou_t^*\right) + 0.1\, q_t$$

as a predictor.
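For concreteness, a sketch that assembles the four predictors from (log) levels of the underlying series, following the definitions in this section; all input series and names are hypothetical placeholders.

```python
import numpy as np

def fundamental_predictors(s, p, p_f, m, m_f, ip, ip_f, pi, pi_f, ou, ou_f,
                           i_us, i_f):
    """UIP, PPP, MON and Taylor-rule predictors from (log) series.

    s: log USD exchange rate; *_f arguments are foreign counterparts;
    i_us, i_f are one-period nominal interest rates.
    """
    uip = i_us - i_f                      # interest rate differential
    q = s + p_f - p                       # log real exchange rate
    ppp = (p - p_f) - s                   # deviation from PPP fundamental
    mon = (m - m_f) - (ip - ip_f) - s     # deviation from monetary fundamental
    taylor = 1.5 * (pi - pi_f) + 0.1 * (ou - ou_f) + 0.1 * q
    return uip, ppp, mon, taylor

# Placeholder monthly series for one currency.
T = 120
rng = np.random.default_rng(4)
series = {name: np.cumsum(rng.normal(0, 0.01, T)) for name in
          ("s", "p", "p_f", "m", "m_f", "ip", "ip_f")}
rates = {name: 0.002 + 0.001 * rng.random(T) for name in
         ("pi", "pi_f", "ou", "ou_f", "i_us", "i_f")}
uip, ppp, mon, taylor = fundamental_predictors(**series, **rates)
```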
Simulation experiment

We apply our dynamic learning strategy to a set of three prediction models, M_1, M_2 and M_3, the third of which is

$$M_3: \quad y_{3t} = 0.9 + 0.1\, y_{3t-1} + \varepsilon_{3t},$$

with ε_{it} ~ N(0, σ²) independent for i = 1, 2, 3; we assume y_{i0} = 0.25 for i = 1, 2, 3 and σ = 0.05. The model set is incomplete, but includes two models (M_1 and M_2) that are equivalent versions of the true model in the two parts of the sample.
We apply our dynamic model learning strategy to the simulated data. That is, we calculate the discounted predictive likelihood for each of the models (M_1, M_2 and M_3) and select the model (and value of the discount factor) which would have generated the highest product of predictive likelihoods until the given point in time. As we do for our application to exchange rate forecasting, we only consider information that would have been available at a certain point in time. Instead of excluding dynamic learning by setting α = 1, we choose the same range of the discount factor as we do in our application to exchange rate forecasting: α ∈ {0.50; 0.70; 0.80; 0.90; 0.99; 1}. We simulated 1,000 runs and recorded how often each of the models was chosen at each point in time. Figure 1 presents the results. It shows that (i) in almost all cases the appropriate stochastic process was selected, (ii) the structural break was recognized quickly and that (iii) model 3 rightly played no role.
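A stylized sketch of this experiment follows. Since the exact specifications of M_1 and M_2 are not reproduced above, their intercepts and autoregressive coefficients below are illustrative assumptions; the point is the mechanics of discounted predictive-likelihood selection around a structural break.

```python
import numpy as np

rng = np.random.default_rng(3)
T, sigma = 200, 0.05

# Illustrative stand-ins: the true process switches at mid-sample, and
# M1 / M2 each match it in one half (their exact forms are placeholders).
models = {
    "M1": (0.10, 0.50),   # (intercept, AR coefficient) -- assumed
    "M2": (0.30, 0.20),   # assumed
    "M3": (0.90, 0.10),   # as given in the text
}

def simulate(T):
    y = np.empty(T); y[0] = 0.25
    for t in range(1, T):
        c, b = models["M1"] if t < T // 2 else models["M2"]
        y[t] = c + b * y[t - 1] + sigma * rng.normal()
    return y

def log_pl(y_prev, y_now, c, b):
    """Gaussian log predictive likelihood of one observation."""
    return -0.5 * np.log(2 * np.pi * sigma**2) \
           - 0.5 * ((y_now - c - b * y_prev) / sigma) ** 2

y = simulate(T)
alpha = 0.90
dpl = {name: 0.0 for name in models}    # discounted log predictive likelihoods
picks = []
for t in range(1, T):
    picks.append(max(dpl, key=dpl.get))              # ex-ante selection
    for name, (c, b) in models.items():
        dpl[name] = alpha * dpl[name] + log_pl(y[t - 1], y[t], c, b)
print(picks[T // 2 - 5: T // 2 + 5])                 # selection around the break
```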

Point and interval forecasts
Bayesian methods provide the full predictive density, from which we can produce interval and point forecasts as a byproduct. Although our primary interest is on exploiting density forecasts for asset allocation, it is instructive to have a look at point and interval forecasts.
In particular, the second column of Table 1 shows the empirical coverage rates (for a nominal coverage rate of 90%) for all currencies. These reveal good coverage properties, albeit very slightly too conservative.
The third column of Table 1 reports the ratio of mean squared forecasting errors relative to the simple random walk with constant volatility. Ratios below one indicate better point forecasting performance in terms of squared loss of the DML with UIP forecasts compared to those produced by the random walk. Our evidence on point forecasting is ambiguous, with some ratios below and some above one. This finding once more shows how difficult it is to beat a simple random walk in terms of point forecasting accuracy. On the other hand, our previous results show that it is more fruitful to focus on density forecasts and exploit them for portfolio management.
As we adopt a Wishart matrix discounting (WMD) approach for the error covariance matrix, we are able to provide credibility intervals for our estimates of volatility and correlations. Figure 3 presents the point estimates of annualized volatility along with the 90% credibility intervals for the nine exchange rates. Figure 4 plots the point estimates of correlations along with the 90% credibility intervals for four selected exchange rate returns which display different patterns. The correlation between AUD and NZD increases to almost one at the end of the sample, which reflects the well-established comovements between these currencies. On the other hand, the intensity of the relationship between JPY and GBP strongly decreases, potentially due to country-specific drivers of the GBP exchange rate as a result of Brexit. Overall, these figures illustrate that this dimension of model flexibility is able to capture relevant global currency dynamics. Figure 6 depicts the cumulative differences in predictive log likelihoods between DML with UIP and the random walk (with constant volatility). The out-performance of DML with UIP is most pronounced in the time of the subprime crisis.
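A minimal sketch of how such credibility intervals can be obtained by simulation from the filtered inverse Wishart law; the degrees of freedom and scale matrix below are placeholder values, not filtered output.

```python
import numpy as np
from scipy.stats import invwishart

# Placeholder filtered quantities for illustration.
M = 9
n_t = 60                     # degrees of freedom (assumed)
S_t = 0.0009 * np.eye(M)     # scale matrix on a monthly-return scale (assumed)

draws = invwishart.rvs(df=n_t, scale=S_t, size=5000)   # (5000, M, M)
ann_vol = np.sqrt(12 * draws[:, 0, 0])                 # annualized vol, currency 1
lo, hi = np.percentile(ann_vol, [5, 95])
print(f"90% credibility interval: [{lo:.3f}, {hi:.3f}]")
```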

Test statistics for the Giacomini-Rossi fluctuation test
The MSFE ratio is a measure of global performance: it tells us whether DML with UIP or the random walk has given more precise point forecasts in a mean squared error sense. However, we do not learn from this measure how the relative forecasting power has evolved over time. As we seek to shed some light on the evolution through time, we also provide a measure of local forecasting performance. A useful device for this purpose is the Giacomini and Rossi (2010) fluctuation test, which computes Diebold-Mariano-type test statistics over rolling windows; positive (negative) values indicate that DML with UIP forecasts better (worse) than the random walk. Figure 7 highlights that the relative forecasting performance is highly unstable across currencies and over time. This finding aligns with Rossi (2013). The critical values of the fluctuation test are, however, derived under the assumption that a rolling or fixed estimation window has been used for generating the out-of-sample forecasts. The out-of-sample forecasts in our setup were produced using exponential discounting. Hence, we cannot compute valid critical values for our application. This is, however, not a major concern since Figure 7 shows that the absolute values of the test statistics are greater than two for only a few currencies at very few points in time. The null hypothesis of equal forecasting performance would thus essentially never be rejected at conventional significance levels.
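A sketch of the rolling statistics underlying Figure 7, under the simplifying assumption of a naive (non-HAC) variance estimator:

```python
import numpy as np

def fluctuation_stats(loss_bench, loss_model, window=60):
    """Rolling Diebold-Mariano-type statistics in the spirit of
    Giacomini and Rossi (2010). Positive values: the model beats the
    benchmark over the local window. A naive variance estimator is used
    instead of a HAC estimator to keep the sketch short."""
    d = np.asarray(loss_bench) - np.asarray(loss_model)
    stats = []
    for t in range(window, len(d) + 1):
        seg = d[t - window:t]
        stats.append(np.sqrt(window) * seg.mean() / seg.std(ddof=1))
    return np.array(stats)
```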

Alternative sets of regressors
For our main results we did not include some of the traditional regressors used by exchange rate forecasters due to data revision concerns. But if we are willing to use final vintage data (as opposed to data that forecasters would have had in real time), we can extend our set of regressors to include purchasing power parity (PPP), the monetary model (MON) and an asymmetric Taylor rule (ASYTAY). Technical Appendix 1.4 provides details of what these are and how they are calculated. Table 2 shows that including these fundamentals would not improve the performance of an investor's portfolio. Besides these conventional fundamentals, we also experimented with yield curve factors, which are commonly used to exploit the term structure of interest rates and the arising macroeconomic effects (Wright, 2011). This can be considered an extension of the simple interest rate spread. However, in line with Berge (2014), including a level, slope and curvature factor does not improve our forecasts. The findings are available upon request.

Notes to Table 2: We measure statistical significance for differences in performance fees and log scores using the (one-sided) Diebold and Mariano (1995) t-test with heteroskedasticity and autocorrelation consistent (HAC) standard errors. We evaluate whether the Sharpe ratio of a model is different from that of the random walk (with constant volatility) benchmark using the (one-sided version of the) Ledoit and Wolf (2008) bootstrap test. We compute the Ledoit and Wolf (2008) test statistic with a serial correlation-robust variance, using a pre-whitened quadratic spectral estimator. One star indicates significance at the 10% level; two stars at the 5% level; and three stars at the 1% level.

Alternative priors
We have experimented with many prior specifications that lie within our DML framework and, in particular, with alternative choices of grids for the Minnesota shrinkage parameters. Results were robust. If we use a more refined grid for the values of the hyperparameters, we find very slight forecast improvements (at the cost of increased computation time). This indicates that our specification of grid points for the hyperparameters is sufficiently flexible to cover the relevant model space. In this section, we discuss some alternative, more restrictive, prior specifications. Overall, we find that the rich shrinkage patterns we use pay off compared to more restrictive settings.

"Dense" prior structure
In this sub-section, we discuss a prior structure that represents a "dense" rather than a "sparse" modelling approach. We investigate how our results change when enforcing a "dense" prior rather than letting the data choose between a "dense" and a "sparse" structure. A dense prior is one where VAR lags and exogenous regressors cannot be removed from the model; instead, only the degree of shrinkage intensity for each of the blocks of variables is selected (i.e. the prior shrinkage parameters cannot be set to exactly zero, as they can in our approach). We specify an alternative prior that features a dense structure as follows. For $\gamma_2$ and $\gamma_3$, the shrinkage parameters for own and cross lags, we use grids of $\{0.0001; 0.01; 0.1\}$, and the shrinkage parameter for UIP is likewise estimated using a grid of $\{0.0001; 0.01; 0.1\}$. We do not take into account other exogenous variables in this setting.
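The sketch below illustrates the distinction in stylized form: Minnesota-type prior variances with block-specific shrinkage, where setting a parameter to exactly zero removes the corresponding block (the sparse case), which the dense grids above rule out. The variance formula and lag decay are simplifying assumptions, not the paper's exact prior.

```python
import numpy as np

def minnesota_prior_variances(n, p, gamma_own, gamma_cross, gamma_exo, n_exo):
    """Stylized Minnesota-type prior variances for one VAR equation with
    n variables, p lags and n_exo exogenous regressors. A block shrinkage
    parameter of exactly 0 yields a dogmatic prior at zero, i.e. the block
    is removed ("sparse"); the dense grids {0.0001, 0.01, 0.1} never allow
    this and only vary the shrinkage intensity."""
    V = []
    for lag in range(1, p + 1):
        for j in range(n):
            g = gamma_own if j == 0 else gamma_cross  # variable 0: own lags
            V.append(g / lag**2)                      # tighter at longer lags
    V.extend([gamma_exo] * n_exo)                     # exogenous block (UIP)
    return np.array(V)

# sparse: cross lags switched off entirely; dense: mildly shrunk instead
v_sparse = minnesota_prior_variances(9, 2, 0.01, 0.0, 0.01, 1)
v_dense = minnesota_prior_variances(9, 2, 0.01, 0.0001, 0.01, 1)
```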

VAR with tight prior
We also explored a VAR for the nine exchange rates without any exogenous regressors and with a very tight prior for the VAR coefficients. This setting is similar to Carriero, Kapetanios, and Marcellino (2009). For $\gamma_2$ and $\gamma_3$, the shrinkage parameters for own and cross lags, we use grids of $\{10^{-4}; 10^{-5}; 10^{-6}\}$. Although we find that for seven out of nine exchange rates the MSFE is slightly lower than that of the random walk, results in terms of density forecasting accuracy and economic measures are inferior to our baseline setting: $PLL = 21.73$, $\Phi_{TC} = 164$, $SR = 0.64$ and $SR_{TC} = 0.62$.

Treating the exogenous variables as endogenous
We also investigate how the results change if the exogenous variables are not treated as such but are instead included as endogenous variables in the VAR. That is, instead of working with a 9-variable VAR with exogenous variables, we work with a 37-dimensional VAR involving the 9 exchange rates, 3 asset-specific variables (UIP, INT DIFF and STOCK GROWTH, i.e. 3 such variables for each of the 9 countries, adding 27 variables to the VAR) and 1 non-asset-specific variable (OIL). With this much larger VAR it is computationally infeasible to do a grid search over seven different prior shrinkage parameters; with only three grid points per parameter, for instance, a full search would already require evaluating $3^7 = 2187$ prior configurations. Accordingly, we employ the framework proposed by Koop and Korobilis (2013), which involves a single shrinkage parameter. We label this the KK-Minnesota prior. The strategy of using a single shrinkage parameter for imposing shrinkage on all model parameters (except the intercept) is commonly used in the large Bayesian VAR literature; see Giannone, Lenza, and Primiceri (2015), Koop and Korobilis (2013) and Bańbura, Giannone, and Reichlin (2010), and we follow Koop and Korobilis (2013) in its implementation. Table 4 unambiguously conveys the message that the more restrictive structure of the Koop and Korobilis (2013) framework is clearly inferior in this exchange rate forecasting exercise compared to our proposed setting, both in statistical terms and even more so in economic terms. This highlights that allowing for different degrees of prior shrinkage on different blocks of parameters is empirically warranted.

Time-varying coefficients
In the preceding section, all of our VARs involved constant coefficients (but had time-varying volatilities). Time-variation in VAR coefficients can easily be added, but leads to inferior forecasting performance. To show this, we present results using a DML with UIP specification identical to that used in the preceding section, except that it sets λ = 0.99. Results are presented in Table 5. In comparison to the constant parameter case (λ = 1) in our main results, we find that using time-varying VAR coefficients is in general detrimental to forecasting performance, particularly when evaluating forecasts in terms of the economic performance measures. We find strong evidence that allowing for abrupt switching between different models matters for handling the evolving relationship between exchange rates and fundamentals, as highlighted by Sarno and Valente (2009).
Allowing for gradual change in parameters, by contrast, is not a useful addition. An exception is the specification "DML without own/cross lags but with ALL REGRESSORS", for which time-varying parameters do not turn out to be detrimental. It appears that, in specifications involving the estimation of many parameters for the VAR lags, time-variation in parameters lowers performance. This finding aligns with the econometric literature on time-varying VAR parameters in medium-sized VARs (Chan and Eisenstat, 2018; Koop and Korobilis, 2013).
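To illustrate how λ governs the degree of time variation, the following single-equation sketch implements a forgetting-factor recursion: the state prediction variance is inflated by 1/λ each period, so λ = 1 recovers constant coefficients. The initialization and the fixed observation variance are illustrative assumptions.

```python
import numpy as np

def forgetting_factor_filter(y, X, lam=0.99, v=1.0):
    """Kalman-style recursion for a regression with time-varying coefficients,
    where inflating the state variance by 1/lam replaces an explicit state
    noise covariance; lam = 1 gives constant coefficients."""
    T, k = X.shape
    beta = np.zeros(k)                 # initial coefficient estimate
    P = 10.0 * np.eye(k)               # loose initial variance (assumption)
    betas = np.empty((T, k))
    for t in range(T):
        P = P / lam                    # forgetting: inflate uncertainty
        x = X[t]
        f = v + x @ P @ x              # predictive variance of y_t
        K = P @ x / f                  # Kalman gain
        beta = beta + K * (y[t] - x @ beta)
        P = P - np.outer(K, x @ P)
        betas[t] = beta
    return betas
```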

Additional results on portfolio performance
In this sub-section, we explore in greater detail the portfolio performance implied by our flexible DML with UIP model and the portfolio performance based on the random walk. The annualized volatility of the portfolio returns under DML with UIP is only slightly higher than the target portfolio volatility of 10%. Skewness of portfolio returns is substantially higher under DML with UIP, while kurtosis is considerably lower than in the random walk case. Altogether, the portfolio characteristics of the DML with UIP model are clearly superior to those of the random walk. In addition, the characteristics of the portfolio returns based on the DML with UIP strategy are also more favourable for risk management and diversification purposes: the correlation of the returns with equities (proxied by S&P 500 returns) is even negative, and the first-order autocorrelation of returns and squared returns is lower than for the portfolio returns based on the random walk.

Evolution of portfolio weights
It is of interest to see how the portfolio weights have evolved through time. Figure 8 depicts their evolution, and Table 7 summarizes the effect of restrictions on portfolio weights on economic utility and the Sharpe ratio. Restricting the portfolio weights to [−1; 1] leads to even slightly better portfolio performance than leaving the weights unrestricted. This is good news from a risk-management perspective, since excessive portfolio weights are not required to achieve high utility gains. Severe restrictions on the portfolio weights, however, are clearly detrimental to portfolio performance.
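A minimal sketch of this kind of weight restriction follows, under the assumption that the residual position is held in the risk-less USD bond; this simple truncation rule is illustrative and need not coincide with the paper's exact implementation.

```python
import numpy as np

def restrict_weights(w_foreign, bound=1.0):
    """Truncate the foreign-currency portfolio weights to [-bound, bound];
    the risk-less USD position absorbs the residual so that the full
    weight vector still sums to one (assumption)."""
    w = np.clip(np.asarray(w_foreign, dtype=float), -bound, bound)
    return np.append(w, 1.0 - w.sum())  # last entry: implied USD weight
```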

Global Harvest Index as benchmark
We consider the Deutsche Bank Global Currency Harvest Index as an additional benchmark strategy. This index can be seen as a proxy for the returns to a carry trade style strategy. Figure 11 shows that the wealth path generated by our random walk model and the evolution of the Global Currency Harvest Index are broadly similar; the correlation between the returns is 0.60. This result is not surprising, given that under our random walk model expected exchange rate changes are zero, so that expected currency excess returns are driven by interest rate differentials and the implied strategy resembles a carry trade.

Portfolio performance when removing one currency
To assess the sensitivity of the portfolio performance, we compute Sharpe ratios when we remove one currency at a time from the set of currencies and set the respective portfolio weight to 0. Table 8 shows that no single currency drives the results. Not surprisingly, enforcing dollar neutrality leads, in relative terms, to the largest decrease in the Sharpe ratio.

Results for single currencies
We also analyze the case where only one foreign bond is considered for investment in addition to the risk-less USD bond (from the perspective of a US investor). Table 9 reports the results. Once again, no single currency emerges that by itself leads to attractive portfolio results, which reinforces our finding that market timing in a large set of currencies is key for economic utility gains.

Additional robustness checks
In this sub-section, we briefly mention a couple of additional specifications we considered.

Spillover effects
The first of these investigated whether spillover effects involving macroeconomic fundamentals might be important. Such third-country effects have been discussed in Berg and Mark (2015). For instance, instead of including only the UIP for the UK in the equation for the UK currency (as we do), we can also include the UIPs for all the other currencies. If we do this, results are not noticeably affected. Note that our VAR specification already allows for spillovers between the exchange rates of different countries, and we have found this kind of spillover to improve forecasts. Adding spillovers involving macroeconomic fundamentals, however, yields no additional benefits.
Alternative measure of portfolio performance
As an alternative performance measure, we also investigated the manipulation-proof performance measure proposed by Goetzmann, Ingersoll, Spiegel, and Welch (2007). The advantage of this criterion is that we do not have to assume a particular utility function. The results are very similar to those for the reported quadratic utility case and are available upon request.
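For reference, this measure can be computed as sketched below; the value of the relative risk aversion coefficient rho is an illustrative choice.

```python
import numpy as np

def mppm(r_p, r_f, rho=3.0, dt=1.0 / 12.0):
    """Manipulation-proof performance measure of Goetzmann, Ingersoll,
    Spiegel, and Welch (2007):
        Theta = ln( mean( ((1 + r_p) / (1 + r_f)) ** (1 - rho) ) )
                / ((1 - rho) * dt)
    with per-period portfolio returns r_p, risk-free returns r_f and
    period length dt in years (dt = 1/12 for monthly data)."""
    ratio = (1.0 + np.asarray(r_p)) / (1.0 + np.asarray(r_f))
    return np.log(np.mean(ratio ** (1.0 - rho))) / ((1.0 - rho) * dt)
```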

Alternative risk aversion
It is of interest whether the economic utility gains can also be achieved by investors with higher risk aversion. To explore this issue, we also considered a risk aversion coefficient of θ = 6 instead of θ = 2. For this case, we found even larger utility gains than in our baseline setting, with an annualized performance fee of 525 basis points, that is $\Phi_{TC} = 525$, for the DML with UIP model (compared to 464 basis points in the base case).
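As an illustration of how such a performance fee can be computed, the sketch below solves for the fee that makes a quadratic-utility investor indifferent between the two strategies, in the spirit of Fleming, Kirby, and Ostdiek (2001); whether this matches the paper's exact definition of $\Phi_{TC}$ (for instance the treatment of transaction costs) is an assumption.

```python
import numpy as np
from scipy.optimize import brentq

def performance_fee(r_model, r_bench, theta=2.0):
    """Annualized fee Phi (in return units; multiply by 10000 for basis
    points) that equates average quadratic utility of the model-based
    portfolio, net of the fee, with that of the benchmark portfolio.
    r_model, r_bench: monthly net returns; theta: relative risk aversion."""
    a = theta / (2.0 * (1.0 + theta))

    def utility_gap(phi):
        g_model = 1.0 + np.asarray(r_model) - phi  # gross return net of fee
        g_bench = 1.0 + np.asarray(r_bench)
        return np.mean(g_model - a * g_model**2) - np.mean(g_bench - a * g_bench**2)

    return 12.0 * brentq(utility_gap, -0.5, 0.5)   # monthly fee, annualized
```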

Specific degrees of time variation for different blocks of coefficients
As discussed previously, we have found that working with a constant coefficient VAR by setting λ = 1 leads to improved forecast performance relative to λ = 0.99. But these specifications assume that the same λ applies to all the VAR coefficients. In principle, forecast performance could be improved by allowing for different degrees of time variation for different blocks of coefficients. In practice, extensive experimentation along these lines has not produced any forecast improvements.

Alternative grid for the decay factor α
We also considered a more refined grid for choosing α, namely α ∈ {0.40 : 0.01 : 1.00}. In this case, α = 0.73 is selected from the data over the entire period, which is quite similar to our benchmark results (α = 0.70). The results for the refined grid are almost exactly the same as in our base case.
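To illustrate the role of the decay factor, the sketch below implements the forgetting-factor updating of model probabilities used for dynamic model selection in the spirit of Koop and Korobilis (2013); the initialization with equal weights is an assumption.

```python
import numpy as np

def dms_weights(pred_liks, alpha=0.73):
    """Filtered model probabilities under a decay (forgetting) factor alpha:
    probabilities are raised to the power alpha before being updated with
    the latest predictive likelihoods, so alpha < 1 discounts past
    forecasting performance and alpha = 1 applies no discounting.

    pred_liks : (T, J) array of predictive likelihoods p_j(y_t | past)."""
    T, J = pred_liks.shape
    w = np.full(J, 1.0 / J)            # equal initial weights (assumption)
    W = np.empty((T, J))
    for t in range(T):
        w = w**alpha
        w = w / w.sum()                # prediction step: discount the past
        w = w * pred_liks[t]
        w = w / w.sum()                # update with realized likelihoods
        W[t] = w
    return W

# dynamic model selection picks, each period, the model with highest weight:
# selected = np.argmax(dms_weights(pred_liks), axis=1)
```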

Results for the long sample
In this sub-section, we report some additional key results for our long sample period which starts in 1973:01 and for which we compute out-of-sample results from 1990:01 to 2016:12. Due to data availability we do not consider the inclusion of exogenous regressors for this sample period.
Table 10 summarizes the results. As is the case for the short sample, DML substantially outperforms the multivariate random walk (i.e. DML without own/cross lags) both in terms of PLLs and economic criteria. Here again, fast model switching is found to be crucially important for the accuracy of density forecasts and portfolio allocation. The optimal decay factor α is found to be 0.80 over the entire evaluation period and is thus comparable to the optimal decay factor of the short period (α = 0.70).
As for our short sample, we consider the G10 countries. Figure 12 illustrates the high frequency of model change when the decay factor is chosen from the data. The vertical axis plots the model numbers from 1 to 32 against time for two cases. The set of models begins with model number 1, the multivariate random walk without drift, and ends with model number 32, one of the most flexible models (i.e. the VAR model with an intercept, own lags with shrinkage parameter $\gamma_2 = 0.9$ and cross lags with shrinkage parameter $\gamma_3 = 0.9$). The two lines in Figure 12 are for DML (with α selected in a time-varying manner) and DML (α = 1). In our flexible specification where the decay factor is chosen from the data, model change occurs much more frequently than when there is no discounting of forecasting performance (α = 1). Many different models are selected over time. The individual specification picked most frequently is the multivariate random walk (in approximately half of the cases). This prominent role of the multivariate random walk reinforces the story that sparsity is a key aspect. Figure 13 shows which blocks of variables are included at each point in time (coloured diamonds). Blank spaces in the graph depict the time-varying sparsity induced by DML, that is, periods where a block of variables is not selected.
Typically, we observe persistence in the selection of a block of variables. Figure 14 compares the evolution of wealth for an investor who begins with one dollar and relies on DML to the wealth of an investor who uses a multivariate random walk with constant covariance to construct their portfolio. As for the short sample, the outperformance of DML is large, with the most striking gains around the time of the subprime crisis.
Overall, all key findings for the short sample period also apply for the long sample.