Practical portfolio optimization in Python (2/3) - machine learning

Advanced Portfolio Optimization

In this second article, we go through more advanced and more modern approaches. We briefly describe CVaR optimization and the Bayesian approach of the Black-Litterman model. Both use more advanced math, so we will cover some basics first.

I will also mention stochastic programming: multi-stage problems that can be used for portfolio optimization. The most up-to-date part of this article is the machine learning approach developed by Marcos López de Prado. The code is in the third article.

We will use the same setting as in the first article. Let’s cover some theory first, and then we will analyze the results of a walk-forward out-of-sample test. For each section, we will plot the resulting portfolios, and at the end of the article, we will compare the metrics of the individual methodologies.

CVaR - Conditional Value at Risk

You have probably heard of Value at Risk (VaR). It is a lower percentile of returns, usually the 5th: with a probability of 95%, the return will be higher than this value. Conditional Value at Risk (CVaR) is the mean of the returns that fall below VaR, that is, the average of the worst losses. It is best explained by the histogram (daily returns on SPY, 2015-2020).

These risk metrics are very useful: they tell you something about your expected losses in the worst-case scenarios. They can be used as backtest metrics, and you can calculate them on your daily returns or on your trades. Calculation with pandas:

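```python
import pandas as pd

daily_returns = pd.Series([0.01, -0.02, 0.005, -0.03, 0.015])  # placeholder: use your own daily returns
var = daily_returns.quantile(0.05)                 # 5% Value at Risk
cvar = daily_returns[daily_returns <= var].mean()  # mean of the returns at or below VaR
```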

Mathematically CVaR is a conditional expectation, so from probability theory, we have:

$\mathrm{CVaR}_{\alpha}(R) = \mathbb{E}\left[R \mid R\le\mathrm{VaR}_{\alpha}(R)\right]$

Here $R$ is the return (daily, or the return on a trade), we set $\alpha=0.05$, and $\mathbb{E}$ is the expected value. In the literature, you will also see the same definition stated for losses ($-R$ with $\alpha=0.95$).
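To make the two conventions concrete, here is a minimal sketch on synthetic data (the normally distributed returns and the helper names are assumptions for illustration, not part of the original setting). It computes VaR and CVaR once on returns with $\alpha=0.05$ and once on losses with $\alpha=0.95$; the two results should differ only in sign.

```python
import numpy as np

def var_cvar_returns(returns, alpha=0.05):
    """VaR and CVaR in the return convention (lower tail, negative numbers)."""
    returns = np.asarray(returns, dtype=float)
    var = np.quantile(returns, alpha)
    cvar = returns[returns <= var].mean()
    return var, cvar

def var_cvar_losses(returns, alpha=0.95):
    """VaR and CVaR in the loss convention (upper tail of -R, positive numbers)."""
    losses = -np.asarray(returns, dtype=float)
    var = np.quantile(losses, alpha)
    cvar = losses[losses >= var].mean()
    return var, cvar

# Synthetic daily returns (assumption: normal, 1% daily volatility)
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0005, 0.01, size=1000)

var_r, cvar_r = var_cvar_returns(daily_returns, alpha=0.05)
var_l, cvar_l = var_cvar_losses(daily_returns, alpha=0.95)
print(f"returns: VaR={var_r:.4f}, CVaR={cvar_r:.4f}")
print(f"losses:  VaR={var_l:.4f}, CVaR={cvar_l:.4f}")
```

Whichever convention you pick, keep it consistent between the metric and the optimization that uses it.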

With CVaR as the risk measure, the portfolio optimization problem is:

$\begin{array}{lll}\displaystyle \min_{\mathbf{w}} & \displaystyle -\mathrm{CVaR}_{\alpha}\left(\sum_{i=1}^n w_i R_i\right) & \\ \textrm{subject to} & \displaystyle \pmb{\mu}^\top\mathbf{w} \ge \mu_x & \\ & \displaystyle \sum_{i=1}^{n} w_i = 1 & \\ & w_i \le 1 & i \in \{1,\dots,n\} \\ & w_i \ge -1 \ \ (\textrm{resp. } w_i \ge 0) & i \in \{1,\dots,n\}\end{array}$

Here $R_i$ is a random variable representing the return on asset $i$. We minimize the negative CVaR because, under our definition, CVaR is a negative number (the loss). When CVaR is defined for losses, you will see the same minimization without the minus sign. The other constraints are the same as in the Markowitz model from the previous article. Note that this problem can be rewritten as a linear program, which is easy to optimize.

Unfortunately, this optimization was omitted from PyPortfolioOpt, but the linear model can be programmed directly with cvxpy (advanced). The main goal of this part was to introduce the CVaR metric.
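The linear-program form can be sketched as follows. This is a minimal illustration, not the code used for the results in this series: it uses the standard Rockafellar-Uryasev reformulation in the loss convention, treats each historical day as one equally likely scenario, restricts weights to long-only ($0 \le w_i \le 1$), and solves the program with SciPy's `linprog`. The function name and the synthetic scenarios are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def min_cvar_weights(scenarios, alpha=0.05, target_return=None):
    """Long-only minimum-CVaR weights from a (T x n) matrix of scenario returns."""
    R = np.asarray(scenarios, dtype=float)
    T, n = R.shape
    # Decision vector x = [w_1..w_n, z, u_1..u_T]:
    # z is the loss threshold, u_t the loss in excess of z in scenario t.
    c = np.concatenate([np.zeros(n), [1.0], np.full(T, 1.0 / (alpha * T))])
    # u_t >= -R_t w - z   <=>   -R_t w - z - u_t <= 0
    A_ub = np.hstack([-R, -np.ones((T, 1)), -np.eye(T)])
    b_ub = np.zeros(T)
    if target_return is not None:
        # mu^T w >= mu_x   <=>   -mu^T w <= -mu_x
        row = np.concatenate([-R.mean(axis=0), [0.0], np.zeros(T)])
        A_ub = np.vstack([A_ub, row])
        b_ub = np.append(b_ub, -target_return)
    # Weights sum to one
    A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(T)])[None, :]
    b_eq = [1.0]
    bounds = [(0.0, 1.0)] * n + [(None, None)] + [(0.0, None)] * T
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise ValueError(res.message)
    return res.x[:n], res.fun  # weights, CVaR of losses at the optimum

# Synthetic scenarios for three hypothetical assets with different volatility
rng = np.random.default_rng(1)
vols = np.array([0.005, 0.010, 0.020])
scenarios = rng.normal(0.0005, vols, size=(500, 3))
weights, cvar_loss = min_cvar_weights(scenarios, alpha=0.05)
print(weights.round(3), round(cvar_loss, 4))
```

The returned CVaR is a positive loss here; multiply by -1 to compare it with the return-based definition used above.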

Black-Litterman model

Black-Litterman allocation is a Bayesian approach to asset allocation. In a Bayesian approach, we start from a prior probability distribution for returns, which is then updated with historical data. The result is a posterior distribution (a posterior estimate of expected returns), which can be used for further analysis.

The most important improvement over the Markowitz models is that we can include the investor’s views or expectations for each asset. The results then take into account both the investor’s views and the real historical performance. With different views, we can construct many hypothetical portfolios.

A simple example: an investor wants to invest in tech stocks and thinks that AAPL will grow by 15%, FB will outperform GOOG by 5%, AMZN will grow by 20%, and, as a combined view, that NVDA with QCOM will outperform AMZN and FB by 5%. Views like these create interesting conditions that affect the prior distribution. For details on how to use them, see the Black-Litterman documentation in PyPortfolioOpt; you can find the code in the next article.

The posterior returns of the Black-Litterman model can be used as the expected returns in a Markowitz optimization, or the model can generate weights directly. If you are interested in the Bayesian approach, its methodologies, and its usage in financial markets, stay tuned; I will add some articles soon.

We won’t go deeper into the theory; instead, we will look at practical results. In our optimization setting, for simplicity, we use mean returns as the investor’s views. We then use the posterior returns in Markowitz’s return-maximization approach, alongside the direct result of the Black-Litterman model.
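To show how such views enter the model, here is a minimal NumPy sketch of the standard Black-Litterman posterior formula. It is not the PyPortfolioOpt implementation: the prior returns, volatilities, correlations, and the choice of view uncertainty (proportional to the variance of each view) are all hypothetical values picked for illustration, and only three of the views from the example are encoded.

```python
import numpy as np

def black_litterman_posterior(prior, cov, P, Q, omega, tau=0.05):
    """Posterior expected returns from the Black-Litterman formula."""
    prior = np.asarray(prior, dtype=float)
    cov = np.asarray(cov, dtype=float)
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    tau_cov = tau * cov
    # posterior = prior + tau*Sigma P^T (P tau*Sigma P^T + Omega)^-1 (Q - P prior)
    middle = P @ tau_cov @ P.T + omega
    return prior + tau_cov @ P.T @ np.linalg.solve(middle, Q - P @ prior)

tickers = ["AAPL", "FB", "GOOG", "AMZN"]
prior = np.array([0.10, 0.10, 0.10, 0.10])     # hypothetical prior returns
vols = np.array([0.30, 0.35, 0.28, 0.32])      # hypothetical volatilities
corr = np.full((4, 4), 0.5) + 0.5 * np.eye(4)  # hypothetical correlations
cov = np.outer(vols, vols) * corr

# One row per view: AAPL returns 15%, FB beats GOOG by 5%, AMZN returns 20%
P = np.array([[1, 0, 0, 0],
              [0, 1, -1, 0],
              [0, 0, 0, 1]], dtype=float)
Q = np.array([0.15, 0.05, 0.20])
# View uncertainty (assumption): proportional to the variance of each view
omega = np.diag(np.diag(P @ (0.05 * cov) @ P.T))

posterior = black_litterman_posterior(prior, cov, P, Q, omega)
print(dict(zip(tickers, posterior.round(4))))
```

The smaller the entries of `omega`, the more the posterior is pulled toward the views; with very large entries it stays at the prior.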

On the plot, you can see the returns of portfolios constructed by the Black-Litterman model.

The results are similar to those in the previous article. Higher returns come with very high volatility, and the equity curves are not as smooth as those of SPY or QQQ. We took the posterior expected returns and used them as input to the models from the previous article: BL-maxReturn, BL-maxSharpe, and BL-minVolatility.

We also calculated the direct portfolio allocation from the Black-Litterman model: the return-implied weights.

Stochastic programming - multi-stage problems

Every optimization problem that deals with random variables or uncertainty is a stochastic programming problem, so all models derived from Markowitz belong to the stochastic programming family. By using expected values and the covariance matrix, we approximate the probability distribution so that we can work with it.

The mathematical theory behind this is advanced and beautiful, and there are many interesting optimization problems outside the financial world.

In multi-stage portfolio optimization problems, we rebalance the portfolio several times over the investment horizon, and we also optimize the weights used in the second, third, and later stages.

Applying this methodology requires a deep understanding of the subject, so it is definitely out of this article’s scope. I did some googling for you, and a good source is a presentation with examples from the University of Minnesota. I learned about stochastic programming during my master’s studies at Charles University, but I have not tried multi-stage optimization on portfolios because I never needed such an advanced model during my career.

Another reason not to spend a long time developing this method is the tendency of these methodologies to overfit.

Machine learning for portfolio allocation

Marcos López de Prado does a lot of research on asset allocation and its application in the age of data science. His primary motivation was to present a brand new portfolio allocation methodology that does not suffer from the problems of the Critical Line Algorithm (CLA, by Markowitz).

The biggest problem is the lack of robustness: a small change in the expected returns or the covariance matrix can lead to a totally different solution. CLA is mean-variance optimization with various constraints on the weights.

Markowitz-style optimization requires inverting the covariance matrix, and for that operation to be possible and stable, the matrix must meet some conditions. Hierarchical Risk Parity (HRP) combines graph theory and machine learning techniques, and it does not need to invert the covariance matrix.

I will present two newer methods for modern portfolio allocation, both by Marcos López de Prado. Although the ideas behind both algorithms are quite simple, going through the whole theory is exhausting because you have to understand a lot of the theory behind the machine learning algorithms they use. That is not so important if you are learning by looking at practical applications.

We will use the Python library mlfinlab, which is free to use for these methodologies, but the documentation for advanced usage is paid. The package implements advanced methodologies from different authors, but Marcos López de Prado is the dominant one.
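A tiny numerical illustration of this instability, under simplifying assumptions: two hypothetical assets with 20% volatility and 0.99 correlation, and unconstrained mean-variance weights proportional to $\Sigma^{-1}\mu$ (not the constrained CLA itself). Raising one expected return from 10% to 11% is enough to flip the portfolio from equal weights to a heavily leveraged long-short position.

```python
import numpy as np

def mean_variance_weights(mu, cov):
    """Unconstrained mean-variance weights: inv(cov) @ mu, scaled to sum to 1."""
    raw = np.linalg.solve(cov, mu)
    return raw / raw.sum()

# Two hypothetical, highly correlated assets
sigma, rho = 0.20, 0.99
cov = sigma**2 * np.array([[1.0, rho], [rho, 1.0]])

w_base = mean_variance_weights(np.array([0.10, 0.10]), cov)
w_bumped = mean_variance_weights(np.array([0.10, 0.11]), cov)

print("equal expected returns:  ", w_base.round(3))
print("second asset +1 point:   ", w_bumped.round(3))
print("condition number of cov: ", round(np.linalg.cond(cov), 1))
```

With equal expected returns the weights are 50/50; after the one-point bump they are roughly -4.2 and 5.2. The large condition number of the covariance matrix is what amplifies the small change in the inputs.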

Note that I didn’t apply for their training program, nor do I have access to full documentation of mlfinlab – I am just a practitioner who frequently uses this package, and I would like to show you some examples.

Critical line algorithm (Markowitz) Results

For better comparison, we can look at the results from CLA: turning points, maximum Sharpe, and minimum volatility. Turning points are portfolios on the efficient frontier, and each frontier can have several of them. The portfolio at the first turning point has the maximum return, and the one at the last turning point has the minimum volatility.

This is visible on the plot, where we have the turning-point portfolios (1: highest return, 2: lower, ..., 5: lowest return and lowest risk). Portfolios with a higher profit than SPY did not recover during 2019 from the drop at the end of 2018; the same result was visible in the previous article.

But the theory works very well on the out-of-sample data, too. As we move along the turning points toward lower volatility, the resulting equity curves have lower volatility.

Hierarchical Risk Parity

When dealing with portfolio optimization, the covariance matrix is the most important tool of the mean-variance approach. A covariance matrix holds a lot of information about the relationships between the assets, but only the linear ones. In the HRP approach, we cluster assets into similar groups with an unsupervised machine learning method, hierarchical clustering. A picture is worth a thousand words, so have a look at how HRP reorganizes the assets, or rather their correlation matrix; the image is taken straight from the mlfinlab HRP documentation.

Of course, the most crucial part is the hierarchical tree clustering, in which weights are also calculated. If you are interested in the whole theory, have a look at this paper.
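For readers who prefer code to theory, the three HRP steps (tree clustering, reordering of the assets, recursive bisection) can be sketched with NumPy and SciPy. This is a simplified illustration and not the mlfinlab implementation: it clusters directly on the correlation distance $\sqrt{0.5(1-\rho)}$ with single linkage, and the two-sector synthetic returns are an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

def cluster_variance(cov, idx):
    """Variance of a cluster when its assets are weighted by inverse variance."""
    sub = cov[np.ix_(idx, idx)]
    ivp = 1.0 / np.diag(sub)
    ivp /= ivp.sum()
    return float(ivp @ sub @ ivp)

def hrp_weights(returns):
    """Simplified HRP weights from a (T x n) array of asset returns."""
    cov = np.cov(returns, rowvar=False)
    corr = np.corrcoef(returns, rowvar=False)
    # 1) Correlation-based distance and hierarchical tree
    dist = np.sqrt(np.clip(0.5 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="single")
    # 2) Quasi-diagonalization: order the assets as the leaves of the tree
    order = leaves_list(tree)
    # 3) Recursive bisection: split the ordered list and share the weight
    #    between the halves in inverse proportion to their variance
    weights = np.ones(cov.shape[0])
    clusters = [order]
    while clusters:
        next_clusters = []
        for cluster in clusters:
            if len(cluster) < 2:
                continue
            half = len(cluster) // 2
            left, right = cluster[:half], cluster[half:]
            var_left = cluster_variance(cov, left)
            var_right = cluster_variance(cov, right)
            alpha = 1.0 - var_left / (var_left + var_right)
            weights[left] *= alpha
            weights[right] *= 1.0 - alpha
            next_clusters += [left, right]
        clusters = next_clusters
    return weights

# Synthetic example: six assets driven by two hypothetical sector factors
rng = np.random.default_rng(42)
factors = rng.normal(0.0, 0.01, size=(750, 2))
loadings = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
returns = factors @ loadings.T + rng.normal(0.0, 0.005, size=(750, 6))
weights = hrp_weights(returns)
print(weights.round(3), weights.sum())
```

Note that the covariance matrix is never inverted: only its diagonal and sub-blocks are used, which is where the robustness of the method comes from.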

A similar methodology, motivated by HRP, is Hierarchical Equal Risk Contribution (HERC), which uses slightly different clustering. It cuts the hierarchical tree and creates an optimal number of clusters (usually fewer than HRP). In practice, it then calculates weights for the clusters, not for the individual assets. Within each cluster, the assets get uniform weights.

The resulting equity curves are shown on the plot. The results are robust compared to CLA, but also more conservative: lower returns with lower volatility. Ideally, we are looking for higher returns with lower volatility, but beating a benchmark such as the S&P 500 or Nasdaq 100 (during 2010-2020) with portfolio optimization is an almost impossible task.

Nested Clustered Optimization

NCO is another method from Marcos López de Prado that deals with the instability problem of mean-variance optimization. A detailed description is in the scientific paper.

The clustering is based directly on the covariance matrix. The method then calculates the weights inside each cluster separately, reduces the covariance matrix to one row and column per cluster, and calculates the weights across clusters. The final weights combine the weights of the clusters with the weights inside them. In the same paper, the method is extended with a Monte Carlo procedure to get even more robust results.

On the plot, you can see that these algorithms solve CLA’s main problem (stability, or robustness). We compare three different approaches: the first is the NCO algorithm itself, the second uses convex optimization (CVO), and the last one uses Monte Carlo permutation with both the NCO and CVO results, which can maximize the Sharpe ratio.

On the plot, you can also see that all these methodologies are very robust, and we get results similar to SPY.
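The nesting of weights can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the cluster labels are given by hand instead of being found by a clustering algorithm, minimum-variance weights are used as the optimizer at both levels, and the Monte Carlo extension is left out. The function names and the covariance matrix are hypothetical.

```python
import numpy as np

def min_variance_weights(cov):
    """Closed-form minimum-variance weights: inv(cov) @ 1, scaled to sum to 1."""
    raw = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return raw / raw.sum()

def nco_weights(cov, labels):
    """Nested clustered weights for a given assignment of assets to clusters."""
    cov = np.asarray(cov, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # 1) Optimal weights inside each cluster (one column per cluster)
    intra = np.zeros((cov.shape[0], len(clusters)))
    for k, label in enumerate(clusters):
        idx = np.flatnonzero(labels == label)
        intra[idx, k] = min_variance_weights(cov[np.ix_(idx, idx)])
    # 2) Reduced covariance matrix: one "asset" per cluster
    reduced_cov = intra.T @ cov @ intra
    # 3) Optimal weights across clusters
    inter = min_variance_weights(reduced_cov)
    # 4) Final weights: intra-cluster weights scaled by the cluster weight
    return intra @ inter

# Hypothetical covariance matrix with two uncorrelated blocks of two assets
block_a = np.array([[0.040, 0.030], [0.030, 0.090]])
block_b = np.array([[0.010, 0.002], [0.002, 0.020]])
cov = np.block([[block_a, np.zeros((2, 2))], [np.zeros((2, 2)), block_b]])
weights = nco_weights(cov, labels=[0, 0, 1, 1])
print(weights.round(4), weights.sum())
```

When the clusters are truly uncorrelated, as in this toy matrix, the nested result coincides with the plain minimum-variance portfolio. The benefit appears with noisy, real-world matrices, where only small and better-conditioned blocks have to be inverted.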

Table of results

In this final part, you can analyze four important metrics for the algorithms used. To check the code, look at our last article.

| Portfolio | Annual Return | Max Drawdown | Sharpe Ratio | Hist. Volatility |
|---|---|---|---|---|
| BL-maxReturn | 19.45% | -33.59% | 0.754 | 25.09% |
| BL-maxSharpe | 10.63% | -32.73% | 0.500 | 22.64% |
| BL-minVolatility | 7.13% | -31.85% | 0.451 | 17.12% |
| BL-retImplied | 23.37% | -37.48% | 0.841 | 29.61% |
| CLA_maxSharpe | 10.77% | -32.10% | 0.526 | 21.78% |
| CLA_minVolatility | 8.90% | -32.83% | 0.587 | 15.90% |
| CLA_turnPoint-1 | 25.98% | -68.47% | 0.643 | 48.22% |
| CLA_turnPoint-2 | 14.45% | -36.96% | 0.611 | 25.14% |
| CLA_turnPoint-3 | 9.45% | -31.69% | 0.524 | 19.39% |
| CLA_turnPoint-4 | 8.23% | -32.06% | 0.515 | 16.92% |
| CLA_turnPoint-5 | 7.77% | -32.49% | 0.507 | 16.18% |
| HERC | 7.02% | -29.60% | 0.439 | 16.72% |
| HRP | 9.01% | -32.09% | 0.600 | 16.25% |
| NCO-MC-cvo_sharpe | 10.71% | -33.52% | 0.621 | 18.31% |
| NCO-MC-nco_sharpe | 11.26% | -33.43% | 0.646 | 18.32% |
| NCO-cvo | 9.49% | -33.20% | 0.615 | 16.60% |
| NCO-nco | 9.61% | -33.75% | 0.629 | 16.31% |
| SPY | 11.02% | -33.71% | 0.661 | 18.44% |
| QQQ | 19.42% | -28.57% | 0.957 | 21.36% |
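If you want to compute the same four metrics for your own equity curve, here is a minimal sketch. The definitions are assumptions (252 trading days per year, geometric annualization, zero risk-free rate, Sharpe ratio as annual return over annual volatility) and may differ from the ones behind the table, so do not expect to reproduce its numbers exactly.

```python
import numpy as np

TRADING_DAYS = 252  # assumption: trading days per year

def performance_metrics(daily_returns, risk_free_rate=0.0):
    """Annual return, max drawdown, Sharpe ratio and volatility from daily returns."""
    r = np.asarray(daily_returns, dtype=float)
    # Equity curve starting from 1 unit of capital
    equity = np.concatenate([[1.0], np.cumprod(1.0 + r)])
    years = len(r) / TRADING_DAYS
    annual_return = equity[-1] ** (1.0 / years) - 1.0
    # Drawdown: distance of the equity from its running maximum
    running_max = np.maximum.accumulate(equity)
    max_drawdown = (equity / running_max - 1.0).min()
    volatility = r.std(ddof=1) * np.sqrt(TRADING_DAYS)
    sharpe = (annual_return - risk_free_rate) / volatility
    return {
        "annual_return": annual_return,
        "max_drawdown": max_drawdown,
        "sharpe_ratio": sharpe,
        "volatility": volatility,
    }

# Synthetic daily returns, roughly four years of data (illustration only)
rng = np.random.default_rng(7)
daily_returns = rng.normal(0.0004, 0.012, size=1000)
for name, value in performance_metrics(daily_returns).items():
    print(f"{name}: {value:.4f}")
```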

Every time we go for higher returns than the SPY benchmark, we have to face higher volatility and, most of the time, higher drawdowns as well. Because of the COVID-19 crisis in 2020, the drawdowns are similar: they were caused by the same event. Only with returns over 20% annually do we see higher drawdowns. But without any doubt, the new methodologies do exactly what they were created to do: make asset allocation more robust.

This robustness shows up here only as stable results in terms of returns and volatility, which is not proof of robustness. To prove it, we would have to use a much longer history, different rebalancing periods, and a much wider universe: stocks, bonds, ETFs, and commodities. This simple example uses only the 200 most traded stocks by dollar volume. In the papers on these new methodologies, the concepts and the robustness are demonstrated properly.

These articles’ main point is to show you what kind of methodologies run behind many funds that manage billions of dollars.

Conclusion

Is chasing that extra return worth the risk? Every investor has a different answer to this question. Based on this article, a 26% annual return is not worth 48% annual volatility. If you have read our article about volatility, you know that 48% yearly volatility means that, with 95% probability, we can expect returns in the interval (-96%, 96%), which means doubling or wiping out your account. That’s why investors use optimization mostly to minimize risk, to have less volatility than the markets.

We compared classical methodologies with cutting-edge ones and briefly went through the theory; the full code is available in the third article.

This kind of portfolio optimization can be used for algo-trading strategies, too. Imagine running the same strategy on several futures markets and wanting to allocate capital to the markets where it works better. Strategies never work perfectly all the time, so it can happen (also in your backtest) that a strategy sometimes does better on metals, other times on currencies, and so on. By actively changing the capital allocation across markets, you can get better results.
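The interval quoted above follows from a rough normal approximation with zero mean (a simplification), under which about 95% of outcomes fall within two standard deviations:

$P\left(|R| \le 2\sigma\right) \approx 0.95, \qquad 2\sigma = 2 \times 48\% = 96\% \;\Rightarrow\; R \in (-96\%, 96\%)$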

It is good to know about these methodologies. I believe at some point they can be useful for your trading and investing. Another story is creating long-term investments based not on portfolio optimization but on predictions from different models. Is long-term prediction in financial markets even possible? That’s a topic for another article.