### Portfolio optimization

In the last article of this series, we will go through a fully functional code. You will read here how the process of optimization was applied and backtests created. The code is based on previous articles: Markowitz optimization and machine learning adaptations.

#### Importing all necessary stuff

```
import pandas as pd, numpy as np, glob
# installation requires cvxpy (for installation on windows see the documentation)
import mlfinlab # version 0.15.0
import pypfopt # version 1.2.6
from mlfinlab.portfolio_optimization.clustering import NestedClusteredOptimisation
```

### Loading data

My data is saved and prepared in the folder. For practical purposes, use your own daily data, resp. download data from Yahoo or Alpha Vantage (see our other articles). We will load data, use only data from 2014, and save them to the directory. Then we simply transform the directory into a DataFrame with a multi-column consisting of symbols and OHLCV.

```
df = {}
for f in glob.glob('myPathToData/*feather'):
print(f)
a = pd.read_feather(f)
a = a[a['date']>='2014-01-01']
if len(a) == 0:
continue
# windows' users use '\\' instead of '/'
df[f.split('/')[-1].replace('.feather','')] = a.set_index('date')
dd = pd.concat(df.values(), keys=df.keys(), axis=1)
df = dd.swaplevel(axis=1)
```

### Walk-forwarding optimization - basic setup

The whole optimization will be in loot, where we reallocate the portfolio every 3 months. We use Alpha Vantage function to use only stocks that were available for trading during the time of re-optimization – Listing Status. Then we prepare our symbols – top 200 according to median Dollar Volume; see the comments in the code section.

```
download_path = 'https://www.alphavantage.co/query?function=LISTING_STATUS&date={}&apikey=YourApiKey'
allocation = {}
for q in pd.date_range('2015-01-01','2020-09-01', freq='QS'):
print(q)
# load benchmark data - ETFs of S&P 500 and Nasdaq 100, if you have them within your stock universe, or just download them - after the loop
bench = df['adjusted close'][['SPY','QQQ']]
# load Stock Listings with specified date
symbols = pd.read_csv(download_path.format(str(q)[:10]))
# use only Stocks, not ETFs
symbols = symbols[
symbols['assetType']=='Stock'
]['symbol'].values.tolist()
stocks_only = symbols.copy()
# calculate daily dollar volume
usdvol = df[df.index<q]['close']*df[df.index<q]['volume']
# use only last year
usdvol = usdvol[usdvol.index >= q - pd.DateOffset(years=1)]
# if there were more than 30 missing values we don't use that stock
usdvol = usdvol.loc[:, usdvol.isna().sum()<30]
# symbol has to be treadable during the last day
usdvol = usdvol.loc[:, usdvol.iloc[-1].notna()]
# symbol has to be within assetType Stock
usdvol = usdvol.loc[:, usdvol.columns.isin(stocks_only)]
# finally we calculate median value of 1 year and take top 200
usdvol = usdvol.median().sort_values()
symbols = usdvol[-200:].index
```

Now when we have prepared symbols, let’s prepare in-sample and out-of-sample data for portfolio optimization. In-sample is 1 year before re-optimization, out-of-sample or our simulation is for the next 3 months.

```
dd = df['adjusted close'][symbols].copy()
# in-sample prices + benchmark
df_is = dd[dd.index<q]
bench_is = bench[
(bench.index < q) &
(bench.index >= q - pd.DateOffset(years=1))]
# we use only stock which have less than 5% missing values on IS
nans = df_is.isna().sum()/len(df_is) # or .isna().mean()
nans = nans[nans<0.05]
# we can fill missing values of IS, first we will forward then backward
df_is = df_is[nans.index]
df_is = df_is.fillna(method='ffill').fillna(method='bfill')
# creating OOS dataset, the first index is the same as the last on IS, we explain why later
last_date = df_is.index[-1]
df_oos = dd[(dd.index >= last_date) &
(dd.index < q + pd.DateOffset(months=3))][nans.index]
```

### Efficient frontier optimization

Since methodologies from the first article assume that assets are not correlated, we will get rid of assets that have a higher correlation than 75%, and we keep only the ones with highest dollar volume.

```
# calculate correlations from pct returns
corrs = df_is.pct_change().corr().abs()
# correlated symbols will be save into list
to_drop = []
# columns were sorted from lowest to highest USD volume
for s in df_is.columns[::-1]:
if s in to_drop: # if it is already used, skip it
continue
# find stocks correlated with s (more than 0.75)
a = corrs[s].drop(s)[corrs[s].drop(s) > 0.75]
# add findings to list to_drop
if len(a) > 0:
to_drop += a.index.tolist()
# use only uncorrelated symbols
df_is = df_is.drop(to_drop,axis=1)
df_oos = df_oos.drop(to_drop,axis=1)
```

Now we will go through all used methodologies, the explanation is commented in code.

```
#calculate share ratios of in-sample
is_sharpes = mlfinlab.backtest_statistics.sharpe_ratio(
df_is.pct_change())
# calculate expected returns which will be used in the methodologies - we predict for 3 months, so 63 days
expected_returns = pypfopt.expected_returns.mean_historical_return(
df_is, frequency=63)
#covariance matrix is the main part of portfolio optimization
cov_matrix = df_is.pct_change().cov().values
symbols = df_is.columns # list of symbols
results = {} # save results into dictionary
# Sharpe Maximization
opt = pypfopt.EfficientFrontier(
expected_returns,
cov_matrix,
solver='CVXOPT'
).max_sharpe(0) # zero risk-free interest rate
# the result is OrderedDictionary, so we do classic dict and add it to pandas.Series
opt = pd.Series({x: opt[x] for x in opt.keys()})
results['opt_maxSharpe'] = opt.copy()
# Return Maximization
# input into efficient_risk is maximum volatility we want to achieve, so maximum of benchmark volatilities
opt = pypfopt.EfficientFrontier(
expected_returns,
cov_matrix,
solver='CVXOPT'
).efficient_risk(
bench_is.pct_change().std().max()
)
opt = pd.Series({x: opt[x] for x in opt.keys()}) # OrderedDictionary
results['opt_maxReturn'] = opt.copy()
# Volatility Minimization
# input into efficient_return is the minimal return we want to achieve, so it is the maximum of returns of benchmarks
opt = pypfopt.EfficientFrontier(
expected_returns,
cov_matrix,
solver='CVXOPT'
).efficient_return(
pypfopt.expected_returns.mean_historical_return(
bench,
frequency=63).max()
)
opt = pd.Series({x: opt[x] for x in opt.keys()}) # OrderedDictionary
results['opt_minVolatility'] = opt.copy()
# top 15 stocks according to in-sample Sharpe, weighted equally
results['max_sharpe'] = (is_sharpes >= is_sharpes.sort_values()[-15]
)/15
# Uniform distribution of top 100 stocks according to USD volume
# Series with zero values and add 1 only to top 100 stocks
results['uniform'] = pd.Series(0, index=is_sharpes.index)
results['uniform'].loc[
is_sharpes.index.isin(
usdvol[-100:].index)] = 1
# Create uniform weights by dividing by 100, resp number of ones
results['uniform'] /= results['uniform'].sum()
```

### Black-litterman allocation

```
# simple views, first set empty matrices
Q = np.zeros((len(df_is.columns), 1))
P = np.zeros((len(df_is.columns), len(df_is.columns)))
views = dict(expected_returns.round(3).sort_values())
# now set values
for i, view_ticker in enumerate(views.keys()):
# expectation for ith element of Q
Q[i] = views[view_ticker]
# ith row and column which is for given stock is 1, other is 0
P[i, list(df_is.columns).index(view_ticker)] = 1
# application of BL allocation
bl = pypfopt.black_litterman.BlackLittermanModel(
cov_matrix,
tickers=df_is.columns, # names of symbols
pi='equal', # prior estimate for returns - by equal it is uniform distribution
Q=Q, P=P, # investor's views
omega='idzorek', # Idzorek's method (see documentation for more info)
risk_free_rate=0,
view_confidences=np.array(len(Q)*[0.25]) # confidence of views - how are we confident, let's just pass 25%
)
# posterior returns
rets = bl.bl_returns()
rets.index = df_is.columns
# max Sharpe - we just use different returns
opt = pypfopt.EfficientFrontier(rets, cov_matrix, solver='CVXOPT').max_sharpe(0)
opt = pd.Series(opt) # OrderedDictionary
results['BL-maxSharpe'] = opt.copy()
# max return
opt = pypfopt.EfficientFrontier(
rets, cov_matrix, solver='CVXOPT'
).efficient_risk(
bench_is.pct_change().std().max()
) # max volatility setted as max from benchmarks
opt = pd.Series(opt) # OrderedDictionary
results['BL-maxReturn'] = opt.copy()
# Volatility minimization
opt = pypfopt.EfficientFrontier(rets, cov_matrix, solver='CVXOPT').efficient_return(
pypfopt.expected_returns.mean_historical_return(
bench, frequency=63).max()
)
opt = pd.Series(opt) # OrderedDictionary
results['BL-minVolatility'] = opt.copy()
# return implied weights
delta = pypfopt.black_litterman.market_implied_risk_aversion(
df_is, risk_free_rate=0)
bl.bl_weights(delta.values)
opt = bl.clean_weights()
opt = pd.Series(opt) # OrderedDictionary
# there are also negative weights, we set them for 0
opt[opt<0] = 0
# the new weights should sum into 1
opt /= opt.sum()
results['BL-retImplied'] = opt.copy()
```

### Critical Line Algorithm

As described it is just another Markowitz’s algorithm from efficient frontier theory. For application we will use the mlfinlab package. The usage is very straightforward.

```
cla = mlfinlab.portfolio_optimization.CriticalLineAlgorithm(
weight_bounds=(0,1))
# Maximum Sharpe Solution
cla.allocate(asset_prices=df_is, solution='max_sharpe')
# from this case the result is pd.DataFrame of weights
results['CLA_maxSharpe'] = cla.weights.loc[0]
# Minimum Variance Solution
cla = mlfinlab.portfolio_optimization.CriticalLineAlgorithm(
weight_bounds=(0,1)) # reinicilization of CLA
cla.allocate(asset_prices=df_is, solution='min_volatility')
results['CLA_minVolatility'] = cla.weights.loc[0]
# Turning points
cla = mlfinlab.portfolio_optimization.CriticalLineAlgorithm(
weight_bounds=(0,1)) # reinicilization of CLA
cla.allocate(asset_prices=df_is,solution='cla_turning_points')
# very time there is different number of turning points, so we choose 5 uniformly by setting numpy arange from 0 to length of turning points, with step size of ⅕ of the length
for i,j in enumerate(
np.arange(0, len(cla.weights), int(len(cla.weights) / 5))
):
# i is for new turning point number, j is for real position from CLA allocation
results['CLA_turnPoint-{}'.format(i + 1)] = cla.weights.iloc[j]
```

### Hierarchical Risk Parity

Here we go for new methodologies from Marus Lopez De Prado. Usage is also very simple, but it is important to know what we are doing. In this portfolio optimizations, we don’t invest in stocks with allocation less than 1% of capital (we are not huge funds with billions of dollars, but we are allocating 100k of investments). From my observations, HRP divides our stock universe into many small investments, so I decided to do the HRP in two steps.

- Firstly we do HRP on all stocks we use (uncorrelated top 200 according to dollar volume); the results are the weights.
- We get rid of stocks with less than 0.5% allocation and do the new HRP allocation with the new given universe.

Note that you can make your own portfolio corrections like this or test different methodologies to get rid of too low weights. At the end of the loop we will do a similar thing for all the results.

```
opt = mlfinlab.portfolio_optimization.HierarchicalRiskParity()
opt.allocate(df_is.columns, df_is)
# resulting weights are again in pd.DataFrame
res_hrp = opt.weights.loc[0]
# we get rid of weights smaller than 0.5%
res_hrp = res_hrp[res_hrp>0.005]
# new allocation with new universe
opt.allocate(res_hrp.index, df_is[res_hrp.index])
# final results
results['HRP'] = opt.weights.loc[0]
```

### Hierarchical Equal Risk Contribution

```
opt = mlfinlab.portfolio_optimization.HierarchicalEqualRiskContribution()
opt.allocate(res_hrp.index, df_is[res_hrp.index])
results['HERC'] = opt.weights.loc[0]
```

### Nested Clustered optimization

```
opt = mlfinlab.portfolio_optimization.clustering.()
# Convex Optimization solution
cvo = opt.allocate_cvo(cov=cov_matrix)
# resulting DataFrame is different than in previous methods
cvo = pd.Series(cvo[:,0], index=df_is.columns)
# Nested Clustered Optimization solution, you can choose maximum number of cluster, do not use high numbers when your universe is small
nco = opt.allocate_nco(cov=cov_matrix, max_num_clusters=25)
nco = pd.Series(nco[:,0], index=df_is.columns)
# saving the results
results['NCO-cvo'] = cvo
results['NCO-nco'] = nco
# Multiple simulations of NCO and SVO by Monte Carlo optimization - just 1 simulation because of really long computation time
w_cvo, w_nco = opt.allocate_mcos(
df_is.pct_change().mean().values, # here we use average historical daily return
cov_matrix,
len(df_is),
1, # number of MCMC
min_var_portf=False # if False we use Sharpe maximization
)
# saving the results
w_nco = w_nco.loc[0]
w_nco.index = df_is.columns
w_cvo = w_cvo.loc[0]
w_cvo.index = df_is.columns
# again we use only positive weights and scale them to sum to 1
results['NCO-MC-cvo_sharpe'] = w_cvo[w_cvo>1]/w_cvo[w_cvo>1].sum()
results['NCO-MC-nco_sharpe'] = w_nco[w_nco>1]/w_nco[w_nco>1].sum()
w_cvo, w_nco = opt.allocate_mcos(
df_is.pct_change().mean().values, # here we use average historical daily return
cov_matrix,
len(df_is),
1, # number of MCMC
min_var_portf=False # we use Volatility minimization
)
w_nco = w_nco.loc[0]
w_nco.index = df_is.columns
w_cvo = w_cvo.loc[0]
w_cvo.index = df_is.columns
results['NCO-MC-cvo_volatility'] = w_cvo[w_cvo>1]/w_cvo[w_cvo>1].sum()
results['NCO-MC-nco_volatility'] = w_nco[w_nco>1]/w_nco[w_nco>1].sum()
```

### Saving the out-of-sample equities

When for the given loop the optimization is finished we prepare the portfolios and calculate out-of-sample performance.

```
for key in results.keys():
# use the weights, take only weights bigger than 0.9%
to_allocate = results[key].copy()
to_allocate = to_allocate[to_allocate>0.009]
# recalculate the weights
to_allocate = to_allocate/to_allocate.sum()
# it is very possible that still newly calculated weights are still less than 0.9%, so we repeat the process once more - yes very primitive technique but functional
to_allocate = to_allocate[to_allocate>0.009]
to_allocate = to_allocate/to_allocate.sum()
# prepare OOS data - forward fill explained after this codeblock
rs = df_oos[to_allocate.index].copy().fillna(method='ffill')
# make relative stock equities according to the first day, the first value will be always 1 (that index is repeated in IS, because that day we have still return from previous portfolio
rs = rs/rs.iloc[0]
# by dot product we have portfolio of stocks with given weights (we add costs later)
rs = rs.dot(to_allocate)
# save the portfolio
if key not in allocation.keys():
# if it is first loop, we simply save the portfolio
allocation[key] = rs.copy()
else:
# when previous performance exists we just append continuation of portfolio but not starting with value 1 but the last value of previous performance, that's why we use relative equity, because it is easily scalable just by product
allocation[key] = allocation[key].append(
rs*allocation[key].iloc[-1])
```

Why do I use the fillna method, forward fill? Simply, all our stocks are highly tradable. During the period, there were only a few stocks whose data ‘finished’. Those data didn’t finish because of bankruptcy, but an acquisition of that stock by another company (purchase). So you didn’t lose the money.

Historically, it is hard to find the acquisitions automatically, so we would instead use that the price didn’t move for the remaining time. I show this example because most of you do not have access to highly professional data to see everything.

Indeed the publicly-traded company may go bankrupt. It sometimes happens that there are some gaps in data and the stock price continues; there can be some error in collecting the data or some pause in trading because of some technical issues on the exchange (yes, it can also happen, but it is not common).

Having a double index with the same value during the re-optimization day is just for practical purposes. We didn’t apply costs yet and also didn’t save the dates of re-optimization. And we don’t need to have them because every time there is a double index, there is re-optimization.

```
# we get reoptimization dates where is double index, it is same the date for all models
change = allocation['opt_maxReturn'].resample('B').count()
change = change[change>1].index
# rewrite results to DataFrame
allocation = pd.concat(allocation.values(), keys=allocation.keys())
# models or different optimizations are added to multi-index, so we put model into column and use pivot_table to create DataFrame where index is date and columns are equities of given portfolios
allocation = allocation.reset_index(level=0)
allocation.columns = ['model','equity']
# during pivoting the double index disappears, but for values it is OK, because in both same dates every equity has the same values
allocation = allocation.pivot_table(index='date', columns='model',values='equity')
# calculating costs, since the portfolio is in relative numbers it is very easy to just subtract the costs, Series of costs * see the note after the codeblock
costs = pd.Series(0, index=allocation.index)
costs.iloc[0] = 0.0005 # the fist to buy is 0.05% of the portfolio value, commissions + slippage
costs.loc[change] = 0.001 # during reoptimization we close and open position, so 0.1%
# to subtract all cost we have to do cumulative sum of costs, the number will change only during the reoptimization period,
net_port = allocation.sub(costs.cumsum(),axis='index')
# download benchmarks' prices
from alpha_vantage.timeseries import TimeSeries
ts = TimeSeries(key='yourAPIkey',
output_format='pandas',
indexing_type='date')
# download or if you have it saved in your data use saved data
qqq = ts.get_daily_adjusted('QQQ','full')[0]['5. adjusted close'].reindex(allocation.index)
spy = ts.get_daily_adjusted('SPY','full')[0]['5. adjusted close'].reindex(allocation.index)
# create relative equities
qqq /= qqq.iloc[0]
spy /= spy.iloc[0]
# for investing in ETF there is the cost of first buy (commission) and also yearly fee, which is 0.0945% for SPY and 0.2% for QQQ, it is called the expense ratio, but it is already calculated in the price of ETF
# we need to subtract only costs of buying the ETFs
qqq -= 0.0005
spy -= 0.0005
#add equities to our portfolio results
net_port['SPY'] = spy
net_port['QQQ'] = qqq
# everything is ready, you can plot data and do analyses on equities
```

We just simply subtracted percentage commissions from the relative equities. This methodology is not correct because after paying the fees, we have slightly less money to invest, so our real returns are slightly lower.

This difference in a short time is invisible. Still, it is sufficient for our purposes (in many articles about portfolio optimization, there are no fees adjusted, and that can make a big difference).

### Calculating the metrics

At the end of each article, you could find the table with some basic metrics; here, you can see how to calculate it.

```
results = pd.DataFrame()
# calculate annual return of portfolios - lambda over equities - last value / first value - 1
annual_ret = net_port.resample('Y').apply(lambda x: x.iloc[-1]/x.iloc[0] -1)
annual_ret = annual_ret[1:].mean()
results['Annual Return'] = annual_ret
# drawdown in python is super easy
def drawdown(vec): # percentual drawdown
return (vec - np.maximum.accumulate(vec))/np.maximum.accumulate(vec)
results['Max Drawdown'] = drawdown(net_port).min()
# Sharpe Ratio from mlfinlab, note that we cannot use equity but daily returns
results['Sharpe Ratio'] = mlfinlab.backtest_statistics.sharpe_ratio(net_port.pct_change())
# historical volatility is simple standard deviation of percentage returns, multiplied to have yearly volatility
results['Hist. Volatility'] = net_port.pct_change().std() * np.sqrt(252)
```

### Plotting

```
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()
plt.rcParams["font.family"] = "serif"
sns.set_palette(sns.color_palette("bright"))
# just NCO for quick example
net_port[['SPY', 'QQQ', 'NCO-MC-cvo_sharpe', 'NCO-MC-nco_sharpe',
'NCO-cvo', 'NCO-nco']].plot(
lw=1, alpha=0.7, figsize=(550/96,370/96)
)
plt.xlabel('')
plt.legend(fontsize=8)
plt.title('Portfolios: out-of-sample walk forward, Nested Clustered Optimization',
fontsize=10)
plt.savefig('/yourPATH/equities_NCO.pdf')
```