Why is testing against randomness crucial in trading?
We will show you a simple example of randomizing one of our strategies. We routinely run one of two types of randomization test, depending on the complexity of the strategy.
- The first test applies when the strategy trades only one market. Then it is easy to keep the actual distribution of trades and randomize just the dates (this is the case in this example).
- The second test applies when the strategy is more complex or trades many markets (we consider each stock as one market). We calculate the statistical properties of the trade distribution from the backtest and create random samples so that the resulting random trades come from the same distribution. The match is not in returns but in the number of trades, the maximum, minimum, and average trade duration, the proportion of long and short positions, and so on.
Let’s take a closer look at our strategy.
The strategy we use for this analysis is a trend-following strategy on E-mini Russell 2000 futures (RTY). It uses price patterns on intraday bars; we use 65-minute bars and the 8:45 am – 3:15 pm session (if you are interested, you can read about strategy robustness when it comes to changing bar lengths and sessions). Since this strategy uses only one time series, it is easy to construct random trades with exactly the same properties in Python. The out-of-sample period for this strategy starts in 2015, so our analysis starts there too.
Basic features of backtested trades:
- Number of trades: 274, longs 118, shorts 156
- Average trade duration: 28 bars or 7 days (calendar)
- Median trade duration: 8 bars or 3 days (calendar)
- Short-term trades (up to 10 bars): 158 trades, only 18 longs, 140 shorts
- Medium-term trades (11-100 bars): 94 trades, 78 longs and only 16 shorts
- Long-term trades (101 bars and more): 22 trades, all longs
It is important to look at how shorts and longs are distributed over trade lengths. Because of the long bias of the stock market, this strategy does not trade long-term shorts. If you do not account for this in the randomization, you are comparing two different things and could come to false conclusions.
We will do the type 1 randomization: we use all the trades as they are and shuffle only the start and end dates. We calculate only the trade results (daily returns are not necessary, which saves some computation time). The trades include all costs and slippage, and we trade 1 contract of RTY (E-mini Russell 2000).
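A minimal sketch of this type 1 randomization, assuming the backtest trades are available as directions (+1 long, -1 short) and durations in bars, and `prices` is an array of bar closes. The function name `random_trades_pnl`, the `cost` parameter, the synthetic prices, and the simplification that random trades may overlap are all assumptions for illustration; this is not the code behind the results in this article.

```python
import numpy as np

def random_trades_pnl(prices, directions, durations, rng, cost=0.0):
    """Total P&L (in price points) of one random strategy.

    Every backtest trade keeps its direction (+1 long, -1 short) and its
    duration in bars; only the entry bar is drawn at random.
    Assumption: random trades are allowed to overlap.
    """
    n_bars = len(prices)
    total = 0.0
    for direction, duration in zip(directions, durations):
        # Upper bound is exclusive, so the exit bar always stays inside the data.
        entry = rng.integers(0, n_bars - duration)
        exit_bar = entry + duration
        total += direction * (prices[exit_bar] - prices[entry]) - cost
    return total

rng = np.random.default_rng(42)
# Synthetic prices stand in for the real 65-minute RTY bars.
prices = 1500 + np.cumsum(rng.normal(0, 2, size=5000))
# Hypothetical trades: one long-term long, two short-term shorts, one medium-term long.
directions = np.array([1, -1, -1, 1])
durations = np.array([120, 5, 8, 40])
results = [random_trades_pnl(prices, directions, durations, rng) for _ in range(2000)]
```

Because each random trade inherits both the direction and the duration of a real trade, the link between the two (short-term shorts, long-term longs) is preserved automatically.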
The speed of computation depends on the complexity of the strategy, on the code, and on your ability to write it so that it runs fast. In this case, because the strategy does not have many trades, we use for loops, which are very easy to program without making many errors. Computing around 2000 random backtests took about 10 minutes.
Of course, it depends on your PC; for these short analyses, I use my laptop, which has a 2.9GHz Intel CPU with 4 cores (8 threads) and 16GB of RAM with another 32GB of swap. For these simple calculations I don’t use a graphics card or my desktop PC. Computation gets faster when you write vectorized code (with NumPy arrays, Pandas tables, or Nvidia RAPIDS on a GPU when using bigger data).
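To illustrate what vectorized means here, the sketch below draws the entry bars for many random runs at once and computes all trade results with NumPy indexing instead of loops. The function name `random_pnl_vectorized`, its parameters, and the choice to let random trades overlap are assumptions made for this example, not our production code.

```python
import numpy as np

def random_pnl_vectorized(prices, directions, durations, n_runs, rng, cost=0.0):
    """Total P&L of n_runs random strategies, computed without Python loops.

    directions: +1 for long, -1 for short; durations: trade length in bars.
    Assumption: random trades are allowed to overlap.
    """
    prices = np.asarray(prices, dtype=float)
    directions = np.asarray(directions)
    durations = np.asarray(durations)
    # Exclusive upper bound for each trade's entry bar, so exits stay in range.
    max_entry = len(prices) - durations
    # One row per random run, one column per trade.
    entries = rng.integers(0, max_entry, size=(n_runs, len(durations)))
    exits = entries + durations
    pnl = directions * (prices[exits] - prices[entries]) - cost
    return pnl.sum(axis=1)

rng = np.random.default_rng(1)
# Synthetic prices stand in for real bar data.
prices = 1500 + np.cumsum(rng.normal(0, 2, size=5000))
totals = random_pnl_vectorized(prices, [1, -1, 1], [120, 5, 40], n_runs=2000, rng=rng)
```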
For example, the randomization test for one of our trading strategies, which had around 5k signals daily out of around 45k daily opportunities, was very computationally expensive. I didn’t use a GPU for comparison, but fully vectorized code took about 1 hour to calculate just one month’s performance.
Let’s have a look at the results; we plot the equity curves for the backtest and all random strategies. Usually, the result is directly visible from the plot. Sometimes you need the help of some statistics, such as the percentile our strategy falls in.
This analysis is simple and gives straightforward results, but it can sometimes be hard to compute. We usually want the strategy to beat randomness overall, with a percentile over 90 % or 95 %, to consider the result statistically significant. Looking at individual years, it is good to be in at least the 80th percentile in each year; if the overall result is exceptionally good, we can accept even lower percentiles for individual years. We can see that 2019 was weaker than the other years.
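The percentile itself is easy to compute. The sketch below assumes you have the final P&L of the backtest and of every random strategy; the function name `percentile_rank` and all numbers are hypothetical and only illustrate the calculation.

```python
import numpy as np

def percentile_rank(backtest_value, random_values):
    """Percentage of random results that the backtest strictly beats."""
    random_values = np.asarray(random_values, dtype=float)
    return 100.0 * np.mean(random_values < backtest_value)

# Hypothetical numbers, for illustration only.
rng = np.random.default_rng(0)
random_pnl = rng.normal(0.0, 10_000.0, size=2000)
rank = percentile_rank(25_000.0, random_pnl)
print(f"Backtest beats {rank:.1f} % of random strategies")
```

The same function can be applied year by year by passing each year's backtest P&L together with the random P&Ls for that year.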
Note that this is not MCMC (Markov Chain Monte Carlo), where we create random prices with the same distribution and test the strategy on them. That is a different approach. MCMC usually assumes normality of logarithmic returns, that is, log-normality of prices. I don’t use this kind of test.
Firstly, log-normality does not describe the market correctly; secondly, I don’t want to test whether the strategy is successful on randomly generated prices; that is useless. Even with a lot of random noise, the market is not entirely random (if it were, there would be no trading companies that are successful in the long term). We believe the market is driven by the behavior of people, by algorithms, by fundamentals, and by news.
Randomization tests serve as confirmation that you beat strategies based on sheer luck and thus have a real edge in the market. It is necessary to apply these tests to true out-of-sample data. On in-sample data, where the strategy was fitted, the results can be too optimistic. Be careful with that.