Testing Strategy Robustness
Developing a trading strategy is a very complex process, with many steps and traps. You can read about different essential stages in our other articles.
This article is dedicated to testing whether a strategy is over-fitted to bar size and session time. This kind of testing is crucial when the strategy uses price patterns. Price patterns can be logical and straightforward, yet they often lead to overfitting on specific data (overfitting means that the strategy works exceptionally well on the dataset on which it was developed but fails on unseen data).
The strategy presented in this article has a long out-of-sample period on which it worked exceptionally well. Before taking the strategy into live trading, we wanted to make sure that the pattern it uses does not work only with one particular setting. The strategy trades E-mini Russell 2000 Index futures (RTY), which track the small-cap companies of the US market.
To give better insight into the strategy: it was developed in EasyLanguage on the TradeStation platform. Then it was reprogrammed in Python so we could check that the strategy works correctly and produces the same results.
This double-check is very important because, thanks to it, we have discovered many issues where a strategy looked great only because of errors or incorrect calculations inside the bars.
We provide the equity curve of the backtest with daily returns on the out-of-sample period, from 2015 until August 2020, with all costs (commissions and slippage) included. In the backtest, we use one contract of RTY, and the equity is constructed from daily net returns.
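As a rough illustration of how such an equity curve can be assembled, the sketch below accumulates daily net P&L for a single contract. The function name, the cost per round turn, and the sample numbers are assumptions made for this example; they are not the values used in the backtest.

```python
from itertools import accumulate

def equity_curve(daily_gross_pnl, daily_round_turns, cost_per_round_turn):
    """Cumulative net P&L for a fixed one-contract position size.

    daily_gross_pnl     -- gross P&L per day in dollars
    daily_round_turns   -- number of round-turn trades per day
    cost_per_round_turn -- commission plus slippage per round turn (assumed)
    """
    net = [pnl - n * cost_per_round_turn
           for pnl, n in zip(daily_gross_pnl, daily_round_turns)]
    return list(accumulate(net))

# Hypothetical numbers, for illustration only
gross = [250.0, -120.0, 0.0, 310.0]
trades = [1, 1, 0, 2]
equity = equity_curve(gross, trades, cost_per_round_turn=15.0)
print(equity)
```

Charging costs per trade on the day they occur keeps flat days flat and makes the curve comparable across settings that trade with different frequency.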
The strategy does not beat the market during strong up-trends but has lower drawdowns and a smoother equity curve. The original model uses 65-minute bars and a trading session of 8:45 am – 3:15 pm.
Backtests on different bars and shifted sessions
To make sure that this price pattern is not completely overfitted to the given settings, we want to know how the equity changes if we use slightly different bars and sessions.
Most important is local randomization, where we look only at small changes (hence local, or neighborhood). For the strategy to pass, we expect no big change in performance. We used different bar lengths (60, 63, 65, 68, and 70 minutes) and eight different sessions, each shifted by 7 minutes.
In total, we created 40 different basic settings. Since futures trade all day, we cut the sessions to fit into classical market hours, 9 am – 4 pm. The shifts are easy to explain; for example, we have:
- 65 min bar, shift = 0 min, session: 9:45-15:10 (9:45 is the end time of the first bar)
- 65 min bar, shift = 7 min, session: 9:52-15:17
- 65 min bar, shift = 14 min, session: 9:59-15:24
- 65 min bar, shift = 21 min, session: 9:01-15:31
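The grid of settings can be sketched in a few lines of Python. The bar lengths come from the text; the shift values (0, 7, ..., 49 minutes) are our reading of "eight sessions, each shifted by 7 minutes", and the helper `shifted_session` is a hypothetical name. The helper simply moves both session boundaries by the shift and does not reproduce the trimming to market hours described above.

```python
from datetime import datetime, timedelta
from itertools import product

BAR_LENGTHS = [60, 63, 65, 68, 70]      # minutes, from the article
SHIFTS = [7 * k for k in range(8)]      # assumed: 0, 7, ..., 49 minutes

def shifted_session(first_bar_end, session_end, shift_min):
    """Shift both session boundaries (given as 'HH:MM') by shift_min minutes."""
    fmt = "%H:%M"
    delta = timedelta(minutes=shift_min)
    start = datetime.strptime(first_bar_end, fmt) + delta
    end = datetime.strptime(session_end, fmt) + delta
    return start.strftime(fmt), end.strftime(fmt)

# Every combination of bar length and session shift
settings = list(product(BAR_LENGTHS, SHIFTS))
print(len(settings))

# The first three 65-minute sessions from the list above
for shift in (0, 7, 14):
    print(shift, shifted_session("09:45", "15:10", shift))
```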
For each setting, we get a different session. By the way, don’t be afraid of using different bar lengths and sessions. What works on 5 min bars should also work on 4 or 6 min bars, and on shifted sessions too. If not, it is just an overfit.
Chart of backtested equities for different bar lengths and sessions
Just from looking at the equity curves, without any statistical analysis, the strategy appears quite robust. That means a tiny change in the sessions and bars does not change the result significantly.
In some cases, the overall performance is quite different. Still, the returns are more than 60% correlated, so the equity curves follow more or less the same path. We could not expect almost identical equities, but this stable distribution is a good sign.
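One way to put a number on this is to compute the correlation of daily returns for every pair of settings and look at the lowest value. The sketch below assumes Pearson correlation (the text does not say which measure was used) and uses made-up return series keyed by a hypothetical (bar length, shift) tuple.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def min_pairwise_correlation(returns_by_setting):
    """Lowest correlation among all pairs of settings."""
    return min(
        pearson(returns_by_setting[a], returns_by_setting[b])
        for a, b in combinations(returns_by_setting, 2)
    )

# Hypothetical daily returns, for illustration only
returns_by_setting = {
    (60, 0): [0.4, -0.2, 0.1, 0.3, -0.1],
    (65, 0): [0.5, -0.1, 0.0, 0.2, -0.2],
    (65, 7): [0.3, -0.3, 0.2, 0.4, 0.0],
}
lowest = min_pairwise_correlation(returns_by_setting)
print(f"lowest pairwise correlation: {lowest:.2f}")
```

Looking at the minimum rather than the average is the stricter check: a single setting that decouples from the rest is exactly what a neighborhood test is meant to expose.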
We have seen strategies where a small change in bars or sessions made perfectly profitable strategies unprofitable; that is a behavior we don’t want to see. In the chart, we created a color map that goes from light colors for 60 min bars (yellow), through 63 (orange), 65 (red), to 68 (purple), and finally 70 (black).
We can see that for some bar lengths, the session setting can also make a huge difference; the bars that are most stable across sessions are the red and purple ones (65 and 68 minutes).
Changing the session makes a difference because different sessions produce different daily bars, and the algorithm uses the current day’s high and low and the previous day’s close.
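A small example makes the effect visible. The sketch below computes a day's high, low, and close from minute rows that fall inside a session window, then repeats the calculation with the window shifted by 7 minutes. The function name, the window boundaries, and the prices are assumptions chosen for illustration.

```python
from datetime import datetime, time

def daily_values(minute_rows, session_start, session_end):
    """Return {date: (high, low, close)} using only rows inside the session.

    minute_rows -- (timestamp, high, low, close) tuples sorted by time
    A row belongs to the session when session_start <= time < session_end.
    """
    out = {}
    for ts, high, low, close in minute_rows:
        if not session_start <= ts.time() < session_end:
            continue
        day = ts.date()
        if day in out:
            prev_high, prev_low, _ = out[day]
            out[day] = (max(prev_high, high), min(prev_low, low), close)
        else:
            out[day] = (high, low, close)
    return out

# Hypothetical minute rows for a single day
rows = [
    (datetime(2020, 8, 3, 8, 42), 101.0, 99.0, 100.0),
    (datetime(2020, 8, 3, 12, 0), 102.0, 100.5, 101.5),
    (datetime(2020, 8, 3, 15, 12), 104.0, 101.0, 103.5),
]
base = daily_values(rows, time(8, 40), time(15, 10))
shifted = daily_values(rows, time(8, 47), time(15, 17))
print(base)
print(shifted)
```

With the shifted window, the early low drops out and the late high comes in, so any rule that compares price with the day's high, low, or the previous close sees different inputs.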
The same analysis on the broader neighborhood
We also ran a more extensive test with a wider range of bar lengths (50, 60, 70, 80, and 90 minutes) and four different sessions, each shifted by 14 minutes. To our surprise, the strategy works very well even in this broader neighborhood.
We use similar coloring: the darker the color, the longer the bar. The 90-minute bars are too long; they show the highest variation when the session changes.
This analysis indicates that the price pattern used is very robust, and that is what we want to see from an algo-trading strategy.
Description of used data
In the analysis, we used 1-minute data exported from TradeStation as a continuous contract (with automatic rollovers provided by TradeStation). From these data, we created our bars and sessions.
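Building custom bars from 1-minute data can be sketched as follows. This is a minimal version under stated assumptions: each row is stamped with the start of its minute, bars are counted from a given session start, and the names `build_bars`, `session_start`, and `bars_per_day` are hypothetical. It is not the code used for the backtests.

```python
from datetime import datetime, time, timedelta

def build_bars(minute_rows, session_start, bar_minutes, bars_per_day):
    """Aggregate 1-minute OHLC rows into bars of bar_minutes length.

    minute_rows -- (timestamp, open, high, low, close) tuples sorted by time,
                   each stamped with the start of its minute (assumption)
    Returns {(date, bar_index): {"open", "high", "low", "close"}}.
    """
    bars = {}
    for ts, o, h, l, c in minute_rows:
        start = datetime.combine(ts.date(), session_start)
        offset = int((ts - start).total_seconds() // 60)
        if not 0 <= offset < bar_minutes * bars_per_day:
            continue                      # outside the session
        key = (ts.date(), offset // bar_minutes)
        if key not in bars:
            bars[key] = {"open": o, "high": h, "low": l, "close": c}
        else:
            bar = bars[key]
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c
    return bars

# Synthetic minute data: 130 minutes starting at 08:40
first = datetime(2020, 8, 3, 8, 40)
rows = [(first + timedelta(minutes=i), i, i + 1, i - 1, i) for i in range(130)]
bars = build_bars(rows, time(8, 40), bar_minutes=65, bars_per_day=2)
for key, bar in bars.items():
    print(key, bar)
```

Shifting the session then amounts to passing a different `session_start`, which regroups the same minute rows into different bars.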
Unfortunately, the saying “the only correct data are those you collect yourself” (or construct yourself from high-quality tick data) always holds. These data have some errors; for example, parts of some days are missing.
Also, because of the automated rolling by TradeStation, we can see on the plot a few larger drops or gains within one day that were reversed on a later day.
Handling contract rollovers carefully is very important when trading futures. In this case, the data mistakes that sometimes occur during rolling do not affect the overall performance.
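Missing parts of a day, as mentioned above, are easy to flag before running any backtest. The sketch below reports intraday gaps longer than a threshold; the function name and the 15-minute threshold are assumptions, and a quiet minute with no trades is not necessarily an error, so the threshold should be tuned to the market.

```python
from datetime import datetime, timedelta

def find_intraday_gaps(timestamps, max_gap=timedelta(minutes=15)):
    """Return (previous, next) pairs on the same date that are further apart
    than max_gap. timestamps must be a sorted list of datetimes."""
    gaps = []
    for prev, nxt in zip(timestamps, timestamps[1:]):
        if prev.date() == nxt.date() and nxt - prev > max_gap:
            gaps.append((prev, nxt))
    return gaps

# Hypothetical timestamps with one hole between 09:02 and 10:30
stamps = [
    datetime(2020, 8, 3, 9, 0),
    datetime(2020, 8, 3, 9, 1),
    datetime(2020, 8, 3, 9, 2),
    datetime(2020, 8, 3, 10, 30),
    datetime(2020, 8, 3, 10, 31),
    datetime(2020, 8, 4, 9, 0),
]
gaps = find_intraday_gaps(stamps)
print(gaps)
```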