LEARN TIME SERIES IN PYTHON - Pandas
In this introductory article on time series analysis in Python, we will cover some simple working with prices in Pandas, especially OHLCV (open-high-low-close-volume). We will go through some basic plotting and constructing indicators or price patterns.
Along with advanced articles, where we analyze interesting stuff that expands your knowledge of trading or gives you new ideas for algo-trading, I will also show some practical things for beginners. As I already mentioned in the introductory article, you will best learn it by doing it. After going through some basic plotting and constructing some indicators or price patterns, you should build them yourself to get some practice as well.
Depending on what you’re coding, if you code the whole script and then run it, you use different software compared when you code interactively and see the results for your analyses immediately. I use Spyder for all analytical research when I analyze data, change code frequently, and see the results. On the other hand, for writing pure code, I use Atom. You can choose any software you like.
Let’s go practically through some stuff. First, we need to get some data. We will use Nvidia [NVDA], as stock for our analysis. Let’s plot the price and do some calculations. We can do fancy plots with OHLC bars or interactive plots with plotly, but we will cover this in another article. We do quick and straightforward plots with matplotlib during our analyses.
If you want to play with graphs, draw lines, do visual technical analysis, it is better to use a software for it, or TradingView, which is a perfect online tool for traders – they also have a lot of fundamental data. To continue I assume you have at least basic knowledge of Python.
If you don’t understand import pandas as pd, or how to use functions and classes of the library, please check our introductory article about Python.
Getting THE DATA
During my programming life, I used many data providers; for the stocks, in my opinion the best is Alpha Vantage. Next to daily and intraday prices, they also provide some fundamentals for each stock and forex data. They provide many technical indicators calculated from the OHLCV, so you don’t have to do it yourself. Let’s download Nvidia (NVDA) prices and plot them for the past 3 years.
import pandas as pd from alpha_vantage.timeseries import TimeSeries ts = TimeSeries(key='YourAPIkey', # claim your free API KEY at alphavantage.co output_format='pandas', indexing_type='date') prices = ts.get_daily_adjusted('NVDA', 'full').sort_index() prices.columns = [x[3:] for x in prices.columns] # names of columns contains unnecessary text
Note that we called for NVDA daily prices, we wanted full history, so the next parameter is set to ‘full‘. Functions of TimeSeries returns a tuple containing DataFrame in the first place and some information about the frame in the second place. Let’s look at the last 10 rows of prices with code prices.tail(10). I love stock data from Alpha Vantage because they also contain dividends and splits (we will cover how to adjust the prices yourself in another article). For simplicity, let’s use close prices only.
|date||open||high||low||close||adjusted close||volume||dividend amount||split coefficient|
Very simply, we can plot prices from Jan 2017 until the end of Aug 2020. The result of the code is shown plot – it is easy (note: we don’t need the x-axis label name, everybody knows it is a date).
import matplotlib.pyplot as plt prices[prices.index>'2017-01-01']['close'].plot(label='NVDA') plt.xlabel('') plt.legend() plt.show()
Now, let’s do a trend following strategy based on crossing the moving averages. Everyone knows this strategy, so making it with pandas could be more fun.
Trend Following with moving average crossover
For educational and practical purposes, let’s make this strategy. The definition of strategy is straightforward. We use two moving averages, one with longer (slow MA), another with a shorter (fast MA) window. When fast MA crosses above the slow MA we buy, else we sell. Note that this strategy is not good, because it is based only on lagged values, but it’s good enough for our purposes. When there is no trend, but oscillation, this strategy has horrible results. Let’s calculate 30 days slow MA and 10 days fast MA. We can calculate MA average with rolling function on pandas DataFrame, or Exponentially Weighted Moving Average with ewm function. After this function, which tells pandas to use a window of previous data, you have to say what function you want to apply to that data: mean, median, max, min, std, etc.
prices['fastEMA'] = prices['adjusted close'].ewm( span=10).mean() #.rolling(10).mean() prices['slowEMA'] = prices['adjusted close'].ewm( span=30).mean() #.rolling(30).mean()
Let’s have a look at the plot with slow and fast EMA. We have data from 2000, but because of the price growth from 2016, the overall plot is not very well readable, that’s why we use only the last few years when plotting. It looks like some trend changes were captured pretty well, but some not. For example, when there was a quick selloff because of pandemics in 2020.
The truth is that most of the time, you don’t have to create backtests for these simple strategies. There is a fast way to check if the idea is functional. For immediate results, we create a theoretical strategy which is in a long position when the fastEMA is over slowEMA, and in a short position when it is opposite. An important point! We work with daily data and close prices, so the information about today’s position is actually for tomorrow (preventing a look-ahead bias). Practically, we do it by shifting the time series by one day. Calculate if the position is long or short and convert it to values [-1,1] for easy usage.
prices['pos'] = (prices['fastEMA'] >= prices['slowEMA'])*2 - 1 prices['pos'] = prices['pos'].shift(1) #prevent look-ahead bias
In the first line, we have contrition to be in a long position, the output is True or False, and we can work with them as integers 1 or 0. Multiplying by 2 and then subtracting 1 will change values [0,1] into [-1,1], which is more natural for short and long positions. Just for a quick theoretical check of the strategy performance, we will forget about all costs and integer value of the volume. We will plot the cumulative returns of buy and hold vs. MA crossover trend following. Let’s calculate the percent change for daily price and the strategy (where calculate the return by multiplying with position direction for a given day).
prices['return'] = prices['adjusted close'].pct_change() prices['return_strategy'] = prices['return']*prices['pos']
After we have calculated percent changes, let’s re-create equities from percent returns and plot it, I also provide simple code for beginners:
(prices[prices.index>'2018-01-01']['return'] + 1 ).cumprod().plot(label='Buy & Hold') (prices[prices.index>'2018-01-01']['return_strategy'] + 1 ).cumprod().plot(label='Strategy') plt.legend() plt.xlabel('') plt.show()
Since we are in a time series the product is replaced by a cumulative product because we recursively calculate it for, until the beginning,. If we don’t multiplicate the plot by the first close value, the plot will start from 1
What looked functional in the first picture is not that good when we look at it more precisely. The blue line is simple cumulative return of holding Nvidia stock, while the orange one is the return on our strategy. The volatile uptrend in the first 8 months was not suitable for this strategy; then, the strategy caught a bit of downtrend. Again other smaller trends in the flat part were not caught properly. And finally, the extreme selloff because of COVID-19 pandemics showed that the lag of the moving average is just too big.
This was a quick theoretical equity done in 2 minutes of coding. We usually add costs (and slippage) each time moving averages cross and calculate the backtest with integer values, but because of bad theoretical results, it is not necessary. Another important note: this is not how we develop strategies; it is just a simple example of pandas usage. For a step by step approach for creating and testing the algo-trading strategy look at our other article.
Just playing with pandas, you can apply any functions on your data. We used a function that looks for a few previous values; it is called rolling. Important hint here: try to use as many pandas built-in functions as possible, they are optimized to work very fast on DataFrames. When you want to create some specific function that should apply to your data or rolling data, you can make it by lambda programming and using function apply. Or you can define your own function and input it instead of lambda function (better to define a function with NumPy arrays for speed performance). Notice that function apply is slower because it is for functions that are not built-in. The next example is just for better imagination.
is equivalent to (not in speed)
prices['close'].apply(lambda x: x.mean()),
prices['close'].apply(lambda x: x.sum() / x.count()).
Simple price patterns
With pandas you can create almost anything. Do you want to add some price patter? Like if the close price is in the upper part of the bar? Let’s say upper ¼ or in the last quartile. It would be best if you first thought about the logic (is high, is low andis close):
then code it.
prices['close_upper'] = ( (prices['close'] - prices['low']) / (prices['high'] - prices['low']) ) >= .75
Or are you looking for extreme long bars? Let’s say significantly longer than the last 10 bars? Easy, create the logic: so the extreme is when bar length is more than 2 standard deviations away from the mean (or outside of a Bollinger Band). Calculate MA and moving standard deviation for bar length (you can shift it one day if you don’t want the last extreme bar affecting mean and standard deviation). Finally, add the condition that today’s bar length is significantly far away from average.
prices['ave_bar_length'] = (prices['high'] - prices['low']).rolling(10).mean() prices['std_bar_length'] = (prices['high'] - prices['low']).rolling(10).std() prices['is_extreme_bar'] = ( (prices['high'] - prices['low']) >= (prices['ave_bar_length'] + 2*prices['std_bar_length']) )