WHAT IS STOCK SPLIT?
It is normal in financial markets that companies split their stocks (or merges – reverse stock splits). A stock split is simply dividing one stock into more without decreasing the company’s market capitalization. It usually happens when stocks are growing for a long time, and prices make particular stock not affordable for smaller investors. Usually, stock splits are positive in the longer term. A reverse stock split is when a company wants to decrease the number of shares.

EXAMPLE OF STOCK SPLITTING ON APPLE stock
A simple example of a stock split was Apple [AAPL] stock split Aug 31, 2020, of 4-for-1, which means if you owned 10 AAPL stocks before Aug 31, 2020, after you had 40 stocks. When capitalization is not decreased, the nominal price for one stock has to reduce. Dealing with splits is very important when backtesting strategies. A 75% drop in price was not the drop, but the split – this can confuse your strategy. You also know the adjusted close prices, which are adjusted mostly for stock splits. But it is not enough; a lot of time, your algorithm also uses open, high, low, and volume. Also, when adjusting stocks and using dividends in your backtest, you have to adjust dividends.
Adjusting is very easy. Let’s look at an AAPL example. We download adjusted data from Alpha Vantage (they consist of stock splits and dividends) and look at historical AAPL splits.
import pandas as pd
from alpha_vantage.timeseries import TimeSeries
ts = TimeSeries(key='YourAPIkey', #claim on alphavantage.co
output_format='pandas',
indexing_type='date')
daily = ts.get_daily_adjusted('AAPL', 'full')[0].sort_index()
daily.columns = [x[3:] for x in daily.columns]
daily[daily['split coefficient'] != 1]
open | high | low | close | adjusted close | volume | dividend amount | split coefficient | |
---|---|---|---|---|---|---|---|---|
date | ||||||||
2000-06-21 | 50.50 | 56.94 | 50.31 | 55.63 | 0.8583 | 4375000.0 | 0.0 | 2.0 |
2005-02-28 | 44.68 | 45.14 | 43.96 | 44.86 | 1.3843 | 11635900.0 | 0.0 | 2.0 |
2014-06-09 | 92.70 | 93.88 | 91.75 | 93.70 | 21.1767 | 75414997.0 | 0.0 | 7.0 |
2020-08-31 | 127.58 | 131.00 | 126.00 | 129.04 | 129.0400 | 223505733.0 | 0.0 | 4.0 |
In the table, we can see 4 stock splits for AAPL data from the year 2000. In this case, OHLCV is not adjusted, and adjusted close has different values than close. The split coefficient is also visible and usable. The mathematical calculation is straightforward, look at prices around Aug 31, 2020.
daily[daily.index>='2020-08-28'][:3]
open | high | low | close | adjusted close | volume | dividend amount | split coefficient | |
---|---|---|---|---|---|---|---|---|
date | ||||||||
2020-08-28 | 504.05 | 505.77 | 498.31 | 499.23 | 124.8075 | 46907479.0 | 0.0 | 1.0 |
2020-08-31 | 127.58 | 131.00 | 126.00 | 129.04 | 129.0400 | 223505733.0 | 0.0 | 4.0 |
2020-09-01 | 132.76 | 134.80 | 130.53 | 134.18 | 134.1800 | 152470142.0 | 0.0 | 1.0 |
We want to adjust the prices and also the volume. When we have a 4-for-1 split, each stock is multiplicated by 4, so the volume of previous days should be just multiplied by 4. Aug 28, the close price was 499.23 for 1 stock. When adjusting, we have to divide the price by the split coefficient, 499.23/4 = 124.8075 (adjusted close). It’s very simple and fast in pandas. Notice that when there are more splits in history, we have to account for them. We divide by 4 until we come to another split in history, then we have to account for it. So the prices before Jun 9, 2014, should be divided by 7 and 4, so by 28.
def calculate_adjusted(df, dividends=False):
# we will go from today to the past
new = df.sort_index(ascending=False)
split_coef = new['split coefficient'].shift(1
).fillna(1).cumprod()
for col in ['open', 'high', 'low', 'close']:
new['adj_' + col] = new[col] / split_coef
new['adj_volume'] = split_coef * new['volume']
if dividends:
new['adj_dividends'] = new['dividend amount'] / split_coef
return new.sort_index(ascending=True)
We sort values descending (according to DatetimeIndex), and the split coefficient is multiplicated through the history. We shift the value for 1 day into the history (+1 because we have descending dates) and make cumulative products to consider other splits. Then we divide the prices by that value but multiplicate the volume.
Let’s download minute data and adjust them. Minute data do not have dividends, so we will not adjust dividends here (function in default doesn’t adjust dividends). A simple query to download minute data in csv format (also for historical minute data, see the Alpha Vantage documentation). You can notice that there is slice=year{}month{} in the query, which we format as year1month1 to get the most recent data. Alpha Vantage gives maximum one month data per one query and up to 2 years of history. So the previous month would be year1month2, this month in the previous year would be year2month1, and so on. Note that depending on the date you are downloading the data, the event could be older than one month. You can check if the prices are adjusted correctly by downloading adjusted prices by setting adjusted=true.
download_file = 'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&function=TIME_SERIES_INTRADAY_EXTENDED&symbol={}&interval=1min&slice=year{}month{}&adjusted=false&apikey=YourAPIkey' # change to your API key
df = pd.read_csv(download_file.format('AAPL', 1, 1))
# note that time of making this query was after the split, Sep 2020
df = df.set_index('time')
df = df.sort_index()
df[df.index < '2020-08-31 04:03:00'].tail()
We can see the split in the table when the price decreased from 501.98 USD to 128 USD, which was pre-market open on Aug 31. But our minute data don’t have a split coefficient, so we merge it with the coefficient (which is on daily data, so it is at time 00:00:00).
df = df.merge(daily[['split coefficient']], left_index=True,
right_index=True, how='outer')
# fill split NaN values with 1 (because of cumulative product)
df['split coefficient'] = df['split coefficient'].fillna(1)
df[df.index < '2020-08-31 04:03:00'].tail()
open | high | low | close | volume | split coefficient | |
---|---|---|---|---|---|---|
2020-08-28 19:59:00 | 501.73 | 501.73 | 501.65 | 501.65 | 1683.0 | 1.0 |
2020-08-28 20:00:00 | 501.80 | 502.00 | 501.71 | 501.98 | 8239.0 | 1.0 |
2020-08-31 00:00:00 | NaN | NaN | NaN | NaN | NaN | 4.0 |
2020-08-31 04:01:00 | 128.00 | 137.29 | 125.60 | 126.00 | 61664.0 | 1.0 |
2020-08-31 04:02:00 | 126.92 | 127.00 | 126.00 | 126.50 | 16918.0 | 1.0 |
Apply our function and have a look at the result.
dfa = calculate_adjusted(df)
dfa[dfa.index < '2020-08-31 04:03:00'].tail()
open | high | low | close | volume | split coefficient | adj_open | adj_high | adj_low | adj_close | adj_volume | |
---|---|---|---|---|---|---|---|---|---|---|---|
2020-08-28 19:59:00 | 501.73 | 501.73 | 501.65 | 501.65 | 1683.0 | 1.0 | 125.4325 | 125.4325 | 125.4125 | 125.4125 | 6732.0 |
2020-08-28 20:00:00 | 501.80 | 502.00 | 501.71 | 501.98 | 8239.0 | 1.0 | 125.4500 | 125.5000 | 125.4275 | 125.4950 | 32956.0 |
2020-08-31 00:00:00 | NaN | NaN | NaN | NaN | NaN | 4.0 | NaN | NaN | NaN | NaN | NaN |
2020-08-31 04:01:00 | 128.00 | 137.29 | 125.60 | 126.00 | 61664.0 | 1.0 | 128.0000 | 137.2900 | 125.6000 | 126.0000 | 61664.0 |
2020-08-31 04:02:00 | 126.92 | 127.00 | 126.00 | 126.50 | 16918.0 | 1.0 | 126.9200 | 127.0000 | 126.0000 | 126.5000 | 16918.0 |
Adjusted prices are prepared, and you can fully use them in your backtests. We used only built-in pandas functions, so it calculates reasonably fast for a more extended history of one-minute data or even one-second data. Do your price adjusting is very important if you store the data in the original form; most brokers also provide original data without splitting dates and coefficients.