DEV Community

ryantjo
ryantjo

Posted on

Resampling Market Tick Data

Tick Data

Tick data is the stream of individual trades executed on an exchange (usually a stock exchange) with each ‘tick' representing a single trade.

Typically each tick contains a timestamp, trade price, volume and the exchange the trade was executed on. For example, below is a series of ticks for Apple AAPL:

2021-02-01 04:00:02:533,133.65,1,ARCX 2021-02-01 04:00:02:533,133.7,4,ARCX 2021-02-01 04:00:03:713,133.71,50,XNGS 2021-02-01 04:00:03:713,134,50,XNGS 2021-02-01 04:00:03:713,133.7,50,ARCX 2021-02-01 04:00:03:932,134,200,XNGS 
Enter fullscreen mode Exit fullscreen mode

(note the timestamp includes milliseconds)

Resampling Tick Data

Tick data is the highest resolution form of market data and can give a lot of insight into a market’s microstructure over very short timeframes. However, it is extremely large in size and the sheer volume of the data can make it unwieldy for analysis for longer timeframe analysis (such as over 1 week). For analysis of longer timeframes, intraday bars (or ‘candles’) are the preferred data format.

A bar, is a single data-point for a timeframe which includes the open, close, high and low prices. For example: below is a series of 1-minute bars for Apple AAPL:

2021-01-04 09:30:00,133.52,133.612,132.95,133.15,2328651 2021-01-04 09:31:00,133.13,133.45,133.08,133.335,486524 2021-01-04 09:32:00,133.345,133.36,132.99,133.11,471947 2021-01-04 09:33:00,133.11,133.15,132.71,132.746,477518 
Enter fullscreen mode Exit fullscreen mode

(format : timestamp, high, low, open, close, volume)

Therefore, a common requirement is to resample tick data into intraday bars. Fortunately, the Pandas Python library has several inbuilt functions to perform this task very efficiently.

Worked Example

Starting with a tick dataset for AAPL, you can get a sample tick dataset for AAPL at TickHistory.

If you do not already have Python and Pandas installed, a simple solution is to install Anaconda and then use Anaconda to install Pandas.

Once in Python, import the Pandas package

import pandas as pd 
Enter fullscreen mode Exit fullscreen mode

Next, load the data into a dataframe

aapl_df = pd.read_csv('AAPL_2020_10.txt', names=['timestamp', 'trade_price', 'volume', 'exchange'], index_col=0, parse_dates=True) 
Enter fullscreen mode Exit fullscreen mode

This statement reads the csv formatted file (note that it can also read directly from a zip file), names the columns, parses the timestamp into a date_time, and finally sets the index to the timestamp column.

Once the data has been loaded, we can quickly review the dataframe to ensure it has correctly loaded using the head() function:

aapl_df.head() 
Enter fullscreen mode Exit fullscreen mode

This should output :

Timestamp trade_price volume exchange 2021-02-01 04:00:02:533 133.65 1 ARCX 2021-02-01 04:00:02:533 133.7 4 ARCX 2021-02-01 04:00:03:713 133.71 50 XNGS 2021-02-01 04:00:03:713 134 50 XNGS 2021-02-01 04:00:03:713 133.7 50 ARCX 
Enter fullscreen mode Exit fullscreen mode

To resample the data, we will use the Pandas resample() function. This needs to be repeated for each of the high, low, open, close, volume datapoints in the bar:

aapl_1hour_open_df = aapl_df.resample("1H").agg({'trade_price': 'first’}) aapl_1hour_high_df = aapl_df.resample("1H").agg({'trade_price': 'high’}) aapl_1hour_low_df = aapl_df.resample("1H").agg({'trade_price': 'low’}) aapl_1hour_close_df = aapl_df.resample("1H").agg({'trade_price': 'last’}) aapl_1hour_volume_df = aapl_df.resample("1H").agg({'volume': 'sum’}) 
Enter fullscreen mode Exit fullscreen mode

Now we have a separate dataframe for each of the open, high, low, close, volume datapoints. We now need to combine these into a single dataframe using the Pandas concat() function:

aapl_1hour_df =pd.concat([aapl_1hour_open_df, aapl_1hour_high_df, aapl_1hour_low_df, aapl_1hour_close_df, aapl_1hour_volume_df], axis=1, keys=['open', 'high', 'low', 'close', 'volume']) 
Enter fullscreen mode Exit fullscreen mode

Finally, we need to remove the zero volume bars as the resample function will include a bar for every timeframe during the 24-hour day and not just the trading hours. This can be done by filtering for volumes above 0.

aapl_1hour_df = aapl_1hour_df[aapl_1hour_df.volume > 0] 
Enter fullscreen mode Exit fullscreen mode

The resampling from ticks to 1-hour bars is now complete and the file can be created using the Pandas to_csv() function:

aapl_1hour_df.to_csv('file_path') 
Enter fullscreen mode Exit fullscreen mode

Top comments (0)