Skip to content

ENH: merge_asof threshold minimum #61164

@Lituchy

Description

@Lituchy

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I often find myself using the merge_asof function on time series data. The tolerance and allow_exact_matches fields are very useful in filtering data, but it would be very useful to have more granular control over this tolerance. Being able to supply a minimum tolerance in addition to the currently existing maximum tolerance would be very beneficial in giving the user more control over this function.

Feature Description

A current example for this function is the following:

We only asof within 10ms between the quote time and the trade time
and we exclude exact matches on time. However prior data will
propagate forward

>>> pd.merge_asof( ... trades, ... quotes, ... on="time", ... by="ticker", ... tolerance=pd.Timedelta("10ms"), ... allow_exact_matches=False ... ) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN 3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN 

I am envisining a version where we could have

We only asof within 10ms between the quote time and the trade time but more than 2ms between the quote time and the trade time and we exclude exact matches on time. However prior data will propagate forward

>>> pd.merge_asof( ... trades, ... quotes, ... on="time", ... by="ticker", ... tolerance=pd.Timedelta("10ms"), ... mininum_tolerance=pd.Timedelta("2ms"), ... allow_exact_matches=False ... ) time ticker price quantity bid ask 0 2016-05-25 13:30:00.023 MSFT 51.95 75 NaN NaN 1 2016-05-25 13:30:00.038 MSFT 51.95 155 51.97 51.98 2 2016-05-25 13:30:00.048 GOOG 720.77 100 NaN NaN 3 2016-05-25 13:30:00.048 GOOG 720.92 100 NaN NaN 4 2016-05-25 13:30:00.048 AAPL 98.00 100 NaN NaN 

Alternative Solutions

Another solution to this problem to augment the currently existing tolerance argument to accept a single datetimelike object, or a tuple of datetmelike objects which could act as a lower and upper bound, respectively.

Additional Context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    EnhancementNeeds DiscussionRequires discussion from core team before further actionNeeds InfoClarification about behavior needed to assess issueReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions