Time-based .rolling() fails with .groupby() #13966

@chrisaycock

Description

Starting with this example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8, 'B': np.arange(40)})

I can easily compute the rolling mean by identifier:

In [20]: df.groupby('A').rolling(4).B.mean()
Out[20]:
A
1  0      NaN
   1      NaN
   2      NaN
   3      1.5
   4      2.5
   5      3.5
   6      4.5
   7      5.5
   8      6.5
   9      7.5
          ...
2  30    28.5
   31    29.5
3  32     NaN
   33     NaN
   34     NaN
   35    33.5
   36    34.5
   37    35.5
   38    36.5
   39    37.5
Name: B, dtype: float64

Now I want to add a timestamp column:

dates = pd.date_range(start='2016-01-01 09:30:00', periods=20, freq='s')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

The timestamps are ordered within each identifier, but pandas complains:

In [25]: df.groupby('A').rolling('4s', on='B').C.mean()
...
ValueError: B must be monotonic
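For what it's worth, a quick check (just a sketch, using Series.is_monotonic_increasing on the frame built above) backs up the claim that 'B' is ordered within each group even though the column as a whole is not:

# Sketch: 'B' is monotonic within each group of 'A' ...
df.groupby('A').B.apply(lambda s: s.is_monotonic_increasing)
# ... but not as a whole column
df.B.is_monotonic_increasing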

Re-sorting leads to a different error:

In [26]: df.sort_values('B', inplace=True)

In [27]: df.groupby('A').rolling('4s', on='B').C.mean()
...
ValueError: invalid on specified as B, must be a column (if DataFrame) or None

But we know that these column names are valid:

In [28]: df.rolling('4s', on='B').C.mean()
Out[28]:
0      0.000000
20    10.000000
1      7.000000
21    10.500000
2      8.800000
22    11.000000
3      9.857143
23    11.500000
4     10.857143
24    12.500000
        ...
35    24.714286
15    23.500000
36    25.714286
16    24.500000
37    26.714286
17    25.500000
38    27.714286
18    26.500000
19    25.857143
39    27.500000
Name: C, dtype: float64

It seems like a bug that time-based .rolling() does not work with .groupby().
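In the meantime, one possible workaround (a sketch only; it assumes 'B' is monotonic within each group, as checked above, and pushes the time-based rolling into each group via groupby().apply()):

# Sketch of a workaround: run the time-based rolling per group, so each
# group only sees its own (monotonic) timestamps in 'B'.
df.groupby('A').apply(lambda g: g.rolling('4s', on='B').C.mean())

The result should come back indexed by 'A' and the original row labels, similar to the count-based example above, though this is obviously no substitute for .groupby(...).rolling('4s', on='B') working directly.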

Labels: Bug, Groupby, Reshaping
