Description
- I have searched the [pandas] tag on StackOverflow for similar questions.
- I have asked my usage related question on StackOverflow.
I had a hard time understanding how `df.rolling` works when `df` is indexed by a `MultiIndex`.
This is an example data frame:
```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-1-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame(index=idx)
df['c1'] = range(len(df))
print(df)
```

which outputs
```
                c1
date       obs    
2020-01-01 a     0
           b     1
2020-01-02 a     2
           b     3
2020-01-03 a     4
           b     5
2020-01-04 a     6
           b     7
2020-01-05 a     8
           b     9
2020-01-06 a    10
           b    11
2020-01-07 a    12
           b    13
2020-01-08 a    14
           b    15
2020-01-09 a    16
           b    17
2020-01-10 a    18
           b    19
```

Now I want to apply a rolling window on the `date` level, keeping the `obs` level separate.
I tried, without success, obvious and simple (least-surprise) commands like

`df.rolling("7d", index="date")` or `df.rolling("7d", on="date")`,

but in the end the desired result is obtained by
```python
df_r = df.groupby(by="obs", group_keys=False).rolling(
    "7d", on=df.index.levels[0]
).mean().sort_index()
print(df_r)
```

which gives me the correct result:
```
                  c1
date       obs      
2020-01-01 a     0.0
           b     1.0
2020-01-02 a     1.0
           b     2.0
2020-01-03 a     2.0
           b     3.0
2020-01-04 a     3.0
           b     4.0
2020-01-05 a     4.0
           b     5.0
2020-01-06 a     5.0
           b     6.0
2020-01-07 a     6.0
           b     7.0
2020-01-08 a     8.0
           b     9.0
2020-01-09 a    10.0
           b    11.0
2020-01-10 a    12.0
           b    13.0
```

It seems to me that this should be a fairly common situation, so I was wondering whether there is a simpler way to obtain the same result. By the way, my solution is not very robust, because it relies on hidden assumptions about how the objects returned by `groupby` are indexed, which do not necessarily hold for a generic data frame.
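One possible alternative, sketched here as an assumption rather than the officially intended idiom, avoids passing `df.index.levels[0]` to `on`. It moves `obs` out of the index so that `date` becomes a plain `DatetimeIndex`, rolls within each `obs` group on that index, and then restores the original `(date, obs)` level order. It assumes each `(date, obs)` pair occurs once and that dates are sorted within each `obs` group; for the example frame the values should match the output above.

```python
import pandas as pd

# Rebuild the example frame from this issue
idx = pd.MultiIndex.from_product(
    [pd.date_range("2020-01-01", "2020-01-10"), ["a", "b"]],
    names=["date", "obs"],
)
df = pd.DataFrame({"c1": range(len(idx))}, index=idx)

# Move "obs" into a column so the remaining index is a DatetimeIndex named "date",
# then apply a time-based 7-day window separately within each obs group.
rolled = (
    df.reset_index(level="obs")
    .groupby("obs")["c1"]
    .rolling("7d")
    .mean()
)

# The groupby result is indexed by (obs, date); swap back to (date, obs) and sort.
df_r = rolled.swaplevel("obs", "date").sort_index().to_frame()
print(df_r)
```

Because each group is rolled over its own dates, this version does not require every `obs` to share the same set of dates.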
Moreover, the documentation of the `on` parameter of `rolling` was almost incomprehensible to me: I'm still wondering whether my usage `rolling("7d", on=df.index.levels[0])` is the intended one.
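For comparison, here is the plain-column case of `on`: when `on` names an ordinary column holding datetimes, `rolling` uses that column, instead of the index, to decide which rows fall inside each time window, and the column is carried through to the result. The flat frame `flat` below is hypothetical and only illustrates this case; it does not settle whether passing an `Index` object, as in the snippet above, is intended usage.

```python
import pandas as pd

# Hypothetical flat frame: the timestamps live in an ordinary column "date",
# while the index is the default RangeIndex.
flat = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=10, freq="D"),
        "c1": range(10),
    }
)

# on="date" makes rolling use the "date" column, not the RangeIndex,
# to select the rows inside each 7-day window.
out = flat.rolling("7d", on="date").mean()
print(out)
```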