
@jreback (Contributor) commented Nov 14, 2015

closes #10702
closes #9052
xref #4950, removal of depr

So this basically takes all of the pd.rolling_*, pd.expanding_* and pd.ewma_* routines and provides an object-oriented interface to them, similar to groupby.

Some benefits:

  • nice tab completions on the Rolling/Expanding/EWM objects
  • much cleaner code internally
  • complete backwards compatibility, i.e. everything works as it did before
  • added a .agg/aggregate function, similar to groupby, where you can do multiple aggregations at once
  • added __getitem__ accessing, e.g. df.rolling(....)['A','B'].sum(), for a nicer API
  • makes much of API/ENH: master issue for pd.rolling_apply #8659 very easy to implement
  • fix for coercing Timedeltas properly
  • handling of nuisance (string) columns
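To make the deferred-object idea concrete, here is a minimal pure-Python sketch. `ToyRolling` is a hypothetical class working on a plain dict of lists, not the pandas implementation: constructing it only records the window parameters, `__getitem__` returns a new deferred object restricted to the selected columns, and the actual computation happens only when an aggregation such as `.sum()` is called.

```python
from typing import Callable, Dict, List, Optional, Sequence, Union


class ToyRolling:
    """Hypothetical deferred rolling-window object (a sketch, not pandas' code).

    Construction only records the parameters; nothing is computed until an
    aggregation method such as .sum() or .mean() is called.
    """

    def __init__(self, data: Dict[str, List[float]], window: int,
                 min_periods: Optional[int] = None) -> None:
        self._data = data
        self.window = window
        self.min_periods = window if min_periods is None else min_periods

    def __repr__(self) -> str:
        return f"ToyRolling [window->{self.window},min_periods->{self.min_periods}]"

    def __getitem__(self, key: Union[str, Sequence[str]]) -> "ToyRolling":
        # r['A'] or r['A', 'B'] returns a new deferred object over those columns.
        cols = [key] if isinstance(key, str) else list(key)
        return ToyRolling({c: self._data[c] for c in cols}, self.window, self.min_periods)

    def _apply(self, func: Callable[[List[float]], float]) -> Dict[str, List[Optional[float]]]:
        # The work happens here, once per column, when an aggregation is requested.
        out: Dict[str, List[Optional[float]]] = {}
        for name, values in self._data.items():
            col: List[Optional[float]] = []
            for i in range(len(values)):
                win = values[max(0, i - self.window + 1): i + 1]
                col.append(func(win) if len(win) >= self.min_periods else None)
            out[name] = col
        return out

    def sum(self) -> Dict[str, List[Optional[float]]]:
        return self._apply(sum)  # the builtin sum, applied to each window

    def mean(self) -> Dict[str, List[Optional[float]]]:
        return self._apply(lambda w: sum(w) / len(w))


frame = {"A": [0, 1, 2, 3, 4], "B": [0, 2, 4, 6, 8]}
r = ToyRolling(frame, window=2)   # nothing computed yet
print(r)                          # ToyRolling [window->2,min_periods->2]
print(r["A"].sum())               # {'A': [None, 1, 3, 5, 7]}
print(r["A", "B"].mean())
```

Keeping construction cheap and deferring the work is what makes tab completion and `r['A', 'B']`-style selection natural, much like `groupby` returns a `GroupBy` object rather than a result.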

Other:

  • along with window doc rewrite, fixed doc-strings for groupby/window to provide back-refs

TODO:

  • I think that all of the doc-strings are correct, but need check
  • implement .agg
  • update API.rst, what's new
  • deprecate the pd.expanding_*,pd.rolling_*,pd.ewma_* interface as this is polluting the top-level namespace quite a bit
  • change the docs to use the new API
In [4]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [5]: df.rolling(window=2).sum()
Out[5]:
     A      B    C
0  NaN    NaT  foo
1    1 3 days  foo
2    3 5 days  foo
3    5 7 days  foo
4    7 9 days  foo

In [6]: df.rolling(window=2)['A','C'].sum()
Out[6]:
     A    C
0  NaN  foo
1    1  foo
2    3  foo
3    5  foo
4    7  foo

In [2]: r = df.rolling(window=3)

In [3]: r.
r.A          r.C          r.corr       r.cov        r.max        r.median     r.name       r.skew       r.sum
r.B          r.apply      r.count      r.kurt       r.mean       r.min        r.quantile   r.std        r.var
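The Out[5] table shows two of the fixes listed earlier: the timedelta column B sums to timedeltas instead of being coerced to plain numbers, and the string column C is passed through untouched. A small pure-Python sketch of that behaviour, using hypothetical helpers `rolling_sum_column` and `rolling_sum_frame` (an illustration of the output above, not how pandas implements it):

```python
from datetime import timedelta


def rolling_sum_column(values, window):
    """Rolling sum of one column; None marks windows that are not yet full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # pandas would show NaN / NaT here
            continue
        win = values[i + 1 - window:i + 1]
        # Start from a zero of the matching type so timedeltas stay timedeltas.
        zero = timedelta(0) if isinstance(win[0], timedelta) else 0
        out.append(sum(win, zero))
    return out


def rolling_sum_frame(frame, window):
    """Sum numeric and timedelta columns; pass other ('nuisance') columns through."""
    result = {}
    for name, values in frame.items():
        if all(isinstance(v, (int, float, timedelta)) for v in values):
            result[name] = rolling_sum_column(values, window)
        else:
            result[name] = list(values)
    return result


frame = {
    "A": list(range(5)),
    "B": [timedelta(days=d) for d in range(1, 6)],
    "C": ["foo"] * 5,
}
result = rolling_sum_frame(frame, window=2)
print(result["B"][1])  # 3 days, 0:00:00
```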

do rolling/expanding/ewma ops

In [1]: s = Series(range(5))

In [2]: r = s.rolling(2)

# pd.rolling_sum
In [3]: r.sum()
Out[3]:
0    NaN
1      1
2      3
3      5
4      7
dtype: float64

# nicer repr
In [4]: r
Out[4]: Rolling [window->2,center->False,axis->0]

In [5]: e = s.expanding(min_periods=2)

# pd.expanding_sum
In [6]: e.sum()
Out[6]:
0    NaN
1      1
2      3
3      6
4     10
dtype: float64

In [7]: em = s.ewm(com=10)

# pd.ewma
In [8]: em.mean()
Out[8]:
0    0.000000
1    0.523810
2    1.063444
3    1.618832
4    2.189874
dtype: float64
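The expanding sum and the `com`-parameterised EWM mean above can be reproduced in a few lines of plain Python. This sketch assumes the usual pandas conventions: `alpha = 1 / (1 + com)` and the default "adjusted" weighting, where the i-th most recent observation gets weight `(1 - alpha)**i` and the result is normalised by the sum of the weights. The helper names are hypothetical.

```python
def expanding_sum(values, min_periods=1):
    """Cumulative sum; None until min_periods observations have been seen."""
    out, total = [], 0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total if n >= min_periods else None)
    return out


def ewm_mean(values, com):
    """Adjusted exponentially weighted mean with alpha = 1 / (1 + com)."""
    decay = 1.0 - 1.0 / (1.0 + com)
    out = []
    num = den = 0.0
    for v in values:
        # Older terms are down-weighted by one more factor of decay at each step.
        num = v + decay * num
        den = 1.0 + decay * den
        out.append(num / den)
    return out


s = [0, 1, 2, 3, 4]
print(expanding_sum(s, min_periods=2))  # [None, 1, 3, 6, 10]
print(ewm_mean(s, com=10))              # compare with Out[8]
```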

and allow various aggregation-type operations (similar to groupby):

In [1]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [2]: r = df.rolling(2,min_periods=1)

In [3]: r.agg([np.sum,np.mean])
Out[3]:
     A            B                       C
   sum mean     sum             mean    sum mean
0    0  0.0  1 days  1 days 00:00:00    foo  foo
1    1  0.5  3 days  1 days 12:00:00    foo  foo
2    3  1.5  5 days  2 days 12:00:00    foo  foo
3    5  2.5  7 days  3 days 12:00:00    foo  foo
4    7  3.5  9 days  4 days 12:00:00    foo  foo

In [4]: r.agg({'A' : 'sum', 'B' : 'mean'})
Out[4]:
   A                B
0  0  1 days 00:00:00
1  1  1 days 12:00:00
2  3  2 days 12:00:00
3  5  3 days 12:00:00
4  7  4 days 12:00:00
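A rough pure-Python model of the two `.agg` forms above: a list of functions is applied to every column and yields `(column, function)` labels (the two-level header in Out[3]), while a dict maps each column to one function and keeps flat labels (Out[4]). The function `rolling_agg` and its `FUNCS` table are hypothetical, and only numeric columns are modelled here.

```python
def _windows(values, window, min_periods):
    """Yield each trailing window, or None if it has too few observations."""
    for i in range(len(values)):
        win = values[max(0, i + 1 - window):i + 1]
        yield win if len(win) >= min_periods else None


FUNCS = {"sum": sum, "mean": lambda w: sum(w) / len(w)}


def rolling_agg(frame, window, spec, min_periods=None):
    """Sketch of groupby-style .agg over rolling windows.

    spec: list of function names -> applied to every column, labels (column, name);
          dict column -> function name -> one function per column, flat labels.
    """
    min_periods = window if min_periods is None else min_periods
    if isinstance(spec, dict):
        plan = [(col, fname, col) for col, fname in spec.items()]
    else:
        plan = [(col, fname, (col, fname)) for col in frame for fname in spec]
    result = {}
    for col, fname, label in plan:
        func = FUNCS[fname]
        result[label] = [None if w is None else func(w)
                         for w in _windows(frame[col], window, min_periods)]
    return result


frame = {"A": [0, 1, 2, 3, 4]}
print(rolling_agg(frame, 2, ["sum", "mean"], min_periods=1))
# {('A', 'sum'): [0, 1, 3, 5, 7], ('A', 'mean'): [0.0, 0.5, 1.5, 2.5, 3.5]}
print(rolling_agg(frame, 2, {"A": "sum"}, min_periods=1))
# {'A': [0, 1, 3, 5, 7]}
```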
jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and API Design labels Nov 14, 2015
jreback added this to the 0.17.1 milestone Nov 14, 2015
@shoyer (Member) commented Nov 15, 2015

Cc @jhamman, who has been working on this for xray.

jreback modified the milestones: 0.18.0, 0.17.1 Nov 15, 2015
@jreback (Contributor, Author) commented Nov 15, 2015

Pushing to 0.18.0. I think __getitem__ will be a really nice addition here; might as well do all of this at once.

@jreback (Contributor, Author) commented Nov 22, 2015

OK, this is ready; it was a MUCH bigger rabbit hole than I thought.

A note on the doc-strings:

Since we now have an interface much more like groupby, e.g.

s.rolling(....).sum(), the doc-strings for Rolling.sum are minimal but have a See Also back-reference to Series.rolling/DataFrame.rolling. (We don't have the notion of RollingSeries/RollingDataFrame classes, so doing more would be quite tricky.)

Further, I did the same with the groupby doc-strings (again, without the class distinction in the See Also).
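One way to get the minimal-docstring-plus-back-reference pattern described above is a small decorator that appends a numpydoc-style "See Also" section. The `see_also` decorator and `ToyRolling` class below are hypothetical illustrations, not the helpers pandas actually uses.

```python
def see_also(*refs):
    """Append a numpydoc-style 'See Also' section listing refs to a docstring."""
    def decorate(func):
        section = "\n".join(["", "See Also", "--------", *refs])
        func.__doc__ = (func.__doc__ or "").rstrip() + "\n" + section + "\n"
        return func
    return decorate


class ToyRolling:
    @see_also("Series.rolling", "DataFrame.rolling")
    def sum(self):
        """Rolling sum."""
        raise NotImplementedError  # only the docstring matters for this sketch


print(ToyRolling.sum.__doc__)
```

Because one decorator serves every method, the back-references stay consistent without needing separate Series and DataFrame variants of each class.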

@jorisvandenbossche @shoyer @sinhrks @TomAugspurger @cpcloud

jreback force-pushed the rolling branch 2 times, most recently from f38b713 to af0dc3c November 22, 2015 23:45
@seth-p (Contributor) commented Nov 25, 2015

One thing I've wanted to add but haven't, which may be easier using this framework, at least interface-wise, is rolling exponentially weighted functions -- i.e. add a window to all the ewm*() parameters. Obviously this would need to be implemented in Cython for performance, but perhaps interface-wise it would be simpler using the scheme proposed here.

@jreback (Contributor, Author) commented Nov 25, 2015

Yep, that would be quite straightforward to do interface-wise, but yes, it would need to be added to the Cython functions (not too hard there).
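For reference, a naive version of the windowed exponentially weighted mean suggested above: the usual adjusted EWM weights, but applied only to the last `window` observations. `rolling_ewm_mean` is a hypothetical name, and this O(n * window) loop only sketches the semantics; as noted, a real implementation would live in Cython.

```python
def rolling_ewm_mean(values, com, window):
    """EWM mean (adjusted weights, alpha = 1 / (1 + com)) over the last `window` values."""
    decay = 1.0 - 1.0 / (1.0 + com)
    out = []
    for i in range(len(values)):
        win = values[max(0, i + 1 - window):i + 1]
        # Oldest value in the window gets decay**(len-1), the newest gets decay**0 == 1.
        weights = [decay ** k for k in range(len(win) - 1, -1, -1)]
        out.append(sum(w * v for w, v in zip(weights, win)) / sum(weights))
    return out


s = [0, 1, 2, 3, 4]
print(rolling_ewm_mean(s, com=10, window=3))
```

With `window` at least as long as the series this reduces to the ordinary adjusted EWM mean, and with `window=1` it simply returns the values.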

@jreback (Contributor, Author) commented Nov 25, 2015

any comments?

Review comment (Member):

should be GroupBy

Reply (Contributor, Author):

done

@jreback (Contributor, Author) commented Dec 15, 2015

so now aggregations are consistent with what you'd expect

In [3]: df = DataFrame({'A' : range(5),'B' : range(0,10,2)})

In [4]: r = df.rolling(window=3)

In [5]: r.agg(['mean','sum'])
Out[5]:
      A          B
   mean  sum  mean  sum
0   NaN  NaN   NaN  NaN
1   NaN  NaN   NaN  NaN
2     1    3     2    6
3     2    6     4   12
4     3    9     6   18

In [6]: r['A'].agg(['mean','sum'])
Out[6]:
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9

In [7]: r.agg({'A' : ['mean','sum']})
Out[7]:
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9
@jorisvandenbossche (Member)

For the last one (r.agg({'A' : ['mean','sum']})), in the case of groupby you still have the 'A' in the columns:

In [49]: grouped.agg({'C': {'r':np.sum, 'r2':np.mean}})
Out[49]:
                  C
                  r        r2
A   B
bar one    1.249205  1.249205
    three -0.262759 -0.262759
    two    1.151419  1.151419
foo one   -0.518008 -0.259004
    three  0.588044  0.588044
    two    1.635643  0.817821

There are some problems with the how and freq keywords: freq is still allowed (but deprecated), but the accompanying how is not allowed:

this is correct, you cannot pass how to .mean or .sum. This is simply invalid syntax (nor was it allowed in the original implementation).

In any case, this was allowed and is in the docstring. E.g., the example I gave works in 0.16.2:

In [63]: ser = pd.Series(np.random.randn(20), index=pd.date_range('1/1/2000', periods=20, freq='12H'))

In [64]: pd.rolling_mean(ser, window=5, freq='D', how='max')
Out[64]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05    0.314516
2000-01-06    0.511396
2000-01-07    0.343306
2000-01-08    0.561190
2000-01-09    0.772309
2000-01-10    0.266875
Freq: D, dtype: float64

In [65]: pd.rolling_mean(ser, window=5, freq='D', how='min')
Out[65]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05   -0.439707
2000-01-06   -0.431944
2000-01-07   -0.538390
2000-01-08   -0.296774
2000-01-09   -0.112988
2000-01-10   -0.425533
Freq: D, dtype: float64
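What `freq='D', how='max'` does in that example is resample the 12-hourly series to daily bins using `max`, then take the rolling mean of the daily values. In later pandas the same computation is written explicitly as `ser.resample('D').max().rolling(window=5).mean()`. Below is a pure-Python sketch of the two steps, using deterministic data instead of `np.random.randn` and hypothetical helper names; it assumes there are no empty days, whereas pandas would insert NaN for missing bins.

```python
from datetime import datetime, timedelta


def resample_daily(index, values, how):
    """Group values by calendar day and reduce each day with `how` (e.g. max)."""
    buckets = {}
    for ts, v in zip(index, values):
        buckets.setdefault(ts.date(), []).append(v)
    days = sorted(buckets)
    return days, [how(buckets[d]) for d in days]


def rolling_mean(values, window):
    """Plain rolling mean; None until `window` observations are available."""
    return [None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
            for i in range(len(values))]


index = [datetime(2000, 1, 1) + timedelta(hours=12 * k) for k in range(20)]
values = list(range(20))  # deterministic stand-in for np.random.randn(20)

days, daily_max = resample_daily(index, values, max)
print(rolling_mean(daily_max, window=5))  # [None, None, None, None, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```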
@jreback (Contributor, Author) commented Dec 15, 2015

I think my code was correct at the beginning.

In [4]: r.agg(['sum','mean'])
Out[4]:
     A          B
   sum mean   sum mean
0  NaN  NaN   NaN  NaN
1  NaN  NaN   NaN  NaN
2    3    1     6    2
3    6    2    12    4
4    9    3    18    6

In [5]: r.agg({'A' : ['sum','mean']})
Out[5]:
     A
   sum mean
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3

In [6]: r['A'].agg(['sum','mean'])
Out[6]:
   sum mean
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3

In [7]: r['A'].agg({'s' : 'sum', 'm' : 'mean' })
Out[7]:
     s    m
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3

In [8]: r.agg({'A' : {'s' : 'sum', 'm' : 'mean' }})
Out[8]:
     A
     s    m
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3
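The labelling rules this version follows can be summarised as a small function. `agg_labels` is a hypothetical helper that only computes the output column labels for the spec forms shown in this thread (it does not compute values): a list applied to the whole frame, or a dict of lists/dicts, gives `(column, name)` pairs; on a single selected column the labels are just the function names or the new names; and a plain `{'A': 'sum'}` mapping keeps the flat column name, as in the earlier `r.agg({'A' : 'sum', 'B' : 'mean'})` example.

```python
def agg_labels(spec, columns, selected=None):
    """Output column labels for the .agg spec forms shown above (values not computed).

    columns:  the frame's columns, used when a list is applied to every column.
    selected: the column picked with r['A'], or None for the whole frame.
    """
    if selected is not None:
        # Single column: a list gives the function names, a dict gives the new names.
        return list(spec) if isinstance(spec, (list, dict)) else [selected]
    if isinstance(spec, list):
        return [(col, name) for col in columns for name in spec]
    labels = []
    for col, sub in spec.items():
        if isinstance(sub, str):
            labels.append(col)  # {'A': 'sum'} keeps the flat column name
        else:
            # {'A': [...]} or {'A': {'s': ...}} gives a two-level (column, name) label
            labels.extend((col, name) for name in sub)
    return labels


cols = ["A", "B"]
print(agg_labels(["sum", "mean"], cols))
print(agg_labels({"A": {"s": "sum", "m": "mean"}}, cols))
```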
jreback force-pushed the rolling branch 4 times, most recently from 05d1385 to 2b81c88 December 17, 2015 23:04
@jreback (Contributor, Author) commented Dec 18, 2015

There might still be some small loose ends, but any further comments, @jorisvandenbossche @shoyer?

jreback added a commit that referenced this pull request Dec 19, 2015
API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702
jreback merged commit 2a1d9f2 into pandas-dev:master Dec 19, 2015
@jreback (Contributor, Author) commented Dec 19, 2015

bombs away!

@jreback (Contributor, Author) commented Dec 19, 2015

@max-sixty (Contributor)

👏
