API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702 #11603

jreback · 2015-11-14T18:12:18Z

closes #10702
closes #9052
xref #4950, removal of depr

So this basically takes all of the pd.rolling_*, pd.expanding_*, pd.ewma_* routines and exposes them through an object-oriented interface, similar to groupby.

Some benefits:

nice tab completions on the Rolling/Expanding/EWM objects
much cleaner code internally
complete back compat, i.e. everything just works like it did
added a .agg/aggregate function, similar to groupby, where you can do multiple aggregations at once
added __getitem__ accessing, e.g. df.rolling(....)['A','B'].sum() for a nicer API
allows for much of API/ENH: master issue for pd.rolling_apply #8659 to be done very easily
fix for coercing Timedeltas properly
handling nuisance (string) columns
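
The "deferred" idea behind the points above can be sketched in plain Python: the object only stores the data and the window parameters, and the work happens when a method such as sum() is called. This is a toy illustration, not the pandas implementation; the class name ToyRolling and all of its internals are hypothetical.

```python
import math


class ToyRolling:
    """Toy deferred window object (hypothetical, not pandas code)."""

    def __init__(self, values, window, min_periods=None):
        self.values = list(values)
        self.window = window
        # Assumption: like the PR's examples, a full window is required by default.
        self.min_periods = window if min_periods is None else min_periods

    def _apply(self, func):
        # Nothing is computed until one of the public methods calls this.
        out = []
        for i in range(len(self.values)):
            chunk = self.values[max(0, i - self.window + 1): i + 1]
            out.append(func(chunk) if len(chunk) >= self.min_periods else math.nan)
        return out

    def sum(self):
        return self._apply(sum)

    def mean(self):
        return self._apply(lambda chunk: sum(chunk) / len(chunk))

    def agg(self, names):
        # Run several named aggregations over the same stored window.
        return {name: getattr(self, name)() for name in names}

    def __repr__(self):
        return f"ToyRolling [window->{self.window}]"


r = ToyRolling(range(5), window=2)
print(r)
print(r.sum())
print(r.agg(["sum", "mean"]))
```

Because the window definition lives on the object, adding a new aggregation is one more method rather than one more top-level function.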

Other:

along with window doc rewrite, fixed doc-strings for groupby/window to provide back-refs

ToDo:

I think that all of the doc-strings are correct, but need check
implement .agg
update API.rst, what's new
deprecate the pd.expanding_*,pd.rolling_*,pd.ewma_* interface as this is polluting the top-level namespace quite a bit
change the docs to use the new API

In [4]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [5]: df.rolling(window=2).sum()
Out[5]:
    A      B    C
0 NaN    NaT  foo
1   1 3 days  foo
2   3 5 days  foo
3   5 7 days  foo
4   7 9 days  foo

In [6]: df.rolling(window=2)['A','C'].sum()
Out[6]:
    A    C
0 NaN  foo
1   1  foo
2   3  foo
3   5  foo
4   7  foo

In [2]: r = df.rolling(window=3)

In [3]: r.
r.A  r.C      r.corr   r.cov   r.max   r.median  r.name      r.skew  r.sum
r.B  r.apply  r.count  r.kurt  r.mean  r.min     r.quantile  r.std   r.var
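
For readers trying the column selection on a recent pandas, here is a minimal sketch. It assumes an all-numeric frame (the column names and values are made up for illustration) and uses list-style selection, since the handling of string and timedelta columns shown in this PR may differ in later releases.

```python
import pandas as pd

# Hypothetical all-numeric frame; the PR's example also carried timedelta
# and string columns, which are left out here on purpose.
df = pd.DataFrame({"A": range(5), "B": range(0, 10, 2), "C": range(10, 15)})

r = df.rolling(window=2)

# Select a subset of columns on the window object, then aggregate.
selected = r[["A", "B"]].sum()

# A single label gives a Series-like result.
single = r["A"].sum()

print(selected)
print(single)
```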

do rolling/expanding/ewma ops

In [1]: s = Series(range(5))

In [2]: r = s.rolling(2)

# pd.rolling_sum
In [3]: r.sum()
Out[3]:
0   NaN
1     1
2     3
3     5
4     7
dtype: float64

# nicer repr
In [4]: r
Out[4]: Rolling [window->2,center->False,axis->0]

In [5]: e = s.expanding(min_periods=2)

# pd.expanding_sum
In [6]: e.sum()
Out[6]:
0   NaN
1     1
2     3
3     6
4    10
dtype: float64

In [7]: em = s.ewm(com=10)

# pd.ewma
In [8]: em.mean()
Out[8]:
0    0.000000
1    0.523810
2    1.063444
3    1.618832
4    2.189874
dtype: float64
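
The em.mean() values can be checked by hand. The sketch below assumes the "adjusted" weighting, where alpha = 1 / (1 + com) and each result is a weighted average with weights (1 - alpha)**k for an observation k steps back. The function ewm_mean is a hypothetical helper written for this check, not part of pandas.

```python
def ewm_mean(values, com):
    """Adjusted exponentially weighted mean in plain Python (illustrative only)."""
    alpha = 1.0 / (1.0 + com)
    decay = 1.0 - alpha
    out = []
    num = 0.0  # running sum of decay**k * x[t - k]
    den = 0.0  # running sum of decay**k
    for x in values:
        num = decay * num + x
        den = decay * den + 1.0
        out.append(num / den)
    return out


result = ewm_mean(range(5), com=10)
print([round(v, 6) for v in result])
```

With com=10 the second value is 1 / (1 + 10/11) = 11/21, which matches the 0.523810 shown in Out[8].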

and allow the various aggregation type of ops (similar to groupby)

In [1]: df = DataFrame({'A' : range(5), 'B' : pd.timedelta_range('1 day',periods=5), 'C' : 'foo'})

In [2]: r = df.rolling(2,min_periods=1)

In [3]: r.agg([np.sum,np.mean])
Out[3]:
    A           B                    C
  sum mean    sum            mean  sum mean
0   0  0.0 1 days 1 days 00:00:00  foo  foo
1   1  0.5 3 days 1 days 12:00:00  foo  foo
2   3  1.5 5 days 2 days 12:00:00  foo  foo
3   5  2.5 7 days 3 days 12:00:00  foo  foo
4   7  3.5 9 days 4 days 12:00:00  foo  foo

In [4]: r.agg({'A' : 'sum', 'B' : 'mean'})
Out[4]:
   A               B
0  0 1 days 00:00:00
1  1 1 days 12:00:00
2  3 2 days 12:00:00
3  5 3 days 12:00:00
4  7 4 days 12:00:00

shoyer · 2015-11-15T04:07:06Z

Cc @jhamman who has been working on this for xray.

jreback · 2015-11-15T16:29:09Z

pushing to 0.18.0, I think __getitem__ will be a really nice add here. might as well do all of this at once.

jreback · 2015-11-22T22:17:05Z

ok, this is ready, MUCH bigger rabbit hole than I thought.

note on the doc-strings.

Since we now have something much more like a groupby interface, e.g. s.rolling(....).sum(), the doc-strings for Rolling.sum are minimal but have a See Also back to Series/DataFrame.rolling. (We don't have the notion of a RollingSeries/RollingDataFrame class, so this would be quite tricky.)

Further I did the same with groupby doc-strings (again don't have the class distinction on the See Also).

@jorisvandenbossche @shoyer @sinhrks @TomAugspurger @cpcloud

seth-p · 2015-11-25T02:27:20Z

One thing I've wanted to add but haven't, which may be easier using this framework, at least interface-wise, is rolling exponentially weighted functions -- i.e. add a window to all the ewm*() parameters. Obviously this would need to be implemented in Cython for performance, but perhaps interface-wise it would be simpler using the scheme proposed here.

jreback · 2015-11-25T02:28:49Z

yep that would be quite straightforward to do interface wise but yes would need to be added to the cython functions (but not too hard there)
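
To make the interface question concrete, here is a rough plain-Python sketch of what a windowed exponentially weighted mean could compute. Everything in it is an assumption for illustration: the function name rolling_ewm_mean, its parameters, and the choice to apply adjusted weights within each window are hypothetical, and nothing in this PR adds such a function.

```python
import math


def rolling_ewm_mean(values, window, com, min_periods=1):
    """Exponentially weighted mean restricted to the last `window` points.

    Hypothetical sketch: O(n * window) in pure Python, whereas a real
    version would need a compiled (e.g. Cython) kernel as noted above.
    """
    alpha = 1.0 / (1.0 + com)
    decay = 1.0 - alpha
    values = list(values)
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < min_periods:
            out.append(math.nan)
            continue
        # Newest observation gets weight 1, older ones decay geometrically.
        weights = [decay ** k for k in range(len(chunk) - 1, -1, -1)]
        total = sum(w * x for w, x in zip(weights, chunk))
        out.append(total / sum(weights))
    return out


print(rolling_ewm_mean(range(5), window=2, com=10))
```

With a window at least as long as the data this reduces to the ordinary adjusted exponentially weighted mean, and with window=1 it returns the input unchanged.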

jreback · 2015-11-25T16:23:37Z

any comments?

shoyer · 2015-11-25T22:41:07Z

doc/source/computation.rst

should be GroupBy

jreback · 2015-12-15T16:53:05Z

so now aggregations are consistent with what you'd expect

In [3]: df = DataFrame({'A' : range(5),'B' : range(0,10,2)})

In [4]: r = df.rolling(window=3)

In [5]: r.agg(['mean','sum'])
Out[5]:
     A        B
  mean sum mean sum
0  NaN NaN  NaN NaN
1  NaN NaN  NaN NaN
2    1   3    2   6
3    2   6    4  12
4    3   9    6  18

In [6]: r['A'].agg(['mean','sum'])
Out[6]:
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9

In [7]: r.agg({'A' : ['mean','sum']})
Out[7]:
   mean  sum
0   NaN  NaN
1   NaN  NaN
2     1    3
3     2    6
4     3    9

jorisvandenbossche · 2015-12-15T17:47:41Z

For the last one (r.agg({'A' : ['mean','sum']})), in the case of groupby you still have the 'A' in the columns:

In [49]: grouped.agg({'C': {'r':np.sum, 'r2':np.mean}})
Out[49]:
                  C
                  r        r2
A   B
bar one    1.249205  1.249205
    three -0.262759 -0.262759
    two    1.151419  1.151419
foo one   -0.518008 -0.259004
    three  0.588044  0.588044
    two    1.635643  0.817821

There are some problems with the how and freq keyword. freq is still allowed but deprecated, but the accompanying how is not allowed:

this is correct, you cannot pass how to .mean nor .sum. This is simply invalid syntax. (and nor was it allowed in the original impl).

In any case, this was allowed and is in the docstring. Eg the example I gave works for 0.16.2:

In [63]: ser = pd.Series(np.random.randn(20), index=pd.date_range('1/1/2000', periods=20, freq='12H'))

In [64]: pd.rolling_mean(ser, window=5, freq='D', how='max')
Out[64]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05    0.314516
2000-01-06    0.511396
2000-01-07    0.343306
2000-01-08    0.561190
2000-01-09    0.772309
2000-01-10    0.266875
Freq: D, dtype: float64

In [65]: pd.rolling_mean(ser, window=5, freq='D', how='min')
Out[65]:
2000-01-01         NaN
2000-01-02         NaN
2000-01-03         NaN
2000-01-04         NaN
2000-01-05   -0.439707
2000-01-06   -0.431944
2000-01-07   -0.538390
2000-01-08   -0.296774
2000-01-09   -0.112988
2000-01-10   -0.425533
Freq: D, dtype: float64
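
One way to express the same calculation with the method-based interface is to resample explicitly and then roll. This is a sketch under the assumption that the legacy freq/how arguments simply resampled the input before the window was applied; the seed and variable names are illustrative, so the numbers will not match the output above.

```python
import numpy as np
import pandas as pd

# Twelve-hourly series, mirroring the shape of the example above.
rng = np.random.default_rng(0)
index = pd.date_range("2000-01-01", periods=20, freq=pd.Timedelta(hours=12))
ser = pd.Series(rng.standard_normal(20), index=index)

# Step 1: what freq='D', how='max' is assumed to have done implicitly.
daily_max = ser.resample("D").max()

# Step 2: the rolling mean over five daily values.
rolled = daily_max.rolling(window=5).mean()

print(rolled)
```

Spelling out the two steps makes the downsampling rule visible at the call site instead of hiding it in keyword arguments.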

jreback · 2015-12-15T18:53:46Z

I think my code was correct at the beginning.

In [4]: r.agg(['sum','mean'])
Out[4]:
     A        B
   sum mean sum mean
0  NaN  NaN NaN  NaN
1  NaN  NaN NaN  NaN
2    3    1   6    2
3    6    2  12    4
4    9    3  18    6

In [5]: r.agg({'A' : ['sum','mean']})
Out[5]:
     A
   sum mean
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3

In [6]: r['A'].agg(['sum','mean'])
Out[6]:
   sum  mean
0  NaN   NaN
1  NaN   NaN
2    3     1
3    6     2
4    9     3

In [7]: r['A'].agg({'s' : 'sum', 'm' : 'mean' })
Out[7]:
     s    m
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3

In [8]: r.agg({'A' : {'s' : 'sum', 'm' : 'mean' }})
Out[8]:
     A
     s    m
0  NaN  NaN
1  NaN  NaN
2    3    1
3    6    2
4    9    3
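
A small script that checks the column layout described in this exchange, as a sketch for a recent pandas. Only the list and dict-of-lists forms are used; the nested-dict renaming form from In [8] is left out because, as far as I know, later pandas releases removed it.

```python
import pandas as pd

df = pd.DataFrame({"A": range(5), "B": range(0, 10, 2)})
r = df.rolling(window=3)

# List of functions on the whole frame: one (column, function) pair per result.
both = r.agg(["sum", "mean"])

# Dict of lists: the selected column is kept as the outer level.
only_a = r.agg({"A": ["sum", "mean"]})

# Selecting the column first gives flat function-named columns.
flat = r["A"].agg(["sum", "mean"])

print(list(both.columns))
print(list(only_a.columns))
print(list(flat.columns))
```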

jreback · 2015-12-18T18:18:11Z

might be still some small loose ends, but any further comments @jorisvandenbossche @shoyer

jreback · 2015-12-19T13:52:12Z

bombs away!

jreback · 2015-12-19T14:23:36Z

@jorisvandenbossche http://pandas-docs.github.io/pandas-docs-travis/computation.html#stats-moments

does the :math not render on travis doc builds?

max-sixty · 2015-12-19T16:51:04Z

👏

jreback added the Reshaping and API Design labels Nov 14, 2015

jreback added this to the 0.17.1 milestone Nov 14, 2015

jreback force-pushed the rolling branch from 920653f to 23ffc19 Compare November 14, 2015 19:00

leeong05 mentioned this pull request Nov 14, 2015

API/ENH: master issue for pd.rolling_apply #8659

Closed

14 tasks

jreback modified the milestones: 0.18.0, 0.17.1 Nov 15, 2015

This was referenced Nov 15, 2015

ENH: Add the moment function as DataFrame and Series method WITH namespacing #10702

Closed

Roll apply nonfloat dtypes #11620

Closed

jreback force-pushed the rolling branch 2 times, most recently from 6f7de2f to 4790a88 Compare November 19, 2015 13:49

jreback mentioned this pull request Nov 20, 2015

Bug when computing rolling_mean with extreme value #11645

Closed

jreback force-pushed the rolling branch 4 times, most recently from 3352a93 to ec5fd94 Compare November 22, 2015 00:18

max-sixty mentioned this pull request Nov 22, 2015

ENH: min_weight in addition to min_periods for ewma #11167

Open

jsexauer mentioned this pull request Nov 22, 2015

DEPR: Clean up list of deprecations from prior versions #6581

Closed

1 task

jreback force-pushed the rolling branch from ec5fd94 to ba12763 Compare November 22, 2015 22:12

jreback force-pushed the rolling branch 2 times, most recently from f38b713 to af0dc3c Compare November 22, 2015 23:45

jreback force-pushed the rolling branch from af0dc3c to 101f335 Compare November 25, 2015 15:37

shoyer reviewed Nov 25, 2015

jreback force-pushed the rolling branch from 2558440 to 20f6730 Compare December 15, 2015 16:50

jreback force-pushed the rolling branch 4 times, most recently from 05d1385 to 2b81c88 Compare December 17, 2015 23:04

jreback added 8 commits December 19, 2015 08:43

API: provide Rolling/Expanding/EWM objects for deferred rolling type …

3c23dc9

…calculations, xref pandas-dev#10702

BUG/API: consistency in .agg with nested dicts pandas-dev#9052

36fb835

DOC: update docs for back-refs to groupby & window functions

9587d46

DEPR: removal of expanding_corr_pairwise/rolling_cor_pairwise, xref p…

e47bd99

…andas-dev#4950

DEPR: deprecate pd.rolling_*, pd.expanding_*, pd.ewm*

3156395

DOC: minor doc corrections

0bbe110

DEPR: deprecate freq/how arguments to window functions

05eb20f

cleanup based on comments

1890a88

jreback force-pushed the rolling branch from 2b81c88 to 1890a88 Compare December 19, 2015 13:51

jreback added a commit that referenced this pull request Dec 19, 2015

Merge pull request #11603 from jreback/rolling

2a1d9f2

API: provide Rolling/Expanding/EWM objects for deferred rolling type calculations #10702

jreback merged commit 2a1d9f2 into pandas-dev:master Dec 19, 2015

jreback mentioned this pull request Jul 24, 2016

DEPR: deprecations log for removed issues #13777

Closed

jreback mentioned this pull request Sep 20, 2016

DEPR: 0.21 deprecations master issue #14220

Closed

8 tasks

TomAugspurger mentioned this pull request Jan 10, 2017

API: Table-wise rolling / expanding / EWM function application #15095

Closed

This was referenced Dec 2, 2017

DEPR/CLN: Remove freq parameters from df.rolling/expanding/ewm #18601

Merged

DEPR/CLN: Remove how keyword from df.rolling() etc. #18668

Merged

topper-123 mentioned this pull request Dec 11, 2017

DEPR/CLN: Remove pd.rolling_*, pd.expanding* and pd.ewm* #18723

Merged

3 tasks

chris-b1 mentioned this pull request Jan 17, 2018

.apply() for core.window.Window (feature request) #19286

Open

11 participants
