-
-
Couldn't load subscription status.
- Fork 19.2k
ENH: allow .rolling / .expanding as groupby methods #12743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| cc @lminer |
pandas/core/groupby.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, I can find no open issue.
| ok, this is ready. comments. |
| Is the example at the top still up to date? |
| Also, how does the result looks like if you have a non-sorted grouper? |
| @jorisvandenbossche all updated. Had to work thru some issues. But much more fully tested now. This basically replicates what Also this PR cleans up a bunch of cases where the name is not returned for groupbys. So much more consistency now. |
pandas/core/groupby.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a massive hack using. well hack is the wrong word. Its a way of informing the groupby that we want to force the multiindex construction path which is normally taken only when things are mutated. its not an external visible kw.
ba7f228 to 102bfad Compare | @jorisvandenbossche if you'd have a look |
| Will take a look later today! |
doc/source/whatsnew/v0.18.1.txt Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"These return another object where you operate" -> this is not really a clear sentence. What do exactly want to say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
18256d1 to 2525374 Compare | @jreback one more lost index name, when you group a Series by another (using the original dataframe) In [27]: df['B'].groupby(df.A).rolling(4).mean() Out[27]: A idx 1 0 NaN 1 NaN 2 NaN 3 1.5 4 2.5 ... 3 35 33.5 36 34.5 37 35.5 38 36.5 39 37.5 dtype: float64Let me know if you want to fix that here, otherwise I'll open an issue. |
| And I'm (correctly) seeing a MultiIndex for the |
| Will try again (I just fetched the head of this PR, didn't rebase myself as I thought you already did that) |
| @TomAugspurger It's with the rolling case I do not see a MultiIndex ( @jreback Tested it again (fetched this PR, rebased on latest master myself, rebuild pandas to be certain), and still getting the same result as I showed in #12743 (comment) (using python 2.7, numpy 1.10.1, Windows) |
| @jorisvandenbossche hmm, ok I did on 2.7/1.10.4 and it doesn't look right. odd. (on windows) |
| ok, this has been failing on windows on my branch : https://ci.appveyor.com/project/jreback/pandas/build/1.0.2077/job/qfrsv7p46f6ns6xg |
| ok, this is not a 2.7/numpy issue, but something specific to windows. |
| ok pushed a commit to fix. This comes back to some of the 'figuring what the shape of the return' is logic. IOW its a heuristic depending on whether something is mutated (this is the standard case). But when we are doing a chained window/resample we need to force this. |
| pushed a new version, had some flake/formatting issues on windows. |
209c013 to 190ecd0 Compare | @TomAugspurger I picked up that last name fix (was a bug in the concat step). |
closes pandas-dev#12738 BUG: allow df.groupby(...).resample(...) to return a Resampler groupby object closes pandas-dev#12486 BUG: consistency of name of returned groupby closes pandas-dev#12363
| @TomAugspurger @jorisvandenbossche any more comments. |
| Nothing else from me 👍 |
| ok will merge shortly unless @jorisvandenbossche has any further comments. |
| Tested on master, and now indeed the inconsistency is solved! Thanks |
| @jreback, thanks. I want to do grouby then rolling then corr on two columns, code below takes 18s+ on 1M rows: Is there a faster way to do this? i tried code but it does not work (got exceptions): Another question: how to ignore the group key (just keep the original index) for below code: |
closes #12738
closes #12486
closes #12363
- [ ] doc section in groupbywill do later