
Conversation

@jreback
Contributor

@jreback jreback commented Jun 27, 2019

closes #18502
replaces #26550

@jreback added the Groupby, Compat (pandas objects compatibility with NumPy or Python functions), and Categorical (Categorical Data Type) labels on Jun 27, 2019
@jreback added this to the 0.25.0 milestone on Jun 27, 2019
@jreback
Contributor Author

jreback commented Jun 27, 2019

@WillAyd #27071 (comment)

that test is not appropriate for checking the results of this change as most of those ops don't work on ordered categoricals; i have covered the most common of first/last/min/max above.
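
For readers following along, here is a minimal sketch of the kind of check described above: running first/last/min/max through a groupby on an ordered categorical column and looking at the resulting dtype. The frame, column names, and categories are made up for illustration and are not taken from the PR's tests; the exact behaviour may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data: an ordered categorical column grouped by a plain key.
levels = pd.Categorical(
    ["low", "high", "mid", "low"],
    categories=["low", "mid", "high"],
    ordered=True,
)
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "level": levels})

grouped = df.groupby("key")["level"]

# Collect the four reductions mentioned in the comment above.
results = {name: getattr(grouped, name)() for name in ["first", "last", "min", "max"]}

for name, out in results.items():
    # With the dtype preserved, each result should still report "category".
    print(name, out.dtype, list(out))
```

The point of the sketch is only the dtype column in the printed output: if the categorical dtype were lost, these reductions would come back as object.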

@codecov

codecov bot commented Jun 27, 2019

Codecov Report

Merging #27071 into master will decrease coverage by 1.38%.
The diff coverage is 93.1%.


@@            Coverage Diff             @@
##           master   #27071      +/-   ##
==========================================
- Coverage   92.04%   90.66%   -1.39%
==========================================
  Files         180      180
  Lines       50714    50727      +13
==========================================
- Hits        46680    45991     -689
- Misses       4034     4736     +702
Flag Coverage Δ
#multiple 90.66% <93.1%> (-0.02%) ⬇️
#single ?
Impacted Files Coverage Δ
pandas/core/groupby/generic.py 88.48% <100%> (-0.86%) ⬇️
pandas/core/groupby/ops.py 96% <100%> (ø) ⬆️
pandas/core/nanops.py 94.76% <100%> (ø) ⬆️
pandas/core/groupby/groupby.py 97.32% <100%> (+0.15%) ⬆️
pandas/core/internals/construction.py 96.21% <100%> (+0.25%) ⬆️
pandas/core/internals/blocks.py 94.95% <71.42%> (-0.19%) ⬇️
pandas/core/computation/pytables.py 62.5% <0%> (-27.75%) ⬇️
pandas/io/pytables.py 64.86% <0%> (-25.44%) ⬇️
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/computation/common.py 84.21% <0%> (-5.27%) ⬇️
... and 10 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d94146c...424c466. Read the comment docs.

@codecov

codecov bot commented Jun 27, 2019

Codecov Report

Merging #27071 into master will increase coverage by 50.06%.
The diff coverage is 93.1%.


@@             Coverage Diff             @@
##           master   #27071       +/-   ##
===========================================
+ Coverage   41.96%   92.02%   +50.06%
===========================================
  Files         180      180
  Lines       50707    50727       +20
===========================================
+ Hits        21277    46681    +25404
+ Misses      29430     4046    -25384
Flag Coverage Δ
#multiple 90.66% <93.1%> (?)
#single 41.85% <24.13%> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/core/groupby/generic.py 88.48% <100%> (+73.66%) ⬆️
pandas/core/groupby/ops.py 96% <100%> (+76.23%) ⬆️
pandas/core/nanops.py 94.76% <100%> (+63.17%) ⬆️
pandas/core/groupby/groupby.py 97.32% <100%> (+73.45%) ⬆️
pandas/core/internals/construction.py 96.21% <100%> (+31.81%) ⬆️
pandas/core/internals/blocks.py 94.95% <71.42%> (+41.89%) ⬆️
pandas/core/computation/pytables.py 90.24% <0%> (+0.3%) ⬆️
pandas/io/pytables.py 90.3% <0%> (+0.96%) ⬆️
pandas/core/panel.py 17.8% <0%> (+1.7%) ⬆️
pandas/util/_test_decorators.py 93.84% <0%> (+4.61%) ⬆️
... and 138 more

Powered by Codecov. Last update c1673cf...9b8f2b4. Read the comment docs.

try:
    result = self._holder._from_sequence(
        np.asarray(result).ravel(), dtype=dtype)
Contributor


I'm a bit concerned by the asarray here. Is that just so we can do the .ravel?

Consider a silly example like

df.groupby('key').apply(lambda x: x.array)

Will that end up hitting this, and so calling asarray and converting to ndarray?
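
To make the concern concrete, here is a small sketch of what `np.asarray` does to an extension array and what a round trip back to the original dtype looks like. It uses the public `pd.array` constructor as a stand-in for the private `_from_sequence` call in the diff, and the `Int64` data is an arbitrary assumption; it is not the code path from the PR.

```python
import numpy as np
import pandas as pd

# Hypothetical extension array with a missing value.
ea = pd.array([1, 2, None], dtype="Int64")

# np.asarray materialises the data as a plain NumPy array,
# so the extension type is gone at this point.
as_ndarray = np.asarray(ea)
print(type(ea).__name__, ea.dtype)
print(type(as_ndarray).__name__, as_ndarray.dtype)

# Rebuilding from the flattened ndarray with an explicit dtype restores it.
# pd.array is used here as a public analogue of _from_sequence.
roundtrip = pd.array(as_ndarray.ravel(), dtype=ea.dtype)
print(type(roundtrip).__name__, roundtrip.dtype)
```

The intermediate ndarray is the step the question is about: the extension type only survives if the values are converted back afterwards.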

Contributor Author


does this make sense?

df.groupby('A').B.apply(lambda x: x.array)
(Pdb) p df
   A                         B
0  1 2000-01-01 18:00:00-06:00
1  1 2000-01-01 18:00:00-06:00
2  2                       NaT
3  2                       NaT
4  3 1999-12-31 18:00:00-06:00
5  3 1999-12-31 18:00:00-06:00
6  1 2000-01-01 18:00:00-06:00
7  4 2000-01-02 18:00:00-06:00
(Pdb) p result
A
1    [2000-01-01 18:00:00-06:00, 2000-01-01 18:00:0...
2                                           [NaT, NaT]
3    [1999-12-31 18:00:00-06:00, 1999-12-31 18:00:0...
4                          [2000-01-02 18:00:00-06:00]
Name: B, dtype: object
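
A rough standalone version of the session above, for anyone who wants to try it outside the debugger. The frame is rebuilt by hand and uses UTC rather than the -06:00 zone shown in the output, so the values are an approximation of the original data, not a copy of it.

```python
import pandas as pd

# Assumed data: tz-aware timestamps with two missing values, grouped by "A".
stamps = pd.to_datetime(
    [
        "2000-01-02", "2000-01-02", None, None,
        "2000-01-01", "2000-01-01", "2000-01-02", "2000-01-03",
    ]
).tz_localize("UTC")
df = pd.DataFrame({"A": [1, 1, 2, 2, 3, 3, 1, 4], "B": stamps})

# Each group returns its underlying array, so the groups have different lengths.
result = df.groupby("A")["B"].apply(lambda x: x.array)

print(result)
print(result.dtype)
```

Because each group yields an array of a different length, the combined result holds one array per group rather than a flat datetime column.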
@jreback
Contributor Author

jreback commented Jun 27, 2019

@jbrockmendel
Member

Does this also preserve the dtypes under transpose?

Member

@WillAyd WillAyd left a comment


lgtm

@jreback
Contributor Author

jreback commented Jun 27, 2019

@jbrockmendel

> Does this also preserve the dtypes under transpose?

no, that's a more general issue
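
As an illustration of why transpose is a separate, more general question, here is a small sketch with a mixed-dtype frame. The column names and values are invented for the example; it only shows the generic behaviour of `DataFrame.T` when columns do not share a dtype, not anything specific to this PR.

```python
import pandas as pd

# Hypothetical frame with one integer column and one categorical column.
df = pd.DataFrame({"num": [1, 2], "cat": pd.Categorical(["a", "b"])})

# After transposing, each new column mixes an integer and a category value,
# so there is no single original dtype left to preserve.
transposed = df.T

print(df.dtypes)
print(transposed.dtypes)
```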

@jreback
Contributor Author

jreback commented Jun 27, 2019

home/vsts/work/1/s/doc/source/whatsnew/v0.25.0.rst:856: WARNING: Unknown interpreted text role "method".

@jorisvandenbossche any idea what this means?

@jreback merged commit ce86c21 into pandas-dev:master on Jun 27, 2019

result = ts.resample('3T').mean()
expected = Series([1, 4, 7],
                  index=pd.date_range('1/1/2000', periods=3, freq='3T'),
                  dtype='Int64')
Member


@jreback @jorisvandenbossche why returning Int64 here? I would expect float64 or Float64.

e.g. if we do ts[-1] += 1 before the resample, the mean comes back as float64.

Member


Yes, this should be Float64, because the results are all integer-like only by accident.
This is one of the cases that I listed in #37494
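
A sketch of the two cases being compared, in case it helps to see them side by side. The series is an assumption (nine one-minute points holding 0 through 8, chosen so the three-minute means are 1, 4, 7 as in the test), and `'3min'` is used in place of the `'3T'` alias from the test. The dtypes printed depend on the pandas version, so none are asserted here.

```python
import pandas as pd

# Assumed input: nine one-minute observations stored as nullable Int64.
index = pd.date_range("2000-01-01", periods=9, freq="min")
ts = pd.Series(range(9), index=index, dtype="Int64")

# Case 1: every 3-minute mean happens to be a whole number (1, 4, 7).
result = ts.resample("3min").mean()
print(result.dtype, list(result))

# Case 2: bump the last value so the final mean is no longer integer-like.
bumped = ts.copy()
bumped.iloc[-1] += 1
bumped_result = bumped.resample("3min").mean()
print(bumped_result.dtype, list(bumped_result))
```

If the two printed dtypes differ, the result dtype depends on the values rather than on the input dtype, which is the inconsistency raised above.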


Labels

Categorical (Categorical Data Type), Compat (pandas objects compatibility with NumPy or Python functions), Groupby

5 participants