Skip to content

Conversation

@ghasemnaddaf
Copy link
Contributor

@ghasemnaddaf ghasemnaddaf commented Nov 14, 2017

If old_categories is empty (all nan categories) then _recode_for_categories
should return codes.copy() so that the writable flag is True.

if len(old_categories) == 0:
# All null anyway, so just retain the nulls
return codes
return codes.copy()
Copy link
Contributor Author

@ghasemnaddaf ghasemnaddaf Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return codes causes writable flag to be False hence we get the error reported in #18051

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18279 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@ Coverage Diff @@ ## master #18279 +/- ## ========================================== - Coverage 91.4% 91.38% -0.02%  ========================================== Files 164 164 Lines 49878 49878 ========================================== - Hits 45590 45581 -9  - Misses 4288 4297 +9
Flag Coverage Δ
#multiple 89.19% <ø> (ø) ⬆️
#single 39.41% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.75% <ø> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69472f9...b35f16e. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18279 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #18279 +/- ## ========================================== - Coverage 91.4% 91.38% -0.02%  ========================================== Files 164 164 Lines 49880 49880 ========================================== - Hits 45592 45583 -9  - Misses 4288 4297 +9
Flag Coverage Δ
#multiple 89.19% <100%> (ø) ⬆️
#single 39.42% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.75% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 148ed63...21734aa. Read the comment docs.

@jorisvandenbossche jorisvandenbossche added Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version labels Nov 14, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.21.1 milestone Nov 14, 2017

# all-nan categories GH 18051
cat_nan = Categorical([np.nan])
assert cat_nan.unique()._codes.flags.writeable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this are all tests about set_categories so it feels a bit strange to put this one in here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add a whatsnew note in 0.21.1


# all-nan categories GH 18051
cat_nan = Categorical([np.nan])
assert cat_nan.unique()._codes.flags.writeable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just assert the results of nunique

Copy link
Contributor Author

@ghasemnaddaf ghasemnaddaf Nov 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no nunique for categorical (?) What is being tested makes sense to me, that is the root cause was that the codes in the output of unique() was not writeable.

rebased to remove conflicts in whats new
@ghasemnaddaf ghasemnaddaf force-pushed the catDtype_copy_nan_codes branch from 9189c78 to 6c751c0 Compare November 15, 2017 00:32
@ghasemnaddaf
Copy link
Contributor Author

synced off master and rebased to remove conflict in whatsnew

@jorisvandenbossche
Copy link
Member

I would maybe add the original reported case of pd.Series(pd.Categorical([np.nan])).nunique() as a test as well? (eg in test_categorical.py::TestCategoricalAsBlock)

@topper-123
Copy link
Contributor

Works for me.

This issue is a bit too much under the hood for me to understand, so I can't speak to whether this is the best solution, but it works fine for me.

exp_cat = Categorical(["b", np.nan, "a"], categories=["b", "a"])
tm.assert_categorical_equal(res, exp_cat)

# GH 18051 unique()._codes should be writeable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you just need to compare the result of .unique() that's the user visible thing we are testing here. The non-writable issue is detail.

cat = Categorical([np.nan])
res = cat.unique()
exp_cat = Categorical([np.nan], categories=[])
tm.assert_categorical_equal(res, exp_cat)
Copy link
Contributor Author

@ghasemnaddaf ghasemnaddaf Nov 20, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback , do you mean like this? (This is not failing on 0.21.0 though)

Copy link
Contributor

@topper-123 topper-123 Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a test for

assert pd.Series(pd.Categorical([np.nan])).nunique() == 0

Note that nunique is a method on Series, not Categorical.

Copy link
Contributor Author

@ghasemnaddaf ghasemnaddaf Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@topper-123 topper-123 Nov 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great IMO.

Also, you mention above that Categorical([np.nan]).unique() doesn't fail in 0.21.0, but pd.Series(pd.Categorical([np.nan])).unique() does fail. So if you could add a test for that as well in the test for series, then IMO the PR is good (but let @jreback make the final call on that).

@topper-123
Copy link
Contributor

Hi,

I've looked into this again and IMO the tests should go in tests/series/test_analytics.py, as that's where Series.unique and Series.nunique() are tested. These test failures actually are not happening on Categoricals themselves, o tests shouldn't go into test_categorical.py.

So in method test_value_counts_nunique I'd add these lines:

# GH 18051 s = pd.Series(pd.Categorical([])) assert s.nunique() == 0 s = pd.Series(pd.Categorical([np.nan])) assert s.nunique() == 0

and in method test_unique I'd add:

# GH 18051 s = pd.Series(pd.Categorical([])) tm.assert_categorical_equal(s.unique(), pd.Categorical([]), check_dtype=False) s = pd.Series(pd.Categorical([np.nan])) tm.assert_categorical_equal(s.unique(), pd.Categorical([np.nan]), check_dtype=False)

and have no tests in test_categorical.py.

@jreback , do you agree?

@jreback
Copy link
Contributor

jreback commented Nov 22, 2017

@topper-123 suggestions look reasonable.

@jreback jreback removed this from the 0.21.1 milestone Nov 22, 2017
@topper-123
Copy link
Contributor

@jreback, is it too late to get this in 0.21.1?

@ghasemnaddaf, if you don't have time to finish this up, I could open a new PR in order to get this into 0.21.1. Alternatively, I'd be very happy if you could finish this up for 0.21.1.

@ghasemnaddaf
Copy link
Contributor Author

sorry @topper-123 im busy till weekend. Go for it thanks

@jreback
Copy link
Contributor

jreback commented Nov 22, 2017

superseded by #18436

@jreback jreback closed this Nov 22, 2017
@jreback jreback added this to the No action milestone Nov 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version

4 participants