Skip to content

Conversation

@Giftlin
Copy link
Contributor

@Giftlin Giftlin commented Sep 13, 2017

res = method(*args, **kwargs)
if res is not None:
return Series(res, index=self.index)
return Series(res, index=self.index, name=self.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread the earlier. self is the instance of CategoricalAccessor, not the Series like I expected. So this approach won't work.

Copy link
Contributor

@TomAugspurger TomAugspurger Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you modify

return CategoricalAccessor(data.values, data.index)
to get data.name. and then modify CategoricalAccessor.__init__ to accept a name. data may not have a name, so maybe getattr(data, 'name', None). Make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the signature should follow like we do in pandas.core.indexes.accessors, e.g. (data, index, name=None)

@TomAugspurger
Copy link
Contributor

Also, if you could add new tests to ensure the name is passed through, and a release note in doc/source/whatsnew/v0.21.0.txt under bug fixes.

@pep8speaks
Copy link

pep8speaks commented Sep 13, 2017

Hello @Giftlin! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 18, 2017 at 11:36 Hours UTC
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests!

res = method(*args, **kwargs)
if res is not None:
return Series(res, index=self.index)
return Series(res, index=self.index, name=self.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the signature should follow like we do in pandas.core.indexes.accessors, e.g. (data, index, name=None)

@jreback jreback added the Categorical Categorical Data Type label Sep 13, 2017
@codecov
Copy link

codecov bot commented Sep 13, 2017

Codecov Report

Merging #17517 into master will decrease coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #17517 +/- ## ========================================== - Coverage 91.18% 91.16% -0.02%  ========================================== Files 163 163 Lines 49543 49544 +1 ========================================== - Hits 45177 45169 -8  - Misses 4366 4375 +9
Flag Coverage Δ
#multiple 88.95% <100%> (ø) ⬆️
#single 40.21% <75%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.52% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f11bbf2...b817366. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 13, 2017

Codecov Report

Merging #17517 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #17517 +/- ## ========================================== - Coverage 91.25% 91.2% -0.05%  ========================================== Files 163 163 Lines 49606 49625 +19 ========================================== - Hits 45266 45261 -5  - Misses 4340 4364 +24
Flag Coverage Δ
#multiple 88.99% <100%> (-0.03%) ⬇️
#single 40.19% <75%> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/core/categorical.py 95.57% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/tseries/offsets.py 97% <0%> (-0.18%) ⬇️
pandas/core/frame.py 97.77% <0%> (-0.1%) ⬇️
pandas/core/indexes/interval.py 93.57% <0%> (ø) ⬆️
pandas/core/indexes/datetimes.py 95.53% <0%> (ø) ⬆️
pandas/core/api.py 100% <0%> (ø) ⬆️
pandas/core/resample.py 96.17% <0%> (+0.01%) ⬆️
pandas/plotting/_core.py 82.73% <0%> (+0.03%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 328c7e1...ce78e6c. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2017

contrib docs are here. pls add tests IN this PR. It should fail with the tests, then pass after the fix.

@jorisvandenbossche jorisvandenbossche changed the title FIX for set_categories issue #17509 BUG: preserve name in set_categories (#17509) Sep 15, 2017
- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)
- Bug in preserving name in ``set_categories``. (:issue:`17509`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this more descriptive? Maybe

Bug in categorical operations on Series like `Series.cat.set_categories` not preserving the original Series' name (:issue:`17509`) 
tm.assert_numpy_array_equal(result, expected)

def test_getname_category(self):
result = 'A'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result should be the result of the operation. So. result = s.cat.set_categories([1, 2, 3]).name. And then compare directly to the expected name. assert result == 'A'

s = s.cat.set_categories([1, 2, 3])
expected = s.astype('category').name
tm.assert_almost_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add additional tests for the other delegated methods that this should apply to? set_ordered, etc?

expected = 'A'
s = pd.Series([1, 2, 3], name='A').astype('category')
s = s.cat.set_categories([1, 2, 3])
result = s.astype('category').name
Copy link
Contributor

@TomAugspurger TomAugspurger Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be result = s.cat.set_categories([1, 2, 3]).name so that we're directly testing the behavior that caused the bug.

the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)

- Bug in categorical operations on Series like `Series.cat.set_categories`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it should be double backticks, not single

``Series.cat.set_categories`` 
expected = c[np.array([100000]).astype(np.int64)].codes
tm.assert_numpy_array_equal(result, expected)

def test_getname_category(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parametrize with all accessor methods

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a lambda expression

@pytest.mark.parametrize("method", [ lambda x: x.cat.set_categories([1, 2,3 ]), lambda x: x.cat.reorder_categories([2,3, 1], ordered=True), ]) def test_getname_categorical_accessor(self, method): s = .... expected = 'A' result = method(s).name assert result == expected 
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current test function is also fine only right? What is the difference? Performance?

the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)

- Bug in categorical operations on Series like ``Series.cat.set_categories``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should instead refer to all Series.cat operations not preserving name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I mention all the functions along with set_categories here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected = c[np.array([100000]).astype(np.int64)].codes
tm.assert_numpy_array_equal(result, expected)

@pytest.mark.parametrize("method",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this is passing linting I would write this like

@pytest.mark.parametrize( "method", [ lambda ..... ]) 

to get more indent

@jreback
Copy link
Contributor

jreback commented Sep 17, 2017

ok this looks fine to me (after fixing linting). ping on green.

@jreback jreback added the Bug label Sep 17, 2017
@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Sep 18, 2017
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!
Just a minor comment about how to link to a section in the docs.

- Bug in the categorical constructor with empty values and categories causing
the ``.categories`` to be an empty ``Float64Index`` rather than an empty
``Index`` with object dtype (:issue:`17248`)
- Bug in categorical operations `Series.cat <http://pandas.pydata.org/pandas-docs/stable/categorical.html#working-with-categories>' not preserving the original Series' name (:issue:`17509`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sphinx has the ability to do local references (see http://www.sphinx-doc.org/en/stable/markup/inline.html#cross-referencing-arbitrary-locations), so you can do :ref:`Series.cat <categorical.cat>` instead of listing the actual full url.
You only need to add a label at the specific location (put .. _categorical.cat: on the line before the title):

Working with categories
-----------------------

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the expected change right?

screenshot_2017-09-18-13-25-56-574_com android browser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

s = pd.Series([1, 2, 3], name='A').astype('category')
expected = 'A'
result = method(s).name
assert result == expected
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, construct the expected Series directly from the constructor and call tm.assert_series_equal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Series is not expected. Only the name is expected, which is a character

Copy link
Member

@gfyoung gfyoung Sep 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's because you defined your test as such. The output of method is a Series, not a character. I'm saying that you should check the output of method instead of comparing its name attribute.

We want to make sure that nothing else changes and that the name parameter is preserved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung In this case asserting only the name is fine I think. Assuming the actual methods are already tested separately, this tests just asserts for all of them they preserve the name. If checking the actual resulting series, all of them would need a different result, so would just over-complicate the test.

@Giftlin
Copy link
Contributor Author

Giftlin commented Sep 18, 2017

@jreback
@TomAugspurger
checks have passed

@jorisvandenbossche jorisvandenbossche merged commit 9cc3333 into pandas-dev:master Sep 18, 2017
@jorisvandenbossche
Copy link
Member

@Giftlin Thanks!

(I added a small commit with some white-space clean-up. Not sure what went wrong, but the diff showed changes in whitespace on some lines. You might need to check the settings of your editor to ensure it only uses spaces for whitespace)

@Giftlin Giftlin deleted the patch-4 branch September 18, 2017 18:01
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Categorical Categorical Data Type

6 participants