Skip to content

Conversation

@alysivji
Copy link
Contributor

Pretty straightforward fix in the __setitem__ method

While I was looking into this, I discovered a bug that does not allow for the creation of mixed-dtype Categoricals, if the first element is not a tuple. Should I create an issue?

In [20]: s = pd.Categorical([('a', 'a'), ('a', 'b'), ('b', 'a'), 'c']) In [21]: s Out[21]: [(a, a), (a, b), (b, a), c] Categories (4, object): [(a, a), (a, b), (b, a), c] In [22]: s = pd.Categorical(['c', ('a', 'b'), ('b', 'a'), 'c']) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-22-0f4b5f338532> in <module>() ----> 1 s = pd.Categorical(['c', ('a', 'b'), ('b', 'a'), 'c']) ~/Documents/siv-dev/projects/open-source/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)  328 # _sanitize_array coerces np.nan to a string under certain versions  329 # of numpy --> 330 values = maybe_infer_to_datetimelike(values, convert_dates=True)  331 if not isinstance(values, np.ndarray):  332 values = _convert_to_list_like(values) ~/Documents/siv-dev/projects/open-source/pandas/pandas/core/dtypes/cast.py in maybe_infer_to_datetimelike(value, convert_dates)  893 if not is_list_like(v):  894 v = [v] --> 895 v = np.array(v, copy=False)  896  897 # we only care about object dtypes ValueError: setting an array element with a sequence
@alysivji alysivji changed the title [BUG] Categorical.__setitem__ allows for tuple assignment BUG: Categorical.__setitem__ allows for tuple assignment Jun 10, 2018
@codecov
Copy link

codecov bot commented Jun 10, 2018

Codecov Report

Merging #21412 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #21412 +/- ## ========================================== + Coverage 91.89% 91.9% +<.01%  ========================================== Files 153 153 Lines 49600 49603 +3 ========================================== + Hits 45580 45586 +6  + Misses 4020 4017 -3
Flag Coverage Δ
#multiple 90.3% <100%> (ø) ⬆️
#single 41.89% <60%> (+0.03%) ⬆️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.71% <100%> (+0.01%) ⬆️
pandas/tseries/offsets.py 97.24% <0%> (+0.24%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8eb4937...fa9f5d8. Read the comment docs.

@gfyoung gfyoung added Enhancement Categorical Categorical Data Type labels Jun 10, 2018

rvalue = value if is_list_like(value) else [value]

# is the new value one of the defined categories?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.


pytest.raises(ValueError, f)

def test_setitem_with_tuple_categories(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference the issue as a comment.

@gfyoung
Copy link
Member

gfyoung commented Jun 10, 2018

Should I create an issue?

@alysivji : Yes, by all means!

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

cc @jreback

@alysivji
Copy link
Contributor Author

With the 0.23.1 release last night, moved my whatsnew entry to 0.23.2

from pandas import Index
to_add = Index(rvalue).difference(self.categories)
if isinstance(value, tuple):
rvalue = [value]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the big problem I have here is that this should be a TypeError if the categories are not actual tuples. We dont' have an easy / reliable way to check this as its just an object Index. This is adding more and more technical debt on top of tupleize_cols.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna look into this more over the weekend / next week.

Thinking of sets, but want to understand tupleize_cols before I go another way.

@alysivji alysivji closed this Jul 16, 2018
@alysivji alysivji deleted the categorical-set-item branch June 17, 2024 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Categorical Categorical Data Type Enhancement

3 participants