Closed
Labels
API Design; Categorical (Categorical Data Type); Dtype Conversions (Unexpected or buggy dtype conversions); Enhancement
Milestone
Description
A small, complete example of the issue
This is a proposal to allow something like

    df.astype({'A': pd.CategoricalDtype(['a', 'b', 'c', 'd'], ordered=True)})

Currently, it can be awkward to convert many columns in a DataFrame to a Categorical with control over the categories and orderedness. If you just want to use the defaults, it's not so bad with .astype:

    In [5]: df = pd.DataFrame({"A": list('abc'), 'B': list('def')})

    In [6]: df
    Out[6]:
       A  B
    0  a  d
    1  b  e
    2  c  f

    In [8]: df.astype({"A": 'category', 'B': 'category'}).dtypes
    Out[8]:
    A    category
    B    category
    dtype: object

If you need to control categories or ordered, you're best off with

    In [20]: mapping = {'A': lambda x: x.A.astype('category').cat.set_categories(['a', 'b'], ordered=True),
        ...:            'B': lambda x: x.B.astype('category').cat.set_categories(['d', 'f', 'e'], ordered=False)}

    In [21]: df.assign(**mapping)
    Out[21]:
         A  B
    0    a  d
    1    b  e
    2  NaN  f

By expanding astype to accept instances of Categorical, you remove the need for the lambdas and you can do conversions of other types at the same time.
This would mirror the semantics in #14503
Updated to change pd.Categorical(...) to a new/modified pd.CategoricalDtype(...) based on the discussion below.
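
For comparison, here is a minimal sketch of how the proposed spelling could replace the lambda-based mapping. It assumes a pandas version in which `pd.CategoricalDtype` accepts `categories` and `ordered` and can be passed to `DataFrame.astype` in a per-column dict. The extra column `C` is hypothetical, added only to illustrate converting another dtype in the same call.

```python
import pandas as pd

# Same frame as in the issue, plus a hypothetical integer column "C".
df = pd.DataFrame({"A": list("abc"), "B": list("def"), "C": [1, 2, 3]})

# One dict describes every conversion; no lambdas are needed.
dtypes = {
    "A": pd.CategoricalDtype(["a", "b"], ordered=True),
    "B": pd.CategoricalDtype(["d", "f", "e"], ordered=False),
    "C": "float64",  # a non-categorical conversion in the same call
}
converted = df.astype(dtypes)

# Values absent from the declared categories ("c" in column A) become NaN.
print(converted.dtypes)
print(converted)
```

As with the `set_categories` version, any value missing from the declared categories is converted to `NaN` rather than raising an error, so the category lists should be checked against the data first.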
jorisvandenbossche