Open
Labels
Bug
Categorical: Categorical Data Type
Missing-data: np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
PDEP missing values: Issues that would be addressed by the Ice Cream Agreement from the Aug 2023 sprint
Description
From #27929 (comment). In __getitem__ or Categorical.min(..), we always return np.nan as the scalar missing value, regardless of the dtype of the categories:
In [7]: cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)

In [8]: cat
Out[8]:
[2012-01-01, NaT]
Categories (1, datetime64[ns]): [2012-01-01]

In [9]: cat[1]
Out[9]: nan

In [10]: cat.min(skipna=False)
Out[10]: nan

In the above, this could also be pd.NaT instead?
(A similar issue will come up once categoricals can use the extension arrays (EAs) that use the new NA scalar.)
However, CategoricalDtype.na_value now also returns np.nan (which should be consistent with what we return in the cases above):
In [13]: cat.dtype.na_value
Out[13]: nan

We can of course let CategoricalDtype.na_value depend on the na_value of the dtype of the categories. But I am not fully sure we want such values-dependent behaviour?
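To make the "dependent on the dtype of the categories" option concrete, here is a minimal sketch of how such a lookup might work. The helper name `categories_na_value` is hypothetical and not part of pandas; it only illustrates the idea and assumes that extension dtypes report their preferred scalar through `ExtensionDtype.na_value`, while plain NumPy datetime/timedelta dtypes map to `pd.NaT` and everything else to `np.nan`.

```python
import numpy as np
import pandas as pd


def categories_na_value(cat):
    """Hypothetical helper (not a pandas API): pick the missing-value
    scalar that matches the dtype of the categories."""
    cat_dtype = cat.categories.dtype
    # Extension dtypes (e.g. tz-aware datetimes, nullable integers)
    # expose their own missing-value scalar.
    if isinstance(cat_dtype, pd.api.extensions.ExtensionDtype):
        return cat_dtype.na_value
    # NumPy datetime64 ("M") and timedelta64 ("m") use NaT.
    if cat_dtype.kind in "mM":
        return pd.NaT
    # Fallback for all other NumPy dtypes.
    return np.nan


dt_cat = pd.Categorical([pd.Timestamp("2012"), None], ordered=True)
int_cat = pd.Categorical([1, 2, None])

dt_na = categories_na_value(dt_cat)    # pd.NaT for datetime categories
int_na = categories_na_value(int_cat)  # np.nan for integer categories
```

With this approach the scalar returned for a missing entry would differ between two categoricals that share the `category` dtype but have different categories, which is exactly the values-dependent behaviour questioned above.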