@@ -10,6 +10,61 @@ including other versions of pandas.
10 10
11 11 .. ---------------------------------------------------------------------------
12 12
13+ .. _whatsnew_151.groupby_categorical_regr:
14+
15+ Behavior of ``groupby`` with categorical groupers (:issue:`48645`)
16+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
17+
18+ In versions of pandas prior to 1.5, ``groupby`` with ``dropna=False`` would still drop
19+ NA values when the grouper was a categorical dtype. A fix for this was attempted in
20+ 1.5; however, it introduced a regression where passing ``observed=False`` and
21+ ``dropna=False`` to ``groupby`` would result in only observed categories. The patch
22+ fixing the ``dropna=False`` bug was found to be incompatible with ``observed=False``,
23+ and it was decided that the best resolution is to restore the correct ``observed=False``
24+ behavior at the cost of reintroducing the ``dropna=False`` bug.
25+
26+ .. ipython:: python
27+
28+     df = pd.DataFrame(
29+         {
30+             "x": pd.Categorical([1, None], categories=[1, 2, 3]),
31+             "y": [3, 4],
32+         }
33+     )
34+     df
35+
36+ *1.5.0 behavior*:
37+
38+ .. code-block:: ipython
39+
40+     In [3]: # Correct behavior, NA values are not dropped
41+             df.groupby("x", observed=True, dropna=False).sum()
42+     Out[3]:
43+          y
44+     x
45+     1    3
46+     NaN  4
47+
48+
49+     In [4]: # Incorrect behavior, only observed categories present
50+             df.groupby("x", observed=False, dropna=False).sum()
51+     Out[4]:
52+          y
53+     x
54+     1    3
55+     NaN  4
56+
57+
58+ *1.5.1 behavior*:
59+
60+ .. ipython:: python
61+
62+     # Incorrect behavior, NA values are dropped
63+     df.groupby("x", observed=True, dropna=False).sum()
64+
65+     # Correct behavior, unobserved categories present (NA values still dropped)
66+     df.groupby("x", observed=False, dropna=False).sum()
67+
13 68 .. _whatsnew_151.regressions:
14 69
15 70 Fixed regressions