 
Description
Everything in this issue also applies to Series.groupby and SeriesGroupBy; I will just be writing it for DataFrame.
Currently, DataFrame.groupby has two arguments that serve essentially the same purpose:
- as_index: Whether to include the group keys in the index or, when the groupby is done on column labels (see #49519), in the columns.
- group_keys: Whether to include the group keys in the index when calling DataFrameGroupBy.apply.
as_index only applies to reductions, group_keys only applies to apply. I think this is confusing and unnecessarily restrictive.
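To make the split concrete, here is a short sketch of today's behavior, assuming a recent pandas 2.x release. The frame df is illustrative data, not taken from the issue. It shows as_index changing a reduction but not a transform, and group_keys changing apply but not a reduction.

```python
import pandas as pd

# Illustrative data: "a" is the grouping column.
df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})

# as_index affects reductions: keys go to the index (default) or to the columns.
reduced_index = df.groupby("a").sum()
reduced_columns = df.groupby("a", as_index=False).sum()
print(reduced_index.index.name)       # a
print(list(reduced_columns.columns))  # ['a', 'b']

# as_index has no effect on a transform: no key column, and the input's index is kept.
transformed = df.groupby("a", as_index=False).cumsum()
print(list(transformed.columns))      # ['b']

# group_keys has no effect on a reduction: the keys still land in the index.
print(df.groupby("a", group_keys=False).sum().index.name)  # a

# group_keys affects apply: the keys are prepended to the result's index, or not.
with_keys = df.groupby("a", group_keys=True)[["b"]].apply(lambda g: g * 2)
without_keys = df.groupby("a", group_keys=False)[["b"]].apply(lambda g: g * 2)
print(with_keys.index.nlevels)        # 2
print(without_keys.index.nlevels)     # 1
```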
I propose we:
- Deprecate both as_index and group_keys.
- Add keys_axis to both DataFrame.groupby and DataFrameGroupBy.apply. Both accept the same values; the only difference is that the value passed to DataFrameGroupBy.apply, if specified, overrides the value passed to DataFrame.groupby.
keys_axis can accept the following values:
- "infer" (the default): One of the behaviors below, inferred from whether the computation is a reduction, transform, or filter.
- "index" or 0: Add the keys to the index (similar to as_index=True or group_keys=True).
- "columns" or 1: Add the keys to the columns (similar to as_index=False).
- "none": Add the keys to neither the index nor the columns. For pandas methods (e.g. sum, cumsum, head), reductions will return a RangeIndex, while transforms and filters will behave as they do today, returning the input's index (or, for a filter, a subset of it). For apply, this will behave the same as group_keys=False today.
Unlike as_index, this argument will be respected in all groupby functions whether they be reductions, transforms, or filters.
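For a reduction, the explicit options "index", "columns", and "none" can be emulated on top of today's API, which helps pin down what each one should return. The sketch below uses a hypothetical helper, place_keys (not a pandas function), that post-processes a reduction computed with the default as_index=True. It illustrates the intended semantics only, does not cover "infer", and is not how the proposal would be implemented inside pandas.

```python
import pandas as pd


def place_keys(reduced: pd.DataFrame, keys_axis="index") -> pd.DataFrame:
    """Hypothetical helper: move the group keys of a reduction (computed with
    the default as_index=True) to where the proposed keys_axis says they go."""
    if keys_axis in ("index", 0):
        return reduced                          # keys stay in the index
    if keys_axis in ("columns", 1):
        return reduced.reset_index()            # keys become leading columns
    if keys_axis == "none":
        return reduced.reset_index(drop=True)   # keys dropped, a RangeIndex remains
    raise ValueError(f"unsupported keys_axis: {keys_axis!r}")


df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})
reduced = df.groupby("a").sum()

# "columns" reproduces today's as_index=False output.
pd.testing.assert_frame_equal(
    place_keys(reduced, "columns"), df.groupby("a", as_index=False).sum()
)
print(place_keys(reduced, "none"))
```

Note that "none" is the one option with no as_index equivalent today: the keys are dropped and only a RangeIndex remains.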
Path to implementation:
- Add keys_axis in 2.0, and add either a PendingDeprecationWarning or a DeprecationWarning to as_index / group_keys.
- Change the warnings for as_index / group_keys to a FutureWarning in 2.1.
- Enforce the deprecations in 3.0.
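One way the deprecation could surface to users is a small translation step that warns and maps each deprecated Boolean onto its keys_axis equivalent, following the definitions of as_index and group_keys (as_index=False corresponds to "columns", group_keys=False to "none", and True to "index" for both). The names translate_legacy_arg and _LEGACY_TO_KEYS_AXIS are hypothetical, not pandas internals. The warning class is a parameter so the same shim covers each phase of the plan.

```python
import warnings

# Hypothetical mapping from each deprecated Boolean to its keys_axis equivalent.
_LEGACY_TO_KEYS_AXIS = {
    ("as_index", True): "index",
    ("as_index", False): "columns",
    ("group_keys", True): "index",
    ("group_keys", False): "none",
}


def translate_legacy_arg(name: str, value: bool, warning_cls=DeprecationWarning) -> str:
    """Warn that a deprecated argument was passed and return the keys_axis to use instead."""
    keys_axis = _LEGACY_TO_KEYS_AXIS[(name, bool(value))]
    warnings.warn(
        f"{name}={bool(value)} is deprecated; use keys_axis={keys_axis!r} instead",
        warning_cls,
        stacklevel=2,  # point the warning at the caller, not at this helper
    )
    return keys_axis


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, including DeprecationWarning
    print(translate_legacy_arg("group_keys", False))               # none
    print(translate_legacy_arg("as_index", False, FutureWarning))  # columns
print([w.category.__name__ for w in caught])  # ['DeprecationWarning', 'FutureWarning']
```

PendingDeprecationWarning, the other option mentioned for 2.0, can be passed as warning_cls in the same way.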
 
A few natural questions come to mind:
- Why introduce a new argument; why not keep either as_index or group_keys?
Currently these arguments are Boolean, while the new argument needs to accept more than two values, and its name should reflect that it accepts an axis. Also, adding a new argument provides a cleaner and more gradual deprecation path.
- Why add keys_axis to DataFrameGroupBy.apply?
In other groupby methods, we can reliably use keys_axis="infer" to determine the correct placement of the keys. In apply, however, the behavior can only be inferred from the UDF's output, and different cases can produce identical output, e.g. a reduction and a transformation on a DataFrame with a single row. We want users to be able to rely on "infer" for other groupby methods while still being able to specify how their UDF in apply acts. For example:

```python
# Proposed API: keys_axis does not exist in pandas yet; df and my_udf stand in for the user's data and UDF.
gb = df.groupby(["a", "b"], keys_axis="infer")
print(gb.sum())     # Acts as a reduction
print(gb.head())    # Acts as a filter
print(gb.cumsum())  # Acts as a transform
# Inferring from the groupby call is not reliable here, so let the user specify how apply should act
print(gb.apply(my_udf, keys_axis="index"))
```

- Why should keys_axis accept the value "none"?
This is how transforms and filters currently work: the keys are added to neither the index nor the columns. We need to keep the ability to tell groupby(...).apply that the UDF it is given acts as a transform or filter.
- Why not name the argument group_keys_axis?
I find "group" redundant here, but I would be fine with this name too, and I am happy to consider other names.
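The apply ambiguity raised in the question about DataFrameGroupBy.apply can be seen without calling groupby at all. The sketch below applies three operations a user might write inside an apply UDF, each with a different intent, to a single-row group. The group and the operations are illustrative choices, not pandas internals.

```python
import pandas as pd

# A single group containing exactly one row (index label 7).
group = pd.DataFrame({"b": [10]}, index=[7])

# Three UDF bodies a user might pass to apply, each with a different intent:
last_row = group.iloc[[-1]]  # intended as a reduction (like gb.last())
first_row = group.head(1)    # intended as a filter (like gb.head(1))
running = group.cumsum()     # intended as a transform (like gb.cumsum())

# The outputs are identical, so apply cannot tell the intent from the output alone.
print(last_row.equals(first_row) and first_row.equals(running))  # True

# With two rows, the reduction-like and transform-like outputs differ again.
group2 = pd.DataFrame({"b": [10, 20]}, index=[7, 8])
print(group2.iloc[[-1]].equals(group2.cumsum()))  # False
```

Because the result depends on group sizes in the data rather than on the UDF, an explicit keys_axis on apply lets the user state the intent once.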
cc @pandas-dev/pandas-core @pandas-dev/pandas-triage