API: Consolidate groupby as_index and group_keys

Everything in this issue also applies to Series.groupby and SeriesGroupBy; I will just be writing it for DataFrame.

Currently DataFrame.groupby has two arguments that are essentially for the same thing:

as_index: Whether to include the group keys in the index or, when the groupby is done on column labels (see #49519), in the columns.
group_keys: Whether to include the group keys in the index when calling DataFrameGroupBy.apply.

as_index only applies to reductions, while group_keys only applies to apply. I think this is confusing and unnecessarily restrictive.

I propose we

Deprecate both as_index and group_keys
Add keys_axis to both DataFrame.groupby and DataFrameGroupBy.apply. Both accept the same values; the only difference is that the value passed to DataFrameGroupBy.apply, if specified, overrides the value passed to DataFrame.groupby.
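The override rule in the second item can be sketched in plain Python. This is only a model of the proposed semantics, not pandas code: `resolve_keys_axis` and its parameter names are hypothetical, and `None` is assumed to stand for "not specified in apply".

```python
from typing import Optional, Union

KeysAxis = Union[str, int]


def resolve_keys_axis(
    groupby_value: KeysAxis = "infer",
    apply_value: Optional[KeysAxis] = None,
) -> KeysAxis:
    """Return the keys_axis that apply would use under the proposed rule.

    Hypothetical helper: None models "keys_axis was not passed to apply".
    """
    # A value passed to apply, if any, overrides the one given to groupby.
    # The check is `is None` (not truthiness) so that 0 is a valid override.
    return groupby_value if apply_value is None else apply_value


print(resolve_keys_axis("columns"))         # value from groupby is used
print(resolve_keys_axis("infer", "index"))  # value from apply overrides
```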

keys_axis can accept the following values:

"infer" (the default): One of the following behaviors, inferred from the computation depending on whether it is a reduction, transform, or filter.
"index" or 0: Add the keys to the index (similar to as_index=True or group_keys=True)
"columns" or 1: Add the keys to the columns (similar to as_index=False)
"none": Don't add the keys to either the index or the columns. For pandas methods (e.g. sum, cumsum, head), reductions will return a result with a RangeIndex, while transforms and filters will behave as they do today, returning the input's index (or a subset of it for a filter). For apply, this will behave the same as group_keys=False does today.

Unlike as_index, this argument will be respected in all groupby functions whether they be reductions, transforms, or filters.
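As a rough illustration of the accepted values, here is a small pure-Python model of where the keys would end up. It is not pandas code: `key_placement` is a hypothetical name, and the outcome of "infer" (index for reductions, neither for transforms and filters) is an assumption based on how those operations behave by default today.

```python
# Integer aliases listed in the proposal.
_ALIASES = {0: "index", 1: "columns"}
_VALID = {"infer", "index", "columns", "none"}

# Assumption: "infer" reproduces today's defaults for each kind of operation.
_INFERRED = {"reduction": "index", "transform": "none", "filter": "none"}


def key_placement(keys_axis="infer", kind="reduction"):
    """Return "index", "columns", or "none" for a given keys_axis and operation kind."""
    value = _ALIASES.get(keys_axis, keys_axis)
    if value not in _VALID:
        raise ValueError(f"invalid keys_axis: {keys_axis!r}")
    if kind not in _INFERRED:
        raise ValueError(f"unknown kind of operation: {kind!r}")
    # An explicit value is respected for every kind of operation;
    # only "infer" depends on the kind.
    return _INFERRED[kind] if value == "infer" else value


for kind in ("reduction", "transform", "filter"):
    print(kind, key_placement("infer", kind), key_placement("columns", kind))
```

The point of the model is the last line of the function: unlike as_index, an explicit value applies regardless of whether the operation is a reduction, transform, or filter.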

Path to implementation:

Add keys_axis in 2.0, and either add a PendingDeprecationWarning or a DeprecationWarning to as_index / group_keys
Change warnings for as_index / group_keys to a FutureWarning in 2.1
Enforce deprecations in 3.0
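A minimal sketch of what the warning stages could look like, using only the standard `warnings` module. `groupby_stub`, the sentinel `_NO_DEFAULT`, and the `warning_category` parameter are hypothetical stand-ins for illustration; this is not how pandas implements deprecations.

```python
import warnings

# Sentinel that distinguishes "argument not passed" from any real value.
_NO_DEFAULT = object()


def groupby_stub(
    keys_axis="infer",
    as_index=_NO_DEFAULT,
    group_keys=_NO_DEFAULT,
    warning_category=DeprecationWarning,
):
    """Hypothetical stand-in for DataFrame.groupby during the deprecation period."""
    for name, value in (("as_index", as_index), ("group_keys", group_keys)):
        # Warn only when the caller explicitly passed the old argument.
        if value is not _NO_DEFAULT:
            warnings.warn(
                f"{name} is deprecated; use keys_axis instead.",
                warning_category,
                stacklevel=2,
            )
    return keys_axis


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    groupby_stub(as_index=False)                                  # first stage
    groupby_stub(group_keys=True, warning_category=FutureWarning)  # later stage
    groupby_stub(keys_axis="index")                               # no warning

print([w.category.__name__ for w in caught])
```

Switching the category from DeprecationWarning to FutureWarning is what makes the warning visible to end users by default, which is why the two stages are separated.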

A few natural questions come to mind:

Why introduce a new argument, why not keep either as_index or group_keys?

Currently these arguments are Boolean, but the new argument needs to accept more than two values, and its name should reflect that it accepts an axis. Also, adding a new argument provides a cleaner and more gradual path for deprecation.

Why add keys_axis to DataFrameGroupBy.apply?

In other groupby methods, we can reliably use keys_axis="infer" to determine the correct placement of the keys. However, in apply, the behavior is inferred from the output, and various cases can coincide, e.g. a reduction and a transformation on a DataFrame with a single row. We want the user to be able to use "infer" on other groupby methods, but be able to specify how their UDF in apply acts. E.g.

    gb = df.groupby(["a", "b"], keys_axis="infer")
    print(gb.sum())  # Act as a reduction
    print(gb.head())  # Act as a filter
    print(gb.cumsum())  # Act as a transform
    # infer from the groupby call is not reliable here, allow user to specify how apply should act
    print(gb.apply(my_udf, keys_axis="index"))
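The single-row ambiguity can be seen even in a toy model that uses plain lists in place of groups. The two UDFs below are hypothetical, and comparing outputs is only a stand-in for whatever inference apply actually performs on the result.

```python
from itertools import accumulate


def reduce_udf(values):
    # Reduction: one output value per group.
    return [sum(values)]


def transform_udf(values):
    # Transform: one output value per input row (cumulative sum).
    return list(accumulate(values))


several_rows = [1, 2, 3]
single_row = [5]

# With several rows the outputs differ, so the kind of UDF can be told apart.
print(reduce_udf(several_rows), transform_udf(several_rows))

# With a single row both UDFs produce the same output, so inspecting the
# result alone cannot tell a reduction from a transform.
print(reduce_udf(single_row) == transform_udf(single_row))
```

This is the case where an explicit keys_axis passed to apply would remove the guesswork.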

Why should keys_axis accept the value "none"?

This is currently how transforms and filters work: the keys are added to neither the index nor the columns. We need to keep the ability to tell groupby(...).apply that the provided UDF acts as a transform or filter.

Why not name the argument group_keys_axis?

I find "group" here redundant, but would be fine with this name too, and happy to consider other potential names.

cc @pandas-dev/pandas-core @pandas-dev/pandas-triage

API: Consolidate groupby as_index and group_keys #49543