Description
Problem description
The shape returned by groupby.apply currently depends on which internal code path apply takes (fast apply vs. slow apply), and that choice is opaque to the user. The two simple examples below show the difference: the fast path returns the group key as the index, while the slow path returns the original index.
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 1], 'B': range(3)})

def slow(group):
    # slow apply because of check `result is input`, c.f. https://github.com/pandas-dev/pandas/blob/44782c0809e296a8e57b7f77d963e999c7e0f4a7/pandas/_libs/reduction.pyx#L494
    return group

def fast(group):
    return group.copy()

   ...: df.groupby("A").apply(slow)
   ...:
Out[3]:
   A  B
0  0  0
1  0  1
2  1  2

In [4]: df.groupby("B").apply(fast)
Out[4]:
     A  B
B
0 0  0  0
1 1  0  1

Expected Output
A transparent and consistent output data type. In particular, the behaviour should not be coupled to private, performance-related internals.
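The expectation above can be phrased as a small check: two functions that return the same data, one handing back the group itself and one handing back a copy, should produce results with the same index. The following is only a sketch under assumptions. The helper names `returns_input` and `returns_copy` are hypothetical, both calls group by "A", and the outcome may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1], "B": range(3)})


def returns_input(group):
    # Hypothetical helper: hands back the very object it received.
    return group


def returns_copy(group):
    # Hypothetical helper: same data, but a new object.
    return group.copy()


res_input = df.groupby("A").apply(returns_input)
res_copy = df.groupby("A").apply(returns_copy)

# The two results hold the same data, so a consistent apply would give
# them the same index regardless of the internal path taken.
same_index = bool(res_input.index.equals(res_copy.index))
print("indexes match:", same_index)
print(res_input.index)
print(res_copy.index)
```

If `same_index` is False, the result shape depends on whether the function returns its input or a copy, which is the coupling this issue describes.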
Related issues
There are many related issues, which may or may not have the same root cause. Here is an excerpt:
Output of pd.show_versions()
 INSTALLED VERSIONS
commit : None
 python : 3.8.1.final.0
 python-bits : 64
 OS : Darwin
 OS-release : 18.7.0
 machine : x86_64
 processor : i386
 byteorder : little
 LC_ALL : None
 LANG : None
 LOCALE : None.UTF-8
pandas : 1.0.0
 numpy : 1.16.5
 pytz : 2019.3
 dateutil : 2.8.1
 pip : 20.0.2
 setuptools : 45.1.0.post20200119
 Cython : None
 pytest : None
 hypothesis : None
 sphinx : None
 blosc : None
 feather : None
 xlsxwriter : None
 lxml.etree : None
 html5lib : None
 pymysql : None
 psycopg2 : None
 jinja2 : None
 IPython : 7.12.0
 pandas_datareader: None
 bs4 : None
 bottleneck : None
 fastparquet : None
 gcsfs : None
 lxml.etree : None
 matplotlib : None
 numexpr : None
 odfpy : None
 openpyxl : None
 pandas_gbq : None
 pyarrow : None
 pytables : None
 pytest : None
 pyxlsb : None
 s3fs : None
 scipy : None
 sqlalchemy : None
 tables : None
 tabulate : None
 xarray : None
 xlrd : None
 xlwt : None
 xlsxwriter : None
 numba : None