Description
Follow-up issue on #18577
In that PR @jreback cleaned up the result shape inconsistencies of `apply(..., axis=1)`, and we added a `result_type` keyword to control this.
For example, when the applied function returns an array or a list, it now defaults to returning a Series of those objects, and expands the result to multiple columns only if you pass `result_type` explicitly:
```python
In [1]: df = pd.DataFrame(np.tile(np.arange(3), 4).reshape(4, -1) + 1, columns=['A', 'B', 'C'], index=pd.date_range("2012-01-01", periods=4))

In [2]: df
Out[2]:
            A  B  C
2012-01-01  1  2  3
2012-01-02  1  2  3
2012-01-03  1  2  3
2012-01-04  1  2  3

In [3]: df.apply(lambda x: np.array([0, 1, 2]), axis=1)
Out[3]:
2012-01-01    [0, 1, 2]
2012-01-02    [0, 1, 2]
2012-01-03    [0, 1, 2]
2012-01-04    [0, 1, 2]
Freq: D, dtype: object

In [4]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='expand')
Out[4]:
            0  1  2
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2

In [5]: df.apply(lambda x: np.array([0, 1, 2]), axis=1, result_type='broadcast')
Out[5]:
            A  B  C
2012-01-01  0  1  2
2012-01-02  0  1  2
2012-01-03  0  1  2
2012-01-04  0  1  2
```

However, for `axis=0`, the default, we don't yet follow the same rules / the keyword in all cases. Some examples:
- For a list, it depends on the length (and if the length matches, it preserves the original index instead of using a new range index):

  ```python
  In [16]: df.apply(lambda x: [0, 1, 2, 3])
  Out[16]:
              A  B  C
  2012-01-01  0  0  0
  2012-01-02  1  1  1
  2012-01-03  2  2  2
  2012-01-04  3  3  3

  In [17]: df.apply(lambda x: [0, 1, 2, 3, 4])
  Out[17]:
  A    [0, 1, 2, 3, 4]
  B    [0, 1, 2, 3, 4]
  C    [0, 1, 2, 3, 4]
  dtype: object
  ```

  (`result_type='expand'` and `result_type='broadcast'` do work correctly here)

- For an array, it expands even when the length does not match (so different from `axis=1`, and also different from a list):

  ```python
  In [23]: df.apply(lambda x: np.array([0, 1, 2, 3]))
  Out[23]:
              A  B  C
  2012-01-01  0  0  0
  2012-01-02  1  1  1
  2012-01-03  2  2  2
  2012-01-04  3  3  3

  In [24]: df.apply(lambda x: np.array([0, 1, 2, 3, 4]))
  Out[24]:
     A  B  C
  0  0  0  0
  1  1  1  1
  2  2  2  2
  3  3  3  3
  4  4  4  4
  ```
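To check how a given pandas installation behaves for these cases, a small survey script along the following lines may help. It is only a sketch: `summarize_apply`, `cases`, and `report` are names made up for this illustration, not pandas API. The results depend on the installed version, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Same frame as in the examples above.
df = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)


def summarize_apply(frame, func, axis):
    """Hypothetical helper: describe what frame.apply(func, axis=axis) returns."""
    try:
        result = frame.apply(func, axis=axis)
    except Exception as exc:  # record failures instead of stopping the survey
        return f"raised {type(exc).__name__}"
    return f"{type(result).__name__} with shape {result.shape}"


# Return values from the examples: lists and arrays whose length does or
# does not match the number of rows (4).
cases = {
    "list, length 4": lambda x: [0, 1, 2, 3],
    "list, length 5": lambda x: [0, 1, 2, 3, 4],
    "array, length 4": lambda x: np.array([0, 1, 2, 3]),
    "array, length 5": lambda x: np.array([0, 1, 2, 3, 4]),
}

report = {
    (label, axis): summarize_apply(df, func, axis)
    for label, func in cases.items()
    for axis in (0, 1)
}

for (label, axis), summary in report.items():
    print(f"axis={axis}, {label}: {summary}")
```

Comparing the `axis=0` and `axis=1` rows of the printed report shows directly where the two axes disagree.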
So the question is: should we follow the same rules for axis=0 as for axis=1?
I would say: ideally yes. But doing so might break some existing behaviour (although it might be possible to make the change gradually by emitting warnings first).
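As a rough illustration of what a warning-based transition could look like from the user's side, here is a minimal sketch. The wrapper `apply_columns_with_warning` and the wording of its message are hypothetical and are not how pandas implements or plans to implement this. It only shows the general idea of warning when a column-wise `apply` gets a list-like return value.

```python
import warnings

import numpy as np
import pandas as pd


def apply_columns_with_warning(df, func):
    """Hypothetical wrapper (not pandas API): column-wise apply that warns
    when ``func`` returns a list-like, the case whose default might change."""
    if len(df.columns) > 0:
        # Probe the first column to see what kind of object func returns.
        # Note that this calls func one extra time.
        probe = func(df.iloc[:, 0])
        if isinstance(probe, (list, tuple, np.ndarray)):
            warnings.warn(
                "apply(..., axis=0) with a list-like return value may change "
                "its default result shape; pass result_type explicitly to "
                "keep the current behaviour.",
                FutureWarning,
                stacklevel=2,
            )
    return df.apply(func)


frame = pd.DataFrame(
    np.tile(np.arange(3), 4).reshape(4, -1) + 1,
    columns=["A", "B", "C"],
    index=pd.date_range("2012-01-01", periods=4),
)

# Emits a FutureWarning because the function returns a list.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    apply_columns_with_warning(frame, lambda col: [0, 1, 2, 3])
print([str(w.message) for w in caught if issubclass(w.category, FutureWarning)])
```

A real deprecation would live inside `apply` itself and would only need to warn in the cases where the new default actually differs from the old result.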