Description
Pandas version checks
- I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
Documentation problem
When you use df.apply with raw=True, you can get an error if the applied function returns None for some elements, because the underlying NumPy code infers the dtype of the result array from the first returned value.
Example:
    import pandas as pd
    from typing import Optional

    def func(a: int) -> Optional[int]:
        if a % 3 == 0:
            return 1
        if a % 3 == 1:
            return 0
        else:
            return None

    df = pd.DataFrame([[1], [2], [3], [4], [5], [6]])
    print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

This will raise an error:

    TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

On the other hand, if the first returned value is None, NumPy creates an array of dtype object, which can hold either int or None:

    df = pd.DataFrame([[2], [3], [4], [5], [6]])
    print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

will return

    0    None
    1       1
    2       0
    3    None
    4       1
    dtype: object

Suggested fix for documentation
Explain that the function must not return None for only some elements when raw=True,
or treat this as a bug and fix it (e.g. allow the dtype of the result ndarray to be specified explicitly).
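
The dtype inference described in this report can be reproduced with NumPy alone. The sketch below assumes that the raw=True path ends up calling np.apply_along_axis, which allocates its output buffer from the first result; that is my reading of the behaviour, not something confirmed in the docs. The helper `apply_rows` and its `as_object` flag are hypothetical names used only for illustration. Wrapping each result in a 0-d object array is one possible workaround, not an official recommendation.

```python
import numpy as np


def func(a):
    """Same logic as the example in this report: returns 1, 0, or None."""
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


def apply_rows(values, as_object=False):
    """Apply func to the first element of each row with np.apply_along_axis.

    Hypothetical helper. With as_object=True every result is wrapped in a
    0-d object array, so the output buffer is allocated with dtype=object
    no matter what the first returned value is.
    """
    def per_row(row):
        result = func(row[0])
        if as_object:
            return np.array(result, dtype=object)
        return result

    return np.apply_along_axis(per_row, 1, values)


if __name__ == "__main__":
    values = np.array([[1], [2], [3]])
    try:
        apply_rows(values)  # first result is an int, a later one is None
    except TypeError as exc:
        print("TypeError:", exc)
    print(apply_rows(values, as_object=True))
```

If the assumption holds, the same wrapping inside the function passed to df.apply(..., raw=True) may avoid the error, but I have not verified that against pandas itself. Using the default raw=False is the simpler option when None is a legitimate return value.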