Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
It is a pretty common occurrence to have leading and trailing NaN values in a table or DataFrame. This is particularly true after joins and in timeseries data.
    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame({
        'a': [1, 2, 3, 4, 5, 6],
        'b': [np.nan, 2, np.nan, 4, 5, np.nan],
    })

    Out[0]:
       a    b
    0  1  NaN
    1  2  2.0
    2  3  NaN
    3  4  4.0
    4  5  5.0
    5  6  NaN

See this Stack Overflow question for more examples (and common workarounds).
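For reference, one workaround for a single column uses the existing `first_valid_index` and `last_valid_index` methods. This is only a sketch of that approach on the example frame above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})

# Index labels of the first and last non-NaN entries in column "b".
first = df1["b"].first_valid_index()
last = df1["b"].last_valid_index()

# .loc label slicing includes both endpoints, so interior NaNs are kept.
trimmed = df1.loc[first:last]
print(trimmed)
```

Note that both methods return `None` for an all-NaN column, in which case the slice keeps every row, so this workaround needs an extra check for that case.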
Feature Description
Potential solution:
    df1.stripna()

    Out[2]:
       a    b
    1  2  2.0
    2  3  NaN
    3  4  4.0
    4  5  5.0

Potential kwargs to pass could be:
- how : {‘any’, ‘all’}, default ‘any’
- axis : {0 or ‘index’, 1 or ‘columns’}, default 0
- subset : column label or sequence of labels, optional
- limit_direction : {‘forward’, ‘backward’, ‘both’}, optional
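To make the proposed behaviour concrete, here is a minimal sketch of a standalone helper. `stripna` is hypothetical and not part of pandas; the sketch assumes `how` and `subset` behave as they do in `dropna`, works on rows only, and leaves out `axis` and `limit_direction` because their exact semantics are not pinned down above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like


def stripna(df, how="any", subset=None):
    """Hypothetical helper (not a pandas API): trim leading/trailing NaN rows."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    # Restrict the NaN check to the requested columns, if any.
    if subset is None:
        data = df
    elif is_list_like(subset):
        data = df[list(subset)]
    else:
        data = df[[subset]]

    # A row counts as missing if any (or all) of the checked values are NaN.
    isna = data.isna()
    missing = isna.any(axis=1) if how == "any" else isna.all(axis=1)
    valid = ~missing.to_numpy()

    # True from the first valid row onward, and up to the last valid row.
    seen_from_top = np.logical_or.accumulate(valid)
    seen_from_bottom = np.logical_or.accumulate(valid[::-1])[::-1]

    # Keep everything between the first and last valid row, interior NaNs included.
    return df.loc[seen_from_top & seen_from_bottom]


df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})
print(stripna(df1))
```

With `subset="a"` the same call keeps all six rows, since column `a` has no NaN values.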
Alternative Solutions
Another solution would be to add limit_area (as in ffill, bfill and interpolate) to dropna. For users who already know ffill, bfill and interpolate, this is probably the more intuitive way to extend the API. However, I would imagine the source code behind dropna is written in an element-wise manner, so extending it might take a lot of work. That is just an uneducated guess.
For those coming from a more pure-Python background, stripna is pretty intuitive, since more people are familiar with str.strip.
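To illustrate the "inside"/"outside" distinction that `limit_area` draws in `interpolate`, the outside NaNs can be located with `ffill` and `bfill` alone. This is a rough sketch on the example data, not a description of how `dropna` is or would be implemented.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [np.nan, 2, np.nan, 4, 5, np.nan],
})
s = df1["b"]

# interpolate with limit_area="inside" fills only NaNs surrounded by valid values.
filled = s.interpolate(limit_area="inside")

# A NaN is "outside" if there is no valid value before it or none after it.
outside = s.ffill().isna() | s.bfill().isna()

# Dropping only the outside NaN rows gives the stripped frame.
stripped = df1[~outside]
print(stripped)
```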
Additional Context
No response