-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: Add fillna at the beginning of _where not to fill NA. #60729 #60772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
bc9a942 558569f bbbc720 e2f32cb 6fd8986 d2d5f62 475f2d1 db30b58 55fe420 cb94cf7 71e442e b6bd3af 8bac997 89bc1b4 f154cf5 eed6121 9ac81f0 7e3fd3a 8c5ffff 5516517 2437ce2 9556aa4 b64b8a7 9574746 c073c0b 4eea08e bbc5612 0851593 98fb602 915b8a7 7611f59 88b0530 044d0a9 601f6c9 4528a54 acf805a 3239d2f f7a1e5a f3ed54d 16dda53 dfa96ac e557c05 7e0d4e0 85f35e9 e2df2d0 0526b02 ffd15b7 17b83cb 350b75d e7f8fa7 506f39d 8b0c385 23934ec 9cd462e File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -9698,15 +9698,19 @@ def _where( | |
| if axis is not None: | ||
| axis = self._get_axis_number(axis) | ||
| | ||
| cond = common.apply_if_callable(cond, self) | ||
| | ||
| # We should not be filling NA. See GH#60729 | ||
| if isinstance(cond, np.ndarray): | ||
| cond = np.array(cond) | ||
| cond[np.isnan(cond)] = True | ||
| cond[isna(cond)] = True | ||
| elif isinstance(cond, NDFrame): | ||
| cond = cond.fillna(True) | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Below we Detailss = Series([1.0, 2.0, 3.0, 4.0]) cond = Series([True, False]) print(s.mask(cond)) # 0 NaN # 1 2.0 # 2 NaN # 3 NaN # dtype: float64 s.mask(cond, inplace=True) print(s) # 0 NaN # 1 2.0 # 2 3.0 # 3 4.0 # dtype: float64 Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @rhshadrach , My suggestion is to add an argument to Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, that makes sense to me. But I believe that's only necessary to fix the bug mentioned above and not necessarily for this PR. If that is the case, I think it should be handled separately. | ||
| elif isinstance(cond, (list, tuple)): | ||
| ||
| cond = np.array(cond) | ||
| cond[isna(cond)] = True | ||
| | ||
| # align the cond to same shape as myself | ||
| cond = common.apply_if_callable(cond, self) | ||
| if isinstance(cond, NDFrame): | ||
| # CoW: Make sure reference is not kept alive | ||
| if cond.ndim == 1 and self.ndim == 2: | ||
| | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this trying to fill missing values when NaN is the missing value indicator? I don't think that is right either - the missing values should propogate for all types. We may just be missing coverage for the NaN case (which should be added to the test)
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, @WillAyd .
I thought we could make the values propagate by filling
condwithTrue, since_where()would finally keep the values inselfalive where itscondisTrue.Even if I don't fill those values here,
_wherewould callfillna()using inplace at the below code. That's also why the result varies depending on whetherinpalce=Trueor not.pandas/pandas/core/generic.py
Lines 9695 to 9698 in e3b2de8
Could you explain in more detail what you mean by propagate for all type? Do you mean we need to keep NA as it is even after this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @WillAyd,
I've done some further investigations on this, but I still belive the current code is the simplest way to make the missing values propagate.
If we want to let NA propagate without calling
fillna()here, there might be too many code changes needed. See below codes :where()ormask(). If we don't,fillna()will fill them withinplace.pandas/pandas/core/generic.py
Lines 9695 to 9698 in f1441b2
to_numpy()will fill the missing value usinginplacewhen cond is a DataFrame.pandas/pandas/core/generic.py
Lines 9703 to 9716 in f1441b2
extract_bool_array()fills the missing values using argna_value=FalseatEABackedBlock.where(), we might need to find every single NA index from cond before we call this function(using isna() for example) and then implement additional behaviour to make those values propagate atExtensionArray._where().pandas/pandas/core/internals/blocks.py
Lines 1664 to 1668 in f1441b2
pandas/pandas/core/array_algos/putmask.py
Lines 116 to 127 in f1441b2
If _where() is trying to fill the missing values for cond anyway, I think we don't necessarily have to disfavour the current code change. Could you give me some feedback?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd
By filling in the missing values on
condwith True, the missing value in the caller propagates. It's not filling in this missing values oncondthat then fails to properly propagate the caller's missing value.