-
- Notifications
You must be signed in to change notification settings - Fork 19.2k
[ArrayManager] Array version of putmask logic #44396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrayManager] Array version of putmask logic #44396
Conversation
| can the fallback logic stay in internals? maybe internals.methods? there are a few funcs in internals.blocks that might go there too. |
| mask=mask, | ||
| new=new, | ||
| ) | ||
| kwargs = {"mask": mask, "new": new} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not feasible to go through 'apply'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. So I forgot that ArrayManager.apply indeed already has this "align" logic as well. However, that is not used at the moment (only putmask and where define align keys, and those both still go through apply_with_block at the moment; this is also confirmed by codecov that it is unused).
But then I would rather remove it from apply, instead of going through apply, which will also prevent duplication. The reason for doing it in putmask: 1) apply is already quite complex (eg also dealing with ignoring failures), so being able to remove the "alignment" handling there would be nice (can still share this later with where), and 2) putmask can update the existing arrays, while apply always constructs a new manager, and 3) for the CoW branch I need to add copy-on-write logic to putmask, which is not needed for apply in general.
Yes, either way is fine for me. Happy to move it into internals. |
| from pandas.core.arrays._mixins import NDArrayBackedExtensionArray | ||
| | ||
| | ||
| def putmask_flexible(array: np.ndarray | ExtensionArray, mask, new): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is all of this logic from? this doesn't look like 'simple' wrappers to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's logic that for the BlockManager lives on the blocks (the Block.putmask version also calls those putmask_without_repeat and putmask_smart helpers from core.array_algos.putmask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great can u then remove from where it's used now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BlockManager is still used ..
I know this gives some duplication, but the parts that can be shared are already in pandas.core.array_algos.putmask. Both the Block version as this version are using those functions.
I can try to see if there are some more parts that could be moved into array_algos.putmask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I could easily remove most of the logic of ExtensionBlock.putmask and reuse the new helper function. For base Block.putmask that will be more difficult, as that directly involves splitting blocks if needed etc (so Block-specific logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me put this another way. you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
No, it is only for use within array_manager.py, but I put it in a separate internals/methods.py file to not make array_manager.py any longer (and on request of Brock I also didn't put it in /core/array_algos/putmask.py because this is the custom putmask logic for internals (with fallback logic etc that we need for Series/DataFrame), and not a general putmask algo)
| self: T, | ||
| f, | ||
| align_keys: list[str] | None = None, | ||
| align_keys: list[str] | None = None, # not used for ArrayManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worth having a separate one for AM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly? Separate signature for ArrayManager?
I initially remove this from the signature (since it is not used), but we have this method as an abstract method on the base class manager, and thus it gives typing errors if I remove it here.
| from pandas.core.arrays._mixins import NDArrayBackedExtensionArray | ||
| | ||
| | ||
| def putmask_flexible(array: np.ndarray | ExtensionArray, mask, new): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me put this another way. you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
Can you elaborate on this a bit? If we're going to implement CoW for BM anyway, some of the duplication might be avoidable. |
See the 3 lines at https://github.com/pandas-dev/pandas/pull/41878/files#diff-b8c89656832340e9986eeee40df66eb4282555045a257e7a489d98b21b0adfbeR390-R393 that I currently added in that PR. Since putmask is used in indexing, I need to check for a copy-on-write (and this is not needed for
I think that in general the CoW detection on BlockManager will be a separate implementation (but of course difficult to say before actually trying). For example the simplest way is to do this on a block-by-block basis (which would already give different code as the 3 lines I referenced). |
| This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
| I increasingly think that this type of change should be left until right before BlockManager is ripped out. Until then, this just duplicates code and makes it easy for behavior to accidentally drift apart. |
| can you merge main |
| closing as stale |
xref #39146
Trying to remove the usage of
apply_with_blockfor putmask.(I also need to this for the Copy-on-Write branch to be able to perform the copy on write in putmask)