Description
Feature Type

- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I have read the threads related to "DISCUSS/API: setitem-like operations should only update inplace" (#39584) and friends (including #47577).
My problem arises from this test code from the Pint-Pandas test suite:
```python
class TestSetitem(base.BaseSetitemTests):
    def test_setitem_2d_values(self, data):
        # GH50085
        original = data.copy()
        df = pd.DataFrame({"a": data, "b": data})
        df.loc[[0, 1], :] = df.loc[[1, 0], :].values
        assert (df.loc[0, :] == original[1]).all()
        assert (df.loc[1, :] == original[0]).all()
```
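For concreteness, a stand-alone version of the failing scenario might look like the sketch below. It is illustrative only and assumes pint_pandas is installed with the complex128 support from PR #54441; the unit, the values, and the use of `pd.array` with a `pint[...]` dtype string are arbitrary choices, not the exact fixture the test suite uses.

```python
# Illustrative reproduction sketch (not the actual test fixture); assumes
# pint_pandas with complex128 magnitude support per PR #54441.
import numpy as np
import pandas as pd
import pint_pandas  # noqa: F401  (registers the pint[...] extension dtype)

data = pd.array(
    np.array([1 + 1j, 2 + 2j], dtype="complex128"), dtype="pint[nanometer]"
)
df = pd.DataFrame({"a": data, "b": data})

# The row swap from the test above; per the traceback this is the operation
# that ends up in BlockManager.fast_xs.
df.loc[[0, 1], :] = df.loc[[1, 0], :].values
```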
As a result of PR #54441 I'm able to test Pint-Pandas with complex128 datatypes. PintArray EAs work perfectly with float and integer magnitudes, but fail in just this one case with complex128. The failure starts in `core/internals/managers.py`, in the `fast_xs` method:
```python
# GH#46406
immutable_ea = isinstance(dtype, ExtensionDtype) and dtype._is_immutable

if isinstance(dtype, ExtensionDtype) and not immutable_ea:
    cls = dtype.construct_array_type()
    result = cls._empty((n,), dtype=dtype)
```
`result` becomes a PintArray backed by a FloatingArray:

```
<PintArray>
[<NA>, <NA>]
Length: 2, dtype: pint[nanometer]

(Pdb) result._data
<FloatingArray>
[<NA>, <NA>]
Length: 2, dtype: Float64
```
The FloatingArray appears when the `PintArray` initializer finds nothing helpful in either `dtype` or `values` and falls back to creating a `pd.array(values, ...)`:
```python
def __init__(self, values, dtype=None, copy=False):
    if dtype is None:
        if isinstance(values, _Quantity):
            dtype = values.units
        elif isinstance(values, PintArray):
            dtype = values._dtype
    if dtype is None:
        raise NotImplementedError

    if not isinstance(dtype, PintType):
        dtype = PintType(dtype)
    self._dtype = dtype

    if isinstance(values, _Quantity):
        values = values.to(dtype.units).magnitude
    elif isinstance(values, PintArray):
        values = values._data
    if isinstance(values, np.ndarray):
        dtype = values.dtype
        if dtype in dtypemap:
            dtype = dtypemap[dtype]
        values = pd.array(values, copy=copy, dtype=dtype)
        copy = False
    elif not isinstance(values, pd.core.arrays.numeric.NumericArray):
        values = pd.array(values, copy=copy)
    if copy:
        values = values.copy()
    self._data = values
    self._Q = self.dtype.ureg.Quantity
```
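For reference, the Float64 backing is consistent with pandas' own inference once no useful dtype hint survives: `pd.array` maps plain float/NaN data to the nullable Float64 type. (Illustrative only; this is not necessarily the exact call the `_empty` path makes.)

```python
import numpy as np
import pandas as pd

# With no dtype hint, pd.array infers the nullable Float64 type for
# plain float / NaN data.
pd.array(np.array([np.nan, np.nan]))
# <FloatingArray>
# [<NA>, <NA>]
# Length: 2, dtype: Float64
```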
`fast_xs` then fails when `result[rl]` is not ready to accept the complex128 data coming from `blk.iget((i, loc))`:
```python
for blk in self.blocks:
    # Such assignment may incorrectly coerce NaT to None
    # result[blk.mgr_locs] = blk._slice((slice(None), loc))
    for i, rl in enumerate(blk.mgr_locs):
        result[rl] = blk.iget((i, loc))
```
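The mismatch can be seen in isolation: a Float64-backed nullable array rejects a complex value on assignment, which is what `result[rl] = ...` runs into once `result._data` is a FloatingArray. (The exact exception message varies by pandas version; the point is only that the assignment is refused.)

```python
import pandas as pd

# A Float64-backed nullable array rejects complex values on setitem.
backing = pd.array([pd.NA, pd.NA], dtype="Float64")
backing[0] = 1 + 2j  # raises (e.g. TypeError): complex cannot be stored
                     # in a Float64 masked array
```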
As I see it, the problem is that we commit to building the backing array too soon, before we know what dtypes it will actually have to hold.
@andrewgsavage
@topper-123
@jbrockmendel
@mroeschke
Feature Description
I will point out for the record:

- `values = [v[loc] for v in self.arrays]`

So we have everything we need within the environment of `fast_xs`. Should we use that knowledge to create a `result` that can hold slices of data beyond float64? Here's code that tries the fast path first and, if an exception is raised, falls back to the sure thing:
```python
try:
    for blk in self.blocks:
        # Such assignment may incorrectly coerce NaT to None
        # result[blk.mgr_locs] = blk._slice((slice(None), loc))
        for i, rl in enumerate(blk.mgr_locs):
            result[rl] = blk.iget((i, loc))
except TypeError:
    if isinstance(dtype, ExtensionDtype) and not immutable_ea:
        # Fall back to building the row from the actual values, letting
        # the EA choose a backing that can hold them.
        values = [v[loc] for v in self.arrays]
        result = cls._from_sequence(values, dtype)
    else:
        raise  # re-raise the original TypeError
```
Alternative Solutions
I'm open to alternative solutions, but the above actually causes the test case to pass. Should I submit a PR?
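For completeness, one alternative shape I could imagine, purely as an untested sketch assembled from the same pieces as the snippet above: build the row directly from the actual values via `_from_sequence`, restoring column order from `mgr_locs`, instead of assigning element-wise into an empty EA whose backing may be too narrow.

```python
# Untested alternative sketch: gather (position, value) pairs from the
# blocks, restore the overall column order, and let _from_sequence pick a
# backing that fits the actual values.
pairs = []
for blk in self.blocks:
    for i, rl in enumerate(blk.mgr_locs):
        pairs.append((rl, blk.iget((i, loc))))
pairs.sort(key=lambda t: t[0])
result = cls._from_sequence([v for _, v in pairs], dtype=dtype)
```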
Additional Context
No response