aijams (Contributor) commented Sep 29, 2025

When NumpyExtensionArray.take is called on an integer array with allow_fill=True and asked to fill missing positions (indices of -1, which are replaced with NaN), it produces an array whose dtype doesn't match its underlying data.
Specifically, take promotes the underlying data to a floating-point type but doesn't update the extension array's dtype to match.
These changes ensure that the result of take has the correct dtype for its data.
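
As a rough illustration of the promotion involved, here is a minimal NumPy-only sketch of take with fill semantics, where an index of -1 means "missing". The helper name take_with_fill is hypothetical (it is not a pandas function), and the bool-to-object rule is an assumption meant to mirror pandas' promotion behavior, not code copied from it.

```python
import numpy as np


def take_with_fill(values: np.ndarray, indices, fill_value=np.nan) -> np.ndarray:
    """Hypothetical helper: take(..., allow_fill=True) on a non-empty 1-D numeric or bool ndarray.

    An index of -1 marks a missing position that is filled with `fill_value`.
    """
    indices = np.asarray(indices, dtype=np.intp)
    mask = indices == -1
    out_dtype = values.dtype
    if mask.any():
        # NaN can't be stored in integer or bool arrays, so the result needs a wider dtype.
        if values.dtype.kind in "iu":
            out_dtype = np.dtype(np.float64)
        elif values.dtype.kind == "b":
            # Assumption: fall back to object for bool, as pandas is believed to do.
            out_dtype = np.dtype(object)
    # Replace -1 with a valid placeholder index, then overwrite those slots.
    result = values.take(np.where(mask, 0, indices)).astype(out_dtype)
    if mask.any():
        result[mask] = fill_value
    return result


ints = np.array([10, 20, 30], dtype=np.int64)
filled = take_with_fill(ints, [0, -1, 2])
# The data is now float64; a wrapper that still reported int64 would be mismatched.
print(filled.dtype)  # float64
```

The point of the sketch is that the output dtype depends on whether any fill actually happens, so it can't simply be copied from the input array.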

aijams marked this pull request as draft on September 29, 2025, 18:11
rhshadrach added the Bug, ExtensionArray (Extending pandas with custom dtypes or arrays), and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Sep 30, 2025
aijams (Contributor, Author) commented Oct 2, 2025

Several tests are currently failing because of a new numexpr release (#62545); all other tests pass.
I hard-coded a list of dtypes in take to detect the integer types that can't store NaN, because an extension array's dtype can't be set after the array is created. I haven't found another way to fix this without modifying the base extension array class to allow its dtype to be changed.
Let me know if you can think of a better approach.

aijams marked this pull request as ready for review on October 7, 2025, 16:33

fill_value=fill_value,
axis=axis,
)
if self.dtype in [
Member

Do this on the NumpySemantics instead of this mixin

Contributor Author

I'm not sure what you mean by "NumpySemantics". I changed the condition to check numpy_dtype so it can be compared against a list of bare NumPy dtypes, but I'm not sure that's what you meant. Can you clarify?

Member

Whoops, I meant to write NumpyExtensionArray. Implement a take method there along the lines of:

def take(...):
    if self.dtype.kind in "iub":
        ...
    else:
        return super().take(...)
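
The suggested structure (check the dtype kind first, handle int/uint/bool specially, otherwise defer to super().take) can be sketched with toy classes. BaseArray, FixedArray, and promoted_dtype are hypothetical stand-ins, not pandas classes: BaseArray deliberately models the reported bug by reusing self.dtype for its result, and the bool-to-object promotion is an assumption.

```python
import numpy as np


def promoted_dtype(dtype: np.dtype) -> np.dtype:
    """Dtype able to hold NaN: float64 for int/uint, object for bool (assumed rule)."""
    if dtype.kind in "iu":
        return np.dtype(np.float64)
    if dtype.kind == "b":
        return np.dtype(object)
    return dtype


class BaseArray:
    """Hypothetical array wrapper whose dtype is fixed at construction."""

    def __init__(self, data: np.ndarray, dtype):
        self._data = data
        self._dtype = np.dtype(dtype)

    @property
    def dtype(self) -> np.dtype:
        return self._dtype

    def _take_data(self, indices, allow_fill, fill_value) -> np.ndarray:
        indices = np.asarray(indices, dtype=np.intp)
        mask = indices == -1
        if not allow_fill or not mask.any():
            return self._data.take(indices)
        # Use 0 as a placeholder for -1, promote, then write the fill value.
        out = self._data.take(np.where(mask, 0, indices))
        out = out.astype(promoted_dtype(self._data.dtype))
        out[mask] = fill_value
        return out

    def take(self, indices, allow_fill=False, fill_value=np.nan):
        data = self._take_data(indices, allow_fill, fill_value)
        # Models the bug: the data may be promoted, but the old dtype is reused.
        return type(self)(data, self.dtype)


class FixedArray(BaseArray):
    def take(self, indices, allow_fill=False, fill_value=np.nan):
        # Check the kind before delegating, as suggested in review.
        if self.dtype.kind in "iub":
            data = self._take_data(indices, allow_fill, fill_value)
            # Build a new array whose dtype comes from the (possibly promoted) data.
            return type(self)(data, data.dtype)
        return super().take(indices, allow_fill=allow_fill, fill_value=fill_value)


ints = np.array([1, 2, 3], dtype=np.int64)
buggy = BaseArray(ints, ints.dtype).take([0, -1], allow_fill=True)
fixed = FixedArray(ints, ints.dtype).take([0, -1], allow_fill=True)
print(buggy.dtype, buggy._data.dtype)  # int64 float64
print(fixed.dtype, fixed._data.dtype)  # float64 float64
```

Because the dtype can't be reassigned after construction, the override computes the result data first and then constructs a fresh array from it, which sidesteps the need to mutate an existing array's dtype.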


@pytest.mark.parametrize("dtype", [np.uint32, np.uint64, np.int32, np.int64])
def test_take_assigns_correct_dtype(dtype):
Member

Add a comment pointing back to the motivating GH issue.

indices, allow_fill=allow_fill, fill_value=fill_value, axis=axis
)
# See GH#62448.
if self.dtype.numpy_dtype in [
Member

Do this check before the super call, and just check self.dtype.kind in "iub".
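
To see why a kind check is preferable to a hard-coded list, here is a small comparison. The list is hypothetical, modeled on the four dtypes in the first revision of the test; the exact list used in the PR isn't shown in this thread.

```python
import numpy as np

# Hypothetical hard-coded list, modeled on the dtypes in the first test revision.
HARD_CODED = [np.dtype(t) for t in (np.uint32, np.uint64, np.int32, np.int64)]


def needs_promotion_by_list(dtype) -> bool:
    return np.dtype(dtype) in HARD_CODED


def needs_promotion_by_kind(dtype) -> bool:
    # 'i' = signed int, 'u' = unsigned int, 'b' = bool: none of these can hold NaN.
    return np.dtype(dtype).kind in "iub"


for t in (np.int8, np.uint16, np.int64, np.bool_, np.float32):
    print(np.dtype(t).name, needs_promotion_by_list(t), needs_promotion_by_kind(t))
```

The kind check covers every integer width (including int8 and int16) and bool without a list to maintain, which also lines up with the later request to test smaller widths and bool.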

# TODO: Add the smaller width dtypes to the parameter sets of these tests.
@pytest.mark.parametrize(
"dtype",
[np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64],
Member

Can you also test the bool dtype?
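
Bool is worth a dedicated test because it can fail differently from the integer dtypes: NaN compares unequal to zero, so a naive cast back to bool silently turns a missing value into True rather than raising. A quick check of which dtypes can hold NaN (can_hold_nan is a hypothetical helper, not a pandas or NumPy function):

```python
import numpy as np


def can_hold_nan(dtype) -> bool:
    """Hypothetical helper: True if `dtype` can store NaN without changing dtype."""
    # Float and complex kinds have NaN values; object arrays can store any Python object.
    return np.dtype(dtype).kind in "fcO"


# NaN != 0, so casting it to bool yields True: the missing value silently disappears.
naive = np.array([1.0, np.nan]).astype(bool)
print(naive)

for t in (np.bool_, np.int8, np.uint64, np.float32, object):
    print(np.dtype(t).name, can_hold_nan(t))
```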
