Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
7a72a06
Added test to check the dtype of the result of take method. Modified …
aijams Sep 29, 2025
65511cd
Added bug info to docs.
aijams Sep 29, 2025
42898c6
Added conditional to handle integer arrays in take method as special …
aijams Oct 2, 2025
333ba45
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 2, 2025
3ca2e84
Added cases for smaller integer types to mixins take function.
aijams Oct 6, 2025
b6e45ac
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 6, 2025
e0ff0d0
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 7, 2025
eeb3d85
Added note to take method.
aijams Oct 9, 2025
cd2f2aa
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 14, 2025
468de3f
Added link to issue in test for take method.
aijams Oct 15, 2025
91442a8
Moved changes to take method to NumpyExtensionArray class.
aijams Oct 16, 2025
aa3c228
Cleaned up work comments.
aijams Oct 16, 2025
8bacb94
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 16, 2025
56a329a
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 17, 2025
040c127
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 20, 2025
1bdb1f8
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 21, 2025
3e5836c
Tests for take method check against object dtype for boolean inputs.
aijams Oct 21, 2025
b4258f2
Removed TODO comment.
aijams Oct 21, 2025
e4c3a5b
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 22, 2025
acdfb62
Updated references to numpy.bool to include underscore.
aijams Oct 22, 2025
fd56024
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 23, 2025
8dcf8b2
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 28, 2025
f0886de
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Oct 29, 2025
c174833
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Nov 4, 2025
aa1c58b
Take method now gets underlying array of result to build a new extens…
aijams Nov 4, 2025
79c3116
Removed dtype preservation tests for take, except for boolean case.
aijams Nov 11, 2025
3edbc31
Merge remote-tracking branch 'upstream/main' into aijams-take-functio…
aijams Nov 11, 2025
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Cleaned up work comments.
  • Loading branch information
aijams committed Oct 16, 2025
commit aa3c228927ad82d0f29dbf5e8a920e001b383a8d
77 changes: 0 additions & 77 deletions pandas/core/arrays/_mixins.py
Original file line number Diff line number Diff line change
Expand Up @@ -155,83 +155,6 @@ def view(self, dtype: Dtype | None = None) -> ArrayLike:
# "ExtensionDtype | dtype[Any]"; expected "dtype[Any] | _HasDType[dtype[Any]]"
return arr.view(dtype=dtype) # type: ignore[arg-type]


# Notes on take method fix
# Please remove these once this fix is ready to submit.
# =======================================================================

# One of the base classes to this class: ExtensionArray, provides
# the dtype property, but abstractly, so it leaves the implementation
# of dtype storage up to its derived classes. Some of these derived
# classes don't provide a setter method for their dtype property, so
# I can't set the dtype here and expect it to work for all classes that
# inherit this take method.

# How can I produce an extension array of the same type as self,
# having a floating-point dtype if self has an integer dtype or otherwise
# the same dtype as self?

# Constructing a new object of the same type as self doesn't always
# work since the constructors of some derived classes of this class
# don't accept a dtype parameter, which I need to pass to set the
# result's dtype to a floating-point type.

# All tests pass when I create a new extension array object with the
# appropriate dtype (in the integer-dtype source case), however MyPy
# complains about the missing dtype argument in the call to type(self)
# below. By creating a new array object, this call produces an array
# with a floating point dtype, even when the source dtype is integral.
# I think this happens because the new array is created with the newly
# produced data from the underlying take method, which has the
# appropriate underlying dtype.

# Essentially, these extension arrays are wrappers around Numpy arrays
# which have their own dtype and store the data. Thus, the new
# extension array inherits the dtype from the Numpy array used
# to create it.

# Unfortunately, some of the derived constructors of this class have a
# positional dtype argument, while some do not. If I call a constructor
# without specifying this argument, mypy will complain about the
# missing argument in the case of constructors that require it, but
# if I call the constructor with the dtype argument, the constructors
# that don't have it will fail at runtime since they don't recognize
# it.

# How can I get around this issue?
# Ideas:
# Modify the extension array type to allow modification of its dtype
# after construction.

# Add a conditional branch to this method to call derived constructors
# with or without the dtype argument, depending on their class.
# This approach has the disadvantage of hardcoding information about
# derived classes in this base class, which means that if someone
# changes a constructor of a derived class to remove the dtype argument,
# this method will break.

# Classes derived from this class include:

# Categorical
# DatetimeLikeArrayMixin
# DatelikeOps
# PeriodArray
# DatetimeArray
# TimelikeOps
# TimedeltaArray
# NumpyExtensionArray
# StringArray

# The types of extension arrays (within Pandas) derived from this class are:
# Class name Constructor takes dtype argument Dtype argument required
# Categorical yes no
# PeriodArray yes no
# DatetimeArray
# TimedeltaArray
# StringArray yes no
# NumpyExtensionArray no no


def take(
self,
indices: TakeIndexer,
Expand Down
49 changes: 6 additions & 43 deletions pandas/core/arrays/numpy_.py
Original file line number Diff line number Diff line change
Expand Up @@ -363,6 +363,9 @@ def take(
Take entries from this array at each index in a list of indices,
producing an array containing only those entries.
"""
result = super().take(
indices, allow_fill=allow_fill, fill_value=fill_value, axis=axis
)
# See GH#62448.
if self.dtype.numpy_dtype in [
np.uint8,
Expand All @@ -372,51 +375,11 @@ def take(
np.int8,
np.int16,
np.int32,
np.int64
np.int64,
]:
# In this case, the resulting extension array should have a floating-point
# dtype to match the result of the underlying take method when
# NaN values need to be incorporated into it.
# This occurs when allow_fill is True and fill_value is None.
# (fill_value may be an arbitrary Python object, in which case
# the result will be an array of objects.)

# Call the take method of NDArrayBackedExtensionArray

# TODO: How is the dtype of a newly constructed NumpyExtensionArray set?
# It's set to match the dtype of its underlying array.

result = super().take(
indices,
allow_fill=allow_fill,
fill_value=fill_value,
axis=axis
)
return type(self)(result, copy=False)

# In this case, the resulting extension array will have a dtype
# that matches that of the underlying Numpy array and we can link
# to the underlying array without manipulating the extension's
# dtype.

return super().take(
indices,
allow_fill=allow_fill,
fill_value=fill_value,
axis=axis
)
# result array dtype = self dtype

# Implementation steps:
# Determine requirements for this method, including:
# Argument types [done]
# Return type [done]
# Return dtype [done]
# Write tests to check whether this method satisfies these requirements. [done]
# Figure out what base class method to call to implement the take functionality. [done]
# Implement the call. [done]
# Check whether this method satisfies its requirements by running the tests.


return result

# ------------------------------------------------------------------------
# Reductions
Expand Down
11 changes: 9 additions & 2 deletions pandas/tests/arrays/numpy_/test_numpy.py
Original file line number Diff line number Diff line change
Expand Up @@ -326,7 +326,10 @@ def test_factorize_unsigned():


# TODO: Add the smaller width dtypes to the parameter sets of these tests.
@pytest.mark.parametrize("dtype", [np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64])
@pytest.mark.parametrize(
"dtype",
[np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64],
)
def test_take_assigns_floating_point_dtype(dtype):
# GH#62448.
array = NumpyExtensionArray(np.array([1, 2, 3], dtype=dtype))
Expand All @@ -339,7 +342,11 @@ def test_take_assigns_floating_point_dtype(dtype):

assert result.dtype.numpy_dtype == np.float64

@pytest.mark.parametrize("dtype", [np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64])

@pytest.mark.parametrize(
"dtype",
[np.uint8, np.uint16, np.uint32, np.uint64, np.int8, np.int16, np.int32, np.int64],
)
def test_take_assigns_integer_dtype_when_fill_disallowed(dtype):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is unnecessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this test to make sure that take does the expected thing when not allowed to fill missing values. I did a quick search, but I didn't find any other tests for this. There are other tests that work with this method, but I would like to have a test explicitly for this behavior to check for regressions and make it clear that take should preserve dtype if it can. If you still think this test isn't needed, can you explain why you think so?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is tested in the extension tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aijams if you address this we can get this merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a test in BaseGetitemTests that tests the same case I tested, except for the case of boolean data. However, these tests didn't run under Pytest when I ran it. I removed my test, however I still think the case for boolean data should be tested.

# GH#62448.
array = NumpyExtensionArray(np.array([1, 2, 3], dtype=dtype))
Expand Down