-
- Notifications
You must be signed in to change notification settings - Fork 19.2k
BUG: Raise TypeError for mismatched signed/unsigned dtypes in IntervalIndex.from_arrays #62376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Raise TypeError for mismatched signed/unsigned dtypes in IntervalIndex.from_arrays #62376
Conversation
…lIndex.from_arrays
| @jbrockmendel please review these changes when you get a chance. |
pandas/core/indexes/interval.py Outdated
| copy: bool = False, | ||
| dtype: Dtype | None = None, | ||
| ) -> IntervalIndex: | ||
| # Check for mismatched signed/unsigned integer dtypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it would make more sense to do this on the IntervalArray.from_arrays method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I will update changes in IntervalArray.from_arrays
| @jbrockmendel I’ve updated this as per your suggestions. |
pandas/core/arrays/interval.py Outdated
| ) -> Self: | ||
| # Check for mismatched signed/unsigned integer dtypes | ||
| left_dtype = getattr(left, "dtype", None) | ||
| right_dtype = getattr(right, "dtype", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think putting this after the _maybe_convert_platform_interval calls would be more robust. e.g. if one is a list and the other is uint64?
Also is it just int vs uint we care about, or also e.g. int32 vs int64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are checking int vs unit .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason not to move this to after the _maybe_convert_platform_interval calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, i have updated, also added whatsnew note, please let me know if this needs improvement.
pandas/core/arrays/interval.py Outdated
| | ||
| # Check for mismatched signed/unsigned integer dtypes | ||
| left_dtype = getattr(left, "dtype", None) | ||
| right_dtype = getattr(right, "dtype", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the getattr should be unnecessary. the attribute should always be there now that this is moved to after
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, got it. i have updated the code and removed none check for dtype.
| @jbrockmendel could you please check if this CI failure is related to the changes in this PR?
|
c756fbc to 5b67c6e Compare
I'm pretty sure that's affecting all PRs and is unrelated to this. |
pandas/core/arrays/interval.py Outdated
| if ( | ||
| left_dtype.kind in "iu" | ||
| and right_dtype.kind in "iu" | ||
| and left_dtype.kind != right_dtype.kind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just compare if left.dtype != right.dtype at this point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this will be more clear . i will update with this .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel i guess we should not use if left.dtype != right.dtype as this will only restrict to int64 vs uint64 and cause failure in other cases . should i revert the changes to previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what cases? im pretty sure we always want matching dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=========================== short test summary info ============================ FAILED pandas/tests/indexes/categorical/test_astype.py::TestAstype::test_astype - TypeError: Left and right arrays must have matching dtypes. Got float64 and int64. FAILED pandas/tests/indexes/interval/test_constructors.py::TestFromArrays::test_mixed_float_int[int64-float64] - TypeError: Left and right arrays must have matching dtypes. Got int64 and float64. FAILED pandas/tests/indexes/interval/test_constructors.py::TestFromArrays::test_mixed_float_int[float64-int64] - TypeError: Left and right arrays must have matching dtypes. Got float64 and int64. FAILED got these these check fails, after applying changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is casting mixed int/float to float/float in ensure_simple_new_inputs. So putting this check after that should do the trick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, that make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to confirm before applying changes , i should move these check after ensure_simple_new_inputs, or should I use a strict if left.dtype != right.dtype check after that step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think so yes
pandas/core/arrays/interval.py Outdated
| dtype=dtype, | ||
| ) | ||
| | ||
| # Check for mismatched signed/unsigned integer dtypes after casting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im confused as to why this affects from_arrays, since that goes through simple_new and not __new__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but could it be as in simple_new- IntervalMixin.__new__ is called not IntervalArray.__new__ .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seeing as how the added test is failing, my guess is "it doesn't". Maybe put the check at the end of _ensure_simple_new_inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, i'll move at the end of _ensure_simple_new_inputs.
| @Aniketsy this is looking pretty good. can you merge main and ping on green |
| @jbrockmendel done, any idea why these 3 checks got cancelled? |
| Those are unrelated and affecting all PRs right now. |
| thanks @Aniketsy |
…lIndex.from_arrays (pandas-dev#62376)
This PR adds a check in I
ntervalIndex.from_arraysto raise aTypeErrorwhen integer arrays have mismatched signedness. Includes a unit test to verify the behavior.Please let me know if my approach or fix needs any improvements . I’m open to feedback and happy to make changes based on suggestions.
Thankyou !