-
- Notifications
You must be signed in to change notification settings - Fork 19.2k
ENH: support downcasting of nullable EAs in pd.to_numeric #38746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this one!
pandas/core/tools/numeric.py Outdated
| values = arg.values | ||
| if is_extension_array_dtype(arg) and isinstance(values, NumericArray): | ||
| is_numeric_extension_dtype = True | ||
| values = extract_array(arg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extract_array line needed? (the values = arg.values from above should have worked fine, I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not, I reverted this bit
pandas/core/tools/numeric.py Outdated
| else: | ||
| values = arg | ||
| | ||
| if is_extension_array_dtype(arg) and isinstance(values, NumericArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this really should be part of the above logic. also mask needs to be defined for all cases (can be default to None)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/core/tools/numeric.py Outdated
| | ||
| if is_extension_array_dtype(arg) and isinstance(values, NumericArray): | ||
| is_numeric_extension_dtype = True | ||
| mask, values = values._mask, values._data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you likely need to just take the masked values only for the following block of code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pandas/core/tools/numeric.py Outdated
| if values.dtype == dtype: | ||
| break | ||
| | ||
| if is_numeric_extension_dtype: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L194 might need to handle the mask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean?
float_32_ind = typecodes.index(float_32_char) I feel like there's a testcase I haven't looked at if yes (the ones I have work as is)
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
| Green + addressed comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also update the doc-string of to_numeric as well as add some examples.
| ([-1, -1], "Int32", "unsigned", "Int32"), | ||
| ([1, 1], "Float64", "float", "Float32"), | ||
| ([1, 1.1], "Float64", "float", "Float32"), | ||
| ([1, 1], "Float64", "integer", "Int8"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: maybe move this one up to the other "integer" downcast cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ([-1, -1], "Int32", "unsigned", "Int32"), | ||
| ([1, 1], "Float64", "float", "Float32"), | ||
| ([1, 1.1], "Float64", "float", "Float32"), | ||
| ([1, 1], "Float64", "integer", "Int8"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| 2 2.0 | ||
| 3 -3.0 | ||
| dtype: float64 | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring updated.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @arw2019
| 2 3 | ||
| dtype: Int8 | ||
| >>> s = pd.Series([1.0, 2.1, 3.0], dtype="Float64") | ||
| >>> pd.to_numeric(s, downcast="float") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to also accept Float and Integer as aliases for float | integer (separate issue)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do that, since those are not actually dtypes, but rather values to a downcast keyword which eg also accepts "signed"/"unsigned"
| Thanks @arw2019 ! |
black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffPicking up #33435.
In
pd.to_numericwhen encountering anIntegerArrayandFloatingArray(orSeriesbuilt from them) we downcast_datathen reconstruct the array using the downcast_dataand the_maskfrom the original array.