Issue 27145
Created on 2016-05-28 15:58 by Oren Milman, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| proposedPatches.diff | Oren Milman, 2016-05-28 15:58 | proposed patches diff file |
| CPythonTestOutput.txt | Oren Milman, 2016-05-28 15:59 | test output of CPython without my patches (tested on my PC) |
| patchedCPythonTestOutput.txt | Oren Milman, 2016-05-28 15:59 | test output of CPython with my patches (tested on my PC) |
| proposedPatches.diff | Oren Milman, 2016-06-03 14:17 | proposed patches diff file, update 1 |

Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 15716 | merged | hongweipeng, 2019-09-06 09:27 |

Messages (5)

msg266557 - Author: Oren Milman (Oren Milman) - Date: 2016-05-28 15:58
------------ the current state ------------

```python
>>> if is32BitCPython:
...     PyLong_SHIFT = 15
... elif is64BitCPython:
...     PyLong_SHIFT = 30
...
>>> ##### case A #####
>>> a = 2 ** PyLong_SHIFT - 1
>>> b = 2 ** PyLong_SHIFT - 2
>>> a - b
1
>>> a - b is 1
True
>>> a + (-b) is 1
True
>>>
>>> ##### case B #####
>>> a = 2 ** PyLong_SHIFT
>>> b = 2 ** PyLong_SHIFT - 1
>>> a - b
1
>>> a - b is 1
False
>>> a + (-b) is 1
False
>>>
>>> ##### case C #####
>>> a = 2 ** PyLong_SHIFT + 1
>>> b = 2 ** PyLong_SHIFT
>>> a - b
1
>>> a - b is 1
False
>>> a + (-b) is 1
False
```

This behavior is caused by the implementation of long_add and long_sub.

Both long_add and long_sub check whether a and b are both single-digit ints, and if they are, do (respectively):

    return PyLong_FromLong(MEDIUM_VALUE(a) + MEDIUM_VALUE(b));

or

    return PyLong_FromLong(MEDIUM_VALUE(a) - MEDIUM_VALUE(b));

Otherwise, long_add and long_sub call x_add or x_sub to do a calculation on the absolute values of a and b. Finally, long_add and long_sub negate the result of the calculation, if needed, and return the final result.

When both a and b are single-digit ints (e.g. case A), the result of the calculation is passed to PyLong_FromLong, which uses CHECK_SMALL_INT, so a reference to an element of small_ints is returned where appropriate.

When a and/or b are not single-digit ints (e.g. cases B and C), x_add or x_sub is called. Neither x_add nor x_sub checks whether the result is a small int (except for a case in x_sub where the result is zero), so long_add and long_sub might return a new int even where an element of small_ints could have been reused.

Due to the way CPython uses them, the issue is relevant to x_sub but not to x_add: the calculation the former performs might result in a small int, while that of the latter always results in a multiple-digit int. (Besides long_add and long_sub, x_add may also be called by k_mul, but in that case too the calculation certainly results in a multiple-digit int.)
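Roughly, the pre-patch flow described above looks like this (a condensed paraphrase of long_sub from the 2016-era Objects/longobject.c, not the verbatim source; CHECK_BINOP and most error handling are omitted):

```c
static PyObject *
long_sub(PyLongObject *a, PyLongObject *b)
{
    PyLongObject *z;

    if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
        /* Single-digit fast path: PyLong_FromLong goes through
           CHECK_SMALL_INT, so a cached element of small_ints is
           returned when possible (case A above). */
        return PyLong_FromLong(MEDIUM_VALUE(a) - MEDIUM_VALUE(b));
    }
    if (Py_SIZE(a) < 0) {
        z = (Py_SIZE(b) < 0) ? x_sub(a, b) : x_add(a, b);
        /* In-place sign flip; this is the line that change 2 in the
           proposed changes below replaces, since it would corrupt a
           shared small int. */
        if (z != NULL)
            Py_SIZE(z) = -(Py_SIZE(z));
    }
    else {
        z = (Py_SIZE(b) < 0) ? x_add(a, b) : x_sub(a, b);
    }
    /* x_add/x_sub allocate a fresh result and never consult
       small_ints, hence the False results in cases B and C above. */
    return (PyObject *)z;
}
```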
------------ Note ------------

I am not sure whether this is actually an issue we want to fix. It seems to me that my proposed changes introduce a slight performance gain (in terms of memory, and probably also speed), and a more consistent behavior of CPython. The performance gain would probably be much more relevant if and when greater default values are chosen for NSMALLNEGINTS and NSMALLPOSINTS. Anyway, I guess documenting the issue here, along with a proposal for a fix, is better than nothing. (As far as I know, this issue never came up since the unification of int and long in revision 40626.)

------------ the proposed changes ------------

All of the proposed changes are in Objects/longobject.c (a combined sketch of these changes appears after the 'alternative changes' section below):

1. in x_sub: To make sure x_sub returns a small int where appropriate, I simply wrapped the return value of x_sub with the function maybe_small_long.

2. in long_sub: The previous patch alone would create a nasty bug. When both a and b are negative, long_sub calls x_sub and then negates the result in place by doing 'Py_SIZE(z) = -(Py_SIZE(z));'. If x_sub returned a reference to a statically allocated small int (which is not zero), long_sub would actually change that statically allocated small int. To prevent that, I replaced the in-place negation with a call to _PyLong_Negate.

3. in _PyLong_Negate: The previous patches, along with http://bugs.python.org/issue27073 (another issue I opened recently), would cause long_sub to call _PyLong_Negate for a zero int when a and b are the same multiple-digit negative int. Moreover, in the default CPython branch, if long_mul receives a multiple-digit negative int and zero, long_mul calls _PyLong_Negate for a zero int. To prevent doing 'PyLong_FromLong(-MEDIUM_VALUE(x))' where x is a zero int, I added a check before it (along with a small addition to the function comment), so that _PyLong_Negate does nothing if x is a zero int. It should be noted that MEDIUM_VALUE also checks whether x is a zero int (for its own reasons), so thanks to today's optimizing compilers, the proposed check shouldn't introduce a performance penalty. (Actually, when comparing the assembly of _PyLong_Negate (for win32) of the default CPython branch and the patched one, the latter looks simpler.) With regard to similar changes made in the past, _PyLong_Negate hasn't changed since it replaced the macro NEGATE in revision 84698.

4. in x_sub: The previous patches made it safe for x_sub to return a reference to a statically allocated small int, and thus made the following optimization possible. When a and b have the same number of digits, x_sub finds the most significant digit where a and b differ. If there is no such digit, a and b are equal, so x_sub does 'return (PyLongObject *)PyLong_FromLong(0);'. I propose to add another check after that, to determine whether the least significant digit is the only digit where a and b differ. In that case, we can do something very similar to what long_add and long_sub do when they realize they are dealing with single-digit ints (as mentioned in 'the current state' section):

       return (PyLongObject *)PyLong_FromLong((sdigit)a->ob_digit[0] - (sdigit)b->ob_digit[0]);

------------ alternative changes ------------

As an alternative to these 4 changes, I also thought of simply wrapping the return values of long_add and long_sub with the function maybe_small_long (i.e. changing the last line of each of them to 'return (PyObject *)maybe_small_long(z);'). This change might be simpler, but I believe it would introduce a performance penalty, as it adds checks to flows of long_add and long_sub that can never result in a small int.

Furthermore, this change wouldn't save any allocations of small ints. For example, in case C (mentioned in 'the current state' section), both in (a - b) and in (a + (-b)):

1. A new int of value 1 would be allocated by x_sub.
2. At the end of long_add or long_sub, maybe_small_long would realize an element of small_ints could be used.
3. maybe_small_long would call Py_DECREF on the new int of value 1, causing its deallocation.

In contrast, in case C, the 4th change (in the proposed changes section) makes x_sub realize in advance that the result is going to be a single digit. Consequently, no new int is futilely allocated in the process. However, in case B (also from 'the current state' section), neither the alternative changes nor the proposed changes (nor the default CPython branch) prevent x_sub from allocating a new int.
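To make the four changes concrete, here is a combined sketch of the affected spots with the patches applied (paraphrased from the descriptions above, not taken from the actual diff; loop bodies and sign handling are elided):

```c
/* Change 2 -- long_sub, both operands negative: negate safely instead
   of flipping Py_SIZE(z) in place. */
if (z != NULL)
    _PyLong_Negate(&z);

/* Change 3 -- _PyLong_Negate: guard the shared-object path so that a
   zero int is left untouched. */
if (Py_SIZE(x) == 0)
    return;
*x_p = (PyLongObject *)PyLong_FromLong(-MEDIUM_VALUE(x));
Py_DECREF(x);

/* Changes 1 and 4 -- the tail of x_sub, after establishing that a and
   b have the same number of digits and walking i down to the most
   significant digit where they differ. */
if (i < 0)
    /* No differing digit: a == b. */
    return (PyLongObject *)PyLong_FromLong(0);
if (i == 0)
    /* Change 4: only the least significant digit differs, so the
       difference fits in a single digit; PyLong_FromLong goes through
       CHECK_SMALL_INT, just like the single-digit fast path. */
    return (PyLongObject *)PyLong_FromLong(
        (sdigit)a->ob_digit[0] - (sdigit)b->ob_digit[0]);
/* ... digit-by-digit subtraction with borrow, producing z ... */
/* Change 1: return a cached small int where appropriate. */
return maybe_small_long(long_normalize(z));
```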
------------ micro-benchmarking ------------

I did some micro-benchmarking:

- Subtraction of multiple-digit ints with the same number of digits, where the result fits in the small_ints array:

      python -m timeit "[i - (i + 1) for i in range(2 ** 67, 2 ** 67 + 1000)]"

  -> The patched CPython is approximately 8% faster.

- Subtraction of multiple-digit ints with the same number of digits, which differ only in the least significant digit (while the result doesn't fit in the small_ints array):

      python -m timeit "[i - (i + 6) for i in range(2 ** 67, 2 ** 67 + 1000)]"

  -> The patched CPython is approximately 3% faster.

- Subtraction of multiple-digit ints with the same number of digits, which differ (among others) in the most significant digit:

      python -m timeit "[i * 2 - i for i in range(2 ** 67, 2 ** 67 + 1000)]"

  -> The patched CPython is approximately 1% slower.

- Subtraction of multiple-digit ints with different numbers of digits:

      python -m timeit "[i ** 2 - i * 3 for i in range(2 ** 67, 2 ** 67 + 500)]"

  -> The patched CPython is approximately 2% faster. I expected the patched CPython to be somewhat slower here; either I have missed something, or some compiler optimization magic is at work.

------------ diff ------------

The patches diff is attached.

------------ tests ------------

I built the patched CPython for x86 and played with it a little. Everything seemed to work as usual. Also, where cases B and C (mentioned in 'the current state' section) returned 'False', the patched CPython returned 'True', as expected.

In addition, I ran 'python_d.exe -m test' (on my 64-bit Windows 10) with and without the patches, and got much the same output. The outputs of both runs are attached.

msg267102 - Author: Oren Milman (Oren Milman) - Date: 2016-06-03 14:17
I just realized I had forgotten to check for a failure after using _PyLong_Negate. The updated diff file is attached.

msg267103 - Author: Oren Milman (Oren Milman) - Date: 2016-06-03 14:44
After giving it some more thought, I feel somewhat uncertain about that check for a failure after using _PyLong_Negate. At first I noticed that after every call to _PyLong_Negate there is such a check. But then I realized that in my patch, and also in long_mul (in the default branch of CPython), the check is unnecessary, as z is just returned after the call to _PyLong_Negate. Is adding such a check anyway (e.g. in long_mul) a convention? Or should such checks be removed?
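For context, the pattern in question sits at the tail of long_mul, roughly like this (a paraphrase, not the verbatim source):

```c
/* z = k_mul(a, b) was computed above.
   Negate if exactly one of the inputs is negative. */
if (((Py_SIZE(a) ^ Py_SIZE(b)) < 0) && z) {
    _PyLong_Negate(&z);
    if (z == NULL)
        return NULL;    /* redundant here: falling through would
                           return the NULL z anyway */
}
return (PyObject *)z;
```

Since z is returned immediately after the negation, and _PyLong_Negate leaves z set to NULL on failure, the explicit NULL check changes nothing in this particular spot.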

msg357486 - Author: Inada Naoki (methane) - Date: 2019-11-26 07:55
New changeset 036fe85bd3e6cd01093d836d71792a1966f961e8 by Inada Naoki (HongWeipeng) in branch 'master':
bpo-27145: small_ints[x] could be returned in long_add and long_sub (GH-15716)
https://github.com/python/cpython/commit/036fe85bd3e6cd01093d836d71792a1966f961e8

msg357487 - Author: STINNER Victor (vstinner) - Date: 2019-11-26 08:20
It seems like there are a few corner cases where long integers are not normalized:
https://github.com/python/cpython/pull/15716#pullrequestreview-298002027

But the initial issue described here has been fixed, so it's better to keep this issue closed. If someone wants to cover more cases (to normalize), please open a separate issue.

Thanks HongWeipeng for the fix, and Oren Milman for the original bug report and patches!
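(For readers unfamiliar with the term: an int is "normalized" when its ob_size field counts no leading zero digits. The helper that enforces this in Objects/longobject.c looks roughly like the following paraphrase, not the verbatim source:)

```c
/* Trim leading zero digits and adjust the stored size accordingly. */
static PyLongObject *
long_normalize(PyLongObject *v)
{
    Py_ssize_t j = Py_ABS(Py_SIZE(v));
    Py_ssize_t i = j;

    while (i > 0 && v->ob_digit[i - 1] == 0)
        --i;
    if (i != j)
        Py_SIZE(v) = (Py_SIZE(v) < 0) ? -i : i;
    return v;
}
```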

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:31 | admin | set | github: 71332 |
| 2019-11-26 08:20:48 | vstinner | set | messages: + msg357487 |
| 2019-11-26 07:55:28 | methane | set | status: open -> closed, resolution: fixed, stage: patch review -> resolved |
| 2019-11-26 07:55:05 | methane | set | nosy: + methane, messages: + msg357486 |
| 2019-09-06 09:27:13 | hongweipeng | set | pull_requests: + pull_request15370 |
| 2016-06-03 14:44:40 | Oren Milman | set | messages: + msg267103 |
| 2016-06-03 14:17:18 | Oren Milman | set | files: + proposedPatches.diff, messages: + msg267102 |
| 2016-05-31 12:40:00 | vstinner | set | nosy: + yselivanov |
| 2016-05-31 09:24:38 | SilentGhost | set | nosy: + rhettinger, vstinner, stage: patch review, versions: + Python 3.6 |
| 2016-05-28 15:59:44 | Oren Milman | set | files: + patchedCPythonTestOutput.txt |
| 2016-05-28 15:59:19 | Oren Milman | set | files: + CPythonTestOutput.txt |
| 2016-05-28 15:58:37 | Oren Milman | create | |
