-
- Notifications
You must be signed in to change notification settings - Fork 33.3k
bpo-36575: lsprof: Use _PyTime_GetPerfCounter() #8378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
benjaminp left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to make all of _lsprof work in terms of _PyTime_t rather than casting it to long long?
It's not simple because |
Modules/_lsprof.c Outdated
| /* interpret the result as an integer that will be scaled | ||
| in profiler_getstats() */ | ||
| result = PyLong_AsLongLong(o); | ||
| result = (_PyTime_t)PyLong_AsLongLong(o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to ensure o is not negative to avoid undefined behavior with this cast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PyTime_t is signed integer (int64_t). What undefined behavior do you care?
Should I use PyLong_AsUnsignedLongLong instead of PyLong_AsLongLong()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use _PyTime_FromNanosecondsObject() here.
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the reusing my _PyTime_t API :-)
You should add a NEWS entry to document that _lsprof now uses CLOCK_MONOTONIC on Unix, whereas it uses CLOCK_REALTIME (gettimeofday) previously.
Modules/_lsprof.c Outdated
| double val = PyFloat_AsDouble(o); | ||
| /* error handling delayed to the code below */ | ||
| result = (long long) (val * DOUBLE_TIMER_PRECISION); | ||
| result = (_PyTime_t)(val * SEC_TO_NS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use _PyTime_FromSecondsObject(_PyTime_ROUND_FLOOR) here.
Modules/_lsprof.c Outdated
| /* interpret the result as an integer that will be scaled | ||
| in profiler_getstats() */ | ||
| result = PyLong_AsLongLong(o); | ||
| result = (_PyTime_t)PyLong_AsLongLong(o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use _PyTime_FromNanosecondsObject() here.
| Would it make sense to modify totaltime and inlinetime of _lsprof.profiler_subentry to store time as an integer number of nanoseconds, rather than using a floating time number? Would it break the backward compatibility? Or maybe we can add two new fields to keep backward compatibility: totaltime_ns and inlinetime_ns? As we did for os.stat_result. |
| Maybe, we can add more option for |
I'm not sure that anyone would opt-in, since it's a new option and nobody will notice the new feature. |
| What do you think of using _PyTime_FromSecondsObject() instead of PyFloat_AsDouble() * SEC_TO_NS? _PyTime_t is supposed to be blackbox type with no unit. You should use functions to create _PyTime_t, instead of creating directly _PyTime_t values. In theory, we might be possible to use 100 ns unit instead of 1 ns on Windows, for example. But I never tried :-) |
Then, we shouldn't use I prefer Do you think casting |
| Maybe you can modify _lsprof to use int64_t: _PyTime_t is exactly int64_t. |
| I'm going to close and reopen this to re-trigger the CI checks. |
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think of replacing "long long" with _PyTime_t in Profile structures like _ProfilerSubEntry?
Or maybe just add this assertion in PyInit__lsprof(): assert(sizeof(_PyTime_t) == sizeof(long long)); to remind that the code rely on the fact that the two times are basically the same.
| @methane: Oops, it seems like I'm writing the same thing every 6 months :-D Do you plan to update your PR? |
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I just suggest to rephase the NEWS entry.
| #define CALL_TIMER(pObj) ((pObj)->externalTimer ? \ | ||
| CallExternalTimer(pObj) : \ | ||
| hpTimer()) | ||
| static inline _PyTime_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for converting macros to static inline :-)
| { | ||
| long long result; | ||
| PyObject *o = PyObject_Call(pObj->externalTimer, empty_tuple, NULL); | ||
| PyObject *o = _PyObject_CallNoArg(pObj->externalTimer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's nice to see such micro-optimization going away thanks to FASTCALL ;-)
| @@ -0,0 +1,3 @@ | |||
| ``_lsprof`` module now uses ``_PyTime_GetPerfCounter`` private API for | |||
| default timer. It will use better timer depending on platform. Patch by | |||
| Inada Naoki. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, _PyTime_GetPerfCounter() is the same than time.perf_counter(). Maybe mention time.perf_counter() which is public, rather than mentioning a private method? You may explain that time.perf_counter() is monotonic, whereas previously _lsprof uses time.time() which isn't monotonic.
"It will use better timer depending on platform." well, you can explain that the new implementation now has internally nanosecond resolution, whereas the old implementation of the default timer only has a ressolution of 1 microsecond.
I suggest:
The _lsprof module now supports internally nanosecond resolution for timing. It now uses time.perf_counter() rather than time.time() (use _PyTime_GetPerfCounter private API which has nanosecond resolution, rather than gettimeofday() which has microsecond resolution). Timings are no more impact by system clock updates, since perf_counter() is monotonic. Patch by Inada Naoki.
Misc/NEWS.d/next/Library/2019-04-09-22-40-52.bpo-36575.Vg_p92.rst Outdated Show resolved Hide resolved
Co-Authored-By: methane <songofacandy@gmail.com>
| @@ -0,0 +1,4 @@ | |||
| The ``_lsprof`` module now uses internal timer same to ``time.perf_counter()`` by default. | |||
| ``gettimeofday(2)`` was used on Unix. New timer has better resolution on most Unix | |||
| platforms and timings are no more impacted by system clock updates since ``perf_counter()`` | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another English question: is "no more" correct here, or is it "no longer"? I am not sure because I always make the mistake.
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the more accurate NEWS entry ;-)
https://bugs.python.org/issue36575