Closed
Description
See faster-cpython/ideas#592. This project can be parallelized, and I could use help!
Note that we don't yet have a tier-2 interpreter or instruction format defined, but the first stage of splitting doesn't require one.
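To make the idea concrete, here is a toy sketch (not CPython's actual implementation) of what splitting an instruction into micro-ops looks like: a specialized "macro" instruction such as `BINARY_OP_ADD_INT` becomes a guard uop followed by an action uop. The names `_GUARD_BOTH_INT` and `_BINARY_OP_ADD_INT` mirror the tier-2 uop naming convention, but the stack machine below is purely illustrative.

```python
class DeoptError(Exception):
    """Raised when a guard uop fails, forcing a deopt back to tier 1."""


def _GUARD_BOTH_INT(stack):
    # Guard uop: check operand types, leave the stack unchanged.
    if not (type(stack[-1]) is int and type(stack[-2]) is int):
        raise DeoptError("operands are not both int")


def _BINARY_OP_ADD_INT(stack):
    # Action uop: the unguarded fast path.
    right = stack.pop()
    left = stack.pop()
    stack.append(left + right)


# The macro instruction is just a sequence of uops.
BINARY_OP_ADD_INT = [_GUARD_BOTH_INT, _BINARY_OP_ADD_INT]


def run(uops, stack):
    """Execute a uop sequence against an operand stack."""
    for uop in uops:
        uop(stack)
    return stack
```

For example, `run(BINARY_OP_ADD_INT, [2, 3])` leaves `[5]` on the stack, while non-int operands trigger the guard and raise `DeoptError`. The point of the split is that an optimizer can reason about (and sometimes eliminate) the guard separately from the action.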
Remaining non-viable ops
Usage stats from the latest weekly pyperformance run (not all of these are uop candidates):
```
956,835,460 YIELD_VALUE
490,121,077 SEND_GEN (specialization of SEND)
429,220,003 JUMP_FORWARD
410,623,920 EXTENDED_ARG
390,537,200 JUMP_BACKWARD_NO_INTERRUPT
250,483,954 RETURN_GENERATOR
239,793,181 CALL_LIST_APPEND (specialization of CALL)
168,297,543 CALL_KW
162,922,780 FOR_ITER_GEN (specialization of FOR_ITER)
157,442,920 CALL_PY_WITH_DEFAULTS (specialization of CALL)
145,986,780 BINARY_SUBSCR_GETITEM (specialization of BINARY_SUBSCR)
135,636,840 STORE_FAST_LOAD_FAST
 83,118,452 MAKE_CELL
 74,149,898 CALL_FUNCTION_EX
 68,587,076 CALL_ALLOC_AND_ENTER_INIT (specialization of CALL)
 49,897,724 STORE_ATTR_WITH_HINT (specialization of STORE_ATTR)
 49,846,886 LOAD_ATTR_PROPERTY (specialization of LOAD_ATTR)
  8,224,500 RERAISE
  6,000,000 END_ASYNC_FOR
  5,801,385 BEFORE_WITH
  2,892,780 RAISE_VARARGS
  1,850,040 IMPORT_FROM
  1,813,620 IMPORT_NAME
        240 CLEANUP_THROW
        120 BEFORE_ASYNC_WITH
            ENTER_EXECUTOR
            LOAD_ATTR_GETATTRIBUTE_OVERRIDDEN (specialization of LOAD_ATTR)
```
Linked PRs
- gh-104909: Split BINARY_OP into micro-ops #104910
- gh-104909: Implement conditional stack effects for macros #105748
- GH-104909: Break LOAD_GLOBAL specializations in micro-ops. #106677
- GH-104909: Split LOAD_ATTR_INSTANCE_VALUE into micro-ops #106678
- GH-104909: Move unused cache entries from uops to macros #107444
- GH-104909: Break instrumented instructions into micro-ops. #109316
- gh-104909: Split some more insts into ops #109943
- GH-104909: Implement some instrumented instructions with micro-ops. #110025
- gh-104909: Split more LOAD_ATTR specializations #110317
- gh-104909: Make LOAD_ATTR_PROPERTY a viable uop #110560