-
- Notifications
You must be signed in to change notification settings - Fork 3k
Mypyc Trace Logging
Mypyc can optionally include logging in the generated code that produces a sampled trace of various events or operations that happen during execution. For example, you can use this to find code locations where instances of a specific class are constructed, or identify the most commonly called method of a class.
The trace logs are line-based text files that use :
as the field separator. They are are easy to analyze using ad-hoc Python scripts, Unix command-line tools or an SQL database, for example.
Before you start using this, it's recommended to have a look at the mypyc documentation, since some things may not make much sense otherwise.
You will first need to compile the target code using trace logging enabled. When the code is executed, it will write the trace log into mypyc_trace.txt
. If you use the mypyc
command-line tool to compile your code, define the MYPYC_TRACE_LOG
environment variable with value 1
:
MYPYC_LOG_TRACE=1 mypyc ...
If you use mypycify()
, pass log_trace=True
as an extra argument.
If you are analyzing mypy, there is a helper script misc/log_trace_heck.py
in the mypy GitHub repository that can compile mypy with trace log enabled and either perform a self check or type check a code fragment (e.g. -c "import some_module"
).
Trace logging is deterministic, so C compiler flags and hardware details have no impact. Also the target platform and Python version are unlikely to have a significant impact, unless some optimization requires a recent Python version.
Here is an example log output (details may change over time and this may not 100% reflect what you will see):
mypy.semanal.SemanticAnalyzer.flatten_lvalues::primitive_op:list_get_item_unsafe mypy.binder.ConditionalTypeBinder._get::primitive_op:int_gt mypy.semanal.SemanticAnalyzer.is_func_scope:7094:call_c:CPyList_GetItemShort mypy.typetraverser.TypeTraverserVisitor.traverse_type_tuple::primitive_op:var_object_size mypy.typeops.try_contracting_literals_in_union:1080:call_c:PyIter_Next mypy.renaming.LimitedVariableRenameVisitor.visit_name_expr:544:call_c:PySequence_Contains mypy.util.split_module_names:80:call_c:CPyList_GetItemShort mypy.expandtype.ExpandTypeVisitor.visit_instance::primitive_op:int_eq mypy.semanal.SemanticAnalyzer.process_type_annotation:3823:call_c:CPyList_GetItemShort
Let's look at a few log items in detail. First, this is a call to a native method:
mypy.types.Type.__init__:251:call:mypy.nodes.Context.__init__
The event was logged in mypy.types.Type._init__
on line 251 (based on the first two fields). The type of the event is call
(a native call), and the called function was mypy.nodes.Context.__init__
.
This event is a list get item operation:
mypy.semanal.SemanticAnalyzer.is_func_scope:7094:call_c:CPyList_GetItemShort
The event type call_c
indicates a call of a primitive C function or a Python C API function. Python C API functions have the prefix Py
while mypyc primitive functions have the prefix CPy
.
This event indicates reading the object size of a variable-length object:
mypy.typetraverser.TypeTraverserVisitor.traverse_type_tuple::primitive_op:var_object_size
It's a low-level primitive operation (primitive_op
) and usually isn't very interesting, since these operations are few fast.
This is another very fast operation (integer equality):
mypy.binder.ConditionalTypeBinder._get::primitive_op:int_gt
In this section we will go through some common C primitives or C API functions that are useful to recognize in the trace log.
These are optimized primitives specialized based on the static types of expressions:
-
CPyList_GetItemShort
: Get list item (a[i]
) -
CPyList_GetItem
: Get list item (another variant) -
PyList_New
: Create a list -
PyList_Append
: Append to a list (a.append(x)
) -
PyList_Check
: Equivalent toisinstance(x, list)
-
PyDict_New
: Create a dict -
PyDict_Contains
: Dict contains (x in d
) -
CPyDict_GetItem
: Get dict item (d[x]
) -
CPyDict_GetWithNone
: Get dict item (another variant) -
CPyDict_SetItem
: Set dict item (d[x] = y
) -
CPyStr_Equal
: String equality -
CPyTagged_Add
: Add two integers -
PySequence_Contains
:x in <sequence>
-
PySequence_Tuple
: Equivalent totuple(x)
-
PyUnicode_Contains
: Substring check -
PyUnicode_Concat
: Concatenate strings -
PySet_New
: Create a set -
PySet_Add
: Add an item to a set -
PySet_Contains
: Set contains (x in s
)
These are slower, generic operations that work for arbitrary objects (including with Any
types):
-
PyObject_GetIter
: Equivalent toiter(x)
-
PyIter_Next
: Equivalent tonext(it)
-
PyObject_IsTrue
: Is an object considered to be true -
PyObject_RichCompare
: Compare two objects -
PyObject_Vectorcall
: Call an object (much slower than optimized native calls) -
PyObject_IsInstance
: Equivalent toisinstance(x, t)
-
CPyObject_Hash
: Equivalent tohash(x)
-
CPyObject_Size
: Equivalent tolen(x)
-
CPyObject_GetAttr3
: Look up an attribute of an object the slow way -
CPyObject_GetAttr
: Another way to look up an attribute -
PyObject_Not
: Boolean not operation (not x
) -
CPyGen_SetStopIterationValue
: RaiseStopIteration
with a value
These are internal operations that don't directly map to Python code:
-
CPy_NoErrOccurred
: Check if an exception was raised by an operation or function call
You can analyze data easily using standard Unix command line tools, including cut
, sort
, shuf
, uniq
and wc
. We also use ripgrep (rg
) to search for substrings and regular expressions, but grep also works.
Display the most commonly called C primitives through the call_c
event:
$ rg :call_c: mypyc_trace.txt | cut -d':' -f4 | sort | uniq -c | sort -n ... 4970 PyIter_Next 4987 PyDict_Contains 5074 PyList_Append 7470 CPyStr_Equal 7623 PyList_New 7987 PySet_Contains 8489 CPyList_GetItemShort
We are doing lots of PyIter_Next
operations, which are generic and thus somewhat slow. Maybe we can do better? The second example digs deeper into this.
Display a random sample of 15 events that call PyIter_Next
and sort by function:
$ rg PyIter_Next mypyc_trace.txt | shuf | tail -15 | sort mypy.binder.ConditionalTypeBinder.update_from_options:233:call_c:PyIter_Next mypy.binder.ConditionalTypeBinder.update_from_options:233:call_c:PyIter_Next mypy.graph_utils.prepare_sccs:69:call_c:PyIter_Next mypy.indirection.TypeIndirectionVisitor.find_modules:35:call_c:PyIter_Next mypy.indirection.TypeIndirectionVisitor.find_modules:35:call_c:PyIter_Next mypy.semanal_main.semantic_analyze_target:409:call_c:PyIter_Next mypy.solve.solve_constraints:61:call_c:PyIter_Next mypy.solve.solve_one:266:call_c:PyIter_Next mypy.types.CallableType.__init__:1887:call_c:PyIter_Next mypy.types.CallableType.__init__:1887:call_c:PyIter_Next mypy.types.CallableType.__init__:1887:call_c:PyIter_Next mypy.types.flatten_nested_tuples:3752:call_c:PyIter_Next mypy.type_visitor.TypeQuery.query_types:452:call_c:PyIter_Next mypy.type_visitor.TypeTranslator.translate_types:312:call_c:PyIter_Next mypy.type_visitor.TypeTranslator.translate_types:312:call_c:PyIter_Next
Group by function where call happens:
$ rg PyIter_Next mypyc_trace.txt | cut -d':' -f1 | sort | uniq -c | sort -n ... 108 mypy.type_visitor.TypeQuery.query_types 112 mypy.fastparse.ASTConverter.translate_stmt_list 115 mypy.typeops.FreezeTypeVarsVisitor.visit_callable_type 120 mypy.binder.ConditionalTypeBinder.update_from_options 190 mypy.messages.MessageBuilder.incompatible_argument 198 mypy.fastparse.ASTConverter.translate_opt_expr_list 279 mypy.expandtype.ExpandTypeVisitor.expand_types 351 mypy.indirection.TypeIndirectionVisitor.find_modules 396 mypy.type_visitor.TypeTranslator.translate_types 1252 mypy.types.CallableType.__init__
It looks like CallableType.__init__
does a lot of potentially slow generic iteration. On which source lines does this happen?
$ rg PyIter_Next mypyc_trace.txt | rg CallableType.__init__ | cut -d':' -f2 | sort | uniq -c | sort -n 1252 1887
Everything happens on line 1887.
Calls to compiled functions and methods are usually compiled to fast C-level calls. However, sometimes mypyc falls back to slow Python calls. You can look for PyObject_Vectorcall
, which is used for many such calls. Another option is to look at code that accesses the global namespace dictionary. Access Python namespaces is slow, since dictionary lookup is somewhat expensive, and dictionary values are untyped at runtime.
$ rg :python_call_global: mypyc_trace.txt | cut -d':' -f4 | sort | uniq -c | sort -n 3 defaultdict 5 FunctionContext 6 ErrorMessage 7 pop_on_exit 9 TypeRange 13 PartialTypeScope 44 CurrentType 120 FormalArgument
It looks like we are calling the classes FormalArgument
and CurrentType
and a few others via the global namespace. This indicates that the classes aren't native classes but either imported from Python libraries or mypyc falls back to compiling them into non-native classes. In the latter case the class may be a named tuple or a dataclass -- mypyc can't apply all optimizations yet for them.
The python_get_attr
event indicates an access of an attribute using generic Python semantics, which behaves kind of like a dictionary lookup. Attributes of compiled classes can usually be accessed directly, which is many times faster.
$ rg :python_get_attr: mypyc_trace.txt | cut -d':' -f1 | sort | uniq -c | sort -n | tail -20 59 mypy.fastparse.TypeConverter.visit_Subscript 62 mypy.typeanal.TypeAnalyser.analyze_type_with_type_info 63 mypy.subtypes.are_parameters_compatible 71 mypy.fastparse.ASTConverter.visit_Constant 77 mypy.fastparse.ASTConverter.set_block_lines 81 mypy.checker.TypeChecker.visit_if_stmt 101 mypy.checkexpr.ExpressionChecker.check_argument_types 106 mypy.fastparse.ASTConverter.visit_Call 108 mypy.fastparse.ASTConverter.transform_args 110 mypy.semanal.SemanticAnalyzer.analyze_function_body 116 mypy.fastparse.ASTConverter.visit_Attribute 136 mypy.checkmember.analyze_instance_member_access 136 mypy.fastparse.TypeConverter.visit_Name 144 mypy.typeanal.TypeAnalyser.__init__ 158 mypy.fastparse.ASTConverter.visit_Name 177 mypy.checker.TypeChecker.check_func_def 212 mypy.fastparse.TypeConverter.visit 281 mypy.fastparse.ASTConverter.do_func_def 284 mypy.fastparse.ASTConverter.make_argument 904 mypy.fastparse.ASTConverter.set_line
It looks like most of these accesses are happening the the mypy.fastparse
module. This makes sense, since the module converts data from Python objects constructed using the ast
stdlib module, and these are not native classes, so mypyc can't access the attribute values directly.
Look at the globals_dict_get_item
event to find accesses of values via the module-level global namespace (a dictionary). These can be global variables, names imported from non-compiled modules, or classes that are compiled into regular (slow) Python classes. There are a few things that can help here:
- Annotate a global variable as
Final
(e.g.FOO: Final = 5
) if it's a constant value. Accessing final variables is quick. - If you access the value in a loop, cache the value in a local variable outside the loop. Mypyc assumes that the global dictionary can be modified at any time, so automatic optimizations are limited.
- If it's a class object, you may be able to change it into a native class instead (this may be tricky, however).