Skip to content

Conversation

BobTheBuidler
Copy link
Contributor

@BobTheBuidler BobTheBuidler commented Oct 7, 2025

This PR makes object_annotations deterministic for frozenset objects as long as the frozenset does not contain any tuples or other frozensets

The logic for implementing this was relatively straightforward, but similar logic for the ignored case would be a lot more verbose and potentially impossible to generalize correctly for all cases.

I don't consider that an issue for the purposes of this PR, the PR will still work for most cases and we can always reapproach the topic later for frozensets that contain containers.

Edit: This now also works frozensets that contain other containers.

if formatted.startswith("frozenset({"):
frozenset_items = formatted[11:-2]
# if our frozenset contains another frozenset or a tuple, we will need better logic
# here, but this redimentary logic will still vastly improve codegen determinism.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to subclass pprint.PrettyPrinter and make PrettyPrinter#format deterministic for frozenset?

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused.

I read your suggestion and immediatly thought it made sense, it would allow us to address the recursion issue right now. But when going to implement it I noticed that the default frozenset pprinter sorts the items using this class as the sort key:

class _safe_key: """Helper function for key functions when sorting unorderable objects.   The wrapped-object will fallback to a Py2.x style comparison for  unorderable types (sorting first comparing the type name and then by  the obj ids). Does not work recursively, so dict.items() must have  _safe_key applied to both the key and the value.   """ __slots__ = ['obj'] def __init__(self, obj): self.obj = obj def __lt__(self, other): try: return self.obj < other.obj except TypeError: return ((str(type(self.obj)), id(self.obj)) < \ (str(type(other.obj)), id(other.obj)))

But I have a diff here which clearly shows the order of the items inverts between 2 compiler runs.

So based on this I'm inclined to say let's not mess with the PrettyPrinter class itself unless somebody can speak on the reason for the non-deterministic diff.

Edit: I skipped some steps in my logic. For context, I searched for examples of PrettyPrinter subclasses and found a post suggesting to just modify the pprint logic for your specific class. Doesn't really matter now though, see below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further thought I don't think mypy should alter the behavior of the default PrettyPrinter in any case

Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could temporarily alter the PrettyPrinter so long as we return it to its original state afterward.

Separately, I see what I was missing in the _sort_key implementation. The call to id is what makes it non-deterministic across runs. If we modify the sort key to something like this it will be deterministic for all types that mypyc needs here

 def __lt__(self, other): return str(type(self.obj)) + repr(self.obj) < str(type(other.obj) + repr(other.obj)
Copy link
Contributor Author

@BobTheBuidler BobTheBuidler Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should do the trick, and imo is cleaner than subclassing PrettyPrinter. Thanks for your suggestion!

@BobTheBuidler
Copy link
Contributor Author

Thanks to @A5rocks suggestion, the recursion issue is resolved

Copy link
Collaborator

@A5rocks A5rocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm a little bit concerned that maybe this will be slower (no reason why + I don't know how mypyc performance works), but this certainly works IMO.

This is NOT safe for use as a sort key for other types, so we MUST replace the
original pprint._safe_key once we've pprinted our object.
"""
return str(type(obj)) + repr(obj)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, isn't this still random? repr(obj) might not be a deterministic ordering? Or is it deterministic because maybe frozenset is the same repr based on the same input (I would check but... lazy).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's deterministic so long as the objects contained within the tuple are supported mypyc literal types, and this should have already been validated by this point in the code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, you're correct in one case: if the frozenset contains another frozenset

This can be mitigated by replacing repr with pformat (to ensure the frozenset goes thru the same ordering process)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented this recursive pformatting, let me know if this works for you

@BobTheBuidler
Copy link
Contributor Author

Thanks! I'm a little bit concerned that maybe this will be slower (no reason why + I don't know how mypyc performance works), but this certainly works IMO.

It will definitely be slower, but that's okay because this is all compile-time code and will have no impact any mypyc-compiled project at runtime

Copy link
Collaborator

@A5rocks A5rocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think there might be a way to do this better (within PrettyPrinter#format, format the contents and then sort by string value) but I'll try it on my own time and see if I can come up with a nice diff.

@A5rocks
Copy link
Collaborator

A5rocks commented Oct 8, 2025

Nope, I can't think of a short + nice way to do this looking at the source code for pprint. Thanks for bearing with me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants