-
- Notifications
You must be signed in to change notification settings - Fork 3k
[mypyc] feat: improve LoadLiteral annotation determinism #20012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
mypyc/codegen/emit.py Outdated
if formatted.startswith("frozenset({"): | ||
frozenset_items = formatted[11:-2] | ||
# if our frozenset contains another frozenset or a tuple, we will need better logic | ||
# here, but this redimentary logic will still vastly improve codegen determinism. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to subclass pprint.PrettyPrinter
and make PrettyPrinter#format
deterministic for frozenset
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused.
I read your suggestion and immediatly thought it made sense, it would allow us to address the recursion issue right now. But when going to implement it I noticed that the default frozenset pprinter sorts the items using this class as the sort key:
class _safe_key: """Helper function for key functions when sorting unorderable objects. The wrapped-object will fallback to a Py2.x style comparison for unorderable types (sorting first comparing the type name and then by the obj ids). Does not work recursively, so dict.items() must have _safe_key applied to both the key and the value. """ __slots__ = ['obj'] def __init__(self, obj): self.obj = obj def __lt__(self, other): try: return self.obj < other.obj except TypeError: return ((str(type(self.obj)), id(self.obj)) < \ (str(type(other.obj)), id(other.obj)))
But I have a diff here which clearly shows the order of the items inverts between 2 compiler runs.
So based on this I'm inclined to say let's not mess with the PrettyPrinter class itself unless somebody can speak on the reason for the non-deterministic diff.
Edit: I skipped some steps in my logic. For context, I searched for examples of PrettyPrinter subclasses and found a post suggesting to just modify the pprint logic for your specific class. Doesn't really matter now though, see below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On further thought I don't think mypy should alter the behavior of the default PrettyPrinter in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could temporarily alter the PrettyPrinter so long as we return it to its original state afterward.
Separately, I see what I was missing in the _sort_key implementation. The call to id
is what makes it non-deterministic across runs. If we modify the sort key to something like this it will be deterministic for all types that mypyc needs here
def __lt__(self, other): return str(type(self.obj)) + repr(self.obj) < str(type(other.obj) + repr(other.obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should do the trick, and imo is cleaner than subclassing PrettyPrinter. Thanks for your suggestion!
Thanks to @A5rocks suggestion, the recursion issue is resolved |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
c9afd2a
to 96aa63e
Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I'm a little bit concerned that maybe this will be slower (no reason why + I don't know how mypyc performance works), but this certainly works IMO.
mypyc/codegen/emit.py Outdated
This is NOT safe for use as a sort key for other types, so we MUST replace the | ||
original pprint._safe_key once we've pprinted our object. | ||
""" | ||
return str(type(obj)) + repr(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, isn't this still random? repr(obj)
might not be a deterministic ordering? Or is it deterministic because maybe frozenset is the same repr based on the same input (I would check but... lazy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's deterministic so long as the objects contained within the tuple are supported mypyc literal types, and this should have already been validated by this point in the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, you're correct in one case: if the frozenset contains another frozenset
This can be mitigated by replacing repr with pformat (to ensure the frozenset goes thru the same ordering process)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've implemented this recursive pformatting, let me know if this works for you
It will definitely be slower, but that's okay because this is all compile-time code and will have no impact any mypyc-compiled project at runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think there might be a way to do this better (within PrettyPrinter#format
, format the contents and then sort by string value) but I'll try it on my own time and see if I can come up with a nice diff.
Nope, I can't think of a short + nice way to do this looking at the source code for |
This PR makes object_annotations deterministic for frozenset objects as long as the frozenset does not contain any tuples or other frozensets
The logic for implementing this was relatively straightforward, but similar logic for the ignored case would be a lot more verbose and potentially impossible to generalize correctly for all cases.
I don't consider that an issue for the purposes of this PR, the PR will still work for most cases and we can always reapproach the topic later for frozensets that contain containers.
Edit: This now also works frozensets that contain other containers.