Skip to content

Conversation

@pierreglaser
Copy link
Member

@pierreglaser pierreglaser commented Jun 13, 2019

Closes #282

@pierreglaser pierreglaser changed the title FIX pass the __dict__ item of a class __dict__ pass the __dict__ item of a class __dict__ Jun 13, 2019
@pierreglaser pierreglaser force-pushed the pass-__dict__-attribute branch from ca0c8c8 to cdd0aa7 Compare June 17, 2019 09:41
Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once rebased on top of #290, please run the downstream ci tests.

@pierreglaser
Copy link
Member Author

pierreglaser commented Jun 17, 2019

Arg, this code is not safe with regards to reference cycles when __dict__ is not natively memoized by pickle...

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will decrease coverage by 37%.
The diff coverage is 33.33%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #291 +/- ## =========================================== - Coverage 92.84% 55.83% -37.01%  =========================================== Files 2 2 Lines 852 849 -3 Branches 177 176 -1 =========================================== - Hits 791 474 -317  - Misses 37 347 +310  - Partials 24 28 +4
Impacted Files Coverage Δ
cloudpickle/cloudpickle_fast.py 0% <0%> (-95.16%) ⬇️
cloudpickle/cloudpickle.py 76.2% <100%> (-15.8%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...ff598c7. Read the comment docs.

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will increase coverage by 0.23%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #291 +/- ## ========================================== + Coverage 92.84% 93.07% +0.23%  ========================================== Files 2 2 Lines 852 852 Branches 177 177 ========================================== + Hits 791 793 +2  + Misses 37 36 -1  + Partials 24 23 -1
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 92% <100%> (ø) ⬆️
cloudpickle/cloudpickle_fast.py 96.03% <100%> (+0.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...5914c7e. Read the comment docs.

@pierreglaser
Copy link
Member Author

Some updates:

now that __dict__ is part of type_kwargs, we must make sure it does not contain unsafe reference cycles with its class. An unsafe reference cycle will appear in the default case (when __dict__ is not overridden), because in this case, __dict__ is a getset_descriptor, that must be pickled by reference to its class, it is non-avoidable.

The guard

if not isinstance(obj, types.GetSetDescriptorType) and getattr(__dict__, '__objclass__', None), is not obj

serves the purpose of finding this default case.
I don't think there is any other way to generate an unsafe cyclic reference, but @ogrisel if you can think of an explicit pattern that could create infinite recursion, let me know.

Also, now that we special-case the situation where __dict__ is a getset_descripor, #290 and this PR are orthogonal. This PR would work without #290, my bad for the erroneous comments 😕 I still think #290 is a nice to have with very little codebase burden, but I won't mind if you'd better revert #290 from cloudpickle master.

@codecov
Copy link

codecov bot commented Jun 17, 2019

Codecov Report

Merging #291 into master will increase coverage by 0.23%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #291 +/- ## ========================================== + Coverage 92.84% 93.07% +0.23%  ========================================== Files 2 2 Lines 852 852 Branches 177 177 ========================================== + Hits 791 793 +2  + Misses 37 36 -1  + Partials 24 23 -1
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 92% <100%> (ø) ⬆️
cloudpickle/cloudpickle_fast.py 96.03% <100%> (+0.88%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2db58f1...62e1873. Read the comment docs.

@pierreglaser
Copy link
Member Author

The failure of distributed also appears on cloudpickle master (see #295 )

@ogrisel
Copy link
Contributor

ogrisel commented Jun 18, 2019

In #282 you write:

After #253 (comment), I started examinating these lines. I think the original use-case was to handle classes like namedtuple overloading dict.

Can you be more explicit? Do we have a test for what you describe?

@pierreglaser
Copy link
Member Author

pierreglaser commented Jun 18, 2019

Can you be more explicit? Do we have a test for what you describe?

In Python 2.7, the __dict__ attribute of namedtuple classes created like this:

from collections import namedtuple namedtuple_class = namedtuple('MyTuple', ['a', 'b', 'c'])

is overriden so that instances of these classes:

t = namedtuple_class('foo', 'bar', 'qux')

have their __dict__ look like this:

>>> t.__dict__ OrderedDict([('a', 'foo'), ('b', 'bar'), ('c', 'qux')]) >>> type(namedtuple.__dict__['__dict__']) property

(corresponding line in cpython/Lib/collections.py:
https://github.com/python/cpython/blob/1b57ab5c6478b93cf4150bd8c475022252776598/Lib/collections.py#L290)

Visibly, cloudpickle was aware of that use case

# If type overrides __dict__ as a property, include it in the type
# kwargs. In Python 2, we can't set this attribute after construction.
__dict__ = clsdict.pop('__dict__', None)
if isinstance(__dict__, property):
type_kwargs['__dict__'] = __dict__

It turns out that commenting the lines lines 648-649 does not break the test suite: there is no need to save "__dict__" in the case of namedtuples (the if isinstance(__dict__, property) check clearly looks for namedtuple classes) because the __dict__ property gets re-generated automatically at unpickling time when cloudpickle recreates the namedtuple class.

So IMO, those lines in cloudpickle do not serve any use-case. In addition, they are pretty hard to understand at first sight. So I wanted to clarify the way we handle the presence of a "__dict__" item in a class __dict__.

There are two ways we can do that:

  • popping the "__dict__" from the class __dict__ and do not try to save it. It will break dynamically created classes that override the __dict__ attribute of their instances. Admittedly, it rarely happens.
  • popping the "__dict__" from the class __dict__ and actually save it, which is somewhat tricky (hence the 20 lines of comments in this PR).

I am fine with both ways. I simply stumbled across those lines many times, and was always confused by their purpose/the use-case they served - this PR was an attempt to rationalize and clearly document them. Suggestions welcomed.

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants