A small talk on Object and Protocol in CPython
shiyao.ma <i@introo.me>, May 4th
Why this
‣ Python is great for carrying out research experiments. (This lays the foundation for why I discuss Python.)
‣ Life is short. You need Python. (This lays the foundation for why people like Python.)
Life is neither a short nor a long, just a (signed) int, 31 bits at most, I say.
Takeaway
‣ Understand the inter-relation among {.py, .pyc, .c} files.
‣ Understand that everything in Python is an object.
‣ Understand how functions on TypeObject affect InstanceObject.
CPython Overview
‣ First implemented in Dec. 1989 by GvR, the BDFL.
‣ Serves as the reference implementation; alternatives include:
  ‣ IronPython (clr)
  ‣ Jython (jvm)
  ‣ Brython (v8, spider) [no kidding]
‣ Written in ANSI C.
  ‣ flexible language binding
  ‣ embedding (libpython), e.g., openwrt, etc.
CPython Overview
‣ Code maintained by Mercurial.
  ‣ source: https://hg.python.org/cpython/
‣ Build toolchain is autoconf (on *nix):
    ./configure --with-pydebug && make -j2
CPython Overview
‣ Structure of the source tree:
    cpython/
        configure.ac
        Doc
        Grammar
        Include
        Lib
        Mac
        Modules
        Objects
        Parser
        Programs
        Python
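CPython Overview
‣ Execution lifetime
[Figure: PY → Parser → PY[CO] → VM. For the statement x = 1 + 2 the figure lists the instructions LOAD 1, LOAD 2, ADD, LOAD X, STORE X next to a STACK panel that traces the values 1, 2 and 3 on their way into x.]
Takeaway: py/pyc/c inter-relation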
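The pipeline on this slide can be poked at from Python itself. The following is a minimal sketch, assuming a CPython interpreter: `compile` stands in for the parser/compiler stage, the resulting code object plays the role of PY[CO], and `exec` hands it to the VM. The filename `"<slide>"` is made up for illustration, and the opcodes that `dis` prints differ between versions (recent versions fold `1 + 2` into the constant 3 at compile time), so no output is shown.

```python
import dis

source = "x = 1 + 2"

# Source text -> code object (the PY -> PY[CO] step on the slide).
code = compile(source, "<slide>", "exec")

# Show the bytecode the VM will run; exact opcodes vary by CPython version.
dis.dis(code)

# Let the VM execute the code object inside a fresh namespace.
namespace = {}
exec(code, namespace)
print(namespace["x"])
```

The same code object is what gets marshalled into a .pyc file when a module is imported.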
object and protocol: the objects
Object
object: memory of a C structure with a common header
[Figure: PyObject header = ob_refcnt, ob_type; PyVarObject header = ob_refcnt, ob_type, ob_size. The header is shared by PyListObject, PyDictObject, PyTupleObject, PySyntaxErrorObject, PyImportErrorObject, …]
Takeaway: everything is an object
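The common header is visible from the Python side as well. The sketch below assumes CPython, where `type(obj)` reports what `ob_type` points to and `sys.getrefcount(obj)` reports a value derived from `ob_refcnt`. The sample values are arbitrary, and the reference counts are printed rather than asserted because they depend on the interpreter version.

```python
import sys

# A deliberately mixed bag: numbers, containers, an exception, a builtin
# function and a type all carry the same header.
samples = [42, "spam", [1, 2], {"k": "v"}, (1,), SyntaxError("oops"), len, int]

for obj in samples:
    # type(obj) follows ob_type; getrefcount reads the reference count.
    print(type(obj).__name__, sys.getrefcount(obj))

header_ok = all(isinstance(obj, object) for obj in samples)
print(header_ok)
```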
Object Structure
Will PyLongObject overflow?
The answer: chunk-chunk
[Figure: PyLongObject = ob_refcnt, ob_type, ob_size, then digit ob_digit[1], digit[2], digit[3], …, digit[n]]
    typedef PY_UINT32_T digit;
    result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) + size*sizeof(digit));

    n = 2 ** 64  # more bits than a word
    assert type(n) is int and n > 0
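One way to watch the chunks being appended is to measure the object as the value grows. This is a small sketch that assumes CPython: `sys.int_info` reports how many bits each digit holds on the current build, and `sys.getsizeof` reports the allocated size in bytes. The concrete byte counts depend on the build, so they are printed rather than stated here.

```python
import sys

# How wide one "digit" chunk is on this build.
print(sys.int_info)

values = [1, 2 ** 30, 2 ** 64, 2 ** 1000]
sizes = [sys.getsizeof(v) for v in values]

for value, size in zip(values, sizes):
    # bit_length() is the number of bits the value needs.
    print(value.bit_length(), "bits ->", size, "bytes")

# Arithmetic stays exact past the machine word: no overflow, just more digits.
difference = (2 ** 64 + 1) - 2 ** 64
print(difference)
```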
Object Structure
Why my multi-dimensional array won’t work?
The answer: indirection, mutability
[Figure: the outer PyListObject (ob_refcnt, ob_type, ob_size, ob_item, allocated) holds an array of PyObject* that all point to one and the same inner PyListObject, whose own PyObject* array holds None and 42. The write goes through PyList_SetItem.]
    m, n = 4, 2
    arr = [ [ None ] * n ] * m
    arr[1][1] = 42
    # [[None, 42], [None, 42], [None, 42], [None, 42]]
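The sharing can be confirmed with identity checks, and the usual way around it is to build each row separately. A minimal sketch; the names `shared` and `grid` are made up for this example.

```python
m, n = 4, 2

# Repeating the outer list copies m pointers to the same inner list.
shared = [[None] * n] * m
distinct_shared_rows = len({id(row) for row in shared})

# A comprehension evaluates [None] * n once per row, so each row is its own list.
grid = [[None] * n for _ in range(m)]
grid[1][1] = 42
distinct_grid_rows = len({id(row) for row in grid})

print(distinct_shared_rows, distinct_grid_rows)
print(grid)
```

Repeating the innermost `[None] * n` is harmless because `None` is immutable; the trouble only starts when the repeated element is itself mutable.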
Object Structure
What is the ob_type?
The answer: flexible type system
    class Machine(type): pass
    # Toy = Machine(foo, bar, hoge)
    class Toy(metaclass=Machine): pass
    toy = Toy()
    # Toy, Machine, type, type
    print(type(toy), type(Toy), type(Machine), type(type))
[Figure: ob_type pointers chained toy → Toy → Machine → Type; every box starts with ob_refcnt, ob_type, …]
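The chain in the figure can be walked programmatically by following `type()` until it stops changing. The helper `type_chain` below is a made-up name for this sketch, not part of any library.

```python
class Machine(type):
    pass


class Toy(metaclass=Machine):
    pass


def type_chain(obj):
    """Follow type() links (ob_type at the C level) until a type is its own type."""
    chain = []
    current = obj
    while True:
        t = type(current)
        chain.append(t)
        if t is current:  # type(type) is type: the chain ends here
            break
        current = t
    return chain


chain = type_chain(Toy())
print([t.__name__ for t in chain])
```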
Object Structure
What is the ob_type?
    # ob_type2
    # 10fd69490 - 10fd69490 - 10fd69490
    print("%x - %x - %x" % (id(42 .__class__), id(233 .__class__), id(int)))
    assert dict().__class__ is dict

    # dynamically create a class named "MagicKlass"
    klass = "MagicKlass"
    klass = type(klass, (object,), {"quack": lambda _: print("quack")})
    duck = klass()
    # quack
    duck.quack()
    assert duck.__class__ is klass
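Object Structure
What is the ob_type?
[Figure: every PyObject (ob_refcnt, ob_type, …) points through ob_type to a PyTypeObject (ob_refcnt, ob_type, …, tp_print, …, tp_getattr, …, *tp_as_number, *tp_as_sequence, *tp_as_mapping, …); tp_as_number in turn points to a table of slots (nb_add, …, nb_subtract, …).]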
object and protocol: the protocol
Protocol: duck-typing in typing
AOL
‣ Abstract Object Layer
[Figure: PyTypeObject (ob_refcnt, ob_type, …, tp_print, …, tp_getattr, …, *tp_as_number, *tp_as_sequence, *tp_as_mapping, …) with the tp_as_number table (nb_add, …, nb_subtract, …)]
When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.
‣ Object Protocol
    int PyObject_Print(PyObject *o, FILE *fp, int flags)
    int PyObject_HasAttr(PyObject *o, PyObject *attr_name)
    int PyObject_DelAttr(PyObject *o, PyObject *attr_name)
    …
‣ Number Protocol
    PyObject* PyNumber_Add(PyObject *o1, PyObject *o2)
    PyObject* PyNumber_Multiply(PyObject *o1, PyObject *o2)
    PyObject* PyNumber_FloorDivide(PyObject *o1, PyObject *o2)
    …
‣ Sequence Protocol
    PyObject* PySequence_Concat(PyObject *o1, PyObject *o2)
    PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count)
    PyObject* PySequence_GetItem(PyObject *o, Py_ssize_t i)
    …
‣ Mapping Protocol
    int PyMapping_HasKey(PyObject *o, PyObject *key)
    PyObject* PyMapping_GetItemString(PyObject *o, const char *key)
    int PyMapping_SetItemString(PyObject *o, const char *key, PyObject *v)
    …
‣ Iterator Protocol
    int PyIter_Check(PyObject *o)
    PyObject* PyIter_Next(PyObject *o)
‣ Buffer Protocol
    int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags)
    void PyBuffer_Release(Py_buffer *view)
    int PyBuffer_IsContiguous(Py_buffer *view, char order)
    …
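Seen from Python, the duck quote means that generic operations only ask whether the type provides the right hooks. The sketch below uses a hypothetical `Deck` class that inherits from nothing special and still takes part in the number, sequence/mapping and iterator protocols; `operator.add` is the standard-library function form of `+`.

```python
import operator


class Deck:
    """Hypothetical type that implements only the hooks the protocols look for."""

    def __init__(self, cards):
        self.cards = list(cards)

    def __add__(self, other):       # number protocol: the + operator
        return Deck(self.cards + other.cards)

    def __len__(self):              # length hook used by len()
        return len(self.cards)

    def __getitem__(self, index):   # subscription: deck[i]
        return self.cards[index]

    def __iter__(self):             # iterator protocol: for card in deck
        return iter(self.cards)


merged = operator.add(Deck("ab"), Deck("c"))
print(len(merged), merged[0], list(merged))
```

None of the callers check for a base class; they only go through the type's slots, which is the point of the Abstract Object Layer.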
Example
‣ Number Protocol (PyNumber_Add)
    // v + w?
    PyObject *
    PyNumber_Add(PyObject *v, PyObject *w)
    {
        // this is just an example, not the real implementation!
        PyObject *result;
        // try the slot of v's type
        result = v->ob_type->tp_as_number->nb_add(v, w);
        // if that fails, try the slot of w's type, still in (v, w) order;
        // it is tried first when w->ob_type is a subclass of v->ob_type
        result = w->ob_type->tp_as_number->nb_add(v, w);
        return result;
    }
[Figure: PyTypeObject (ob_refcnt, ob_type, …, tp_print, …, tp_getattr, …, *tp_as_number, *tp_as_sequence, *tp_as_mapping, …) with the tp_as_number table (nb_add, …, nb_subtract, …)]
Takeaway: the type object stores meta information
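The dispatch order can be imitated in pure Python using the `__add__`/`__radd__` names that stand in for `nb_add`. This is a simplified sketch, not CPython's actual algorithm: `binary_add`, `Base` and `Sub` are hypothetical names, the sequence-concatenation fallback is left out, and the subclass check is coarser than the real rule.

```python
def binary_add(v, w):
    """Simplified imitation of how v + w picks an implementation."""
    tv, tw = type(v), type(w)
    attempts = [(tv, "__add__", v, w)]
    if tw is not tv:
        reflected = (tw, "__radd__", w, v)
        if issubclass(tw, tv):
            attempts.insert(0, reflected)   # subclass on the right goes first
        else:
            attempts.append(reflected)
    for klass, name, first, second in attempts:
        method = getattr(klass, name, None)  # look on the type, not the instance
        if method is None:
            continue
        result = method(first, second)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand type(s) for +")


class Base:
    def __add__(self, other):
        return "Base.__add__"


class Sub(Base):
    def __radd__(self, other):
        return "Sub.__radd__"


print(binary_add(1, 2), binary_add(1, 2.5), binary_add(Base(), Sub()))
```

For these inputs the helper agrees with the built-in `+`, including the case where the right operand's subclass wins.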
More Example
Why can we multiply a list? Is it slow?
    arr = [None] * 3
    # [None, None, None]
Exercise:
    arr = [None] + [None]
    # [None, None]
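A short sketch of what the two operators do to a list. It relies only on documented Python behaviour: `list.__mul__` and `list.__add__` are the Python-level names for repetition and concatenation, `operator.concat` is the function form of sequence concatenation, and repetition copies references rather than the elements themselves. The `marker` object is made up for this example.

```python
import operator

marker = object()
base = [marker]

repeated = base * 3                         # sequence repetition
via_method = list.__mul__(base, 3)          # the same operation, spelled out
concatenated = base + base                  # sequence concatenation
via_operator = operator.concat(base, base)  # function form of concatenation

# Only pointers were copied: every slot refers to the one marker object.
same_object = all(item is marker for item in repeated)
print(len(repeated), len(concatenated), same_object)
```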
Magic Methods
access slots of tp_as_number, and its friends
    # access magic methods of dict and list
    dict.__getitem__   # tp_as_mapping->mp_subscript
    dict.__len__       # tp_as_mapping->mp_length
    list.__getitem__   # tp_as_sequence->sq_item
    list.__len__       # tp_as_sequence->sq_length
Note: tp_as_mapping->mp_length and tp_as_sequence->sq_length map to the same magic method, __len__.
If your C-based MyType implements both, what are MyType.__len__ and len(MyType())?
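The magic methods listed above are ordinary attributes of the type, so they can be called directly with the instance passed in explicitly. A minimal sketch with made-up sample data:

```python
d = {"answer": 42}
lst = ["a", "b", "c"]

# Calling through the type is equivalent to the usual syntax.
mapping_item = dict.__getitem__(d, "answer")   # d["answer"]
mapping_len = dict.__len__(d)                  # len(d)
sequence_item = list.__getitem__(lst, 1)       # lst[1]
sequence_len = list.__len__(lst)               # len(lst)

print(mapping_item, mapping_len, sequence_item, sequence_len)
```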
Magic Methods
backfill as_number and its friends
    class A():
        def __len__(self):
            return 42
    class B():
        pass
    # 42
    print(len(A()))
    # TypeError: object of type 'B' has no len()
    print(len(B()))

    Py_ssize_t
    PyObject_Size(PyObject *o)
    {
        PySequenceMethods *m;
        if (o == NULL) {
            null_error();
            return -1;
        }
        m = o->ob_type->tp_as_sequence;
        if (m && m->sq_length)
            return m->sq_length(o);
        return PyMapping_Size(o);
    }
Which field does A.__len__ fill?
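The backfill also happens after a class has been created, and it only looks at the type, never at the instance. The sketch below shows both effects; `try_len` is a hypothetical helper that turns the TypeError into None so the three cases can be compared side by side.

```python
class A:
    def __len__(self):
        return 42


class B:
    pass


def try_len(obj):
    """Return len(obj), or None if the type provides no length hook."""
    try:
        return len(obj)
    except TypeError:
        return None


before = try_len(B())            # B has no __len__ yet

b = B()
b.__len__ = lambda: 1            # instance attribute: len() does not consult it
instance_only = try_len(b)

B.__len__ = lambda self: 7       # class attribute: the type's slot gets filled
after = try_len(B())

print(try_len(A()), before, instance_only, after)
```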
Next: Heterogeneous
Have you ever felt insecure towards negative indexing of PyListObject?
The answer: RTFSC
    words = "the quick brown fox jumps over the old lazy dog".split()
    assert words[-1] == "dog"
    words.insert(-100, "hayabusa")
    assert words[-100] == ??
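For readers who would rather run the quiz than read the source, here is a minimal sketch of the two behaviours involved, relying on documented list semantics: `insert` clamps an out-of-range position to the nearest end, while indexing does not clamp at all.

```python
words = "the quick brown fox jumps over the old lazy dog".split()

# insert() clamps a far-negative position to the front of the list.
words.insert(-100, "hayabusa")
first = words[0]

# Indexing performs no clamping: an out-of-range index is an error.
try:
    words[-100]
    raised = False
except IndexError:
    raised = True

print(first, len(words), raised)
```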
Thanks
