Why should descriptor instances in Python be class attributes?


I'm studying descriptors in Python and I found that they should be implemented as class attributes, for example:

class Descriptor:

    def __init__(self, obj):
        self.obj = obj

    def __get__(self, instance, owner=None):
        print('Accessing __get__')
        return self.obj


class Grok:

    attr = Descriptor('value')

# Output
>>> g = Grok()
>>> g.attr
Accessing __get__
'value'

That way it works, but if I do it this way:

class Grok:

    def __init__(self, attr):
        self.attr = Descriptor(attr)


# Output
>>> g = Grok('value')
>>> g.attr
<__main__.Descriptor at 0x7fe5bca77550>

It does not work this way. My question is: why?

    
asked by anonymous 23.10.2018 / 20:48

2 answers


As discussed in "What is the function of the descriptors in Python?", there is an order of lookups that the interpreter follows when you evaluate g.attr. Since g, in this case, is an instance, the interpreter will execute g.__getattribute__('attr'). For a descriptor defined in the class, what the interpreter ends up doing is, in Python terms:

type(g).__dict__['attr'].__get__(g, type(g))

That is, it looks up attr in the class of g, not in g itself. This explains why it works when the descriptor is a class attribute, but it is not enough to show that it does not work for an instance attribute. For that, we need to go deeper and analyze the C code that actually runs.
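The lookup above can be checked from pure Python. This is a minimal sketch reusing the question's Descriptor and Grok (the print call is dropped, and the second variant is renamed to the hypothetical GrokInstance so both can coexist): the manual call gives the same result as g.attr, and for the instance-attribute variant the class dictionary simply has no 'attr' entry to call __get__ on.

```python
class Descriptor:
    def __init__(self, obj):
        self.obj = obj

    def __get__(self, instance, owner=None):
        return self.obj


class Grok:
    attr = Descriptor('value')          # descriptor stored on the class


class GrokInstance:                     # hypothetical name for the second variant
    def __init__(self, attr):
        self.attr = Descriptor(attr)    # descriptor stored on the instance


g = Grok()
# Simplified: the interpreter actually searches the whole MRO, not only type(g).__dict__
manual = type(g).__dict__['attr'].__get__(g, type(g))
print(manual, g.attr)                   # value value

h = GrokInstance('value')
print('attr' in type(h).__dict__)       # False: the class knows nothing about attr
print('attr' in vars(h))                # True: it lives in the instance __dict__
```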

The C implementation of the __getattribute__ method is PyObject_GenericGetAttr, which is implemented in Objects/object.c. Let's take a closer look.

The function is:

PyObject *
PyObject_GenericGetAttr(PyObject *obj, PyObject *name)
{
    return _PyObject_GenericGetAttrWithDict(obj, name, NULL);
}
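At the Python level, PyObject_GenericGetAttr is what you reach as object.__getattribute__, and a class that does not override __getattribute__ simply inherits it. A small check, assuming the question's Descriptor/Grok pair without the print:

```python
class Descriptor:
    def __init__(self, obj):
        self.obj = obj

    def __get__(self, instance, owner=None):
        return self.obj


class Grok:
    attr = Descriptor('value')


g = Grok()
# Grok does not define __getattribute__, so it inherits object's (the C function above)
print(Grok.__getattribute__ is object.__getattribute__)   # True
# Calling it explicitly is equivalent to writing g.attr
print(object.__getattribute__(g, 'attr'))                  # value
```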

So we need to look at the implementation of _PyObject_GenericGetAttrWithDict:

PyObject *
_PyObject_GenericGetAttrWithDict(PyObject *obj, PyObject *name, PyObject *dict)
{
    PyTypeObject *tp = Py_TYPE(obj);
    PyObject *descr = NULL;
    PyObject *res = NULL;
    descrgetfunc f;
    Py_ssize_t dictoffset;
    PyObject **dictptr;

    ...
}

Some things to note before continuing:

  • The function receives obj as a parameter, a reference to the object g;
  • The function receives name as a parameter, the name of the accessed attribute;
  • The function receives dict as a parameter, a dictionary that, in this case, will be NULL;
  • tp holds the type of obj, i.e. Grok;
  • descr (a possible descriptor) and res (the function's return value) are initialized to NULL; f will hold the __get__ function of the possible descriptor, and dictoffset and dictptr are used later to locate the instance dictionary;

    Next, the attribute name is validated: a TypeError is raised if name is not a string. Otherwise, the reference count of name is incremented with Py_INCREF.

    if (!PyUnicode_Check(name)){
        PyErr_Format(PyExc_TypeError,
                     "attribute name must be string, not '%.200s'",
                     name->ob_type->tp_name);
        return NULL;
    }
    Py_INCREF(name);
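Python code hits this same check when it calls object.__getattribute__ directly with a non-string name. A quick sketch (Grok here is an empty placeholder class; the exact message wording may differ between Python versions, so only the exception type is relied upon):

```python
class Grok:
    pass


g = Grok()
raised = None
try:
    object.__getattribute__(g, 42)     # name is an int, not a str
except TypeError as exc:
    raised = exc
print(repr(raised))
```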
    

    Next, the function makes sure that the type of g, tp, has its internal dictionary (tp_dict) ready, calling PyType_Ready if needed and exiting the function on failure:

    if (tp->tp_dict == NULL) {
        if (PyType_Ready(tp) < 0)
            goto done;
    }
    

    Then the attribute is looked up in the class of g, Grok (more precisely, along its MRO, via _PyType_Lookup), and the result is stored in descr. If something is found, its reference count is incremented and f is set to the __get__ function of the type of descr (tp_descr_get). If that function exists and the descriptor is a data descriptor (it also has a __set__ method), res is set to the result of __get__ and the function ends:

    descr = _PyType_Lookup(tp, name);
    
    f = NULL;
    if (descr != NULL) {
        Py_INCREF(descr);
        f = descr->ob_type->tp_descr_get;
        if (f != NULL && PyDescr_IsData(descr)) {
            res = f(descr, obj, (PyObject *)obj->ob_type);
            goto done;
        }
    }
    
      

    The macro that checks whether it is a data descriptor, PyDescr_IsData, is defined as:

    #define PyDescr_IsData(d) (Py_TYPE(d)->tp_descr_set != NULL)
    
         

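    It basically checks whether the type of the object implements the tp_descr_set slot, i.e. whether it has a __set__ method. (For classes written in Python, defining __delete__ also fills this slot.)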
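A rough Python analogue of PyDescr_IsData, plus a demonstration of why it matters: a data descriptor on the class wins even when the instance __dict__ has an entry with the same name, while a non-data descriptor loses. is_data_descriptor, Data, NonData and Grok are hypothetical names for illustration; the check looks for both __set__ and __delete__ because either one fills tp_descr_set.

```python
def is_data_descriptor(obj):
    """Hypothetical approximation of PyDescr_IsData: inspect the *type*."""
    tp = type(obj)
    return hasattr(tp, '__set__') or hasattr(tp, '__delete__')


class Data:                       # hypothetical data descriptor
    def __get__(self, instance, owner=None):
        return 'from Data.__get__'

    def __set__(self, instance, value):
        raise AttributeError('read-only')


class NonData:                    # hypothetical non-data descriptor
    def __get__(self, instance, owner=None):
        return 'from NonData.__get__'


class Grok:
    d = Data()
    n = NonData()


g = Grok()
# Bypass __setattr__ and write straight into the instance dictionary
vars(g)['d'] = 'instance value'
vars(g)['n'] = 'instance value'

print(is_data_descriptor(Grok.__dict__['d']), is_data_descriptor(Grok.__dict__['n']))  # True False
print(g.d)   # from Data.__get__  (data descriptor is checked before __dict__)
print(g.n)   # instance value     (__dict__ is checked before a non-data descriptor)
```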

    For a data descriptor defined as a class attribute, execution stops here. For an instance attribute, execution continues. Since we now have to work with the instance itself, its internal dictionary must be considered as well. So the next step is to locate the instance's own dictionary (its __dict__), using the offset tp_dictoffset stored in the type, and store a pointer to it in dict:

    if (dict == NULL) {
        /* Inline _PyObject_GetDictPtr */
        dictoffset = tp->tp_dictoffset;
        if (dictoffset != 0) {
            if (dictoffset < 0) {
                Py_ssize_t tsize;
                size_t size;
    
                tsize = ((PyVarObject *)obj)->ob_size;
                if (tsize < 0)
                    tsize = -tsize;
                size = _PyObject_VAR_SIZE(tp, tsize);
                assert(size <= PY_SSIZE_T_MAX);
    
                dictoffset += (Py_ssize_t)size;
                assert(dictoffset > 0);
                assert(dictoffset % SIZEOF_VOID_P == 0);
            }
            dictptr = (PyObject **) ((char *)obj + dictoffset);
            dict = *dictptr;
        }
    }
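At the Python level, the dictionary found through tp_dictoffset is the instance's __dict__ (what vars() returns). A type whose tp_dictoffset is 0, such as a class that declares __slots__ without '__dict__', has no such dictionary, so this step finds nothing for it. WithDict and WithSlots are hypothetical classes for the example.

```python
class WithDict:                     # hypothetical: ordinary class
    def __init__(self):
        self.x = 1


class WithSlots:                    # hypothetical: no per-instance __dict__
    __slots__ = ('x',)

    def __init__(self):
        self.x = 1


a, b = WithDict(), WithSlots()
print(vars(a))                      # {'x': 1}
print(hasattr(b, '__dict__'))       # False
# With __slots__, x is served by a data descriptor on the class instead
print(type(WithSlots.__dict__['x']).__name__)  # member_descriptor
```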
    

    After that, the attribute is looked up in dict and, if found, its value is returned:

    if (dict != NULL) {
        Py_INCREF(dict);
        res = PyDict_GetItem(dict, name);
        if (res != NULL) {
            Py_INCREF(res);
            Py_DECREF(dict);
            goto done;
        }
        Py_DECREF(dict);
    }
    
    Note that here, since the instance attribute exists in the dictionary, the value that PyDict_GetItem returns in res is the descriptor instance itself, and it is returned as-is, regardless of whether __get__ is defined.
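This is exactly what the question runs into. A short reproduction, reusing the question's Descriptor without the print and the hypothetical name GrokInstance for the second class: the object stored in the instance __dict__ comes back untouched, and its __get__ only runs if you call it yourself.

```python
class Descriptor:
    def __init__(self, obj):
        self.obj = obj

    def __get__(self, instance, owner=None):
        return self.obj


class GrokInstance:                    # hypothetical name for the question's 2nd Grok
    def __init__(self, attr):
        self.attr = Descriptor(attr)   # ends up in vars(self), not in the class


g = GrokInstance('value')
raw = g.attr                           # found in the instance dict, returned as-is
print(type(raw).__name__)              # Descriptor
print(raw is vars(g)['attr'])          # True
print(raw.__get__(g, type(g)))         # value (only because we call it explicitly)
```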

    If the attribute is not found in the instance dictionary, the function checks whether a __get__ was found for descr earlier (f is not NULL). That is the case of a non-data descriptor (one that has __get__ but no __set__), and its __get__ is called:

    if (f != NULL) {
        res = f(descr, obj, (PyObject *)Py_TYPE(obj));
        goto done;
    }
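Ordinary functions are the most common non-data descriptors: their type defines __get__ (which produces bound methods) but not __set__. That is why an instance attribute can shadow a method, while lookup falls back to __get__ when the instance dict has no entry. Greeter is a hypothetical class for illustration.

```python
class Greeter:                   # hypothetical example class
    def hello(self):
        return 'hello from the class'


g = Greeter()
func_type = type(Greeter.__dict__['hello'])
print(hasattr(func_type, '__get__'), hasattr(func_type, '__set__'))  # True False

before = g.hello()               # no instance entry: __get__ builds a bound method
g.hello = lambda: 'hello from the instance'   # stored in vars(g)
shadowed = g.hello()             # instance dict is checked before a non-data descriptor
del g.hello                      # remove the instance entry again
after = g.hello()
print(before, shadowed, after, sep=' | ')
```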
    

    If none of the above conditions were met, the function checks whether descr is not NULL (that is, something was found for the attribute in the type of g). If so, descr itself is set as the result and returned:

    if (descr != NULL) {
        res = descr;
        descr = NULL;
        goto done;
    }
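When the object found in the class has no __get__ at all, it is returned unchanged; this is the everyday case of class-level constants. Config is a hypothetical example.

```python
class Config:                    # hypothetical example class
    retries = 3                  # int has no __get__: not a descriptor


c = Config()
print(hasattr(type(Config.__dict__['retries']), '__get__'))  # False
print(c.retries)                                 # 3, returned as-is from the class dict
print(c.retries is Config.__dict__['retries'])   # True
```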
    

    Finally, if nothing has worked so far, an AttributeError is raised:

    PyErr_Format(PyExc_AttributeError,
                 "'%.50s' object has no attribute '%U'",
                 tp->tp_name, name);
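From Python, this last branch is the familiar AttributeError whose message follows the format string above. A quick check with an empty, hypothetical Grok class:

```python
class Grok:                      # hypothetical empty class
    pass


g = Grok()
message = None
try:
    g.missing
except AttributeError as exc:
    message = str(exc)
print(message)                   # 'Grok' object has no attribute 'missing'
```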
    

    At the done label, the reference counts are decremented and the value of res is returned:

    done:
        Py_XDECREF(descr);
        Py_DECREF(name);
        return res;
    

    The entire function, for reference:

    PyObject *
    _PyObject_GenericGetAttrWithDict(PyObject *obj, PyObject *name, PyObject *dict)
    {
        PyTypeObject *tp = Py_TYPE(obj);
        PyObject *descr = NULL;
        PyObject *res = NULL;
        descrgetfunc f;
        Py_ssize_t dictoffset;
        PyObject **dictptr;
    
        if (!PyUnicode_Check(name)){
            PyErr_Format(PyExc_TypeError,
                         "attribute name must be string, not '%.200s'",
                         name->ob_type->tp_name);
            return NULL;
        }
        Py_INCREF(name);
    
        if (tp->tp_dict == NULL) {
            if (PyType_Ready(tp) < 0)
                goto done;
        }
    
        descr = _PyType_Lookup(tp, name);
    
        f = NULL;
        if (descr != NULL) {
            Py_INCREF(descr);
            f = descr->ob_type->tp_descr_get;
            if (f != NULL && PyDescr_IsData(descr)) {
                res = f(descr, obj, (PyObject *)obj->ob_type);
                goto done;
            }
        }
    
        if (dict == NULL) {
            /* Inline _PyObject_GetDictPtr */
            dictoffset = tp->tp_dictoffset;
            if (dictoffset != 0) {
                if (dictoffset < 0) {
                    Py_ssize_t tsize;
                    size_t size;
    
                    tsize = ((PyVarObject *)obj)->ob_size;
                    if (tsize < 0)
                        tsize = -tsize;
                    size = _PyObject_VAR_SIZE(tp, tsize);
                    assert(size <= PY_SSIZE_T_MAX);
    
                    dictoffset += (Py_ssize_t)size;
                    assert(dictoffset > 0);
                    assert(dictoffset % SIZEOF_VOID_P == 0);
                }
                dictptr = (PyObject **) ((char *)obj + dictoffset);
                dict = *dictptr;
            }
        }
        if (dict != NULL) {
            Py_INCREF(dict);
            res = PyDict_GetItem(dict, name);
            if (res != NULL) {
                Py_INCREF(res);
                Py_DECREF(dict);
                goto done;
            }
            Py_DECREF(dict);
        }
    
        if (f != NULL) {
            res = f(descr, obj, (PyObject *)Py_TYPE(obj));
            goto done;
        }
    
        if (descr != NULL) {
            res = descr;
            descr = NULL;
            goto done;
        }
    
        PyErr_Format(PyExc_AttributeError,
                     "'%.50s' object has no attribute '%U'",
                     tp->tp_name, name);
      done:
        Py_XDECREF(descr);
        Py_DECREF(name);
        return res;
    }
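To tie the steps together, here is a pure-Python approximation of _PyObject_GenericGetAttrWithDict. It is a sketch, not CPython's code: generic_getattr and type_lookup are hypothetical names, the MRO scan stands in for _PyType_Lookup, vars() stands in for the tp_dictoffset arithmetic, and reference counting has no equivalent. The comments map each branch to the C code above.

```python
_MISSING = object()  # sentinel for "not found" (the C code uses NULL)


def type_lookup(tp, name):
    """Hypothetical stand-in for _PyType_Lookup: search class dicts along the MRO."""
    for klass in tp.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def generic_getattr(obj, name):
    """Hypothetical pure-Python model of _PyObject_GenericGetAttrWithDict."""
    if not isinstance(name, str):                          # PyUnicode_Check
        raise TypeError(f"attribute name must be string, not '{type(name).__name__}'")
    tp = type(obj)
    descr = type_lookup(tp, name)                          # _PyType_Lookup
    f = None
    if descr is not _MISSING:
        f = getattr(type(descr), '__get__', None)          # tp_descr_get
        is_data = (hasattr(type(descr), '__set__')         # PyDescr_IsData
                   or hasattr(type(descr), '__delete__'))
        if f is not None and is_data:
            return f(descr, obj, tp)                       # data descriptor wins
    try:
        inst_dict = vars(obj)                              # the instance __dict__
    except TypeError:
        inst_dict = None                                   # no __dict__ at all
    if inst_dict is not None and name in inst_dict:
        return inst_dict[name]                             # returned as-is
    if f is not None:
        return f(descr, obj, tp)                           # non-data descriptor
    if descr is not _MISSING:
        return descr                                       # plain class attribute
    raise AttributeError(f"'{tp.__name__}' object has no attribute '{name}'")


# Usage with the question's classes (print removed from __get__)
class Descriptor:
    def __init__(self, obj):
        self.obj = obj

    def __get__(self, instance, owner=None):
        return self.obj


class Grok:
    attr = Descriptor('value')


class GrokInstance:                                        # hypothetical name
    def __init__(self, attr):
        self.attr = Descriptor(attr)


print(generic_getattr(Grok(), 'attr'))                                 # value
print(type(generic_getattr(GrokInstance('value'), 'attr')).__name__)   # Descriptor
```

The model returns the same results as normal attribute access for these classes: the class-level descriptor is invoked, while the instance-level one is handed back unchanged.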
    
        
    23.10.2018 / 21:51

    The other answer is very good - including code snippets from the reference implementation. But I will write a shorter answer here, addressing another aspect of the question.

    I think you can treat this as a guideline for understanding many of the language's behaviors, which, in my opinion, is very nice, because it leads to very few surprises once you understand these guidelines.

    In Python, all "magic" functionality, that is, methods that are called implicitly by the language itself, is tied to attributes of the class, not of the instances.

    And, for ease of recognition and to avoid name clashes, these features are generally given names that begin and end with a double underscore: the famous __dunder__ names.

    One of the reasons for this design choice is this: descriptors imply special behavior for attribute access, so, just like __len__, __init__, etc., they are special, and therefore they have to be defined in the class.
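The same rule can be seen with __len__: len() consults the class, so a __len__ attached only to the instance is ignored. Box is a hypothetical class.

```python
class Box:                      # hypothetical example class
    pass


b = Box()
b.__len__ = lambda: 3           # stored only in the instance __dict__
direct = b.__len__()            # 3: ordinary attribute access still finds it

raised = None
try:
    len(b)                      # len() looks __len__ up on type(b), not on b
except TypeError as exc:
    raised = exc
print(direct, repr(raised))

Box.__len__ = lambda self: 3    # now defined on the class
print(len(b))                   # 3
```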

    Beyond this motivation, which is more of a feeling, there are practical reasons, such as the implementation: attribute access on an instance works on top of an ordinary dictionary, and this mechanism would have to be changed so that, when an attribute is retrieved from the instance, some other operation is performed instead of simply returning it.

    And how would the descriptor be "installed" in an instance in the first place? It would be strange: on the first assignment, objeto.attr = MeuDescriptor(), a normal assignment would happen, and from then on, in subsequent assignments objeto.attr = ..., instead of putting the value in the instance's __dict__, the descriptor's __set__ method would be called. This would make assignment operations depend on state, which has the potential to complicate code a lot, since what an assignment does would depend on the order of execution.
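Today's behavior makes the contrast concrete: when a data descriptor lives in the class, every assignment goes through its __set__; when a descriptor object sits in the instance __dict__, an assignment simply replaces it. Checked, OnClass and OnInstance are hypothetical names, and storing the real value under a "_"-prefixed name is an assumption of this sketch.

```python
class Checked:
    """Hypothetical data descriptor that only accepts ints."""

    def __set_name__(self, owner, name):
        self.private = '_' + name            # where the value is really stored

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private)

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise TypeError('expected an int')
        setattr(instance, self.private, value)


class OnClass:
    attr = Checked()                         # descriptor in the class


class OnInstance:
    def __init__(self):
        self.attr = Checked()                # just a value in vars(self)


a = OnClass()
a.attr = 1                                   # routed to Checked.__set__
try:
    a.attr = 'x'                             # rejected by Checked.__set__
    rejected = False
except TypeError:
    rejected = True

b = OnInstance()
b.attr = 'x'                                 # plain store: the Checked object is replaced
print(a.attr, rejected, b.attr)              # 1 True x
```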

    In fact, even a regular descriptor (defined in the class) lets you write code that does different things depending on the order in which values are assigned to it: it just has to keep some state that it controls. But this is almost never done in practice.
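As an illustration of that kind of order-dependent behavior, here is a hypothetical write-once descriptor: the first assignment stores the value and any later one fails, so what r.ident = ... does depends on what happened before. WriteOnce and Record are illustrative names, and keeping the state in the instance __dict__ under a "_"-prefixed name is an assumption of this sketch.

```python
class WriteOnce:
    """Hypothetical descriptor: accepts only the first assignment."""

    def __set_name__(self, owner, name):
        self.private = '_' + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.private)

    def __set__(self, instance, value):
        if self.private in vars(instance):   # state decides the outcome
            raise AttributeError('already set')
        setattr(instance, self.private, value)


class Record:
    ident = WriteOnce()


r = Record()
r.ident = 10                    # first assignment: accepted
error = None
try:
    r.ident = 20                # same statement shape, different outcome
except AttributeError as exc:
    error = exc
print(r.ident, error)           # 10 already set
```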

    Finally: this is the implementation choice, but the language is dynamic enough to let you write your own classes that work with "instance descriptors", and it would not even take much work. Just create a base class that overrides the __getattribute__ and __setattr__ methods to handle descriptors (and, of course, the odd effect I mentioned above would then apply).

    A version that only handles reading ("read-only") would be something like this:

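    class InstanceDescriptableBase:
        def __getattribute__(self, attrname):
            attr = super().__getattribute__(attrname)
            # Like the interpreter, look __get__ up on the value's type,
            # not on the value itself
            getter = getattr(type(attr), "__get__", None)
            if getter is not None:
                attr = getter(attr, self, type(self))
            return attr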
    

    And in the terminal:

    In [3]: class D:
       ...:     def __get__(self, instance, owner):
       ...:         return 42
       ...:     
    
    In [4]: class Test(InstanceDescriptableBase):
       ...:     def __init__(self):
       ...:         self.attr = D()
       ...:         
    
    In [5]: t = Test()
    
    In [6]: t.attr
    Out[6]: 42
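The answer mentions that __setattr__ would also need overriding to honor __set__. A hedged sketch of that, with hypothetical names (InstanceDescriptorBase, Logged, Thing): the first assignment stores the descriptor in the instance __dict__ as a plain value, and later assignments to the same name are routed to its __set__, which reproduces the order-dependent effect described above.

```python
class InstanceDescriptorBase:
    """Hypothetical base class honoring descriptors stored on the instance."""

    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        getter = getattr(type(value), '__get__', None)
        if getter is not None:
            value = getter(value, self, type(self))
        return value

    def __setattr__(self, name, value):
        current = super().__getattribute__('__dict__').get(name)
        setter = getattr(type(current), '__set__', None)
        if setter is not None:
            setter(current, self, value)      # a descriptor is already installed
        else:
            super().__setattr__(name, value)  # first assignment: ordinary store


class Logged:
    """Hypothetical descriptor that records every value assigned through it."""

    def __init__(self, value):
        self.value = value
        self.history = []

    def __get__(self, instance, owner=None):
        return self.value

    def __set__(self, instance, value):
        self.history.append(value)
        self.value = value


class Thing(InstanceDescriptorBase):
    pass


t = Thing()
t.attr = Logged(1)                # stored as-is in vars(t)
t.attr = 2                        # routed to Logged.__set__
print(t.attr)                     # 2
print(vars(t)['attr'].history)    # [2]
```

Note that after the first assignment the descriptor can no longer be replaced by a plain assignment: a later t.attr = Logged(3) would just be handed to the existing descriptor's __set__.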
    
        
    24.10.2018 / 14:40