How can you check if the number in range is so fast?

9

It is known that with the module timeit it is possible to measure, in Python, the execution of code snippets. Curious, I've been testing how long it takes to see if a given number is in a range defined by range , such as x in range(A, B) .

For the test, I did:

$ python -m timeit -s '1000000000 in range(9999999999)'

That is, I defined the range [0, 9999999999[ , which is gigantic, and I checked that the 1000000000 value belongs to it. The result was:

100000000 loops, best of 3: 0.00574 usec per loop

Which, in addition to resulting in a rather small time, resembles, for example, the time to check with a much shorter interval:

$ python -m timeit -s '1 in range(10)'
100000000 loops, best of 3: 0.00577 usec per loop

How is this possible?

    
asked by anonymous 17.04.2018 / 03:53

1 answer

13
In fact, considering only the object of type range , the time to check if a certain value belongs to the range is, in the notation big-O , O (1), which means that will be constant regardless of the size of the space checked, which explains the times of both tests being equal. And the explanation for this is quite simple: mathematically it is very easy to check if a number belongs to a range.

TL; DR

The most basic solution I can imagine to check if a number is in a certain range is to go through all the elements of this range, comparing one by one with the value I am checking, and if I find an equal, return true, case otherwise, false.

def pertence(sequencia, valor):
    for item in sequencia:
        if item == valor:
            return True
    return False

Certainly such a solution would never be O (1), in fact it would be O (n), which could justify small times for small intervals, but would not justify the same for large intervals. So how is it possible?

Mathematically, we can say that the number x belongs to the [A, B[ interval if, and only if, x is greater than or equal to A and x is less than B . This is:

Allthismagichappensintherangeobjectthankstotheimplementationofthe__contains__method,whichisimplicitlyinvokedthroughtheinoperator.And,inadditiontoverifyingthatthexvalueisgreaterthanorequaltoAandlessthanB,whenyoudefinetheintervalstep,youmustalsoverifythatthevaluematchesthestep.Forexample,ifIdefinetheobjectrange(0,10,2),Iwillbepickinguptheevenvaluesfrom0to10,andsothenumber3willnotbelongtotherange,evenifitisgreaterthan0andlessthan10.Thus,theimplementationinPythonwouldbesimilarto:

classrange:def__contains__(self,x):ifnotself.A<=x<self.B:returnFalseifself.passoisnotNone:return(x-self.A)%self.passo==0returnTrue

Inthisway,theverificationisnolongerdependentonthesizeoftheinterval,butonaconstantnumberofoperations,whichisasolutionO(1).

Seeworkingat Repl.it

However, the range class of Python belongs to the builtins module, which is implemented in C, not in Python. That is, although the logic is the same, the code that is actually executed to check if the value belongs to the range is another. For the sake of curiosity, you can access the official CPython repository and check the C implementation, but specifically between lines 337 and 379 :

static int
range_contains_long(rangeobject *r, PyObject *ob)
{
    int cmp1, cmp2, cmp3;
    PyObject *tmp1 = NULL;
    PyObject *tmp2 = NULL;
    int result = -1;

    /* Check if the value can possibly be in the range. */

    cmp1 = PyObject_RichCompareBool(r->step, _PyLong_Zero, Py_GT);
    if (cmp1 == -1)
        goto end;
    if (cmp1 == 1) { /* positive steps: start <= ob < stop */
        cmp2 = PyObject_RichCompareBool(r->start, ob, Py_LE);
        cmp3 = PyObject_RichCompareBool(ob, r->stop, Py_LT);
    }
    else { /* negative steps: stop < ob <= start */
        cmp2 = PyObject_RichCompareBool(ob, r->start, Py_LE);
        cmp3 = PyObject_RichCompareBool(r->stop, ob, Py_LT);
    }

    if (cmp2 == -1 || cmp3 == -1) /* TypeError */
        goto end;
    if (cmp2 == 0 || cmp3 == 0) { /* ob outside of range */
        result = 0;
        goto end;
    }

    /* Check that the stride does not invalidate ob's membership. */
    tmp1 = PyNumber_Subtract(ob, r->start);
    if (tmp1 == NULL)
        goto end;
    tmp2 = PyNumber_Remainder(tmp1, r->step);
    if (tmp2 == NULL)
        goto end;
    /* result = ((int(ob) - start) % step) == 0 */
    result = PyObject_RichCompareBool(tmp2, _PyLong_Zero, Py_EQ);
  end:
    Py_XDECREF(tmp1);
    Py_XDECREF(tmp2);
    return result;
}

The r parameter represents the range object itself, and the ob parameter represents the value to be checked. The r object has the start , stop , and step fields that represent the start, end, and step of the range. Although it is a larger implementation, the logic is exactly what was presented earlier in Python.

In C implementation, you can also check that the r object is not modified at any time, which implies that the generator that is defined by range will not be modified by checking whether the value belongs to the range. Because it is a generator and, by definition, a generator can be iterated only once, it would be catastrophic if the in operator traversed range , since the interval would no longer be useful after that. Note that we are only dealing here with the range object and its peculiarities, which does not imply that this is the case with all Python generators.

intervalo = range(10)

if 5 in intervalo:
    for i in intervalo:
        print(i)

See working at Repl.it

    
17.04.2018 / 03:53