Handling items during iteration

0

In Python it is common to iterate over the items of an iterable, whether to process them, to check for the existence of a particular item, or to perform some other operation. While re-reading the Python documentation I came across something that, the first time I saw it, I did not think was relevant. For example:

>>> animals = ['Cat', 'Dog', 'Elephant']
>>> for animal in animals:
...     print(animal, len(animal))

Cat 3
Dog 3
Elephant 8

As you can see, I iterate over a list of animal names and print the current animal in the loop along with the number of characters in its name. The only operation in this example is printing the name and its length; I do not try to change the values in any way. My question arises precisely when I think about changing those values.

>>> animals = ['Cat', 'Dog', 'Elephant']
>>> for animal in animals:
...     if len(animal) > 3:
...         animal = animal[:3]
...     print(animal, len(animal))

Cat 3
Dog 3
Ele 3

>>> print(animals)
['Cat', 'Dog', 'Elephant']

The output inside the loop shows that the loop variable was changed, but printing the list after the loop shows that the list itself was not. This puzzled me. The examples here are just things that came to mind, and I believe we have all thought of changing a variable during a loop at some point.

Also, when studying Python we learn that some of its most basic types, such as str, int, float and even tuple, are immutable, so I wondered whether the reassignment to the loop variable had simply generated a new string (in this case, a string of three characters). With that in mind, I decided to check whether the variable in the current iteration was the same object as the one in the list being iterated, as follows:

>>> animals = ['Cat', 'Dog', 'Elephant']
>>> for animal in animals:
...     print(animal, id(animal))

Cat 140226536948264
Dog 140226536948040
Elephant 140226536875248

>>> animals[0], id(animals[0])
('Cat', 140226536948264)
>>> animals[1], id(animals[1])
('Dog', 140226536948040)
>>> animals[2], id(animals[2])
('Elephant', 140226536875248)

Have you noticed? The identifiers are the same. I do not know about other languages, but the Python documentation says that you should make a copy of the iterable before the loop, since the loop does not implicitly do so. My question remains, though: why can't I change the current item of the list through the for loop variable?

    
asked by anonymous 20.11.2018 / 23:47

2 answers

2

The question got a bit confusing because the result cannot be reproduced. It seems this is what you wanted to demonstrate:

animals = ['Cat', 'Dog', 'Elephant']
for animal in animals:
    print(animal, id(animal))
for animal in animals:
    if len(animal) > 3:
        animal = animal[:3]
    print(animal, len(animal), id(animal))
for animal in animals:
    print(animal, len(animal), id(animal))

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

Notice that I first showed the id of each item, then showed the items inside the loop together with their ids, so you can see when one changes, and finally showed the ids again to confirm that the list is intact. When you run a test you need a control: show the phenomenon occurring and then the state of everything afterwards, otherwise you can fool yourself.

The doubt seems to be about why the list has not changed. The question was edited in one detail of the text, but everything else still gives the wrong impression of what you wanted to ask.

The loop variable is not what you think it is. It does not hold an isolated copy of the item's value; it is a name bound to the same object that the list holds at that position. That is true only until you assign to it. Assignment does not write into the list: it binds the variable to a different object (here, the new string produced by the slice), while the list keeps referring to the original. The effect resembles what is called COW (Copy On Write), in that the write ends up in a new location in memory and the data in the collection is left untouched, although strictly speaking Python copies nothing on assignment; it only rebinds the name.
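A minimal sketch of what happens to the loop variable on assignment, using the animals list from the question. The `is` checks and the `same_before` / `same_after` lists are my additions, only there to make the object identity visible:

```python
animals = ['Cat', 'Dog', 'Elephant']
same_before = []  # identity checks made before any assignment
same_after = []   # identity checks made after the possible assignment

for index, animal in enumerate(animals):
    # At this point the loop variable and the list slot are the same object.
    same_before.append(animal is animals[index])
    if len(animal) > 3:
        # This binds the name `animal` to a new string; the list is not touched.
        animal = animal[:3]
    same_after.append(animal is animals[index])

print(same_before)  # [True, True, True]
print(same_after)   # [True, True, False]
print(animals)      # ['Cat', 'Dog', 'Elephant']
```

Only the third item is reassigned, so only there does the loop variable stop being the object stored in the list.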

In a direct loop there would be no such protection. The foreach pattern, common across many languages, exists to make it easier to iterate over items safely. If you need the flexibility, you give up that safety by using a raw, index-based loop.
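For comparison, a raw index-based loop over the same list could look like the sketch below. It is just one way to write it, reusing the list from the question:

```python
animals = ['Cat', 'Dog', 'Elephant']

for i in range(len(animals)):
    if len(animals[i]) > 3:
        # Assigning to animals[i] replaces the reference stored in the list itself.
        animals[i] = animals[i][:3]

print(animals)  # ['Cat', 'Dog', 'Ele']
```

Here the write goes to the list slot rather than to a separate loop variable, so the list really does change.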

Note that the id of the third item before the loop is one value, and inside the loop, when it is changed, it is another. So your test did not analyze the situation correctly and gave you false information. When the variable animal is reassigned, it refers to another object. When it is not reassigned, it keeps referring to the same object and nothing is copied.

I think the fundamental concept here is this COW-like behavior: it caused the illusion, and the original test did not allow it to be observed. It is common in many languages for value types to be immutable, so that "changing" one really means producing a new value. Reference types are usually not immutable, so such a type would let you change its value in place. Note that a string is implemented as a reference type for optimization, but it has value semantics, so it is immutable and follows the same rule.
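To illustrate the point about strings being immutable, here is a small sketch. The names `word`, `short` and `raised` are only for illustration:

```python
word = 'Elephant'
raised = False

try:
    # Strings do not support item assignment, so this line fails.
    word[0] = 'e'
except TypeError:
    raised = True

# Slicing does not modify `word`; it builds a separate string object.
short = word[:3]

print(raised)         # True
print(short, word)    # Ele Elephant
print(short is word)  # False
```

Since a string cannot be changed in place, the only way to get a "changed" value is to create a new object and bind a name (or a list slot) to it.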

    
21.11.2018 / 00:01
2

This section of the documentation has relevant information.

What actually happens is not very intuitive, but it is easily explained. When iterating over a list, each item is assigned, one at a time, to an intermediate loop variable, which is a reference to the current object in the list.

In other words, this:

minha_lista = [1, 2, 3, 4, 5]

for item in minha_lista:
    if item == 2:
        item = 10
print(minha_lista)
# [1, 2, 3, 4, 5]

is actually equivalent to this:

minha_lista = [1, 2, 3, 4, 5]

for i in range(len(minha_lista)):
    item = minha_lista[i]
    if item == 2:
        item = 10
print(minha_lista)
# [1, 2, 3, 4, 5]

Notice the difference? When we assign to this intermediate variable, we are actually changing what the variable refers to, not the original item in the list. With immutable types, reassignment is the only way to "change" a value, which is why the list is never affected.

The assignment to this intermediate variable works the same as a normal assignment. That is, if we iterate over a list of lists, for example, the variable will refer to each of those inner lists themselves, not to a copy of their values. So mutating the object in place behaves differently:

lista_de_listas = [[1, 2, 3], [4, 5, 6]]

for item in lista_de_listas:
    if item == [1, 2, 3]:
        item[1] = 10
print(lista_de_listas)
# [[1, 10, 3], [4, 5, 6]]
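To make the contrast explicit, the sketch below runs both variants on the same list of lists: first rebinding the loop variable, then mutating the object it refers to. The snapshot variables `after_rebinding` and `after_mutation` are my additions:

```python
lista_de_listas = [[1, 2, 3], [4, 5, 6]]

for item in lista_de_listas:
    if item == [1, 2, 3]:
        # Rebinding: `item` now names a brand new list; the outer list is unaffected.
        item = [1, 10, 3]
after_rebinding = [inner[:] for inner in lista_de_listas]

for item in lista_de_listas:
    if item == [1, 2, 3]:
        # Mutation: changes the inner list object that the outer list also holds.
        item[1] = 10
after_mutation = [inner[:] for inner in lista_de_listas]

print(after_rebinding)  # [[1, 2, 3], [4, 5, 6]]
print(after_mutation)   # [[1, 10, 3], [4, 5, 6]]
```

The difference is between `item = ...`, which only changes the name, and `item[1] = ...`, which changes the shared object.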

One way around this difficulty when dealing with immutable objects is to assign to the list directly by index. You can do this without losing the convenience of an intermediate variable by using enumerate:

minha_lista = [1, 2, 3, 4, 5]

for i, item in enumerate(minha_lista):
    if item == 2:
        minha_lista[i] = 10
print(minha_lista)
# [1, 10, 3, 4, 5]
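Another option, not mentioned above and shown here only as a sketch, is to build a new list with a list comprehension and, if the original list object must be updated, assign it back through a slice. The names `alias` and `shortened` are hypothetical and exist only for this example:

```python
animals = ['Cat', 'Dog', 'Elephant']
alias = animals  # a second name for the same list object

# Build a new list; slicing a short string just returns its full contents.
shortened = [animal[:3] for animal in animals]
print(shortened)  # ['Cat', 'Dog', 'Ele']
print(animals)    # ['Cat', 'Dog', 'Elephant']

# Slice assignment replaces the contents of the existing list in place,
# so every name that refers to that list sees the change.
animals[:] = shortened
print(alias)             # ['Cat', 'Dog', 'Ele']
print(alias is animals)  # True
```

The comprehension avoids modifying a list while iterating over it, and the slice assignment is only needed when other references to the original list must see the new values.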
    
21.11.2018 / 00:09