What is the logic behind this function?

2

Why the result of L1 is: [2, 3, 4] instead of [3, 4]?

def remove_dups(L1, L2):
    for e in L1:
        if e in L2:
            L1.remove(e)

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)

I tested it in Python Tutor and for some reason the second step of for is in value 3. This does not make sense to me.

    
asked by anonymous 02.04.2018 / 16:53

3 answers

5

When you create a for it uses an iterator, which controls the progress of the collection. When you remove the first element, the second happens to be the first, the third happens to be the second, and so it continues. So after the first pass of the loop the list looks like this:

[2, 3, 4]

And the iterator that was at 0 passes to 1 in the second pass, then the check is from number 3, which is not in L2 , nothing is removed, and then the same as 4.

You can not change the object of a collection during an interaction with impunity. The most correct one would be to make a copy or use another algorithm.

What you're doing is (what's behind the for algorithm):

def remove_dups(L1, L2):
    i = 0
    while i < len(L1):
        if L1[i] in L2:
            L1.remove(L1[i])
        i += 1

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)

And the correct one would be something like this, though it's still risky if there's some code maintenance:

def remove_dups(L1, L2):
    i = 0
    while i < len(L1):
        if L1[i] in L2:
            L1.remove(L1[i])
        else:
            i += 1

L1 = [1, 2, 3, 4]
L2 = [1, 2, 5, 6]
remove_dups(L1, L2)
print(L1)
    
02.04.2018 / 17:08
2

The Maniero's answer clearly explained that the problem with the implemented logic was to change the structure being iterated and its examples could not be more didactic, so I would just like to add a summarized form of solutions:

def remove_duplicados(l1, l2):
    return [i for i in l1 if i not in l2]

So, the function returns a new list with elements of l1 that do not belong to l2 .

l1 = [1, 2, 3, 4]
l2 = [1, 2, 5, 6]

print(remove_duplicados(l1, l2))  # [3, 4]

See working at Repl.it | Ideone

Some advantages in this solution are:

  • When you replace brackets with brackets in return, the function returns a generator, which can save memory;
  • It is not necessary to create copies of the input lists;
  • The entry lists remain intact (may be useful in some cases);
  • Readable solution;
  • Feedback :

    In the memory economy mentioned in item 1, we have two observations: a) the economy of gives by the fact that, when replacing the brackets by parentheses, the return of the function ceases to be a list and becomes a list generator; In this way, the final list stored in memory is not available, since the generator calculates each item according to its use ( lazy  calculation or call-by-need ). b) the references to the input objects remain the same in of the generator, not creating a copy of them; that is, even after the generator is set, any change in the input lists will affect the generator.

    With brackets, the return is a new list:

    x = [1, 2, 3, 4, 5]
    y = [2, 4]
    
    def remove_duplicados(l1, l2):
        return [i for i in l1 if i not in l2]
    
    z = remove_duplicados(x, y)
    
    print(type(z))  # <class 'list'>
    
    # Apenas para demonstrar a saída:
    print(list(z))  # [1, 3, 5]
    

    See working at Repl.it | Ideone | GitHub GIST

    With parentheses, return is a generator:

    x = [1, 2, 3, 4, 5]
    y = [2, 4]
    
    def remove_duplicados(l1, l2):
        return (i for i in l1 if i not in l2)
    
    z = remove_duplicados(x, y)
    
    print(type(z))  # <class 'generator'>
    
    # Apenas para demonstrar a saída:
    print(list(z))  # [1, 3, 5]
    

    See working at Repl.it | Ideone | GitHub GIST

    As the list references remain the same, any changes to them will be reflected in the generator result:

    x = [1, 2, 3, 4, 5]
    y = [2, 4]
    
    def remove_duplicados(l1, l2):
        return (i for i in l1 if i not in l2)
    
    z = remove_duplicados(x, y)
    
    # Alteração em y:
    y.append(3)
    
    print(type(z))  # <class 'generator'>
    
    # Saída de z modificada devido alteração em y:
    print(list(z))  # [1, 5]
    

    See working at Repl.it | Ideone | GitHub GIST

    Items 2 and 3 are complementary because if the problem requires that the list of entries should remain intact after the function is executed, using list comprehension or the generator does not need to create a copy in memory of the list to modify it; the return will be a new list with just the items you want.

    Item 4 is quite subjective, but I particularly think the solution is much more readable:

    def remove_duplicates(source, reference):
        return (item for item in source if item not in reference)
    
        
    03.04.2018 / 02:15
    0

    The Python for keeps a counter from where it is in the list, if the size of the list reduces, then this will mess up the counter, this code would work well if the list size did not change. The best way to do this is to clone the list.

    This is a problem in the book Introduction to Computation and Programming Using Python: With Application to Understanding Data

        
    02.04.2018 / 17:10