How to find the most common value in each column of an array using Python?

I have a 3 x 3280 array. I need to go through each column, look at the values in all of its rows, find the most frequent one, and build a new 1 x 3280 vector from those values. For example:

matriz = [[1, 2, 3, 4, ...], [2, 3, 4, 5, ...], [1, 2, 4, 4, ...]]

For the first column, the three rows give [1 2 1], so the most common value is 1. For the second column, the three rows give [2 3 2], so the most common value is 2.

I tried to write this in Python, but since I do not know the language, I get a lot of errors.

    
asked by anonymous 01.06.2018 / 17:59

2 answers

As in the answer Vitor Hugo posted, just compute the transposed matrix and find the most common element of each row; the logic is exactly the one he used, but it can be written more simply:

from collections import Counter

def most_common_of_columns(matrix):
    # zip(*matrix) yields the columns of the matrix one at a time
    for column in zip(*matrix):
        most_commons = Counter(column).most_common(1)
        yield most_commons[0][0]

Here zip(*matrix) yields the rows of the transposed matrix (that is, the columns of the original); Counter(column).most_common(1) returns a list containing the single pair (value, count) for the most common value; and most_commons[0][0] extracts that value.
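To make those three expressions concrete, here is a minimal sketch that evaluates each one separately. The two-row matrix is a made-up example, and the column (2, 3, 2) is the second column from the question.

```python
from collections import Counter

# Hypothetical two-row matrix, used only for illustration.
matrix = [[1, 2, 3],
          [1, 5, 3]]

columns = list(zip(*matrix))                # each column becomes a tuple
pairs = Counter((2, 3, 2)).most_common(1)   # list with one (value, count) pair
value = pairs[0][0]                         # the value itself, without the count

print(columns)  # [(1, 1), (2, 5), (3, 3)]
print(pairs)    # [(2, 2)]
print(value)    # 2
```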

So, do something like:

matrix = [
    [1,1,1,1,2,3,4,5,9,8], 
    [1,2,3,3,3,4,5,6,7,8], 
    [1,1,1,1,3,4,4,4,4,4]
]

print(list(most_common_of_columns(matrix)))

This prints [1, 1, 1, 1, 3, 4, 4, 5, 9, 8], which are the most common elements of each column. Note that when a column has no single most common element (a tie), the element from the first row is returned.
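If silently picking one of the tied values is not what you want, the tie can be detected explicitly. The sketch below is only one possible approach: most_common_or_none is a hypothetical helper name, it assumes a non-empty column, and the first-encountered tie order shown in the comment applies to Python 3.7 and later.

```python
from collections import Counter

# Three-way tie: every value appears exactly once.
tied = (5, 6, 4)
print(Counter(tied).most_common(1))  # [(5, 1)] on Python 3.7+

# Hypothetical helper: return None instead of an arbitrary value on a tie.
# Assumes the column is not empty.
def most_common_or_none(column):
    ranked = Counter(column).most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # the top two counts are equal, so there is no unique winner
    return ranked[0][0]

print(most_common_or_none((5, 6, 4)))  # None
print(most_common_or_none((2, 3, 2)))  # 2
```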

    
01.06.2018 / 21:04

To do what you describe, you basically need the Counter type from the collections module and the zip function, which is built into Python. First I'll paste the example code, and then I'll explain it.

from collections import Counter

m = [
    [1,1,1,1,2,3,4,5,9,8], 
    [1,2,3,3,3,4,5,6,7,8], 
    [1,1,1,1,3,4,4,4,4,4]
]

inversa = []
contadores = []
resultado = []

for x in zip(m[0],m[1],m[2]):
    inversa.append(list(x))


for x in inversa:
    a = Counter(x)
    b = a.most_common(1)
    contadores.append(b)


for x in contadores:
    a = x[0]
    b = a[0]
    resultado.append(b)

print(resultado)

We start by importing Counter and defining the working matrix. For teaching purposes I have defined a 3 by 10 matrix, but do not worry: you can add more columns and the code will still work (adding more rows would require passing them to zip as well).

After this I also define three auxiliary lists, which are explained as we go through the code.

The first thing to do is to get the transpose of the input matrix (the code calls it inversa, "inverse", but mathematically the correct term is transpose), as this makes our work easier. We do this with a for loop that goes through the tuples produced by zip and saves each tuple as a row of the inversa list.

If you stopped the code there and printed inversa, you would see something like:

[[1, 1, 1],
 [1, 2, 1],
 [1, 3, 1],
 [1, 3, 1],
 [2, 3, 3],
 [3, 4, 4],
 [4, 5, 4],
 [5, 6, 4],
 [9, 7, 4],
 [8, 8, 4]]
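The transposition step can also be written with argument unpacking, so the number of rows is not hard-coded. This is a small sketch of that variation, not part of the code above; the name transposed is introduced here only for illustration.

```python
m = [
    [1, 1, 1, 1, 2, 3, 4, 5, 9, 8],
    [1, 2, 3, 3, 3, 4, 5, 6, 7, 8],
    [1, 1, 1, 1, 3, 4, 4, 4, 4, 4],
]

# zip(*m) passes every row as a separate argument, so it is equivalent
# to zip(m[0], m[1], m[2]) but works for any number of rows.
transposed = [list(column) for column in zip(*m)]

print(transposed[:3])  # [[1, 1, 1], [1, 2, 1], [1, 3, 1]]
```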

After that we use another for loop to go through all the rows of inversa (which are the columns of our original matrix) and use Counter to count how many times each value occurs.

Still within that loop, we call the most_common method with the argument 1 to get the most common value of each row, and save the resulting lists in contadores.

If you stopped the code here and printed contadores, you would see something like:

[[(1, 3)], [(1, 2)], [(1, 2)], [(1, 2)], [(3, 2)], [(4, 2)], [(4, 2)], [(4, 1)], [(9, 1)], [(8, 2)]]

Each column of our matrix yields an entry of the form [(x, y)], where x is the most repeated value and y is the number of times it occurs.

Finally, we go through every entry in contadores and save its first element in a (this is necessary because, although each entry holds only one tuple, it is still a list).

Still within the final loop, we save in b the first term of a, because it is the most repeated value that interests us, not how many times it occurred. We then append each b to the resultado list and print it.
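As an aside, the two indexing steps can be replaced by unpacking the single (value, count) pair in one assignment. This is just an alternative sketch; the names entry, value and count are chosen here for illustration.

```python
from collections import Counter

entry = Counter([2, 3, 3]).most_common(1)  # [(3, 2)]

# Unpack the single (value, count) pair directly instead of indexing twice.
[(value, count)] = entry

print(value, count)  # 3 2
```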

At the end of the code you will get this result:

[1, 1, 1, 1, 3, 4, 4, 4, 9, 8]

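The list resultado is the vector composed of the most common values :) Note that column eight (5, 6, 4) is a three-way tie, so the value reported there depends on how Counter orders ties; on Python 3.7 and later it is the first value encountered (5).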
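Once the steps are clear, the three loops can be condensed into a single list comprehension. This is a compact sketch of the same procedure, assuming Python 3.7 or later for the tie behaviour noted in the comment.

```python
from collections import Counter

m = [
    [1, 1, 1, 1, 2, 3, 4, 5, 9, 8],
    [1, 2, 3, 3, 3, 4, 5, 6, 7, 8],
    [1, 1, 1, 1, 3, 4, 4, 4, 4, 4],
]

# Transpose with zip(*m), count each column, keep only the top value.
resultado = [Counter(column).most_common(1)[0][0] for column in zip(*m)]

# On Python 3.7+ the tied eighth column (5, 6, 4) yields 5.
print(resultado)  # [1, 1, 1, 1, 3, 4, 4, 5, 9, 8]
```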

    
01.06.2018 / 20:42