Count occurrences in a list according to prefixes

8

Let's imagine that I have a list

['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']

and another list with prefixes

['ro', 'ra', 'r']

How do I count how many words in the first list start with each prefix?

    
asked by anonymous 07.04.2015 / 23:10

4 answers

5
words = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
pref = ['ro', 'ra', 'r']

contTotal = 0  # matches summed over all prefixes
for p in pref:
    cont = 0  # words that start with the current prefix
    for w in words:
        if w.startswith(p):
            cont += 1
    contTotal += cont
    print(p + ' appears ' + str(cont) + ' times as a prefix in the words')
print('The total number of times is ' + str(contTotal))
    
07.04.2015 / 23:31
6

An abominable functional one-liner with sum and map, counting the words that start with any of the prefixes:

sum(map(lambda x: 1 if x.startswith(tuple(pref)) else 0, words))

An abominable one-liner with reduce (in Python 3, reduce must first be imported from functools):

reduce(lambda x, y: x + 1 if y.startswith(tuple(pref)) else x, words, 0)

:)
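
Both one-liners above rely on str.startswith accepting a tuple of prefixes, in which case it returns True if the string starts with any of them. A minimal sketch of the same idea in a more readable form, assuming Python 3; the function name count_any_prefix is hypothetical and not part of the original answer:

```python
words = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
pref = ['ro', 'ra', 'r']

def count_any_prefix(words, prefixes):
    """Count the words that start with at least one of the prefixes."""
    # str.startswith accepts a tuple and matches if any element is a prefix.
    prefixes = tuple(prefixes)
    # Each True counts as 1 when summed.
    return sum(w.startswith(prefixes) for w in words)

print(count_any_prefix(words, pref))  # 6
```

Each word is counted at most once here, no matter how many prefixes it matches.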

Update:

As requested by the OP, an even more abominable (snowman-grade) one-liner, which counts per prefix:

map(lambda p: reduce(lambda c, w: c + 1 if w.startswith(p) else c, words, 0), pref)

A less contrived example:

def countPrefix(words, prefix):
    return len([1 for w in words if w.startswith(prefix)])

[countPrefix(words, p) for p in pref]

Result:

[2, 2, 6]
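
The list above gives the counts in the same order as pref. If it helps to keep each count next to its prefix, a dictionary is one option. A small sketch, assuming Python 3.7+ (so that the dictionary keeps insertion order); the name count_by_prefix is hypothetical:

```python
words = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
pref = ['ro', 'ra', 'r']

def count_by_prefix(words, prefixes):
    """Map each prefix to the number of words that start with it."""
    return {p: sum(1 for w in words if w.startswith(p)) for p in prefixes}

counts = count_by_prefix(words, pref)
print(counts)  # {'ro': 2, 'ra': 2, 'r': 6}
```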
    
08.04.2015 / 00:30
5

It could be done this way:

>>> palavras = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
>>> prefixos = ['ro', 'ra', 'r']
>>> len(list(filter(None, [p if p.startswith(tuple(prefixos)) else None for p in palavras])))
6

* This counts how many words start with at least one of the prefixes in the list.

    
08.04.2015 / 00:09
5

Assuming the prefixes are treated independently, with no "hierarchy" among them (even though, e.g., every word that begins with ro also begins with r), a simple and direct way is to use itertools.product. It pairs each element of the first list with each element of the second. Then just keep the pairs in which the second element is a prefix of the first, and count them:

>>> import itertools
>>> palavras = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
>>> prefixos = ['ro', 'ra', 'r']
>>> len([palavra for palavra,prefixo in itertools.product(palavras, prefixos) if palavra.startswith(prefixo)])
10

Note that the first 4 words are counted twice (since they begin with both ro / ra and r). For a solution that counts each word only once, see for example the answers by Orion and Anthony Accioly (Dherik's answer does both, and also counts occurrences per prefix).
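
To make the difference between the two ways of counting concrete, here is a short sketch that computes both for the same lists, assuming Python 3; the variable names pair_matches and unique_words are only illustrative:

```python
import itertools

palavras = ['rato', 'roeu', 'rolha', 'rainha', 'rei', 'russia']
prefixos = ['ro', 'ra', 'r']

# Every (word, prefix) pair that matches counts once, so a word that
# matches two prefixes contributes two to the total.
pair_matches = sum(
    1
    for palavra, prefixo in itertools.product(palavras, prefixos)
    if palavra.startswith(prefixo)
)

# Each word counts at most once, as soon as any prefix matches it.
unique_words = sum(
    1 for palavra in palavras
    if any(palavra.startswith(p) for p in prefixos)
)

print(pair_matches)  # 10
print(unique_words)  # 6
```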

    
08.04.2015 / 01:39