How to get the 10 most frequent words of an array?

5

I need to know how to get the ten most frequent words.

This code goes through all the words in a text and stores how many times each one occurs.

if len(palavra) > 0:
    if palavra in conjunto:
        # Word already seen: increment its count
        conjunto[palavra] += 1
    else:
        # First occurrence: start the count at 1
        conjunto[palavra] = 1

How do I return only the 10 most frequent occurrences?

    
asked by anonymous 30.06.2017 / 17:56

1 answer

6

(TL;DR)

Use Counter from the collections module:

import collections

# List of words
words = ['Banana', 'Maçã', 'Laranja',
         'Melão', 'Uva', 'Abacaxi', 'Abacate', 'Pimenta', 'Banana',
         'Maçã', 'Banana', 'Melão', 'Banana', 'Uva', 'Abacaxi', 'Fake', 'Fake']

# Counter for the occurrences of each word
c = collections.Counter(words)

print(c)
Counter({'Banana': 4, 'Maçã': 2, 'Melão': 2, 'Uva': 2, 'Abacaxi': 2, 'Fake': 2, 
'Laranja': 1, 'Abacate': 1, 'Pimenta': 1})


# The 3 most frequent words (the order among tied counts may vary by Python version)
c.most_common(3)
[('Banana', 4), ('Melão', 2), ('Uva', 2)]
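
For the ten most frequent words asked about in the question, the same call works with 10 instead of 3. The sketch below is only an illustration: the helper name top_words, the sample text, and the regex-based word splitting are assumptions, not part of the original code.

```python
import collections
import re


def top_words(text, n=10):
    """Return up to n (word, count) pairs, highest count first."""
    # Assumed tokenisation: lower-case the text and take runs of word characters.
    words = re.findall(r"\w+", text.lower())
    return collections.Counter(words).most_common(n)


# Hypothetical sample text; replace it with the text being analysed.
sample = "the cat and the dog and the bird saw the cat"
top_ten = top_words(sample)
print(top_ten)
```

If the text has fewer than 10 distinct words, most_common(10) simply returns all of them.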

Run the code on repl.it.
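
If the counts are already stored in a plain dict, like conjunto in the question, one possible approach is heapq.nlargest keyed on the count. This is a minimal sketch: the function name most_frequent and the sample counts are made up for illustration.

```python
import heapq

# Hypothetical counts, standing in for the conjunto dict from the question.
conjunto = {"banana": 4, "uva": 2, "fake": 1, "melao": 3}


def most_frequent(counts, n=10):
    """Return up to n (word, count) pairs with the highest counts."""
    # item[1] is the count, so the pairs are ranked by frequency.
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


top_two = most_frequent(conjunto, 2)
print(top_two)
```

collections.Counter(conjunto).most_common(10) would give the same kind of result, since a Counter can be built directly from a dict of counts.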

    
30.06.2017 / 18:23