Count the most popular words

Question

score 1

I'm trying to find the number of occurrences of a list of words in a text:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()
    palavras = palavras.lower().split()

    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
                return Counter(lista)

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos, a, preste"))

Result:

Counter({'preste': 1})

Desired result:

{'nos': 4, 'a': 2, 'preste': 1}

python python-3.x

asked by anonymous 10.04.2018 / 11:34

3 answers

score 2

There are two problems here, my friend.

The indentation of your return line is wrong. It should be aligned with the first 'for', not nested inside the 'if', so that the function returns only after the whole list has been built.
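A minimal sketch of the difference, using two hypothetical helpers (return_inside_loop and return_after_loops are illustration names, not from the question) on a short list of already-cleaned words:

```python
from collections import Counter

def return_inside_loop(texto, palavras):
    # Hypothetical helper mirroring the question: return is nested in the if
    lista = []
    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
                return Counter(lista)  # leaves the function at the first match
    # if nothing matches, the function falls off the end and returns None

def return_after_loops(texto, palavras):
    # Hypothetical helper with the fix: return aligned with the outer for
    lista = []
    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
    return Counter(lista)  # runs only after both loops have finished

texto = "ao nos resolver a esta nos".split()
palavras = ["nos", "a"]
print(return_inside_loop(texto, palavras))  # Counter({'nos': 1})
print(return_after_loops(texto, palavras))  # Counter({'nos': 2, 'a': 1})
```

The first version stops as soon as it sees the first 'nos', so it never counts the second 'nos' or the 'a'.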

Because your words are separated by commas, split() keeps each comma attached to its word (palavras = ['nos,', 'a,', 'preste']), so those words are never found in the text.
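You can see this directly in the interpreter. The sketch below also shows two common ways to clean the word list; the answer itself simply passes "nos a preste" without commas, so these are alternatives rather than part of its fix:

```python
palavras = "nos, a, preste"

# split() only splits on whitespace, so each comma stays glued to its word
print(palavras.split())  # ['nos,', 'a,', 'preste']

# Either drop the commas first...
print(palavras.replace(",", "").split())  # ['nos', 'a', 'preste']

# ...or split on the commas and strip the spaces left around each word
print([p.strip() for p in palavras.split(",")])  # ['nos', 'a', 'preste']
```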

The correct code in this case would be:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()

    palavras = palavras.lower().split()


    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)

    return Counter(lista)

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos a preste"))

Result: Counter({'nos': 4, 'a': 2, 'preste': 1})

10.04.2018 / 13:48

score 2

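def frequencia(texto):
    # texto is a list of words; count how many times each word appears
    frequencia_por_palavra = [texto.count(p) for p in texto]
    return dict(zip(texto, frequencia_por_palavra))

def popularidade(texto, palavras):
    dFrequencia = frequencia(texto)
    # keep only the requested words that occur in the text
    return dict((k, dFrequencia[k]) for k in palavras if k in dFrequencia)

with open('texto.txt', encoding='utf-8') as f:  # assuming the file is saved as UTF-8
    print(popularidade(f.read().split(), ['filhos', 'amada']))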

texto.txt contains the Brazilian anthem

Result:

{'amada': 4, 'filhos': 2}
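The frequencia helper calls texto.count() once for every word in the text, so the work grows quadratically with the text length. A sketch of the same popularidade idea with collections.Counter, which builds the word-to-count mapping in a single pass; it runs on the question's sentence here because texto.txt is not available:

```python
from collections import Counter

def popularidade(texto, palavras):
    # Counter(texto) tallies every word of the list in one pass
    frequencia = Counter(texto)
    # keep only the requested words that actually appear in the text
    return {k: frequencia[k] for k in palavras if k in frequencia}

texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
print(popularidade(texto.lower().split(), ["nos", "a", "preste"]))
# {'nos': 4, 'a': 2, 'preste': 1}
```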

Option 2

If you like regular expressions, you can also do this:

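import re

def popularidade(texto, palavra):
    # \b word boundaries keep 'a' from matching inside words such as 'tarefa'
    return sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(palavra), texto))

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
# strip the spaces left after splitting on commas, otherwise ' preste' never matches
d = {v: popularidade(texto, v) for v in (p.strip() for p in palavras.split(","))}
print(d)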
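The version above runs one regular-expression search per word. If you prefer a single scan, one option is to join all the words into one alternation pattern and count the matches with Counter. This is a sketch; popularidade_regex is a hypothetical name, and the case-insensitive matching is an extra design choice not in the answer:

```python
import re
from collections import Counter

def popularidade_regex(texto, palavras):
    # Hypothetical variant: one pattern such as \b(?:nos|a|preste)\b scans the text once
    if not palavras:
        return Counter()  # an empty alternation would match at every word boundary
    padrao = r"\b(?:%s)\b" % "|".join(re.escape(p) for p in palavras)
    # IGNORECASE also finds capitalised forms; lower() merges them under one key
    achados = re.findall(padrao, texto, flags=re.IGNORECASE)
    return Counter(a.lower() for a in achados)

palavras = [p.strip() for p in "nos, a, preste".split(",")]
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
print(popularidade_regex(texto, palavras))  # Counter({'nos': 4, 'a': 2, 'preste': 1})
```

Words that never occur are simply absent from the result, but looking them up in a Counter returns 0 rather than raising KeyError.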

10.04.2018 / 13:46

score 2 · Accepted Answer

There are actually two things you are missing.

The return inside the loops makes the function return (stop executing) right after the first match, so the remaining words are never checked. The other detail is the commas in the word list: split() keeps them attached to the words, so the comparison never succeeds, e.g. 'nos' == 'nos,' is False.

Your corrected code:

from collections import Counter

def popularidade (texto, palavras):

    texto = texto.lower().split()
    palavras = palavras.lower().replace(',', '').split()  # remove the commas

    lista = []

    for p in palavras:
        for t in texto:
            if p == t:
                lista.append(t)
    return Counter(lista)  # return only after all the words have been checked

print(popularidade("Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:", "nos a preste"))

To tell the truth, you don't even need the append, nor Counter():

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
palavras_spl = palavras.lower().replace(',', '').split()
text_spl = texto.lower().split()
count = {p: text_spl.count(p) for p in palavras_spl if p in text_spl}
print(count)  # {'nos': 4, 'a': 2, 'preste': 1}
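Since the title asks for the most popular words: if you want the top words of the whole text rather than counts for a fixed list, Counter.most_common does the ranking for you. A small sketch reusing the answer's variable names; the choice of 2 is arbitrary:

```python
from collections import Counter

texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"
text_spl = texto.lower().split()

# Counter tallies every word of the text in one pass
contagem = Counter(text_spl)

# most_common(n) returns the n most frequent (word, count) pairs, highest first
print(contagem.most_common(2))  # [('nos', 4), ('a', 2)]
```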

If you want to remove all punctuation from both strings, to make sure both contain only words:

import string

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, preste nos atenção nos seguintes a nos pontos:"

# translation table that deletes every ASCII punctuation character
tabela = str.maketrans('', '', string.punctuation)
palavras_spl = palavras.translate(tabela).lower().split()
text_spl = texto.translate(tabela).lower().split()
count = {p: text_spl.count(p) for p in palavras_spl if p in text_spl}
print(count)  # {'nos': 4, 'a': 2, 'preste': 1}
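One caveat: string.punctuation contains only ASCII punctuation, so marks common in Portuguese text such as « » or curly quotes are not removed. A sketch of an alternative that extracts words with the regular expression \w+, which in Python 3 matches Unicode letters, so accented words like 'atenção' stay whole. The guillemets around «preste» are added to the sample text purely for illustration, and note that \w also matches digits and underscores and will split hyphenated words in two:

```python
import re

palavras = "nos, a, preste"
texto = "Ao nos resolver a esta tarefa, «preste» nos atenção nos seguintes a nos pontos:"

# \w+ keeps runs of Unicode word characters and drops everything else,
# including non-ASCII punctuation that string.punctuation would miss
palavras_spl = re.findall(r"\w+", palavras.lower())
text_spl = re.findall(r"\w+", texto.lower())

count = {p: text_spl.count(p) for p in palavras_spl if p in text_spl}
print(count)  # {'nos': 4, 'a': 2, 'preste': 1}
```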
