Count words in a string and return sorted tuples in Python

3

The algorithm should receive a string, count the occurrences of each word, and return a list of tuples with the most frequent words and how many times each appears. The problem is that words sharing the same beginning are counted together. For example, with "but" and "butter" in "betty bought a bit of butter but the butter was bitter", it counts "but" 3 times and "butter" 2 times.

I also want to sort first by number of occurrences, most frequent first, and, when words appear the same number of times, alphabetically. For example, "falling" and "down" both appear 4 times in "london bridge is falling down falling down falling down london bridge is falling down my fair lady", so the output should list "down" first and then "falling".
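
The overcounting described above can be reproduced in isolation. This is only a minimal sketch using the sentence from the question; the variable names prefix_count and exact_count are illustrative and not part of the original code:

```python
words = "betty bought a bit of butter but the butter was bitter".split()

# Prefix matching: "butter" also starts with "but", so "but" is overcounted.
prefix_count = sum(1 for w in words if w.startswith("but"))

# Exact matching counts only the word "but" itself.
exact_count = sum(1 for w in words if w == "but")

print(prefix_count, exact_count)  # 3 1
```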

def count_words(s, n):
    top_n = []
    itens = n
    words = s.split()
    pref = words
    for p in pref:
        cont = 0
        for w in words:
            if w.startswith(p):
                cont += 1
        if (p, cont) not in top_n:
            top_n.append((p, cont))
    top_n.sort(key=lambda t: t[1], reverse=True)
    #from operator import itemgetter
    #sorted(top_n, key = itemgetter(1), reverse = True)
    while len(top_n) > itens:
        del top_n[len(top_n)-1]
    return top_n

def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    print(count_words("london bridge is falling down falling down falling down london bridge is falling down my fair lady", 5))

if __name__ == '__main__':
    test_run()
    
asked by anonymous 05.11.2016 / 12:01

2 answers

2
def count_words(s, n):
    """Return the n most frequent words in s as (word, count) tuples."""
    counts = {}
    for w in s.split():
        # Exact-match counting: each distinct word is its own dictionary key
        counts[w] = counts.get(w, 0) + 1
    top_n = list(counts.items())
    # Sort by count descending (negated), then by word ascending
    top_n.sort(key=lambda t: (-t[1], t[0]))
    return top_n[:n]

def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    print(count_words("london bridge is falling down falling down falling down london bridge is falling down my fair lady", 5))


if __name__ == '__main__':
    test_run()

I think this is clearer. The idea is to use the words as dictionary keys and do the counting there. The difficulty with the sort is that you want descending order by count but ascending order by word. What I did was negate the count (-t[1]) in the sort key, so a single ascending sort orders first by the negated count, which puts the largest counts first, and then alphabetically by word.
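
The same negated-count key also works with collections.Counter from the standard library. This is only a sketch of an alternative, not the code from the answer above; the function name count_words_counter is made up for illustration:

```python
from collections import Counter


def count_words_counter(s, n):
    """Return the n most frequent words as (word, count) tuples."""
    counts = Counter(s.split())
    # Negating the count sorts it descending while the word sorts ascending.
    return sorted(counts.items(), key=lambda t: (-t[1], t[0]))[:n]


result = count_words_counter(
    "betty bought a bit of butter but the butter was bitter", 3)
print(result)  # [('butter', 2), ('a', 1), ('betty', 1)]
```

Counter.most_common was deliberately not used here, because it orders ties by first occurrence rather than alphabetically.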

    
05.11.2016 / 14:10
2
from operator import itemgetter
import re

sentence = 'london bridge is falling down falling down falling down london bridge is falling down my fair lady'

def count_words(text):
    words = re.findall(r'\w+', text)
    wordsCount = [(words.count(word), word) for word in set(words)]
    wordsCount.sort(key=itemgetter(1))  # order by word
    wordsCount.sort(key=itemgetter(0), reverse=True)  # order by word count
    return wordsCount

print(count_words(sentence))

Result: [(4, 'down'), (4, 'falling'), (2, 'bridge'), (2, 'is'), (2, 'london'), (1, 'fair'), (1, 'lady'), (1, 'my')]

The function above uses re.findall to extract the words and then builds wordsCount, a list of (count, word) tuples with one entry per distinct word in the text. It then sorts the list twice: first alphabetically by word, then by number of occurrences in descending order. Because Python's sort is stable, the second sort preserves the alphabetical order from the first step among words with the same count.

A fairly legible solution in a few lines.
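
The two-pass sort relies on Python's sort being stable. A minimal sketch on a small hand-picked list (the sample data is illustrative, taken from the counts in the sentence above):

```python
pairs = [(2, 'london'), (4, 'falling'), (2, 'bridge'), (4, 'down')]

# Pass 1: secondary key (word, ascending).
pairs.sort(key=lambda t: t[1])

# Pass 2: primary key (count, descending). The sort is stable, also with
# reverse=True, so entries with equal counts keep their alphabetical order.
pairs.sort(key=lambda t: t[0], reverse=True)

print(pairs)  # [(4, 'down'), (4, 'falling'), (2, 'bridge'), (2, 'london')]
```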

    
05.11.2016 / 21:07