The function should receive a string and return a list of (word, count) tuples with the n words that appear most often in it. The problem is that words starting with the same letters are counted more than once. For example, in "betty bought a bit of butter but the butter was bitter" it counts "but" 3 times and "butter" 2 times, because every "butter" also starts with "but".
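The sketch below isolates the difference on the example sentence, assuming the goal is to count whole words only. The helper names `count_prefix` and `count_exact` are hypothetical, chosen for illustration:

```python
# Prefix matching (what startswith does) versus exact whole-word matching.
text = "betty bought a bit of butter but the butter was bitter"
words = text.split()

def count_prefix(target, words):
    # startswith() is also true for longer words that begin with target
    return sum(1 for w in words if w.startswith(target))

def count_exact(target, words):
    # list.count() compares with ==, so only identical words match
    return words.count(target)

print(count_prefix("but", words), count_exact("but", words))        # 3 1
print(count_prefix("butter", words), count_exact("butter", words))  # 2 2
```

Replacing `startswith` with an equality test is enough to remove the double counting.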
I also want to sort first by how often each word appears and, when two words appear the same number of times, alphabetically. For example, in "london bridge is falling down falling down falling down london bridge is falling down my fair lady", "falling" and "down" both appear 4 times, so "down" should come before "falling" in the output.
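One way to get both orderings in a single sort is a tuple key: negate the count so larger counts come first, then use the word itself as the tie-breaker. The pairs here are hand-written sample data matching the London Bridge sentence:

```python
# Sort (word, count) pairs by count descending, then alphabetically on ties.
pairs = [("london", 2), ("falling", 4), ("bridge", 2), ("down", 4), ("is", 2)]
pairs.sort(key=lambda t: (-t[1], t[0]))
print(pairs)
# [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2)]
```

Sorting only by `t[1]` with `reverse=True` leaves tied words in their original order, which is why "falling" would not end up after "down".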
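def count_words(s, n):
    top_n = []
    itens = n
    words = s.split()
    pref = words
    for p in pref:
        cont = 0
        for w in words:
            # startswith() also matches longer words sharing this prefix,
            # so "but" is counted once more for every "butter"
            if w.startswith(p):
                cont += 1
        if (p, cont) not in top_n:
            top_n.append((p, cont))
    # sorts by count only; words with equal counts keep their original order
    top_n.sort(key=lambda t: t[1], reverse=True)
    # from operator import itemgetter
    # sorted(top_n, key=itemgetter(1), reverse=True)
    # keep only the first n entries
    while len(top_n) > itens:
        del top_n[len(top_n) - 1]
    return top_n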
def test_run():
    print(count_words("cat bat mat cat bat cat", 3))
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    print(count_words("london bridge is falling down falling down falling down london bridge is falling down my fair lady", 5))

if __name__ == '__main__':
    test_run()
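A possible corrected version, offered as a sketch rather than the only fix: it swaps the nested prefix loop for `collections.Counter`, which counts exact whole words, and then applies the (count descending, word ascending) ordering explicitly. `Counter.most_common` is not used on its own here because it orders tied words by first appearance, not alphabetically.

```python
from collections import Counter

def count_words(s, n):
    """Return the n most frequent words in s as (word, count) tuples,
    ordered by count (descending) and then alphabetically."""
    counts = Counter(s.split())  # exact whole-word counts, no prefix matching
    ranked = sorted(counts.items(), key=lambda t: (-t[1], t[0]))
    return ranked[:n]

if __name__ == "__main__":
    print(count_words("cat bat mat cat bat cat", 3))
    # [('cat', 3), ('bat', 2), ('mat', 1)]
    print(count_words("betty bought a bit of butter but the butter was bitter", 3))
    # [('butter', 2), ('a', 1), ('betty', 1)]
    print(count_words("london bridge is falling down falling down falling down "
                      "london bridge is falling down my fair lady", 5))
    # [('down', 4), ('falling', 4), ('bridge', 2), ('is', 2), ('london', 2)]
```

Slicing with `[:n]` also replaces the `while`/`del` loop and works when n is larger than the number of distinct words.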