I have this text file, which is first processed to normalize capitalization; that part works correctly:
olá meu nome é meu nome pois eu olá
é meu nome walt não disney
olá
Then I have this function, which should calculate how often each word occurs (and it does that correctly). After that, I need to sort the dataFreq
list and calculate the probability of a given word appearing in the text, that is: frequenciaPalavra/totalPalavras (word count / total words).
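def countWordExact(dataClean):
    count = {}      # word -> number of occurrences
    dataFreq = []
    global total    # running word total; must already exist at module level (e.g. total = 0)
    for line in dataClean.splitlines():
        for word in line.split(" "):
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
            total += 1
    dataFreq.append(count)
    freq = []
    # relative frequency of each word, in ascending order of count
    for indice in sorted(count, key=count.get):
        #print(count[indice])
        freq.append((count[indice])/total)
    #print(freq)
    return dataFreq  # note: freq is computed above but never returned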
My question is: how do I sort the dictionary (and consequently the list) and store in it the values resulting from the frequency calculation described above? Here is an example of the output I am after:
[{'olá': 0.12, 'meu': 0.12, 'nome': 0.132, 'é': 0.12321, 'pois': 0.56, 'eu': 0.65, 'walt': 0.7, 'não': 0.7, 'disney': 0.5}]
(the frequency values above are not the real ones; they only illustrate the format)
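For illustration, here is a minimal sketch of the computation described above (word count divided by total words, stored in a sorted dictionary wrapped in a list). It is not the original function: the name word_probabilities is hypothetical, it uses collections.Counter instead of the manual counting loop, and it assumes descending order by probability is wanted and that Python 3.7+ is used, so the dict keeps insertion order.

```python
from collections import Counter


def word_probabilities(text):
    """Return [{word: probability}], most frequent words first.

    Hypothetical helper: probability = word count / total words.
    """
    words = text.split()  # split on any whitespace, including newlines
    total = len(words)
    if total == 0:
        return [{}]  # empty text: nothing to count

    counts = Counter(words)
    # Assumption: sort by count, descending. sorted() is stable, so words
    # with equal counts stay in order of first appearance.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

    # Dicts preserve insertion order (Python 3.7+), so the result stays sorted.
    return [{word: n / total for word, n in ordered}]


sample = (
    "olá meu nome é meu nome pois eu olá\n"
    "é meu nome walt não disney\n"
    "olá"
)
result = word_probabilities(sample)
print(result)
```

With the sample text (16 words in total), 'olá', 'meu' and 'nome' each occur 3 times and get 3/16 = 0.1875, 'é' gets 2/16 = 0.125, and every other word gets 1/16 = 0.0625. Computing the total locally from len(words) also avoids depending on a global counter.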