Even "a" is appearing, and I do not know why!
The code as written goes through every word of lista and checks whether it exists in the text. It does not have to exist as a standalone word; it only has to appear somewhere in the middle of the text, and that is why "a" appears:
texto = "Hoje é sábado, vamos sair pois o dia está bonito. Até mais tarde."
# the 'a' is here--^---
In this case, Python's "in" operator checks whether the string occurs anywhere inside the text, not whether it is a separate word.
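A minimal sketch of the difference, using the same texto as above. The names substring_hit and word_hit are only illustrative and are not part of the original code:

```python
texto = "Hoje é sábado, vamos sair pois o dia está bonito. Até mais tarde."

# On a string, "in" is a substring test: the 'a' inside "sábado" is enough.
substring_hit = "a" in texto

# On a list, "in" compares whole elements, so only an exact word matches.
word_hit = "a" in texto.split(' ')

print(substring_hit, word_hit)
```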
For your purpose, simply invert the logic of the for loop: go through the text word by word and check whether each word exists in the list. This not only solves the problem of "a" but also guarantees that the words come out in the order they appear in the text:
lista = ["dia", "noite", "tarde", "é", "está", "bonito", "o", "a", "muito", "feio"]
texto = "Hoje é sábado, vamos sair pois o dia está bonito. Até mais tarde."
frase = []
for palavras in texto.split(' '):  # now iterate over the text; split(' ') breaks it into words
    if palavras in lista:  # for each word, check whether it exists in the list
        frase.append(palavras)
print(' '.join(frase))
See the Ideone example
Note that splitting on spaces leaves punctuation such as "." and "," attached to the words, producing tokens like "bonito." or "tarde.", so the code does not find them in the list.
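A small sketch that makes this visible by listing the tokens that still carry punctuation. The names tokens and attached are illustrative only:

```python
texto = "Hoje é sábado, vamos sair pois o dia está bonito. Até mais tarde."

# Split on spaces exactly as in the loop above.
tokens = texto.split(' ')

# Keep only the tokens that still end with a period or a comma.
attached = [t for t in tokens if t.endswith(('.', ','))]

print(attached)  # ['sábado,', 'bonito.', 'tarde.']
```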
You can work around this problem in many ways. One of the simplest is to remove these characters before analyzing the text:
texto2 = texto.replace('.', '').replace(',', '')
See on Ideone how the analysis behaves with this change
You can even do something more generic: define the punctuation marks you want to remove and strip them with a custom function:
def retirar(texto, careteres):
    for c in careteres:
        texto = texto.replace(c, '')
    return texto
And now use this function over the original text:
texto2 = retirar(texto, ".,")
See this example on Ideone
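Putting the pieces together, a possible end-to-end sketch looks like this. It reuses retirar, lista and texto from above; the list comprehension and the name resultado are my own shorthand for the for loop, not something from the original code:

```python
def retirar(texto, careteres):
    # Remove every character in careteres from texto.
    for c in careteres:
        texto = texto.replace(c, '')
    return texto

lista = ["dia", "noite", "tarde", "é", "está", "bonito", "o", "a", "muito", "feio"]
texto = "Hoje é sábado, vamos sair pois o dia está bonito. Até mais tarde."

# Strip the punctuation first, then keep only the words found in lista.
texto2 = retirar(texto, ".,")
frase = [palavra for palavra in texto2.split(' ') if palavra in lista]

resultado = ' '.join(frase)
print(resultado)  # é o dia está bonito tarde
```

With the punctuation removed, "bonito" and "tarde" are now matched, and "a" no longer shows up because it never occurs as a separate word.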