I'm trying to remove unwanted words from a text, but my program also strips those letters out of other words. For example:
remover_palavras = ["a", "e", "o"]
The program returns: btt (for batata, "potato") and mns (for menos, "less").
What should I do?
If this is just a simple exercise, you can write an algorithm that does the following:
Create a list of words to remove from the text (remover_palavras).
Create a list (lista_frase) in which each element is a word from the original phrase.
Create a second list (result) by selecting the items of lista_frase that are not in the list of excluded words (remover_palavras).
Join the elements of the resulting list, separated by a space.
Code sample:
frase = 'Oi, eu sou Goku e estou indo para a minha casa'
remover_palavras = ['a', 'e']
lista_frase = frase.split()
result = [palavra for palavra in lista_frase if palavra.lower() not in remover_palavras]
retorno = ' '.join(result)
print(retorno)
The output will be:
Oi, eu sou Goku estou indo para minha casa
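One limitation of the split-based approach above is that a token with punctuation attached, such as "e," or "A.", does not match the exclusion list. A minimal sketch of one way around that, assuming it is acceptable to compare each token with its surrounding punctuation stripped (the function name remover_palavras_da_frase is hypothetical, not part of the original answer):

```python
import string

def remover_palavras_da_frase(frase, remover_palavras):
    """Drop whole words listed in remover_palavras, ignoring case and attached punctuation."""
    excluidas = {p.lower() for p in remover_palavras}
    mantidas = []
    for token in frase.split():
        # Compare the bare word, so "e," or "A." still match the exclusion list
        nucleo = token.strip(string.punctuation).lower()
        if nucleo not in excluidas:
            mantidas.append(token)
    return ' '.join(mantidas)

print(remover_palavras_da_frase('Oi, eu sou Goku e estou indo para a minha casa', ['a', 'e']))
```

Note that when a token such as "a," is removed, its punctuation goes with it. Words like batata and menos are untouched because only whole tokens are compared.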
In my opinion, the best way is to use regular expressions:
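import re
text = 'Oi, eu sou Goku e estou indo para a minha casa'
palavras = ['a', 'e']
for i in palavras:
    # Match the word preceded by whitespace and followed by whitespace or punctuation;
    # \1 puts back the character that followed the word so neighbouring words stay separated
    text = re.sub(r'\s' + re.escape(i) + r'([\s,.])', r'\1', text)
print(text)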
I think it is useful that any punctuation following the removed word is kept, but whether that matters is up to you.
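A variation on the regex idea, sketched under the assumption that word boundaries (\b) are an acceptable definition of "whole word": all excluded words are combined into one pattern, so the text is scanned once. The function name remover_com_regex is hypothetical.

```python
import re

def remover_com_regex(texto, palavras):
    if not palavras:
        return texto  # nothing to remove; avoids building an empty alternation
    # re.escape guards against regex metacharacters inside the words
    alternativas = '|'.join(re.escape(p) for p in palavras)
    # \b is a word boundary, so the "a" inside "casa" or "batata" is left alone
    padrao = re.compile(r'\b(?:' + alternativas + r')\b\s*', flags=re.IGNORECASE)
    return padrao.sub('', texto).strip()

print(remover_com_regex('Oi, eu sou Goku e estou indo para a minha casa', ['a', 'e']))
```

Punctuation after a removed word stays in place. One caveat: \b also treats a hyphen as a boundary, so the "e" in "e-mail" would be removed; a stricter pattern would be needed if that matters.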
I'm a beginner in Python, but here is a function that removes every occurrence of a given string from a word:
def remover_palavra(palavra, remover):
    remover_tamanho = len(remover)
    palavra_tamanho = len(palavra)
    while True:
        remover_posicao = palavra.find(remover)
        if remover_posicao != -1:
            palavra_inicio = palavra[0:remover_posicao]
            palavra_fim = palavra[remover_posicao+remover_tamanho:palavra_tamanho]
            palavra = palavra_inicio + palavra_fim
        else:
            break
    return palavra

palavras = ["batata", "menos"]
palavras_para_remover = ["a", "e", "o"]
for palavra in palavras:
    resultado = palavra
    for remover in palavras_para_remover:
        resultado = remover_palavra(resultado, remover)
    print(resultado)
Output:
btt
mns
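The results btt and mns come from matching substrings rather than whole words. For single-character entries, a find-and-slice loop like this gives the same result as the built-in str.replace. The sketch below illustrates that and contrasts it with a whole-word comparison; both function names are hypothetical and are only meant to show the difference.

```python
palavras = ["batata", "menos"]
palavras_para_remover = ["a", "e", "o"]

def remover_substrings(palavra, remover_lista):
    # str.replace deletes every occurrence of the substring, even inside a word
    for remover in remover_lista:
        palavra = palavra.replace(remover, "")
    return palavra

def remover_palavras_inteiras(lista, remover_lista):
    # Comparing whole items keeps "batata" and "menos" intact
    return [p for p in lista if p not in remover_lista]

print([remover_substrings(p, palavras_para_remover) for p in palavras])    # ['btt', 'mns']
print(remover_palavras_inteiras(palavras + ["a"], palavras_para_remover))  # ['batata', 'menos']
```

If the goal is to drop unwanted words from a sentence, the whole-item comparison is the behavior to aim for; substring removal is only appropriate when the characters should disappear everywhere.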