How to remove unwanted words from a text?

2

I'm trying to remove unwanted words from a text, but my program also removes those letters from inside other words. For example:

remover_palavras = ["a", "e", "o"]

The program returns: btt (for batata), mns (for menos)

What should I do?

    
asked by anonymous 30.06.2017 / 20:46

3 answers

4

If this is just a simple exercise, you can write an algorithm that does the following:

  • Create a list of words to remove from the text.

  • Create a list ( lista_frase ) where each element is a word from the original sentence.

  • Create a second list ( result ) by selecting the items from the first list ( lista_frase ) that are not in the list of excluded words ( remover_palavras ).

  • Join all elements of the resulting list, separated by a space.

  • Code sample:

    frase = 'Oi, eu sou Goku e estou indo para a minha casa'
    
    remover_palavras  = ['a', 'e']
    lista_frase = frase.split()
    
    result = [palavra for palavra in lista_frase if palavra.lower() not in remover_palavras]
    
    retorno = ' '.join(result)
    print(retorno)
    

    The output will be:

    Oi, eu sou Goku estou indo para minha casa

    See working on repl.it
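
One limitation of the list comprehension above is that a word with punctuation attached (for example "a,") does not match "a". A minimal sketch of a variant that ignores surrounding punctuation when comparing; the function name remover_palavras_da_frase is hypothetical and not part of the original answer:

```python
import string


def remover_palavras_da_frase(frase, remover_palavras):
    # Hypothetical helper: compare case-insensitively and ignore any
    # punctuation attached to the start or end of each word.
    bloqueadas = {p.lower() for p in remover_palavras}
    mantidas = [
        palavra for palavra in frase.split()
        if palavra.strip(string.punctuation).lower() not in bloqueadas
    ]
    return ' '.join(mantidas)


print(remover_palavras_da_frase(
    'Oi, eu sou Goku e estou indo para a minha casa', ['a', 'e']))
```

Note that when a removed word carries punctuation, the punctuation is dropped along with it, which may or may not be what you want.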

        
    30.06.2017 / 20:55
    0

    For me, the best way would be to use regular expressions:

    import re
    
    text = 'Oi, eu sou Goku e estou indo para a minha casa'
    palavras = ['a','e']
    
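    for i in palavras:
        # Match the word between whitespace and a trailing space, comma or period;
        # put the trailing character back so neighbouring words are not glued together.
        text = re.sub(r'\s' + re.escape(i) + r'([\s,.])', r'\1', text)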
    
    print(text)
    

    I find it useful that this keeps any punctuation in the text, but whether that matters is up to you.
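
A possible alternative, sketched here as an assumption rather than as the approach of the answer above, is to use the word boundary `\b` and a single alternation, so that words at the start or end of the text are also handled. The name remover_com_regex is hypothetical:

```python
import re


def remover_com_regex(texto, palavras):
    # Hypothetical helper. With no words to remove, return the text unchanged
    # (an empty alternation would otherwise match at every word boundary).
    if not palavras:
        return texto
    # Build one pattern such as \b(?:a|e)\b\s* ; re.escape protects against
    # regex metacharacters inside the words.
    alternativas = '|'.join(re.escape(p) for p in palavras)
    padrao = re.compile(r'\b(?:' + alternativas + r')\b\s*', flags=re.IGNORECASE)
    return padrao.sub('', texto).strip()


print(remover_com_regex('Oi, eu sou Goku e estou indo para a minha casa', ['a', 'e']))
```

Because `\b` treats a hyphen as a boundary, a word like "e-mail" would lose its "e" with this pattern, so it is only suitable when that case does not occur or is handled separately.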

        
    21.07.2017 / 00:55
    -1

    I'm a beginner in Python, but here is a function that solves your problem.

    Function:

    def remover_palavra(palavra, remover):
        # Removes every occurrence of the substring `remover` from `palavra`
        # (substring matching, not whole-word matching).
        remover_tamanho = len(remover)
        while True:
            remover_posicao = palavra.find(remover)
            if remover_posicao == -1:
                break
            # Keep the text before and after the match.
            palavra_inicio = palavra[:remover_posicao]
            palavra_fim = palavra[remover_posicao + remover_tamanho:]
            palavra = palavra_inicio + palavra_fim
        return palavra
    

    Test:

    palavras = ["batata", "menos"]
    palavras_para_remover = ["a", "e", "o"]
    for palavra in palavras:
        resultado = palavra
        for remover in palavras_para_remover:
            resultado = remover_palavra(resultado, remover)
        print(resultado)
    

    Output:

    btt
    mns
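
This output is the same result described in the question, because the function works on substrings rather than whole words. A small sketch contrasting the two behaviours, using the built-in `str.replace` as a stand-in for the function above (the sample sentence is made up for illustration):

```python
palavras_para_remover = ["a", "e", "o"]

# Substring removal: every occurrence of each letter is deleted,
# even inside other words.
resultado = "batata"
for remover in palavras_para_remover:
    resultado = resultado.replace(remover, "")
print(resultado)  # btt

# Whole-word removal: a word is dropped only if it matches entirely.
frase = "a batata e o menos"  # illustrative sentence, not from the question
mantidas = [p for p in frase.split() if p not in palavras_para_remover]
print(" ".join(mantidas))  # batata menos
```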
    
        
    02.07.2017 / 07:00