Remove specific characters in Python

1

My question is this: in the code snippet below I remove the characters I specify with replace():

lista = [["de carlos,"],["des.dd carlossd,"],["Peixe, texto!"]]
lista_separados = ['.',',',':','"','?','!',';']



for i, j in enumerate(lista):
   lista[i] = j[0].replace(',','').replace('!','').replace('.','')

print (lista)

output:

['de carlos', 'desdd carlossd', 'Peixe texto']

In this example I was able to delete the specified characters, but does anyone have an idea how to accomplish this in another way?

    
asked by anonymous 07.10.2017 / 05:20

2 answers

1

Another way to do this is with translate():

lista = [["de carlos,"],["des.dd carlossd,"],["Peixe, texto!"]]
lista_separados = ['.',',',':','"','?','!',';']

trans = {ord(i): '' for i in lista_separados}  # map each character's code point to its replacement, in this case nothing
for idx, val in enumerate(lista):
   lista[idx][0] = val[0].translate(trans)
print(lista) # [['de carlos'], ['desdd carlossd'], ['Peixe texto']]


I don't know why you have a list of lists. If you really need it, keep it, but otherwise you can simply flatten the result into a one-dimensional list:

lista = [["de carlos,"],["des.dd carlossd,"],["Peixe, texto!"]]
lista_separados = ['.',',',':','"','?','!',';']

trans = {ord(i): '' for i in lista_separados}
lista = [j.translate(trans) for i in lista for j in i]
print(lista) # ['de carlos', 'desdd carlossd', 'Peixe texto']


Note that you can also pass None instead of an empty string: {ord(i): None for ... }
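
The same deletion table can also be built with str.maketrans, whose three-argument form maps every character of its third argument to None. A minimal sketch, reusing the data from the answer above (resultado, sublista and texto are illustrative names, not part of the original code):

```python
lista = [["de carlos,"], ["des.dd carlossd,"], ["Peixe, texto!"]]
lista_separados = ['.', ',', ':', '"', '?', '!', ';']

# The third argument of str.maketrans lists the characters to delete;
# the resulting dict maps each code point to None.
trans = str.maketrans('', '', ''.join(lista_separados))

# Flatten the list of lists and strip the characters in one pass.
resultado = [texto.translate(trans) for sublista in lista for texto in sublista]
print(resultado)  # ['de carlos', 'desdd carlossd', 'Peixe texto']
```

This should be equivalent to the dict comprehension with None values; which one to use is mostly a matter of taste.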

    
07.10.2017 / 11:02
4

You can use Regular Expressions

  

Someone with more experience may have an even better way to do this.

I'll show two options. The first one replaces each period with a space and then keeps only letters, digits and spaces.

import re

lista = [["de carlos,"],["des.dd carlossd,"],["Peixe, texto!"]]
for i, j in enumerate(lista):
    lista[i] = re.sub('[^a-zA-Z0-9 ]', '', re.sub(r'\.', ' ', j[0]))

print (lista)
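
Note that the class [^a-zA-Z0-9 ] also strips accented letters such as "é". If only the characters in the question's lista_separados should go, one possible variation is to build the character class from that list. This is a sketch, not the answer's original code; padrao, resultado, sublista and texto are illustrative names:

```python
import re

lista = [["de carlos,"], ["des.dd carlossd,"], ["Peixe, texto!"]]
lista_separados = ['.', ',', ':', '"', '?', '!', ';']

# re.escape protects characters that are special in a regex (here '.' and '?').
padrao = re.compile('[' + re.escape(''.join(lista_separados)) + ']')

# Remove every listed character and flatten the list of lists.
resultado = [padrao.sub('', texto) for sublista in lista for texto in sublista]
print(resultado)  # ['de carlos', 'desdd carlossd', 'Peixe texto']
```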
  


The second option keeps a period that ends a sentence, replaces a period in the middle of a word with a space, and removes any other special character.

import re

lista = [["Esse, ponto! vai permanecer, porque e um ponto final. Agora esses.pontos.serao.substituidos.por.espacos.porque.esta no@ meio¨&*() das #palavras"]]

for i, j in enumerate(lista):
    lista[i] = re.sub('[^a-zA-Z0-9 .]', '', re.sub(r'\.\b', ' ', j[0]))

print (lista)
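
To see what the inner substitution does on its own, here is a small sketch with a made-up sentence (texto, numero and sem_pontos_internos are illustrative names). The pattern \.\b matches a period only when a word character follows it immediately, so a period followed by a space or by the end of the string is left alone:

```python
import re

texto = "Fim da frase. Mas esses.pontos.somem"

# Only the periods glued to the next word are replaced by a space.
sem_pontos_internos = re.sub(r'\.\b', ' ', texto)
print(sem_pontos_internos)  # Fim da frase. Mas esses pontos somem

# Caveat: a decimal point is also followed by a word character.
numero = re.sub(r'\.\b', ' ', "3.14")
print(numero)  # 3 14
```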
  


    
07.10.2017 / 06:54