Remove punctuation and symbols in Python

0

I'm trying to remove punctuation and other symbols (the copyright sign, for example) from a string.

I want to keep letters and digits, plus accented characters, hyphens, apostrophes (') and whitespace.

How can I do this in Python?

    
asked by anonymous 18.04.2016 / 21:40

2 answers

1

Try using a regex:

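import re

# Keep letters, digits, the listed accented characters, apostrophe, space and hyphen; remove everything else.
# string_velha is assumed to be a UTF-8 byte string (Python 2 str); in Python 3, pass a str and drop .decode().
string_nova = re.sub(u"[^a-zA-Z0-9áéíóúÁÉÍÓÚâêîôÂÊÎÔãõÃÕçÇ' -]", '', string_velha.decode('utf-8'))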
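The character class above lists the accented letters by hand, so any letter missing from the list (ü, for instance) gets removed too. As a rough sketch of an alternative, assuming Python 3, where \w in a str pattern matches any Unicode letter or digit, something like the following might work. The function name strip_symbols is made up for this example, and the NFC normalization step is my own addition, not part of the answer:

```python
import re
import unicodedata

def strip_symbols(text):
    # Compose accents into single code points (e.g. 'e' + U+0301 -> 'é');
    # a bare combining mark is not a word character and would be removed.
    text = unicodedata.normalize("NFC", text)
    # Remove anything that is not a word character, whitespace, apostrophe
    # or hyphen; "_" counts as a word character, so remove it explicitly.
    return re.sub(r"[^\w\s'-]|_", "", text)

print(strip_symbols("Olá, mundo! © 2016 d'água-viva"))
```

Here the comma, the exclamation mark and the © are dropped, while the accented letters, the apostrophe and the hyphen are kept. Whitespace is left untouched, so removing a symbol that sits between two spaces leaves both spaces behind.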
    
18.04.2016 / 21:42
2

You can also list the characters you want to remove and strip them out one by one.

def chr_remove(old, to_remove):
    new_string = old
    for x in to_remove:
        new_string = new_string.replace(x, '')
    return new_string

This way you remove only the characters you choose. For example:

>>> s = "string $com (caracteres#."
>>> print(chr_remove(s, "$(#"))  # removes $, # and ( from the string
string com caracteres.
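The loop above scans the string once per character to remove. If you are on Python 3, a single-pass variant could be sketched with str.translate. The name chr_remove_translate is hypothetical, chosen only to avoid clashing with the function above:

```python
def chr_remove_translate(old, to_remove):
    # str.maketrans("", "", chars) builds a table that maps every
    # character in `chars` to None, which translate() then deletes.
    return old.translate(str.maketrans("", "", to_remove))

s = "string $com (caracteres#."
print(chr_remove_translate(s, "$(#"))
```

This should give the same result as chr_remove for the example string. It is still a blacklist, so it only helps when you know in advance which symbols to drop.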
    
18.04.2016 / 21:57