How to remove special characters and periods from a string column in a data frame?

import pandas as pd

raw_data = {'NAME': ['José L. da Silva',
                     'Ricardo Proença',
                     'Antônio de Morais']}

df = pd.DataFrame(raw_data, columns=['NAME'])

How can I turn the names in the NAME column into:

  • Jose L da Silva (no period or accent)
  • Ricardo Proenca (without the cedilla) and
  • Antonio de Morais (without the accent)?
asked by anonymous 04.07.2017 / 15:32

1 answer


You can use the apply() method of a pandas Series. It calls a function on each value in the column and builds a new Series from the return values, so you can define a correction function and apply it. For example:

def corrigir_nomes(nome):
    nome = nome.replace('.', '').replace('ç', 'c').replace('ô', 'o').replace('é', 'e')
    return nome

Then apply it to the column you want:

df['NAME'] = df['NAME'].apply(corrigir_nomes)

The result will look something like:

0      Jose L da Silva
1      Ricardo Proenca
2    Antonio de Morais
Name: NAME, dtype: object
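
Note that the chained replace() calls only cover the three accented characters that appear in this sample. If the column may contain other accented letters, a more general option is to decompose each character with the standard-library unicodedata module and drop the combining marks. The following is only a sketch of that idea, not part of the original answer; the function name strip_accents_and_periods is made up for illustration, and it assumes every value is a plain str:

```python
import unicodedata


def strip_accents_and_periods(name):
    """Return `name` without periods and without combining accent marks.

    Hypothetical helper for illustration; assumes `name` is a str.
    """
    # NFKD splits each accented letter into a base letter followed by
    # combining marks, e.g. 'é' becomes 'e' + U+0301 (combining acute).
    decomposed = unicodedata.normalize('NFKD', name)
    # Keep only the characters that are not combining marks.
    without_accents = ''.join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Periods are not accents, so they are removed separately.
    return without_accents.replace('.', '')


names = ['José L. da Silva', 'Ricardo Proença', 'Antônio de Morais']
cleaned = [strip_accents_and_periods(n) for n in names]
print(cleaned)
```

It should work with the same apply() call as above, that is, df['NAME'] = df['NAME'].apply(strip_accents_and_periods). Letters that have no decomposition (for example 'ø' or 'ß') pass through unchanged, so they would still need an explicit replace().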
    
23.07.2017 / 00:30