Remove accents

13

I need to remove the accents from the values in a column of a data frame.

# I tried
> library(stringr)
> a <- dados$Municipio[2]
> a
[1] "Arapeí"
> str_replace_all(a, "[í]", "i")
[1] "Arapeí"


# another attempt
> iconv(a, to="ASCII//TRANSLIT")
[1] NA

Can anyone help me?

    
asked by anonymous 11.01.2015 / 01:16

6 answers

9

Some time ago I wrote this function to remove accents. It has never let me down.

rm_accent <- function(str,pattern="all") {
  # Useful routines and functions v1.0
  # rm_accent - REMOVES ACCENTS FROM WORDS
  # Removes accents from every element of a character vector.
  # Parameters:
  # str     - character vector whose accents will be removed.
  # pattern - character vector with one or more elements indicating which accents to remove.
  #           To select accents, pass a vector with their symbols.
  #           Example: pattern = c("´", "^") removes only acute and circumflex accents.
  #           "all" (or "todos", or an abbreviation such as "a" or "t") removes every accent: "´", "`", "^", "~", "¨", "ç"
  if(!is.character(str))
    str <- as.character(str)

  pattern <- unique(pattern)

  if(any(pattern=="Ç"))
    pattern[pattern=="Ç"] <- "ç"

  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",
    grave = "àèìòùÀÈÌÒÙ",
    circunflex = "âêîôûÂÊÎÔÛ",
    tilde = "ãõÃÕñÑ",
    umlaut = "äëïöüÄËÏÖÜÿ",
    cedil = "çÇ"
  )

  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",
    grave = "aeiouAEIOU",
    circunflex = "aeiouAEIOU",
    tilde = "aoAOnN",
    umlaut = "aeiouAEIOUy",
    cedil = "cC"
  )

  # One symbol per entry of `symbols`, in the same order (acute, grave, circumflex, tilde, umlaut, cedilla)
  accentTypes <- c("´","`","^","~","¨","ç")

  if(any(c("all","al","a","todos","t","to","tod","todo")%in%pattern)) # option: remove all accents
    return(chartr(paste(symbols, collapse=""), paste(nudeSymbols, collapse=""), str))

  for(i in which(accentTypes%in%pattern))
    str <- chartr(symbols[i],nudeSymbols[i], str)

  return(str)
}
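The selective mode of `rm_accent` boils down to one `chartr()` call per accent type. For readers who only need a few accent classes, here is a condensed sketch of that idea. It is not the author's function: the helper `strip_accents()` and the smaller table `accent_maps` are hypothetical.

```r
# Hypothetical condensed variant of rm_accent's selective mode:
# each entry pairs accented characters with their unaccented forms,
# position by position (chartr() needs matching positions).
accent_maps <- list(
  acute = c("áéíóúÁÉÍÓÚ", "aeiouAEIOU"),
  tilde = c("ãõñÃÕÑ", "aonAON"),
  cedil = c("çÇ", "cC")
)

strip_accents <- function(x, types = names(accent_maps)) {
  x <- as.character(x)
  # Apply one character-by-character translation per requested accent type
  for (type in types) {
    x <- chartr(accent_maps[[type]][1], accent_maps[[type]][2], x)
  }
  x
}

strip_accents("Ação e informática", types = "acute")
# [1] "Ação e informatica"
strip_accents("Ação e informática")
# [1] "Acao e informatica"
```

Keeping the table as a named list makes adding a new accent class a one-line change.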
    
30.01.2015 / 02:10
8

Use this function:

fa <- function(x) iconv(x, to = "ASCII//TRANSLIT")

fa(c("pelé","época"))

[1] "pele"  "epoca"
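In the question, `iconv(a, to="ASCII//TRANSLIT")` returned `NA`. A possible cause, assumed here because the data's encoding isn't shown, is that `iconv()` defaults to `from = ""`, the session's native encoding. A UTF-8 string read in a non-UTF-8 locale (common on Windows) then cannot be converted. The sketch below, with a hypothetical helper `fa_utf8()`, normalises to UTF-8 first and states the source encoding explicitly. Transliteration output still depends on the platform's iconv implementation, so results can differ between systems.

```r
# Sketch: remove accents with iconv(), declaring the input encoding.
# enc2utf8() converts from the string's declared encoding to UTF-8,
# so from = "UTF-8" is then correct regardless of the session locale.
fa_utf8 <- function(x) {
  iconv(enc2utf8(as.character(x)), from = "UTF-8", to = "ASCII//TRANSLIT")
}

res <- fa_utf8(c("pelé", "época", "Arapeí"))
res
# TRUE here would mean some elements could not be converted;
# check before overwriting a data frame column with the result.
anyNA(res)
```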
    
09.10.2015 / 21:27
2

Have you tried using the gsub function?

Usage: gsub(pattern, replacement, x), that is, the pattern to be replaced, the replacement text, and the string (or vector of strings).

For example:

coluna = c("aaaí","eeeeí","ooooí")

gsub("í", "i", coluna)

[1] "aaai"  "eeeei" "ooooi"
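Because `gsub()` replaces one pattern per call, covering several accented letters means either several calls or one character class per base letter. The sketch below takes the second route. The named vector `accent_classes` and the helper `remove_accents_gsub()` are hypothetical, and only lowercase letters are covered; uppercase letters would need their own entries.

```r
# Hypothetical table: regex character class -> unaccented replacement
accent_classes <- c(
  "[áàâãä]" = "a", "[éèêë]" = "e", "[íìîï]" = "i",
  "[óòôõö]" = "o", "[úùûü]" = "u", "ç" = "c"
)

remove_accents_gsub <- function(x) {
  # One gsub() call per base letter; each class matches any of its variants
  for (pattern in names(accent_classes)) {
    x <- gsub(pattern, accent_classes[[pattern]], x)
  }
  x
}

remove_accents_gsub(c("Arapeí", "São Paulo", "Guaçuí"))
# [1] "Arapei"    "Sao Paulo" "Guacui"
```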
    
22.01.2015 / 18:14
2

I like to do it this way:

s <- c("ájakla","ééhasj", "hsíklf", "fdhjó")
chartr("áéíó", "aeio", s)
[1] "ajakla" "eehasj" "hsiklf" "fdhjo" 

To cover more characters, just add the accented characters to the first string and their replacements to the second, in matching order.
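Applied to a data frame column, the same `chartr()` approach might look like the sketch below. The data frame is a hypothetical stand-in for the question's `dados`. Position *i* of `accented` is replaced by position *i* of `plain`.

```r
# Stand-in for the question's data frame (hypothetical values)
dados <- data.frame(Municipio = c("Arapeí", "São Paulo", "Guaçuí"),
                    stringsAsFactors = FALSE)

# Position i of `accented` is replaced by position i of `plain`
accented <- "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙâêîôûÂÊÎÔÛãõÃÕçÇ"
plain    <- "aeiouAEIOUaeiouAEIOUaeiouAEIOUaoAOcC"

dados$Municipio <- chartr(accented, plain, dados$Municipio)
dados$Municipio
# [1] "Arapei"    "Sao Paulo" "Guacui"
```

`chartr()` stops with an error when the first string is longer than the second, so a missing replacement character is caught immediately rather than silently shifting the mapping.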

    
15.01.2016 / 11:53
2

Use the function stri_trans_general from the stringi package:

library(stringi)
stri_trans_general("Arapeí", "Latin-ASCII")
[1] "Arapei"
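`stri_trans_general()` applies an ICU transliterator, and `"Latin-ASCII"` maps Latin letters to ASCII equivalents. The sketch below applies it to a whole column, using a hypothetical stand-in for `dados` and assuming the stringi package is installed. The last call shows an alternative ICU rule chain that decomposes characters (NFD), removes only the combining accent marks, and recomposes (NFC). Other non-ASCII letters are left untouched.

```r
library(stringi)

# Stand-in for the question's data frame (hypothetical values)
dados <- data.frame(Municipio = c("Arapeí", "São Paulo", "Guaçuí"),
                    stringsAsFactors = FALSE)

# Transliterate every element of the column to ASCII
dados$Municipio <- stri_trans_general(dados$Municipio, "Latin-ASCII")
dados$Municipio
# [1] "Arapei"    "Sao Paulo" "Guacui"

# Alternative: decompose, drop combining marks, recompose
stri_trans_general("Arapeí", "NFD; [:Nonspacing Mark:] Remove; NFC")
# [1] "Arapei"
```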
    
10.11.2017 / 19:00
1

Some time ago I wrote this solution for my PHP environment:

function changeLetters($string, $down = true){

    $letters = array(
        'A'=>array('@','â','ä','à','å','Ä','Å','á','ª','Á','Â','À','ã','Ã'),
        'E'=>array('&','é','ê','ë','è','É','£','Ê','Ë','È'),
        'I'=>array('!','ï','î','ì','¡','Í','Î','Ï','Ì','í'),
        'O'=>array('ô','ö','ò','Ö','ø','Ø','ó','º','¤','ð','Ó','Ô','Ò','õ','Õ'),
        'U'=>array('ü','û','ù','Ü','ú','µ','Ú','Û','Ù'),
        'B'=>array('ß'),
        'C'=>array('Ç','ç','©','¢'),
        'D'=>array('Ð'),
        'F'=>array('ƒ'),
        'L'=>array('¦'),
        'N'=>array('ñ','Ñ'),
        'S'=>array('$','§'),
        'X'=>array('×'),
        'Y'=>array('ÿ','¥','ý','Ý'),
        'AE'=>array('æ','Æ'),
        'P'=>array('þ','Þ'),
        'R'=>array('®'),
        '0'=>array('°'),
        '1'=>array('¹','ı'),
        '2'=>array('²'),
        '3'=>array('³'),
    );

    foreach ($letters as $letter => $change){
        // With $down = true, the replacement letter is lowercased
        if($down){ $letter = strtolower($letter); }
        $string = str_replace($change, $letter, $string);
    }

    return $string;
}

I know your question is about R, but I am posting this mainly as a reference for the many letter variants you may need to map.
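Part of that table maps one character to two letters ('AE' for æ/Æ), which a one-to-one tool such as R's `chartr()` cannot express. The sketch below expresses the same idea in R with fixed-string `gsub()` calls driven by a named vector. `multi_map`, its entries, and `expand_letters()` are illustrative assumptions, not a port of the full PHP table.

```r
# Hypothetical map of characters that expand to more than one letter
multi_map <- c("æ" = "ae", "Æ" = "AE", "ß" = "ss")

expand_letters <- function(x) {
  # fixed = TRUE treats each key as a literal string, not a regex
  for (ch in names(multi_map)) {
    x <- gsub(ch, multi_map[[ch]], x, fixed = TRUE)
  }
  x
}

expand_letters(c("Cæsar", "Straße"))
# [1] "Caesar"  "Strasse"
```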

    
22.01.2015 / 19:12