Function to replace characters does not work when the data comes from MySQL


I use a function to replace accented or special characters, but when I use the same function on data coming from MySQL, it does not replace them.

Assuming the city is Foz do Iguaçu, the function should return Foz do Iguacu, so ç would be replaced by c.

In the MySQL table, the city column is defined as:

  Type = varchar(80)
  Collation = latin1_general_ci

$cidade = removeAcentos($row['cli_cidade']);

function removeAcentos ($string){
    // Remove accents
    $tr = strtr($string,
        array (
          'À' => 'A', 'Á' => 'A', 'Â' => 'A', 'Ã' => 'A', 'Ä' => 'A', 'Å' => 'A',
          'Æ' => 'A', 'Ç' => 'C', 'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
          'Ì' => 'I', 'Í' => 'I', 'Î' => 'I', 'Ï' => 'I', 'Ð' => 'D', 'Ñ' => 'N',
          'Ò' => 'O', 'Ó' => 'O', 'Ô' => 'O', 'Õ' => 'O', 'Ö' => 'O', 'Ø' => 'O',
          'Ù' => 'U', 'Ú' => 'U', 'Û' => 'U', 'Ü' => 'U', 'Ý' => 'Y', 'Ŕ' => 'R',
          'Þ' => 's', 'ß' => 'B', 'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a',
          'ä' => 'a', 'å' => 'a', 'æ' => 'a', 'ç' => 'c', 'è' => 'e', 'é' => 'e',
          'ê' => 'e', 'ë' => 'e', 'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
          'ð' => 'o', 'ñ' => 'n', 'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'õ' => 'o',
          'ö' => 'o', 'ø' => 'o', 'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ý' => 'y',
          'þ' => 'b', 'ÿ' => 'y', 'ŕ' => 'r', 'º' => '', 'ª' => ''
        )
    );

    return $tr;
}
    
asked by anonymous 06.06.2015 / 03:20

1 answer

fmoreira@saucer UnmergedCode $ echo '<?= strlen("Á") ?>' | php
2

The problem is that accented characters in UTF-8 occupy two (or more!) bytes; strtr operates on bytes, not on characters.
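A small sketch of the byte-versus-character distinction, written with byte escapes so it does not depend on how the script file is saved. It assumes the mbstring extension is available, and it assumes (a guess based on the latin1 collation, not something the question confirms) that the connection delivers the city name as Latin-1 bytes:

```php
<?php
// "Á" encoded as UTF-8 is the two bytes C3 81.
$utf8 = "\xC3\x81";
echo strlen($utf8), "\n";             // counts bytes
echo mb_strlen($utf8, 'UTF-8'), "\n"; // counts characters

// Assumption: the row arrives in Latin-1, where "ç" is the single byte E7.
$fromDb = "Foz do Igua\xE7u";

// A replacement map typed in a UTF-8 source file has the two-byte key C3 A7.
$map = array("\xC3\xA7" => 'c');

// The bytes differ, so nothing matches and the string comes back unchanged.
$unchanged = strtr($fromDb, $map);

// Converting the data to UTF-8 first lets the same map match.
$converted = strtr(mb_convert_encoding($fromDb, 'UTF-8', 'ISO-8859-1'), $map);
echo $converted, "\n";
```

If this assumption holds for your setup, the data and the replacement table simply need to be in the same encoding before any byte-oriented function can match them.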

You can use str_replace (although you will have to split your array in two, one with the search strings and one with the replacements), or, if you can install PHP's intl extension (you will need to edit php.ini and enable php_intl.dll; I tried it here on my Mac but could not get it working), you can use normalizer_normalize.
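A minimal sketch of the str_replace variant. The function name removeAcentosStrReplace and the shortened map are hypothetical; the full table from the question can be split the same way with array_keys and array_values:

```php
<?php
// Hypothetical, shortened map; the question's full table works the same way.
$map = array('ç' => 'c', 'ã' => 'a', 'é' => 'e', 'þ' => 'th', 'ª' => '');

function removeAcentosStrReplace($string, array $map)
{
    // str_replace takes the search strings and the replacements as two
    // parallel arrays, so the single map is split into keys and values.
    // Replacements are applied one after another, in array order.
    return str_replace(array_keys($map), array_values($map), $string);
}

echo removeAcentosStrReplace('Foz do Iguaçu', $map), "\n";
```

Note that str_replace also compares bytes, so the subject string and the map still have to be in the same encoding for any entry to match.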

Internally, a call to normalizer_normalize('bênção', Normalizer::FORM_D) converts a string like bênção into something like be^nc¸a~o, splitting each accented letter into the base letter followed by its combining marks. You can then use a regular expression such as [^a-zA-Z] to strip everything that is NOT a letter.

(You will still need str_replace to handle letters such as 'ª'.)
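The normalization approach might look roughly like this. The function name removeAcentosIntl is hypothetical, the input is assumed to be valid UTF-8, and the regular expression adds a space to the character class (a small variation on [^a-zA-Z]) so that multi-word names keep their spaces:

```php
<?php
// Sketch only: requires the intl extension and valid UTF-8 input.
function removeAcentosIntl($string)
{
    // Handle a few letters by hand before normalizing: FORM_D does not
    // split these into a base letter plus combining marks.
    $string = str_replace(
        array('ª', 'º', 'þ', 'Þ'),
        array('', '', 'th', 'Th'),
        $string
    );

    // FORM_D splits "ç" into "c" followed by a combining cedilla, and so on.
    $decomposed = normalizer_normalize($string, Normalizer::FORM_D);
    if (!is_string($decomposed)) {
        return $string; // normalization failed; give the input back untouched
    }

    // Drop everything that is not an ASCII letter or a space,
    // which removes the combining marks left behind by FORM_D.
    return preg_replace('/[^a-zA-Z ]/u', '', $decomposed);
}

// Only run the demo when intl is actually loaded.
if (function_exists('normalizer_normalize')) {
    echo removeAcentosIntl('bênção'), "\n";
    echo removeAcentosIntl('Foz do Iguaçu'), "\n";
}
```

Letters without a decomposition that are not listed in the str_replace call (ø or æ, for example) are deleted by the regular expression rather than transliterated, so extend that list to suit your data.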

I noticed that you are replacing 'þ' with 'b', but the better phonetic transcription, despite the visual similarity, is 'th'. If you expect to handle such unusual characters, I find it more robust to use some variant of unidecode, a library that also converts, e.g., "北 亰" into "Bei Jing".

    
06.06.2015 / 14:35