How to remove accents from uploaded file names with PHP?

9

The upload itself works. The problem appears when I send a file whose name contains accents. Example: I send a file named ação-íaaa.jpg and it is saved on the server as ação-íaaa.jpg. I would like to remove the accents so that it is saved as acao-iaaa.jpg. Suggestions?

$destination_path = getcwd().DIRECTORY_SEPARATOR;
$result = 0;
$target_path = $destination_path . basename( $_FILES['myfile']['name']);
if(@move_uploaded_file($_FILES['myfile']['tmp_name'], $target_path)) {
    $result = 1;
}
sleep(1);
    
asked by anonymous 18.09.2014 / 18:39

10 answers

12

I do not like working with files whose names contain special characters, so I always clean up the names first.

  function clearId($id){
     $LetraProibi = Array(" ",",",".","'","\"","&","|","!","#","$","¨","*","(",")","'","´","<",">",";","=","+","§","{","}","[","]","^","~","?","%");
     $special = Array('Á','È','ô','Ç','á','è','Ò','ç','Â','Ë','ò','â','ë','Ø','Ñ','À','Ð','ø','ñ','à','ð','Õ','Å','õ','Ý','å','Í','Ö','ý','Ã','í','ö','ã',
        'Î','Ä','î','Ú','ä','Ì','ú','Æ','ì','Û','æ','Ï','û','ï','Ù','®','É','ù','©','é','Ó','Ü','Þ','Ê','ó','ü','þ','ê','Ô','ß','‘','’','‚','“','”','„');
     $clearspc = Array('a','e','o','c','a','e','o','c','a','e','o','a','e','o','n','a','d','o','n','a','o','o','a','o','y','a','i','o','y','a','i','o','a',
        'i','a','i','u','a','i','u','a','i','u','a','i','u','i','u','','e','u','c','e','o','u','p','e','o','u','b','e','o','b','','','','','','');
     $newId = str_replace($special, $clearspc, $id);
     $newId = str_replace($LetraProibi, "", trim($newId));
     return strtolower($newId);
  }

Usage:

$target_path = $destination_path . basename( clearId($_FILES['myfile']['name']));

PS: Depending on the encoding of your files, it may be necessary to use clearId(utf8_encode($_FILES['myfile']['name']))
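Note that the forbidden list above includes the dot, so passing a whole file name through such a function also removes the separator before the extension. Below is a minimal sketch of one way around that. It assumes a hypothetical helper named sanitizeUploadName() and a deliberately short accent map (neither comes from this answer), and a source file saved as UTF-8. It splits the name at the last dot, cleans only the base name, and reattaches the extension.

```php
<?php
// Hypothetical helper, not part of the answer above: cleans the base name
// and keeps the extension. The accent map is intentionally short and only
// covers a few lowercase letters; anything unmapped is simply dropped.
function sanitizeUploadName(string $name): string
{
    $map = [
        'á' => 'a', 'à' => 'a', 'ã' => 'a', 'â' => 'a',
        'é' => 'e', 'ê' => 'e', 'í' => 'i',
        'ó' => 'o', 'õ' => 'o', 'ô' => 'o',
        'ú' => 'u', 'ç' => 'c',
    ];

    // Split at the last dot so the extension survives the clean-up.
    $dot  = strrpos($name, '.');
    $base = $dot === false ? $name : substr($name, 0, $dot);
    $ext  = $dot === false ? '' : substr($name, $dot + 1);

    $base = strtolower(strtr($base, $map));            // map accents, then lowercase
    $base = preg_replace('/[^a-z0-9_-]+/', '', $base); // drop everything else

    if ($base === '') {
        $base = 'file'; // assumed fallback for names that end up empty
    }

    $ext = preg_replace('/[^a-z0-9]+/', '', strtolower($ext));

    return $ext === '' ? $base : $base . '.' . $ext;
}

echo sanitizeUploadName('ação-íaaa.jpg'), "\n"; // prints acao-iaaa.jpg
```

The same splitting step could wrap any of the sanitizing functions in this thread in place of the small map used here.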

    
18.09.2014 / 18:49
14

Remove accents in a simple way:

$file = "ação-íaaa.jpg";
$file = iconv('UTF-8', 'ASCII//TRANSLIT', $file);
echo "{$file} <br>";

Output: acao-iaaa.jpg

Example available at ideone

    
18.09.2014 / 20:00
7

I use the Germanix plugin, written by one of the moderators of WordPress Developers, who knows a great deal about character encoding, internationalization and localization. It first runs html_entity_decode, then replaces non-ASCII characters using a long and thorough list, then converts the name to lowercase and strips whatever is left outside the allowed set, and finally collapses repeated meta characters ( -=+. ), for example ++ becomes +.

/**
 * Sanitize the file name on upload
 * 
 * Sanitization test done with the filename:
 * ÄäÆæÀàÁáÂâÃãÅåªₐāĆćÇçÐđÈèÉéÊêËëₑƒğĞÌìÍíÎîÏïīıÑñⁿÒòÓóÔôÕõØøₒÖöŒœßŠšşŞ™ÙùÚúÛûÜüÝýÿŽž¢€‰№$℃°C℉°F⁰¹²³⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉±×₊₌⁼⁻₋–—‑․‥…‧.png
 * @author toscho
 * @url    https://github.com/toscho/Germanix-WordPress-Plugin
 */
function t5f_sanitize_filename( $filename )
{

    $filename    = html_entity_decode( $filename, ENT_QUOTES, 'utf-8' );
    $filename    = t5f_translit( $filename );
    $filename    = t5f_lower_ascii( $filename );
    $filename    = t5f_remove_doubles( $filename );
    return $filename;
}

/**
 * Converts uppercase to lowercase and removes the rest.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @uses   apply_filters( 'germanix_lower_ascii_regex' )
 * @param  string $str Input string
 * @return string
 */
function t5f_lower_ascii( $str )
{
    $str     = strtolower( $str );
    $regex   = array(
        'pattern'        => '~([^a-z\d_.-])~'
        , 'replacement'  => ''
    );
    // Leave underscores, otherwise the taxonomy tag cloud in the
    // backend won’t work anymore.
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}


/**
 * Reduces repeated meta characters (-=+.) to a single one.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * @param  string $str Input string
 * @return string
 */
function t5f_remove_doubles( $str )
{
    // '$1' is the backreference; "\1" in double quotes is the octal escape for byte 0x01.
    $regex = array(
        'pattern'        => '~([=+.-])\1+~'
        , 'replacement'  => '$1'
    );
    return preg_replace( $regex['pattern'], $regex['replacement'], $str );
}


/**
 * Replaces non-ASCII characters.
 * https://github.com/toscho/Germanix-WordPress-Plugin
 *
 * Modified version of Heiko Rabe’s code.
 *
 * @author Heiko Rabe http://code-styling.de
 * @link   http://www.code-styling.de/?p=574
 * @param  string $str
 * @return string
 */
function t5f_translit( $str )
{
    $utf8 = array(
        'Ä'  => 'Ae'
        , 'ä'    => 'ae'
        , 'Æ'    => 'Ae'
        , 'æ'    => 'ae'
        , 'À'    => 'A'
        , 'à'    => 'a'
        , 'Á'    => 'A'
        , 'á'    => 'a'
        , 'Â'    => 'A'
        , 'â'    => 'a'
        , 'Ã'    => 'A'
        , 'ã'    => 'a'
        , 'Å'    => 'A'
        , 'å'    => 'a'
        , 'ª'    => 'a'
        , 'ₐ'    => 'a'
        , 'ā'    => 'a'
        , 'Ć'    => 'C'
        , 'ć'    => 'c'
        , 'Ç'    => 'C'
        , 'ç'    => 'c'
        , 'Ð'    => 'D'
        , 'đ'    => 'd'
        , 'È'    => 'E'
        , 'è'    => 'e'
        , 'É'    => 'E'
        , 'é'    => 'e'
        , 'Ê'    => 'E'
        , 'ê'    => 'e'
        , 'Ë'    => 'E'
        , 'ë'    => 'e'
        , 'ₑ'    => 'e'
        , 'ƒ'    => 'f'
        , 'ğ'    => 'g'
        , 'Ğ'    => 'G'
        , 'Ì'    => 'I'
        , 'ì'    => 'i'
        , 'Í'    => 'I'
        , 'í'    => 'i'
        , 'Î'    => 'I'
        , 'î'    => 'i'
        , 'Ï'    => 'Ii'
        , 'ï'    => 'ii'
        , 'ī'    => 'i'
        , 'ı'    => 'i'
        , 'İ'    => 'I' // turkish, correct?
        , 'Ñ'    => 'N'
        , 'ñ'    => 'n'
        , 'ⁿ'    => 'n'
        , 'Ò'    => 'O'
        , 'ò'    => 'o'
        , 'Ó'    => 'O'
        , 'ó'    => 'o'
        , 'Ô'    => 'O'
        , 'ô'    => 'o'
        , 'Õ'    => 'O'
        , 'õ'    => 'o'
        , 'Ø'    => 'O'
        , 'ø'    => 'o'
        , 'ₒ'    => 'o'
        , 'Ö'    => 'Oe'
        , 'ö'    => 'oe'
        , 'Œ'    => 'Oe'
        , 'œ'    => 'oe'
        , 'ß'    => 'ss'
        , 'Š'    => 'S'
        , 'š'    => 's'
        , 'ş'    => 's'
        , 'Ş'    => 'S'
        , '™'    => 'TM'
        , 'Ù'    => 'U'
        , 'ù'    => 'u'
        , 'Ú'    => 'U'
        , 'ú'    => 'u'
        , 'Û'    => 'U'
        , 'û'    => 'u'
        , 'Ü'    => 'Ue'
        , 'ü'    => 'ue'
        , 'Ý'    => 'Y'
        , 'ý'    => 'y'
        , 'ÿ'    => 'y'
        , 'Ž'    => 'Z'
        , 'ž'    => 'z'
        // misc
        , '¢'    => 'Cent'
        , '€'    => 'Euro'
        , '‰'    => 'promille'
        , '№'    => 'Nr'
        , '$'    => 'Dollar'
        , '℃'    => 'Grad Celsius'
        , '°C' => 'Grad Celsius'
        , '℉'    => 'Grad Fahrenheit'
        , '°F' => 'Grad Fahrenheit'
        // Superscripts
        , '⁰'    => '0'
        , '¹'    => '1'
        , '²'    => '2'
        , '³'    => '3'
        , '⁴'    => '4'
        , '⁵'    => '5'
        , '⁶'    => '6'
        , '⁷'    => '7'
        , '⁸'    => '8'
        , '⁹'    => '9'
        // Subscripts
        , '₀'    => '0'
        , '₁'    => '1'
        , '₂'    => '2'
        , '₃'    => '3'
        , '₄'    => '4'
        , '₅'    => '5'
        , '₆'    => '6'
        , '₇'    => '7'
        , '₈'    => '8'
        , '₉'    => '9'
        // Operators, punctuation
        , '±'    => 'plusminus'
        , '×'    => 'x'
        , '₊'    => 'plus'
        , '₌'    => '='
        , '⁼'    => '='
        , '⁻'    => '-' // sup minus
        , '₋'    => '-' // sub minus
        , '–'    => '-' // ndash
        , '—'    => '-' // mdash
        , '‑'    => '-' // non breaking hyphen
        , '․'    => '.' // one dot leader
        , '‥'    => '..'  // two dot leader
        , '…'    => '...'  // ellipsis
        , '‧'    => '.' // hyphenation point
        , ' '    => '-'   // nobreak space
        , ' '    => '-'   // normal space
    );

    $str = strtr( $str, $utf8 );
    return trim( $str, '-' );
}

Then just pass the file name to the main function:

t5f_sanitize_filename( $nome_do_arquivo );
    
18.09.2014 / 20:07
5

Place this at the beginning of the script:

ini_set("default_charset","UTF-8");

or use

$nome = utf8_encode($_FILES['myfile']['name']);

That should solve it: at least the name is stored correctly, although the accents are not removed. If you want a unique, accent-free name for the file, you can do this:

$nome = md5(date("YmdHis").$_FILES['myfile']['name']).'.jpg';
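The line above hard-codes the jpg extension. If uploads can be of other types, one option is to reuse the extension of the original name. A minimal sketch, assuming a hypothetical helper uniqueUploadName() (not from this answer) and PHP 7 or later for random_bytes():

```php
<?php
// Hypothetical helper: builds a random, accent-free name and keeps only a
// cleaned-up copy of the original extension.
function uniqueUploadName(string $originalName): string
{
    // Take whatever follows the last dot as the extension.
    $dot = strrpos($originalName, '.');
    $ext = $dot === false ? '' : strtolower(substr($originalName, $dot + 1));
    $ext = preg_replace('/[^a-z0-9]+/', '', $ext);

    $unique = bin2hex(random_bytes(16)); // 32 hexadecimal characters

    return $ext === '' ? $unique : $unique . '.' . $ext;
}

echo uniqueUploadName('ação-íaaa.jpg'), "\n";
```

Because the base name is random, the original accents never reach the file system; the original name can be stored elsewhere if it has to be shown to users.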
    
18.09.2014 / 18:46
3

For me, the alternative below was the most efficient; I believe the case is very similar.

function replaceChar($str){
    // /u treats pattern and subject as UTF-8; /i also matches the uppercase letters.
    $str = preg_replace('/[áàãâä]/ui', 'a', $str);
    $str = preg_replace('/[éèêë]/ui', 'e', $str);
    $str = preg_replace('/[íìîï]/ui', 'i', $str);
    $str = preg_replace('/[óòõôö]/ui', 'o', $str);
    $str = preg_replace('/[úùûü]/ui', 'u', $str);
    $str = preg_replace('/[ç]/ui', 'c', $str);
    // Anything that is not a letter or digit (including ".") becomes "_".
    $str = preg_replace('/[^a-z0-9]/i', '_', $str);
    // Collapse runs of "_" into a single one.
    $str = preg_replace('/_+/', '_', $str);
    return $str;
}
    
12.06.2015 / 14:35
1

Complementing the existing answers.

There is a Unicode character block called Combining Diacritical Marks; its characters produce accents by combining with the letter that precedes them.

If the text contains any of these characters, none of the solutions listed will remove those accents. There are two ways to deal with this problem:

1 - remove the characters with a regular expression:

<?php
// removes only the characters inside the range
$string = preg_replace('/[\x{0300}-\x{036f}]+/u', '', $string);
// removes every character in the Unicode "Mark" category
$string = preg_replace('/[\p{M}]+/u', '', $string);

2 - convert the combining marks into precomposed accented characters before applying accent removal:

<?php
// the default form is NFC, which composes letter + combining mark
$string = normalizer_normalize($string);

This function works only if the internationalization extension "intl" is enabled on the server.
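The two ideas can also be combined in the opposite direction from option 2. The sketch below assumes a hypothetical helper stripCombiningMarks() (the name is not from this answer). When the intl extension is available it first decomposes the text to NFD, so that precomposed letters such as ã become a base letter followed by a combining mark, and then removes every mark with \p{M}. Without intl it only removes marks that are already separate characters.

```php
<?php
// Hypothetical helper, for illustration only.
function stripCombiningMarks(string $text): string
{
    // With intl: split precomposed letters (NFD) so their accents become marks.
    if (class_exists('Normalizer')) {
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
        if ($decomposed !== false) {
            $text = $decomposed;
        }
    }

    // Remove every character in the Unicode "Mark" category.
    $stripped = preg_replace('/\p{M}+/u', '', $text);

    // preg_replace() returns null on invalid UTF-8; keep the input in that case.
    return $stripped === null ? $text : $stripped;
}

// File name written with combining cedilla (U+0327), tilde (U+0303) and acute (U+0301).
$decomposed = "ac\u{0327}a\u{0303}o-i\u{0301}aaa.jpg";
echo stripCombiningMarks($decomposed), "\n"; // prints acao-iaaa.jpg
```

Letters that have no canonical decomposition, such as ø or ß, are left untouched by this approach and still need a mapping table.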

    
12.06.2015 / 13:14
1
function removeAcentos($string, $slug = false) {
    // Work on ISO-8859-1 so that every accented letter is a single byte.
    if (mb_detect_encoding($string.'x', 'UTF-8, ISO-8859-1') == 'UTF-8') {
        $string = mb_convert_encoding(mb_strtolower($string, 'UTF-8'), 'ISO-8859-1', 'UTF-8');
    }
    // ISO-8859-1 codes of the accented variants of each base letter.
    $ascii['a'] = range(224, 230);
    $ascii['e'] = range(232, 235);
    $ascii['i'] = range(236, 239);
    $ascii['o'] = array_merge(range(242, 246), array(240, 248));
    $ascii['u'] = range(249, 252);
    $ascii['b'] = array(223);
    $ascii['c'] = array(231);
    $ascii['d'] = array(208);
    $ascii['n'] = array(241);
    $ascii['y'] = array(253, 255);
    // Build one character-class pattern per base letter.
    $troca = array();
    foreach ($ascii as $key => $item) {
        $acentos = '';
        foreach ($item as $codigo) {
            $acentos .= chr($codigo);
        }
        $troca[$key] = '/['.$acentos.']/i';
    }
    $string = preg_replace(array_values($troca), array_keys($troca), $string);
    if ($slug) {
        // Replace anything that is not a letter or digit with the slug character.
        $string = preg_replace('/[^a-z0-9]/i', $slug, $string);
        $string = preg_replace('/' . $slug . '{2,}/i', $slug, $string);
        $string = trim($string, $slug);
    }
    return $string;
}
echo removeAcentos("Palavras com acentuação");
echo removeAcentos("Palavras com acentuação", "_");
    
20.03.2016 / 13:31
1

The simplest and most efficient way to remove accents is to map the characters with PHP's built-in iconv function:

setlocale(LC_CTYPE, 'pt_BR'); // global setting (LC_ALL also works)

function unaccent($str){
    return iconv('UTF-8', 'ASCII//TRANSLIT', $str);
}

iconv is a standardized and very mature function, with high performance and high reliability; it is usually a call into a common operating-system library (on Linux, Windows and other systems).
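In practice the result of //TRANSLIT depends on the iconv implementation and on the active locale: some setups produce ? for characters they cannot map, and iconv() returns false when the conversion fails. A minimal sketch, assuming a hypothetical wrapper unaccentOrKeep() (not part of this answer) that falls back to the original string in those cases:

```php
<?php
// Hypothetical wrapper: returns the transliterated string, or the original
// one when iconv() fails or could not map some character.
function unaccentOrKeep(string $str): string
{
    if (!function_exists('iconv')) {
        return $str; // iconv extension not loaded
    }

    // @ hides the notice that iconv() raises on illegal or unmappable input.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $str);

    if ($ascii === false) {
        return $str;
    }

    // A "?" that was not in the input means some character was not mapped.
    if (substr_count($ascii, '?') > substr_count($str, '?')) {
        return $str;
    }

    return $ascii;
}

setlocale(LC_CTYPE, 'pt_BR.UTF-8'); // assumed locale name; it may not be installed
echo unaccentOrKeep('ação-íaaa.jpg'), "\n";
```

Returning the original string is only one possible fallback; a mapping table like the ones in the other answers could be used there instead.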

    
10.07.2017 / 03:13
1

The solution below also solves the problem, and cleanly:

$string = 'ÁÍÓÚÉÄÏÖÜËÀÌÒÙÈÃÕÂÎÔÛÊáíóúéäïöüëàìòùèãõâîôûêÇç'; // Input
$semAcentos = preg_replace('/[`^~\'"]/', '', iconv('UTF-8', 'ASCII//TRANSLIT', $string));
echo $semAcentos;

// Output: AIOUEAIOUEAIOUEAOAIOUEaioueaioueaioueaoaioueCc

Thanks to Carlos Coelho, from whom I got this solution.

    
19.10.2017 / 00:22
0

Here is a function that removes accents using regular expressions; it is much simpler and more compact.

<?php
function removerAcentos( $string ) {
    // The byte ranges are ISO-8859-1 codes, so $string must be in that encoding.
    $mapaAcentosHex  = array(
        'a'=> '/[\xE0-\xE6]/',
        'A'=> '/[\xC0-\xC6]/',
        'e'=> '/[\xE8-\xEB]/',
        'E'=> '/[\xC8-\xCB]/',
        'i'=> '/[\xEC-\xEF]/',
        'I'=> '/[\xCC-\xCF]/',
        'o'=> '/[\xF2-\xF6]/',
        'O'=> '/[\xD2-\xD6]/',
        'u'=> '/[\xF9-\xFC]/',
        'U'=> '/[\xD9-\xDC]/',
        'c'=> '/\xE7/',
        'C'=> '/\xC7/',
        'n'=> '/\xF1/',
        'N'=> '/\xD1/'
    );
    foreach ($mapaAcentosHex as $letra => $expressaoRegular) {
        $string = preg_replace( $expressaoRegular, $letra, $string);
    }
    return $string;
}
    
18.04.2017 / 16:14