How do I get the correct length of a UTF-8 string?

4

While running some tests, I noticed that when a string contains special characters, each of them counts as more than one character in substr.

Example:

$string = "PAÇOCA";

echo strlen($string);
echo substr($string, 0, 3);

It should print PAÇ, but it only prints PA. If I increase the length from 3 to 4, it prints PAÇ, and if I replace Ç with C, it counts correctly. So PHP seems to be treating Ç as if it were two characters. How can I count them correctly?

I've already tried using the mbstring functions as well, and setting the header to UTF-8.

    
asked by anonymous 08.07.2016 / 20:03

3 answers

6

The mb_ functions are enough, but they need to be configured for the correct encoding:

mb_internal_encoding('UTF-8');

Then the result is:

$string = "PAÇOCA";

echo mb_strlen($string);            // 6
echo mb_substr($string, 0, 3);      // PAÇ

This only works if your source file was also saved as UTF-8 in the editor/IDE!
After all, you are providing a literal value in the source, and PHP's own settings have no effect on how that literal is encoded.
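As a rough illustration of where the extra "character" comes from, the sketch below compares byte counts with character counts. It builds the string with the PHP 7+ "\u{00C7}" escape (my choice for the example, not something the answer uses) so the result does not depend on how the file was saved, and it assumes the mbstring extension is loaded.

```php
<?php
// "\u{00C7}" is Ç; this escape (PHP 7+) always produces UTF-8 bytes,
// no matter which encoding the source file itself is saved in.
$string = "PA\u{00C7}OCA";

echo strlen($string), "\n";                   // 7: strlen counts bytes
echo mb_strlen($string, 'UTF-8'), "\n";       // 6: mb_strlen counts characters
echo bin2hex("\u{00C7}"), "\n";               // c387: Ç is two bytes in UTF-8
echo mb_substr($string, 0, 3, 'UTF-8'), "\n"; // PAÇ
```

With plain substr($string, 0, 3), the cut falls between the two bytes of Ç, which is why only PA shows up.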

Be careful not to change other settings unnecessarily, to avoid confusion. Ideally, set everything in php.ini, if possible, rather than at runtime.
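Before changing anything, it may help to see which values are actually in effect. The check below is only a sketch: it assumes mbstring is loaded, and the variable names are mine. As far as I know, since PHP 5.6 mb_internal_encoding() falls back to default_charset when nothing more specific is configured.

```php
<?php
// Values currently in effect for this request
$defaultCharset = ini_get('default_charset');
$internal = mb_internal_encoding(); // called with no argument, it returns the current encoding

echo "default_charset: ", $defaultCharset, "\n";
echo "mb_internal_encoding: ", $internal, "\n";

// True only if the bytes of the literal are valid UTF-8,
// which in practice means this file was saved as UTF-8.
$string = "PAÇOCA";
var_dump(mb_check_encoding($string, 'UTF-8'));
```

If the last line prints bool(false), the file was probably saved in another encoding such as ISO-8859-1, and no PHP setting will fix the literal.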


Manual: link

Configuring in php.ini: link

    
08.07.2016 / 20:28
5

I suggest updating the settings. It would look like this:

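    setlocale(LC_ALL, 'pt_BR.UTF-8');   // the locale must be installed on the system
    mb_internal_encoding('UTF-8');
    mb_regex_encoding('UTF-8');

    $string = "PAÇOCA";
    echo strlen($string);               // 7: strlen still counts bytes; use mb_strlen for characters
    echo '<br>';
    echo mb_substr($string, 0, 3);      // PAÇ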
    
08.07.2016 / 20:24
2

Adding one small detail.

In the substr documentation, Andreas Bur says:

  

To get a substring of UTF-8 characters, I recommend mb_substr

Example:

<?php
$string = "PAÇOCA";

echo strlen($string);                   // 7: strlen counts bytes, and Ç takes two in UTF-8
echo mb_substr($string, 0, 3, 'UTF-8'); // PAÇ
?>
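Passing the encoding on each call, as in the example above, makes the result independent of the global setting. The sketch below deliberately sets a mismatched internal encoding to show the difference; it assumes mbstring is loaded and uses the PHP 7+ "\u{00C7}" escape for Ç so it does not depend on the file's own encoding.

```php
<?php
// Deliberately mismatched global setting, for demonstration only
mb_internal_encoding('ISO-8859-1');

$string = "PA\u{00C7}OCA"; // UTF-8 bytes for PAÇOCA

echo mb_strlen($string), "\n";                // 7: every byte is treated as one character
echo mb_strlen($string, 'UTF-8'), "\n";       // 6: the explicit encoding wins
echo mb_substr($string, 0, 3, 'UTF-8'), "\n"; // PAÇ
```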
    
08.07.2016 / 20:37