PHP functions whose names start with "mb_" belong to the MBString extension.
MB stands for "multibyte", i.e. these are functions for manipulating multibyte strings.
Encodings such as UTF-8 are multibyte. For the list of supported encodings, see the official documentation: link
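As a rough sketch, the list of supported encodings can also be queried at runtime with mb_list_encodings(). The helper name is_supported_encoding() is hypothetical and not part of MBString, and the sketch assumes PHP 7 or later with the mbstring extension loaded.

```php
<?php
// is_supported_encoding() is a hypothetical helper, not an mbstring function.
function is_supported_encoding(string $name): bool
{
    // mb_list_encodings() returns canonical names only, so an alias may be
    // reported as unsupported here even if mbstring accepts it elsewhere.
    $known = array_map('strtoupper', mb_list_encodings());
    return in_array(strtoupper($name), $known, true);
}

var_dump(is_supported_encoding('UTF-8'));
var_dump(is_supported_encoding('no-such-encoding'));
```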
Practical example
<?php
date_default_timezone_set('Asia/Tokyo');
ini_set('error_reporting', E_ALL);
error_reporting(E_ALL);
ini_set('log_errors',TRUE);
ini_set('html_errors',FALSE);
ini_set('display_errors',TRUE);
define( 'CHARSET', 'UTF-8' );
ini_set( 'default_charset', CHARSET );
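// PHP_VERSION is a string, so compare it with version_compare() instead of "<".
// The mbstring.* settings below are deprecated as of PHP 5.6 in favor of default_charset.
if( version_compare( PHP_VERSION, '5.6.0', '<' ) ){
    ini_set( 'mbstring.http_output', CHARSET );
    ini_set( 'mbstring.internal_encoding', CHARSET );
}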
header( 'Content-Type: text/html; charset=' . CHARSET );
/*
Returns 6
The "heart" character takes up 3 bytes.
To count bytes, strlen() is the right choice.
*/
echo strlen('I♥NY') . PHP_EOL . '<br />';
/*
Returns 4
To count characters, use the equivalent MBString function.
*/
echo mb_strlen('I♥NY');
/*
Note that even accented Latin characters are multibyte: strlen() returns 6 here, mb_strlen() returns 4
*/
echo strlen('ação') . PHP_EOL . '<br />';
echo mb_strlen('ação');
?>
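The same byte-versus-character difference applies when cutting strings, not only when measuring them. A minimal sketch, assuming PHP 7 or later with mbstring loaded (the variable names are illustrative only):

```php
<?php
// "ação" written with escapes so the sketch does not depend on the file's own encoding.
$word = "a\u{E7}\u{E3}o";

// substr() counts bytes: 2 bytes is "a" plus only the first half of "ç".
$byteCut = substr($word, 0, 2);

// mb_substr() counts characters: 2 characters is "aç".
$charCut = mb_substr($word, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // false: the cut landed inside a character
var_dump(mb_check_encoding($charCut, 'UTF-8')); // true
echo $charCut . PHP_EOL;                        // aç
```

The byte-based cut leaves a broken sequence that browsers typically render as a replacement character, which is the usual symptom of mixing the two function families.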
Another, less common term for multibyte encodings is "variable-width encoding".
link
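To illustrate what "variable width" means in UTF-8, the sketch below reports how many bytes each character of a string occupies. bytes_per_character() is a hypothetical helper name, and mb_str_split() assumes PHP 7.4 or later.

```php
<?php
// bytes_per_character() is a hypothetical helper used only for this sketch.
function bytes_per_character(string $text): array
{
    $widths = [];
    // mb_str_split() splits on character boundaries, not byte boundaries.
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // strlen() on a single character gives that character's byte width.
        $widths[] = [$char, strlen($char)];
    }
    return $widths;
}

// "a", "ç" and "♥", written with escapes to avoid depending on the file encoding.
foreach (bytes_per_character("a\u{E7}\u{2665}") as [$char, $bytes]) {
    echo $char . ' => ' . $bytes . ' byte(s)' . PHP_EOL;
}
```

In UTF-8 this should print widths of 1, 2 and 3 bytes for the three characters.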
Additional note
It is not always necessary to use the mbstring functions, for example when a given string is known to contain no multibyte characters.
Example:
echo strlen('123') . PHP_EOL . '<br />';
echo mb_strlen('123');
As the example shows, mbstring is unnecessary in this case. However, another numeric example shows where it matters.
echo strlen('１２３') . PHP_EOL . '<br />';
echo mb_strlen('１２３');
In this example the characters are still digits, but they are multibyte (full-width) digits.
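When it is not known in advance whether input contains multibyte digits, a check and a normalization step can help. The sketch below is only one possible approach. Both helper names are hypothetical, while mb_check_encoding() and mb_convert_kana() are existing mbstring functions. It assumes PHP 7 or later.

```php
<?php
// Both helper names below are hypothetical and exist only for this sketch.

// True when the string is plain ASCII, so strlen() and mb_strlen() agree.
function is_ascii_only(string $text): bool
{
    return mb_check_encoding($text, 'ASCII');
}

// Convert full-width ("zen-kaku") digits to ASCII digits.
// The "n" option of mb_convert_kana() affects numbers only.
function to_ascii_digits(string $text): string
{
    return mb_convert_kana($text, 'n', 'UTF-8');
}

$fullWidth = "\u{FF11}\u{FF12}\u{FF13}"; // full-width 1, 2, 3

var_dump(is_ascii_only('123'));             // true
var_dump(is_ascii_only($fullWidth));        // false
echo strlen($fullWidth) . PHP_EOL;          // 9 (3 bytes per digit)
echo to_ascii_digits($fullWidth) . PHP_EOL; // 123
```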
Many well-built systems are assumed to be internationalized, but the vast majority are never tested against real-world data, as if "global" meant only the Americas and Europe.
More than 60% of the planet (Arabs, Greeks, Russians, Indians, Asians) uses multibyte characters, and each language has its own peculiarities, like this example of multibyte digits from the Japanese character table.
Therefore, the MBString functions are recommended if you want to build a system that is compatible with as many of the existing encodings as possible.
Another important note: UTF-8 is not the encoding of choice for every language, and the MBString functions are not limited to UTF-8.
For example, Chinese characters are often handled with the Big5 encoding.
UTF-16 and UTF-32 are also in use.
Even for Chinese characters, though, UTF-8 is used fairly safely, because it is "rare" for Chinese speakers themselves to use all the ideograms. There are more than 60 thousand of them.
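Since the MBString functions accept an encoding argument, the same text can be converted and measured in several encodings. A minimal sketch, assuming PHP 7 or later with mbstring loaded; encoded_sizes() is a hypothetical helper, and 'BIG-5' is the name mbstring uses for Big5:

```php
<?php
// encoded_sizes() is a hypothetical helper used only for this sketch.
function encoded_sizes(string $utf8Text, array $encodings): array
{
    $sizes = [];
    foreach ($encodings as $encoding) {
        // Convert from UTF-8 into the target encoding.
        $converted = mb_convert_encoding($utf8Text, $encoding, 'UTF-8');
        $sizes[$encoding] = [
            'bytes'      => strlen($converted),               // raw byte count
            'characters' => mb_strlen($converted, $encoding), // count in that encoding
        ];
    }
    return $sizes;
}

// Two Chinese characters, written with escapes to avoid depending on the file encoding.
$text = "\u{4E2D}\u{6587}";

foreach (encoded_sizes($text, ['UTF-8', 'UTF-16BE', 'UTF-32BE', 'BIG-5']) as $name => $size) {
    printf("%-8s bytes=%d characters=%d\n", $name, $size['bytes'], $size['characters']);
}
```

The character count stays the same across encodings while the byte count changes, which is why the encoding argument matters whenever the data is not UTF-8.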