Question about the PHP function mb_strlen

0

I understand how "mb_strlen" works, but I did not understand an example:

<?php mb_strlen($string, '8bit'); ?>

What does this "8bit" argument mean?

    
asked by anonymous 02.02.2018 / 05:19

3 answers

3

8bit is one of the character encodings supported by the Multibyte String functions (mb_[function]).

This argument tells the multibyte functions which encoding to assume when interpreting the bytes of the string.

For example, if you run the code below you will get the following outputs:

<?php
    $string = 'ὼ'; // Any special character

    echo strlen($string);             // Output: 3
    echo mb_strlen($string, '8bit');  // Output: 3
    echo mb_strlen($string, 'UTF-8'); // Output: 1 - CORRECT!

In conclusion, the strlen() function works fine for characters from the ASCII table, while the 8bit encoding returns the number of bytes, which is not the number of characters in UTF-8 text. UTF-8 (Unicode) is the encoding recommended by W3.org.

To find out the default encoding set in your project, you can run:

<?php
    echo mb_internal_encoding(); // Here it returned: UTF-8

Or, to set the internal encoding to UTF-8:

<?php
    mb_internal_encoding('UTF-8');
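
As a rough sketch of why the internal encoding matters: when mb_strlen() is called without a second argument it falls back to the internal encoding, so the same bytes can give different lengths. This assumes the mbstring extension is loaded; the variable names are only illustrative, and the string is written as explicit bytes so the result does not depend on how the source file is saved.

```php
<?php
// Assumption: the mbstring extension is available.
$string = "\xE1\xBD\xBC"; // the three UTF-8 bytes of 'ὼ'

mb_internal_encoding('ISO-8859-1'); // single-byte encoding: one byte per character
$asLatin1 = mb_strlen($string);     // no second argument: uses the internal encoding

mb_internal_encoding('UTF-8');      // multibyte encoding: the three bytes form one character
$asUtf8 = mb_strlen($string);

echo $asLatin1, ' ', $asUtf8, "\n"; // prints "3 1"
```

Passing the encoding explicitly, as in mb_strlen($string, 'UTF-8'), avoids depending on this global setting.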

Here you can see the list of supported encodings.

    
02.02.2018 / 15:34
1

The second parameter is the character encoding that you are using. Most likely you'll want this parameter set to UTF-8.

If you'd like to understand the function better, I suggest you take a look at its reference page.

    
02.02.2018 / 12:43
1

Summary: strlen cannot be trusted, but using mb_strlen(..., '8bit') is not always possible.

The question is interesting because 8bit is not commonly used, as the other answers point out. However, I think @Paul Imon's answer is misleading in several cases. There is nothing wrong with mb_strlen('ὼ', '8bit') returning 3; that result is correct for 8bit, which simply ignores the encoding the text was written in.

Imagine, for example, that you have the following two bytes:

0xDF     0xBF
11011111 10111111

These are just two arbitrary bytes, which may or may not have been generated uniformly at random. If you are only interested in bytes, the encoding matters little. UTF-8, however, uses a kind of "signaling" for the following bytes: the first byte indicates how many bytes the sequence has, so the whole sequence can be treated as a single character.

In UTF-8, a single-byte character is always ASCII (0xxxxxxx). A two-byte character starts with 110xxxxx, and every byte after the first must have the form 10xxxxxx.
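
The signaling described above can be sketched as a small function that looks only at the first byte. The name utf8_sequence_length is hypothetical (it is not part of PHP), the three- and four-byte patterns are added for completeness, and the function assumes a non-empty string.

```php
<?php
// Hypothetical helper, not a PHP built-in: returns how many bytes a UTF-8
// sequence claims to have, judging only by its first byte.
function utf8_sequence_length(string $bytes): int
{
    $first = ord($bytes[0]); // assumes $bytes is not empty

    if (($first & 0x80) === 0x00) { return 1; } // 0xxxxxxx: ASCII
    if (($first & 0xE0) === 0xC0) { return 2; } // 110xxxxx
    if (($first & 0xF0) === 0xE0) { return 3; } // 1110xxxx
    if (($first & 0xF8) === 0xF0) { return 4; } // 11110xxx

    return 0; // 10xxxxxx (a continuation byte) or an invalid first byte
}

echo utf8_sequence_length("A"), "\n";        // 1
echo utf8_sequence_length("\xDF\xBF"), "\n"; // 2
echo utf8_sequence_length("\xBF"), "\n";     // 0
```

Note that this only reads the announced length; it does not check that the following bytes really have the form 10xxxxxx.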

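The character for 0xDF 0xBF DOES NOT EXIST: the bytes are well-formed UTF-8 for code point U+07FF, which had no character assigned to it at the time of writing. Try: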

echo "\xDF\xBF"; //= ߿

But its encoding indicates that it has two bytes, so run:

echo mb_strlen("\xDF\xBF", 'UTF-8'); //= 1

It returns 1, even though the character does not even exist. However, a character does exist for these bytes in UTF-16LE, where this pair of bytes represents 뿟:

echo iconv('UTF-16LE', 'UTF-8', "\xDF\xBF"); //= 뿟

However, using mb_strlen(..., '8bit') will result in 2; after all, there are 2 bytes. I believe "wrong" is not the word that best describes it, because all of these results are correct, depending on where you apply them, of course.
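
To see that each result is correct under its own encoding, here is a minimal sketch that measures the same two bytes three ways. It assumes the mbstring extension is loaded; the $lengths array is only illustrative.

```php
<?php
// Assumption: the mbstring extension is available.
$bytes = "\xDF\xBF";
$lengths = [];

foreach (['8bit', 'UTF-8', 'UTF-16LE'] as $encoding) {
    // Same bytes every time; only the interpretation changes.
    $lengths[$encoding] = mb_strlen($bytes, $encoding);
    echo $encoding, ': ', $lengths[$encoding], "\n";
}
// 8bit: 2      (two raw bytes)
// UTF-8: 1     (one two-byte sequence)
// UTF-16LE: 1  (one 16-bit code unit)
```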

The 8bit encoding treats each byte individually, regardless of the text's encoding. It counts every byte as one unit, in the simplest possible way, and it can even handle values outside of ASCII, such as 0xFF.

mb_strlen(..., '8bit') should be used to prevent problems with the mbstring.func_overload setting, which has only recently been deprecated (as of PHP 7.2). This problem does not apply if you do not have the Multibyte String extension installed.

This is where the answer from @Paul Imon is wrong again. Setting mbstring.func_overload in php.ini changes the behavior of a native language function entirely:

mbstring.func_overload = 2

Test:

echo strlen("\xDF\xBF");  //= 1

See, the behavior of strlen() is no longer the same as that of mb_strlen(..., '8bit') if you use mbstring.func_overload.

Summary, if you want to deal with bytes:

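$texto = "\xDF\xBF";

// If func_overload has replaced strlen() with mb_strlen(), ask mbstring for the byte count explicitly.
if (extension_loaded('mbstring') && defined('MB_OVERLOAD_STRING') && ((int) ini_get('mbstring.func_overload') & MB_OVERLOAD_STRING)) {
    echo mb_strlen($texto, '8bit');
} else {
    // No overload in effect: the native strlen() already counts bytes.
    echo strlen($texto);
}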

This will use strlen by default, but if the overload is in effect, we use mb_strlen(..., '8bit') to make sure we do not call the modified strlen. Remember that not everyone has mbstring installed, so using mb_strlen by default is not always possible. If you are sure that mbstring is installed, you can use only mb_strlen(..., '8bit'). ;)

    
07.04.2018 / 13:46