str_pad does not work with multi-byte characters, because it adds the padding one byte at a time instead of one whole character at a time.
The original code of str_pad is exactly:
switch (pad_type_val) {
    case STR_PAD_RIGHT:
        left_pad = 0;
        right_pad = num_pad_chars;
        break;
    case STR_PAD_LEFT:
        left_pad = num_pad_chars;
        right_pad = 0;
        break;
    case STR_PAD_BOTH:
        left_pad = num_pad_chars / 2;
        right_pad = num_pad_chars - left_pad;
        break;
}
/* First we pad on the left. */
for (i = 0; i < left_pad; i++)
    ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len];
/* Then we copy the input string. */
memcpy(ZSTR_VAL(result) + ZSTR_LEN(result), ZSTR_VAL(input), ZSTR_LEN(input));
ZSTR_LEN(result) += ZSTR_LEN(input);
/* Finally, we pad on the right. */
for (i = 0; i < right_pad; i++)
    ZSTR_VAL(result)[ZSTR_LEN(result)++] = pad_str[i % pad_str_len];
ZSTR_VAL(result)[ZSTR_LEN(result)] = '\0';
RETURN_NEW_STR(result);
Source.
Note the presence of i % pad_str_len: the loop adds a single byte at a time, which can leave an incomplete byte sequence behind. For example, if you are using chr(160), that is the Latin1 encoding and not UTF-8.
In Latin1, the byte A0 represents "non-breaking space". In UTF-8 the same character requires two bytes, C2 A0. If you cut one of them off, for example by isolating C2, you will get a ?.
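The truncation described above can be reproduced with a small sketch. The name "José" and the target width of 8 are illustrative values chosen so that an odd number of padding bytes is needed; they are not from the original question.

```php
<?php
// "José" written with explicit UTF-8 bytes: "é" is C3 A9, so the string is 5 bytes long.
$name = "Jos\xc3\xa9";

// str_pad() counts bytes: 8 - 5 = 3 padding bytes are copied from "\xc2\xa0" one by one.
$padded = str_pad($name, 8, "\xc2\xa0", STR_PAD_RIGHT);

// The padding ends up as C2 A0 C2: the final C2 has lost its A0.
echo bin2hex($padded), "\n";                    // 4a6f73c3a9c2a0c2
var_dump(mb_check_encoding($padded, 'UTF-8'));  // bool(false)
```

The dangling C2 at the end is what gets displayed as a ?.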
If you want a "new version" of str_pad, we could create an mb_str_pad():
const STR_PAD_INSERT_ALL = 4;

function mb_str_pad(string $input, int $pad_length, string $pad_string, int $pad_type, string $pad_encoding = 'utf8') : string {
    $result = '';
    $pad_insert_all = 0;
    $pad_inset_limit = 1;
    // Lengths are measured in characters, not bytes.
    $pad_str_len = mb_strlen($pad_string, $pad_encoding);
    $input_len = mb_strlen($input, $pad_encoding);
    if ($pad_length < 0 || $pad_length <= $input_len) {
        return $input;
    }
    // With STR_PAD_INSERT_ALL the whole pad string is inserted on every step.
    if (($pad_type & STR_PAD_INSERT_ALL) === STR_PAD_INSERT_ALL) {
        $pad_insert_all = PHP_INT_MAX;
        $pad_inset_limit = null;
        $pad_type -= STR_PAD_INSERT_ALL;
    }
    if ($pad_str_len === 0) {
        trigger_error("Padding string cannot be empty", E_WARNING);
        return $input;
    }
    if ($pad_type < STR_PAD_LEFT || $pad_type > STR_PAD_BOTH) {
        trigger_error("Padding type has to be STR_PAD_LEFT, STR_PAD_RIGHT, or STR_PAD_BOTH", E_WARNING);
        return $input;
    }
    $num_pad_chars = $pad_length - $input_len;
    if ($num_pad_chars >= PHP_INT_MAX) {
        trigger_error("Padding length is too long", E_WARNING);
        return $input;
    }
    switch ($pad_type) {
        case STR_PAD_RIGHT:
            $left_pad = 0;
            $right_pad = $num_pad_chars;
            break;
        case STR_PAD_LEFT:
            $left_pad = $num_pad_chars;
            $right_pad = 0;
            break;
        case STR_PAD_BOTH:
            // intdiv() keeps the counters as integers (floor() returns a float).
            $left_pad = intdiv($num_pad_chars, 2);
            $right_pad = $num_pad_chars - $left_pad;
            break;
    }
    // First pad on the left, one character (or the whole pad string) per step.
    for ($i = 0; $i < $left_pad; $i++) {
        $result .= mb_substr($pad_string, ($i % $pad_str_len) & ~$pad_insert_all, $pad_inset_limit, $pad_encoding);
    }
    // Then copy the input string.
    $result .= $input;
    // Finally pad on the right.
    for ($i = 0; $i < $right_pad; $i++) {
        $result .= mb_substr($pad_string, ($i % $pad_str_len) & ~$pad_insert_all, $pad_inset_limit, $pad_encoding);
    }
    return $result;
}
This requires PHP 7+.
This version is closely based on the original PHP implementation shown above, with some changes.
It supports multi-byte characters, so you can do:
mb_str_pad($nome, 30, "\xc2\xa0", STR_PAD_BOTH, 'utf8');
Differences from the original version:
PS: Assuming I did not introduce a bug.
- Multi-byte support:
It supports characters that require multiple bytes. You can specify the encoding used; the default is UTF-8.
- A new "STR_PAD_INSERT_ALL":
You can insert the whole padding string on every step instead of cycling through it one character at a time. If the padding string has more than one character (e.g. "abc"), this flag always inserts "abc". This has a side effect: the number of characters inserted is not measured, so the result can be longer than the requested length. To use it, pass STR_PAD_BOTH | STR_PAD_INSERT_ALL, but this is not necessary in your case.
- Return on error:
Even when a WARNING is issued, it returns the original string, which is not the behavior of the original function.
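To make the STR_PAD_INSERT_ALL side effect concrete, here is a minimal sketch of the two padding loops in isolation. The pad string "abc" and the count of 4 padding steps are illustrative values, and the variable names $cycled and $whole are made up for this example.

```php
<?php
$pad_string = "abc";
$pad_str_len = mb_strlen($pad_string, 'UTF-8'); // 3 characters
$num_pad_chars = 4;                             // padding steps requested

$cycled = '';
$whole = '';
for ($i = 0; $i < $num_pad_chars; $i++) {
    // Default behaviour: one character per step, cycling through the pad string.
    $cycled .= mb_substr($pad_string, $i % $pad_str_len, 1, 'UTF-8');
    // STR_PAD_INSERT_ALL behaviour: the whole pad string on every step.
    $whole .= mb_substr($pad_string, 0, null, 'UTF-8');
}

echo $cycled, "\n"; // abca
echo $whole, "\n";  // abcabcabcabc
```

With the flag set, 4 steps produce 12 characters instead of 4, which is why the final string can exceed the requested length.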