Situation
I'm running a regex search for a specific word, inválido, but by preference I decided to write the pattern as inv.lido. I knew the word was in the test string, but the pattern did not match it.
Tests
vr = var_dump
pr = print_r
$string = 'até, atenção, Hipótese, você, português, café, órgão';
vr(preg_match('~at.~', $string, $match));
pr($match);
vr(preg_match('~aten..o~', $string, $match));
pr($match);
vr(preg_match('~Hip.tese~', $string, $match));
pr($match);
vr(preg_match('~voc.~', $string, $match));
pr($match);
vr(preg_match('~portugu.s~', $string, $match));
pr($match);
vr(preg_match('~caf.~', $string, $match));
pr($match);
vr(preg_match('~.rg.o~', $string, $match));
pr($match);
Out
int(1)
Array([0] => at�)
int(0)
Array()
int(0)
Array()
int(1)
Array([0] => voc�)
int(0)
Array()
int(1)
Array([0] => caf�)
int(0)
Array()
Question
As you can see, it did not match most of the words, and even for the ones it did match, I do not know what � is, because even with utf8_decode or utf8_encode it does not return the correct character.
From the little I know of C and of binary, I suppose it has to do with these characters being two bytes wide (2 × 8 bits). However, I believed they were present in the ASCII table, and as far as I know regex follows the ASCII table.
Why did this happen?
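A minimal sketch that may help check the two-byte supposition, assuming the script file is saved as UTF-8. The variable names ($word, $byteMatch, $utf8Match) are illustrative and not part of the tests above. It compares the byte count of one test word with what "." captures, with and without the PCRE u modifier.

```php
<?php
// Assumption: this file is saved as UTF-8, so 'é' is stored as the two bytes C3 A9.
$word = 'café';

// strlen() counts bytes, not characters; the accented letter takes two of them.
var_dump(strlen($word));            // int(5)
var_dump(bin2hex($word));           // string(10) "636166c3a9"

// Without the u modifier, "." consumes a single byte, so the match
// stops in the middle of 'é' and keeps only its first byte (0xC3).
preg_match('~caf.~', $word, $byteMatch);
var_dump(bin2hex($byteMatch[0]));   // string(8) "636166c3"

// With the u modifier, pattern and subject are treated as UTF-8,
// and "." consumes a whole code point.
preg_match('~caf.~u', $word, $utf8Match);
var_dump($utf8Match[0] === $word);  // bool(true)
```

If the same holds for the other words, a lone trailing byte such as 0xC3 would explain the � in the output, and patterns like ~Hip.tese~ would fail because the single "." cannot cover both bytes of the accented letter.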