Try this:
print_r(array_count_values(str_word_count($texto, 1, "óé")));
Result:
Array (
[Hoje] => 1
[nós] => 1
[vamos] => 1
[falar] => 1
[de] => 2
[PHP] => 2
[uma] => 1
[linguagem] => 1
[criada] => 1
[no] => 1
[é] => 1
[ano] => 1
)
To understand how array_count_values works, see the PHP manual.
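As a quick illustration of what array_count_values does on its own, here is a minimal sketch using a hand-written word list (the $words array is made up for this example and is not part of the original text):

```php
<?php
// array_count_values() maps each distinct value to the number of times it occurs.
// Keys keep the order in which each value first appears.
$words = ["de", "PHP", "de", "PHP", "ano"];
$counts = array_count_values($words);

print_r($counts);
// Array
// (
//     [de] => 2
//     [PHP] => 2
//     [ano] => 1
// )
```

Note that the comparison is case-sensitive, so "PHP" and "php" would end up under separate keys.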
Edit
A smarter solution (language independent)
With the above solution, you need to list every UTF-8 special character that can appear in the text (as was done with ó and é).
The following trick, however, removes the need to spell out the special character set.
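// Strip periods first so that "PHP." and "PHP" count as the same word.
$text = str_replace(".", "", "Hoje nós vamos falar de PHP. PHP é uma linguagem criada no ano de ...");
// Split on runs of whitespace and common punctuation; the u modifier enables UTF-8 matching.
$namePattern = '/[\s,:?!]+/u';
$wordsArray = preg_split($namePattern, $text, -1, PREG_SPLIT_NO_EMPTY);
// Count how many times each word occurs.
$wordsArray2 = array_count_values($wordsArray);
print_r($wordsArray2);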
In this solution I use a regular expression to split the text into words and then array_count_values to count them. The result is:
Array
(
[Hoje] => 1
[nós] => 1
[vamos] => 1
[falar] => 1
[de] => 2
[PHP] => 2
[é] => 1
[uma] => 1
[linguagem] => 1
[criada] => 1
[no] => 1
[ano] => 1
)
This solution also meets the need. However, the periods must be removed before splitting the words; otherwise the same word will appear in the result both with and without a trailing period. For example:
...
[PHP.] => 1
[PHP] => 1
...
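One way to sidestep the period problem is to match the words themselves instead of splitting on a list of separators. The sketch below is an alternative to the code above, not the original approach: it assumes that a "word" is simply a run of Unicode letters (\p{L}), which may not hold for text containing digits, hyphens or apostrophes.

```php
<?php
$text = "Hoje nós vamos falar de PHP. PHP é uma linguagem criada no ano de ...";

// \p{L}+ matches runs of Unicode letters; the u modifier makes PCRE treat
// both the pattern and the subject as UTF-8. Anything that is not a letter
// (spaces, periods, commas, ...) acts as a separator, so no str_replace is needed.
preg_match_all('/\p{L}+/u', $text, $matches);

// $matches[0] holds every full match, i.e. every word found.
$counts = array_count_values($matches[0]);
print_r($counts);
```

With this assumption, "PHP." and "PHP" are counted under the same key, at the cost of dropping numbers and breaking hyphenated words apart.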
Word counting is never as simple as it looks. You need to know the string whose words you want to count well before settling on a definitive solution.