Essentially, we need to split the text into an array of words. Then we count how many times each word repeats, sort the result from the most repeated word to the least repeated one, and finally keep only the first X.
For this we will use the PHP function str_word_count() (with format 1) to get the words of the text as an array, the PHP function array_count_values() to count how many times each value occurs in that array, the PHP function arsort() to sort the array in descending order without losing the association with the keys, and finally the PHP function array_slice() to keep only the desired number of words:
```php
/**
 * Most Repeated Words
 * Based on the given text, return the first X
 * most repeated words
 *
 * @param string $texto The text to evaluate
 * @param integer $quantidade The number of words to return
 *
 * @return array Array of word => count pairs for the most repeated words
 */
function palavrasMaisRepetidas($texto="", $quantidade=4) {
    // Split the text into words and count how many times each one occurs
    $palavras = array_count_values(str_word_count($texto, 1));
    // Sort by count, highest first, keeping the word => count association
    arsort($palavras);
    // Keep only the first $quantidade words
    return array_slice($palavras, 0, $quantidade);
}
```
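To see what each function contributes, the sketch below runs the same four steps on a short made-up sentence (the string `$sample` is only an illustrative input, not part of the original answer) and keeps the top 2 words. The two highest counts are distinct here, so the output does not depend on how ties are ordered.

```php
<?php
// Short illustrative input so every intermediate array is easy to inspect.
$sample = "the cat and the dog and the bird";

// Step 1: str_word_count() with format 1 returns the list of words.
$words = str_word_count($sample, 1);
// ["the", "cat", "and", "the", "dog", "and", "the", "bird"]

// Step 2: array_count_values() maps each distinct word to its count,
// in order of first occurrence.
$counts = array_count_values($words);
// ["the" => 3, "cat" => 1, "and" => 2, "dog" => 1, "bird" => 1]

// Step 3: arsort() sorts by count, highest first, keeping word => count pairs.
arsort($counts);

// Step 4: array_slice() keeps the first N entries (string keys are preserved).
$top = array_slice($counts, 0, 2);

print_r($top);
// Array
// (
//     [the] => 3
//     [and] => 2
// )
```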
Example:
```php
$texto = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Maecenas porttitor non felis quis dignissim. Morbi varius arcu lorem, eget efficitur nibh interdum vitae. Aenean tristique hendrerit diam a consequat. Nunc eleifend dolor ut rhoncus sollicitudin. Suspendisse tincidunt sodales turpis et egestas. Sed maximus libero malesuada lacus tempor, quis placerat nunc varius. Nam eget lectus imperdiet, lobortis mi sit amet, tristique justo. Fusce in felis et erat auctor vehicula quis dapibus libero. In commodo a leo eu eleifend.";

var_dump(palavrasMaisRepetidas($texto, 4));
```
Result:
```
array(4) {
  ["quis"]=>
  int(3)
  ["tristique"]=>
  int(2)
  ["varius"]=>
  int(2)
  ["a"]=>
  int(2)
}
```
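See the example on Ideone. Note that the order of words with the same count (here, the many words that appear twice) is not guaranteed: PHP's sort only became stable in PHP 8.0, so other PHP versions may list different words among the ties.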
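`str_word_count()` is case-sensitive, so in the example above "Lorem" and "lorem" are counted as two different words (as are "Nunc"/"nunc" and "in"/"In"). If that is not what you want, here is a minimal sketch of a variant, assuming plain ASCII text; the name `palavrasMaisRepetidasCI` is hypothetical and not part of the original answer. It lowercases the text before counting and sorts with `uksort()` so that words with the same count come out in alphabetical order, which makes the result the same on every PHP version.

```php
<?php
/**
 * Hypothetical variant of palavrasMaisRepetidas() (not from the original
 * answer): counts words case-insensitively and breaks ties alphabetically.
 */
function palavrasMaisRepetidasCI(string $texto = "", int $quantidade = 4): array
{
    // Lowercase first so "Lorem" and "lorem" count as the same word.
    // Assumes ASCII text: strtolower() is not multibyte-aware, so accented
    // UTF-8 text would need mb_strtolower() instead.
    $palavras = array_count_values(str_word_count(strtolower($texto), 1));

    // Highest count first; words with equal counts in alphabetical order.
    // Comparing [count, word] pairs: $b's count comes first on the left side
    // (descending counts), $a's word comes first (ascending words).
    uksort($palavras, function ($a, $b) use ($palavras) {
        return [$palavras[$b], $a] <=> [$palavras[$a], $b];
    });

    // Keep only the first $quantidade word => count pairs.
    return array_slice($palavras, 0, $quantidade);
}

// "Zeta"/"zeta" and "alpha"/"Alpha" are merged, and the tie at 2 is alphabetical:
// alpha => 2, zeta => 2, beta => 1
print_r(palavrasMaisRepetidasCI("Zeta alpha zeta Alpha beta", 3));
```

The extra tie-breaking costs a custom comparison callback; if only PHP 8.0+ needs to be supported, calling `ksort()` before `arsort()` would give the same alphabetical tie order thanks to the stable sort.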