Is there a function in MySQL that gives me the 3 most common words between a column in TEXT
format of a table?
Example
Maria Joaquina
Maria Antonienta
Maria B.
Is there a function in MySQL that gives me the 3 most common words between a column in TEXT
format of a table?
Example
Maria Joaquina
Maria Antonienta
Maria B.
First you have to separate the delimiter for lines, it can be done through function or a solution found in SOEn . / p>
SELECT
dados.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(dados.descricao, ' ', quantidade.n), ' ', -1) as descricao
FROM
(SELECT 1 n UNION ALL SELECT 2
UNION ALL SELECT 3 UNION ALL SELECT 4) as quantidade
INNER JOIN dados
ON CHAR_LENGTH(dados.descricao)-CHAR_LENGTH(REPLACE(dados.descricao, ' ',''))>=quantidade.n-1
Once you have the data line by line simply group and sort by the most used.
SELECT descricao, count(descricao) as quantidade
FROM ( ... )
GROUP BY descricao
ORDER BY quantidade DESC
LIMIT 1
One way is to take advantage of the array and count repeated items using array_count_values
. You will have output as follows:
Array
(
[Maria] => 3
[Joaquina] => 1
[Antonienta] => 1
[B] => 1
)
I worked with a string and used explode, but you can adapt it to the mysql result. I've set an example on Ideone for you to see.
$str = 'Maria Joaquina Maria Antonienta Maria B';
$str = explode( ' ' , $str );
$str = array_count_values( $str );
print_r( $str ); // retorna o output acima, com a contagem total
print_r( key( $str ) ); // retorna o primeiro índice com maior peso: 'Maria'
There is no function ready to do this.
I know what to do in SQL but it is more complicated. As my familiarity with SQL is more limited, especially MySQL and how you put the tag
It can be done in php with preg_split
(you can use explode
but this only supports one character at a time, and in case of whitespaces the best is REGEX) and a vector, in case you can preferably create a function:
function buscarPalavraMaisRecorrente($data, $last=1) {
$itens = array();
$list = preg_split('/\s/', strtolower($data));
$j = count($list);
for ($i = 0; $i < $j; ++$i) {
$key = $list[$i];
if (false === isset($itens[$key])) {
$itens[$key] = 1;//Cria um item ex. maria
} else {
$itens[$key] += 1;//Soma
}
}
$list = null;
$j = count($itens);
if ($j === 0) {
return array();
}
$j = max($itens);
$j = max($j, $last);
$found = array();
for ($i = 0; $i < $j; ++$i) {
$tmp = array_keys($itens, $j - $i);
if (false !== $tmp) {
$found = array_merge($found, $tmp);
}
}
$found = array_slice($found, 0, $j);
return $found;
}
Search the 3 most used words:
$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo, 3));
Returns the most used word:
$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo));
Using within your while
:
while ($linha = $consulta->fetch(PDO::FETCH_ASSOC)) {
echo 'Palavra que mais repete: ',
implode(',', buscarPalavraMaisRecorrente($linha['coluna_text'], 3));
}