Most common words among rows

3

Is there a function in MySQL that gives me the 3 most common words between a column in TEXT format of a table?

Example

Maria Joaquina
Maria Antonienta
Maria B.

    
asked by anonymous 30.12.2014 / 00:26

4 answers

4

First you have to separate the delimiter for lines, it can be done through function or a solution found in SOEn . / p>

SELECT 
  dados.id, 
  SUBSTRING_INDEX(SUBSTRING_INDEX(dados.descricao, ' ', quantidade.n), ' ', -1) as descricao
FROM 
    (SELECT 1 n UNION ALL SELECT 2
     UNION ALL SELECT 3 UNION ALL SELECT 4) as quantidade
INNER JOIN dados
     ON CHAR_LENGTH(dados.descricao)-CHAR_LENGTH(REPLACE(dados.descricao, ' ',''))>=quantidade.n-1

SQLFiddle

Once you have the data line by line simply group and sort by the most used.

SELECT descricao, count(descricao) as quantidade
FROM ( ... )
GROUP BY descricao
ORDER BY quantidade DESC
LIMIT 1

SQLFiddle

    
30.12.2014 / 03:55
6

One way is to take advantage of the array and count repeated items using array_count_values . You will have output as follows:

Array
(
    [Maria] => 3
    [Joaquina] => 1
    [Antonienta] => 1
    [B] => 1
)

I worked with a string and used explode, but you can adapt it to the mysql result. I've set an example on Ideone for you to see.

$str = 'Maria Joaquina Maria Antonienta Maria B';
$str = explode( ' ' , $str );
$str = array_count_values( $str );
print_r( $str ); // retorna o output acima, com a contagem total
print_r( key( $str ) ); // retorna o primeiro índice com maior peso: 'Maria'
    
30.12.2014 / 03:23
4

There is no function ready to do this.

I know what to do in SQL but it is more complicated. As my familiarity with SQL is more limited, especially MySQL and how you put the tag

30.12.2014 / 01:21
3

It can be done in php with preg_split (you can use explode but this only supports one character at a time, and in case of whitespaces the best is REGEX) and a vector, in case you can preferably create a function:

function buscarPalavraMaisRecorrente($data, $last=1) {
    $itens = array();

    $list = preg_split('/\s/', strtolower($data));
    $j = count($list);

    for ($i = 0; $i < $j; ++$i) {
        $key = $list[$i];
        if (false === isset($itens[$key])) {
            $itens[$key] = 1;//Cria um item ex. maria
        } else {
            $itens[$key] += 1;//Soma
        }
    }

    $list = null;

    $j = count($itens);
    if ($j === 0) {
        return array();
    }

    $j = max($itens);
    $j = max($j, $last);

    $found = array();

    for ($i = 0; $i < $j; ++$i) {
        $tmp = array_keys($itens, $j - $i);
        if (false !== $tmp) {
            $found = array_merge($found, $tmp);
        }
    }

    $found = array_slice($found, 0, $j);
    return $found;
}

Search the 3 most used words:

$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo, 3));

Returns the most used word:

$exemplo = 'Maria Joaquina Maria Antonienta Maria B.';
print_r(buscarPalavraMaisRecorrente($exemplo));

Using within your while :

while ($linha = $consulta->fetch(PDO::FETCH_ASSOC)) {
    echo 'Palavra que mais repete: ',
          implode(',', buscarPalavraMaisRecorrente($linha['coluna_text'], 3));
}
    
30.12.2014 / 01:06