How do I return the words that appear the most in a column?

6

I have this table below which has two columns, being id and description :

CREATE TABLE myBigTable (
    id INT(11) AUTO_INCREMENT PRIMARY KEY,
    description TEXT NOT NULL
)

After inserting some records, I made a simple select and resulted in this below:

+----+-----------------------+
| id |     description       |
+----+-----------------------+
| 1  | joão de santo cristo  |
| 2  | eduardo e mô nica     |
| 3  | santo cristo joão     |
| 4  | cristo tadeu joão     |
| 5  | juazeiro do joão      |
+----+-----------------------+

I would like a select to return the number of times each word appears in a given column, such as in the description column. Below is the return in which it would be desired, containing the word and the quantity. See:

+------------+----------+
|    work    |    qnt   |
+------------+----------+
|    joão    |    4     |
|   cristo   |    3     |
|   santo    |    2     |
|    ...     |   ...    |
+------------+----------+

I made a small test using LIKE, however I have to put the word I want it to return to.

SELECT count(*) as qnt FROM 'phrase' WHERE description LIKE "%joao%"

Return:

+----------+
|   qnt    |
+----------+
|    4     |
+----------+

How would a select be to return the words that pop up in a particular column? Is it possible to do this using only database resources?

Note: would not necessarily have to be the number of times a word appears, but rather, if possible, the number of rows it is. >     

asked by anonymous 07.03.2017 / 23:00

2 answers

4

I found an online solution, I give all credit to article author / a> which has arranged an ingenious solution to split a string in% with%. However the solution requires you to create a table of numbers with as many entries as the number of words you support.

Here's an example:

Layout:

Create table Data2(
  id int AUTO_INCREMENT primary key,
  nome nvarchar(255)
);

insert into Data2 (nome) values ('ola');
insert into Data2 (nome) values ('ola adeus');
insert into Data2 (nome) values ('adeus ola');
insert into Data2 (nome) values ('gelados ola');
insert into Data2 (nome) values ('viva');

create table numbers (
  n int
);

insert into numbers values (1);
insert into numbers values (2);
--...

Query:

select nome, count(1) totalLinhas from (
  select id, nome, count(1) as total from ( 
    select 
      id, 
      substring_index(
        substring_index(nome, ' ', n), 
        ' ', 
        -1
      ) as nome
    from Data2
    join numbers
      on char_length(nome) 
        - char_length(replace(nome, ' ', '')) 
        >= n - 1
  ) t1
  group by id, nome
)t2
group by nome

sqlfiddle

    
07.03.2017 / 23:44
1

I do not know how it would look in MySQL, I'm not the bank here to do the test, but in the sql server it would be like this;

declare @tabela table
(
    id int,
    description varchar(500)
) 

declare @tabelaAux table
(
    description varchar(500)
) 

declare @texto varchar(200), @textoAux varchar(200), @Index int,@Separador varchar(2) = ' '

--insert into @tabela values
--( 1,'joão de santo cristo')  
--,(2,'eduardo e mô nica')    
--,(3,'santo cristo joão')    
--,(4,'cristo tadeu joão')   
--,(5,'juazeiro do joão') 

insert into @tabela values 
( 1,'joão de santo cristo joão carlos joão joão de santo cristo joão carlos joão  ') ,
(2,'eduardo e mô nica') ,
(3,'joão santo joão cristo cristo  cristo cristo  joão') ,
(4,'cristo tadeu joão') ,
(5,'juazeiro do joão')


-- Cursor para percorrer os nomes dos objetos 
DECLARE cursor_tabela CURSOR FOR
select description from @tabela

    -- Abrindo Cursor para leitura
    OPEN cursor_tabela

    -- Lendo a próxima tabela
    FETCH NEXT FROM cursor_tabela INTO @texto

    -- Percorrendo linhas do cursor (enquanto houverem)
    WHILE @@FETCH_STATUS = 0
    BEGIN
         SET @Index = charIndex(@Separador,@texto)
         WHILE (@Index > 0) BEGIN  
            set @textoAux = SubString(@texto,1,@Index-1);
            insert into @tabelaAux( description) values(@textoAux)
            set @texto =  SubString(@texto, @Index+1, len(@texto) - @Index)
            SET @Index = charIndex(@Separador,@texto)

        END
        insert into @tabelaAux( description) values(@texto)

    -- Lendo a próxima linha
    FETCH NEXT FROM cursor_tabela INTO @texto
    END

-- Fechando Cursor para leitura
CLOSE cursor_tabela

-- Desalocando o cursor
DEALLOCATE cursor_tabela

select count(description)total,  description from @tabelaAux
group by description

Output;

total   description
2   
2   carlos
7   cristo
2   de
1   do
1   e
1   eduardo
11  joão
1   juazeiro
1   mô
1   nica
3   santo
1   tadeu
    
08.03.2017 / 14:24