Can I modify a field before performing a search? (PHP MYSQL)

1
  

"SELECT * FROM table WHERE content LIKE '% $ search%'"

Is it possible to change the values of the 'content' column before the search is performed?

Example:

  

"SELECT * FROM table WHERE strip_tags (content) LIKE '% $ search%'"

Is it possible to do something similar?

I have content that is all encoded with HTML tags, I would like to filter it and leave only the text while searching.

Example Content:

   <p><b>A</b> hist&oacute;ria da vida.</p>

If the $ search is 'history' for example, it will not find any results.

    
asked by anonymous 13.11.2015 / 01:05

3 answers

2

I see three possibilities for your problem.

  • Use the REPLACE function, which would make the search extremely slow, due to the amount of substitutions you have to make per record;
  • Write non-HTML text in another table field, but will consume lots of disk space by writing duplicate data;
  • Use Information Retrieval algorithms.

There are several types of information retrieval algorithms, ranging from simple to very complex like Google's. What I am going to present here is very used in SEO.

But here's an explanation of how to put together a simple algorithm and improve search.

The basic process for retrieving information is to treat the text that will be saved, extracting relevant information that will aid in the query and ranking of the results. Being them the text treatments:

  • Tokenization : Separate all words from the text;
  • Normalization : Make lowercase letters, remove symbols and accents;
  • Stopwords : Removal of irrelevant parables for searches such as "a", "and", "the", "for", "for", etc;
  • Stemming : Convert all words to the grammar root. We will not be working this part because you need a very complex dictionary.

Let's use the example string:

$string = '&Aacute; <strong>Oi</strong> aqui é um teste!!     \n Faça tudo para tirar simbolos e acentos deste teste.';

/**
 * Removendo o HTML.
 */
$clean = html_entity_decode($string);
$clean = strip_tags($clean);

/**
 * Removendo acendos e símbolos.
 */
setlocale(LC_ALL, 'pt_BR.UTF8');

// Remove espaços e quebra de linha.
$clean = trim($string);
$clean = preg_replace('/\s(?=\s)/', '', $clean);
$clean = preg_replace('/[\n\r\t]/', ' ', $clean);

// Remove acentos. Atenção para a função iconv que deve estar instalada.
$clean = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);

// Remove tudo que não for letra ou número.
$clean = preg_replace("/[^a-zA-Z0-9\/_| -]/", '', $clean);

// transforma tudo em minúsculo.
$string = strtolower(trim($clean, '-'));

/**
 * Removendo stopwords.
 */
// Criando dicionário de stopwords.
$stopwords = array(
    'a', 'agora', 'ainda', 'ali', 'alguem', 'algum', 'alguma', 'algumas',
    'alguns', 'ampla', 'amplas', 'amplo', 'amplos', 'ante', 'antes', 'ao',
    'aos', 'apos', 'aquela', 'aquelas', 'aquele', 'aqueles', 'aqui', 'aquilo',
    'as', 'ate', 'atraves', 'cada', 'coisa', 'coisas', 'com', 'como', 'contra',
    'contudo', 'da', 'daquele', 'daqueles', 'das', 'de', 'dela', 'delas', 'dele',
    'deles', 'depois', 'dessa', 'dessas', 'desse', 'desses', 'desta',
    'destas', 'deste', 'deste', 'destes', 'deve', 'devem', 'devendo',
    'dever', 'devera', 'deverao', 'deveria', 'deveriam', 'devia', 'deviam',
    'disse', 'disso', 'disto', 'dito', 'diz', 'dizem', 'do', 'dos', 'e',
    'ela', 'elas', 'ele', 'eles', 'em', 'enquanto', 'entre', 'era', 'essa',
    'essas', 'esse', 'esses', 'esta', 'esta', 'estamos', 'estao', 'estas',
    'estava', 'estavam', 'estavamos', 'este', 'estes', 'estou', 'eu',
    'fazendo', 'fazer', 'feita', 'feitas', 'feito', 'feitos', 'foi', 'for',
    'foram', 'fosse', 'fossem', 'grande', 'grandes', 'ha', 'isso', 'isto',
    'ja', 'la', 'lhe', 'lhes', 'lo', 'mas', 'me', 'mesma', 'mesmas',
    'mesmo', 'mesmos', 'meu', 'meus', 'minha', 'minhas', 'muita', 'muitas',
    'muito', 'muitos', 'na', 'nao', 'nas', 'nem', 'nenhum', 'nessa',
    'nessas', 'nesta', 'nestas', 'ninguem', 'no', 'nos', 'nos', 'nossa',
    'nossas', 'nosso', 'nossos', 'num', 'numa', 'nunca', 'o', 'os', 'ou',
    'outra', 'outras', 'outro', 'outros', 'para', 'pela', 'pelas', 'pelo',
    'pelos', 'pequena', 'pequenas', 'pequeno', 'pequenos', 'per', 'perante',
    'pode', 'pude', 'podendo', 'poder', 'poderia', 'poderiam', 'podia',
    'podiam', 'pois', 'por', 'porem', 'porque', 'posso', 'pouca', 'poucas',
    'pouco', 'poucos', 'primeiro', 'primeiros', 'propria', 'proprias',
    'proprio', 'proprios', 'quais', 'qual', 'quando', 'quanto', 'quantos',
    'que', 'quem', 'sao', 'se', 'seja', 'sejam', 'sem', 'sempre', 'sendo',
    'sera', 'serao', 'seu', 'seus', 'si', 'sido', 'so', 'sob', 'sobre',
    'sua', 'suas', 'talvez', 'tambem', 'tampouco', 'te', 'tem', 'tendo',
    'tenha', 'ter', 'teu', 'teus', 'ti', 'tido', 'tinha', 'tinham', 'toda',
    'todas', 'todavia', 'todo', 'todos', 'tu', 'tua', 'tuas', 'tudo',
    'ultima', 'ultimas', 'ultimo', 'ultimos', 'um', 'uma', 'umas', 'uns',
    'vendo', 'ver', 'vez', 'vindo', 'vir', 'vos', 'vos'
);
$string = preg_replace('/\b(' . implode('|', $stopwords) . ')\b/', '', $string);

/**
 * Obtendo os Tokens já com a quantidade de repetição das palavras.
 */
$tokens = explode(' ', $string);
//$tokens = array_count_values($string);
// Agrupa todas as palavras repetidas.
// Caso queira partir para um algoritmo mais complexo de indexação, utilize
// a função 'array_count_values' no lugar de 'array_flip', assim vc terá as
// palavras e a quantidade que cada uma repete.
// $tokens = array_count_values($string);
$tokens = array_flip($string);

// Tratando para o armazenamento
$keywords = join(',', $tokens);

Ready now we need to save this in the database for easy searching. To do this create in your table a field called keywords , and before entering the content, run the algorithm above.

In your searches, mount a code like this:

$sql = 'SELECT * FROM textos WHERE keywords LIKE "%:keywords%"';
$keywords = explode(',', $string);
$keywords = join('%', $string);
$stmt->execute( array( ':keywords' => $keywords ) );
$result = $res->fetchAll();

If your table is MyISAM it will look even better.

$sql = 'SELECT * FROM textos WHERE MATCH(keywords) AGAINST(":keywords" IN BOOLEAN MODE)';
$stmt->execute( array( ':keywords' => $keywords ) );
$result = $res->fetchAll();

The use of IN BOOLEAN MODE allows you to specify things like "+ test take symbol" that will find results with mandatory the word test and others if it has. See the links below to learn more.

Now for the existing texts, make a script that covers all the records and do the above treatment.

    
13.11.2015 / 05:47
0

Yes, you can use the REPLACE () function, follows an example:

SELECT * FROM tabela WHERE REPLACE(conteudo, 'procura', 'substitui') LIKE '%$pesquisa%'

Hope it helps

    
13.11.2015 / 01:12
0

For something similar to what you want, MYSQL supports a type of search called FULL TEXT SEARCH that performs a search quite different from the common comparison.

link

It is possible to do some searches with variable content where the result is given by the percentage that the value searched is in the columns searched.

    
13.11.2015 / 03:26