Data and database queries have a very special place in the universe of computers. The popularization of the internet was due to the agility of searches implemented by the Enginner Search Web mechanisms.
Simply searching for a string using sql + mysql is not such an effective alternative to managing a sophisticated search that returns results that are closer to what you want to find. You can for example search for a literal phrase:
busca('palavra');
busca('minha frase de busca deve retornar algo que tenha sentido com o que eu digitei');
Getting to such a result should not be such a simple task. You have to scale the problem to its highest complexity.
I am currently developing a collection system for PDF files. First I thought of making a program that would make this query in the binaries of the PDF file. However I realized that this solution could bring me some extra difficulties and a lot of processing. Hence I thought of removing this data with the PDF Parse (text extractor) to write them in a table. The first way I found to map this object was by using Single Table Inheritance .
Why did I have this choice? I have a collection with more than 300 magazines each with an average of 30 pages. I want to get the simple functionality of finding a string provided by the search in the content column and that returns the other fields. Well that I know how to do. My main concern about all of this is the speed and performance of this system. Since these search requests in the bank will be made very frequently. If there are clearer and more efficient ways to implement this functionality would like to read reviews.
I have the following scenario: tabela = paginas(id, post_id, n_pagina , conteudo)
.
So my goal is to just locate where particular text appears in the posts. And with that capture the other fields where the reference is found along with a portion of string that precedes and succeeds the search term. The post_id key will be a pointer to the publications object of my cms.
$consulta = 'Amazônia';
$results = metodoDeBusca($consulta){
// código
return $resultados;
};
$resultados = [
[
'pagina_id' => 12,
'pagina' => 5,
'post_id' =>1,
'str_before' => 'A viagem será com destino a',
'str_after' => 'Esse dia será muito explendoroso'
],
[
'pagina_id' => 74 ,
'pagina' => 1,
'post_id' => 4,
'str_before' => 'A',
'str_after' => 'Tem uma fauna rica em biodiversidade'
]
];
How could a PHP script return an array with all this information so I could retrieve it in my CMS?
I would like to get mainly some implications that the construction of this class can generate. How can I be improving my architecture from the use of some external resources. Tips on libraries or platforms that interact with wordpress will be very welcome.