How to reduce the search time in a table with more than 200 thousand records?

4

I'm having problems with listing and searching table data with more than 200,000 records.

I read, while researching, that for the search feature the ideal would be to create "fulltext" indexes on the fields that will be searched.

So I did it this way:

EDIT : I changed the tables by creating indexes as suggested in the comments.
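
For reference, creating indexes like that in MySQL looks roughly like the sketch below; the column choices and index names are only assumptions based on the query further down, not the actual table definition:

    -- Assumed sketch: a FULLTEXT index on text fields that will be searched,
    -- plus a regular index on the columns filtered in the WHERE clause.
    ALTER TABLE empresas ADD FULLTEXT INDEX ft_empresas_busca (nome, email, website);
    ALTER TABLE empresas ADD INDEX idx_empresas_user (useradmcod, ativouser, ativosys);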

I also read that, for the record count used in pagination, it would be better to use the function:

SELECT FOUND_ROWS();
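
For context, a minimal sketch of how that pair of MySQL features is normally combined (table and column names here are only placeholders):

    -- Run the page query with SQL_CALC_FOUND_ROWS so MySQL also computes
    -- how many rows would match without the LIMIT.
    SELECT SQL_CALC_FOUND_ROWS nome
    FROM empresas
    WHERE useradmcod = 123
    LIMIT 0, 100;

    -- Then, on the same connection, read that total for the pagination.
    SELECT FOUND_ROWS() AS rowcount;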

I have not done the search part yet, but listing the records by user is already slow.

I have an administrative area where several users will be responsible for registration. The system then has to display on screen, with pagination, every record created by a particular user.

So I did it this way:

public function AdmListEmpresas($link, $usercod, $pagina = 1, $ItensPorPagina = 100) {
    // Determine at which record the search starts, according to the current page.
    $inicio = ($pagina - 1) * $ItensPorPagina;

    $sql = 'SELECT SQL_NO_CACHE SQL_CALC_FOUND_ROWS
             `empresas`.`nome`, `usuarios`.`nome` AS `usuarionome`, `equipes`.`nome` AS `equipenome`,
             `setores01`.`titulo` AS `setor01titulo`, `setores02`.`titulo` AS `setor02titulo`,
             `setores03`.`titulo` AS `setor03titulo`, `empresas`.`telprincipal`, `empresas`.`email`,
             `empresas`.`website`, `empresas`.`plantype`, `empresas`.`datavencimento`,
             `empresas`.`datacadastro`, `empresas`.`ativoinfo`
            FROM `'.BDtables['BD_EMPRESAS'].'` `empresas`
            LEFT JOIN `'.BDtables['BD_USERS'].'` AS `usuarios` ON `empresas`.`useradmcod` = `usuarios`.`cod`
            LEFT JOIN `'.BDtables['BD_USERS'].'` AS `equipes` ON `empresas`.`codequiperesp` = `equipes`.`cod`
            LEFT JOIN `'.BDtables['BD_SETORES'].'` AS `setores01` ON `setores01`.`cod` = `empresas`.`codsetor1`
            LEFT JOIN `'.BDtables['BD_SETORES'].'` AS `setores02` ON `setores02`.`cod` = `empresas`.`codsetor2`
            LEFT JOIN `'.BDtables['BD_SETORES'].'` AS `setores03` ON `setores03`.`cod` = `empresas`.`codsetor3`
            WHERE (`empresas`.`ativouser` = "1" AND
             `empresas`.`ativosys` = "1" AND
             `empresas`.`useradmcod` = "'.$usercod.'") OR
             (`empresas`.`codequiperesp` = "'.$usercod.'") OR
             (`empresas`.`usersfunccods` LIKE "%'.$usercod.'%")
            ORDER BY
             `empresas`.`useradmcod` LIMIT '.$inicio.', '.$ItensPorPagina.';';

    $sqlcount = 'SELECT FOUND_ROWS() as "rowcount";';

    $result = $this->conn->query($sql);
    $numrows = $this->conn->query($sqlcount);
    $numrows = $numrows->fetch_array(MYSQLI_ASSOC);

    if ($result->num_rows >= 1) {
        while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
            $listaarray[] = ['empresa' => $row['nome'],
                'usuarionome' => $row['usuarionome'],
                'equipenome' => $row['equipenome'],
                'setor01titulo' => $row['setor01titulo'],
                'setor02titulo' => $row['setor02titulo'],
                'setor03titulo' => $row['setor03titulo'],
                'telprincipal' => $row['telprincipal'],
                'email' => $row['email'],
                'website' => $row['website'],
                'plantype' => $row['plantype'],
                'datavencimento' => $row['datavencimento'],
                'datacadastro' => $row['datacadastro'],
                'ativoinfo' => $row['ativoinfo']
            ];
        }
    }

    if (empty($listaarray)) { $listaarray = false; }

    // Pagination array
    $paginacaoarray = $this->Paginacao($link, $numrows['rowcount'], $pagina, $ItensPorPagina);

    return array('paginacao' => $paginacaoarray,'lista' => $listaarray);
}

The BD_EMPRESAS table holds the registration of all companies, with more than 200 thousand records.

The BD_USERS table can also contain more than 200,000 records, and in the company listing the user IDs have to be looked up to find each user's name.

The BD_SETORES table is small, about 300 records; each company has up to 3 sectors, and the listing must fetch each sector's name by its ID.

The WHERE in the query takes the code of the logged-in user and searches for it in 3 table fields, since the user can be the administrator, part of the system team, or a company employee. That returns every company the user participates in, on average around 5 thousand records, and pagination then displays them 200 at a time. (I had no problem with the pagination itself, only with how long the records take to come back.)

In the WHERE, the ativouser and ativosys fields only restrict the search to active records; they are enum fields that store 0 for disabled and 1 for enabled. The query returns only the enabled companies, but a super administrator will also have to list the disabled records, which can further increase the number of rows displayed.

All of this is then stored in an array and sent to another part of the PHP code, which displays the results inside a template compiled by a Smarty-like system.

Could anyone who has worked with huge tables help me find a better solution?

EDIT 01: As suggested by @bfavaretto, below is the result of EXPLAIN:

EDIT 02: I did some tests here. The query as it stands, with LIMIT displaying 200 records at a time, takes 7 seconds each time it runs.

I noticed that SQL_CALC_FOUND_ROWS makes the query very heavy; it alone weighs more on the query than the JOINs do.

If I keep the JOINs but remove SQL_CALC_FOUND_ROWS, the 7 seconds drop to 4 or 5 seconds.

If I remove both SQL_CALC_FOUND_ROWS and the JOINs, the query runs in less than 1 second, which would be perfect.

The problem is that I need the JOINs to display the results on screen. Now I need to think of a way to replace the JOINs and still display the same results.
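
One approach often used here, sketched below under the assumption that the cod columns being joined on are unique (so the LEFT JOINs do not change the number of matching rows), is to drop SQL_CALC_FOUND_ROWS and get the total from a separate, much lighter COUNT(*) that only touches the filtered table, while keeping the JOINs only for the 200 rows of the current page:

    -- Sketch only: total for the pagination, without JOINs or SQL_CALC_FOUND_ROWS.
    -- "123" stands for the logged-in user's code ($usercod).
    SELECT COUNT(*) AS rowcount
    FROM empresas
    WHERE (ativouser = "1" AND ativosys = "1" AND useradmcod = "123")
       OR codequiperesp = "123"
       OR usersfunccods LIKE "%123%";

    -- The page query itself keeps the JOINs, but only for the LIMITed rows,
    -- e.g. LIMIT 0, 200 for the first page.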

    
asked by anonymous 31.08.2017 / 08:59

5 answers

3

After many hours of testing, and with no more time to keep working on the same problem, I resolved the situation by modifying the table, the query, and what PHP returns to the screen.

Before, the WHERE used the ativouser and ativosys fields, which indicated whether the company had been activated or deactivated by the company owner or by the system administrator. I merged the two, keeping only ativosys in the table.

I also changed the usersfunccods field to funcsenha: instead of searching this field for the logged-in user, I now create a special password for employees.

I deleted two FULLTEXT indexes, since I decided not to use them in the search anymore.

On the list screen I will no longer display the sectors, so I deleted all the JOINs that looked up the sector names from their codes.

I will also no longer display the name of the person in charge of the team, only the name of the company owner. So I deleted another JOIN and kept only the most important one.

Then the modified table looks like this:

Some of the indexes will be used in other queries, so for this one I'm using the hint USE INDEX (codusers, ativadostatus), so that only the most important indexes are used for this query.

Then my new query looked like this:

$sql = 'SELECT SQL_CALC_FOUND_ROWS
         `empresas`.`nome`, `usuarios`.`nome` AS `usuarionome`, `empresas`.`plantype`,
         `empresas`.`datavencimento`, `empresas`.`datacadastro`, `empresas`.`ativoinfo`
        FROM `'.BDtables['BD_EMPRESAS'].'` `empresas` USE INDEX (codusers, ativadostatus)
        LEFT JOIN `'.BDtables['BD_USERS'].'` AS `usuarios` ON `empresas`.`useradmcod` = `usuarios`.`cod`
        WHERE `empresas`.`ativosys` = "1" AND
         (`empresas`.`useradmcod` = "'.$usercod.'" OR `empresas`.`codequiperesp` = "'.$usercod.'")
        ORDER BY
         `empresas`.`useradmcod` LIMIT '.$inicio.', '.$ItensPorPagina.';';

With that, the query time dropped from almost 7 seconds to less than 1 second. And if I remove the ORDER BY, it gets even faster, down to less than half a second.

It's not the solution I wanted, but it will work out for a long time.

Thanks for everyone's patience; even though I ended up making these changes everywhere, I learned many things from you that I did not know. Thanks again. Hugs.

    
02.09.2017 / 10:19
6

So, the EXPLAIN result shows that the query scans 60k+ rows of the company table. With all those JOINs, each of those rows gets combined with the rows obtained from the other tables. That looks like your performance bottleneck.

What I would try, looking at this result, is to create some indexes. Note that none were used on the company table (the key column came back NULL in the EXPLAIN). The first candidate would be an index on the useradmcod column, which is used in both the WHERE and the ORDER BY. The Using filesort that appeared in the EXPLAIN is often a cause of slowness and is related to sorting on columns without an index. If that is not enough, also put an index on the column you compare with LIKE, since that kind of comparison is one of the slowest.
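
In MySQL, that first suggestion would look something like the sketch below (the index name is just an example):

    -- Example index on the column used in both WHERE and ORDER BY.
    ALTER TABLE empresas ADD INDEX idx_useradmcod (useradmcod);

    -- Re-run EXPLAIN afterwards and check that the key column is no longer NULL.
    EXPLAIN SELECT nome FROM empresas WHERE useradmcod = 123 ORDER BY useradmcod;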

Be careful when creating indexes on your tables. Having no index beyond the PKs is often a problem, but so is having too many, since indexes need to be rebuilt or adjusted on every INSERT, UPDATE, and DELETE operation.

    
01.09.2017 / 01:01
2

200,000 is not that many records, but you can try breaking the result into pages by limiting on the logical row id. That is, return rows 1 to 10, then 11 to 20, then 21 to 30, and so on.

For that, SQL Server has ROW_NUMBER() and MySQL has LIMIT / OFFSET.
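
In MySQL, for example, a page of 10 rows starting at row 21 would look something like this (the column is only illustrative):

    -- Rows 21 to 30: skip the first 20 rows, return the next 10.
    SELECT nome
    FROM empresas
    ORDER BY nome
    LIMIT 20, 10;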

    
31.08.2017 / 19:07
1

In addition to my colleagues' suggestions, there is software that can help with the analysis. NeorProfile is free and lets me see all the queries that go to the database (it works like a tunnel/proxy). It shows the estimated execution time and the size of the result, and lets you run EXPLAIN on the commands. It can be a good way to measure query performance and test changes.

    
01.09.2017 / 22:01
-1

01.09.2017 / 01:25