DELETE query too slow to run

1

I'm having a problem with a query whose job is to delete some duplicate records that already exist in a table.

It is as follows:

DELETE t1 
FROM bid_account t1 
    INNER JOIN bid_account t2 
WHERE t1.id > t2.id AND 
    t1.bidding_price = t2.bidding_price AND 
    t1.auction_id = '" . $obj->auction_id . "' AND 
    t1.bidding_type = 's' AND 
    t1.bid_flag = 'd' AND 
    DATE(t1.bidpack_buy_date) >= DATE(NOW())

The problem is that it takes too long to run and doesn't accomplish anything; it doesn't seem to work.

I ran a SELECT with these same conditions and something absurd happened: it returned several repeated records that don't even exist. For example, the auction with ID 15 has 100 records, yet querying with these conditions returns more than 80 thousand records.

How can I make this query work properly?

asked by anonymous 11.04.2018 / 06:47

2 answers

10

Before answering, let's dot the i's:


I ran a SELECT with these same conditions and something absurd happened: it returned several repeated records that don't even exist. For example, the auction with ID 15 has 100 records, yet querying with these conditions returns more than 80 thousand records.

Nothing absurd happened. That result is a Cartesian product, caused by the fact that you declared a JOIN of the bid_account table with itself without any join condition. So, assuming the table had 10 records, the initial result would already have 10 * 10 = 100 records. Only then would the restrictions you declared in the WHERE be applied, which can reduce that number (but not by much).
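To see that multiplication in isolation, here is an illustrative sketch; it uses only the table and columns from the query above, and the second count simply moves the pairing rules into an ON clause:

-- Illustrative only: with no join condition, every row of t1 pairs with
-- every row of t2, so the join alone produces n * n combinations.
SELECT COUNT(*)
FROM bid_account t1
     INNER JOIN bid_account t2;

-- The same join restricted on id and bidding_price keeps only the
-- combinations that actually matter.
SELECT COUNT(*)
FROM bid_account t1
     INNER JOIN bid_account t2
        ON t2.id < t1.id
       AND t2.bidding_price = t1.bidding_price;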


Running it takes too long to complete and doesn't accomplish anything; it doesn't seem to work.

Of course it is slow! And for each new record inserted into the table the delay grows even faster, since the number of combinations grows with the square of the number of rows. This is certainly happening for the same reason as above.
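If you want to confirm this on your own data, one option is to run EXPLAIN on a SELECT with the same shape; the exact output depends on your MySQL version and indexes, but multiplying the row estimates of the two lines gives an idea of how many combinations the server has to examine:

-- Diagnostic sketch: shows the optimizer's row estimate for each side of the join.
EXPLAIN
SELECT t1.id
FROM bid_account t1
     INNER JOIN bid_account t2
WHERE t1.id > t2.id AND
      t1.bidding_price = t2.bidding_price AND
      t1.bidding_type = 's' AND
      t1.bid_flag = 'd'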

TL;DR version:

If I understand correctly, you want to delete the records that are completely duplicated in bid_account and keep only the one with the lowest id. In that case, your query should look like this*:

DELETE t1 
FROM bid_account t1 
WHERE EXISTS ( SELECT 1 
               FROM bid_account t2 
               WHERE t2.bidding_price = t1.bidding_price AND
                    t2.auction_id = t1.auction_id AND
                    t2.bidding_type  = t1.bidding_type AND 
                    t2.bid_flag = t1.bid_flag AND 
                    DATE(t2.bidpack_buy_date) = DATE(t1.bidpack_buy_date) AND
                    t2.id < t1.id) AND
    t1.auction_id = '" . $obj->auction_id . "' AND 
    t1.bidding_type = 's' AND 
    t1.bid_flag = 'd' AND 
    DATE(t1.bidpack_buy_date) >= DATE(NOW())

* Note: you should check whether the rules that define your "duplicity" are really the same ones I stated in the subselect; that is where the duplicates are identified. In the WHERE you keep only your exclusion criteria (everything except what identifies the duplicity). That way the query stays cleaner and it is easier to make adjustments, such as changing the criteria.
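Before running the DELETE, it can also help to preview exactly which rows it would remove. The sketch below just mirrors the DELETE above (same table, columns and PHP placeholder); adjust the subselect if your duplicity rules differ:

-- Lists the rows that the DELETE above would remove, without touching anything.
SELECT t1.*
FROM bid_account t1
WHERE EXISTS ( SELECT 1
               FROM bid_account t2
               WHERE t2.bidding_price = t1.bidding_price AND
                     t2.auction_id = t1.auction_id AND
                     t2.bidding_type = t1.bidding_type AND
                     t2.bid_flag = t1.bid_flag AND
                     DATE(t2.bidpack_buy_date) = DATE(t1.bidpack_buy_date) AND
                     t2.id < t1.id) AND
    t1.auction_id = '" . $obj->auction_id . "' AND
    t1.bidding_type = 's' AND
    t1.bid_flag = 'd' AND
    DATE(t1.bidpack_buy_date) >= DATE(NOW())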

A little more about what happened

In your query, with the constraints declared in the WHERE and assuming the table has the following content, we would have the following:

----------------------------------------------------------------------------------------------
| id | bidding_price |        auction_id        | bidding_type | bid_flag | bidpack_buy_date |
----------------------------------------------------------------------------------------------
|  1 |           2.0 | " . $obj->auction_id . " | s            | d        |    05/05/2018    | // Would be removed from t1 because its id is not greater than any id in t2
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  4 |           1.2 | " . $obj->auction_id . " | s            | a        |    22/02/2019    | // Would be removed from t1 because of the flag 'a'
----------------------------------------------------------------------------------------------
|  5 |           1.2 | " . $obj->auction_id . " | t            | d        |    30/09/2018    | // Would be removed from t1 because of the type 't'
----------------------------------------------------------------------------------------------
|  6 |           1.2 | " . $obj->auction_id . " | s            | d        |    01/04/2017    | // Would be removed from t1 because the date is earlier than today
----------------------------------------------------------------------------------------------

That leaves 2 records from t1, while all the records of t2 still take part, since for each remaining id in t1 there is at least one record in t2 with a lower id and an equivalent bidding_price. Knowing that the processing would already be relatively slow because of the Cartesian product, if we ran a SELECT t1.* ... we would get the following final result (2 * 6 = 12):

----------------------------------------------------------------------------------------------
| id | bidding_price |        auction_id        | bidding_type | bid_flag | bidpack_buy_date |
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------
|  2 |           2.0 | " . $obj->auction_id . " | s            | d        |    13/08/2018    |
----------------------------------------------------------------------------------------------
|  3 |           2.0 | " . $obj->auction_id . " | s            | d        |    17/10/2019    | 
----------------------------------------------------------------------------------------------

I hope this helps.

11.04.2018 / 14:17
1

The best solution I can find is to put a limit on the delete: add, for example, LIMIT 10000 to the end of the query, so that it deletes the records in batches.

Run that query in a loop or a scheduled routine until no more rows are affected (see the sketch below), and I think the problem is solved.
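A minimal sketch of that idea, assuming MySQL: LIMIT is only accepted by the single-table DELETE form, so one option is to first collect the ids of the duplicates (dup_ids is a hypothetical helper table here, and the duplicity rule shown is just bidding_price per auction, as in the question) and then delete them in chunks:

-- Collect the ids of the duplicate rows once.
CREATE TEMPORARY TABLE dup_ids AS
SELECT t1.id
FROM bid_account t1
     INNER JOIN bid_account t2
        ON t2.auction_id = t1.auction_id
       AND t2.bidding_price = t1.bidding_price
       AND t2.id < t1.id
WHERE t1.auction_id = '" . $obj->auction_id . "'
  AND t1.bidding_type = 's'
  AND t1.bid_flag = 'd'
  AND DATE(t1.bidpack_buy_date) >= DATE(NOW());

-- Run this repeatedly until it reports 0 affected rows.
DELETE FROM bid_account
WHERE id IN (SELECT id FROM dup_ids)
LIMIT 10000;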

11.04.2018 / 11:09