Is running a "SELECT" before an "INSERT" a safe way to avoid duplicate records?

33

I have a table with a codigo column, which I cannot set as primary key. The system receives many notifications and, before writing, checks with a SELECT whether that code already exists in the table. If it does not exist it does an INSERT, and if it does it does an UPDATE.

It turns out that this system receives many notifications via POST where many of them happen at the same time for the same code.

Often the system, even running the SELECT first, ends up inserting duplicate codes into the table.

I do not know how MySQL handles this, so I do not know whether it is safe to do a check with SELECT before the INSERT. I assumed there was some sort of query queue that is processed one query at a time.

$rs = $db->query("SELECT COUNT(1) AS tem_codigo FROM tabela WHERE codigo = $codigo");

if ($rs['tem_codigo'] == 0){
    // the INSERT would go here
} else {
    // the UPDATE would go here
}

How can the code above let two records with the same code be inserted? Imagine two requests carrying a notification arriving at exactly the same time.

What is the best way to get around this without using a primary key?

    
asked by anonymous 16.01.2014 / 16:39

8 answers

23

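One solution would be to lock the table with the LOCK TABLES command around the check and the write. (In MySQL, LOCK TABLES implicitly commits any open transaction, so acquire the lock before starting the work.) Obviously one can argue that this is bad for performance. And that is right, because you are putting all requests in a single queue rather than one queue per code, which would be ideal.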

A more appropriate solution would be to use the INSERT ... ON DUPLICATE KEY UPDATE statement, which performs the UPDATE when the record already exists.

Example:

INSERT INTO t1 (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;

However, since you cannot add a PK (or a UNIQUE index) to the table, this solution becomes impractical.

Update

An alternative to the second solution that does not involve changing the table would be to create another helper table with only the code in question as PK. The steps of the code would be:

  • Ensure the code exists in this helper table using INSERT ... ON DUPLICATE KEY UPDATE
  • Start a transaction
  • Run a SELECT ... FOR UPDATE on the helper table record to lock the record in question
  • Insert or update in the final table as usual, because other requests for the same code will be blocked
  • Commit the transaction

Although this solution seems very complex, in the case of many concurrent operations it has the advantage of not blocking the entire table. As I mentioned, locking the table queues up all requests, while locking by code allows different codes to be inserted or updated simultaneously.
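
The steps above could look roughly like this in MySQL. This is only a sketch, assuming InnoDB: the helper table name codigo_lock, the INT type of codigo and the literal code 123 are hypothetical, and the choice between INSERT and UPDATE is left to the application because the question does not show the table's other columns.

```sql
-- One-time setup: hypothetical helper table whose only column is the code, as PK.
CREATE TABLE IF NOT EXISTS codigo_lock (
  codigo INT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

-- 1. Make sure the code exists in the helper table (no-op if it already does).
INSERT INTO codigo_lock (codigo) VALUES (123)
ON DUPLICATE KEY UPDATE codigo = codigo;

-- 2. Start a transaction and lock only the helper row for this code.
START TRANSACTION;
SELECT codigo FROM codigo_lock WHERE codigo = 123 FOR UPDATE;

-- 3. Other requests for code 123 now wait at step 2 until this transaction ends,
--    so the existence check below can be trusted.
SELECT COUNT(*) AS tem_codigo FROM tabela WHERE codigo = 123;
-- The application then runs either its INSERT or its UPDATE on tabela.

-- 4. Commit, which releases the row lock.
COMMIT;
```

Because the helper row always exists by the time it is locked, the lock is an ordinary row lock on one code, and requests for other codes are not held up.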

    Final Note

    "I can't change it" statements are a bit tricky. If someone is trying to stop you from making changes because of some bureaucracy, then as a good software engineer you should argue that improvements and fixes require changes.

        
    16.01.2014 / 16:56
    9

    Two operations landing at exactly the same instant is unlikely, but it can happen.

    Given that you cannot make changes to the database, I suggest you lock the table with MySQL's LOCK TABLES before performing any operation on it:

    LOCK TABLES `minhaTabela` WRITE;
    /*!40000 ALTER TABLE `minhaTabela` DISABLE KEYS */;
    INSERT INTO `minhaTabela` VALUES ...
    /*!40000 ALTER TABLE `minhaTabela` ENABLE KEYS */;
    UNLOCK TABLES;
    

    This is not very good for performance, since operations on the table are blocked until the previous request has completed. In short, you will be creating a single FIFO (first come, first served) queue of operations.

    The ideal scenario is to add extra indexes to the table for performance: since the query filters on the codigo field, that field should be indexed.

    On the other hand, from what you describe the codigo field cannot hold repeated values, so it should be marked as UNIQUE to avoid duplicates.

    I do not know whether you say you cannot change the table because you have no access to manage it. If that is the case, you can still run a query that changes that particular field by making it UNIQUE:

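    • Mark field as UNIQUE (IGNORE silently drops rows that duplicate an existing codigo; the clause was removed in MySQL 5.7.4, so on later versions remove the duplicates first and omit it)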

      ALTER IGNORE TABLE minhaTabela ADD UNIQUE (codigo);
      
    • Mark field as index

      ALTER TABLE minhaTabela ADD INDEX (codigo);
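
Once codigo carries a UNIQUE index, the SELECT-then-INSERT logic can collapse into one atomic statement. A minimal sketch, assuming a hypothetical data column named valor and made-up literal values:

```sql
-- Relies on the UNIQUE index on codigo.
-- `valor` is a hypothetical column; 123 and 'novo' are example values.
INSERT INTO minhaTabela (codigo, valor)
VALUES (123, 'novo')
ON DUPLICATE KEY UPDATE valor = 'novo';
```

MySQL then decides between insert and update itself, so two simultaneous requests for the same code cannot both insert.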
      

    If you have no access to manage the table and cannot run queries that change its structure, I advise you to talk to whoever can, otherwise the application will become slower as the data in the table grows.

        
    16.01.2014 / 17:07
    6

    To think a little bit out of the box:

    Is there any restriction that the code column has to be numeric and sequential? By the way, what is its type? If it is numeric, what is the maximum value it supports?

    Sometimes we forget to ask this question and there may be different solutions.

    Your problem is to avoid collisions between two records in this column. One way to do this would be to write values that are spread much more widely. For example, instead of taking the "next value", store a hash of several identifiers. That is, store the value of MD5(user + timestamp + anything). The probability of collision is much lower, so the stored values would end up being unique without you having changed anything in the database, only the logic of your application.
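
As a rough illustration of the hashing idea, MySQL can compute such a value itself. The inputs here (a user identifier, a microsecond timestamp and a random number) are hypothetical, and the sketch assumes the column can hold a 32-character hexadecimal string:

```sql
-- Hypothetical inputs joined with '|' and hashed; MD5() returns 32 hex characters.
SELECT MD5(CONCAT_WS('|', 'user42', NOW(6), RAND())) AS codigo;
```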

        
    16.01.2014 / 17:32
    3

    When there are multiple connections to the same database, with data being selected and inserted all at once, you really have to think about transactions and about when a given set of data will or will not be visible.

    As already mentioned, understanding and using the TRANSACTION ISOLATION LEVEL is very important. In MySQL, InnoDB has four transaction isolation levels: REPEATABLE READ, READ COMMITTED, READ UNCOMMITTED, and SERIALIZABLE.

    Transactions

    • Repeatable Read - At this isolation level, any data read within a transaction stays the same for the rest of that transaction. That is, no matter how many times you run a SELECT, it returns exactly the same thing, even if another connection runs an UPDATE on the same data.

    • Read Committed - At this level, each read (SELECT) sees everything that has already been committed, i.e. changes from transactions that have finished. Thus you never see data changed inside a transaction before that transaction has finished.

    • Read Uncommitted - This level is also known, at least here in the US, as 'Dirty Read', because it allows any SELECT to read data that has been written to the table but not yet committed. That is, running a SELECT twice can give different results, because the data is not yet 'concrete'.

    • Serializable - Almost the same as REPEATABLE READ; serializable is used when you want to be absolutely certain that, within a transaction, the data is always exactly the same. There are other, more technical differences as well.

    Source: MySQL 5.0 Reference Manual
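
For reference, the level can be changed per session before the check-then-write runs. This is a sketch under assumptions, not a recommendation: it assumes InnoDB, uses the question's tabela and codigo names with a made-up literal 123, and leaves the INSERT or UPDATE to the application.

```sql
-- Applies to subsequent transactions on this connection only.
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;

START TRANSACTION;
-- Under SERIALIZABLE, InnoDB runs this plain SELECT as a shared locking read.
SELECT COUNT(*) AS tem_codigo FROM tabela WHERE codigo = 123;
-- ... the application's INSERT or UPDATE goes here ...
COMMIT;
```

With this level, two simultaneous requests for the same code tend to end in one of them waiting or being rolled back with a deadlock error, so the application should be ready to retry.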

        
    06.02.2014 / 17:30
    2

    The right approach is to wrap the operations in a transaction with the appropriate isolation level.

    In your case, once the transaction has started you need to see whether another transaction has changed the data, so you can set the isolation level like this:

    SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
    

    To start a transaction implicitly on the next query, run this SQL:

    SET AUTOCOMMIT=0;
    SELECT ...
    
    INSERT ...
    

    or

    UPDATE ...
    

    and finally:

    COMMIT;
    SET AUTOCOMMIT=1;
    
        
    30.01.2014 / 01:35
    2

    Without changing the database, just changing the query, it would be more or less like this:

    set transaction isolation level read committed
    begin tran
    
    update tabela set ... where id = @id
    
    if (@@rowcount = 0) -- if nothing was updated
        insert into tabela ...
    
    commit tran
    

    @@rowcount is a global MS SQL variable that returns how many records were affected (I do not know whether MySQL has something similar, but the logic would be the same)
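
In MySQL the closest counterpart is the ROW_COUNT() function. The following is a hedged sketch of the same update-then-insert logic as a stored procedure; the procedure name, its parameters and the valor column are hypothetical. One caveat: by default ROW_COUNT() counts rows actually changed by an UPDATE, so an UPDATE that sets the value a row already has reports 0 and this sketch would then insert a duplicate.

```sql
DELIMITER //

-- Hypothetical procedure; `valor` is an assumed data column on tabela.
CREATE PROCEDURE upsert_codigo(IN p_codigo INT, IN p_valor VARCHAR(255))
BEGIN
  -- On any SQL error, undo the work and pass the error on to the caller.
  DECLARE EXIT HANDLER FOR SQLEXCEPTION
  BEGIN
    ROLLBACK;
    RESIGNAL;
  END;

  START TRANSACTION;

  UPDATE tabela SET valor = p_valor WHERE codigo = p_codigo;

  -- ROW_COUNT() reports how many rows the previous statement changed.
  IF ROW_COUNT() = 0 THEN
    INSERT INTO tabela (codigo, valor) VALUES (p_codigo, p_valor);
  END IF;

  COMMIT;
END//

DELIMITER ;
```

It would be called as CALL upsert_codigo(123, 'novo');.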

    The solution would be to keep everything within a transaction with a very restrictive isolation level, possibly locking the table, to avoid dirty reads from other selects.

    Then do the insert or update within that same transaction, which is the only way to ensure that no concurrent command reads or writes the same table in between.

        
    06.02.2014 / 18:05
    1

    I have had this problem.

    In cases like yours, where changing the database is not an option, I simulate Oracle sequences in MySQL. For this I create a tb_seq_sequence table that stores a table name and a numeric value, which is the current value of the sequence.

    Then I encapsulate access to this table in a SequenceUtil class, which is responsible for retrieving the current value and incrementing it.

    There is a possibility that two people access this sequence at the same time, but it is very unlikely. I have always used this kind of solution and so far it has served me well.

    If you want something foolproof, you can use @Zuul's solution and lock the sequence table. That will certainly not cause you any problems.
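
One way to make such a sequence table safe without locking the whole table is MySQL's LAST_INSERT_ID(expr) idiom, which increments and reads the value in a single statement. A sketch, assuming hypothetical column names nome and valor and a sequence row named 'tabela':

```sql
-- Hypothetical columns: nome (sequence name, PK) and valor (current value).
CREATE TABLE IF NOT EXISTS tb_seq_sequence (
  nome  VARCHAR(64) NOT NULL PRIMARY KEY,
  valor BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;

-- Create the sequence row once; ignored if it already exists.
INSERT IGNORE INTO tb_seq_sequence (nome, valor) VALUES ('tabela', 0);

-- Increment and remember the new value in one atomic statement.
UPDATE tb_seq_sequence
SET valor = LAST_INSERT_ID(valor + 1)
WHERE nome = 'tabela';

-- LAST_INSERT_ID() is kept per connection, so concurrent callers get distinct values.
SELECT LAST_INSERT_ID() AS proximo_valor;
```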

        
    16.01.2014 / 18:33
    0

    If you can run the SELECT within a transaction, and that SELECT locks what it reads (for example with FOR UPDATE), it is safe to do so.

    If you can create a procedure, you only need to pass the values to it and the procedure takes care of the rest, without the database returning values that you then have to check in your PHP code.

    Simple procedure example with MySQL:

    Creating a test table and some test data:

    CREATE TABLE `cliente` (
      `id` int(11) NOT NULL,
      `nome` varchar(255) NOT NULL
    );
    
    INSERT INTO `cliente` (`id`, `nome`) VALUES
    (1, 'Alexandre'),
    (2, 'Maria'),
    (3, 'Pedro'),
    (4, 'Paulo'),
    (5, 'João'),
    (6, 'Zaqueu'),
    (7, 'Marcelo'),
    (8, 'Marcela');
    

    Procedure that checks with a SELECT whether the ID already exists; if it does not exist it does an INSERT, and if it exists it does an UPDATE:

    DELIMITER //
    
    CREATE PROCEDURE verify_id_and_insert( IN esteid int(11), IN estenome varchar(255) )
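    BEGIN
      -- Local variable: starts as NULL on every call, unlike a session @variable.
      DECLARE verificaid INT DEFAULT NULL;
      DECLARE EXIT HANDLER FOR SQLEXCEPTION ROLLBACK;
      START TRANSACTION;
    
      SELECT id INTO verificaid FROM cliente WHERE id=esteid;
    
      IF (verificaid IS NULL) THEN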
        INSERT INTO cliente(id,nome) VALUES(esteid, estenome);
        COMMIT;
      ELSE
        UPDATE cliente SET nome=estenome WHERE id=esteid;
        COMMIT;
      END IF;
    END//
    
    DELIMITER ;
    

    Procedure call examples:

    Example that does an INSERT because there is no id = 9:

    CALL verify_id_and_insert(9,'Penélope');
    

    Example that does an UPDATE because a client with id = 1, named Alexandre, already exists, so the name is changed:

    CALL verify_id_and_insert(1,'Alexandre Alterado'); 
    

    How the cliente table looks after the two procedure calls:

    1   Alexandre Alterado
    2   Maria
    3   Pedro
    4   Paulo
    5   João
    6   Zaqueu
    7   Marcelo
    8   Marcela
    9   Penélope
    

    If you cannot create a procedure, start the transaction by running START TRANSACTION;, then do what you have to do and finish with COMMIT;. If something goes wrong and you want to undo what you did, run ROLLBACK; instead.
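
Without a procedure, the same flow could be sent from the application statement by statement. A minimal sketch, assuming InnoDB at its default REPEATABLE READ level and reusing the cliente test table; the FOR UPDATE clause is an addition to the procedure's plain SELECT:

```sql
START TRANSACTION;

-- Locking read: a concurrent transaction doing the same check for id 9
-- either waits or is rolled back with a deadlock error and should retry.
SELECT id FROM cliente WHERE id = 9 FOR UPDATE;

-- If the SELECT returned no row, the application runs:
INSERT INTO cliente (id, nome) VALUES (9, 'Penélope');
-- otherwise it runs this instead:
-- UPDATE cliente SET nome = 'Penélope' WHERE id = 9;

COMMIT;
-- On error, send ROLLBACK; instead of COMMIT;
```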

        
    09.01.2017 / 17:19