How to manipulate large volumes of data in a database?

7

I have a process to implement in my system that, at a certain point, will check a relatively large amount of data and, for each item it does not find, save it to the MySQL database.

I can't find a way to do this without overloading the server. I'm developing in PHP, and so far the only approach I have thought of is to loop through each "row" and check whether it exists. If it does not exist, I insert it, get the ID, and write that to another table. If it already exists, I only retrieve the ID and save it in the other table.

This way I would have to do 1 query + 1 insert + 1 read (to retrieve the newly saved ID) + 1 insert for each record. Considering that each operation will commonly do this about 3000 times, it becomes unfeasible. It will also be common to have more than one user running this same process at the same time.

What would be the most correct way to proceed in this case?

[Additional Information]

It is a product movement system. Each product has a "serial". So I need to check each serial in table "A" and, if it does not exist, register it, get the ID, and record it in table "B". If the serial is already registered in table "A", I just record it in table "B".

    
asked by anonymous 30.05.2015 / 22:52

3 answers

4

I think you should not be writing the SERIAL in the input and output tables. To do the insert in this case, you would need a stored procedure, because you are passing information to two tables in the database.

So the solution I propose:

Tables:

CREATE TABLE `testes`.`produto` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `descricao` VARCHAR(45) NULL,
  `serial` BIGINT NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE INDEX `serial_UNIQUE` (`serial` ASC));

CREATE TABLE `testes`.`entrada` (
  `identrada` INT NOT NULL AUTO_INCREMENT,
  `id_produto` INT NULL,
  `data` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`identrada`),
  INDEX `fk_produto_idx` (`id_produto` ASC),
  CONSTRAINT `fk_produto`
    FOREIGN KEY (`id_produto`)
    REFERENCES `testes`.`produto` (`id`)
    ON DELETE SET NULL
    ON UPDATE NO ACTION);

Stored Procedure:

DELIMITER $$

CREATE DEFINER=`root`@`localhost` PROCEDURE `grava_item_entrada`(IN serial_produto BIGINT)
BEGIN
    DECLARE id_prod INT DEFAULT NULL;

    -- Look up the product by its serial
    SET id_prod := (SELECT p.id FROM produto p WHERE p.serial = serial_produto);

    -- Not registered yet: create it and keep the generated id
    IF (id_prod IS NULL) THEN
        INSERT INTO produto (serial) VALUES (serial_produto);
        SET id_prod := LAST_INSERT_ID();
    END IF;

    -- Record the entry for that product
    INSERT INTO entrada (id_produto) VALUES (id_prod);
END$$

DELIMITER ;

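Now just call the procedure from PHP and it should work, for example:

// $serial is the PHP variable holding the serial; the int cast keeps the query safe
$mysqli->query("CALL grava_item_entrada(" . (int) $serial . ")");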
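
As a variation on the procedure above (not part of the original answer), the lookup and the conditional insert can be collapsed into one statement with MySQL's `ON DUPLICATE KEY UPDATE` and `LAST_INSERT_ID(expr)`. This is only a sketch: it assumes the `produto` and `entrada` tables defined above, and the serial value is purely illustrative.

```sql
-- Insert the serial or, if it already exists, expose the existing id
-- through LAST_INSERT_ID() so both cases are handled by one statement.
INSERT INTO produto (serial) VALUES (1234567890)
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- LAST_INSERT_ID() now returns the product id in both cases.
INSERT INTO entrada (id_produto) VALUES (LAST_INSERT_ID());
```

Because the existence check and the insert happen in a single statement that relies on the UNIQUE index, two sessions registering the same serial at the same time should not end up with a duplicate-key error.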
    
03.09.2015 / 14:07
3

Your problem seems to be non-scalable; that is, if all the records in the table have different keys, there is no way to optimize this search yourself, since searching, inserting, and so on are already the job of the database and its algorithms.

Unless you partition your table. This can be a good way out when you cannot create more specific, optimizable indexes, which seems to be your case. With partitioning, each partition is smaller than the whole table and has the same structure, which can make it easier to find the data you are looking for in a large volume of data.
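
A minimal sketch of what such a partitioned table could look like, assuming the serial is numeric. The table name `produto_part` and the partition count are hypothetical choices for illustration only.

```sql
-- Hypothetical partitioned variant of the product table.
-- MySQL requires every unique key (including the primary key) to contain
-- the partitioning column, hence the composite primary key.
CREATE TABLE produto_part (
  id INT NOT NULL AUTO_INCREMENT,
  serial BIGINT NOT NULL,
  PRIMARY KEY (id, serial),
  UNIQUE KEY serial_unique (serial)
) ENGINE=InnoDB
PARTITION BY KEY (serial)
PARTITIONS 8;

-- A lookup by serial only needs to touch the partition that holds that key.
SELECT id FROM produto_part WHERE serial = 1234567890;
```

One trade-off to keep in mind: partitioned InnoDB tables do not support foreign keys, so a movement table could not declare a FOREIGN KEY referencing this one.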

    
17.08.2015 / 19:38
2

Here is a more elegant, simple, and easy solution. First create the tables:

CREATE TABLE `tab_produto` (
  `id_produto` int(11) NOT NULL AUTO_INCREMENT,
  `serial` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id_produto`),
  UNIQUE KEY `serial_UNIQUE` (`serial`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

CREATE TABLE `tab_entrada` (
  `id_entrada` int(11) NOT NULL AUTO_INCREMENT,
  `id_produto` int(11) DEFAULT NULL,
  `dt_emissao` date DEFAULT NULL,
  PRIMARY KEY (`id_entrada`),
  KEY `fk_tab_produto_idx` (`id_produto`),
  CONSTRAINT `fk_produto` FOREIGN KEY (`id_produto`) REFERENCES `tab_produto` (`id_produto`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;

The important thing here is the UNIQUE key on the serial field, which ensures that you will not have duplicate products. I don't think the relationship needs explaining, since I assume you already understand why it is necessary.

Now for the inserts; see how simple it is:

insert ignore into tab_produto (serial) values ('123');

insert into tab_entrada (id_produto, dt_emissao) values ( (select id_produto from tab_produto where serial = '123'), curdate() );

Assuming that you want to insert serial 123, you first do the INSERT IGNORE. If the serial already exists, the statement is simply ignored: it does not raise an error and does not create a duplicate, because the field is UNIQUE.

Then the insert into tab_entrada (the same would apply to an output table) refers to the same serial that was passed in the first insert. The trick here is that you do the INSERT and the SELECT in a single statement.

I'll leave a SQLFiddle. Notice that I even "force" a duplicate insert into tab_produto so that the test is close to what can happen in your environment, and then I inserted 2 records into tab_entrada just for demonstration purposes.
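
The same two statements can be applied to a whole batch instead of one serial at a time, which addresses the roughly 3000 round trips mentioned in the question. This is a sketch under the assumption that the application builds the value lists from its input; the three serials are illustrative.

```sql
-- 1) Register every serial of the batch that is not yet known.
--    Serials that already exist are skipped thanks to the UNIQUE key.
INSERT IGNORE INTO tab_produto (serial)
VALUES ('123'), ('124'), ('125');

-- 2) Record one movement per serial of the batch, resolving the ids
--    inside the database instead of fetching them into PHP.
INSERT INTO tab_entrada (id_produto, dt_emissao)
SELECT id_produto, CURDATE()
FROM tab_produto
WHERE serial IN ('123', '124', '125');
```

With this shape the whole batch costs two statements regardless of its size. Note that INSERT IGNORE also downgrades other errors to warnings, so it may be worth checking the warnings after the first statement.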

You could also think of a solution where the client does only 1 insert into tab_entrada and a BEFORE INSERT trigger inserts into tab_produto.

This saves round trips between the client and MySQL, but I believe the cost for the database would be the same.
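
A rough sketch of that trigger idea follows. It rests on an assumption that is not in the schema above: tab_entrada gets an extra, hypothetical `serial` column so that the trigger can see the value sent by the client. The trigger name is also made up for illustration.

```sql
-- Assumption: a hypothetical column carrying the serial into the trigger.
ALTER TABLE tab_entrada ADD COLUMN serial VARCHAR(45) NULL;

DELIMITER $$

CREATE TRIGGER trg_tab_entrada_before_insert
BEFORE INSERT ON tab_entrada
FOR EACH ROW
BEGIN
  -- Register the product if its serial is not known yet.
  INSERT IGNORE INTO tab_produto (serial) VALUES (NEW.serial);
  -- Fill in the foreign key from the (new or existing) product row.
  SET NEW.id_produto = (SELECT id_produto FROM tab_produto WHERE serial = NEW.serial);
END$$

DELIMITER ;

-- The client then issues a single statement per movement.
INSERT INTO tab_entrada (serial, dt_emissao) VALUES ('123', CURDATE());
```

The database still performs the same lookups and writes per row, so this mainly reduces client-to-server traffic.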

    
04.09.2015 / 19:18