What is the best way to insert a large amount of records into the database?

6

Hello, I'm creating an application where I need to constantly read a worksheet, pick up its data and insert it into the database (MySQL).

  

The point is that these worksheets will always have at least 55,000 (fifty-five thousand) records.

What I have to do for each record:

  • Run a query to check whether it already exists in the database
  • If it exists, do an UPDATE
  • If it does not exist, do an INSERT

For now I'm only checking whether or not each record exists in the database, and it's already taking forever. Here is the code:

    set_time_limit(0);
    
    include_once '../../db/conexao.php';
    include_once '../../ClassesPhpExcel/PHPExcel/IOFactory.php';
    $objReader = new PHPExcel_Reader_Excel5();
    $caminho = array('C:','Users','brayan','Documents','LN','estrutura_ecn.xls');
    $objPHPExcel = $objReader->load(join(DIRECTORY_SEPARATOR, $caminho));
    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
    
    unset($sheetData['1']);
    
    $count = 0;
    
    foreach ($sheetData as $value){
    
        try {
            $criteria = array(  'select' => 'COUNT(codigo) as codigo',
                'condition'=>'cod_produto ='.$value['A']);
    
            $existe = Connection::findAllByAttributes('produto', $criteria, false);
    
            if($existe[0]->codigo == 0){
    
                // insert the record
    
            }else{
    
                // do the update
    
            }
    
            $count++;
        }catch (PDOException $e){
            echo $e->getMessage()."<br/>";
        }
    }
    
      

    I would like to know if there is a more efficient way to do these inserts and updates, so that it does not take so long...

    Thanks in advance...

        
    asked by anonymous 30.07.2015 / 21:54

    3 answers

    1

    The bulk insert mentioned by cantoni (and explored 2 or 3 levels deep in the links he posted) is a much faster way to do this. With a little creativity you can, just as an example of the idea, build two arrays: one with the inserts and another with the updates. The inserts go directly into the table, while the updates go into a temporary table, followed by a single UPDATE with an INNER JOIN.
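A sketch of the bulk half of that idea, assuming hypothetical table and column names: this illustrative helper turns a batch of rows into a single multi-row INSERT with placeholders, so one prepare/execute round trip replaces thousands of individual inserts.

```php
<?php
// Build one multi-row INSERT (bulk) instead of one INSERT per record.
// Returns the SQL plus a flat parameter list for a PDO execute() call.
function buildBulkInsert(string $table, array $cols, array $rows): array {
    // One "(?, ?, ...)" tuple per row.
    $tuple = '(' . implode(', ', array_fill(0, count($cols), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $cols),
        implode(', ', array_fill(0, count($rows), $tuple))
    );
    // Flatten the rows into the parameter array, in column order.
    $params = array_merge(...array_map('array_values', $rows));
    return [$sql, $params];
}
```

With PDO it would be used as `[$sql, $params] = buildBulkInsert('produto', ['cod_produto', 'campo_x'], $lote);` followed by `$connection->prepare($sql)->execute($params);` — chunking the 55,000 rows into batches of, say, 500 keeps each statement a reasonable size.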

    If you want, before trying the bulk approach, you can check whether this alone solves your problem: use prepared statements, which are designed for repeated execution.

    Example, assuming $connection is an object of class PDO :

    $stmtSel = $connection->prepare("SELECT cod FROM tab WHERE cod = :cod");
    $stmtUpd = $connection->prepare("UPDATE tab SET c1 = :c1, c2 = :c2 WHERE cod = :cod");
    $stmtIns = $connection->prepare("INSERT INTO tab (cod, c1, c2) VALUES (:cod, :c1, :c2)");
    
    foreach ($dados as $linha) {
      $filtro = array('cod' => $linha[0]);
      $stmtSel->execute($filtro);
      $existe = ($stmtSel->fetch(PDO::FETCH_ASSOC) !== FALSE);
      $stmtSel->closeCursor();
      if ($existe) {
        $valores = array(
          'c1' => $linha[1],
          'c2' => $linha[2],
          'cod' => $linha[0] 
        );
        $stmtUpd->execute($valores);
      } else {
        $valores = array(
          'cod' => $linha[0],
          'c1' => $linha[1],
          'c2' => $linha[2]
        );
        $stmtIns->execute($valores);
      }
    }
    

    This assumes a table with cod , c1 and c2 columns. With named placeholders the parameter order does not matter, but I built a separate $valores array for each branch of the if anyway. The closeCursor call is there in case the driver requires the result set to be released before another statement runs.
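One more option worth noting (standard MySQL, not shown above): if `cod` has a UNIQUE or PRIMARY KEY index, the SELECT plus INSERT/UPDATE pair collapses into a single upsert statement. A minimal sketch, reusing the example table's names:

```php
<?php
// INSERT ... ON DUPLICATE KEY UPDATE: one statement per row replaces
// the SELECT + INSERT/UPDATE pair entirely.
// Requires a UNIQUE or PRIMARY KEY index on `cod`.
$sql = "INSERT INTO tab (cod, c1, c2) VALUES (:cod, :c1, :c2)
        ON DUPLICATE KEY UPDATE c1 = VALUES(c1), c2 = VALUES(c2)";

// Map one spreadsheet row (numeric array) to the named parameters.
function rowToParams(array $linha): array {
    return ['cod' => $linha[0], 'c1' => $linha[1], 'c2' => $linha[2]];
}

// With a PDO connection it would run as:
// $stmt = $connection->prepare($sql);
// foreach ($dados as $linha) { $stmt->execute(rowToParams($linha)); }
```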

    Executing prepared statements by passing an array to execute is my favorite style; some people prefer to use bindParam .

    If your database makes heavy use of indexes, you may not be able to improve much. In MySQL, InnoDB tables, which accept foreign keys, are usually slower; if that is not a problem for you, switching the tables to MyISAM can make them faster, at the cost of losing foreign keys.

    I worked with a system whose keys were designed in a way that only performed well on Oracle; SQL Server was so-so, and anything else was very slow. There was no way to improve it.

    References in PHP: link link link link link

        
    31.07.2015 / 07:07
    0

    As in a typical load/import system, we usually import everything first and only then compare the data. One suggestion: create a temporary table in the database just to import all the data at once, without comparing anything, and then do the comparison. You can create a procedure in the database to run a query similar to this:

    INSERT IGNORE INTO produto (cod_produto, campo_x, campo_y, campo_z)
    SELECT TEMP.cod_produto, TEMP.campo_x, TEMP.campo_y, TEMP.campo_z
    FROM   produto_importado AS TEMP
    LEFT OUTER JOIN produto AS TABELA_ATUAL
        ON (TABELA_ATUAL.cod_produto = TEMP.cod_produto)
    WHERE  TABELA_ATUAL.cod_produto IS NULL;
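
The query above only covers the rows that are missing; the rows that already exist still need their UPDATE. A sketch of the companion statement, assuming the same hypothetical column names:

```sql
-- Update existing rows in place from the staging table.
UPDATE produto AS TABELA_ATUAL
INNER JOIN produto_importado AS TEMP
    ON TABELA_ATUAL.cod_produto = TEMP.cod_produto
SET TABELA_ATUAL.campo_x = TEMP.campo_x,
    TABELA_ATUAL.campo_y = TEMP.campo_y,
    TABELA_ATUAL.campo_z = TEMP.campo_z;
```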
    

    If you prefer to use a Prepared Statement , you can also do that, but then you run into PHP's execution time limit, which you can set to 0 (unlimited):

    ini_set('max_execution_time', 0);
    
    $mysqli = new mysqli("localhost", "root", "senha", "seu_banco");
    if ($mysqli->connect_errno) {
        echo "MySQL connection error: (" . $mysqli->connect_errno . ") " . $mysqli->connect_error;
    }
    
    // LOAD DATA INFILE cannot be run as a server-side prepared statement,
    // so the (escaped) file path is embedded directly in the SQL.
    $file_path = $mysqli->real_escape_string("c:/seu_arquivo.xls");
    
    $SQL = "LOAD DATA INFILE '$file_path'
            INTO TABLE produtos_importado
            FIELDS TERMINATED BY '\t'
            LINES TERMINATED BY '\n'
            (cod_produto, campo_x, campo_y, campo_z)";
    
    if (!$mysqli->query($SQL)) {
        echo "Execution failed: (" . $mysqli->errno . ") " . $mysqli->error;
    }
    

    You can also do something like this with MySQL:

    Select the data to check (the output is tab-separated text, despite the .xls name):

    SELECT * INTO OUTFILE "C:\planilha.xls" 
         FIELDS TERMINATED BY '\t' 
         LINES TERMINATED BY '\n' 
    FROM seu_banco.produto_export;

    And you can insert in the temporary table:

    LOAD DATA LOCAL INFILE "C:\planilha.xls"
    INTO TABLE produto_export
    FIELDS TERMINATED BY '\t' 
    LINES TERMINATED BY '\n' 
    IGNORE 1 LINES (cod_produto, campo_x, campo_y, campo_z)
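
Before the LOAD DATA step, the staging table has to exist; a minimal sketch, cloning the structure of the real table (names follow the answer's examples):

```sql
-- Clone the structure of the real table for staging,
-- then empty it so each import starts clean.
CREATE TABLE IF NOT EXISTS produto_export LIKE produto;
TRUNCATE TABLE produto_export;
```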
    

    I believe Excel itself also lets you export/import; take a look here (I'm not sure): link link

        
    31.07.2015 / 13:54
    0

    On the optimization question (the best way to insert a large amount of data into MySQL ), I believe you could change the engine of your database, as long as you do not need many controls; the values you import will probably only be handled by code and will not be modified constantly.

    I have systems running with 40 GB (others with 96 GB) of data in a MySQL database. To get 'maximum' optimization for large volumes, I changed the engine to MyISAM : because it has fewer controls (no rollback, among other things), inserting data is faster, and so are the queries. In effect you trade a database full of controls (foreign keys and so on, as in InnoDB ) for something that works almost purely with files.

    Do the test; for me it was the only solution that gave high performance under heavy load.
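
A sketch of the engine change described above (the table name is illustrative; MyISAM drops InnoDB-only features such as transactions and foreign keys, so back up first):

```sql
-- Inspect the current engine of the table:
SHOW TABLE STATUS LIKE 'produto';
-- Convert; this rebuilds the table and removes InnoDB-only features.
ALTER TABLE produto ENGINE = MyISAM;
```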

        
    14.08.2015 / 22:50