Remove duplicate data and return the highest value date field

1

We're migrating signatures from an Access database to Mysql, but before migrating I need to treat some parts of the data. I need:

  • Remove duplicate id's
  • Return the highest value of the date field for these id's

In the librecalc itself it has remove the duplicate id's, however I need to compare the date field of both duplicate records, since I want to use the largest. Then

example:

id, data
2,  05/07/2016
2,  08/12/2018

Notice that they are duplicates, but I need to return the largest date between these 2 fields, which, in the example, would be 08/12/2018 . In the end, there would be only one record with id 2 and with the highest date filtered, 08/12/2018 .

Would you like to do this with SQL? If not,

Would someone like me explain the logic to do this in PHP? I tried to make a script in PHP that does this, but it has not worked for all cases.

My approach in PHP:

<?php

$file = file("access_old2.csv");
$new = fopen("access_formatado.csv", "w");

for ($i=0; $i < count($file); $i++) { 
print "---------------------";

$explode = explode("|", $file[$i]);
$explode[10] = strtotime(str_replace("/", "-", $explode[10])); // $explode[10] é o campo data
$explode_next = explode("|", $file[$i+1]);
$explode_next[10] = strtotime(str_replace("/", "-", $explode_next[10])); // $explode_next[10] é o campo data do próximo laço

if ($explode[0] == $explode_next[0]) { // $explode[0] é a id | não grava os dados repetidos e recupera a maior data de assinatura

    $date_atual = $explode[10];
    $date_next = $explode_next[10];

    if ($date_atual !== $date_next) {
        echo "diferente";

        print("<pre><br>"); //hack

        var_dump($date_next);
        var_dump($date_atual);
        var_dump(date("Y-m-d", max($date_atual, $date_next)));

        print("</pre><br>"); //hack

        $explode[10] = max($date_atual, $date_next);
        $explode_next[10] = max($date_atual, $date_next);
    }
    continue;
}

$explode[10] = date("Y-m-d", $explode[10]);
$explode_next[10] = date("Y-m-d", $explode_next[10]);

if ($explode[8] == "SP") { // coloca um zero a esquerda dos ceps de SP
    $explode[6] = str_pad($explode[6], 8, "0", STR_PAD_LEFT);
}

$explode[11] = "ASSCH";
$explode[6] = substr($explode[6], 0, 5)."-".substr($explode[6], 5, 8);

if ($explode[9] == "") {
    $explode[9] = "BR";
}

print("<pre><br>"); //hack
var_dump($explode);
print("</pre><br>"); //hack

fputcsv($new, $explode, "|", '"');


}


fclose($new);

In this script, if the id of the current loop $explode[0] is equal to the id of the next loop, it should return max($data_atual, $data_next) . It works for some, but for others it does not. It has to do with the order of the records, but I am already in this code a day ago and did not make progress. (I believe it is an error in the script logic).

TL; DR

  • SQL to remove duplicates and leave the largest date among duplicates
  • My script does not work for some cases.
asked by anonymous 05.07.2016 / 21:03

2 answers

2

If the column is set to date type, you can do this directly in MySQL (it should work in Access too):

SELECT id, MAX(some_date) 
FROM old_table
GROUP BY id;

This query will group the largest date by id. Then you could move to a new table, already defined with primary key :

INSERT INTO new_table
SELECT id, MAX(some_date) 
FROM old_table
GROUP BY id;

See working in SQL Fiddle .

    
06.07.2016 / 04:10
0

I think this might help:

$formato = 'd/m/Y'; // define o formato para dd/mm/yyyy
$data1 = DateTime::createFromFormat($formato, '05/07/2016'); // define data 1
$data2 = DateTime::createFromFormat($formato, '08/12/2018'); // define data 2

if($data1 > $data2){  // checa se data1 é maior que data2
    echo '$data1 é maior: ';
    echo date_format($data1, $formato) ;
}else{
    echo '$data2 é maior: ';
    echo date_format($data2, $formato) ;
}
    
06.07.2016 / 01:17