We're migrating signatures from an Access database to Mysql, but before migrating I need to treat some parts of the data. I need:
- Remove duplicate id's
- Return the highest value of the date field for these id's
In the librecalc itself it has remove the duplicate id's, however I need to compare the date field of both duplicate records, since I want to use the largest. Then
example:
id, data
2, 05/07/2016
2, 08/12/2018
Notice that they are duplicates, but I need to return the largest date between these 2 fields, which, in the example, would be 08/12/2018
. In the end, there would be only one record with id 2 and with the highest date filtered, 08/12/2018
.
Would you like to do this with SQL? If not,
Would someone like me explain the logic to do this in PHP? I tried to make a script in PHP that does this, but it has not worked for all cases.
My approach in PHP:
<?php
$file = file("access_old2.csv");
$new = fopen("access_formatado.csv", "w");
for ($i=0; $i < count($file); $i++) {
print "---------------------";
$explode = explode("|", $file[$i]);
$explode[10] = strtotime(str_replace("/", "-", $explode[10])); // $explode[10] é o campo data
$explode_next = explode("|", $file[$i+1]);
$explode_next[10] = strtotime(str_replace("/", "-", $explode_next[10])); // $explode_next[10] é o campo data do próximo laço
if ($explode[0] == $explode_next[0]) { // $explode[0] é a id | não grava os dados repetidos e recupera a maior data de assinatura
$date_atual = $explode[10];
$date_next = $explode_next[10];
if ($date_atual !== $date_next) {
echo "diferente";
print("<pre><br>"); //hack
var_dump($date_next);
var_dump($date_atual);
var_dump(date("Y-m-d", max($date_atual, $date_next)));
print("</pre><br>"); //hack
$explode[10] = max($date_atual, $date_next);
$explode_next[10] = max($date_atual, $date_next);
}
continue;
}
$explode[10] = date("Y-m-d", $explode[10]);
$explode_next[10] = date("Y-m-d", $explode_next[10]);
if ($explode[8] == "SP") { // coloca um zero a esquerda dos ceps de SP
$explode[6] = str_pad($explode[6], 8, "0", STR_PAD_LEFT);
}
$explode[11] = "ASSCH";
$explode[6] = substr($explode[6], 0, 5)."-".substr($explode[6], 5, 8);
if ($explode[9] == "") {
$explode[9] = "BR";
}
print("<pre><br>"); //hack
var_dump($explode);
print("</pre><br>"); //hack
fputcsv($new, $explode, "|", '"');
}
fclose($new);
In this script, if the id of the current loop $explode[0]
is equal to the id of the next loop, it should return max($data_atual, $data_next)
. It works for some, but for others it does not. It has to do with the order of the records, but I am already in this code a day ago and did not make progress. (I believe it is an error in the script logic).
TL; DR
- SQL to remove duplicates and leave the largest date among duplicates
- My script does not work for some cases.