How to delete duplicate lines sequentially

2

I need help deleting the row that is sequentially duplicated, but not the duplicate amount of the same record in the table, for example rows 6 and 7, are with status repeats sequentially, then you would have to delete row 7, keeping intact lines 6 and 9. The following is an example.

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563')

select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos order by DataHora asc

Result

linha|cliente|status|datahora
1    |565510 |1     |2017-11-03 18:10:54.923
2    |565510 |0     |2017-11-03 18:10:56.500
3    |565510 |2     |2017-11-03 18:10:57.410
4    |565510 |3     |2017-11-03 18:26:50.513
5    |565510 |4     |2017-11-03 18:27:06.143
6    |565510 |6     |2017-11-03 19:55:26.983
7    |565510 |6     |2017-11-03 20:00:01.523
8    |565510 |18    |2017-11-03 20:10:05.563
9    |565510 |6     |2017-11-03 20:15:06.563
    
asked by anonymous 07.11.2017 / 17:02

2 answers

1

I came up with the following result, where I selected all the records that should be deleted.

with tabela as (
Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p)
, Lixos as (
select 
tx.* 
from tabela t
inner join tabela tx on tx.linha = t.linha+1 and t.status = tx.status  and t.Cliente = tx.Cliente
)

Select * from Lixos;

And finally we can delete the wrong records:

with tabela as (
Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p)
, Lixos as (
select 
tx.* 
from tabela t
inner join tabela tx on tx.linha = t.linha+1 and t.status = tx.status  and t.Cliente = tx.Cliente
)

delete p from Pedidos p
          inner join Lixos l
           on l.Cliente = P.Cliente 
           and l.Status = P.Status
           and l.DataHora = P.DataHora;


Select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha, 
p.Cliente,
p.Status,
p.DataHora
from Pedidos p ORDER BY DataHora ASC;
  

Results:

Current records:

Recordstobedeleted:

Resultafterdeletion:

  

IputitinSQLFiddle,wherealltherecordsareready,andthenwhichonesshouldbedeleted.Note:Iinsertedothertestlinesinthecaseof3equalstatuses: link

     

Finally, SQLFiddle with the delete command: link

Obs. It would be appropriate for the records to have a primary key.

    
07.11.2017 / 18:40
0

Well ... If I correctly understood the question, and without looking at performance :

Edit large: I noticed by the comments that

  

If there are three equal lines in sequence, only one line will remain.

I did not take this into account in the original code. Big Edits to run.

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563');

1) List all requests:

WITH todosOsPedidosOrdenados as 
(
    select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos
)

2) Count the number of duplicates:

, contagem as 
(
    select COUNT(*) as duplicates, cLIENTE, status  FROM todosOsPedidosOrdenados  GROUP BY Cliente, Status
)
, duplicados as 
(
    select * from contagem where duplicates > 1
)

3) Realize what is the request we will want to keep

, pedidoAManter as
(
    SELECT 
        todosOsPedidosOrdenados.Cliente, 
        todosOsPedidosOrdenados.Status, 
        min(todosOsPedidosOrdenados.DataHora) as dataHoraAManter
    FROM 
        duplicados LEFT JOIN todosOsPedidosOrdenados 
        ON todosOsPedidosOrdenados.Cliente = duplicados.Cliente 
            AND todosOsPedidosOrdenados.Status = duplicados.Status
    group by todosOsPedidosOrdenados.Cliente, todosOsPedidosOrdenados.status
)

4) Find out which order we will remove , and choose only the latest (guaranteed by top 1 and order by)

, pedidoARemover as
(
    SELECT  todosOsPedidosOrdenados.Cliente, 
        todosOsPedidosOrdenados.Status,
        todosOsPedidosOrdenados.DataHora
    FROM todosOsPedidosOrdenados LEFT JOIN pedidoAManter 
    ON  todosOsPedidosOrdenados.Cliente = pedidoAManter.Cliente 
        AND todosOsPedidosOrdenados.Status = pedidoAManter.Status 
    WHERE
        todosOsPedidosOrdenados.DataHora <> pedidoAManter.dataHoraAManter
)

5) Save the result

select * INTO #pedidoARemover_ FROM pedidoARemover;

6) Dealing with delete

delete a
from @Pedidos a
    left Join #pedidoARemover_ b
    on  a.Cliente = b.Cliente
        and a.Status = b.Status
        AND a.DataHora = b.DataHora
WHERE
    b.Cliente IS NOT NULL

7) View the result

SELECT * FROM @Pedidos
order by status asc

8) Throw away the temporary table

DROP TABLE #pedidoARemover_;

Final code (copy-paste for SQL Management Studio should work)

declare @Pedidos as table (Cliente int, Status int, DataHora datetime )

Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,0 ,'03-11-2017 18:10:56:500')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,2 ,'03-11-2017 18:10:57:410')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,1 ,'03-11-2017 18:10:54:923')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,3 ,'03-11-2017 18:26:50:513')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,4 ,'03-11-2017 18:27:06:143')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:00:01:523')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,18,'03-11-2017 20:10:05:563')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 19:55:26:983')
Insert into @Pedidos (Cliente,Status,DataHora)Values(565510,6 ,'03-11-2017 20:15:06:563');

SELECT ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos;

WITH todosOsPedidosOrdenados as 
    (
        select ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos
    )
, contagem as 
    (
        select COUNT(*) as duplicates, cLIENTE, status  FROM todosOsPedidosOrdenados  GROUP BY Cliente, Status
    )

, duplicados as 
    (
        select * from contagem where duplicates > 1
    )
, pedidoAManter as
    (
        SELECT 
            todosOsPedidosOrdenados.Cliente, 
            todosOsPedidosOrdenados.Status, 
            min(todosOsPedidosOrdenados.DataHora) as dataHoraAManter
        FROM 
            duplicados LEFT JOIN todosOsPedidosOrdenados 
            ON todosOsPedidosOrdenados.Cliente = duplicados.Cliente 
                AND todosOsPedidosOrdenados.Status = duplicados.Status
        group by todosOsPedidosOrdenados.Cliente, todosOsPedidosOrdenados.status
    )
, pedidoARemover as
    (
        SELECT  todosOsPedidosOrdenados.Cliente, 
            todosOsPedidosOrdenados.Status,
            todosOsPedidosOrdenados.DataHora
        FROM todosOsPedidosOrdenados LEFT JOIN pedidoAManter 
        ON  todosOsPedidosOrdenados.Cliente = pedidoAManter.Cliente 
            AND todosOsPedidosOrdenados.Status = pedidoAManter.Status 
        WHERE
            todosOsPedidosOrdenados.DataHora <> pedidoAManter.dataHoraAManter
    )

select * INTO #pedidoARemover_ FROM pedidoARemover;


delete a
from @Pedidos a
left Join #pedidoARemover_ b
on  a.Cliente = b.Cliente
and a.Status = b.Status
AND a.DataHora = b.DataHora
WHERE
    b.Cliente IS NOT NULL


SELECT * FROM #pedidoARemover_;

SELECT ROW_NUMBER() OVER(ORDER BY DataHora ASC) as linha , * from @pedidos;
DROP TABLE #pedidoARemover_;

Result Screenshot:

    
07.11.2017 / 18:38