Consider the following situation:
I have a dataset with two variables. The first contains duplicate values (e.g. CPFxxx.xxx.xxx-xx appears 14 times, CPFxxx.xxx.xxx-xx appears 18 times, and so on). The second holds the event occurrence dates (e.g. 2017-01-18, 2017-01-19, ...) associated with each CPF.
I use the following function to remove duplicate cases:
new<-dataset[!duplicated(dataset[c("CPFs")]),]
This removes the duplicate rows, but it simply keeps the first occurrence of each CPF.
My goal: remove duplicates in CPFs, but in the other variable (date), make the most recent value (or the oldest one) remain tied to each CPF. That is, I need to establish an order at the time the function is executed.
So if a CPF has the dates 2018-01-20 and 2017-02-22 attached to it, and I want the oldest, the date bound to it would be 2017-02-22.
Sample data for testing an answer:
dataset=structure(list(CPFs = c(1234, 2345, 1234, 2345, 1234, 2345, 1234,
2345), date = c(1998, 1997, 1993, 1992, 1998, 1998, 1992, 1989
)), class = "data.frame", row.names = c(NA, -8L))
Desired result:
CPF date
1234 1992
2345 1989
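One possible approach, sketched here under the assumption that sorting the rows before calling duplicated() is acceptable: since duplicated() keeps the first occurrence of each value, ordering by CPF and then by date decides which date survives. The names ordered, oldest, ordered_desc and newest are illustrative only and not part of the original code.

```r
# Sample data copied from the question.
dataset <- structure(list(CPFs = c(1234, 2345, 1234, 2345, 1234, 2345, 1234, 2345),
                          date = c(1998, 1997, 1993, 1992, 1998, 1998, 1992, 1989)),
                     class = "data.frame", row.names = c(NA, -8L))

# Sort by CPF, then by date ascending, so the oldest date comes first within each CPF.
ordered <- dataset[order(dataset$CPFs, dataset$date), ]

# duplicated() flags every occurrence after the first, so the oldest row per CPF is kept.
oldest <- ordered[!duplicated(ordered$CPFs), ]
rownames(oldest) <- NULL

# To keep the most recent date instead, sort the dates in descending order.
# xtfrm() converts the column to a numeric sort key, so the minus sign
# should also work when the column is of class Date rather than numeric.
ordered_desc <- dataset[order(dataset$CPFs, -xtfrm(dataset$date)), ]
newest <- ordered_desc[!duplicated(ordered_desc$CPFs), ]
rownames(newest) <- NULL

print(oldest)
print(newest)
```

With the sample data, oldest should match the desired result shown above. If the real dates are stored as character strings such as "2017-01-18", converting them with as.Date() before sorting is probably the safer route.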