Deleting Duplicate Rows from a data.frame

4

I have a data.frame that looks like this:

     values        ind
1  10.82000 2011-01-03
2  11.75000 2011-01-03
3  10.82000 2011-01-03
4  11.75000 2011-01-03
5  10.82000 2011-01-03
6  11.75000 2011-01-03
7  10.84048 2011-01-04
8  11.79000 2011-01-04
9  10.87095 2011-01-05
10 11.84000 2011-01-05
11 10.88928 2011-01-06
12 11.88000 2011-01-06
13 10.92000 2011-01-07
14 12.03000 2011-01-07
15 10.93984 2011-01-10
...
121 11.67614 2011-03-03
122 12.47000 2011-03-03
123 11.67481 2011-03-04
124 12.44000 2011-03-04
125 11.68514 2011-03-09
126 12.44000 2011-03-09
127 11.68514 2011-03-09
128 12.44000 2011-03-09
129 11.68514 2011-03-09
130 12.44000 2011-03-09
131 11.68514 2011-03-09
132 12.44000 2011-03-09
133 11.68514 2011-03-09
134 12.44000 2011-03-09
135 11.67746 2011-03-10

The problem is as follows: I need to delete rows 1 through 4, keeping rows 5 and 6. Likewise, I need to delete rows 125 through 132, keeping rows 133 and 134, which are the last repetition of that pair.

Note that the deletion should respect the row order. Whenever a pair of rows repeats, I want to delete the earlier repetitions and keep only the last one.

Can anyone suggest a way to do this? I'm a beginner in R and have been trying to write something since yesterday, but I'm finding it very difficult.

    
asked by anonymous 12.10.2015 / 16:42

2 answers

2

Another way to do this is by using the duplicated function.

Using the data set recreated by Carlos:

df <- read.table(text = "values        ind
1  10.82000 2011-01-03
2  11.75000 2011-01-03
3  10.82000 2011-01-03
4  11.75000 2011-01-03
5  10.82000 2011-01-03
6  11.75000 2011-01-03
7  10.84048 2011-01-04
8  11.79000 2011-01-04
9  10.87095 2011-01-05
10 11.84000 2011-01-05
11 10.88928 2011-01-06
12 11.88000 2011-01-06
13 10.92000 2011-01-07
14 12.03000 2011-01-07
15 10.93984 2011-01-10
121 11.67614 2011-03-03
122 12.47000 2011-03-03
123 11.67481 2011-03-04
124 12.44000 2011-03-04
125 11.68514 2011-03-09
126 12.44000 2011-03-09
127 11.68514 2011-03-09
128 12.44000 2011-03-09
129 11.68514 2011-03-09
130 12.44000 2011-03-09
131 11.68514 2011-03-09
132 12.44000 2011-03-09
133 11.68514 2011-03-09
134 12.44000 2011-03-09
135 11.67746 2011-03-10")

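The duplicated function returns a logical vector that marks every row that is a repeat of another row.

duplicados <- duplicated(df, fromLast = TRUE)

The argument fromLast = TRUE makes the scan start from the last row, so the earlier occurrences are the ones flagged as duplicates and the last occurrence of each row is kept.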
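To see the effect of fromLast on something small, here is a minimal sketch on a toy vector. The vector x and the names from_first and from_last are made up for illustration and are not part of the question's data.

```r
# Toy vector, hypothetical data used only for illustration
x <- c("a", "b", "a", "b", "c")

# Default: scanning from the start, the later repeats are flagged
from_first <- duplicated(x)

# fromLast = TRUE: scanning from the end, the earlier repeats are flagged
from_last <- duplicated(x, fromLast = TRUE)

print(from_first)  # FALSE FALSE  TRUE  TRUE FALSE
print(from_last)   #  TRUE  TRUE FALSE FALSE FALSE
```

Dropping the flagged elements with x[!from_last] therefore keeps the last occurrence of each value.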

The command below shows which rows are flagged as duplicates:

which(duplicados)

To get the data frame without the duplicated rows, subset it with the negated vector:

df[!duplicados, ]
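As a quick check of the steps above, here is a small sketch that rebuilds only rows 1 to 8 of the question's data by hand. The names df_small, dup and kept are assumptions introduced for this example.

```r
# First eight rows of the question's data, retyped by hand for a small test
df_small <- data.frame(
  values = c(10.82, 11.75, 10.82, 11.75, 10.82, 11.75, 10.84048, 11.79),
  ind    = c(rep("2011-01-03", 6), rep("2011-01-04", 2))
)

# Flag every row that appears again further down
dup <- duplicated(df_small, fromLast = TRUE)

print(which(dup))      # 1 2 3 4

# Keep only the rows that were not flagged
kept <- df_small[!dup, ]
print(rownames(kept))  # "5" "6" "7" "8"
```

Keep in mind that duplicated compares each row against every other row of the data frame, not only against adjacent rows, so an identical row appearing much later would also cause the earlier one to be dropped.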
    
14.10.2015 / 22:53
6

You can use the unique function, which keeps only the distinct rows of your data.frame. For example, rebuilding your data:

df <- read.table(text = "values        ind
1  10.82000 2011-01-03
2  11.75000 2011-01-03
3  10.82000 2011-01-03
4  11.75000 2011-01-03
5  10.82000 2011-01-03
6  11.75000 2011-01-03
7  10.84048 2011-01-04
8  11.79000 2011-01-04
9  10.87095 2011-01-05
10 11.84000 2011-01-05
11 10.88928 2011-01-06
12 11.88000 2011-01-06
13 10.92000 2011-01-07
14 12.03000 2011-01-07
15 10.93984 2011-01-10
121 11.67614 2011-03-03
122 12.47000 2011-03-03
123 11.67481 2011-03-04
124 12.44000 2011-03-04
125 11.68514 2011-03-09
126 12.44000 2011-03-09
127 11.68514 2011-03-09
128 12.44000 2011-03-09
129 11.68514 2011-03-09
130 12.44000 2011-03-09
131 11.68514 2011-03-09
132 12.44000 2011-03-09
133 11.68514 2011-03-09
134 12.44000 2011-03-09
135 11.67746 2011-03-10")

And applying unique:

unique(df)
      values        ind
1   10.82000 2011-01-03
2   11.75000 2011-01-03
7   10.84048 2011-01-04
8   11.79000 2011-01-04
9   10.87095 2011-01-05
10  11.84000 2011-01-05
11  10.88928 2011-01-06
12  11.88000 2011-01-06
13  10.92000 2011-01-07
14  12.03000 2011-01-07
15  10.93984 2011-01-10
121 11.67614 2011-03-03
122 12.47000 2011-03-03
123 11.67481 2011-03-04
124 12.44000 2011-03-04
125 11.68514 2011-03-09
126 12.44000 2011-03-09
135 11.67746 2011-03-10

Note that in this case unique kept the first occurrence of each row. If you want to keep the last occurrence instead, as you described in your question, just pass fromLast = TRUE:

unique(df, fromLast = TRUE)
      values        ind
5   10.82000 2011-01-03
6   11.75000 2011-01-03
7   10.84048 2011-01-04
8   11.79000 2011-01-04
9   10.87095 2011-01-05
10  11.84000 2011-01-05
11  10.88928 2011-01-06
12  11.88000 2011-01-06
13  10.92000 2011-01-07
14  12.03000 2011-01-07
15  10.93984 2011-01-10
121 11.67614 2011-03-03
122 12.47000 2011-03-03
123 11.67481 2011-03-04
124 12.44000 2011-03-04
133 11.68514 2011-03-09
134 12.44000 2011-03-09
135 11.67746 2011-03-10
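For a data frame, unique with fromLast = TRUE should give the same result as subsetting with the negated output of duplicated. The sketch below checks this on a few rows retyped from the question; df_small, via_unique and via_duplicated are hypothetical names used only here.

```r
# A few rows from the question's data, retyped by hand
df_small <- data.frame(
  values = c(10.82, 11.75, 10.82, 11.75, 10.84048),
  ind    = c(rep("2011-01-03", 4), "2011-01-04")
)

# Approach 1: unique, keeping the last occurrence of each row
via_unique <- unique(df_small, fromLast = TRUE)

# Approach 2: duplicated, then dropping the flagged rows
via_duplicated <- df_small[!duplicated(df_small, fromLast = TRUE), ]

print(rownames(via_unique))                    # "3" "4" "5"
print(all.equal(via_unique, via_duplicated))
```

Either form works. The duplicated version is handy when you also want to inspect which rows were dropped.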
    
12.10.2015 / 22:20