You don't need anything complicated for this. Count how often each level occurs with table(), then remove the rows whose level occurs no more than the limit (here, 2 times or fewer). For example:
tb <- table(dataset$fatores)                          # occurrences of each level
rem <- !(dataset$fatores %in% names(tb[tb <= 2]))     # TRUE for the rows to keep
dataset[rem, ]
# fatores V2 V3
# 2 5 -0.01619026 0.36458196
# 4 11 0.82122120 -0.11234621
# 5 3 0.59390132 0.88110773
# 6 11 0.91897737 0.39810588
# 7 12 0.78213630 -0.61202639
# 8 8 0.07456498 0.34111969
# 9 8 -1.98935170 -1.12936310
# 11 3 -0.05612874 1.98039990
# 12 3 -0.15579551 -0.36722148
# 14 5 -0.47815006 0.56971963
# 18 12 0.38767161 0.68973936
# 19 5 -0.05380504 0.02800216
# 21 12 -0.41499456 0.18879230
# 22 3 -0.39428995 -1.80495863
# 23 8 -0.05931340 1.46555486
# 26 5 -0.16452360 0.47550953
# 28 5 0.69696338 0.61072635
# 29 11 0.55666320 -0.93409763
# 30 5 -0.68875569 -1.25363340
In this case, every row whose level was one of c(1, 2, 4, 6, 7, 9, 10) was removed, because each of those levels occurs at most twice.
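The same table-based filter can be wrapped in a small reusable function. This is only a sketch: the function name drop_rare_levels(), its min_count argument, and the toy data frame built with set.seed(1) are hypothetical stand-ins for the question's dataset. It keeps only the levels that occur more than min_count times (matching tb <= 2 above) and calls droplevels() so the removed levels no longer linger in the factor.

```r
# Hypothetical helper: keep rows whose level in `col` occurs more than `min_count` times
drop_rare_levels <- function(df, col, min_count = 2) {
  tb <- table(df[[col]])                  # occurrences of each level
  rare <- names(tb[tb <= min_count])      # levels at or below the limit
  keep <- !(df[[col]] %in% rare)          # TRUE for rows to keep
  out <- df[keep, , drop = FALSE]
  # Drop the now-empty levels so later table() calls don't list them with count 0
  if (is.factor(out[[col]])) out[[col]] <- droplevels(out[[col]])
  out
}

# Toy data standing in for the question's `dataset` (assumption)
set.seed(1)
dataset <- data.frame(
  fatores = factor(sample(1:12, 30, replace = TRUE)),
  V2 = rnorm(30),
  V3 = rnorm(30)
)

filtered <- drop_rare_levels(dataset, "fatores", min_count = 2)
table(filtered$fatores)   # every remaining level appears at least 3 times
```

Without droplevels(), the filtered factor would still carry the removed levels, which then show up with a count of 0 in table() and in plots.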
You can apply the same logic in other ways. For example, use sapply to build a vector with each row's count, and then filter on it:
rem <- sapply(seq_len(nrow(dataset)), function(i) {
  sum(dataset$fatores[i] == dataset$fatores)   # how many rows share row i's level
}) > 2
dataset[rem, ]
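The sapply() version compares every row against the whole column, so it does on the order of n² comparisons. A vectorized sketch that produces the same per-row counts uses base R's ave() with FUN = length, which computes the group size once per level and recycles it to every row of that level. The toy dataset here is again a hypothetical stand-in.

```r
# Toy data standing in for the question's `dataset` (assumption)
set.seed(1)
dataset <- data.frame(
  fatores = factor(sample(1:12, 30, replace = TRUE)),
  V2 = rnorm(30)
)

# Row-by-row count, as in the sapply() approach
count_sapply <- sapply(seq_len(nrow(dataset)), function(i) {
  sum(dataset$fatores[i] == dataset$fatores)
})

# Same counts, vectorized: length() is applied within each level of fatores
# and the result is written back to every row of that level
count_ave <- ave(seq_len(nrow(dataset)), dataset$fatores, FUN = length)

identical(as.integer(count_sapply), as.integer(count_ave))   # TRUE
dataset[count_ave > 2, ]
```

For a few dozen rows the difference is irrelevant; it only starts to matter on large data frames.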
Or with dplyr, counting row by row how many times that row's level occurs and using that count as the filter criterion:
library(dplyr)
# inside rowwise(), fatores is the current row's value; .$fatores is the whole column
dataset %>% rowwise() %>% filter(sum(fatores == .$fatores) > 2)
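rowwise() works, but it still compares each row with the whole column. A different dplyr technique, which I would consider more idiomatic (it is an alternative, not the approach above), is to group by the factor and filter on the group size with n(); ungroup() then returns an ordinary, ungrouped tibble. The toy data is a hypothetical stand-in.

```r
library(dplyr)

# Toy data standing in for the question's `dataset` (assumption)
set.seed(1)
dataset <- data.frame(
  fatores = factor(sample(1:12, 30, replace = TRUE)),
  V2 = rnorm(30)
)

filtered <- dataset %>%
  group_by(fatores) %>%
  filter(n() > 2) %>%   # n() is the number of rows in the current group
  ungroup()

count(filtered, fatores)   # remaining levels and how often each occurs
```

Filtering a grouped data frame keeps the original row order, so the result lines up with the base R versions.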
A tip: when you generate random variables that are not meant to represent numbers, use letters instead, which makes the results easier to interpret. In your case, that could be letters[1:12].
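A sketch of how such a reproducible example might be built. The number of rows (30), the seed, and the column names V2/V3 are assumptions chosen to resemble the output above; setting levels = letters[1:12] keeps all twelve levels in the factor even if some are never sampled.

```r
# Hypothetical reproducible example using letters as factor levels
set.seed(123)
n <- 30
dataset <- data.frame(
  fatores = factor(sample(letters[1:12], n, replace = TRUE), levels = letters[1:12]),
  V2 = rnorm(n),
  V3 = rnorm(n)
)

tb <- table(dataset$fatores)
tb                                                   # counts per letter, easy to read
dataset[!(dataset$fatores %in% names(tb[tb <= 2])), ]
```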