I have two data frames: one with the rows I want to sample from, and another with the required sample size for each date. The first, which is the actual data I need to sample, is called "bons"; an example is shown below:
CNPJ data
333333 201601
333333 201612
111111 201612
111111 201610
111111 201607
111111 201611
22222 201605
22222 201606
22222 201610
22222 201509
99999 201605
99999 201612
99999 201611
99999 201601
The second data frame, called "tamamostra", is shown below. It contains only the sample size I need for each date, and each sample must be drawn from CNPJs that do not repeat:
data 201509 201510 201512 201601 201602 201603 201604 201605 201606 201607 201610 201611 201612 Total
ruins 1 1 1 6 4 3 2 4 3 5 5 4 6 45
bons 3 3 3 14 10 7 5 10 7 12 12 10 14 105
Total 4 4 4 20 14 10 7 14 10 17 17 14 20 155
I need to draw a sample from "bons" for each date, with the size given in the "bons" row of "tamamostra", without repeating any CNPJ. That is, for 201509 I need a sample of size 3 with 3 different CNPJs, and these CNPJs cannot be reused for any other date; for 201601 I need a sample of size 14 with CNPJs that were not drawn for an earlier date; and so on, ending with a full sample of size 105 made up of unique CNPJs. Note that some CNPJs have no rows for certain dates.
I tried using a for loop with sample() to draw this sample, but since I did not specify that a CNPJ could not be repeated, some CNPJs appeared more than once:
for (i in 2:14) {
  bons1[i] <- subset(bons, data == tamamostra[1, i])[sample(nrow(subset(bons, data == tamamostra[1, i])), tamamostra[3, i]), ]
}
How can I do this in R? I suspect the dplyr package offers a way to do it.
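To make the requirement concrete, here is a minimal base R sketch of one possible approach: go through the dates in order, drop every CNPJ already drawn, and sample from what is left. The function name `sample_unique_cnpj` and the vector `tamanhos` are hypothetical names introduced for this illustration, and the sizes are made up to fit the small example above; with the real data they would come from the "bons" row of "tamamostra".

```r
# Example rows of "bons" from the question (the real data is larger).
bons <- data.frame(
  CNPJ = c(333333, 333333, 111111, 111111, 111111, 111111,
           22222, 22222, 22222, 22222, 99999, 99999, 99999, 99999),
  data = c(201601, 201612, 201612, 201610, 201607, 201611,
           201605, 201606, 201610, 201509, 201605, 201612, 201611, 201601)
)

# Hypothetical sample sizes per date, chosen only to fit the toy data.
tamanhos <- c("201509" = 1, "201601" = 1, "201605" = 1, "201612" = 1)

# Greedy sketch: for each date, sample among CNPJs not drawn for an earlier date.
sample_unique_cnpj <- function(df, sizes) {
  used <- c()                                   # CNPJs already drawn
  picked <- vector("list", length(sizes))
  for (j in seq_along(sizes)) {
    d <- as.numeric(names(sizes)[j])
    rows <- df[df$data == d & !(df$CNPJ %in% used), ]
    rows <- rows[!duplicated(rows$CNPJ), ]      # one row per CNPJ within the date
    k <- min(sizes[[j]], nrow(rows))            # cannot draw more than is available
    if (k > 0) {
      # Sample row indices rather than values, which avoids the
      # sample(x, n) surprise when x has length one.
      rows <- rows[sample.int(nrow(rows), k), ]
      used <- c(used, rows$CNPJ)
      picked[[j]] <- rows
    }
  }
  do.call(rbind, picked)                        # dates with no draw are skipped
}

set.seed(1)
amostra <- sample_unique_cnpj(bons, tamanhos)
```

Because the procedure is greedy, an early date can use up CNPJs that a later date needed, so a later date may end up with fewer rows than requested even when a valid assignment exists. Processing the dates with the fewest available CNPJs first might reduce this, but that is an untested guess.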