Let X_1, X_2, ..., X_n be a sequence of numbers and let X = X_1 + X_2 + ... + X_n, with X nonzero. If I divide each X_i by X, the sum X_1 / X + X_2 / X + ... + X_n / X always equals 1. This is a kind of normalization. If I multiply both sides of this equality by 200, I get the result I'm looking for: terms that sum to 200.
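As a quick numeric check of this normalization, here is a minimal sketch. The vector values and the names values and shares are made up for illustration and are not part of the original answer.

```r
# Hypothetical input values, chosen only for illustration
values <- c(3, 7, 10)

# Divide each element by the total, so the shares sum to 1
shares <- values / sum(values)

# Compare with all.equal to tolerate floating-point error
isTRUE(all.equal(sum(shares), 1))          # the shares sum to 1
isTRUE(all.equal(sum(shares * 200), 200))  # scaled by 200, they sum to 200
```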
So just apply this idea in R to get the desired result. I created a function called amostra which does this.
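amostra <- function(x=1:20, size=20, replace=TRUE, limit=200){
  # draw the raw sample, then rescale and round it so it sums to about limit
  estat <- sample(x, size, replace=replace)
  estat <- round(estat/sum(estat)*limit)
  if (sum(estat) == limit){
    return(estat)
  } else {
    # rounding shifted the total: recompute the last element so the sum is exactly limit
    return(c(estat[1:(size-1)], limit-sum(estat[1:(size-1)])))
  }
}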
x <- amostra(1:20, 20, limit=200)
x
[1] 4 12 12 13 12 13 2 12 11 2 14 7 12 17 12 3 11 5 11 15
sum(x)
[1] 200
This function has 4 arguments:
x: the possible values that the sample can take (default is the integers from 1 to 20)
size: the size of the sample to be created (default is 20)
replace: indicates whether the sampling is done with replacement (default is TRUE)
limit: the target total for the sum (default is 200)
Because of rounding, I used a small trick in the algorithm. The function draws size elements, rescales and rounds them, and tests whether their sum is equal to limit. If it is, it returns that sample.
If not, the last element is replaced by limit-sum(estat[1:(size-1)]), which is the difference between the target sum and the sum of the first size-1 elements of the sample.
Without this step, there would be no guarantee that the final sum of the elements equals limit.
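To see why the correction is needed, here is a minimal sketch of a case where rounding alone overshoots the target. The input c(1, 1, 1) and the names values, scaled and corrected are assumptions made for this example; the correction step copies the formula used in amostra.

```r
# Hypothetical input: three equal values, target sum of 200
values <- c(1, 1, 1)
limit <- 200

# Rescale and round, as amostra does
scaled <- round(values / sum(values) * limit)
scaled        # 67 67 67
sum(scaled)   # 201, not 200

# Replace the last element with whatever is left to reach limit
n <- length(scaled)
corrected <- c(scaled[1:(n - 1)], limit - sum(scaled[1:(n - 1)]))
corrected       # 67 67 66
sum(corrected)  # 200
```

Each value becomes 66.67 after scaling and rounds up to 67, so the rounded total is 201. Recomputing the last element absorbs that one-unit error.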
The table function lists the distinct values in ascending order, together with their respective frequencies:
table(x)
x
 2  3  4  5  7 11 12 13 14 15 17
 2  1  1  1  1  3  6  2  1  1  1
From this, finally, it is possible to calculate the desired statistics, creating a data frame with the answers:
as.data.frame(table(x))
    x Freq
1   2    2
2   3    1
3   4    1
4   5    1
5   7    1
6  11    3
7  12    6
8  13    2
9  14    1
10 15    1
11 17    1
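As one possible continuation, the sketch below computes a few statistics from this frequency table. The choice of statistics (relative frequency, mean, mode) and the names freq, RelFreq, mean_x and mode_x are assumptions, since the answer does not say which statistics are wanted. The vector x is copied from the output shown earlier so that the example is reproducible.

```r
# Sample copied from the output shown earlier
x <- c(4, 12, 12, 13, 12, 13, 2, 12, 11, 2, 14, 7, 12, 17, 12, 3, 11, 5, 11, 15)

# Frequency table as a data frame; turn the value labels back into numbers
freq <- as.data.frame(table(x), stringsAsFactors = FALSE)
freq$x <- as.numeric(freq$x)

# Relative frequency of each distinct value
freq$RelFreq <- freq$Freq / sum(freq$Freq)

# Mean computed from the frequencies, and the most frequent value
mean_x <- sum(freq$x * freq$Freq) / sum(freq$Freq)  # 10, i.e. 200 / 20
mode_x <- freq$x[which.max(freq$Freq)]              # 12, which occurs 6 times
```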