How do I put the values and their frequencies into a data.frame, from a set obtained with sample?

First, generate a sequence of random values:

set.seed(100)
estat <- sample(1:20, replace=TRUE)
estat
 [1]  7  6 12  2 10 10 17  8 11  4 13 18  6  8 16 14  5  8  8 14

The idea is twofold: (1) is it possible to force sample to return values whose sum is 200? (2) Arrange the values and their frequencies in table format.

The purpose is to set up a statistical table for simple calculation of the mean, variance, standard deviation, mean deviation, coefficient of variation (CV), skewness and kurtosis.

That way, all the calculations would be carried out and stored in the table.

    
asked by anonymous 10.01.2017 / 02:50

1 answer

Let X_1, X_2, ..., X_n be a sequence of numbers, and let X = X_1 + X_2 + ... + X_n be their sum. If I divide each X_i by X, the sum X_1 / X + X_2 / X + ... + X_n / X is always equal to 1. This is a kind of normalization. If I multiply both sides of this equality by 200, the rescaled values 200 * X_i / X add up to 200, which is the result I'm looking for.
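As a quick numerical check of this normalization, here is a minimal sketch in R. The vector v is only an illustrative example (the first five numbers of the sample in the question); the names p and scaled are chosen for this sketch.

```r
# Illustrative values only (the first five numbers from the question's sample).
v <- c(7, 6, 12, 2, 10)

# Dividing by the total gives proportions that add up to 1.
p <- v / sum(v)

# Multiplying the proportions by 200 gives values that add up to 200
# (up to floating-point error); they are generally not integers.
scaled <- p * 200
sum(scaled)
```

Since the rescaled values are usually not whole numbers, they still have to be rounded, and the rounding is what can disturb the total.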

So all that remains is to apply this idea in R. I wrote a function called amostra that does this.

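amostra <- function(x=1:20, size=20, replace=TRUE, limit=200){
  # draw the raw sample
  estat <- sample(x, size, replace=replace)
  # rescale so the values sum to limit, then round to whole numbers
  estat <- round(estat/sum(estat)*limit)
  if (sum(estat) == limit){
    return(estat)
  } else {
    # rounding shifted the total: the last element absorbs the difference
    return(c(estat[1:(size-1)], limit-sum(estat[1:(size-1)])))
  }
}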

x <- amostra(1:20, 20, limit=200)
x
[1]  4 12 12 13 12 13  2 12 11  2 14  7 12 17 12  3 11  5 11 15
sum(x)
[1] 200

This function has 4 arguments:

x : the possible values that the sample can take (default is the integers from 1 to 20)

size : the size of the sample to be created (default is 20)

replace : indicates whether sampling is done with replacement (default is TRUE, i.e. with replacement)

limit : the target for the total sum (default is 200)

Because of rounding, I added a small trick to the algorithm. It draws n elements (n = size), rescales and rounds them, and tests whether their sum is equal to limit. If it is, it returns that sample.

If it is not, the last element is replaced by limit-sum(estat[1:(size-1)]), which is the difference between the target sum and the sum of the first n-1 elements in the sample.

Without this step, there would be no guarantee that the final sum of the elements equals limit.
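To see why the correction is needed, consider a small hand-picked case. The values below are hypothetical, chosen only because the rounding error is easy to follow; the correction itself is the same expression used in amostra.

```r
# Three equal values: each rescaled value is 200/3 = 66.67, which rounds to 67.
v <- c(1, 1, 1)
limit <- 200

rounded <- round(v / sum(v) * limit)
sum(rounded)   # 201, not 200: rounding pushed the total past the target

# Same correction as in amostra: keep the first n-1 values and
# let the last one absorb the rounding error.
n <- length(rounded)
fixed <- c(rounded[1:(n - 1)], limit - sum(rounded[1:(n - 1)]))
fixed          # 67 67 66
sum(fixed)     # 200
```

Note that the last value is then determined by the others rather than by its own rescaling, so it carries the whole rounding error.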

The table function lists the distinct values in sorted order, together with their respective frequencies:

table(x)
x
 2  3  4  5  7 11 12 13 14 15 17 
 2  1  1  1  1  3  6  2  1  1  1 

From this, it is finally possible to calculate the desired statistics, starting from a data frame of the values and their frequencies:

as.data.frame(table(x))
    x Freq
1   2    2
2   3    1
3   4    1
4   5    1
5   7    1
6  11    3
7  12    6
8  13    2
9  14    1
10 15    1
11 17    1
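
The statistics mentioned in the question can then be computed directly from the frequency table. The following is a minimal sketch in base R, under assumptions that are mine and not part of the original answer: the variance uses the n - 1 denominator (as var does), the mean deviation is taken about the mean, and skewness and kurtosis are the simple moment-based versions (m3 / m2^1.5 and m4 / m2^2). The names tab, val, freq and stats are chosen for illustration. Note that as.data.frame(table(x)) stores the values as a factor, so they have to be converted back to numbers first.

```r
# Sample from the answer above, typed in so the example is self-contained.
x <- c(4, 12, 12, 13, 12, 13, 2, 12, 11, 2, 14, 7, 12, 17, 12, 3, 11, 5, 11, 15)

tab  <- as.data.frame(table(x))
val  <- as.numeric(as.character(tab$x))  # factor levels -> numeric values
freq <- tab$Freq
n    <- sum(freq)

# Frequency-weighted statistics
avg      <- sum(val * freq) / n
variance <- sum(freq * (val - avg)^2) / (n - 1)   # sample variance (assumption)
std_dev  <- sqrt(variance)
mean_dev <- sum(freq * abs(val - avg)) / n        # mean absolute deviation about the mean
cv       <- std_dev / avg                         # coefficient of variation

# Central moments (denominator n) for skewness and kurtosis
m2 <- sum(freq * (val - avg)^2) / n
m3 <- sum(freq * (val - avg)^3) / n
m4 <- sum(freq * (val - avg)^4) / n
skewness <- m3 / m2^1.5
kurtosis <- m4 / m2^2   # not excess kurtosis; subtract 3 for that

# Collect the results in a data frame
stats <- data.frame(
  statistic = c("mean", "variance", "sd", "mean deviation", "cv",
                "skewness", "kurtosis"),
  value     = c(avg, variance, std_dev, mean_dev, cv, skewness, kurtosis)
)
stats
```

If a different convention is wanted (for example the population variance, or excess kurtosis), only the corresponding line needs to change.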
    
10.01.2017 / 03:41