How to generate random values for a known distribution?

5

I am aware that, for example, runif(1000,0,3) generates 1000 random values for a uniform distribution for x between 0 and 3. But how to do this for any probability density function? Any clue is grateful!

    
asked by anonymous 09.12.2015 / 22:03

1 answer

5

Every statistical distribution can be defined by a cumulative distribution function F (x).

A well-known result states that if you have a random variable U with uniform distribution in the interval (0,1). Then followsthedistributiondefinedbyF.

Theproofofthisresultissimple:

Soifyoucangeneraterandomnumberswithevendistributionandknowthecumulativedistributionfunction,youcanalsogeneraterandomnumbersaccordingtothisdistribution.

NoR,thisisalreadyprogrammedforseveraldistributions:seethe list that Ailton posted it. But, it's not too complicated to program for another distribution if you can reverse your F (x).

If you only know the cumulative distribution function, you can write as an optimization problem. Define the cumulative distribution (Here is the cumulative distribution of the exponential):

dF <- function(x, lambda = 0.5){
  1 - exp(-lambda*x)
}

Draw a random number between 0 and 1 with even distribution. In my case I got:

n <- 0.917915

Next, you have to find x of the dF distribution that is closest to n. This can be done as follows:

op <- optim(runif(1), function(x){
  abs(dF(x, 0.5) - n)
}, method = "BFGS")

See:

> op$par
[1] 5

And that:

> dF(5, 0.5)
[1] 0.917915

This wikipedia article explains more fully what I said here: link

This procedure can be repeated to obtain a sample with cumulative distribution defined in the dF object from a random sample of a random variable with uniform distribution. In this case we use the exponential distribution:

dF <- function(x, lambda = 0.5){
  1 - exp(-lambda*x)
}
amostra <- runif(1000)

To obtain the sample with exponential distribution it is necessary to find the inverse value of dF for each random number generated. This can be accomplished as follows:

inverter_distribuicao <- function(x){
  m <- nlm(function(y){
      abs(dF(y, 0.5) - x)
    }, p = 1
  )
  return(m$estimate)
}
amostra_exp <- sapply(amostra, inverter_distribuicao)

Now see the histogram of the generated sample:

    
10.12.2015 / 12:18