I am aware that, for example, runif(1000,0,3)
generates 1000 random values for a uniform distribution for x between 0 and 3. But how to do this for any probability density function? Any clue is grateful!
I am aware that, for example, runif(1000,0,3)
generates 1000 random values for a uniform distribution for x between 0 and 3. But how to do this for any probability density function? Any clue is grateful!
Every statistical distribution can be defined by a cumulative distribution function F (x).
A well-known result states that if you have a random variable U with uniform distribution in the interval (0,1). Then followsthedistributiondefinedbyF.
Theproofofthisresultissimple:
Soifyoucangeneraterandomnumberswithevendistributionandknowthecumulativedistributionfunction,youcanalsogeneraterandomnumbersaccordingtothisdistribution.
NoR
,thisisalreadyprogrammedforseveraldistributions:seethe list that Ailton posted it. But, it's not too complicated to program for another distribution if you can reverse your F (x).
If you only know the cumulative distribution function, you can write as an optimization problem. Define the cumulative distribution (Here is the cumulative distribution of the exponential):
dF <- function(x, lambda = 0.5){
1 - exp(-lambda*x)
}
Draw a random number between 0 and 1 with even distribution. In my case I got:
n <- 0.917915
Next, you have to find x
of the dF
distribution that is closest to n. This can be done as follows:
op <- optim(runif(1), function(x){
abs(dF(x, 0.5) - n)
}, method = "BFGS")
See:
> op$par
[1] 5
And that:
> dF(5, 0.5)
[1] 0.917915
This wikipedia article explains more fully what I said here: link
This procedure can be repeated to obtain a sample with cumulative distribution defined in the dF
object from a random sample of a random variable with uniform distribution. In this case we use the exponential distribution:
dF <- function(x, lambda = 0.5){
1 - exp(-lambda*x)
}
amostra <- runif(1000)
To obtain the sample with exponential distribution it is necessary to find the inverse value of dF
for each random number generated. This can be accomplished as follows:
inverter_distribuicao <- function(x){
m <- nlm(function(y){
abs(dF(y, 0.5) - x)
}, p = 1
)
return(m$estimate)
}
amostra_exp <- sapply(amostra, inverter_distribuicao)
Now see the histogram of the generated sample: