Vector gradient for use in the optim function


I have already searched a lot in various forums and many pages on parameter estimation via R's optim function, and I found nothing on how to add a gradient function to a script so that the derivatives are computed by some R facility, be it _deriv()_ or _deriv3()_ or the _numDeriv_ or _Deriv_ packages. I need to compute the gradient in order to try to improve my estimation process, and it is impossible to take the derivatives by hand since my log-likelihood function is very long.
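To make it concrete, something along these lines is the kind of automation I am after (a toy illustration with the _numDeriv_ package; the function g and the point c(1, 2) are placeholders, not my real likelihood):

library(numDeriv)

#Toy objective, just to illustrate: g(theta) = theta1^2 + 3*theta2
g <- function(theta) theta[1]^2 + 3*theta[2]

#Numerical gradient at an arbitrary point; should be approximately c(2, 3)
grad(g, c(1, 2))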

I can compute the derivatives in _Maple 16_, but I would like to know how to obtain the gradient in _R_ in a more automated way. The code below is a parameter estimation for the univariate, two-parameter Birnbaum-Saunders distribution; if I can implement the gradient in this script, I can implement it in my more complicated code!

When running the code below I have sometimes obtained negative parameter estimates, which is absurd, since _alpha_, _beta_ and _t_ are positive. I imagine that with the gradient this might stop happening.

library(VGAM)
alpha<-2
beta <-1
truevalue <- c(alpha,beta)
n=1000
N=300
m=matrix(ncol=2,nrow=N)

for (i in 1:N){
x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) #t follows a Birnbaum-Saunders distribution with parameters alpha and beta
#t <- rbisa(n, alpha, beta)
#sum(1*(t<0))

#Density function of the Birnbaum-Saunders distribution

f <-function(theta){
  alpha <- abs(theta[1])
  beta  <- abs(theta[2])
  d <- (1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))
  return(d)
}

#Form 1 of the log-likelihood

# log.ver <- function(theta){
#   alpha <- abs(theta[1])
#   beta  <- abs(theta[2])
#   l <- sum(log(f(theta)))
#   return(l)
# }

#Form 2 of the log-likelihood

log.ver <- function(theta){
  alpha <- abs(theta[1])
  beta  <- abs(theta[2])
  l <- sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
  return(l)
}

alpha_0 <- 3
beta_0 <- 4
start <- c(alpha_0,beta_0)
opt <- optim(start,log.ver,method="BFGS",hessian = F,control=list(fnscale=-1))
m[i,]=opt$par
}

#Calculating the average of each column of the array of parameters m
mest=colMeans(m)

#calculating the standard deviation of each column of the array of parameters m
dest=apply(m,2,sd)

#root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
eqm=function(x,opt){ 
  N=length(x)
  sqrt(sum(((x-opt)^2))/N)}

#Estimated mean squared error of each parameter 
eqmest=c(eqm(x=m[,1],opt=alpha),
         eqm(x=m[,2],opt=beta))

# Table with the true values of the parameters and the average
# Standard deviation and mean square error of the estimated parameters
tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
tab
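In the meantime, one workaround I am considering for the negative estimates is to optimize over (log(alpha), log(beta)) so the parameters stay positive by construction. A minimal sketch, reusing log.ver and start from above (the reparameterization is just an idea I am testing, not part of the estimation above):

#Reparameterize: eta = log(theta), so theta = exp(eta) is always positive
log.ver.rep <- function(eta) log.ver(exp(eta))
opt2 <- optim(log(start), log.ver.rep, method="BFGS", control=list(fnscale=-1))
exp(opt2$par) #back-transform to the (alpha, beta) scale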

Thank you very much for the help!

    
asked by anonymous 30.10.2016 / 18:28

2 answers


I worked on the original code in your question and identified two issues:

  • By default, the optim function performs minimization, which can be turned into a maximization problem by flipping the sign of the objective function. So the point your optimization returns is a minimum. Anyway, I saw that you solved this in the second answer.

  • If your maximum likelihood estimator needs to satisfy constraints, you will have to use an algorithm that carries out the optimization taking those constraints into account. Providing a gradient will not solve this problem directly; it will only make your optimization faster and more accurate. That may happen to make you find optima only in the region you are interested in, but there are problems where unconstrained optimization, even with a gradient, will lead to absurd results. In fact, I think it is a good strategy not to implement gradients until you run into convergence problems or inefficiency in the code. In your case, I do not think that is happening yet.

  • That said, my solution to your problem uses the constrOptim function from the stats package, which solves minimization problems using the same algorithms as optim, but guarantees that the optimization is carried out subject to linear constraints.

The code below illustrates the use of the constrOptim function on your problem, where I force alpha and beta to be positive through the arguments ui and ci: constrOptim keeps the search inside the region ui %*% theta - ci >= 0, so ui = diag(2) and ci = rep(0, 2) enforce alpha > 0 and beta > 0.

    library(VGAM)
    alpha<-2
    beta <-1
    truevalue <- c(alpha,beta)
    n=1000
    N=300
    m=matrix(ncol=2,nrow=N)
    
    for (i in 1:N){
      x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
      t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) #t follows a Birnbaum-Saunders distribution with parameters alpha and beta
      #t <- rbisa(n, alpha, beta)
      #sum(1*(t<0))
    
      #Density function of the Birnbaum-Saunders distribution
    
      f <-function(theta){
        alpha <- abs(theta[1])
        beta  <- abs(theta[2])
        d <- (1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))
        return(d)
      }
    
      #Form 1 of the log-likelihood
    
      # log.ver <- function(theta){
      #   alpha <- abs(theta[1])
      #   beta  <- abs(theta[2])
      #   l <- sum(log(f(theta)))
      #   return(l)
      # }
    
      #Form 2 of the log-likelihood
    
      log.ver <- function(theta){
        alpha <- abs(theta[1])
        beta  <- abs(theta[2])
        l <- -sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
        return(l)
      }
    
      alpha_0 <- 2
      beta_0 <- 3
      start <- c(alpha_0,beta_0)
      opt <- constrOptim(start, log.ver, grad = NULL, ui = diag(2), ci = rep(0,2), method = 'Nelder-Mead')
    
      m[i,]=opt$par
    }
    
    #Calculating the average of each column of the array of parameters m
    mest=colMeans(m)
    
    #calculating the standard deviation of each column of the array of parameters m
    dest=apply(m,2,sd)
    
    #root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
    eqm=function(x,opt){ 
      N=length(x)
      sqrt(sum(((x-opt)^2))/N)}
    
    #Estimated mean squared error of each parameter 
    eqmest=c(eqm(x=m[,1],opt=alpha),
             eqm(x=m[,2],opt=beta))
    
    # Table with the true values of the parameters and the average
    # Standard deviation and mean square error of the estimated parameters
    tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
    tab
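One caveat: with method = 'Nelder-Mead' the grad argument can be NULL, as above, but for gradient-based methods such as BFGS constrOptim requires a gradient function. Once you have one (for example, the grad built with Deriv in your other answer, which returns the gradient of the negated log-likelihood), the call becomes a one-line change; a sketch, assuming such a grad function is in scope:

    #Same constraints as before, now with an analytic gradient and BFGS
    opt <- constrOptim(start, log.ver, grad = grad, ui = diag(2), ci = rep(0,2), method = 'BFGS')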
    
        
    31.10.2016 / 03:25

    I got a solution for the gradient and also added the Hessian; with the three commented lines of code in the middle of the script you can check the consistency of the code!

    rm(list=ls())
    cat("4")
    library(VGAM)
    library(Deriv)
    n=100
    N=300
    m=matrix(ncol=2,nrow=N)
    f <- quote((log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2)))))
    fd1 <- Deriv(f,"alpha")
    fd2 <- Deriv(f,"beta")
    
    hd11 <- Deriv(fd1,"alpha")
    hd12 <- Deriv(fd1,"beta")
    hd21 <- Deriv(fd2,"alpha")
    hd22 <- Deriv(fd2,"beta")
    
    for (i in 1:N){
      alpha<-2
      beta <-1
    x <- rnorm(n,mean = 0,sd=sqrt((alpha^2)/4))
    t <- beta*(1+2*x^2+2*x*sqrt(1+x^2)) #t follows a Birnbaum-Saunders distribution with parameters alpha and beta
    
    rm(alpha,beta)
    log.ver <- function(theta){
      alpha <- theta[1]
      beta  <- theta[2]
      l <- sum(log((1/(2*alpha*beta*sqrt(2*pi)))*(sqrt(beta/t)+(beta/t)^(3/2))*exp(-(1/(2*alpha^2))*((t/beta)+(beta/t)-2))))
      return(-l)
    }
    
    grad<-function(theta){
      alpha <- theta[1]
      beta  <- theta[2]
      gd <- cbind(eval(fd1),eval(fd2))
      return(-colSums(gd, na.rm = FALSE))
    }
    
    Hess<-function(theta){
      alpha <- theta[1]
      beta  <- theta[2]
      matrix(c(-sum(eval(hd11)),-sum(eval(hd12)),
             -sum(eval(hd21)),-sum(eval(hd22))),nrow = 2,ncol = 2)
    }
    
    # theta <- c(opt$par[1],opt$par[2])
    # Hess(theta)
    # opt$hessian
    
    alpha_0 <- 10
    beta_0 <- 10
    start <- c(alpha_0,beta_0)
    opt <- optim(start,log.ver,method="BFGS",gr=grad,hessian = T)
    #opt <- optim(start,log.ver,method="BFGS",hessian = F)
    
    m[i,]=opt$par
    }
    alpha<-2
    beta <-1
    #Calculating the average of each column of the array of parameters m
    mest=colMeans(m)
    
    #calculating the standard deviation of each column of the array of parameters m
    dest=apply(m,2,sd)
    
    #root mean square error in the calculation of each column of the array of parameters m in relation to the true value of the parameter
    eqm=function(x,opt){ 
      N=length(x)
      sqrt(sum(((x-opt)^2))/N)}
    
    #Estimated mean squared error of each parameter 
    eqmest=c(eqm(x=m[,1],opt=alpha),
             eqm(x=m[,2],opt=beta))
    
    # Table with the true values of the parameters and the average
    # Standard deviation and mean square error of the estimated parameters
    #m
    truevalue <- c(alpha,beta)
    tab=data.frame(truevalue,mean=mest,sd=dest,eqm=eqmest)
    tab
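    As an extra check beyond the commented Hessian lines, the symbolic gradient can also be compared with a numerical one, for example from the numDeriv package (my suggestion; it assumes numDeriv is installed and that grad, log.ver and opt from the script above are in scope):

    #Numerical gradient of the negated log-likelihood at the optimum
    numDeriv::grad(log.ver, opt$par)
    #Symbolic gradient from Deriv, for comparison; both should be near zero
    grad(opt$par)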
    
        
    31.10.2016 / 02:54