Example
We use optimx
to implement Poisson pseudo-maximum likelihood (PPML)
estimation. If \(y_i\) is a
continuous random variable, it obviously does not follow a Poisson
distribution, whose support consists of the non-negative integers. However,
as long as the conditional mean model \[E[y_i | x_i]
= \exp( x_i' \beta)\] holds, Poisson
regression still delivers a consistent estimator of the parameter
\(\beta\) even though \(y_i\) does not follow a conditional Poisson
distribution.
If \(Z\) follows a Poisson distribution with mean \(\lambda\), its probability mass function is \[ \Pr(Z = k) = \frac{\mathrm{e}^{-\lambda} \lambda^k}{k!}, \mbox{ for }k=0,1,2,\ldots, \] so that \[ \log \Pr(Y = y | x) = -\exp(x'\beta) + y\cdot x'\beta - \log y! \] Since the last term does not involve the parameter, the log-likelihood function is \[ \ell(\beta) = \log \Pr( \mathbf{y} | \mathbf{x};\beta ) = -\sum_{i=1}^n \exp(x_i'\beta) + \sum_{i=1}^n y_i x_i'\beta. \] In addition, it is easy to write down the gradient \[ s(\beta) =\frac{\partial \ell(\beta)}{\partial \beta} = -\sum_{i=1}^n \exp(x_i'\beta)x_i + \sum_{i=1}^n y_i x_i, \] and to verify that the Hessian \[ H(\beta) = \frac{\partial^2 \ell(\beta)}{\partial \beta \partial \beta'} = -\sum_{i=1}^n \exp(x_i'\beta)x_i x_i' \] is negative definite. Therefore, \(\ell(\beta)\) is strictly concave in \(\beta\).
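To verify the negative definiteness, note that for any nonzero vector \(v\), \[ v' H(\beta) v = -\sum_{i=1}^n \exp(x_i'\beta) (x_i' v)^2 < 0, \] provided that the regressors \(x_i\) are not perfectly collinear.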
In operations research, the convention is to formulate optimization problems as minimization, whereas economics features utility maximization as well as cost minimization, and statistics has both maximum likelihood estimation and least squares minimization. To follow the convention in operations research, here we minimize the negative log-likelihood.
# negative Poisson log-likelihood (the constant term log(y!) is dropped);
# the data y and X are taken from the enclosing environment
poisson.loglik = function( b ) {
  b = as.matrix( b )
  lambda = exp( X %*% b )
  ell = -sum( -lambda + y * log(lambda) )
  return(ell)
}
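Since the analytical gradient \(s(\beta)\) is available, we can also supply it to gradient-based optimizers. Below is a minimal sketch (the helper name poisson.loglik.grad is mine; the sign is flipped because we minimize the negative log-likelihood). It can be passed to optimx via the gr argument.
# gradient of the negative log-likelihood, i.e. -s(b)
poisson.loglik.grad = function( b ) {
  b = as.matrix( b )
  lambda = exp( X %*% b )
  as.vector( t(X) %*% (lambda - y) )
}
# example usage:
# optimx( b.init, poisson.loglik, gr = poisson.loglik.grad, method = "BFGS" )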
To implement optimization in R
, it is recommended to
write the criterion as a function of the parameter. Data can be fed
inside or outside of the function. If the data are provided as additional
arguments, these arguments must be explicit. (In contrast, in
Matlab
the parameter must be the sole argument of the
function to be optimized, and data can only be injected through a nested
function.)
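For example, a variant with the data passed as explicit arguments might look as follows (the name poisson.loglik.args is hypothetical; optimx forwards extra named arguments to the criterion function):
# negative log-likelihood with the data as explicit arguments
poisson.loglik.args = function( b, y, X ) {
  lambda = exp( X %*% as.matrix(b) )
  -sum( -lambda + y * log(lambda) )
}
# example usage:
# optimx( b.init, poisson.loglik.args, method = "BFGS", y = y, X = X )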
# implement both BFGS and Nelder-Mead for comparison
library(AER)
library(numDeriv)
library(optimx)
## prepare the data
data("RecreationDemand")
y = RecreationDemand$trips
X = with(RecreationDemand, cbind(1, income) )
## estimation
b.init = c(0,1) # initial value
b.hat = optimx( b.init, poisson.loglik,
method = c("BFGS", "Nelder-Mead"),
control = list(reltol = 1e-7,
abstol = 1e-7) )
print( b.hat )
## p1 p2 value fevals gevals niter convcode kkt1
## BFGS 1.177408 -0.09994070 261.1141 99 21 NA 0 TRUE
## Nelder-Mead 1.167261 -0.09703975 261.1317 53 NA NA 0 FALSE
## kkt2 xtime
## BFGS TRUE 0.012
## Nelder-Mead TRUE 0.001
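As a sanity check (not part of the original routine), the same model can be fitted with R's built-in glm, and numDeriv can confirm that the gradient of the objective is near zero at the optimum:
## cross-check with glm's Poisson regression
b.glm = coef( glm(trips ~ income, data = RecreationDemand, family = poisson) )
print( b.glm )
print( numDeriv::grad( poisson.loglik, b.glm ) ) # should be close to zero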
Given the conditional mean model, nonlinear least squares (NLS) is also consistent in theory. NLS minimizes \[ \sum_{i=1}^n (y_i - \exp(x_i' \beta))^2. \] A natural question is: why do we prefer PPML to NLS? My argument is that PPML's criterion is globally convex in the linear index, while the NLS criterion is not. This implies that the numerical optimization of PPML is easier and more robust than that of NLS. I leave the derivation of the non-convexity of NLS as an exercise.
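For comparison, the NLS criterion is equally easy to code; a minimal sketch (the name nls.obj is mine):
# NLS criterion for the same conditional mean model
nls.obj = function( b ) {
  sum( ( y - exp( X %*% as.matrix(b) ) )^2 )
}
# example usage: optimx( b.init, nls.obj, method = "BFGS" )
# unlike poisson.loglik, this objective is not convex in b,
# so the result may be sensitive to the initial value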
In practice, no algorithm suits all problems. Simulation, in which the true parameter is known, is helpful for checking the accuracy of an optimization routine before applying it to an empirical problem, in which the true parameter is unknown. A contour plot is a useful tool for visualizing the function surface/manifold in low dimensions.
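A minimal simulation sketch (the seed, sample size, and true parameter below are arbitrary choices of mine):
set.seed(2024)
n = 500
b.true = c(1, -0.5) # known true parameter
X.sim = cbind(1, rnorm(n))
y.sim = rpois( n, exp( X.sim %*% b.true ) )
# negative log-likelihood for the simulated data
loglik.sim = function( b ) -sum( -exp( X.sim %*% b ) + y.sim * ( X.sim %*% b ) )
print( optimx( c(0, 0), loglik.sim, method = "BFGS" ) ) # p1, p2 should be near b.true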
Example
## evaluate the negative log-likelihood on a two-dimensional grid
x.grid = seq(0, 1.8, 0.02)
x.length = length(x.grid)
y.grid = seq(-.5, .2, 0.01)
y.length = length(y.grid)
z.contour = matrix(0, nrow = x.length, ncol = y.length)
for (i in 1:x.length){
  for (j in 1:y.length){
    z.contour[i,j] = poisson.loglik( c( x.grid[i], y.grid[j] ) )
  }
}
contour( x.grid, y.grid, z.contour, 20) # 20 contour levels
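The grid also offers a crude check of the minimizer's location, for instance:
# locate the grid point with the smallest objective value
idx = which( z.contour == min(z.contour), arr.ind = TRUE )
print( c( x.grid[ idx[1] ], y.grid[ idx[2] ] ) ) # should be near the BFGS estimate
The contours should be convex, consistent with the strict convexity of the negative log-likelihood established above.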