Example

We use optimx to solve Poisson pseudo-maximum likelihood (PPML) estimation. If \(y_i\) is a continuous random variable, it obviously does not follow a Poisson distribution, whose support consists of the non-negative integers. However, if the conditional mean model \[E[y_i | x_i] = \exp( x_i' \beta)\] holds, the Poisson regression still delivers a consistent estimator of the parameter \(\beta\) even though \(y_i\) does not follow a conditional Poisson distribution.

If \(Z\) follows a Poisson distribution with mean \(\lambda\), its probability mass function is \[ \Pr(Z = k) = \frac{\mathrm{e}^{-\lambda} \lambda^k}{k!}, \mbox{ for }k=0,1,2,\ldots, \] so that \[ \log \Pr(Y = y | x) = -\exp(x'\beta) + y\cdot x'\beta - \log y! \] Since the last term does not involve the parameter, the log-likelihood function is \[ \ell(\beta) = \log \Pr( \mathbf{y} | \mathbf{x};\beta ) = -\sum_{i=1}^n \exp(x_i'\beta) + \sum_{i=1}^n y_i x_i'\beta. \] In addition, it is easy to write the gradient \[ s(\beta) =\frac{\partial \ell(\beta)}{\partial \beta} = -\sum_{i=1}^n \exp(x_i'\beta)x_i + \sum_{i=1}^n y_i x_i, \] and to verify that the Hessian \[ H(\beta) = \frac{\partial^2 \ell(\beta)}{\partial \beta \partial \beta'} = -\sum_{i=1}^n \exp(x_i'\beta)x_i x_i' \] is negative definite. Therefore, \(\ell(\beta)\) is strictly concave in \(\beta\).
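As a sanity check, the analytical score \(s(\beta)\) can be compared with a numerical gradient on simulated data. The sample size, true parameter, and variable names below are illustrative choices, not part of the text.

```r
library(numDeriv)

set.seed(1)
n = 100
X.sim = cbind(1, rnorm(n))           # intercept and one regressor
b0 = c(0.5, -0.2)
y.sim = rpois(n, exp(X.sim %*% b0))  # Poisson data with the assumed mean

# log-likelihood and its analytical gradient
loglik.sim = function(b) -sum(exp(X.sim %*% b)) + sum(y.sim * (X.sim %*% b))
score.sim = function(b) as.vector(-t(X.sim) %*% exp(X.sim %*% b) + t(X.sim) %*% y.sim)

b.test = c(0.3, 0.1)
max(abs(numDeriv::grad(loglik.sim, b.test) - score.sim(b.test)))  # numerically zero
```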

In operations research, the default optimization is minimization, even though economics features utility maximization as well as cost minimization, and statistics features maximum likelihood estimation as well as least squares minimization. To follow the convention in operations research, here we formulate the negative log-likelihood.

# Poisson negative log-likelihood
# (y and X are taken from the enclosing environment)
poisson.loglik = function( b ) {
  b = as.matrix( b )
  lambda = exp( X %*% b )                    # conditional mean exp(x_i' b)
  ell = -sum( -lambda + y * log(lambda) )    # negative log-likelihood
  return(ell)
}

To implement optimization in R, it is recommended to write the criterion as a function of the parameter. Data can be fed into the function either from the enclosing environment or as additional arguments; in the latter case, the arguments must be explicitly declared. (In contrast, in Matlab the parameter must be the sole argument of the function to be optimized, and data can only be injected through a nested function.)

## Implement both BFGS and Nelder-Mead for comparison.

library(AER)
library(numDeriv)
library(optimx)

## prepare the data
data("RecreationDemand")
y =  RecreationDemand$trips
X =  with(RecreationDemand, cbind(1, income) )

## estimation
b.init =  c(0,1)  # initial value
b.hat = optimx( b.init, poisson.loglik,
                method = c("BFGS", "Nelder-Mead"),
                 control = list(reltol = 1e-7,
                                abstol = 1e-7)  )
print( b.hat )
##                   p1          p2    value fevals gevals niter convcode  kkt1
## BFGS        1.177408 -0.09994070 261.1141     99     21    NA        0  TRUE
## Nelder-Mead 1.167261 -0.09703975 261.1317     53     NA    NA        0 FALSE
##             kkt2 xtime
## BFGS        TRUE 0.012
## Nelder-Mead TRUE 0.001
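To illustrate the second route mentioned earlier, the criterion can be rewritten so that the data enter as explicit arguments, which optimx forwards to the criterion through its `...` argument. The function name below is illustrative.

```r
# Poisson negative log-likelihood with the data as explicit arguments,
# so that the function no longer depends on global y and X
poisson.loglik.data = function( b, y, X ) {
  lambda = exp( X %*% b )
  -sum( -lambda + y * log(lambda) )
}

# extra named arguments after the function are passed on to it, e.g.
# optimx( b.init, poisson.loglik.data, y = y, X = X, method = "BFGS" )
```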

Given the conditional mean model, nonlinear least squares (NLS) is also consistent in theory. NLS minimizes \[ \sum_{i=1}^n (y_i - \exp(x_i' \beta))^2. \] A natural question is: why do we prefer PPML to NLS? My argument is that the PPML criterion (the negative log-likelihood) is globally convex in the linear index, while the NLS criterion is not. This implies that the numerical optimization of PPML is easier and more robust than that of NLS. I leave the derivation of the non-convexity of NLS as an exercise.
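For comparison, the NLS criterion can be coded in the same style. Here nls.obj is an illustrative name, and the data are passed as explicit arguments so the sketch stands alone.

```r
# NLS criterion for the same conditional mean model
nls.obj = function( b, y, X ) {
  sum( (y - exp( X %*% b ))^2 )
}

# it can be fed to the same optimizers, e.g.
# optimx( b.init, nls.obj, y = y, X = X, method = c("BFGS", "Nelder-Mead") )
# but, unlike the Poisson negative log-likelihood, this criterion is not
# globally convex in b, so the outcome may depend on the initial value
```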

In practice, no algorithm suits all problems. Simulation, in which the true parameter is known, is helpful for checking the accuracy of one's optimization routine before applying it to an empirical problem, in which the true parameter is unknown. A contour plot is a useful tool for visualizing the function surface/manifold in low dimensions.
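Such a simulation check can be set up in a few lines. The sample size and true parameter below are illustrative; base R's optim is used so that the snippet stands alone, and glm with family = poisson serves as a benchmark for the hand-rolled routine.

```r
set.seed(2)
n = 5000
b.true = c(1.0, -0.5)
X.s = cbind(1, rnorm(n))
y.s = rpois(n, exp(X.s %*% b.true))   # data generated from the true model

# negative log-likelihood for the simulated sample
neg.loglik.s = function(b) sum(exp(X.s %*% b)) - sum(y.s * (X.s %*% b))
fit = optim(c(0, 0), neg.loglik.s, method = "BFGS")
fit$par   # should be close to b.true = (1.0, -0.5)

# glm solves the same maximum likelihood problem
coef(glm(y.s ~ X.s[, 2], family = poisson))
```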

Example

x.grid = seq(0, 1.8, 0.02)
x.length = length(x.grid)
y.grid = seq(-.5, .2, 0.01)
y.length = length(y.grid)

z.contour = matrix(0, nrow = x.length, ncol = y.length)

for (i in 1:x.length){
  for (j in 1:y.length){
    z.contour[i,j] = poisson.loglik( c( x.grid[i], y.grid[j] )  )
  }
}

contour( x.grid, y.grid, z.contour, nlevels = 20 )