Example
We use optimx
to implement Poisson pseudo-maximum likelihood (PPML)
estimation. If \(y_i\) is a
continuous random variable, it obviously does not follow a Poisson
distribution, whose support consists of the non-negative integers. However,
as long as the conditional mean model \[E[y_i | x_i]
= \exp( x_i' \beta)\] holds, Poisson
regression still delivers a consistent estimator of the parameter
\(\beta\) even though \(y_i\) does not follow a conditional Poisson
distribution.
If \(Z\) follows a Poisson distribution with mean \(\lambda\), its probability mass function is \[ \Pr(Z = k) = \frac{\mathrm{e}^{-\lambda} \lambda^k}{k!}, \mbox{ for }k=0,1,2,\ldots, \] so that \[ \log \Pr(Y = y | x) = -\exp(x'\beta) + y\cdot x'\beta - \log y! \] Since the last term does not involve the parameter, the log-likelihood function is \[ \ell(\beta) = \log \Pr( \mathbf{y} | \mathbf{x};\beta ) = -\sum_{i=1}^n \exp(x_i'\beta) + \sum_{i=1}^n y_i x_i'\beta. \] In addition, it is easy to write down the gradient \[ s(\beta) =\frac{\partial \ell(\beta)}{\partial \beta} = -\sum_{i=1}^n \exp(x_i'\beta)x_i + \sum_{i=1}^n y_i x_i, \] and to verify that the Hessian \[ H(\beta) = \frac{\partial^2 \ell(\beta)}{\partial \beta \partial \beta'} = -\sum_{i=1}^n \exp(x_i'\beta)x_i x_i' \] is negative definite. Therefore, \(\ell(\beta)\) is strictly concave in \(\beta\).
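To verify the negative definiteness, note that for any nonzero vector \(v\), \[ v' H(\beta) v = -\sum_{i=1}^n \exp(x_i'\beta) (x_i' v)^2 < 0, \] provided that the regressors \(x_i\) are not perfectly collinear.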
In operations research, the convention is to formulate optimization problems as minimization, whereas economics features utility maximization as well as cost minimization, and statistics has both maximum likelihood estimation and least squares minimization. To follow the convention in operations research, here we minimize the negative log-likelihood.
# negative Poisson log-likelihood (the constant term log(y!) is dropped);
# the data y and X are taken from the enclosing environment
poisson.loglik = function( b ) {
  b = as.matrix( b )
  lambda = exp( X %*% b )
  ell = -sum( -lambda + y * log(lambda) )
  return(ell)
}
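Since the analytical gradient \(s(\beta)\) is available, we can also supply it to gradient-based optimizers. Below is a minimal sketch (the helper name poisson.loglik.grad is mine; the sign is flipped because we minimize the negative log-likelihood). It can be passed to optimx via the gr argument.
# gradient of the negative log-likelihood, i.e. -s(b)
poisson.loglik.grad = function( b ) {
  b = as.matrix( b )
  lambda = exp( X %*% b )
  as.vector( t(X) %*% (lambda - y) )
}
# example usage:
# optimx( b.init, poisson.loglik, gr = poisson.loglik.grad, method = "BFGS" )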
To implement optimization in R
, it is recommended to
write the criterion as a function of the parameter. Data can be fed
inside or outside of the function. If the data are provided as additional
arguments, these arguments must be explicit. (In contrast, in
Matlab
the parameter must be the sole argument of the
function to be optimized, and data can only be injected through a nested
function.)
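For example, a variant with the data passed as explicit arguments might look as follows (the name poisson.loglik.args is hypothetical; optimx forwards extra named arguments to the criterion function):
# negative log-likelihood with the data as explicit arguments
poisson.loglik.args = function( b, y, X ) {
  lambda = exp( X %*% as.matrix(b) )
  -sum( -lambda + y * log(lambda) )
}
# example usage:
# optimx( b.init, poisson.loglik.args, method = "BFGS", y = y, X = X )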
# implement both BFGS and Nelder-Mead for comparison
library(AER)
library(numDeriv)
library(optimx)
## prepare the data
data("RecreationDemand")
y = RecreationDemand$trips
X = with(RecreationDemand, cbind(1, income) )
## estimation
b.init = c(0,1) # initial value
b.hat = optimx( b.init, poisson.loglik,
method = c("BFGS", "Nelder-Mead"),
control = list(reltol = 1e-7,
abstol = 1e-7) )
print( b.hat )
## p1 p2 value fevals gevals niter convcode kkt1
## BFGS 1.177408 -0.09994070 261.1141 99 21 NA 0 TRUE
## Nelder-Mead 1.167261 -0.09703975 261.1317 53 NA NA 0 FALSE
## kkt2 xtime
## BFGS TRUE 0.012
## Nelder-Mead TRUE 0.001
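As a sanity check (not part of the original routine), the same model can be fitted with R's built-in glm, and numDeriv can confirm that the gradient of the objective is near zero at the optimum:
## cross-check with glm's Poisson regression
b.glm = coef( glm(trips ~ income, data = RecreationDemand, family = poisson) )
print( b.glm )
print( numDeriv::grad( poisson.loglik, b.glm ) ) # should be close to zero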
Given the conditional mean model, nonlinear least squares (NLS) is also consistent in theory. NLS minimizes \[ \sum_{i=1}^n (y_i - \exp(x_i' \beta))^2. \] A natural question is: why do we prefer PPML to NLS? My argument is that PPML's criterion is globally convex in the linear index, while the NLS criterion is not. This implies that the numerical optimization of PPML is easier and more robust than that of NLS. I leave the derivation of the non-convexity of NLS as an exercise.
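For comparison, the NLS criterion is equally easy to code; a minimal sketch (the name nls.obj is mine):
# NLS criterion for the same conditional mean model
nls.obj = function( b ) {
  sum( ( y - exp( X %*% as.matrix(b) ) )^2 )
}
# example usage: optimx( b.init, nls.obj, method = "BFGS" )
# unlike poisson.loglik, this objective is not convex in b,
# so the result may be sensitive to the initial value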
In practice, no algorithm suits all problems. Simulation, in which the true parameter is known, is helpful for checking the accuracy of an optimization routine before applying it to an empirical problem, in which the true parameter is unknown. A contour plot is a useful tool for visualizing the function surface/manifold in low dimensions.
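A minimal simulation sketch (the seed, sample size, and true parameter below are arbitrary choices of mine):
set.seed(2024)
n = 500
b.true = c(1, -0.5) # known true parameter
X.sim = cbind(1, rnorm(n))
y.sim = rpois( n, exp( X.sim %*% b.true ) )
# negative log-likelihood for the simulated data
loglik.sim = function( b ) -sum( -exp( X.sim %*% b ) + y.sim * ( X.sim %*% b ) )
print( optimx( c(0, 0), loglik.sim, method = "BFGS" ) ) # p1, p2 should be near b.true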
Example
## evaluate the negative log-likelihood on a two-dimensional grid
x.grid = seq(0, 1.8, 0.02)
x.length = length(x.grid)
y.grid = seq(-.5, .2, 0.01)
y.length = length(y.grid)
z.contour = matrix(0, nrow = x.length, ncol = y.length)
for (i in 1:x.length){
  for (j in 1:y.length){
    z.contour[i,j] = poisson.loglik( c( x.grid[i], y.grid[j] ) )
  }
}
contour( x.grid, y.grid, z.contour, 20) # 20 contour levels
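The grid also offers a crude check of the minimizer's location, for instance:
# locate the grid point with the smallest objective value
idx = which( z.contour == min(z.contour), arr.ind = TRUE )
print( c( x.grid[ idx[1] ], y.grid[ idx[2] ] ) ) # should be near the BFGS estimate
The contours should be convex, consistent with the strict convexity of the negative log-likelihood established above.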