Second Order Methods
“Accelerate when the road is straight, slow down when it is curved” – Yuli
Another very useful piece of information for choosing the step size is the second-order derivative (generalized to the Hessian matrix in the multi-dimensional setting) at the current location of each step.
Newton’s method: compute the second-order derivative (Hessian) at each step, and multiply the gradient by the inverse Hessian to obtain the update step.
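A minimal NumPy sketch of one Newton iteration (the names `newton_step`, `grad`, and `hess` are illustrative, not from the source):

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: x_new = x - H(x)^{-1} g(x)."""
    g = grad(x)
    H = hess(x)
    # Solve H d = g rather than forming the explicit inverse.
    d = np.linalg.solve(H, g)
    return x - d

# Toy quadratic f(x) = x0^2 + 3*x1^2: Newton lands on the minimum in one step.
grad = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 6.0]])
x = newton_step(grad, hess, np.array([1.0, 1.0]))  # -> [0., 0.]
```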
Quasi-Newton methods: computing the Hessian exactly is expensive, so why not approximate it?
Gauss-Newton: for least-squares problems, approximate the Hessian by J^T J, the outer product of the first-order derivative (the Jacobian of the residuals), and use the inverse of this approximation in place of the inverse Hessian when computing the step.
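A sketch of one Gauss-Newton step for a least-squares objective 0.5*||r(x)||^2 (the callables `residual` and `jacobian` are hypothetical names, not from the source):

```python
import numpy as np

def gauss_newton_step(residual, jacobian, x):
    """Approximate the Hessian of 0.5*||r(x)||^2 by J^T J and step with it."""
    r = residual(x)
    J = jacobian(x)
    # Normal equations: (J^T J) d = J^T r, where J^T r is the exact gradient.
    d = np.linalg.solve(J.T @ J, J.T @ r)
    return x - d

# Toy linear least squares r(x) = A x - b: solved exactly in one step.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x = gauss_newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```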
Levenberg-Marquardt: similar to Gauss-Newton, but adds a regularization (damping) term, a multiple of the identity matrix, to the Hessian approximation at each step to stabilize the optimization process.
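The same sketch with the Levenberg-Marquardt damping term added; a fixed `lam` is assumed here for brevity, whereas real implementations adapt it between iterations:

```python
import numpy as np

def lm_step(residual, jacobian, x, lam=1e-2):
    """One damped Gauss-Newton (Levenberg-Marquardt) step.

    Large lam -> short, gradient-descent-like steps (stable far from the
    optimum); small lam -> nearly the Gauss-Newton step (fast close to it).
    """
    r = residual(x)
    J = jacobian(x)
    H_approx = J.T @ J + lam * np.eye(x.size)  # regularized Hessian approximation
    return x - np.linalg.solve(H_approx, J.T @ r)
```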
DFP: starts the inverse-Hessian approximation at the identity matrix, and updates it at each step using the step and gradient-change information.
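A sketch of the DFP update in its standard textbook form (variable names are mine): with step s = x_new - x_old and gradient change y = g_new - g_old,

```python
import numpy as np

def dfp_update(H, s, y):
    """DFP update: H+ = H + (s s^T)/(s^T y) - (H y y^T H)/(y^T H y).

    H approximates the *inverse* Hessian; start with H = np.eye(n)
    and use d = -H @ g as the search direction at each iteration.
    """
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)
```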
BFGS: similar to DFP (the two updates are duals of each other, with the roles of the step and gradient-change vectors swapped); both keep the approximation symmetric and positive definite when the curvature condition holds, but BFGS tends to be more robust in practice.
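And the corresponding BFGS update of the inverse-Hessian approximation (again the standard form, not code from the source):

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update: H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T.

    With rho = 1/(y^T s) > 0 (the curvature condition, guaranteed by a
    suitable line search), the update keeps H symmetric positive definite.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```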
References:
https://www.youtube.com/watch?v=Kln0ZQ7sX8k