Second Order Methods
“Accelerate when the road is straight, slow down when it is curved” – Yuli
Another very useful piece of information for choosing the step size is the second-order derivative (generalized to the Hessian matrix in the multi-dimensional setting) at the current location of each step.
Newton’s method: compute the second-order derivative (Hessian) at each step, and multiply the gradient by the inverse Hessian to obtain the update step.
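A minimal NumPy sketch of one Newton iteration (the names `newton_step`, `grad`, and `hess` are illustrative, not from the source):

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton step: x_new = x - H(x)^{-1} g(x)."""
    g = grad(x)
    H = hess(x)
    # Solve H d = g rather than forming the explicit inverse.
    d = np.linalg.solve(H, g)
    return x - d

# Toy quadratic f(x) = x0^2 + 3*x1^2: Newton lands on the minimum in one step.
grad = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 6.0]])
x = newton_step(grad, hess, np.array([1.0, 1.0]))  # -> [0., 0.]
```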
Quasi-Newton methods: computing the Hessian exactly is expensive, so why not approximate it?
Gauss-Newton: for least-squares problems, approximate the Hessian by J^T J, the outer product of the first-order derivative (the Jacobian of the residuals), and use the inverse of this approximation in place of the inverse Hessian when computing the step.
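A sketch of one Gauss-Newton step for a least-squares objective 0.5*||r(x)||^2 (the callables `residual` and `jacobian` are hypothetical names, not from the source):

```python
import numpy as np

def gauss_newton_step(residual, jacobian, x):
    """Approximate the Hessian of 0.5*||r(x)||^2 by J^T J and step with it."""
    r = residual(x)
    J = jacobian(x)
    # Normal equations: (J^T J) d = J^T r, where J^T r is the exact gradient.
    d = np.linalg.solve(J.T @ J, J.T @ r)
    return x - d

# Toy linear least squares r(x) = A x - b: solved exactly in one step.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x = gauss_newton_step(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```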
Levenberg-Marquardt: similar to Gauss-Newton, but adds a regularization (damping) term, a multiple of the identity matrix, to the Hessian approximation at each step to stabilize the optimization process.
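The same sketch with the Levenberg-Marquardt damping term added; a fixed `lam` is assumed here for brevity, whereas real implementations adapt it between iterations:

```python
import numpy as np

def lm_step(residual, jacobian, x, lam=1e-2):
    """One damped Gauss-Newton (Levenberg-Marquardt) step.

    Large lam -> short, gradient-descent-like steps (stable far from the
    optimum); small lam -> nearly the Gauss-Newton step (fast close to it).
    """
    r = residual(x)
    J = jacobian(x)
    H_approx = J.T @ J + lam * np.eye(x.size)  # regularized Hessian approximation
    return x - np.linalg.solve(H_approx, J.T @ r)
```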
DFP: starts the inverse-Hessian approximation at the identity matrix, and updates it at each step using the step and gradient-change information.
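A sketch of the DFP update in its standard textbook form (variable names are mine): with step s = x_new - x_old and gradient change y = g_new - g_old,

```python
import numpy as np

def dfp_update(H, s, y):
    """DFP update: H+ = H + (s s^T)/(s^T y) - (H y y^T H)/(y^T H y).

    H approximates the *inverse* Hessian; start with H = np.eye(n)
    and use d = -H @ g as the search direction at each iteration.
    """
    Hy = H @ y
    return H + np.outer(s, s) / (s @ y) - np.outer(Hy, Hy) / (y @ Hy)
```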
BFGS: similar to DFP (the two updates are duals of each other, with the roles of the step and gradient-change vectors swapped); both keep the approximation symmetric and positive definite when the curvature condition holds, but BFGS tends to be more robust in practice.
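And the corresponding BFGS update of the inverse-Hessian approximation (again the standard form, not code from the source):

```python
import numpy as np

def bfgs_update(H, s, y):
    """BFGS update: H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T.

    With rho = 1/(y^T s) > 0 (the curvature condition, guaranteed by a
    suitable line search), the update keeps H symmetric positive definite.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```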
References:
https://www.youtube.com/watch?v=Kln0ZQ7sX8k