
Finite Differences and Derivative Approximations:

We base our work on the following approximations (basically, Taylor series):

$\displaystyle f(x+h) = f(x) + f'(x) h + f''(x) \frac{h^2}2 + f'''(x) \frac{h^3}6 + \cdots$ (4)

$\displaystyle f(x-h) = f(x) - f'(x) h + f''(x) \frac{h^2}2 - f'''(x) \frac{h^3}6 + \cdots$ (5)

From equation 4, we get the forward difference approximation:

$\displaystyle f'(x) = \frac{f(x+h) - f(x)}{h} + \mathcal{O}(h) $

From equation 5, we get the backward difference approximation:

$\displaystyle f'(x) = \frac{f(x) - f(x-h)}{h} + \mathcal{O}(h) $

If we subtract equation 5 from 4, we get

$\displaystyle f'(x) = \frac{f(x+h) - f(x-h)}{2h} + \mathcal{O}(h^2) $

This is the central difference formula.
Now, adding equations 4 and 5 gives the second central difference approximation:

$\displaystyle f''(x) = \frac{f(x-h) - 2 f(x) + f(x+h)}{h^2} + \mathcal{O}(h^2) $
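
As a concrete illustration, here is a minimal Python sketch of these four formulas; the test function $ \sin$, the point $ x = 1$, and the step $ h = 10^{-5}$ are arbitrary choices made for the example, not part of the notes.

import math

def forward_diff(f, x, h):
    # f'(x) ~ (f(x+h) - f(x)) / h, error O(h)
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    # f'(x) ~ (f(x) - f(x-h)) / h, error O(h)
    return (f(x) - f(x - h)) / h

def central_diff(f, x, h):
    # f'(x) ~ (f(x+h) - f(x-h)) / (2h), error O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

def second_central_diff(f, x, h):
    # f''(x) ~ (f(x-h) - 2 f(x) + f(x+h)) / h^2, error O(h^2)
    return (f(x - h) - 2 * f(x) + f(x + h)) / h**2

x, h = 1.0, 1e-5
print(forward_diff(math.sin, x, h), "vs exact", math.cos(x))
print(central_diff(math.sin, x, h), "vs exact", math.cos(x))
print(second_central_diff(math.sin, x, h), "vs exact", -math.sin(x))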

Suppose that the error in evaluating $ f$ is $ \mathcal{O}(\epsilon )$. If $ f$ is not too complicated, then we can take $ \epsilon = \epsilon _{machine}$. So, in the forward difference, what we really calculate is

$\displaystyle \frac{f(x+h) - f(x) + \mathcal{O}(\epsilon )}{h} = \frac{f(x+h)-f(x)}{h} + \frac{\mathcal{O}(\epsilon )}{h} $

$\displaystyle = f'(x) + \mathcal{O}(h) + \frac{\mathcal{O}(\epsilon )}{h} $

The first term is the mathematical truncation error, while the second is the computational (rounding) error. We would like to "balance" these errors:

$\displaystyle \mathcal{O}(h) = \frac{\mathcal{O}(\epsilon )}{h} $

$\displaystyle \mathcal{O}(h^2) = \mathcal{O}(\epsilon ) $

$\displaystyle \mathcal{O}\left(\frac{h^2}{\epsilon }\right) = \mathcal{O}(1) $

So, perhaps

$\displaystyle \frac{h^2}{\epsilon } = 1 $

$\displaystyle h = \sqrt{\epsilon } $

In other words, it would be pointless to choose $ h$ to be less than $ \sqrt{\epsilon }$. So, if our evaluation error is about $ 10^{-16}$, there is no point in choosing $ h < 10^{-8}$. Once we do this balancing, we get $ \mathcal{O}(h) = \mathcal{O}(\epsilon ^\frac12)$.
Hence, we can always expect our error to be around $ \mathcal{O}(\epsilon ^\frac12)$.

For central differences, doing this balancing again, we find that we should take

$\displaystyle h = \sqrt[3]{\epsilon } = \epsilon ^\frac13. $

In this case, we get that the total error is $ \mathcal{O}(\epsilon ^\frac23)$, which is definitely smaller than the $ \mathcal{O}(\epsilon ^\frac12)$ from the forward difference. If we try to take $ h$ smaller than this, the resulting errors start to get larger!

For second central,

$\displaystyle \frac{\mathcal{O}(\epsilon )}{h^2} = \mathcal{O}(h^2) $

$\displaystyle h^4 = \mathcal{O}(\epsilon ) $

$\displaystyle h = \mathcal{O}(\epsilon ^\frac14) $

The total error will be about $ \mathcal{O}(\epsilon ^\frac12)$.
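
To see these step-size rules in action, here is a minimal Python sketch that sweeps $ h$ for each formula (again using $ \sin$ at $ x = 1$ as an assumed test case) and reports the step with the smallest observed error; the best steps typically land near $ \epsilon ^\frac12 \approx 10^{-8}$, $ \epsilon ^\frac13 \approx 6 \cdot 10^{-6}$, and $ \epsilon ^\frac14 \approx 10^{-4}$, respectively.

import math

f, x = math.sin, 1.0
d1_exact, d2_exact = math.cos(x), -math.sin(x)
eps = 2.0**-52                      # double-precision machine epsilon

def err_forward(h):
    return abs((f(x + h) - f(x)) / h - d1_exact)

def err_central(h):
    return abs((f(x + h) - f(x - h)) / (2 * h) - d1_exact)

def err_second_central(h):
    return abs((f(x - h) - 2 * f(x) + f(x + h)) / h**2 - d2_exact)

steps = [10.0**(-k / 4.0) for k in range(4, 57)]   # h from 1e-1 down to 1e-14
for name, err, predicted in [("forward", err_forward, eps**(1 / 2)),
                             ("central", err_central, eps**(1 / 3)),
                             ("second central", err_second_central, eps**(1 / 4))]:
    best = min(steps, key=err)
    print(name, ": best h =", best, ", predicted ~", predicted, ", error =", err(best))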

For the second derivative, we could also apply the central difference formula twice:

$\displaystyle f''(x) \approx \frac{f(x+2h) - 2f(x) + f(x-2h)}{4h^2}. $

If each inner first-derivative value carries an error of $ \mathcal{O}(\epsilon ^\frac23)$, the outer difference contributes an error of $ \mathcal{O}(\epsilon ^\frac23)/h$; balancing this against the $ \mathcal{O}(h^2)$ truncation error gives

$\displaystyle h = \mathcal{O}(\epsilon ^\frac29), $

and we get a total error of $ \mathcal{O}(\epsilon ^\frac49)$, which is slightly worse than the $ \mathcal{O}(\epsilon ^\frac12)$ from the direct second central difference.

Now for $ f(x), x = (x_1, \ldots, x_n) \in \mathbb{R}^n$:

$\displaystyle \nabla f(x) = \begin{bmatrix}\frac{\partial f}{\partial x_1}(x) \\ \vdots \\ \frac{\partial f}{\partial x_n}(x)\end{bmatrix} = \begin{bmatrix}\frac{f(x + e_1h) - f(x)}{h} \\ \vdots \\ \frac{f(x + e_nh) - f(x)}{h}\end{bmatrix} + \mathcal{O}(h) $

Or we could write:

$\displaystyle \nabla f(x) =\frac{1}{2h} \begin{bmatrix}f(x + e_1h) - f(x - e_1h) \\ \vdots \\ f(x + e_nh) - f(x - e_nh) \end{bmatrix} + \mathcal{O}(h^2) $
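
These gradient formulas translate directly into code. Below is a minimal Python sketch; the quadratic test function and the step sizes are assumptions made for the example.

def grad_forward(f, x, h):
    # i-th entry: (f(x + h*e_i) - f(x)) / h
    fx = f(x)
    g = [0.0] * len(x)
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        g[i] = (f(xp) - fx) / h
    return g

def grad_central(f, x, h):
    # i-th entry: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
    g = [0.0] * len(x)
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

# example: f(x) = x1^2 + 3*x1*x2 has gradient (2*x1 + 3*x2, 3*x1)
f = lambda v: v[0]**2 + 3 * v[0] * v[1]
print(grad_forward(f, [1.0, 2.0], 1e-8))    # approximately [8, 3]
print(grad_central(f, [1.0, 2.0], 1e-5))    # approximately [8, 3]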


For the Hessian,

$\displaystyle \nabla ^2 f(x) = [ \frac{\partial^2 f}{\partial x_i \partial x_j}(x) ] $

It just so happens that (from a 2d Taylor expansion):

$\displaystyle \frac{\partial^2 f(x,y)}{\partial x \partial y} = \frac{f(x+h,y+h) - f(x+h,y-h) - f(x-h,y+h) + f(x-h,y-h)}{4h^2} + \mathcal{O}(h^2) $

We already know how to do the second central approximation, so we can approximate the Hessian by filling in the appropriate formulas.
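
A minimal Python sketch of that assembly (the test function is again an assumption of the example): the diagonal entries use the second central difference, and the off-diagonal entries use the four-point mixed-partial formula above.

def hessian_fd(f, x, h):
    # approximate Hessian of f at x by finite differences
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    fx = f(x)
    for i in range(n):
        # diagonal: second central difference in coordinate i
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        H[i][i] = (f(xm) - 2 * fx + f(xp)) / h**2
        for j in range(i + 1, n):
            # off-diagonal: four-point mixed-partial formula in coordinates i, j
            xpp, xpm, xmp, xmm = list(x), list(x), list(x), list(x)
            xpp[i] += h; xpp[j] += h
            xpm[i] += h; xpm[j] -= h
            xmp[i] -= h; xmp[j] += h
            xmm[i] -= h; xmm[j] -= h
            H[i][j] = H[j][i] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h**2)
    return H

# example: f(x) = x1^2 * x2 + x2^3 has Hessian [[2*x2, 2*x1], [2*x1, 6*x2]]
f = lambda v: v[0]**2 * v[1] + v[1]**3
print(hessian_fd(f, [1.0, 2.0], 1e-4))      # approximately [[4, 2], [2, 12]]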

Wednesday, 4-6-2005:

Using the Newton convergence proof together with the Banach Lemma, one can prove a perturbation result. The Banach Lemma states: if a matrix $ A$ is invertible and a matrix $ \delta A$ satisfies $ \Vert A^{-1} \cdot \delta A \Vert < 1$, then $ A + \delta A$ is invertible and

$\displaystyle \Vert(A + \delta A)^{-1} \Vert \leq \frac{\Vert A^{-1}\Vert}{1 - \Vert A^{-1} \delta A \Vert}. $

With this, one can show the following: if a Newton step

$\displaystyle x_{k+1} = x_k - (\nabla ^2 f_k)^{-1} \nabla f_k $

is solved using $ \nabla ^2 f_k + \Delta_k$ and $ \nabla f_k + \delta_k$ in place of $ \nabla^2 f_k$ and $ \nabla f_k$, then for $ \Delta_k$ sufficiently small (in norm) and $ x_k$ sufficiently close to the local minimizer $ x^*$ at which the second-order sufficient conditions are satisfied,

$\displaystyle \Vert e_{k+1}\Vert \leq K(\Vert e_k\Vert^2 + \Vert \Delta_k \Vert\Vert e_k\Vert + \Vert\delta_k\Vert) $

for some positive constant $ K$.

Remarks:

  1. If both $ \Delta_k$ and $ \delta_k$ are zero, we just get the Newton estimate of quadratic convergence.
  2. If $ \delta_k \neq 0$ and $ \Vert\delta_k\Vert \not\rightarrow 0$, then there is no guarantee that the errors will converge! (There is no way to overcome the term $ \Vert\delta_k\Vert$.)
  3. If $ \Vert \Delta_k \Vert \neq 0$, then convergence slows from quadratic to linear, or to superlinear if $ \Vert \Delta_k\Vert \rightarrow 0$. Thus, what is affected by $ \Delta_k$ is the rate of convergence.
  4. So, an inaccurate Hessian is not as important as an inaccurate gradient.
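
The remarks above can be observed numerically. Here is a minimal Python sketch of Newton's method on the assumed test problem of minimizing $ f(x) = e^x - 2x$ (minimizer $ x^* = \ln 2$), run once with exact derivatives, once with a fixed relative perturbation $ \Delta_k$ of the second derivative, and once with a fixed absolute perturbation $ \delta_k$ of the first derivative.

import math

def newton_1d(d1, d2, x0, xstar, steps=8):
    # Newton's method for minimization: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    x, errors = x0, []
    for _ in range(steps):
        x = x - d1(x) / d2(x)
        errors.append(abs(x - xstar))
    return errors

# minimize f(x) = exp(x) - 2x, whose unique minimizer is x* = ln 2
d1 = lambda x: math.exp(x) - 2.0     # exact f'
d2 = lambda x: math.exp(x)           # exact f''
xstar = math.log(2.0)

print(newton_1d(d1, d2, 0.0, xstar))                            # exact derivatives: quadratic
print(newton_1d(d1, lambda x: 1.2 * d2(x), 0.0, xstar))         # perturbed Hessian: linear rate
print(newton_1d(lambda x: d1(x) + 1e-6, d2, 0.0, xstar))        # perturbed gradient: error stalls

With exact derivatives the error roughly squares at each step; with the perturbed second derivative it shrinks by a constant factor per step; with the perturbed first derivative it stalls at roughly the size of the perturbation.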


Brian Bockelman 2005-06-09