next up previous
Next: `O' notation and Rates Up: Fundamentals of Unconstrained Optimization Previous: Digression: Topology in

Fundamental Theorems of Nonlinear Optimization

Theorem 2.2.1   Necessary $ 1^{st}$ order conditions:
If $ f \in \mathcal{C}^1(\Omega)$, $ \Omega$ open, has a local minimizer $ x^* \in \Omega$, then $ f(x^*) = 0$.

Caution: This is necessary, but not sufficient!

Proof. Suppose $ x^*$ is a local minimizer. Then there is a neighborhood $ N_r(x^*)$ in which $ f(x^*) \leq f(x)$ for all $ x \in N_r(x^*)$. Suppose that $ \nabla f(x^*) \neq 0$.
Then, set $ p = - \nabla f(x^*) \neq 0$ so

$\displaystyle \nabla f(x^*)^T p = \nabla f(x^*)^T(-f(x^*)) = - \Vert \nabla f(x^*) \Vert^2 < 0 $

As $ \nabla f(x^*) \neq 0, \Vert \nabla f(x^*)\Vert \neq 0$. Consider the function

$\displaystyle g(t) = \nabla f(x^* + tp)^T p $

where $ p = - \nabla f(x^*)$. Then, $ g(t)$ is a continuous function of one variable $ t$ and $ g(0) = - \Vert\nabla f(x^*)\Vert < 0$. Hence, we have that $ g(t) < 0$ for all $ 0 \leq t \leq T$.
Fix $ T < \frac{r}{\Vert p\Vert}$; if we take $ q = Tp$ and apply the MVT,

$\displaystyle f(x^* + q) - f(x^*) = \nabla f(x^* + sq)^T q $

for some $ 0 < s < 1$. Notice $ sq = sTp$, and $ x^* + sq \in N_r(x^*)$.

Friday, 1-21-2005

Also,

$\displaystyle \nabla f(x^* + sq)^Tq = T (\nabla f(x^* + tp)^T p) < 0. $

Hence,

$\displaystyle f(x^* + q) < f(x^*)$

This contradicts the fact that $ x^*$ is a local minimizer.
$ \qedsymbol$

Theorem 2.2.2   $ 2^{nd}$ order Necessary Conditions:
Let $ f \in \mathcal{C}^(\Omega)$, $ \Omega$ open in $ \mathbb{R}^n$ with local minimizer at $ x^* \in \Omega$. Then, $ \nabla f(x^*) = $ and matrix $ \nabla^2 f = \displaystyle \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]$ is positive semidefinite.

Remarks: This is a necessary condition, and not sufficient. For example, take $ f(x) = x^3$. Then,

$\displaystyle \nabla f(0) = 0 $

$\displaystyle \nabla^2 f(0) = 0 $

This make $ \nabla^2 f$ positive semidefinite at $ x = 0$. However, we know $ x = 0$ to be a saddle point, and not a local minimizer.

Proof. Start of it - go by contradiction as before. Find $ N_r(x^*)$ as before. Use

$\displaystyle g(t) = p^T \nabla^2 f(x^* + tp) p $

To arrive at a contradiction, select $ p$ such that $ p^T \nabla^2 f(x^*) p$ is negative. Now, use Taylor's Theorem for multivariate functions just like we used the MVT in the last proof. $ \qedsymbol$

Theorem 2.2.3   Sufficient Condition for Local Minimizers:
Let $ f \in \mathcal{C}^2(\Omega)$, $ \Omega$ open in $ \mathbb{R}^n$. Suppose that $ x^*$ is a stationary point: $ \nabla f(x^*) = 0$. If $ \nabla^2 f(x^*)$ is positive definite, then $ x^*$ is a local minimizer.

Remark: In fact, the proof, using Taylors Theorem similar to the previous theorem, shows $ f$ has a strict local minimizer at $ x^*$, i.e., $ f(x^*) < f(x)$ for $ x$ in some neighborhood of $ x^*$.

Theorem 2.2.4   Sufficient Condition for Local Minimizer: Let $ f$ be convex, $ f \in \mathcal{C}(\Omega$, $ \Omega$ open and convex
(a)
Any local minimizer is a global minimizer
(b)
If $ f \in \mathcal{C}^1(\Omega)$, then any stationary point is a global minimizer

Proof. Let $ f$ have a local minimizer at $ x^*$. Suppose $ f$ had a smaller value at $ z$. So, $ f(z) < f(x^*)$. Let $ x = \lambda x^* + (1- \lambda)z$, for $ \lambda \in (0,1)$. Then,

$\displaystyle f(x) = f(\lambda x^* + (1-\lambda)z) \leq \lambda f(x^*) + (1 - \lambda) f(z)$

$\displaystyle \leq \lambda f(x^*) (1 - \lambda) f(x^*) = f(x^*),$

which contradicts the fact that $ x^*$ is a local minimizer. $ \qedsymbol$

Monday, 1-24-2005
Assignment: 2.6, 2.9, 2.12, 2.13

Explained the 2nd derivative test from Calc III: Consider discriminant of $ \nabla^2$:

$\displaystyle D = \frac{\partial^2 f}{\partial x_1^2} \frac{\partial^2 f}{\partial x_2^2} - \left( \frac{\partial f}{\partial x_1 \partial x_2}\right)^2. $

If $ D < 0$, have saddle at $ x^*$. If $ D > 0$, have local max / min at $ x^*$. If $ D = 0$, test fails.
Explanation: Let $ x^*$ be a stationary point. Let $ H := \nabla^2 f(x^*)$. Then $ H$ is symmetric real. So, $ H$ is diagonalizable by orthogonal $ P$. In fact, $ P = [p_1, p_2]$, where $ p_1, p_2$ are the eigenvectors of $ H$ (correspond to real eigenvalues $ \lambda_1, \lambda_2$. Thus,

$\displaystyle P^T H P = P^{-1} A P = \begin{bmatrix}\lambda_1 & 0 \\ 0 &\lambda_2 \end{bmatrix} $

So,

$\displaystyle \vert H \vert = \vert P \vert \vert A \vert \vert P^T\vert = \ver...
...\vert \vert A\vert = \vert PP^T\vert \lambda_1 \lambda_2 = \lambda_1 \lambda_2 $

If $ D > 0$, both are either positive or negative. If positive, $ H$ is symmetric positive definite. By considering $ f$ or $ -f$, we can use our sufficient conditions from above to see that there is a corresponding local min / max.

If $ D(x^*) < 0$, then $ H = \nabla^2 f(x^*)$ has a positive and a negative eig (say $ \lambda_1 > 0, \lambda_2 < 0$). Label the corresponding eigenvectors $ p_1$ and $ p_2$. Now, recall Taylor's Theorem:

$\displaystyle f(x^* + p) = f(x^*) + \frac12 p^T \nabla f(x^* + tp) p $

for $ p$ small. Take $ p = p_1$. Then,

$\displaystyle \frac12 p_1^T \nabla f(x^*) p_1 = \frac12 p_1^T \cdot \lambda_1 p_1 = \frac12 \lambda_1 \Vert p_1 \Vert^2 = \frac12 \lambda_1 > 0 $

By continuity arguments, the $ 2^{nd}$ term of Taylor's expansion is positive. Thus, $ f(x + p_1)$ resembles a local min at $ x^*$. Similarly, $ f(x + p_2)$ looks like a maximum. So, we have a saddle point.


next up previous
Next: `O' notation and Rates Up: Fundamentals of Unconstrained Optimization Previous: Digression: Topology in
Brian Bockelman 2005-06-09