The final exam is on Wednesday, May 5, from 10:00am to noon. It will cover the
material from the entire course, with a slight emphasis on the material from this sheet.

**Chapter 4:** Eigenvalues

**§ 3:** Gram-Schmidt orthogonalization

The starting point is our old formula for the projection of one vector v onto another vector w:

proj_{w}(v) = [( < w,v > )/( < w,w > )]w .

Gram-Schmidt orthogonalization consists of repeatedly using this formula to replace a
collection of vectors with ones that are orthogonal to one another, **without changing
their span**. Starting with a collection {v_{1},…,v_{n}} of vectors in V,

let w_{1} = v_{1}, then let w_{2} = v_{2}-[( < w_{1},v_{2} > )/( < w_{1},w_{1} > )]w_{1} .

Then w_{1} and w_{2} are orthogonal, and since w_{2} is a linear combination of
w_{1} = v_{1} and v_{2}, while the above equation can also be rewritten to give
v_{2} as a linear combination of w_{1} and w_{2}, the span is unchanged. Continuing,

let w_{3} =
v_{3}-[( < w_{1},v_{3} > )/( < w_{1},w_{1} > )]w_{1}-[( < w_{2},v_{3} > )/( < w_{2},w_{2} > )]w_{2} ; then since w_{1} and w_{2} are
orthogonal, it is not hard to check that w_{3} is orthogonal to **both** of them,
and using the same argument, the span is unchanged (in this case, span{w_{1},w_{2},w_{3}}
=span{w_{1},w_{2},v_{3}}=span{v_{1},v_{2},v_{3}}).

Continuing this, we let
w_{k} =
v_{k}-[( < w_{1},v_{k} > )/( < w_{1},w_{1} > )]w_{1}-…-[( < w_{k-1},v_{k} > )/( < w_{k-1},w_{k-1} > )]w_{k-1} .

Doing this all the way to n will replace v_{1},…,v_{n} with orthogonal vectors
w_{1},…,w_{n}, without changing the span.
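The procedure above can be sketched in a few lines of Python. This is only an illustration, under some assumptions that are not part of the sheet: the inner product is the ordinary dot product on R^{n}, the input vectors are linearly independent (so no w_{k} is zero), and the names `inner` and `gram_schmidt` are made up for this example. Exact fractions are used so the results match hand computation.

```python
from fractions import Fraction

def inner(u, v):
    """Ordinary dot product on R^n (an assumption; any inner product would do)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return orthogonal vectors w_1..w_n with the same span as v_1..v_n.

    Assumes the v_k are linearly independent, so <w_i, w_i> is never 0.
    """
    ws = []
    for v in vectors:
        v = [Fraction(x) for x in v]
        w = v[:]
        for u in ws:
            # subtract the projection of v_k onto each earlier w_i
            c = inner(u, v) / inner(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ws.append(w)
    return ws

ws = gram_schmidt([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
for w in ws:
    print([str(x) for x in w])
```

For this input, working the formulas by hand gives w_{2} = (1/2,-1/2,1) and w_{3} = (-2/3,2/3,2/3), which is a reasonable check on the code.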

One thing worth noting is that if two vectors are orthogonal, then any scalar
multiples of them are, too. This means that if the coordinates of one of our
w_{k} are not to our satisfaction (having an ugly denominator, perhaps),
we can scale it by a non-zero scalar to change the coordinates to something more pleasant. It is interesting to
note that in so doing, the later vectors w_{j} (for j > k) are unchanged: in later calculations our scalar
appears twice on top (once in the inner product, once in the vector being multiplied) and twice in the bottom inner product,
and so it cancels.
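To see the cancellation explicitly, here is the one-line computation, written for a non-zero scalar c (the symbol c is introduced only for this illustration). Replacing w_{k} by c w_{k} in a later projection term gives

```latex
\[
\frac{\langle c\,w_k,\; v_j\rangle}{\langle c\,w_k,\; c\,w_k\rangle}\,(c\,w_k)
  \;=\; \frac{c \cdot c}{c^{2}}\,
        \frac{\langle w_k, v_j\rangle}{\langle w_k, w_k\rangle}\, w_k
  \;=\; \frac{\langle w_k, v_j\rangle}{\langle w_k, w_k\rangle}\, w_k ,
\]
```

so the term subtracted from v_{j} is the same as before.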

We've seen that if w_{1},…,w_{n} is an **orthogonal basis** for a subspace W of V,
and w ∈ W, then
w =
[( < w_{1},w > )/( < w_{1},w_{1} > )]w_{1}+…+[( < w_{n},w > )/( < w_{n},w_{n} > )]w_{n} .

On the other hand, if v ∈ V, we can define the orthogonal projection

proj_{W}(v) = [( < w_{1},v > )/( < w_{1},w_{1} > )]w_{1}+…+[( < w_{n},v > )/( < w_{n},w_{n} > )]w_{n}

of v into W. This vector is in W, and by the Gram-Schmidt argument,
v-proj_{W}(v)
is orthogonal to all of the w_{i}, so it is orthogonal to every
linear combination of them, i.e., it is orthogonal to every vector in W. As a result:

||v-proj_{W}(v)|| ≤ ||v-w|| for **every** vector w in W. (**)
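The step from orthogonality to (**) is the Pythagorean theorem. A short sketch of the argument, for any w in W:

```latex
\[
v - w \;=\; \underbrace{\bigl(v - \mathrm{proj}_W(v)\bigr)}_{\text{orthogonal to } W}
      \;+\; \underbrace{\bigl(\mathrm{proj}_W(v) - w\bigr)}_{\text{in } W},
\]
\[
\|v - w\|^{2} \;=\; \|v - \mathrm{proj}_W(v)\|^{2} + \|\mathrm{proj}_W(v) - w\|^{2}
              \;\ge\; \|v - \mathrm{proj}_W(v)\|^{2} .
\]
```

Equality holds exactly when w = proj_{W}(v), so the projection is the unique closest vector in W to v.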

In the case that the w_{i} are not just orthogonal but also *orthonormal*,
we can simplify this somewhat:

proj_{W}(v) = < w_{1},v > w_{1}+…+ < w_{n},v > w_{n} = (w_{1}w_{1}^{T}+…+w_{n}w_{n}^{T})v =
Pv ,

where P = (w_{1}w_{1}^{T}+…+w_{n}w_{n}^{T}) is the **projection matrix** giving us
orthogonal projection onto W.

This projection matrix has three useful properties: (1) since Pv is characterized by
the property (**), which makes no mention of a basis, the matrix you get will be the same no matter what orthonormal
basis you use to build it; (2) it is symmetric (P^{T} = P); and (3) it is
idempotent, meaning P^{2} = P (this is because the orthogonal projection of a vector
already in W (e.g., Pv) is the same vector).
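The three properties can be checked on a small example. The sketch below is illustrative only: the subspace W of R^{3}, the two orthonormal bases for it, and the helper names (`outer`, `mat_add`, `mat_mul`, `transpose`) are all choices made for this example, picked so that every entry is an exact fraction.

```python
from fractions import Fraction as F

def outer(w):
    """The matrix w w^T for a column vector w."""
    return [[a * b for b in w] for a in w]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mat_mul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

# One orthonormal basis for W (assumed example).
w1, w2 = [F(3, 5), F(4, 5), F(0)], [F(0), F(0), F(1)]
P = mat_add(outer(w1), outer(w2))

# A second orthonormal basis for the same W, obtained by rotating the first.
u1 = [F(3, 5) * a + F(4, 5) * b for a, b in zip(w1, w2)]
u2 = [F(-4, 5) * a + F(3, 5) * b for a, b in zip(w1, w2)]
P_other = mat_add(outer(u1), outer(u2))

print(P == P_other)        # property (1): same matrix from either basis
print(P == transpose(P))   # property (2): symmetric
print(mat_mul(P, P) == P)  # property (3): idempotent
```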

If we think of the vectors w_{i} as the columns of a matrix A, then W = C(A), the column space of A,
and so the result (**) is talking about the least squares solution to the equation
Ax = v ! The closest vector Ax to v is then Pv, which, looking at what we did
before, means that P = A(A^{T}A)^{-1}A^{T}. This, however, makes sense even if the
columns of A are **not** orthogonal, as long as they are linearly independent (so that A^{T}A is invertible);
if we picked an orthonormal basis for the same column space and computed P from it,
we would **still** get the least squares solution, which this formula **also**
gives!
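As a sanity check on this claim, the sketch below computes P both ways for one small matrix. Everything specific here is an assumption made for illustration: the matrix A (whose columns are independent but not orthogonal), the helper names, and the 2×2 inverse routine, which only works because this A has two columns. The orthogonal vectors w1, w2 are what Gram-Schmidt gives for the columns of this A.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def mat_mul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def inv2(M):
    """Inverse of a 2x2 matrix; assumes the determinant is non-zero."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def scaled_outer(w):
    """w w^T / <w, w>: projection onto the line spanned by w."""
    n = sum(x * x for x in w)
    return [[a * b / n for b in w] for a in w]

# Columns (1,1,0) and (1,0,1): independent, but not orthogonal.
A = [[1, 1], [1, 0], [0, 1]]
At = transpose(A)
P = mat_mul(mat_mul(A, inv2(mat_mul(At, A))), At)

# Orthogonal basis for the same column space (from Gram-Schmidt by hand).
w1 = [Fraction(1), Fraction(1), Fraction(0)]
w2 = [Fraction(1, 2), Fraction(-1, 2), Fraction(1)]
P_gs = [[x + y for x, y in zip(r1, r2)]
        for r1, r2 in zip(scaled_outer(w1), scaled_outer(w2))]

print(P == P_gs)
```

Since w1 and w2 are orthogonal but not unit length, each term is divided by < w_{i},w_{i} >, which is the same as normalizing first.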

**§ 4:** Orthogonal matrices

An n×n matrix Q is called **orthogonal** if its columns form an
orthonormal basis for R^{n}. This means
< (ith column of Q),(jth column of Q) > = 1 if i = j, 0 otherwise .
This in turn means that Q^{T}Q = I, which in turn means Q^{T} = Q^{-1} !
So an orthogonal matrix is one whose inverse is equal to its own transpose.

A basic fact about an orthogonal matrix Q : for any v,w ∈ R^{n}, < Qv,Qw > = < v,w > .
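The reason is a one-line computation, writing the inner product on R^{n} as a matrix product and using Q^{T}Q = I:

```latex
\[
\langle Qv, Qw\rangle \;=\; (Qv)^{T}(Qw) \;=\; v^{T}Q^{T}Q\,w \;=\; v^{T}w \;=\; \langle v, w\rangle .
\]
```

Taking w = v shows in particular that ||Qv|| = ||v||, so multiplying by an orthogonal matrix preserves lengths as well as angles.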

A basic fact about a symmetric matrix A : if v_{1} and v_{2} are eigenvectors
for A with different eigenvalues λ_{1},λ_{2}, then v_{1} and v_{2}
are orthogonal.
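A sketch of why this holds, using only A^{T} = A and the eigenvector equations Av_{1} = λ_{1}v_{1}, Av_{2} = λ_{2}v_{2}:

```latex
\[
\lambda_1 \langle v_1, v_2\rangle
  = \langle A v_1, v_2\rangle
  = (A v_1)^{T} v_2
  = v_1^{T} A^{T} v_2
  = v_1^{T} A v_2
  = \langle v_1, A v_2\rangle
  = \lambda_2 \langle v_1, v_2\rangle ,
\]
\[
\text{so } (\lambda_1 - \lambda_2)\,\langle v_1, v_2\rangle = 0,
\quad\text{and since } \lambda_1 \neq \lambda_2,\quad \langle v_1, v_2\rangle = 0 .
\]
```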

This is a main ingredient needed to show: If A is a symmetric n×n matrix,
then A is always diagonalizable; in fact there is an orthonormal basis for
R^{n} consisting of eigenvectors of A. This means that the matrix P, with
AP = PD , whose columns are a basis of eigenvectors for A, can (when A is
symmetric) be chosen to be an **orthogonal** matrix.
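A small worked example may help; the matrix below is chosen purely for illustration.

```latex
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ 1 \end{pmatrix} = 3\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
A\begin{pmatrix} 1 \\ -1 \end{pmatrix} = 1\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
\[
P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
AP = PD, \qquad P^{T}P = I .
\]
```

The two eigenvectors are automatically orthogonal (different eigenvalues), so the only extra work needed to make P orthogonal is dividing each by its length.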

Wow, short section.

**§ 5:** Orthogonal complements

Starting with Ax = 0, this can be interpreted as saying that < (every row of A),x > =0, i.e., x is orthogonal to every row of A. This in turn implies that x is orthogonal to every linear combination of rows of A, i.e., x is orthogonal to every vector in the row space of A.

This leads us to introduce a new concept: the **orthogonal complement**
of a subspace W in a vector space V, denoted W^{⊥}, is the collection
of vectors v with v ⊥ w for **every** vector w ∈ W. It is not hard to
see that these vectors form a subspace of V; the sum of two vectors orthogonal
to w, for example, is orthogonal to w, so the sum of two vectors
in W^{⊥} is also in W^{⊥} . The same is true for scalar multiples.

Some basic facts:

For every subspace W, W ∩ W^{⊥} = {0} (since anything in both is
orthogonal to *itself*, and only the 0-vector has that property).

Any vector v ∈ V can be written, uniquely, as v = w+w^{⊥}, for w ∈ W and
w^{⊥} ∈ W^{⊥} ; w in fact is proj_{W}(v) , and w^{⊥} =
v-proj_{W}(v) will be in W^{⊥}, more or less by definition of
proj_{W}(v) . The uniqueness comes from
the result above about intersections.

Even further, a basis for W and a basis for W^{⊥} together form a basis for
V; this implies that dim(W)+dim(W^{⊥}) = dim(V) .

Finally, (W^{⊥})^{⊥} = W ; this is because W is contained in (W^{⊥})^{⊥}
(a vector in W is orthogonal to every vector that is orthogonal to things in W),
and the dimensions of the two spaces are the same.

The importance that this has to systems of equations stems from the following facts:

N(A) = R(A)^{⊥}, where N(A) is the nullspace and R(A) the row space of A (this is what we noted, actually, at the beginning
of this section!)

R(A) = N(A)^{⊥}

C(A) = N(A^{T})^{⊥}, where C(A) is the column space of A

So, for example, to compute a basis for W^{⊥}, start with a basis for W, writing
them as the columns of a matrix A, so W = C(A), then
W^{⊥} = C(A)^{⊥} = R(A^{T})^{⊥} = N(A^{T}), which we know how
to compute a basis for!
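The recipe above can be sketched as follows. This is an illustration rather than a prescribed method: the function name `null_space`, the example subspace W, and the use of exact fractions are all choices made here. The function row-reduces its input and reads off one nullspace basis vector per free column.

```python
from fractions import Fraction

def null_space(M):
    """Basis for {x : Mx = 0}, found by reducing M to reduced row echelon form."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_cols = len(rows[0])
    pivots = []          # pivot column of each non-zero row, in order
    r = 0                # index of the next pivot row
    for c in range(n_cols):
        # look for a row at or below r with a non-zero entry in column c
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue     # no pivot here: column c is free
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [x / pivot for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        x = [Fraction(0)] * n_cols
        x[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            x[pc] = -rows[row_idx][free]
        basis.append(x)
    return basis

# W = span{(1,1,0), (1,0,1)}. These are the columns of A, so they are the rows of A^T.
A_T = [[1, 1, 0], [1, 0, 1]]
perp = null_space(A_T)
print([[str(x) for x in v] for v in perp])
```

For this W the result is the single vector (-1,1,1), which is orthogonal to both basis vectors of W, and dim(W)+dim(W^{⊥}) = 2+1 = 3 as expected.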
