Chapter 4: Eigenvalues
A matrix which has an eigenvector has lots of them: if v is an eigenvector, then so are 2v, 3v, etc. On the other hand, a matrix does not have lots of eigenvalues:
If λ is an evalue for A, then (λI − A)v = 0 for some nonzero vector v. So N(λI − A) ≠ {0}, so det(λI − A) = 0. But det(tI − A) = p_A(t), thought of as a function of t, is a polynomial of degree n, so it has at most n roots. So A has at most n different eigenvalues.
p_A(t) = det(tI − A) is called the characteristic polynomial of A.
N(λI − A) = E_λ(A) is (ignoring 0) the collection of all evectors for A with evalue λ. It is called the eigenspace (or espace) for A corresponding to λ. An eigensystem for a (square) matrix A is a list of all of its evalues, along with their corresponding espaces.
One especially simple case: if A is (upper or lower) triangular, then the evalues of A are exactly the diagonal entries of A, since tI − A is also triangular, so its determinant is the product of its diagonal entries.
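As a quick numerical check (using NumPy, with a matrix chosen here purely for illustration), the eigenvalues of a triangular matrix are its diagonal entries:

```python
import numpy as np

# An upper triangular matrix, chosen just as an example.
A = np.array([[2.0, 5.0, 1.0],
              [0.0, 3.0, 4.0],
              [0.0, 0.0, 7.0]])

# Its eigenvalues should be the diagonal entries 2, 3, 7.
eigenvalues = np.linalg.eigvals(A)
print(sorted(eigenvalues.real))
```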
We call dim N(λI − A) the geometric multiplicity of λ, and the number of times λ is a root of p_A(t) (= the number of times (t − λ) is a factor) = m(λ) = the algebraic multiplicity of λ.
Some basic facts:
The number of real eigenvalues of an n×n matrix is ≤ n.
Counting multiplicity and complex roots, the number of eigenvalues is exactly n.
For every evalue λ, 1 ≤ (the geometric multiplicity of λ) ≤ m(λ).
If the matrix A is symmetric (i.e., A^{T} = A), then every eigenvalue of A is a real number (i.e., every complex root of p_{A}(t) is actually real).
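This can be illustrated numerically (the random symmetric matrix below is just a sample): the computed eigenvalues of a symmetric matrix have no imaginary part.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M + M.T                      # symmetric by construction: S^T = S

vals = np.linalg.eigvals(S)
# Every eigenvalue of a symmetric matrix is real, so the
# imaginary parts are (numerically) zero.
print(np.max(np.abs(vals.imag)))
```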
If A is an n×n matrix with n linearly independent eigenvectors v_1, …, v_n, where Av_i = λ_i v_i, then these n equations can be collected into the single matrix equation AP = PD,
where P is the matrix whose columns are our evectors, and D is the diagonal matrix whose diagonal entries are the corresponding evalues. Written slightly differently, this says A = PDP^{-1}.
We say two matrices A and B are similar if there is an invertible matrix P so that AP = PB. (Equivalently, A = PBP^{-1}, or B = P^{-1}AP.) A matrix A is diagonalizable if it is similar to a diagonal matrix.
Why do we care? It is easy to check that if A = PBP^{-1}, then A^n = PB^nP^{-1}. If B^n is easy to calculate (e.g., if B is diagonal; B^n is then also diagonal, and its diagonal entries are the powers of B's diagonal entries), this means A^n is also fairly easy to calculate!
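A quick NumPy check of this (the 2×2 matrix here is a sample with eigenvalues 5 and 2): A^10 computed by repeated multiplication agrees with P D^10 P^{-1}.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # a sample diagonalizable matrix

evals, P = np.linalg.eig(A)       # columns of P are eigenvectors
# A^10 two ways: repeated multiplication, and P D^10 P^{-1},
# where D^10 is diagonal with entries evals**10.
A10_direct = np.linalg.matrix_power(A, 10)
A10_diag = P @ np.diag(evals**10) @ np.linalg.inv(P)
print(np.allclose(A10_direct, A10_diag))  # True
```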
Also, if A and B are similar, then they have the same characteristic polynomial, so they have the same eigenvalues. They do, however, have different eigenvectors; in fact, if AP = PB and Bv = λv, then A(Pv) = λ(Pv), i.e., the evectors of A are P times the evectors of B.
These facts in turn tell us when a matrix can be diagonalized. Since for a diagonal matrix D, each of the standard basis vectors e_i is an evector, R^n has a basis consisting of evectors for D. If A is similar to D, via P, then each Pe_i = ith column of P is an evector. But since P is invertible, its columns form a basis for R^n, as well. So there is a basis consisting of evectors of A. On the other hand, such a basis guarantees that A is diagonalizable (just run the above argument in reverse...), so we find that:
(The Diagonalization Theorem) An n×n matrix A is diagonalizable if and only if there is a basis of R^n consisting of eigenvectors of A.
And one way to guarantee that such a basis exists: If A is n×n and has n distinct eigenvalues, then choosing an evector for each will always yield a linearly independent collection of vectors (so, since there are n of them, you get a basis for R^n). So:
If A is n×n and has n distinct (real) eigenvalues, A is diagonalizable. In fact, the dimensions of all of the eigenspaces for A (for real eigenvalues l) add up to n if and only if A is diagonalizable.
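One way to test this numerically is to compute geometric multiplicities as nullities, i.e., n minus a numerical rank. A minimal sketch (the matrices and the tolerance are illustrative choices, not part of the notes):

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-8):
    """dim N(lam*I - A), computed as n minus the numerical rank."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=tol)

# [[1,1],[0,1]] has eigenvalue 1 with algebraic multiplicity 2
# but geometric multiplicity 1, so it is NOT diagonalizable.
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(geometric_multiplicity(J, 1.0))   # 1, but m(1) = 2

# The identity has eigenvalue 1 with geometric multiplicity 2,
# so its eigenspace dimensions add up to n = 2: diagonalizable.
I2 = np.eye(2)
print(geometric_multiplicity(I2, 1.0))  # 2
```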
The main question we wish to study is: what happens to x_n = A^n x_0 as n gets larger and larger? It turns out that this question has a fairly straightforward answer when A is diagonalizable. The answer depends upon the value of the spectral radius of A, r(A), which is defined to be max{|λ_i|}, where λ_i ranges over all of the evalues of A. In essence, it is the size of the `largest' eigenvalue of A. Then we have:
If A is diagonalizable, and x_{0} is an initial state, then
If r(A) < 1, then A^n x_0 goes to 0 as n goes to ∞.
If r(A) = 1, then for some N, ‖A^n x_0‖ ≤ N for all n.
If r(A) = 1, A has evalue 1, and every other evalue has absolute value less than 1, then A^n x_0 has a limit x_∞ as n → ∞, and either Ax_∞ = 0 or Ax_∞ = x_∞. (Usually, it equals x_∞.)
If r(A) > 1, then for nearly every x_0, ‖A^n x_0‖ goes to ∞ as n goes to ∞.
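The first and last cases are easy to see numerically. A sketch (the matrices, chosen triangular so their eigenvalues are visible on the diagonal, are illustrative):

```python
import numpy as np

x0 = np.array([1.0, 1.0])

# r(A) < 1: the iterates shrink toward 0.
A_small = np.array([[0.5, 0.1],
                    [0.0, 0.3]])      # eigenvalues 0.5 and 0.3
x = x0.copy()
for _ in range(50):
    x = A_small @ x
print(np.linalg.norm(x))              # essentially 0

# r(A) > 1: the iterates blow up for nearly every starting vector.
A_big = np.array([[1.5, 0.0],
                  [0.0, 0.2]])        # eigenvalues 1.5 and 0.2
y = x0.copy()
for _ in range(50):
    y = A_big @ y
print(np.linalg.norm(y))              # roughly 1.5**50, enormous
```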
A matrix A is called defective if for some evalue λ, dim N(λI − A) < m(λ). It is fairly easy to show that a matrix is defective if and only if it is not diagonalizable (since the sum of the dimensions of the espaces will then be less than n).
What do we do if A isn't diagonalizable? Some of the statements (when r(A) = 1) fail to be true. But it turns out that the other two statements are still true. This can be shown using Jordan normal forms.
The idea is that being diagonalizable says that A is similar to a very simple matrix. It turns out that every matrix is similar to a `kind of' simple matrix. A Jordan block J_λ(k) is a k×k matrix most of whose entries are 0, except that along the diagonal the entries are equal to λ, and just above the diagonal they are 1.
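A Jordan block is easy to build with NumPy (the function name and the sample values are illustrative only):

```python
import numpy as np

def jordan_block(lam, k):
    """The k x k Jordan block J_lam(k): lam on the diagonal,
    1 on the superdiagonal, 0 elsewhere."""
    return lam * np.eye(k) + np.diag(np.ones(k - 1), 1)

J = jordan_block(3.0, 4)
print(J)
# Its only eigenvalue is 3, with geometric multiplicity 1:
# rank(3I - J) = 3, so the nullity is 4 - 3 = 1.
print(4 - np.linalg.matrix_rank(3.0 * np.eye(4) - J))
```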
Every matrix is similar to a block diagonal matrix, i.e., a matrix whose entries are all 0 outside of a collection of square blocks whose diagonals sit on the main diagonal of A. Each block is a Jordan block, with possibly different l's. This matrix is the Jordan normal form for A. It is unique, up to reordering the blocks on the diagonal.
We can still talk about the spectral radius r(A) of a matrix, even if it isn't diagonalizable. With Jordan normal forms, it is possible to show that the first and last assertions of our theorem hold true, for every matrix A.
Chapter 5: Norms and inner products (again)
The idea of this section is that our familiar notion of length satisfies some fairly natural properties. What we will now do is assert that any function satisfying those properties is something that we can reasonably call a notion of length, or a norm.
A norm on a vector space V is a function ‖·‖ : V → R which satisfies:
(1) for every v in V, ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0
(2) for every v in V and c in R, ‖cv‖ = |c|·‖v‖
(3) for every v and w in V, ‖v + w‖ ≤ ‖v‖ + ‖w‖ (Triangle Inequality)
The pair (V, ‖·‖) is called a normed linear space.
For example, on R^n there are lots of different norms: for every p ≥ 1, the function
‖v‖_p = (|v_1|^p + … + |v_n|^p)^{1/p}
is a norm, called the p-norm. There is a similar norm for `p = ∞':
‖v‖_∞ = max{|v_1|, …, |v_n|}
Also, for C[a,b] = the continuous functions from [a,b] to R,
‖f‖ = max{|f(x)| : x in [a,b]}
is a norm. For many of these, especially the p-norms, proving the triangle inequality takes some work!
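The p-norm formula is easy to check against NumPy's built-in norm (the sample vector is arbitrary):

```python
import numpy as np

v = np.array([3.0, -4.0, 12.0])

def p_norm(v, p):
    """The p-norm: (|v_1|^p + ... + |v_n|^p)^(1/p)."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(p_norm(v, 1))        # 19.0 = 3 + 4 + 12
print(p_norm(v, 2))        # 13.0, since 9 + 16 + 144 = 169
print(np.max(np.abs(v)))   # 12.0, the infinity-norm
# Agrees with numpy's implementation:
print(np.isclose(p_norm(v, 2), np.linalg.norm(v, 2)))  # True
```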
With a norm we can talk about convergence: v_n → v as n → ∞ means (as with the usual norm) that ‖v_n − v‖ → 0 as n → ∞.
We can also talk about the ball of radius r around a vector v; it is all of the vectors w with ‖w − v‖ < r.
An inner product on a vector space V is a function ⟨·,·⟩ which takes pairs of vectors and hands you a number, and which satisfies:
(1) for every v in V, ⟨v,v⟩ ≥ 0, and ⟨v,v⟩ = 0 if and only if v = 0
(2) for every v and w in V, ⟨v,w⟩ = ⟨w,v⟩
(3) for every v and w in V, and c in R, ⟨cv,w⟩ = c⟨v,w⟩
(4) for every u, v, and w in V, ⟨u+v,w⟩ = ⟨u,w⟩ + ⟨v,w⟩
The pair (V, ⟨·,·⟩) is called an inner product space.
Again, it turns out that there are lots of inner products on R^n, besides the usual one. For example, on R^2, ⟨v,w⟩ = 2v_1w_1 + 5v_2w_2 is an inner product; you can check that the four properties hold. More generally, for any invertible n×n matrix A, the function
⟨v,w⟩_A = (Av)·(Aw)
is an inner product on R^n. On C[a,b],
⟨f,g⟩ = ∫_a^b f(x)g(x) dx
is an inner product.
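The four properties can be checked numerically for the weighted inner product ⟨v,w⟩ = 2v_1w_1 + 5v_2w_2 on R^2 (the test vectors below are arbitrary samples, which illustrates but of course does not prove the properties):

```python
import numpy as np

def ip(v, w):
    """The weighted inner product <v,w> = 2*v1*w1 + 5*v2*w2 on R^2."""
    return 2 * v[0] * w[0] + 5 * v[1] * w[1]

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
w = np.array([0.5, 4.0])

print(ip(v, v) >= 0)                                   # property (1)
print(np.isclose(ip(v, w), ip(w, v)))                  # property (2)
print(np.isclose(ip(3 * v, w), 3 * ip(v, w)))          # property (3)
print(np.isclose(ip(u + v, w), ip(u, w) + ip(v, w)))   # property (4)
```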
It turns out that every inner product on V can be used to define a norm on V, by doing what we know is true for the usual norm and inner product:
Define ‖v‖ = (⟨v,v⟩)^{1/2}. Property (1) for an inner product implies that property (1) for a norm holds; properties (2) and (3) for an inner product imply that property (2) for a norm holds; and finally, property (3) for this norm holds because
|⟨v,w⟩| ≤ ‖v‖·‖w‖
This is our (old) Schwarz inequality; but a look at the reasons why this was true for the ordinary inner product will convince you that all we needed to know were properties (1)-(4) of the inner product. So our argument there carries over to this more general setting without any change!
So every inner product can be used to define a norm. But not every norm comes from an inner product! There are several identities (for example, the parallelogram law ‖u+v‖^2 + ‖u−v‖^2 = 2‖u‖^2 + 2‖v‖^2) which one can show always hold if your norm comes from an inner product. By evaluating both sides using specific vectors, however, one can show that such equalities fail for some norms, showing that those norms do not come from inner products!
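For instance, the parallelogram law holds for the 2-norm (which comes from the usual inner product) but fails for the 1-norm, as this sketch shows with the standard basis vectors of R^2:

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

def parallelogram_gap(norm, u, v):
    """||u+v||^2 + ||u-v||^2 - (2||u||^2 + 2||v||^2); zero iff the law holds
    for this pair of vectors."""
    return norm(u + v)**2 + norm(u - v)**2 - 2*norm(u)**2 - 2*norm(v)**2

two_norm = lambda x: np.linalg.norm(x, 2)
one_norm = lambda x: np.linalg.norm(x, 1)

print(parallelogram_gap(two_norm, u, v))  # 0.0: the 2-norm satisfies the law
print(parallelogram_gap(one_norm, u, v))  # 4.0: the 1-norm violates it,
                                          # so it comes from no inner product
```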
Just as with the ordinary inner product, we say that two vectors v and w are orthogonal if ⟨v,w⟩ = 0.
If the vectors v_1, …, v_n are all nonzero and all orthogonal to one another, and v is in the span of the v_i's, then it is easy to show that
v = (⟨v_1,v⟩/⟨v_1,v_1⟩)v_1 + … + (⟨v_n,v⟩/⟨v_n,v_n⟩)v_n
In fact, this is the only way to write v as a linear combination of the v_i's, implying that the v_i's are linearly independent!
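The coefficient formula is easy to try out with the usual dot product on R^3 (the orthogonal set and the vector v below are samples chosen for illustration):

```python
import numpy as np

# An orthogonal (not orthonormal) set in R^3.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v3 = np.array([0.0, 0.0, 2.0])
basis = [v1, v2, v3]

v = np.array([3.0, 5.0, 7.0])   # a vector in their span (all of R^3 here)

# Coefficients c_i = <v_i, v> / <v_i, v_i> from the formula above.
coeffs = [np.dot(b, v) / np.dot(b, b) for b in basis]
reconstructed = sum(c * b for c, b in zip(coeffs, basis))
print(coeffs)                          # [4.0, -1.0, 3.5]
print(np.allclose(reconstructed, v))   # True: the formula recovers v
```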