(Technically, everything covered on the first exam, plus)
Chapter 13: Differentiation
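We write $\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2}{\partial x^2}(f) = \frac{\partial^2 f}{\partial x^2} = f_{xx} = (f_x)_x$, and similarly for y, and
$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2}{\partial y\,\partial x}(f) = (f_x)_y = f_{xy}$, and similarly for $\frac{\partial^2}{\partial x\,\partial y}$ (these are called the mixed partial derivatives).
This leads to the slightly confusing convention that $\frac{\partial^2 f}{\partial x\,\partial y} = f_{yx}$ while $\frac{\partial^2 f}{\partial y\,\partial x} = f_{xy}$, but as luck would have it: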
Fact: If $f_{xy}$ and $f_{yx}$ are both continuous, then they are equal. (Mixed partials are equal.) So while at first glance a function of two variables would seem to have four second partials, it `really' has only three. (Similarly, a function of three variables `really' has six second partials, and not nine.)
In one-variable calculus, the second derivative measures concavity, or the rate at which the graph of f bends. The second partials $f_{xx}$ and $f_{yy}$ measure the bending of the graph of f in the x- and y-directions, while $f_{xy}$ measures the rate at which the x-slope of f changes as you move in the y-direction, i.e., the amount that the graph is twisting as you walk in the y-direction. The statement that $f_{xy} = f_{yx}$ then says that the amount of twisting in the y-direction is always the same as the amount of twisting in the x-direction, at any point, which is by no means obvious!
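As a quick numerical sanity check of the Fact above, here is a minimal sketch. The function f(x,y) = x^3 y^2 + sin(xy) and the test point are made up for illustration; the first partials were worked out by hand, and each one is then differentiated numerically in the other variable.

```python
import math

# Hypothetical example function: f(x, y) = x^3 y^2 + sin(xy).
# Its first partials, worked out by hand:
def f_x(x, y):
    return 3 * x**2 * y**2 + y * math.cos(x * y)

def f_y(x, y):
    return 2 * x**3 * y + x * math.cos(x * y)

def central_difference(g, t, h=1e-5):
    """Approximate g'(t) for a one-variable function g."""
    return (g(t + h) - g(t - h)) / (2 * h)

def mixed_partials(x, y):
    # f_xy: differentiate f_x with respect to y (x held fixed).
    f_xy = central_difference(lambda t: f_x(x, t), y)
    # f_yx: differentiate f_y with respect to x (y held fixed).
    f_yx = central_difference(lambda t: f_y(t, y), x)
    return f_xy, f_yx

f_xy, f_yx = mixed_partials(0.7, -1.2)
print(f_xy, f_yx)  # the two values should agree to several decimal places
```

The two numbers come from genuinely different computations (one starts from f_x, the other from f_y), so their agreement is the content of the Fact, not an accident of the code.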
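Recall that in one-variable calculus, the degree n Taylor polynomial of f at x = a is
$p_n(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n$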
Functions of two variables are not much different; we just replace the word `derivative' with `partial derivative'! So for example, the degree one Taylor polynomial is
$L(x,y) = f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b)$
which is nothing more than our old formula for the tangent plane to the graph of f at the point (a,b,f(a,b)).
We will soon need the second degree version (which for simplicity we will write for the point (a,b) = (0,0)):
$Q(x,y) = L(x,y) + \frac{f_{xx}(0,0)}{2}x^2 + f_{xy}(0,0)\,xy + \frac{f_{yy}(0,0)}{2}y^2 = L(x,y) + Ax^2 + Bxy + Cy^2$
As before, L and Q are the `best' linear and quadratic approximations to f, near the point (a,b), in a sense that can be made precise; basically, L-f shrinks to 0 like a quadratic, near (a,b), while Q-f shrinks like a cubic (which shrinks to 0 faster, when your input is small).
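To see the `quadratic versus cubic' shrinking in action, here is a small sketch. The function f(x,y) = e^x cos(y) and the sample points (t, 2t) are assumptions chosen for illustration; the partials at (0,0) were computed by hand.

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)

# Partials at (0,0), computed by hand: f = 1, f_x = 1, f_y = 0,
# f_xx = 1, f_xy = 0, f_yy = -1.
def L(x, y):
    return 1 + x

def Q(x, y):
    return L(x, y) + 0.5 * x**2 - 0.5 * y**2

def errors(t):
    """Errors of L and Q at the point (t, 2t), a distance ~t from (0,0)."""
    x, y = t, 2 * t
    return abs(L(x, y) - f(x, y)), abs(Q(x, y) - f(x, y))

err_L_big, err_Q_big = errors(0.1)
err_L_small, err_Q_small = errors(0.05)

# Halving the distance should cut a quadratic error by about 4
# and a cubic error by about 8.
ratio_L = err_L_big / err_L_small
ratio_Q = err_Q_big / err_Q_small
print(ratio_L, ratio_Q)
```

The first ratio should come out near 4 and the second near 8, which is what `shrinks like a quadratic' and `shrinks like a cubic' mean in practice.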
A function of several variables is differentiable at a point if the tangent plane to the graph of f at that point makes a good approximation to the function, near the point of tangency. In the words of the previous paragraph, L-f shrinks to 0 faster than a linear function would.
The basic fact, that we keep using, is that if the partial derivatives of f don't just exist at a point, but are also continuous near the point, then f is differentiable in this more precise sense.
Chapter 14: Optimization: Local and Global Extrema
The basic idea is that at a max or min for f, if we think of f just as a function of x, we would still think we were at a max or min, so the derivative, as a function of x, will be 0 (if it is defined). In other words, $f_x = 0$; similarly, we would find that $f_y = 0$ as well. Following one-variable theory, therefore, we say that
A point (a,b) is a critical point for the function f if $f_x(a,b)$ and $f_y(a,b)$ are each either 0 or undefined. (A similar notion would hold for functions of more than two variables.)
Just as with the one-variable theory, then, if we wish to find the max or min of a function, what we first do is find the critical points; if the function has a max or min, it will occur at a critical point.
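For instance, here is how finding critical points plays out for a made-up example, f(x,y) = x^3 - 3x + y^2 (chosen only because the equations are easy to solve):

```latex
\begin{align*}
f(x,y) &= x^3 - 3x + y^2 \\
f_x &= 3x^2 - 3 = 0 \quad\Longrightarrow\quad x = \pm 1 \\
f_y &= 2y = 0 \quad\Longrightarrow\quad y = 0 \\
&\text{critical points: } (1,0) \text{ and } (-1,0)
\end{align*}
```

Both partials are defined everywhere here, so these two points are the only critical points.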
And just as before, we have a `Second Derivative Test' for figuring out the difference between a (local) max and a (local) min (or neither, which we will call a saddle point). The point is that at a critical point, f looks like its second degree Taylor polynomial, which (simplifying things somewhat) is described as $Q(x,y) = Ax^2 + Bxy + Cy^2$ (since the first derivatives are 0). The actual shape of the graph of Q is basically described by one number, called the discriminant, which (in terms of partial derivatives) is given by
$D = f_{xx}f_{yy} - (f_{xy})^2$
(Basically, Q looks like one of $x^2+y^2$ (local min), $-x^2-y^2$ (local max), or $x^2-y^2$ (saddle), and D tells you if the signs are the same (D > 0) or opposite (D < 0).) More specifically, if, at a critical point (a,b),
D > 0 and $f_{xx}$ > 0, then (a,b) is a local min; if
D > 0 and $f_{xx}$ < 0, then (a,b) is a local max; and if
D < 0, then (a,b) is a saddle point.
(We get no information if D = 0.)
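The test above is mechanical enough to write down as a few lines of code. This is only a sketch: the function name `classify_critical_point` is made up, and the example f(x,y) = x^3 - 3x + y^2 (with critical points (1,0) and (-1,0), and second partials f_xx = 6x, f_yy = 2, f_xy = 0 worked out by hand) is an assumption for illustration.

```python
def classify_critical_point(f_xx, f_yy, f_xy):
    """Apply the Second Derivative Test, given the second partials at a critical point."""
    D = f_xx * f_yy - f_xy**2
    if D > 0 and f_xx > 0:
        return "local min"
    if D > 0 and f_xx < 0:
        return "local max"
    if D < 0:
        return "saddle point"
    return "no information"  # D == 0

# Example: f(x, y) = x^3 - 3x + y^2 has f_xx = 6x, f_yy = 2, f_xy = 0,
# and critical points (1, 0) and (-1, 0).
for (a, b) in [(1, 0), (-1, 0)]:
    print((a, b), classify_critical_point(6 * a, 2, 0))
```

Note that the function only sees numbers: you still have to find the critical point and evaluate the second partials there yourself.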
To find the global max and min of a function of two variables, we do (essentially) exactly the same thing as in one-variable calculus:
(1) Identify the domain
(2) Find critical points in the interior of the domain
(3) Identify the (potential) max and min values on the boundary of the domain (more about this later!)
(4) Plug the critical points, and your potential points on the boundary, into f
(5) The biggest value is the max; the smallest is the min
This works if the domain is closed and bounded (think, e.g., of a closed interval in the x direction and a closed interval in the y direction, or the inside of a circle in the plane together with the circle itself). Usually, in practice, we don't have such nice domains; but we usually know from physical considerations that our function has a max or min (e.g., find the maximum volume you can enclose in a box made from 300 square inches of cardboard...), and so we still know that it has to occur at a critical point of our function.
Finding critical points involves solving two (or more) equations simultaneously. This can be very difficult; a different approach, gradient search, uses the idea of `walking to' the maximum (or minimum) as a way of approximating local extrema. The basic idea is to start at a point, and walk in the direction in which the function goes up the fastest, i.e., in the direction of the gradient at that point. Symbolically, if we start with an initial `guess' of $(x_0,y_0)$ for a max of a function f, the idea is to look at the values of f as we walk in the direction of $\nabla f(x_0,y_0)$, i.e., look at the function of one variable
$g(t) = f\big((x_0,y_0) + t\,\nabla f(x_0,y_0)\big)$
At t = 0, g has positive derivative (what is it?), and so for a while g increases; we can determine when it will stop increasing by finding its (first positive) critical point. At this point we can no longer guarantee that f will continue to increase if we keep going, so instead we stop at this point $(x_1,y_1)$, take stock, and pick a new direction to go to make f increase, namely, the direction of $\nabla f(x_1,y_1)$. Then we look at
$g(t) = f\big((x_1,y_1) + t\,\nabla f(x_1,y_1)\big)$
We then follow along this function until it stops going up, take stock again, and head off in a new direction again. The idea is that if we keep going up, and our function has a max, then eventually this procedure will land us in the vicinity of that max. This isn't quite true: if the sequence of points we find ourselves at converges, it's probably converging to a local max, but maybe not the global one. But this is a very straightforward procedure, easy to implement on a computer, and it can do a good job of finding candidates for maximums. By starting the process at lots of different points, we can collect a lot of candidates for maxes, increasing our chances of finding (an approximation to) the real global max.
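Here is a rough sketch of how gradient search might look on a computer. Everything in it is an assumption made for illustration: the gradient is estimated by finite differences, the `first positive critical point' of g is found crudely by marching along t in small steps until f stops increasing, and the test function is a made-up one whose max is at (1, -0.5).

```python
import math

def grad(f, x, y, h=1e-6):
    """Estimate the gradient of f at (x, y) by central differences."""
    return ((f(x + h, y) - f(x - h, y)) / (2 * h),
            (f(x, y + h) - f(x, y - h)) / (2 * h))

def line_search(f, x, y, gx, gy, dt=1e-3, max_steps=100000):
    """Walk along g(t) = f(x + t*gx, y + t*gy) until g stops increasing."""
    t = 0.0
    best = f(x, y)
    for _ in range(max_steps):
        candidate = f(x + (t + dt) * gx, y + (t + dt) * gy)
        if candidate <= best:
            break
        t += dt
        best = candidate
    return t

def gradient_search(f, x, y, tol=1e-4, max_iters=200):
    """Repeatedly walk in the gradient direction until the gradient is tiny."""
    for _ in range(max_iters):
        gx, gy = grad(f, x, y)
        if math.hypot(gx, gy) < tol:
            break
        t = line_search(f, x, y, gx, gy)
        if t == 0.0:
            break  # no uphill progress possible at this step size
        x, y = x + t * gx, y + t * gy
    return x, y

# Made-up test function with a single max at (1, -0.5).
def example(x, y):
    return 3 - (x - 1)**2 - 2 * (y + 0.5)**2

x_max, y_max = gradient_search(example, 0.0, 0.0)
print(x_max, y_max)
```

For a function with several local maxes, calling `gradient_search` from many different starting points gives the list of candidates described above.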
The basic idea is that if we think of our constraint as describing a level curve (or surface) of a function g, then we are trying to maximize or minimize f among all the points of the level curve. If the level curves of f are cutting across our level curve of g, it's easy to see that we can increase or decrease f while still staying on the level curve of g. So at a max or min, the level curve of f has to be tangent to our constraining level curve of g. This in turn means:
At a max or min of f subject to the constraint g, $\nabla f = \lambda \nabla g$ (for some real number $\lambda$)
We must also satisfy the constraint: g(x,y) = c.
So to solve a constrained optimization problem (max/min of f subject to the constraint g(x,y) = c) we solve
$\nabla f = \lambda \nabla g$ and g(x,y) = c
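As a small worked example of solving these equations (the problem is made up for illustration): maximize f(x,y) = xy subject to x + y = 10, so g(x,y) = x + y and c = 10.

```latex
\begin{align*}
\nabla f = (y,\, x), \qquad \nabla g &= (1,\, 1) \\
\nabla f = \lambda \nabla g \quad&\Longrightarrow\quad y = \lambda, \; x = \lambda \\
g(x,y) = x + y = 10 \quad&\Longrightarrow\quad 2\lambda = 10, \; \lambda = 5 \\
(x,y) = (5,5), \qquad f(5,5) &= 25
\end{align*}
```

Comparing with another point on the constraint, such as f(0,10) = 0, suggests that (5,5) is the max rather than a min.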
This in turn allows us to finish our procedure for finding global extrema, since step (3) can be interpreted as a constrained optimization problem (max or min on the boundary). In these terms,
To optimize f subject to the condition $g(x,y) \le c$, we (1) solve $\nabla f = 0$ and g(x,y) < c, (2) solve $\nabla f = \lambda \nabla g$ and g(x,y) = c, (3) plug all of these points into f, (4) the biggest is the max, the smallest is the min.
[This works fine, unless the region $g(x,y) \le c$ runs off to infinity; but often, physical considerations will still tell us that one of our critical points is an optimum.]
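The four steps can be checked on a concrete (made-up) problem: optimize f(x,y) = x^2 + 2y^2 - x on the closed unit disk x^2 + y^2 <= 1. In the sketch below the candidate points were solved for by hand, as noted in the comments, and a brute-force sample of the disk is used only as a sanity check.

```python
import math

def f(x, y):
    return x**2 + 2 * y**2 - x

# Step (1): interior critical points. Solving f_x = 2x - 1 = 0 and f_y = 4y = 0
# by hand gives (1/2, 0), which satisfies x^2 + y^2 < 1.
interior = [(0.5, 0.0)]

# Step (2): boundary candidates from grad f = lambda * grad g with g = x^2 + y^2 = 1,
# i.e. 2x - 1 = 2*lambda*x and 4y = 2*lambda*y, solved by hand.
s = math.sqrt(3) / 2
boundary = [(1.0, 0.0), (-1.0, 0.0), (-0.5, s), (-0.5, -s)]

# Steps (3) and (4): plug everything in and compare.
candidates = interior + boundary
values = [f(x, y) for x, y in candidates]
global_max = max(values)
global_min = min(values)

# Sanity check by brute force: sample f on a polar grid covering the disk.
samples = [
    f(r / 200 * math.cos(2 * math.pi * k / 720),
      r / 200 * math.sin(2 * math.pi * k / 720))
    for r in range(201)
    for k in range(720)
]
print(global_max, max(samples))
print(global_min, min(samples))
```

The candidate values give a max of 9/4 (at the two boundary points with x = -1/2) and a min of -1/4 (at the interior critical point), and no sampled point should beat either one.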
Chapter 15: Integrating Functions of Several Variables
For functions of two variables, we do the exact same thing. To integrate a function f over a rectangle in the plane, we cut the rectangle into lots of tiny rectangles, with side lengths $\Delta x_i$ and $\Delta y_j$, pick a point in each rectangle, and then add up $f(x_i,y_j)\,\Delta x_i\,\Delta y_j$. This gives an approximation to the actual integral; letting the little side lengths go to zero, we arrive at what we would call the integral of f over the rectangle R, which we denote by
$\int_R f\,dA$ (where dA denotes the `differential of area' dx dy (or dy dx))
The idea is that if we think of f as measuring height above the rectangle, then $f(x_i,y_j)\,\Delta x_i\,\Delta y_j$ is the volume of a thin rectangular box; letting the $\Delta$'s go to zero, the integral would then measure the volume under the graph of f, lying over the rectangle R.
If the region R isn't a rectangle, we can still use this method of defining an integral; we simply cover R with tiny rectangles, take the same sum, and let the $\Delta$'s go to 0.
Of course, we have no reason to believe that as the $\Delta$'s go to 0, this collection of sums will converge to a single number. But it is a basic fact that if the function f is continuous, and the region R isn't too ugly, then these sums always will converge.
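The sum described above is easy to try out. This sketch makes some choices that are not in the notes: the sample point in each little rectangle is its midpoint, the grid is n-by-n with equal side lengths, and the test function f(x,y) = x y^2 on [0,1] x [0,2] is made up (its integral, computed by hand, is 4/3).

```python
def riemann_sum(f, a, b, c, d, n):
    """Midpoint Riemann sum of f over [a,b] x [c,d] using an n-by-n grid."""
    dx = (b - a) / n
    dy = (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx
        for j in range(n):
            y = c + (j + 0.5) * dy
            total += f(x, y) * dx * dy  # volume of one thin box
    return total

def f(x, y):
    return x * y**2

for n in (4, 16, 64, 256):
    print(n, riemann_sum(f, 0, 1, 0, 2, n))
```

As n grows the printed values should settle down toward 4/3, which is the convergence the basic fact promises for a continuous f.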
The idea is that we already know how to compute volumes, and so we implicitly know how to compute double integrals! We can compute the volume of a region by integrating the area of a slice. You can do this two ways; (thinking in terms of the region R in the plane) you can slice R into horizontal lines, and integrate the area of the slices dy, or you can slice R into vertical lines, and integrate the slices dx.
But each slice can be interpreted as an integral; the area of a horizontal slice is the integral of f, thought of as just a function of x, and the area of a vertical slice is the integral of f, thought of as just a function of y. This leads to two ways to compute our integral:
$\int_R f\,dA = \int_c^d\left(\int_a^b f(x,y)\,dx\right)dy$ (for horizontal slices) $= \int_a^b\left(\int_c^d f(x,y)\,dy\right)dx$ (for vertical slices)
In each case, the inner integral is thought of as the integral of a function of one variable. It just happens to be a different variable in each case. In the case of a rectangle, the limits of integration are just numbers, as we have written it. In the case of a more complicated region R, the inner limits of integration might depend on where we cut. The idea is that a slice along a horizontal line is a slice along y = constant, and the endpoints of the integral might depend on y; for a slice along a vertical line (x = constant), the endpoints might depend on x .
So, e.g., to integrate a function f over the region lying between the graphs of y = 4x and $y = x^3$ for $0 \le x \le 2$, we would compute either
$\int_0^2\left(\int_{x^3}^{4x} f(x,y)\,dy\right)dx$ or $\int_0^8\left(\int_{y/4}^{y^{1/3}} f(x,y)\,dx\right)dy$
Which should we compute? Whichever one is easier! They give the same number!
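To see that the two orders really do give the same number, here is a sketch that computes both numerically for the region between y = 4x and y = x^3 with 0 <= x <= 2. The integrand f(x,y) = xy is made up for illustration (by hand, either order gives 16), and the one-variable integrals are approximated with a simple midpoint rule.

```python
def integrate(g, lo, hi, n=400):
    """Midpoint-rule approximation of the one-variable integral of g from lo to hi."""
    h = (hi - lo) / n
    return sum(g(lo + (k + 0.5) * h) for k in range(n)) * h

def f(x, y):
    return x * y

# Vertical slices: for each x in [0, 2], y runs from x^3 up to 4x.
dy_then_dx = integrate(
    lambda x: integrate(lambda y: f(x, y), x**3, 4 * x), 0, 2)

# Horizontal slices: for each y in [0, 8], x runs from y/4 up to y^(1/3).
dx_then_dy = integrate(
    lambda y: integrate(lambda x: f(x, y), y / 4, y ** (1 / 3)), 0, 8)

print(dy_then_dx, dx_then_dy)
```

Both printed values should be close to 16. Notice that in each case the inner limits depend on the outer variable, exactly as described for non-rectangular regions.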