Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
Stochastic Processes and
Advanced Mathematical Finance
Models of Stock Market Prices
Mathematically Mature: may contain mathematics beyond calculus with proofs.
What would be some desirable characteristics for a stochastic process model of a security price?
The log-return $\rho_t = \log(s_t/s_{t-1})$ is another measure of variation on the time scale of the sequence of prices.
Let $s_t$ be a sequence of prices on a stock or a portfolio of stocks, measured at regular intervals, say day to day. A natural question is how best to measure the variation of the prices on the time scale of the regular measurement. A first natural definition of variation is the proportional return $r_t$ at time $t$:
\[ r_t = \frac{s_t - s_{t-1}}{s_{t-1}}. \]
The proportional return is usually just called the return, and often it is expressed as a percentage. A beneﬁt of using returns versus prices is normalization: measuring all variables in a comparable metric, essentially percentage variation. Using proportional returns allows consistent comparison among two or more securities even though their price sequences may diﬀer by orders of magnitude. Having comparable variations is a requirement for many multidimensional statistical analyses. For example, interpreting a covariance is meaningful when the variables are measured in percentages.
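Define the log-return
\[ \rho_t = \log\left(\frac{s_t}{s_{t-1}}\right) \]
as another measure of variation on the time scale of the sequence of prices. For small returns, the difference between returns and log-returns is small. Notice that
\[ 1 + r_t = \frac{s_t}{s_{t-1}} = \exp(\rho_t). \]
More generally, a statistic calculated from a sequence of prices is the compounding return at time $t$ over $n$ periods, defined as
\[ \frac{s_t}{s_{t-n}} = (1 + r_t)(1 + r_{t-1}) \cdots (1 + r_{t-n+1}). \]
Taking logarithms, a simplification arises,
\[ \log\left(\frac{s_t}{s_{t-n}}\right) = \log(s_t) - \log(s_{t-n}); \]
the logarithm of the compounding return over $n$ periods is just the difference between the logarithm of the price at the final and initial periods. Furthermore,
\[ \log\left(\frac{s_t}{s_{t-n}}\right) = \log\bigl((1 + r_t)(1 + r_{t-1}) \cdots (1 + r_{t-n+1})\bigr) = \rho_t + \rho_{t-1} + \cdots + \rho_{t-n+1}. \]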
So, the advantage of using log-returns is that they are additive. Recall that the sum of independent normally-distributed random variables is normal. Therefore, if we assume that log-returns are normally distributed, then the logarithm of the compounding return is normally distributed. However the product of normally-distributed variables has no easy distribution, in particular, it is not normal. So even if we make the simplifying assumption that the returns are normally distributed, there is no corresponding result for the compounded return.
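The relation between returns, log-returns and the compounding return can be checked numerically. The following is a minimal Python sketch (the analysis in this section uses R); the price list is hypothetical and invented purely for illustration.

```python
import math

# Hypothetical daily closing prices, invented purely for illustration.
prices = [100.0, 102.0, 99.5, 101.0, 104.0]

# Proportional returns r_t = (s_t - s_{t-1}) / s_{t-1}
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

# Log-returns rho_t = log(s_t / s_{t-1})
log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Compounding return over all periods: final price over initial price
compounding = prices[-1] / prices[0]

# The compounding return is the product of the gross returns (1 + r_t) ...
product_of_gross = math.prod(1.0 + r for r in returns)

# ... and its logarithm is the sum of the log-returns (additivity).
sum_of_logs = sum(log_returns)

print("compounding return:     ", compounding)
print("product of (1 + r_t):   ", product_of_gross)
print("log of compounding:     ", math.log(compounding))
print("sum of log-returns:     ", sum_of_logs)
```

The two pairs of printed values should agree up to floating-point rounding, and for small returns each log-return is close to the corresponding proportional return.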
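Using Brownian Motion for modeling stock prices varying over continuous time has two obvious problems:
1. Even if started from a positive value, Brownian Motion has a positive probability of taking negative values at each later time, which is unrealistic for stock prices.
2. The increments of Brownian Motion have a variance that depends on the length of the time interval but not on the current value of the process, whereas stocks selling at low prices tend to have smaller price increments over a given interval than stocks selling at high prices.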
Nobel prize-winning economist Paul Samuelson proposed a solution to both problems in 1965 by modeling stock prices as a Geometric Brownian Motion.
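Let $S(t)$ be the continuous-time stock process. The following assumptions about price increments are the foundation for a model of stock prices.
1. Stock price increments have a deterministic component. In a short time, changes in price are proportional to the stock price itself, with constant proportionality rate $r$.
2. Stock price increments have a random component. In a short time, changes in price are jointly proportional to the stock price and to a normal random variable with mean zero and variance equal to the time increment, with constant proportionality rate $\sigma$.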
The first assumption is based on the observation that stock prices have a general overall growth (or decay if $r < 0$) rate due to economic conditions. For mathematical simplicity, we take $r$ to be constant, because we know that in the absence of randomness, this leads to the exponential function, a simple mathematical function. The second assumption is based on the observation that stock prices vary both up and down on short times. The change is apparently random due to a variety of influences, and the changes appear to be normally distributed. The normal distribution has many mathematically desirable features, so for simplicity the randomness is taken to be normal. The proportionality constants are taken to be constant for mathematical simplicity.
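These assumptions can be mathematically modeled with a stochastic differential equation
\[ dS(t) = r S(t)\,dt + \sigma S(t)\,dW(t), \quad S(0) = S_0. \]
We already have the solution of this stochastic differential equation as Geometric Brownian Motion:
\[ S(t) = S_0 \exp\left((r - \tfrac{1}{2}\sigma^2)t + \sigma W(t)\right). \]
At each time $t$ the Geometric Brownian Motion has a lognormal distribution with parameters $\ln(S_0) + (r - \tfrac{1}{2}\sigma^2)t$ and $\sigma\sqrt{t}$. The mean stock price at any time is $E[S(t)] = S_0 \exp(rt)$. The variance of the stock price at any time is
\[ \mathrm{Var}[S(t)] = S_0^2 \exp(2rt)\left[\exp(\sigma^2 t) - 1\right]. \]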
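The standard mean and variance formulas for Geometric Brownian Motion, $S_0 \exp(rt)$ and $S_0^2 \exp(2rt)[\exp(\sigma^2 t) - 1]$, can be checked by simulation. The following is a minimal Monte Carlo sketch in Python (the analysis in this section uses R); the parameter values, sample size and seed are hypothetical choices for illustration and are not fitted to any data.

```python
import math
import random
import statistics

# Hypothetical parameters chosen only for illustration (not fitted to data).
S0, r, sigma, t = 1.0, 0.10, 0.20, 1.0
n_samples = 200_000

rng = random.Random(12345)  # fixed seed so the run is repeatable

# W(t) is normal with mean 0 and variance t, so sample S(t) directly from
# S(t) = S0 * exp((r - sigma^2 / 2) * t + sigma * W(t)).
samples = [
    S0 * math.exp((r - 0.5 * sigma**2) * t + sigma * rng.gauss(0.0, math.sqrt(t)))
    for _ in range(n_samples)
]

sample_mean = statistics.fmean(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

exact_mean = S0 * math.exp(r * t)
exact_var = S0**2 * math.exp(2 * r * t) * (math.exp(sigma**2 * t) - 1.0)

print("mean:     simulated", sample_mean, " formula", exact_mean)
print("variance: simulated", sample_var, " formula", exact_var)
```

The simulated values should be close to the formula values, with the gap shrinking as the number of samples grows.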
Note that with this model, the log-return over a period from $t-n$ to $t$ is $(r - \sigma^2/2)n + \sigma[W(t) - W(t-n)]$. The log-return is normally distributed with mean and variance characterized by the parameters associated with the security. This is consistent with the assumptions about the distribution of log-returns of regular sequences of security processes.
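The log-return under this model can be derived by taking logarithms of the Geometric Brownian Motion solution $S(t) = S_0 \exp((r - \sigma^2/2)t + \sigma W(t))$ at the two times and subtracting. The following is a sketch of that step; it assumes only the standard fact that the increment $W(t) - W(t-n)$ is normal with mean $0$ and variance $n$.

```latex
\begin{align*}
\log\frac{S(t)}{S(t-n)}
  &= \left[(r - \tfrac{1}{2}\sigma^2)t + \sigma W(t)\right]
   - \left[(r - \tfrac{1}{2}\sigma^2)(t-n) + \sigma W(t-n)\right] \\
  &= (r - \tfrac{1}{2}\sigma^2)\,n + \sigma\left[W(t) - W(t-n)\right], \\
\log\frac{S(t)}{S(t-n)}
  &\sim N\left((r - \tfrac{1}{2}\sigma^2)\,n,\ \sigma^2 n\right).
\end{align*}
```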
The constant $r$ is often called the drift and $\sigma$ is called the volatility. Drift and volatility are usually reported on an annual basis. Therefore, some care and attention is necessary when converting this annual rate to a daily rate. An “annual basis” in finance often means 252 days per year because there are that many trading days, not counting weekends and holidays in the 365-day calendar. In order to be consistent in this text with most other applied mathematics we will use 365 days for annual rates. If necessary, the conversion of annual rates to financial-year annual rates is a straightforward matter of dividing annual rates by 365 and then multiplying by 252.
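The conversion between the two annual bases is simple arithmetic. The following is a minimal Python sketch, assuming the common conventions of 365 calendar days and 252 trading days per year; the function name and the example rate are hypothetical.

```python
import math

CALENDAR_DAYS = 365  # assumed calendar-year convention
TRADING_DAYS = 252   # assumed trading-year convention


def to_financial_year(annual_rate):
    """Convert a 365-day annual rate to a 252-day financial-year rate.

    Divide by 365 to get a daily rate, then multiply by 252.
    """
    daily_rate = annual_rate / CALENDAR_DAYS
    return daily_rate * TRADING_DAYS


# Hypothetical annual rate, for illustration only.
example_rate = 0.10
print(to_financial_year(example_rate))
```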
The Wilshire 5000 Total Market Index, or more simply the Wilshire 5000, is an index of the market value of all stocks actively traded in the United States. The index is intended to measure the performance of publicly traded companies headquartered in the United States. Stocks of extremely small companies are excluded.
In spite of the name, the Wilshire 5000 does not have exactly 5000 stocks. Developed in the summer of 1974, the index had just shy of 5,000 issues at that time. The membership count has ranged from 3,069 to 7,562. The member count was 3,818 as of September 30, 2014.
The index is computed as
\[ W = \alpha \sum_{i=1}^{M} N_i P_i \]
where $P_i$ is the price of one share of issue $i$ included in the index, $N_i$ is the number of shares of issue $i$, $M$ is the number of member companies included in the index, and $\alpha$ is a fixed scaling factor. The base value for the index was 1404.60 points on base date December 31, 1980, when it had a total market capitalization of \$1,404.596 billion. On that date, each one-index-point change in the index was equal to \$1 billion. However, index divisor adjustments due to index composition changes have changed the relationship over time, so that by 2005 each index point reflected a change of about \$1.2 billion in the total market capitalization of the index.
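A capitalization-weighted index of this kind is the scaling factor times the sum, over member issues, of shares outstanding times share price. The following is a minimal Python sketch of that computation; the function name, the three issues and the scaling factor are hypothetical and are not actual Wilshire 5000 data.

```python
import math


def index_value(prices, shares, alpha):
    """Scaled total market capitalization: alpha * sum of shares * price."""
    if len(prices) != len(shares):
        raise ValueError("prices and shares must have the same length")
    return alpha * sum(n * p for n, p in zip(shares, prices))


# Hypothetical member issues, for illustration only.
prices = [10.0, 25.0, 4.0]   # price of one share of each issue, in dollars
shares = [1.0e9, 2.0e8, 5.0e8]  # number of shares of each issue

# Hypothetical scaling factor: one index point per $1 billion of capitalization.
alpha = 1.0e-9

value = index_value(prices, shares, alpha)
print(value)
```

With these made-up inputs the total capitalization is 17 billion dollars, so the index value is about 17 points.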
The index was renamed the “Dow Jones Wilshire 5000” in April 2004, after Dow Jones & Company assumed responsibility for its calculation and maintenance. On March 31, 2009 the partnership with Dow Jones ended and the index returned to Wilshire Associates.
The Wilshire 5000 is the weighted sum of many stock values, each of which we may reasonably assume is a random variable presumably with a ﬁnite variance. If the random variables are independent, then the Central Limit Theorem would suggest that the index should be normally distributed. Therefore, a reasonable hypothesis is that the Wilshire 5000 is a normal random variable, although we do not know the mean or variance in advance. (The assumption of independence is probably too strong, since general economic conditions aﬀect most stocks similarly.)
Data for the Wilshire 5000 is easy to obtain. For example, the Yahoo Finance page for W5000 provides a download with the Date, Open, Close, High, Low, Volume and Adjusted Close values of the index in reverse order from today to April 1, 2009, the day Wilshire Associates resumed calculation of the index. (The Adjusted Close is an adjusted price for dividends and splits that does not affect this analysis.) The data comes in the form of a comma-separated-value text file. This file format is well-suited as input for many programs, especially spreadsheets and data analysis programs such as R. This analysis uses R.
The data from December 31, 2014 back to April 1, 2009 provides 1449 records with seven fields each. This analysis uses the logarithm of each of the Close prices. Reversing them and then taking the differences gives 1448 daily log-returns. The mean of the daily log-returns is and the variance of daily log-returns is . Assume that the log-returns are normally distributed so that the mean change over a year of 252 trading days is and the variance over a year of 252 trading days is: . Then the annual standard deviation is . The initial value on April 1, 2009 is .
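The steps in this paragraph (reverse the closes, take logarithms, difference, then scale the daily mean and variance to a year) can be sketched in Python as follows. The analysis in this section uses R; here the function name and the short list of closing values are hypothetical, and the trading year is assumed to have 252 days.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed number of trading days per year


def annualized_parameters(closes_newest_first):
    """Return (annual mean, annual standard deviation, initial value).

    The input is a list of Close values in reverse chronological order,
    as in the downloaded file.
    """
    closes = list(reversed(closes_newest_first))  # oldest value first
    logs = [math.log(c) for c in closes]
    log_returns = [b - a for a, b in zip(logs, logs[1:])]

    daily_mean = statistics.mean(log_returns)
    daily_var = statistics.variance(log_returns)  # sample variance (n - 1)

    # For independent daily log-returns, means and variances add over days.
    annual_mean = TRADING_DAYS * daily_mean
    annual_var = TRADING_DAYS * daily_var
    return annual_mean, math.sqrt(annual_var), closes[0]


# Hypothetical closing values, newest first, for illustration only.
example_closes = [110.0, 105.0, 102.0, 100.0]
annual_mean, annual_sd, initial_value = annualized_parameters(example_closes)
print(annual_mean, annual_sd, initial_value)
```

The sample variance with divisor $n - 1$ is used here because that is also what R's var computes.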
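Use the annual mean as $r$ and the annual standard deviation as $\sigma$, together with the initial value $S_0$, in the stochastic differential equation
\[ dS(t) = r S(t)\,dt + \sigma S(t)\,dW(t), \quad S(0) = S_0. \]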
The resulting Geometric Brownian Motion is a model for the Wilshire 5000 index. Figure 1 is a plot of the actual data for the Wilshire 5000 over the period April 1, 2009 to December 31, 2014 along with a simulation using a Geometric Brownian Motion with these parameters.
Comparison of the graphs of the actual Wilshire 5000 data with the Geometric Brownian Motion suggests that Geometric Brownian Motion gives plausible results. However, deeper investigation to ﬁnd if the fundamental modeling hypotheses are actually satisﬁed is important.
Consider again the daily log-returns. The log-returns are normalized by subtracting the mean and dividing by the standard deviation of the 1448 changes. The maximum of the 1448 normalized changes is and the minimum is . Already we have a hint that the distribution of the data is not normally distributed, since the likelihood of seeing normally distributed data varying to standard deviations from the mean is negligible.
In R, the hist command on the normalized data gives an empirical density histogram. For simplicity, here the histogram is taken over the 14 one-standard-deviation intervals from $-7$ to $7$. For this data, the density histogram of the normalized data has the values in Table 1.
This means, for example, that of the log-returns, that is , occurred in the interval and a fraction , or points, fall in the interval . The normal distribution gives the expected density on the same intervals. The ratio between the empirical density and normal density gives an indication of the deviation from normality. For data within standard deviations, the ratios are about what we would expect. This is reassuring that for reasonably small changes, the log-returns are approximately normal. However, the ratio on the interval is approximately , and the ratio on the interval is approximately . Each of these is much greater than we expect, that is, the extreme tails of the empirical density have much greater probability than expected.
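The comparison of the empirical density with the normal density on one-standard-deviation intervals can be sketched in Python as follows (the analysis in this section uses R). The function name is hypothetical, the bins are the 14 unit intervals from -7 to 7 taken as right-closed as in R's hist default, and the demonstration data is a synthetic, nearly normal sample rather than the Wilshire 5000 log-returns.

```python
from statistics import NormalDist, mean, stdev


def density_ratios(data, lo=-7, hi=7):
    """Compare the empirical and standard normal mass on unit intervals.

    Returns rows (left, right, empirical, expected, ratio) for the
    intervals (left, right] after normalizing the data.
    """
    m, s = mean(data), stdev(data)
    z = [(x - m) / s for x in data]  # subtract mean, divide by std. deviation
    std_normal = NormalDist()
    rows = []
    for left in range(lo, hi):
        right = left + 1
        # Intervals have width 1, so the fraction of points is the density.
        empirical = sum(left < v <= right for v in z) / len(z)
        expected = std_normal.cdf(right) - std_normal.cdf(left)
        rows.append((left, right, empirical, expected, empirical / expected))
    return rows


# Synthetic sample built from evenly spaced normal quantiles, illustration only.
sample = [NormalDist().inv_cdf((i + 0.5) / 1000) for i in range(1000)]
rows = density_ratios(sample)
for left, right, empirical, expected, ratio in rows:
    print(f"({left:3d}, {right:3d}]  {empirical:.4f}  {expected:.3e}  {ratio:.3f}")
```

For data with heavy tails, the ratios in the outermost occupied intervals would be much larger than 1, while the ratios near the center stay close to 1.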
A quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the $0.3$ (or 30%) quantile is the data point at which 30% of the data fall below and 70% fall above that value. As another example, the median is the $0.5$ quantile.
A q-q plot is formed by plotting estimated quantiles from data set 2 on the vertical axis against estimated quantiles from data set 1 on the horizontal axis. Both axes are in units of their respective data sets. For a given point on the q-q plot, we know the quantile level is the same for both points, but not what the quantile level actually is.
If the data sets have the same size, the q-q plot is essentially a plot of sorted data set 1 against sorted data set 2. If the data sets are not of equal size, the quantiles are usually picked to correspond to the sorted values from the smaller data set and then the quantiles for the larger data set are interpolated from the data.
The quantiles from the normal distribution are the values from the inverse cumulative distribution function. These quantiles are tabulated, or they may be obtained from statistical software. For example, in R the function qnorm gives the quantiles, e.g. qnorm(0.25) = -0.6744898, and qnorm(0.90) = 1.281552. Estimating the quantiles for the normalized Wilshire 5000 data is more laborious but is not difficult in principle. Sorting the data and then finding the value for which $100k$% of the data is less and $100(1-k)$% of the data is greater gives the $k$ percentile for $0 \le k \le 1$. For example, the $0.25$-quantile for the scaled and normalized daily changes is and the $0.90$-quantile for the scaled and normalized daily changes is . Then for the q-q plot, plot the values and . Using many quantiles gives the full q-q plot.
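The pairing of normal quantiles with data quantiles can be sketched in Python as follows (the analysis in this section uses R). NormalDist().inv_cdf from the standard library plays the role of qnorm; the function names and the tiny data set are hypothetical, and the linear interpolation between sorted values is one common convention among several.

```python
import math
from statistics import NormalDist


def empirical_quantile(sorted_data, k):
    """Estimate the k-quantile (0 <= k <= 1) of already sorted data.

    Uses linear interpolation between neighboring sorted values.
    """
    position = k * (len(sorted_data) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_data) - 1)
    weight = position - lower
    return (1.0 - weight) * sorted_data[lower] + weight * sorted_data[upper]


def qq_points(data, levels):
    """Return (normal quantile, data quantile) pairs for a q-q plot."""
    ordered = sorted(data)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(k), empirical_quantile(ordered, k)) for k in levels]


# Hypothetical data set, for illustration only.
data = [3.0, 1.0, 2.0, 5.0, 4.0]
points = qq_points(data, [0.25, 0.50, 0.90])
for normal_q, data_q in points:
    print(normal_q, data_q)
```

Plotting the second coordinate against the first for many quantile levels gives the q-q plot against the normal distribution.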
In R, to create a q-q plot of the log-changes for the Wilshire 5000 data with a reference line against the normal distribution use the two commands qqnorm() and qqline() on the data set. In the q-q plot in Figure 2, the “twist” of the plot above and below the reference line indicates that the tails of the normalized Wilshire 5000 data are more dispersed than the standard normal data. The low quantiles of the normalized Wilshire 5000 data occur at more negative values than the standard normal distribution. The high quantiles occur at values greater than the standard normal distribution. However, for quantiles near the median, the data does seem to follow the normal distribution. The plot is a graphical representation of the fact that extreme events are more likely to occur than would be predicted by a normal distribution.
Using log-returns for stock price has the simplifying advantage of additivity. If we assume that log-returns are normally distributed, then the logarithm of the compounding return is normally distributed.
Two problems with using Brownian Motion for modeling stock prices varying over continuous time lead to modeling stock prices with the stochastic differential equation for Geometric Brownian Motion. The log-return over a period from $t-n$ to $t$ is $(r - \sigma^2/2)n + \sigma[W(t) - W(t-n)]$. The log-return is normally distributed with mean and variance characterized by the parameters associated with the security.
The modeling assumptions can be tested against data for the Wilshire 5000 Index. The Geometric Brownian Motion models using the parameters for the Wilshire 5000 Index resemble the actual data. However, deeper statistical investigation of the log-returns shows that while log-returns within standard deviations from the mean are normally distributed, extreme events are more likely to occur than would be predicted by a normal distribution.
This section is adapted from: “Measuring historical volatility”, Cathy O’Neil, July 24, 2011, http://mathbabe.org/2011/08/30/why-log-returns/; “Why Log Returns”, Quantivity, February 11, 2011, https://quantivity.wordpress.com/2011/02/21/why-log-returns/; and “Geometric Brownian Motion and the Efficient Market Hypothesis”, Ronald W. Shonkwiler, in Finance with Monte Carlo, 2012. Information about the Wilshire 5000 comes from . The explanation of q-q plots is adapted from the NIST Engineering Statistics Handbook.
This script simulates the Wilshire 5000 Index over the period April 1, 2009 to December 31, 2014 using Geometric Brownian Motion with drift and standard deviation parameters calculated from the data over that same period. The resulting simulation is plotted over that 5.75-year time period. Set the drift, variance, time interval, and starting value. Set the number of time divisions and compute the time increment. Compute a Wiener process simulation, then apply the exponential to create a Geometric Brownian Motion with the parameters. Plot the simulation on the time interval, or output the data to a file for later plotting or use.
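The steps just described can be sketched in Python as follows. This is not the original R script; it is a minimal illustration that uses the Geometric Brownian Motion formula $S(t) = S_0 \exp((r - \sigma^2/2)t + \sigma W(t))$, and the function name, parameter values and seed are hypothetical placeholders rather than the values fitted to the Wilshire 5000 data.

```python
import math
import random


def simulate_gbm(r, sigma, T, S0, N, rng):
    """Simulate one Geometric Brownian Motion path on [0, T] with N steps.

    Returns the list of times and the list of simulated values.
    """
    dt = T / N  # time increment
    times = [i * dt for i in range(N + 1)]

    # Wiener process: cumulative sum of independent normal increments
    # with mean 0 and variance dt, started at W(0) = 0.
    W = [0.0]
    for _ in range(N):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))

    # Apply the exponential to obtain the Geometric Brownian Motion.
    path = [S0 * math.exp((r - 0.5 * sigma**2) * t + sigma * w)
            for t, w in zip(times, W)]
    return times, path


# Hypothetical placeholder parameters, for illustration only.
times, path = simulate_gbm(r=0.10, sigma=0.20, T=5.75, S0=100.0, N=1000,
                           rng=random.Random(1))

# Output the data as time,value lines for later plotting or use.
for t, s in list(zip(times, path))[:5]:
    print(f"{t:.5f},{s:.4f}")
```

Passing the random number generator as an argument keeps a run repeatable for a fixed seed; replacing the placeholder parameters with values estimated from data gives a simulation comparable to the one described above.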
Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1
Email to Steve Dunbar, sdunbar1 at unl dot edu
Last modiﬁed: Processed from LATEX source on August 8, 2016