Steven R. Dunbar
Department of Mathematics
203 Avery Hall
University of Nebraska-Lincoln
Lincoln, NE 68588-0130
http://www.math.unl.edu
Voice: 402-472-3731
Fax: 402-472-8466

Stochastic Processes and
Advanced Mathematical Finance

__________________________________________________________________________

Models of Stock Market Prices

_______________________________________________________________________

Note: These pages are prepared with MathJax. MathJax is an open source JavaScript display engine for mathematics that works in all browsers. See http://mathjax.org for details on supported browsers, accessibility, copy-and-paste, and other features.

_______________________________________________________________________________________________

Rating

Mathematically Mature: may contain mathematics beyond calculus with proofs.

_______________________________________________________________________________________________

Section Starter Question

What would be some desirable characteristics for a stochastic process model of a security price?

_______________________________________________________________________________________________

Key Concepts

  1. A natural definition of variation of a stock price \(s_t\) is the proportional return \(r_t\) at time \(t\)
    \[ r_t = (s_t - s_{t-1})/s_{t-1}. \]

  2. The log-return
    \[ \rho_t = \log(s_t/s_{t-1}) \]

    is another measure of variation on the time scale of the sequence of prices.

  3. For small returns, the difference between returns and log-returns is small.
  4. The advantage of using log-returns is that they are additive.
  5. Using Brownian Motion for modeling stock prices varying over continuous time has two obvious problems:
    1. Brownian Motion can attain negative values.
    2. Increments in Brownian Motion have a variance that depends only on the length of the time interval, not on the current value of the process, so they do not reflect proportional changes.
  6. Modeling security price changes with a stochastic differential equation leads to a Geometric Brownian Motion model.
  7. Deeper statistical investigation of the log-returns shows that while log-returns within 4 standard deviations from the mean are normally distributed, extreme events are more likely to occur than would be predicted by a normal distribution.

__________________________________________________________________________

Vocabulary

  1. A natural definition of variation of a stock price \(s_t\) is the proportional return \(r_t\) at time \(t\)
    \[ r_t = (s_t - s_{t-1})/s_{t-1}. \]

  2. The log-return
    \[ \rho_t = \log(s_t/s_{t-1}) \]

    is another measure of variation on the time scale of the sequence of prices.

  3. The compounding return at time \(t\) over \(n\) periods is
    \[ \frac{s_t}{s_{t-n}} = (1 + r_t)(1 + r_{t-1}) \cdots (1 + r_{t-n+1}). \]

  4. The Wilshire 5000 Total Market Index, or more simply the Wilshire 5000, is an index of the market value of all stocks actively traded in the United States.
  5. A quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution.

__________________________________________________________________________

Mathematical Ideas

Returns, Log-Returns, Compound Returns

Let \(s_t\) be a sequence of prices on a stock or a portfolio of stocks, measured at regular intervals, say day to day. A natural question is how best to measure the variation of the prices on the time scale of the regular measurement. A first natural definition of variation is the proportional return \(r_t\) at time \(t\)

\[ r_t = (s_t - s_{t-1})/s_{t-1}. \]

The proportional return is usually just called the return, and often it is expressed as a percentage. A benefit of using returns versus prices is normalization: measuring all variables in a comparable metric, essentially percentage variation. Using proportional returns allows consistent comparison among two or more securities even though their price sequences may differ by orders of magnitude. Having comparable variations is a requirement for many multidimensional statistical analyses. For example, interpreting a covariance is meaningful when the variables are measured in percentages.

Define the log-return

\[ \rho_t = \log(s_t/s_{t-1}) \]

as another measure of variation on the time scale of the sequence of prices. For small returns, the difference between returns and log-returns is small. Notice that

\[ 1 + r_t = \frac{s_t}{s_{t-1}} = \exp\left(\log\frac{s_t}{s_{t-1}}\right) = \exp(\rho_t). \]

Therefore

\[ \rho_t = \log(1 + r_t) \approx r_t, \quad \text{for } r_t \ll 1. \]
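To make the approximation concrete, here is a minimal Python sketch comparing the two measures. The price pairs are hypothetical values chosen only for illustration, and the function names are not from the original notes.

```python
import math


def simple_return(prev_price, price):
    """Proportional return r_t = (s_t - s_{t-1}) / s_{t-1}."""
    return (price - prev_price) / prev_price


def log_return(prev_price, price):
    """Log-return rho_t = log(s_t / s_{t-1})."""
    return math.log(price / prev_price)


# Hypothetical consecutive prices (s_{t-1}, s_t), for illustration only.
price_pairs = [(100.0, 100.5), (100.0, 101.0), (100.0, 110.0), (100.0, 150.0)]

for prev_price, price in price_pairs:
    r = simple_return(prev_price, price)
    rho = log_return(prev_price, price)
    # The gap r - rho grows roughly like r^2 / 2 as the return gets larger.
    print(f"r = {r:.4f}   rho = {rho:.4f}   r - rho = {r - rho:.6f}")
```

For a 1% move the two measures differ by less than 0.0001, while for a 50% move the difference is close to 0.1, so the approximation is only appropriate for small returns.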

More generally, a statistic calculated from a sequence of prices is the compounding return at time t over n periods, defined as

\[ \frac{s_t}{s_{t-n}} = (1 + r_t)(1 + r_{t-1}) \cdots (1 + r_{t-n+1}). \]

Taking logarithms, a simplification arises,

\[ \log\frac{s_t}{s_{t-n}} = \log(s_t) - \log(s_{t-n}): \]

the logarithm of the compounding return over \(n\) periods is just the difference between the logarithm of the price at the final and initial periods. Furthermore,

\[ \log\frac{s_t}{s_{t-n}} = \log\bigl((1 + r_t)(1 + r_{t-1}) \cdots (1 + r_{t-n+1})\bigr) = \rho_t + \rho_{t-1} + \cdots + \rho_{t-n+1}. \]

So, the advantage of using log-returns is that they are additive. Recall that the sum of independent normally distributed random variables is normal. Therefore, if we assume that log-returns are independent and normally distributed, then the logarithm of the compounding return is normally distributed. However, the product of normally distributed random variables has no simple distribution; in particular, it is not normal. So even if we make the simplifying assumption that the returns are normally distributed, there is no corresponding result for the compounded return.
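The additivity can be checked numerically. The following is a small sketch on a hypothetical price sequence (the values are illustrative, not data from the text): the compounding return is a product of the factors \(1 + r\), while its logarithm is a plain sum of log-returns.

```python
import math

# Hypothetical price sequence s_0, ..., s_5, for illustration only.
prices = [100.0, 102.0, 99.5, 101.0, 104.0, 103.0]

# One-period proportional returns and log-returns.
returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]

# Compounding return over all periods: the product of the factors (1 + r).
compound = math.prod(1.0 + r for r in returns)

# The log of the compounding return is the sum of the log-returns,
# which telescopes to log(last price) - log(first price).
sum_log_returns = sum(log_returns)

print(compound, prices[-1] / prices[0])
print(sum_log_returns, math.log(prices[-1]) - math.log(prices[0]))
```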

Modeling from Stochastic Differential Equations

Using Brownian Motion for modeling stock prices varying over continuous time has two obvious problems:

  1. Even if started from a positive value \(X_0 > 0\), at each time there is a positive probability that the process attains negative values. This is unrealistic for stock prices.
  2. Stocks selling at small prices tend to have small increments in price over a given time interval, while stocks selling at high prices tend to have much larger increments in price on the same interval. Brownian Motion has a variance which depends on a time interval but not on the process value, so this too is unrealistic for stock prices.

Nobel prize-winning economist Paul Samuelson proposed a solution to both problems in 1965 by modeling stock prices as a Geometric Brownian Motion.

Let S(t) be the continuous-time stock process. The following assumptions about price increments are the foundation for a model of stock prices.

  1. Stock price increments have a deterministic component. In a short time, changes in price are proportional to the stock price itself with constant proportionality rate r.
  2. The stock price increments have a random component. In a short time, changes in price are jointly proportional to the stock price, a standard normal random variable, and the square root of the time increment, with constant proportionality rate σ.

The first assumption is based on the observation that stock prices have a general overall growth (or decay if r < 0) rate due to economic conditions. For mathematical simplicity, we take r to be constant, because in the absence of randomness this leads to the exponential function, a simple mathematical function. The second assumption is based on the observation that stock prices vary both up and down over short times. The change is apparently random due to a variety of influences, and the changes appear to be normally distributed. The normal distribution has many mathematically desirable features, so for simplicity the randomness is taken to be normal. The proportionality rates are taken to be constant for mathematical simplicity.

These assumptions can be mathematically modeled with a stochastic differential equation

\[ dS(t) = rS\,dt + \sigma S\,dW(t), \quad S(0) = S_0. \]
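One way to read the equation is as a recipe for small time steps: over a step of length \(\Delta\), the price changes by \(rS\Delta\) plus \(\sigma S \sqrt{\Delta}\, Z\) with \(Z\) standard normal. The sketch below applies this recipe (the Euler-Maruyama discretization, a standard technique that is not part of the original notes) and compares it with the Geometric Brownian Motion formula built from the same random draws. The parameter values and function name are illustrative assumptions.

```python
import math
import random


def euler_and_exact_paths(s0, r, sigma, t_end, n_steps, seed=0):
    """Simulate one price path two ways from the same normal draws.

    euler: S_{k+1} = S_k + r*S_k*dt + sigma*S_k*sqrt(dt)*Z_k
    exact: S_k = s0 * exp((r - sigma^2/2)*t_k + sigma*W(t_k))
    """
    rng = random.Random(seed)
    dt = t_end / n_steps
    euler, exact = [s0], [s0]
    w = 0.0  # running value of the simulated Wiener process
    for k in range(1, n_steps + 1):
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        s = euler[-1]
        euler.append(s + r * s * dt + sigma * s * dw)
        w += dw
        exact.append(s0 * math.exp((r - 0.5 * sigma**2) * k * dt + sigma * w))
    return euler, exact


# Illustrative parameters, not taken from the data in these notes.
euler, exact = euler_and_exact_paths(s0=100.0, r=0.1, sigma=0.2,
                                     t_end=1.0, n_steps=10_000)
rel_gap = abs(euler[-1] - exact[-1]) / exact[-1]
print(f"Euler endpoint {euler[-1]:.4f}, exact endpoint {exact[-1]:.4f}, "
      f"relative gap {rel_gap:.2e}")
```

With a fine time grid the two paths stay close, which is a useful sanity check that the step-by-step assumptions and the closed-form process describe the same model.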

We already have the solution of this stochastic differential equation as Geometric Brownian Motion:

\[ S(t) = S_0 \exp\bigl((r - (1/2)\sigma^2)t + \sigma W(t)\bigr). \]
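As a reminder of where the \(-(1/2)\sigma^2\) correction comes from, here is a sketch of the standard argument using Itô's formula applied to \(\log S(t)\). It is an outline under the usual assumptions, not a complete proof.

```latex
\begin{align*}
d\bigl(\log S(t)\bigr)
  &= \frac{1}{S}\,dS - \frac{1}{2}\,\frac{1}{S^{2}}\,(dS)^{2} \\
  &= \bigl(r\,dt + \sigma\,dW(t)\bigr) - \frac{1}{2}\,\sigma^{2}\,dt
     && \text{since } (dS)^{2} = \sigma^{2} S^{2}\,dt \\
  &= \Bigl(r - \tfrac{1}{2}\sigma^{2}\Bigr)\,dt + \sigma\,dW(t).
\end{align*}
% Integrating from 0 to t, with W(0) = 0:
\[
\log S(t) - \log S_0 = \Bigl(r - \tfrac{1}{2}\sigma^{2}\Bigr)t + \sigma W(t).
\]
```

Exponentiating both sides of the last line gives the Geometric Brownian Motion formula.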

At each time the Geometric Brownian Motion has a lognormal distribution with parameters \(\ln(S_0) + rt - (1/2)\sigma^2 t\) and \(\sigma\sqrt{t}\). The mean stock price at any time is \(\mathbb{E}[S(t)] = S_0 \exp(rt)\). The variance of the stock price at any time is

\[ \operatorname{Var}[S(t)] = S_0^2 \exp(2rt)[\exp(\sigma^2 t) - 1]. \]

Note that with this model, the log-return over a period from \(t - n\) to \(t\) is \((r - \sigma^2/2)n + \sigma[W(t) - W(t - n)]\). The log-return is normally distributed with mean and variance characterized by the parameters associated with the security. This is consistent with the assumptions about the distribution of log-returns of regular sequences of security prices.
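The mean and variance formulas can be checked by simulation. The following Monte Carlo sketch draws many independent values of the process at a single time and compares sample moments with the formulas. The parameter values, sample size, and seed are arbitrary illustrative choices.

```python
import math
import random


def gbm_sample(s0, r, sigma, t, rng):
    """One draw of S(t) = s0*exp((r - sigma^2/2)*t + sigma*W(t)), W(t) ~ N(0, t)."""
    w_t = math.sqrt(t) * rng.gauss(0.0, 1.0)
    return s0 * math.exp((r - 0.5 * sigma**2) * t + sigma * w_t)


# Illustrative parameters, not taken from the data in these notes.
s0, r, sigma, t = 1.0, 0.1, 0.2, 1.0
rng = random.Random(12345)
n = 200_000
samples = [gbm_sample(s0, r, sigma, t, rng) for _ in range(n)]

sample_mean = sum(samples) / n
sample_var = sum((x - sample_mean) ** 2 for x in samples) / (n - 1)

# Formulas from the text.
theory_mean = s0 * math.exp(r * t)
theory_var = s0**2 * math.exp(2 * r * t) * (math.exp(sigma**2 * t) - 1.0)

print(f"mean: sample {sample_mean:.5f}, formula {theory_mean:.5f}")
print(f"var:  sample {sample_var:.5f}, formula {theory_var:.5f}")
```

Note that the mean grows at rate r even though the exponent has drift \(r - \sigma^2/2\); the difference is made up by the convexity of the exponential.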

The constant r is often called the drift and σ is called the volatility. Drifts and volatility are usually reported on an annual basis. Therefore, some care and attention is necessary when converting this annual rate to a daily rate. An “annual basis” in finance often means 252 days per year because there are that many trading days, not counting weekends and holidays in the 365-day calendar. In order to be consistent in this text with most other applied mathematics we will use 365 days for annual rates. If necessary, the conversion of annual rates to financial-year annual rates is a straightforward matter of dividing annual rates by 365 and then multiplying by 252.

Testing the Assumptions on Data

The Wilshire 5000 Total Market Index, or more simply the Wilshire 5000, is an index of the market value of all stocks actively traded in the United States. The index is intended to measure the performance of publicly traded companies headquartered in the United States. Stocks of extremely small companies are excluded.

In spite of the name, the Wilshire 5000 does not have exactly 5000 stocks. Developed in the summer of 1974, the index had just shy of 5,000 issues at that time. The membership count has ranged from 3,069 to 7,562. The member count was 3,818 as of September 30, 2014.

The index is computed as

\[ W = \alpha \sum_{i=1}^{M} N_i P_i \]

where \(P_i\) is the price of one share of issue \(i\) included in the index, \(N_i\) is the number of shares of issue \(i\), \(M\) is the number of member companies included in the index, and \(\alpha\) is a fixed scaling factor. The base value for the index was 1404.60 points on base date December 31, 1980, when it had a total market capitalization of $1,404.596 billion. On that date, each one-index-point change in the index was equal to $1 billion. However, index divisor adjustments due to index composition changes have changed the relationship over time, so that by 2005 each index point reflected a change of about $1.2 billion in the total market capitalization of the index.
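The index formula is a scaled sum of market capitalizations. A minimal sketch with a hypothetical three-issue index follows. The prices, share counts, scaling factor, and function name are all made up for illustration and have no connection to the actual Wilshire 5000 constituents.

```python
def cap_weighted_index(prices, shares, alpha):
    """Index level W = alpha * sum_i N_i * P_i."""
    if len(prices) != len(shares):
        raise ValueError("prices and shares must have the same length")
    return alpha * sum(n * p for n, p in zip(shares, prices))


# Hypothetical issues: price per share P_i and shares outstanding N_i.
prices = [50.0, 20.0, 100.0]
shares = [1_000_000, 5_000_000, 250_000]
alpha = 1.0e-6  # hypothetical scaling: one index point per $1 million of value

level = cap_weighted_index(prices, shares, alpha)
print(level)

# A move in a large-capitalization issue shifts the index more than the
# same percentage move in a small one.
bumped_large = cap_weighted_index([50.0, 22.0, 100.0], shares, alpha)
bumped_small = cap_weighted_index([50.0, 20.0, 110.0], shares, alpha)
print(bumped_large - level, bumped_small - level)
```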

The index was renamed the “Dow Jones Wilshire 5000” in April 2004, after Dow Jones & Company assumed responsibility for its calculation and maintenance. On March 31, 2009 the partnership with Dow Jones ended and the index returned to Wilshire Associates.

The Wilshire 5000 is the weighted sum of many stock values, each of which we may reasonably assume is a random variable presumably with a finite variance. If the random variables are independent, then the Central Limit Theorem would suggest that the index should be normally distributed. Therefore, a reasonable hypothesis is that the Wilshire 5000 is a normal random variable, although we do not know the mean or variance in advance. (The assumption of independence is probably too strong, since general economic conditions affect most stocks similarly.)

Data for the Wilshire 5000 is easy to obtain. For example, the Yahoo Finance page for W5000 provides a download with the Date, Open, Close, High, Low, Volume and Adjusted Close values of the index in reverse order from today to April 1, 2009, the day Wilshire Associates resumed calculation of the index. (The Adjusted Close is an adjusted price for dividends and splits that does not affect this analysis.) The data comes in the form of a comma-separated-value text file. This file format is well-suited as input for many programs, especially spreadsheets and data analysis programs such as R. This analysis uses R.

The data from December 31, 2014 back to April 1, 2009 provides 1449 records with seven fields each. This analysis uses the logarithm of each of the Close prices. Reversing them and then taking the differences gives 1448 daily log-returns. The mean of the 1448 daily log-returns is 0.0006675644 and the variance of 1448 daily log-returns is 0.0001178775. Assume that the log-returns are normally distributed so that the mean change over a year of 252 trading days is 0.1682262 and the variance over a year of 252 trading days is: 0.02970512. Then the annual standard deviation is 0.1722922. The initial value on April 1, 2009 is 8242.38.
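The steps in this paragraph can be sketched in Python as follows. The short list of closing values is hypothetical and stands in for the downloaded file. The last line reuses the daily mean and variance quoted above, and the scaling by 252 assumes independent daily log-returns, so that both the mean and the variance add up over days.

```python
import math
import statistics

TRADING_DAYS = 252  # trading days per year, as used in the text


def daily_log_returns(closes_oldest_first):
    """Differences of consecutive log prices, oldest first."""
    logs = [math.log(c) for c in closes_oldest_first]
    return [b - a for a, b in zip(logs, logs[1:])]


def annualize(mean_daily, var_daily, days=TRADING_DAYS):
    """Scale a daily mean and variance to one year of `days` trading days."""
    annual_mean = days * mean_daily
    annual_var = days * var_daily
    return annual_mean, annual_var, math.sqrt(annual_var)


# Hypothetical closes in reverse chronological order, like the download.
closes_newest_first = [103.0, 104.0, 101.0, 99.5, 102.0, 100.0]
rho = daily_log_returns(list(reversed(closes_newest_first)))
mean_daily = statistics.mean(rho)
var_daily = statistics.variance(rho)  # sample variance, divides by n - 1

# Daily figures quoted in the text for the Wilshire 5000.
annual_mean, annual_var, annual_sd = annualize(0.0006675644, 0.0001178775)
print(annual_mean, annual_var, annual_sd)
```

Six prices give five log-returns, just as 1449 records give 1448. The square root of the annualized variance computed this way is about 0.17235, within 0.0001 of the annual standard deviation quoted in the text.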

Use the values r = 0.1682262 and σ = 0.1722922 with initial value \(S_0 = 8242.38\) in the stochastic differential equation

\[ dS(t) = rS\,dt + \sigma S\,dW(t), \quad S(0) = S_0. \]

The resulting Geometric Brownian Motion is a model for the Wilshire 5000 index. Figure 1 is a plot of the actual data for the Wilshire 5000 over the period April 1, 2009 to December 31, 2014 along with a simulation using a Geometric Brownian Motion with these parameters.


Figure 1: The Wilshire 5000 Index from April 1, 2009 to December 31, 2014 plotted in blue along with a Geometric Brownian Motion having the same mean, variance and starting value in red.

Testing the Normality Hypothesis

Comparison of the graphs of the actual Wilshire 5000 data with the Geometric Brownian Motion suggests that Geometric Brownian Motion gives plausible results. However, deeper investigation to find if the fundamental modeling hypotheses are actually satisfied is important.

Consider again the 1448 daily log-returns. The log-returns are normalized by subtracting the mean and dividing by the standard deviation of the 1448 changes. The maximum of the 1448 normalized changes is 4.58 and the minimum is -6.81. Already we have a hint that the data is not normally distributed, since the likelihood of seeing normally distributed data varying 4 to 6 standard deviations from the mean is negligible.

In R, the hist command on the normalized data gives an empirical density histogram. For simplicity, here the histogram is taken over the 14 one-standard-deviation intervals from -7 to 7. For this data, the density histogram of the normalized data has the values in Table 1.


Interval  (-7,-6]  (-6,-5]  (-5,-4]  (-4,-3]  (-3,-2]  (-2,-1]  (-1,0]
Density   0.00069  0.00000  0.00276  0.00622  0.02693  0.08771  0.35428

Interval  (0,1]    (1,2]    (2,3]    (3,4]    (4,5]    (5,6]    (6,7]
Density   0.40884  0.08771  0.01796  0.00483  0.00207  0.00000  0.00000

Table 1: Density histogram of the normalized Wilshire 5000 data.
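A table of this kind can be produced without R. The sketch below normalizes a data set and computes the fraction of points in each unit interval, closed on the right like the intervals above. Because every interval has width 1, the fraction is also the density. The function names and the sample values are illustrative assumptions.

```python
import math
import statistics


def normalize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def unit_interval_density(z_values, lo=-7, hi=7):
    """Fraction of z_values in each interval (k, k+1] for k = lo, ..., hi-1."""
    counts = {k: 0 for k in range(lo, hi)}
    for z in z_values:
        k = math.ceil(z) - 1  # z lies in (k, k+1]
        if lo <= k < hi:
            counts[k] += 1
    n = len(z_values)
    return {(k, k + 1): counts[k] / n for k in range(lo, hi)}


# Hypothetical normalized values, for illustration only.
z = [-6.81, -0.5, 0.0, 0.5, 1.0, 4.58]
density = unit_interval_density(z)
for interval, frac in density.items():
    if frac > 0:
        print(interval, round(frac, 5))
```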

This means, for example, that a fraction 0.00069 of the 1448 log-returns, that is 1 point, occurred in the interval (-7, -6] and a fraction 0.4088397790, or 592 points, fall in the interval (0, 1]. The normal distribution gives the expected density on the same intervals. The ratio between the empirical density and normal density gives an indication of the deviation from normality. For data within 4 standard deviations, the ratios are about what we would expect. This is reassuring that for reasonably small changes, the log-returns are approximately normal. However, the ratio on the interval (-7, -6] is approximately 700,000, and the ratio on the interval (4, 5] is approximately 66. Each of these is much greater than we expect, that is, the extreme tails of the empirical density have much greater probability than expected.
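The ratios can be recomputed directly from the standard normal distribution function. The following sketch uses the complementary error function from the Python standard library, which stays accurate far out in the tails. The empirical densities are copied from Table 1 for a few intervals, and the helper names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, computed via erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def normal_interval_prob(a, b):
    """Probability that a standard normal variable falls in (a, b]."""
    return normal_cdf(b) - normal_cdf(a)


# Empirical densities from Table 1 for selected intervals (a, b].
empirical = {
    (-7, -6): 0.00069,
    (-1, 0): 0.35428,
    (0, 1): 0.40884,
    (4, 5): 0.00207,
}

ratios = {iv: dens / normal_interval_prob(*iv) for iv, dens in empirical.items()}
for iv, ratio in ratios.items():
    print(iv, f"normal {normal_interval_prob(*iv):.3e}", f"ratio {ratio:.4g}")
```

Near the center the ratios are close to 1, while in the tails a single observation is enough to produce a ratio in the tens or the hundreds of thousands.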

Quantile-Quantile Plots

A quantile-quantile (q-q) plot is a graphical technique for determining if two data sets come from populations with a common distribution. A q-q plot is a plot of the quantiles of the first data set against the quantiles of the second data set. By a quantile, we mean the fraction (or percent) of points below the given value. That is, the 0.3 (or 30%) quantile is the value below which 30% of the data fall and above which 70% fall. As another example, the median is the 0.5 quantile.

A q-q plot is formed by plotting estimated quantiles from data set 2 on the vertical axis against estimated quantiles from data set 1 on the horizontal axis. Both axes are in units of their respective data sets. For a given point on the q-q plot, we know the quantile level is the same for both points, but not what the quantile level actually is.

If the data sets have the same size, the q-q plot is essentially a plot of sorted data set 1 against sorted data set 2. If the data sets are not of equal size, the quantiles are usually picked to correspond to the sorted values from the smaller data set and then the quantiles for the larger data set are interpolated from the data.

The quantiles from the normal distribution are the values from the inverse cumulative distribution function. These quantiles are tabulated, or they may be obtained from statistical software. For example, in R the function qnorm gives the quantiles, e.g. qnorm(0.25) = -0.6744898, and qnorm(0.90) = 1.281552. Estimating the quantiles for the normalized Wilshire 5000 data is more laborious but is not difficult in principle. Sorting the data and then finding the value for which 100k% of the data is less and 100(1 - k)% of the data is greater gives the k-quantile for 0 ≤ k ≤ 1. For example, the 0.25-quantile for the scaled and normalized daily changes is -0.4342129 and the 0.90-quantile for the scaled and normalized daily changes is 1.072526. Then for the q-q plot, plot the values (-0.6744898, -0.4342129) and (1.281552, 1.072526). Using many quantiles gives the full q-q plot.
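The same computation can be sketched in Python with the standard library. Here `NormalDist().inv_cdf` plays the role of qnorm, and the empirical quantile uses linear interpolation between order statistics, which is one of several common conventions (it matches the default of R's quantile function). The function names and the small data set are illustrative assumptions.

```python
from statistics import NormalDist


def empirical_quantile(sorted_data, k):
    """k-quantile of sorted data by linear interpolation, 0 <= k <= 1."""
    n = len(sorted_data)
    pos = k * (n - 1)              # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, n - 1)
    frac = pos - lower
    return sorted_data[lower] + frac * (sorted_data[upper] - sorted_data[lower])


def qq_pairs(data, levels):
    """Points (normal quantile, data quantile) for a normal q-q plot."""
    ordered = sorted(data)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf(k), empirical_quantile(ordered, k))
            for k in levels]


# Hypothetical normalized data, for illustration only.
data = [-2.1, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 0.8, 1.1, 2.6]
for x, y in qq_pairs(data, [0.1, 0.25, 0.5, 0.75, 0.9]):
    print(f"{x: .4f}  {y: .4f}")
```

Plotting the second coordinate against the first for many levels k gives the q-q plot against the standard normal distribution.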


Figure 2: The q-q plot for the normalized log-changes from the Wilshire 5000.

In R, to create a q-q plot of the log-changes for the Wilshire 5000 data with a reference line against the normal distribution, use the two commands qqnorm() and qqline() on the data set. In the q-q plot in Figure 2, the “twist” of the plot above and below the reference line indicates that the tails of the normalized Wilshire 5000 data are more dispersed than those of the standard normal distribution. The low quantiles of the normalized Wilshire 5000 data occur at more negative values than those of the standard normal distribution. The high quantiles occur at values greater than those of the standard normal distribution. However, for quantiles near the median, the data does seem to follow the normal distribution. The plot is a graphical representation of the fact that extreme events are more likely to occur than would be predicted by a normal distribution.

Summary

Using log-returns for stock price has the simplifying advantage of additivity. If we assume that log-returns are normally distributed, then the logarithm of the compounding return is normally distributed.

Two problems with using Brownian Motion for modeling stock prices varying over continuous time lead to modeling stock prices with the stochastic differential equation for Geometric Brownian Motion. The log-return over a period from \(t - n\) to \(t\) is \((r - \sigma^2/2)n + \sigma[W(t) - W(t - n)]\). The log-return is normally distributed with mean and variance characterized by the parameters associated with the security.

The modeling assumptions can be tested against data for the Wilshire 5000 Index. The Geometric Brownian Motion models using the parameters for the Wilshire 5000 Index resemble the actual data. However, deeper statistical investigation of the log-returns shows that while log-returns within 4 standard deviations from the mean are normally distributed, extreme events are more likely to occur than would be predicted by a normal distribution.

Sources

This section is adapted from:

  1. “Measuring historical volatility”, Cathy O’Neil, July 24, 2011, http://mathbabe.org/2011/08/30/why-log-returns/
  2. “Why Log Returns”, Quantivity, February 11, 2011, https://quantivity.wordpress.com/2011/02/21/why-log-returns/
  3. “Geometric Brownian Motion and the Efficient Market Hypothesis”, Ronald W. Shonkwiler, in Finance with Monte Carlo, 2012.

Information about the Wilshire 5000 comes from [2]. The explanation of q-q plots is adapted from the NIST Engineering Statistics Handbook, [1].

_______________________________________________________________________________________________

Algorithms, Scripts, Simulations

Algorithm

This script simulates the Wilshire 5000 Index over the period April 1, 2009 to December 31, 2014 using Geometric Brownian Motion with drift and standard deviation parameters calculated from the data over that same period. The resulting simulation is plotted over that 5.75-year time period. Set the drift (called mu in the scripts), the volatility σ, the time interval T, and the starting value S0. Set the number of time divisions N and compute the time increment Δ. Compute a Wiener process simulation, then apply the exponential to create a Geometric Brownian Motion with the parameters. Plot the simulation on the time interval, or output the data to a file for later plotting or use.

Scripts

R

R script for Stock Market Model.

# Annualized mean and standard deviation of the daily log-returns
mu <- 0.1682262
sigma <- 0.1722922
T <- 5.75
# length of the interval [0, T] in time units of years
S0 <- 8242.38

N <- 1448
# number of time steps; the grid has N + 1 points including 0 and T
Delta <- T/N
# time increment

t <- seq(0, T, length.out = N + 1)
W <- c(0, cumsum(sqrt(Delta) * rnorm(N)))  # Wiener process
# mu is the mean log-return, so it enters the exponent directly
GBM <- S0 * exp(mu * t + sigma * W)

plot(t, GBM, type = "l", xaxt = "n", ylab = "Simulated Wilshire 5000 Index")
axis(1, at = c(0.75, 1.75, 2.75, 3.75, 4.75, 5.75),
    labels = c("2010", "2011", "2012", "2013", "2014", "2015"))

Octave

Octave script for Stock Market Model.

# Annualized mean and standard deviation of the daily log-returns
mu = 0.1682262;
sigma = 0.1722922;
T = 5.75;
# length of the interval [0, T] in time units of years
S0 = 8242.38;

N = 1448;
# number of time steps; the grid has N + 1 points including 0 and T
Delta = T / N;
# time increment

W = zeros(1, N + 1);
# initialization of the vector W approximating
# Wiener process
t = linspace(0, T, N + 1);
# randn is the core standard normal generator
W(2:N + 1) = cumsum(sqrt(Delta) * randn(1, N));
# mu is the mean log-return, so it enters the exponent directly
GBM = S0 * exp(mu * t + sigma * W);

plot(t, GBM)
set(gca, "xtick", [0.75, 1.75, 2.75, 3.75, 4.75, 5.75])
set(gca, "xticklabel", {"2010", "2011", "2012", "2013", "2014", "2015"})

Perl

Perl PDL script for Stock Market Model.

use PDL;
use PDL::NiceSlice;

# Annualized mean and standard deviation of the daily log-returns
$mu    = 0.1682262;
$sigma = 0.1722922;

# length of the interval [0, T] in time units of years
$T  = 5.75;
$S0 = 8242.38;

# number of time steps; the grid has N + 1 points including 0 and T
$N = 1448;

# time increment
$Delta = $T / $N;

# initialization of the vector W approximating
# Wiener process
$W = zeros( $N + 1 );

$t = zeros( $N + 1 )->xlinvals( 0, $T );

$W(1:$N) .= cumusumover( sqrt($Delta) * grandom($N) );

# mu is the mean log-return, so it enters the exponent directly
$GBM = $S0 * exp( $mu * $t + $sigma * $W );

# file output to use with external plotting programming
# such as gnuplot, R, octave, etc.
# Start gnuplot, then from gnuplot prompt
#    plot "stockmarketmodel.dat" with lines;\
#    set xtic ("2010" 0.75, "2011" 1.75, "2012" 2.75, "2013" 3.75, "2014" 4.75, "2015" 5.75)

open( F, ">stockmarketmodel.dat" ) || die "cannot write: $! ";
foreach $j ( 0 .. $N ) {
    print F $t->range( [$j] ), " ", $GBM->range( [$j] ), "\n";
}
close(F);

SciPy

Scientific Python script for Stock Market Model.

import numpy as np

# Annualized mean and standard deviation of the daily log-returns
mu = 0.1682262
sigma = 0.1722922

# length of the interval [0, T] in time units of years
T = 5.75

S0 = 8242.38

# number of time steps; the grid has N + 1 points including 0 and T
N = 1448

# time increment
Delta = T / N

# initialization of the vector W approximating
# Wiener process
W = np.zeros(N + 1, dtype=float)

t = np.linspace(0, T, N + 1)

W[1:N + 1] = np.cumsum(np.sqrt(Delta) * np.random.standard_normal(N))

# mu is the mean log-return, so it enters the exponent directly
GBM = S0 * np.exp(mu * t + sigma * W)

# optional file output to use with external plotting programming
# such as gnuplot, R, octave, etc.
# Start gnuplot, then from gnuplot prompt
#    plot "stockmarketmodel.dat" with lines;\
#    set xtic ("2010" 0.75, "2011" 1.75, "2012" 2.75, "2013" 3.75, "2014" 4.75, "2015" 5.75)

with open("stockmarketmodel.dat", "w") as f:
    for j in range(N + 1):
        f.write(str(t[j]) + " " + str(GBM[j]) + "\n")

__________________________________________________________________________

Problems to Work

Problems to Work for Understanding

  1. Show that
    \[ \rho_i = \log(1 + r_i) \approx r_i, \quad \text{for } r_i \ll 1. \]

  2. If standard Brownian Motion is started from a positive value \(X_0 > 0\), write the expression for the positive probability that the process can attain negative values at time \(t\). Write the expression for the probability that Brownian Motion can ever become negative.
  3. Choose a stock index such as the S & P 500, the Dow Jones Industrial Average etc., and obtain closing values of that index for a year-long (or longer) interval of trading days. Find the growth rate and variance of the closing values and create a Geometric Brownian Motion on the same interval with the same initial value, growth rate, and variance. Plot both sets of data on the same axes, as in Figure 1. Discuss the similarities and differences.
  4. Choose an individual stock or a stock index such as the S & P 500, the Dow Jones Industrial Average, etc., and obtain values of that index at regular intervals such as daily or hourly for a long interval of trading. Find the log-changes, and normalize by subtracting the mean and dividing by the standard deviation. Create a q-q plot for the data as in Figure 2. Discuss the similarities and differences.

__________________________________________________________________________

Books

Reading Suggestion:

References

[1]   National Institute of Standards and Technology. Engineering statistics handbook. http://www.itl.nist.gov/div898/handbook/index.htm, October 2013.

[2]   Robert Waid. Wilshire 5000: Myths and misconceptions. http://wilshire.com/media/34276/wilshire5000myths.pdf, November 2014.

__________________________________________________________________________


I check all the information on each page for correctness and typographical errors. Nevertheless, some errors may occur and I would be grateful if you would alert me to such errors. I make every reasonable effort to present current and accurate information for public use, however I do not guarantee the accuracy or timeliness of information on this website. Your use of the information from this website is strictly voluntary and at your risk.

I have checked the links to external sites for usefulness. Links to external websites are provided as a convenience. I do not endorse, control, monitor, or guarantee the information contained in any external website. I don’t guarantee that the links are active at all times. Use the links here with the same caution as you would all information on the Internet. This website reflects the thoughts, interests and opinions of its author. They do not explicitly represent official positions or policies of my employer.

Information on this website is subject to change without notice.

Steve Dunbar’s Home Page, http://www.math.unl.edu/~sdunbar1

Email to Steve Dunbar, sdunbar1 at unl dot edu

Last modified: Processed from LATEX source on August 8, 2016