## Howard Rowlee Lecture

### Shakespeare and the case of the suspicious statisticians

Bradley Efron, Stanford University

On November 15, 1985, a Shakespearean scholar discovered a nine-stanza poem attributed to Shakespeare in a bound volume that had been lying in Oxford's Bodleian Library since 1755. If authentic, it would be the first work of Shakespeare discovered since the 17th century. Did Shakespeare really write the poem? More to the point for this lecture, could statistical analysis of the text shed light on its true authorship? A statistical theory originally developed to estimate missing butterfly species turns out to have something to say about the disputed poem.
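The abstract gives no formulas. As a hedged illustration of the unseen-species idea it mentions, the sketch below implements the Good–Toulmin extrapolation from that literature: if $n_x$ distinct words occur exactly $x$ times in the known text, the expected number of *new* words in a further sample $t$ times as large is estimated by $\sum_{x \ge 1} (-1)^{x+1} n_x t^x$. The function name and the frequency counts are hypothetical, chosen only for illustration.

```python
def good_toulmin(freq_counts, t):
    """Estimate how many new distinct words would appear in a further sample
    t times the size of the original text (Good-Toulmin series).

    freq_counts maps x -> n_x, the number of distinct words seen exactly x times.
    The alternating series becomes unstable for t > 1.
    """
    return sum((-1) ** (x + 1) * n_x * t ** x for x, n_x in freq_counts.items())


# Hypothetical frequency counts, for illustration only.
counts = {1: 10, 2: 4, 3: 1}
print(good_toulmin(counts, 0.5))  # 10*0.5 - 4*0.25 + 1*0.125 = 4.125
```

Words seen once dominate the estimate, which is why rare-word counts carry most of the information about vocabulary not yet observed.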

## Invited Papers

### Scales of evidence for model selection: Fisher versus Jeffreys

Bradley Efron, Stanford University

Model selection refers to a data-based choice among competing statistical models, for example choosing between a linear and a quadratic regression function. The most popular model selection techniques are based on interpretations of p-values, using a scale originally suggested by Fisher: .05 is moderate evidence against the smaller model, .01 is strong evidence, etc. Recent Bayesian literature, going back to work by Jeffreys, suggests a quite different answer to the model selection problem. Jeffreys provided an interpretive scale for Bayes factors, a scale which can be implemented in practice by use of the BIC (Bayesian Information Criterion). The Jeffreys scale often produces much more conservative results, especially in large samples, so that, for instance, a .01 p-value may correspond to barely any evidence at all against the smaller model. I will review both the frequentist and Bayesian approaches, and try to say why they can give such different answers. The talk connects the two scales of evidence, and shows that frequentist methods have a Bayesian justification, but one that assumes stronger (more informative) priors than those used by the BIC.
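To make the large-sample conflict concrete, here is a minimal sketch, assuming nested models that differ by one parameter and a two-sided z-test. It uses the textbook BIC approximation $\log BF \approx z^2/2 - \tfrac{1}{2}\log n$, not necessarily the exact construction used in the talk. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def bic_bayes_factor(p_value, n):
    """Approximate Bayes factor for the larger model over the smaller one,
    implied by a two-sided p-value for a single extra parameter, using the
    BIC approximation  log BF ~= z**2 / 2 - log(n) / 2."""
    z = NormalDist().inv_cdf(1 - p_value / 2)  # two-sided z-statistic
    return math.exp(z * z / 2 - math.log(n) / 2)


for n in (100, 1000, 10000):
    print(n, round(bic_bayes_factor(0.01, n), 2))
```

With $p = .01$ the factor favors the larger model at $n = 100$ but falls below 1 once $n$ exceeds roughly 760, because the $\tfrac{1}{2}\log n$ penalty keeps growing while $z$ stays fixed.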

This is joint work with Alan Gous.

### Model Checking, Model Selection, and Random Effects

Hal S. Stern, Iowa State University

Hierarchical models incorporating random-effects parameters are widely applied in fields such as education, medicine, animal breeding, and epidemiology. Methods for assessing the fit of hierarchical models are reviewed, focusing on the Gaussian-Gaussian hierarchical model. Simulations are used to demonstrate the strengths and weaknesses of the various approaches.
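One simple fit check for the Gaussian-Gaussian model, offered as a hedged illustration rather than one of the specific diagnostics reviewed in the talk: since $y_j \mid \theta_j \sim N(\theta_j, \sigma_j^2)$ and $\theta_j \sim N(\mu, \tau^2)$ imply $y_j \sim N(\mu, \sigma_j^2 + \tau^2)$ marginally, standardized marginal residuals should look roughly standard normal. The function name is hypothetical, and $\mu$ and $\tau^2$ are taken as given.

```python
import math


def marginal_z_scores(y, sigma2, mu, tau2):
    """Standardized marginal residuals for the Gaussian-Gaussian model
    y_j | theta_j ~ N(theta_j, sigma2_j), theta_j ~ N(mu, tau2).
    Marginally y_j ~ N(mu, sigma2_j + tau2), so these should look roughly
    standard normal when the model and the plugged-in mu, tau2 are adequate."""
    return [(yj - mu) / math.sqrt(s2 + tau2) for yj, s2 in zip(y, sigma2)]


print(marginal_z_scores([1.0, 3.0, -1.0], [1.0, 3.0, 1.0], mu=0.0, tau2=1.0))
```

When $\mu$ and $\tau^2$ are estimated from the same data, these residuals tend to be less dispersed than N(0, 1), which is one reason such checks need calibration.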

### Model Selection for Estimating Small-Area Proportions

Michael Larsen, Harvard University

State-wide household surveys of alcohol and drug use often have inadequate sample sizes for estimating prevalences in individual counties and select demographic groups using traditional design-based survey estimates. Demographic information and social indicators can be used in hierarchical and regression models to produce biased but less variable estimates for small areas. Methods will be described and illustrated for selecting the variables used in small-area estimation models that use survey weights and are appropriate when proportions are small. Data are from the Gallup Organization.
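The bias-variance trade described above is often realized through a composite estimator that weights an area's direct survey estimate against a model-based synthetic estimate, in the style of the Fay–Herriot model. This is a minimal sketch that assumes the sampling variance $D$ and model variance $A$ are known. The talk's variable-selection methods, which handle survey weights and small proportions, are not reproduced, and all names and numbers are hypothetical.

```python
def composite_estimate(direct, synthetic, sampling_var, model_var):
    """Shrink a direct small-area estimate toward a synthetic (regression)
    estimate. The weight on the direct estimate, gamma = A / (A + D), is
    large when the area's sampling variance D is small relative to the
    model variance A."""
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1 - gamma) * synthetic


# Hypothetical county: direct prevalence 0.12 with a large sampling variance.
print(composite_estimate(0.12, 0.08, sampling_var=0.003, model_var=0.001))  # about 0.09
```

Because this county's direct estimate is noisy relative to the model variance, only a quarter of the weight goes to it, and the result is pulled most of the way toward the synthetic value.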

### On Pearson-$\chi^{2}$ Testing with Unobservable Cell Frequencies and Mixed Model Diagnostics

Jiming Jiang, Case Western Reserve University

Partha Lahiri, University of Nebraska at Lincoln

Chien-Hua Wu, Abbott Laboratories, Chicago

We consider Pearson's $\chi^{2}$-test in which not only are the cell probabilities estimated, but the cell frequencies are also based on estimated observations. The asymptotic null distribution of the test statistic is derived under the assumption that the estimator of the unknown vector of parameters is independent of the observations used to compute the cell frequencies. Based on this result, we develop a method of checking arbitrary distributional assumptions about the random effects and errors in a mixed linear model. The method is further examined in a simulation study.
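For reference, here is the statistic itself. In the setting of the talk, the observed counts would be built from estimated quantities (for example, predicted random effects), and the expected counts would come from estimated cell probabilities. This is why the usual $\chi^2$ reference distribution need not apply. The asymptotic null distribution derived in the paper is not reproduced here, and the counts below are hypothetical.

```python
def pearson_chi2(observed, expected):
    """Pearson's statistic: sum over cells of (O - E)**2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for 4 cells and expected counts from a fitted model.
print(pearson_chi2([18, 22, 30, 30], [25, 25, 25, 25]))  # (49 + 9 + 25 + 25) / 25
```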

### p-values for Bayesian model checking

Susie Bayarri, Univ. of Valencia, Spain

We investigate the compatibility of an assumed model with the data when the assumed model has unknown parameters. The most frequently used measures of compatibility are $p$-values, based on statistics $T$ for which large values are deemed to indicate incompatibility of the data and the model. When the null model has unknown parameters, $p$-values are not uniquely defined. Proposals for computing a $p$-value in this situation include the {\it plug-in} and {\it similar} $p$-values on the frequentist side, and the {\it predictive} and {\it posterior predictive} $p$-values on the Bayesian side. We propose two alternatives, the {\it conditional predictive $p$-value} and the {\it partial posterior predictive $p$-value}, and indicate their advantages from both Bayesian and frequentist perspectives.
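To show how two of the existing proposals differ in practice, the sketch computes a plug-in and a posterior predictive $p$-value by Monte Carlo. It uses a toy Poisson model, with $T$ taken to be the sample variance as a check for overdispersion. The model, the Gamma(1, 1) prior, and all names are hypothetical choices for illustration. The conditional predictive and partial posterior predictive $p$-values proposed in the talk are not implemented.

```python
import math
import random
import statistics


def poisson_draw(rng, lam):
    """Draw from Poisson(lam) by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def predictive_p_value(y, draw_lambda, n_sim=2000, seed=0):
    """Monte Carlo estimate of P(T(y_rep) >= T(y)), with T = sample variance
    (large values indicate overdispersion relative to the Poisson model).
    draw_lambda(rng) supplies the Poisson mean used for each replicate."""
    rng = random.Random(seed)
    t_obs = statistics.variance(y)
    hits = 0
    for _ in range(n_sim):
        lam = draw_lambda(rng)
        y_rep = [poisson_draw(rng, lam) for _ in y]
        hits += statistics.variance(y_rep) >= t_obs
    return hits / n_sim


def plug_in_p(y, **kw):
    """Plug-in p-value: every replicate uses the MLE lambda_hat = mean(y)."""
    lam_hat = statistics.mean(y)
    return predictive_p_value(y, lambda rng: lam_hat, **kw)


def posterior_predictive_p(y, a0=1.0, b0=1.0, **kw):
    """Posterior predictive p-value under a Gamma(a0, rate b0) prior; the
    posterior is Gamma(a0 + sum(y), rate b0 + n). random.gammavariate takes
    (shape, scale), so the scale passed is 1 / rate."""
    shape, rate = a0 + sum(y), b0 + len(y)
    return predictive_p_value(y, lambda rng: rng.gammavariate(shape, 1 / rate), **kw)
```

Both versions use the observed data twice, once to fit $\lambda$ and once to compute $T$.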

### On an Asymptotic Theory of Conditional and Unconditional Coverage Probabilities of Empirical Bayes Confidence Intervals

Gauri Sankar Datta, University of Georgia, Athens

Malay Ghosh, University of Florida, Gainesville

Partha Lahiri, University of Nebraska, Lincoln

Empirical Bayes (EB) methodology is now widely used in statistics. However, work on the construction of EB confidence intervals is still very limited. Following Cox (1975), Hill (1990) and Carlin and Gelfand (1990, 1991), we consider EB confidence intervals that are adjusted so that their actual coverage probabilities asymptotically meet the target coverage probabilities up to the second order. We consider both unconditional and conditional coverage, with conditioning done with respect to an ancillary statistic.
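As a baseline for what the adjustments correct, here is the naive EB interval in the simple normal model $y_i \mid \theta_i \sim N(\theta_i, 1)$, $\theta_i \sim N(0, A)$. In this model the posterior is $N((1-B)y_i,\, 1-B)$ with $B = 1/(1+A)$. The sketch plugs in the James–Stein estimate of $B$ and ignores its uncertainty, which is why such intervals tend to fall short of nominal coverage. The model, the estimator, and the function name are assumptions for illustration. The second-order adjustments of the paper are not reproduced.

```python
import math
from statistics import NormalDist


def naive_eb_intervals(y, level=0.95):
    """Naive empirical Bayes intervals for theta_i in the model
    y_i | theta_i ~ N(theta_i, 1), theta_i ~ N(0, A).
    The shrinkage factor B = 1 / (1 + A) is estimated by (k - 2) / sum(y_i**2),
    truncated at 1; the uncertainty in that estimate is ignored."""
    k = len(y)
    b_hat = min(1.0, (k - 2) / sum(v * v for v in y))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(1 - b_hat)  # posterior sd is sqrt(1 - B)
    return [((1 - b_hat) * v - half, (1 - b_hat) * v + half) for v in y]


print(naive_eb_intervals([2.0, -1.0, 0.5, 3.0, -2.5]))
```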

### Bayes Factors and Posterior Probabilities for Evaluating Mixed Models

Donna Pauler, Harvard University

Bayes factors, defined as the ratio of posterior to prior odds, and posterior probabilities provide consistent and easily interpretable procedures for model selection. However, sensitivity to prior specification and difficulty in calculation often limit their feasibility for practical applications, particularly for those involving multiple hypothesis tests. In this talk I derive simple asymptotic and simulation-based approximations to Bayes factors for evaluating a range of hypotheses within the context of mixed effects models. These include selection of covariates, assessment of population heterogeneity, and determination of component membership in mixed populations. I illustrate the procedures using examples in meta-analyses, cancer screening, and clinical trial compliance monitoring. For tests of fixed parameters I use Laplace's method to obtain a highly accurate approximation to the Bayes factor. For tests of variance components, both asymptotic and simulation-based estimators break down because the null hypothesis lies on the boundary of the parameter space. I therefore propose an alternative importance sampling estimator of the Bayes factor based on a rejection algorithm, which is still straightforward to implement. Finally, for longitudinal data for which there are several competing individual-level models, I obtain individual and population posterior probabilities of component membership as by-products of a reversible jump sampling algorithm. As the examples I present illustrate, approximate Bayes factors and posterior probabilities are flexible and viable options for evaluating complex models.
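The accuracy of Laplace's method is easy to see in a setting much simpler than mixed models. As a stand-in, the sketch compares the exact and the Laplace-approximated Bayes factor for a binomial proportion, with $H_0: p = 1/2$ against $H_1: p \sim \text{Uniform}(0, 1)$. Under the uniform prior the posterior mode is the MLE, so the approximation is $\log L(\hat p) + \tfrac12 \log 2\pi - \tfrac12 \log I(\hat p)$. The model, the data (60 successes in 100 trials), and the function names are hypothetical.

```python
import math


def log_marginal_exact(s, n):
    """log of the integral of p**s * (1-p)**(n-s) dp under a Uniform(0, 1)
    prior, i.e. the log Beta function B(s + 1, n - s + 1)."""
    return math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)


def log_marginal_laplace(s, n):
    """Laplace approximation to the same integral (requires 0 < s < n)."""
    p_hat = s / n
    log_lik = s * math.log(p_hat) + (n - s) * math.log(1 - p_hat)
    info = n / (p_hat * (1 - p_hat))  # minus the second derivative at the mode
    return log_lik + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(info)


def log_bf10(s, n, log_marginal):
    """log Bayes factor of H1: p ~ Uniform(0, 1) against H0: p = 1/2."""
    return log_marginal(s, n) - n * math.log(0.5)


print(log_bf10(60, 100, log_marginal_exact), log_bf10(60, 100, log_marginal_laplace))
```

For these data the two log Bayes factors differ by less than 0.01.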

### Bayesian Model Selection via the Expected Posterior Prior

James Berger, Duke University

Recently developed automatic Bayesian methods of model selection, such as the "intrinsic Bayes factor" and the "fractional Bayes factor," have proven to be highly effective but are often difficult to work with. In particular, intrinsic Bayes factors can be challenging to compute, while fractional Bayes factors require considerable care in definition and use. A highly promising new approach to the problem is based on developing explicit default priors for the models under consideration, called "expected posterior priors." These are strongly related to "intrinsic priors" arising from the intrinsic Bayes factor approach, but have the advantages of being explicitly given and being relatively easy to use in MCMC computational schemes. A variety of examples of use of expected posterior priors will be given, including an application to analysis of a mixture model arising in an astrophysical problem.
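For orientation, here is a hedged summary of how the expected posterior prior is usually written in the literature; the abstract itself gives no formula. Take model $M_i$ with a noninformative prior $\pi_i^N$, imaginary training data $x^{*}$, and a predictive distribution $m^{*}$ for $x^{*}$ shared across models (for example, the marginal under the simplest model). Then

$$
\pi_i^{*}(\theta_i) = \int \pi_i^{N}(\theta_i \mid x^{*})\, m^{*}(x^{*})\, dx^{*}.
$$

Because $\pi_i^{N}(\theta_i \mid x^{*})$ is a proper posterior when the training sample is large enough, $\pi_i^{*}$ carries no arbitrary normalizing constant. Bayes factors computed with these priors are therefore well defined.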

### Bayesian Wavelet Regression on Curves with Application to a Spectroscopic Calibration Problem

Marina Vannucci, Texas A&M University

Motivated by calibration problems in spectroscopy, we consider the linear regression setting where the many predictor variables arise from sampling an essentially continuous curve at equally spaced points, and where there may be multiple predictands. We tackle this regression problem by calculating the wavelet transforms of the discretized curves, and then applying a Bayesian variable selection method using mixture priors to the multivariate regression of predictands on wavelet coefficients. In an application to near infrared spectroscopy this approach is able to find regressions using small subsets of the wavelet coefficients that have good predictive performance.
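The first step described above, transforming each discretized curve into wavelet coefficients, can be sketched with the Haar wavelet. Haar is assumed here for simplicity, since the abstract does not name the wavelet family. The function name is hypothetical, and the variable selection step is not shown.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar wavelet transform of a length-2**J signal.
    Returns [final approximation] + detail coefficients, coarsest level first."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        details = detail + details  # prepend so coarser levels come first
    return approx + details


# A hypothetical 8-point "spectrum"; its coefficients would become candidate predictors.
print(haar_dwt([1.0, 3.0, 2.0, 5.0, 4.0, 4.0, 0.0, 1.0]))
```

Because the transform is orthonormal, it preserves the sum of squares. A smooth curve concentrates its energy in a few coefficients, which is what makes selecting small subsets of them plausible.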

### Problems In The Application Of Model Selection Methods To Survey Nonresponse

John L. Eltinge, Texas A&M University

In the analysis of complex survey data, nonresponse adjustments often are based on explicit or implicit models for response probabilities. In selection of a model for response probabilities, evaluation criteria depend on the principal purpose of the survey analysis. In some cases, analytic interest focuses on inference regarding population means or related parameters. Consequently, practical model selection issues center on the effect of the model in adjustment of survey weights; on the associated residual bias of the adjusted mean estimators; and on the resulting degradation of power curves for tests of these means. In other cases, analytic goals are somewhat broader, involving exploratory analysis of the distribution functions for specified continuous survey variables. Selection of an inadequate response probability model, and the resulting misspecification of survey weights, will tend to distort estimators of the distribution function of the survey variables. The magnitude of the resulting distortions can be assessed through quantile plots and associated offset-function plots. This talk will develop the above-mentioned ideas, and will discuss some relationships with customary model-selection criteria. The proposed diagnostics are applied to data from the U.S. Third National Health and Nutrition Examination Survey (NHANES III).
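The chain from a response-probability model to adjusted weights to an estimated distribution function can be sketched under the simplest implicit model, in which response probability is constant within weighting classes. This is a hedged illustration. The offset-function plots and the comparison of competing response models are not shown, and all names and data are hypothetical.

```python
from collections import defaultdict


def adjust_weights(weights, cells, responded):
    """Weighting-class nonresponse adjustment: within each cell, respondents'
    weights are divided by the cell's weighted response rate (implicitly
    assuming a constant response probability within cells).
    Nonrespondents get weight 0."""
    total, resp = defaultdict(float), defaultdict(float)
    for w, c, r in zip(weights, cells, responded):
        total[c] += w
        if r:
            resp[c] += w
    return [w * total[c] / resp[c] if r else 0.0
            for w, c, r in zip(weights, cells, responded)]


def weighted_ecdf(values, weights, x):
    """Weighted empirical distribution function at x (zero weights skipped,
    so missing values of nonrespondents are never compared)."""
    num = sum(w for v, w in zip(values, weights) if w > 0 and v <= x)
    return num / sum(weights)


# Hypothetical sample: the second unit did not respond, so its value is missing.
values = [10.0, None, 20.0, 30.0]
adj = adjust_weights([1.0, 1.0, 1.0, 1.0], ["a", "a", "b", "b"], [True, False, True, True])
print(adj, weighted_ecdf(values, adj, 15.0))
```

A different response model would change `adj`, and hence the estimated distribution function. Quantile plots make exactly that shift visible.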

### Exchangeability in Hierarchical Models

M. Elizabeth Halloran, Emory University

Exchangeability was defined by de Finetti as the assumption that the joint distribution of a collection of units or random quantities is invariant under any permutation of their indices. The assumption holds if we have no information to distinguish among the units or random quantities. We review the concept of exchangeability, its use in hierarchical models, and some of the controversies and pitfalls related to it. We illustrate issues of exchangeability and their consequences using examples from multicenter vaccine trials and independent vaccine trials conducted in different populations.

### Some Bootstrap Methods for Model Selection

J. Sunil Rao, Department of Biostatistics, Case Western Reserve University

We survey some recent uses of bootstrap resampling for the purpose of model selection. We show how the bootstrap can be used not only to select a best candidate model but also to evaluate a model selection procedure. These problems are attacked through the derivation of various bootstrap estimates of prediction error. Furthermore, judicious use of bootstrap training points leads to a selection procedure that mimics Bayesian approaches but requires less computation and fewer subjective choices. Other possibilities for using bootstrap model information will be discussed, including model averaging and editing training sets for enhanced prediction accuracy. Some of the work discussed has been done jointly with Rob Tibshirani.
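One of the simplest bootstrap estimates of prediction error is the out-of-bag (leave-one-out bootstrap) estimate: fit on each bootstrap sample and score only on the points that sample left out. Selecting the candidate with the smallest estimate gives a basic bootstrap model selector. This is a minimal sketch comparing a constant-mean model with a straight line. The talk's specific estimates and its Bayesian-mimicking procedure are not reproduced, and all names are hypothetical.

```python
import random


def fit_constant(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def oob_error(fit, xs, ys, n_boot=200, seed=0):
    """Out-of-bag bootstrap estimate of squared prediction error."""
    rng = random.Random(seed)
    n, total, count = len(xs), 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit([xs[i] for i in idx], [ys[i] for i in idx])
        chosen = set(idx)
        for i in range(n):
            if i not in chosen:  # score only points this resample left out
                total += (ys[i] - model(xs[i])) ** 2
                count += 1
    return total / count


def select_model(candidates, xs, ys):
    """Pick the candidate with the smallest bootstrap prediction error."""
    errors = {name: oob_error(fit, xs, ys) for name, fit in candidates.items()}
    return min(errors, key=errors.get), errors
```

Out-of-bag error tends to be somewhat pessimistic, because each fit uses only about 63% of the distinct observations. Refinements of the basic estimate address this.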

### Fitting Hierarchical Models

Carl N. Morris, Statistics Department, Harvard University

Many statisticians and other data analysts now recognize when their data are best described as distributed according to several levels of variation. Thus, they choose and fit a multi-level (hierarchical) model. "Fit" has two meanings: 1. Choosing a fitting procedure that provides valid inferences for the assumed model; and 2. Choosing a model that reliably describes the data.

1. Finding an adequate fitting procedure, especially when just a few units are considered, is no easy task. In such cases asymptotic approximations (e.g., maximum likelihood) can produce significant biases and misleading inferences. Despite that, many statistical packages use such approximations for these models without warning. While fully Bayesian procedures can overcome some small-sample problems, their repeated-sampling properties depend on the prior (or reference) distribution assumed for the model's population parameters. The repeated-sampling properties of a procedure often are unknown and go unevaluated.

2. Fitting a model, i.e., finding appropriate assumptions, relies only partly on the data. Exchangeability in the second level of a hierarchical model may be the most crucial distributional assumption. Misspecifying exchangeability leads to poor inferences, even in large samples.

We review these and other issues, hoping to encourage the reliable use of hierarchical models in practice.
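One concrete way maximum likelihood misbehaves with few units is that the estimate of the between-unit variance often lands exactly on zero. The sketch simulates the equal-variance model $y_j \sim N(\mu, V + A)$, $j = 1, \dots, k$, with $V$ known and $\mu$ estimated. In that model $\hat A = \max(0, S/k - V)$ with $S = \sum_j (y_j - \bar y)^2$. This illustration and its function name are our own assumptions, not taken from the talk.

```python
import random


def prob_mle_at_zero(k, V=1.0, A=1.0, n_sim=20000, seed=0):
    """Monte Carlo frequency with which the maximum likelihood estimate of the
    between-unit variance A equals 0 in the model y_j ~ N(mu, V + A),
    j = 1..k, with V known and mu estimated by the sample mean."""
    rng = random.Random(seed)
    sd = (V + A) ** 0.5
    zeros = 0
    for _ in range(n_sim):
        y = [rng.gauss(0.0, sd) for _ in range(k)]
        ybar = sum(y) / k
        s = sum((v - ybar) ** 2 for v in y)
        zeros += s / k - V <= 0  # A_hat = max(0, S/k - V) is truncated at 0
    return zeros / n_sim


# For k = 5 and A = V the exact answer is P(chi2_4 <= 2.5), about 0.36.
print(prob_mle_at_zero(5))
```

In roughly a third of such data sets the fitted model declares no between-unit variation at all, although the true between-unit variance equals the within-unit variance.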

### Empirical Bayes Methods For The Variable Selection Problem

Edward George, University of Texas at Austin

For the problem of variable selection for the normal linear model, a wide variety of selection criteria including Cp, AIC, BIC and RIC are shown to correspond to a particular hierarchical Bayes formulation, in the sense that model comparison via the criteria corresponds to model comparison via posterior probabilities under specific fixed hyperparameter values. Empirical Bayes methods which treat these hyperparameters as unknown are shown to yield selection criteria which share the asymptotic consistency of BIC, while using the data to adaptively improve performance across different problems. Both exact and approximate maximum marginal likelihood methods, and extensions to heavy-tailed error distributions, are considered. For the problem of data compression and denoising with wavelets, these methods are seen to offer improved performance over several Bayes and classical estimators. (This is joint work with Merlise Clyde of Duke University and Dean Foster of the University of Pennsylvania.)
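With $\sigma^2$ known, the criteria named above share one form: choose the model size $q$ minimizing $\mathrm{RSS}_q/\sigma^2 + F q$. Here $F = 2$ for Cp and AIC, $F = \log n$ for BIC, and $F = 2\log p$ for RIC. The fixed value of $F$ plays the role of the fixed hyperparameters; the empirical Bayes criteria of the talk would instead estimate it from the data (not shown). The residual sums of squares below are hypothetical.

```python
import math


def select_by_penalty(rss_by_size, sigma2, penalty):
    """Choose the model size q minimizing RSS_q / sigma2 + penalty * q."""
    return min(rss_by_size, key=lambda q: rss_by_size[q] / sigma2 + penalty * q)


# Hypothetical best-subset residual sums of squares, n = 100, p = 50, sigma2 = 1.
rss = {1: 130.0, 2: 124.0, 3: 121.0}
n, p = 100, 50
for name, F in [("AIC/Cp", 2.0), ("BIC", math.log(n)), ("RIC", 2 * math.log(p))]:
    print(name, select_by_penalty(rss, 1.0, F))
```

On these numbers AIC/Cp picks 3 variables, BIC 2, and RIC 1. The criteria disagree only through $F$.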

## Poster Session

### An Empirical Bayes Estimator of Seismic Events using Wavelet Packet Bases

Paul Gendron, Graduate Student, ECE Department, Worcester Polytechnic Institute

Balgobin Nandram, Department of Mathematical Sciences, Worcester Polytechnic Institute

Wavelet estimation based on the discrete wavelet transform (DWT) has been shown to be effective in denoising functions. We propose an empirical Bayes estimator for adaptive wavelet packet representations, based on the notion that improved sparsity over the DWT leads to more realistic prior assignments. The algorithm steps through the data in sections, determining a sparsest basis for each. Adaptation results from the recursive estimation of sub-band noise variances. The algorithm is tested on synthetic seismic events and on real seismic data for its capacity for noise rejection and signal fidelity. These signals often exhibit localized structure at unknown frequencies. We compare this algorithm with a similar discrete wavelet transform algorithm and show that the best-basis paradigm of wavelet packets leads to improved performance for block-recursive denoising of data.
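As a point of reference, here is the standard DWT-style baseline against which such estimators are usually compared. Each sub-band's noise level is estimated robustly as $\hat\sigma = \mathrm{median}|d|/0.6745$, and the coefficients are soft-thresholded at the universal threshold $\hat\sigma\sqrt{2\log n}$. This is a swapped-in classical method, not the authors' empirical Bayes wavelet packet estimator, and the function names are hypothetical.

```python
import math
import statistics


def mad_sigma(coeffs):
    """Robust noise-scale estimate for one sub-band: median(|d|) / 0.6745."""
    return statistics.median([abs(c) for c in coeffs]) / 0.6745


def soft_threshold(coeffs, thresh):
    """Shrink each coefficient toward zero by thresh, zeroing small ones."""
    return [math.copysign(max(abs(c) - thresh, 0.0), c) for c in coeffs]


def denoise_band(coeffs):
    """Soft-threshold one sub-band at the universal threshold
    sigma * sqrt(2 log n), with sigma estimated from that band alone."""
    sigma = mad_sigma(coeffs)
    return soft_threshold(coeffs, sigma * math.sqrt(2 * math.log(len(coeffs))))


print(denoise_band([0.1, -0.1, 0.1, -0.1, 10.0]))
```

Estimating $\sigma$ separately in each sub-band lets the threshold adapt to noise that is not white, a simple analogue of the sub-band noise-variance estimation described above.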

### A Hierarchical Bayesian Model for Polychotomous Data from a Two-Stage Cluster Sample

Michael E. Schuckers and Hal S. Stern, Iowa State University

We consider a hierarchical probability model for a two-stage cluster sample. We show how to address missing data concerns within the hierarchical framework, assuming that the values are missing at random. We compare our results to those from a design-based frequentist approach using data from the 1990 Slovenian Public Opinion Survey. Finally, we consider extensions to multi-stage cluster samples.
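Within a single cluster, polychotomous responses are commonly modeled as multinomial with a Dirichlet prior on the category probabilities. The posterior mean then blends the cluster's observed proportions with the prior mean. This sketch assumes fixed, hypothetical hyperparameters. In a full hierarchical model they would themselves be estimated across clusters, which this sketch does not show.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of multinomial category probabilities under a
    Dirichlet(alpha) prior: (alpha_k + n_k) / (sum(alpha) + n)."""
    denom = sum(alpha) + sum(counts)
    return [(a + c) / denom for a, c in zip(alpha, counts)]


# Hypothetical cluster with 10 respondents in 3 answer categories;
# prior mean (0.5, 0.3, 0.2) with total prior weight 10.
print(dirichlet_posterior_mean([6, 1, 3], [5.0, 3.0, 2.0]))
```

Because the prior weight equals the cluster size here, the estimate sits halfway between the observed proportions (0.6, 0.1, 0.3) and the prior mean.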

### Approximations to the Bayes Factor in Model Selection problems and consistency issues

Nitai Mukhopadhyay, Department of Statistics, Purdue University

Stone (1979) showed that BIC can fail to be asymptotically consistent. Note, however, that BIC was developed as an asymptotic approximation to Bayes factors between models, and that the approximation is valid only under certain conditions. The counterexamples of Stone arise in situations in which BIC is not an adequate approximation. We develop some new approximations to Bayes factors, valid in considerably more general situations than is BIC, and discuss their consistency properties.

### Combining Data From Experiments That May be Similar

Richard Evans

Given data from $L$ experiments or observational studies initially believed to be similar, it is desired to estimate the mean corresponding to an experiment or observational study, $E_j$, of particular interest. It is often profitable to use the data from related studies to sharpen the estimate corresponding to experiment $E_j$. However, it is essential that all of the data that are combined be concordant with the data from $E_j$. We improve the methodology first proposed by Malec and Sedransk (1992), which uses the observed data to determine the nature and amount of the pooling of the data. We do this by eliminating the need to specify a scale parameter, and by showing how the technique can accommodate unknown variance components. We analyze a data set from six clinical trials that studied the effect of using aspirin following a myocardial infarction. Our analysis is useful because it has a perspective that is different from the other analyses of these data that have been published. We show the efficacy of the method by (a) describing an asymptotic result about the posterior probability function associated with all partitions of the experiment means, $\mu_1, \ldots, \mu_L$, into subsets, and (b) carrying out a numerical investigation. The latter shows that our method provides sensible estimates, in contrast to some alternatives in common use, and exhibits the large gains in precision that are possible.
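The method places posterior probability on partitions of the $L$ study means into subsets. For the six aspirin trials ($L = 6$) there are only $B_6 = 203$ such partitions (the Bell number), so the whole space can be enumerated. The sketch only generates that space, using a hypothetical helper name. The posterior probability assigned to each partition is not computed here.

```python
def set_partitions(items):
    """Yield every partition of a list into non-empty blocks. Recursive step:
    the first item either forms its own block or joins one of the blocks of
    a partition of the remaining items."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


studies = ["E1", "E2", "E3", "E4", "E5", "E6"]
print(sum(1 for _ in set_partitions(studies)))  # Bell number B_6 = 203
```

The Bell numbers grow quickly ($B_{10} = 115975$), so full enumeration is practical only for a modest number of studies.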

### Investigating the Determinants of Customer's Upgrade Decision: A Bayesian Analysis of Ordered Regression Model

Zhaohui Qin, University of Michigan, Ann Arbor

A full Bayesian procedure is proposed in this paper to study regression models with discrete responses. We modified the method proposed by Albert and Chib (1993) by allowing missing data in the covariate matrix. A Gibbs sampler is applied, and latent variables are introduced to simplify sampling from complicated full conditionals. We demonstrate this method in a study of how to analyze and understand customer satisfaction survey data in the software industry. We identified customers' software upgrade decisions as the most important factor for software manufacturers, and then used the survey data to assess the relative importance of various determinants of these upgrade decisions. A unique two-level hierarchical model is developed for this empirical assessment. The importance of these determinants can be derived because our analysis assesses both their average impact and the variability of that impact across the customer population.
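In the Albert–Chib latent-variable approach to ordered responses, an observed category $j$ arises from a latent $z \sim N(x^\top\beta, 1)$ falling between consecutive cutpoints. The Gibbs sampler alternates between drawing the latent $z$ values given $\beta$ and the cutpoints, and drawing $\beta$ given $z$. The sketch shows only the latent-variable draw, via inverse-CDF sampling from a truncated normal. Function and argument names are hypothetical, and the missing-covariate and hierarchical steps are not shown.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal


def draw_latent(category, mean, cutpoints, rng):
    """Draw z ~ N(mean, 1) truncated to (cutpoints[category], cutpoints[category + 1])
    by inverting the normal CDF. cutpoints runs from -inf to +inf, so
    J categories need J + 1 entries."""
    lo_p = _STD.cdf(cutpoints[category] - mean)
    hi_p = _STD.cdf(cutpoints[category + 1] - mean)
    u = lo_p + (hi_p - lo_p) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf needs 0 < u < 1
    return mean + _STD.inv_cdf(u)


rng = random.Random(1)
cuts = [float("-inf"), 0.0, 1.0, float("inf")]  # three ordered categories
print(draw_latent(1, 0.5, cuts, rng))  # a draw between 0 and 1
```

Inverse-CDF sampling is simple and exact in the bulk of the distribution. Far in the tails, where the two CDF values nearly coincide, a dedicated truncated-normal sampler is more reliable.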

### Flowgraph Models and Diabetic Retinopathy

C. L. Yau and Aparna V. Huzurbazar, Department of Mathematics and Statistics, University of New Mexico

Multi-state stochastic models have been widely accepted for statistical analysis, especially in survival analysis. The focus has been on Markov models, which generally assume independent exponential distributions for the waiting times. However, for most chronic diseases such as cancer, HIV/AIDS, leukemia, and diabetes, these assumptions are violated, and alternative methods are required. Flowgraph models, which relax the exponential assumption, are one such alternative. Given a stochastic process with conditionally independent states, a flowgraph model allows virtually any waiting time distribution to be used to model the different states. The resulting distribution of the process is a convolution or finite mixture distribution, which is obtained by solving the flowgraph model. These methods will be illustrated using data from 277 subjects with insulin-dependent (type I) diabetes mellitus (IDDM), collected at the Eye-Kidney Clinic of the Barbara Davis Center for Childhood Diabetes at the University of Colorado Health Sciences Center.
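The convolution and mixture structure mentioned above is usually handled through moment generating functions. A branch's transmittance is its probability times the MGF of its waiting time; branches in series multiply, and alternative branches in parallel add. This minimal sketch combines hypothetical exponential and gamma waiting times and recovers the mean passage time as $M'(0)$. Loops and the inversion of $M$ to a density are not shown.

```python
def exp_mgf(rate):
    return lambda s: rate / (rate - s)  # valid for s < rate


def gamma_mgf(shape, rate):
    return lambda s: (rate / (rate - s)) ** shape  # valid for s < rate


def series(*mgfs):
    """Waiting times in sequence: transmittances multiply."""
    def m(s):
        out = 1.0
        for f in mgfs:
            out *= f(s)
        return out
    return m


def parallel(branches):
    """Alternative branches taken with probabilities p: p-weighted MGFs add."""
    return lambda s: sum(p * f(s) for p, f in branches)


def mean_from_mgf(m, h=1e-5):
    """Mean passage time = M'(0), approximated by a central difference."""
    return (m(h) - m(-h)) / (2 * h)


# Hypothetical path: Exp(rate 1) then Gamma(shape 2, rate 1); mean 1 + 2 = 3.
print(mean_from_mgf(series(exp_mgf(1.0), gamma_mgf(2.0, 1.0))))
```

Working with MGFs turns convolutions into products, which is what makes non-exponential waiting times tractable.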

### Model uncertainty in factor analysis

Hedibert Freitas Lopes, Universidade Federal do Rio de Janeiro and Duke University

Mike West, Duke University

Bayesian inference in factor analytic models has received renewed attention in recent years, partly due to computational advances and partly due to applied problems that generate factor structures, as exemplified by recent work in financial time series modeling. The focus of our current work is on exploring uncertainty about the number of latent factors in a multivariate factor model, combined with methodological and computational issues of model specification and model fitting. We explore reversible jump MCMC methods that build on sets of parallel Gibbs sampling-based analyses to generate suitable empirical proposal distributions and that address the challenging problem of finding efficient proposals in high-dimensional models. Alternative MCMC methods based on bridge sampling are discussed, and these fully Bayesian MCMC approaches are compared with a collection of popular model selection methods in empirical studies. Various additional computational issues are discussed, and the methods are explored in studies of some simulated data sets and an econometric time series example.
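Uncertainty about the number of factors is bounded by identifiability. Under a common identification in which the loadings matrix is lower triangular, a $k$-factor model for $p$ variables has $pk - k(k-1)/2$ loadings plus $p$ idiosyncratic variances. This count cannot usefully exceed the $p(p+1)/2$ distinct entries of the covariance matrix. The sketch computes the resulting upper limit on $k$. The identification scheme and the function names are assumptions, not necessarily those used in the talk.

```python
def n_free_params(p, k):
    """Free parameters in a k-factor model for p variables: lower-triangular
    loadings (p*k - k*(k-1)/2) plus p idiosyncratic variances."""
    return p * k - k * (k - 1) // 2 + p


def max_factors(p):
    """Largest k whose parameter count does not exceed the p*(p+1)/2
    distinct entries of the covariance matrix."""
    k = 0
    while n_free_params(p, k + 1) <= p * (p + 1) // 2:
        k += 1
    return k


print(max_factors(6))  # 3: a 3-factor model uses all 21 covariance entries
```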

### Mixed models, variable selection, and outliers: modeling farm investment decisions

Chad Hart, Center for Agricultural and Rural Development, Iowa State University, Ames

Alicia Carriquiry, Department of Statistics, Iowa State University, Ames

We examine investment decisions in machinery and equipment for Iowa farms, during the 1991–1995 period. The investment regression model is a linear mixed model, with random effects representing farms and years, and is constructed by combining aspects from the accelerator and the neoclassical investment models. Other influential regressors are chosen stochastically, using Geweke's (1996) variable selection algorithm. Because of the presence of outliers, the residuals were modeled as a mixture of normal random variables with unknown parameters and mixing weights. We use the Individual Farm Analysis data set collected by the Iowa Farm Business Association. Data are available for approximately 600 farms over four years. Each record contains information on about 700 variables, including total farm resources and liabilities, asset depreciation, and crop and livestock revenues and expenses. The analyses were conducted within a Bayesian framework. Sensitivity analyses to the prior distributions and to other assumptions were carried out.
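When residuals are modeled as a mixture of normals, one Gibbs step allocates each residual to a component with probability proportional to the component weight times its density. The sketch assumes two zero-mean components, a narrow "core" and a wide "outlier" component, with fixed hypothetical parameters. The number of components and their settings in the actual analysis are not given in the abstract.

```python
import math


def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def outlier_prob(resid, weight_out, sd_core, sd_out):
    """Posterior probability that a residual came from the wide 'outlier'
    component of the mixture  w*N(0, sd_out**2) + (1 - w)*N(0, sd_core**2)."""
    out = weight_out * normal_pdf(resid, sd_out)
    core = (1 - weight_out) * normal_pdf(resid, sd_core)
    return out / (out + core)


for r in (0.0, 2.0, 6.0):
    print(r, round(outlier_prob(r, 0.05, 1.0, 5.0), 4))
```

Large residuals are absorbed by the wide component instead of distorting the regression coefficients, which is the usual motivation for this residual model.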

### An often neglected assumption in multiple imputation

Jae-kwang Kim, Iowa State University

Multiple imputation, proposed by Rubin, is a simulation-based Bayesian method of compensating for missing values. The statistical properties of estimators based on multiple imputation depend on a set of assumptions. We point out an often neglected assumption for the validity of multiple imputation and illustrate possible dangers when the assumption is not satisfied. A remedy is also given.
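The abstract does not name the assumption. For context, these are the standard combining rules whose validity is at stake. Given $m$ completed-data estimates $\hat Q_l$ with variances $U_l$, the pooled estimate is $\bar Q$, and the total variance is $T = \bar U + (1 + 1/m)B$, where $\bar U$ is the mean within-imputation variance and $B$ the between-imputation variance. The function name is hypothetical.

```python
import statistics


def rubin_combine(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)    # pooled point estimate
    w = statistics.mean(variances)        # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b


print(rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5]))  # (2.0, 0.5 + (4/3) * 1)
```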

### Estimation of the Extreme Flow Distributions by Stochastic Models

Yuanzhang Li, Allied Technology Group, Walter Reed Army Institute of Research, Washington, D.C.

K.M. Lal Saxena, Dept. of Math. and Statistics, University of Nebraska–Lincoln

Shuzheng Cong, National Weather Service

### Stability Analysis of Pharmaceutical Products in Multiple Batches: Preliminary Test and Shrinkage Approach

A. K. Md. Ehsanes Saleh, Carleton University, Ottawa

Pharmaceutical products are routinely monitored for their stability over time. Stability studies generally consist of a random sample of dosage units from a batch or several batches placed in a storage room and periodically assayed for their drug content. The degradation of the drug product is modeled, and, according to the FDA, the shelf-life is calculated as the time point at which the lower 95% confidence limit for the fitted regression line crosses the lowest acceptable limit of the drug content. When multiple batches are involved, a preliminary test procedure is applied to test the equality of the slopes of the fitted lines and so decide whether or not to pool the slopes. The shelf-life is then computed based on the resulting decision. In this talk we apply the shrinkage estimation procedure to the slopes and determine the shelf-life based on a confidence set using shrinkage estimators. Some data analyses are also presented to illustrate the procedure.
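The two slope estimators contrasted above can be sketched in their generic textbook forms. The preliminary test estimator switches between pooled and batch-specific slopes according to the test of equal slopes. A Stein-type estimator instead moves each batch slope toward the pooled slope by a data-dependent factor $1 - c/T$, truncated at zero. The constant $c$, the function names, and the numbers are hypothetical placeholders, and the shelf-life calculation itself is not shown.

```python
def preliminary_test(slopes, pooled, stat, crit):
    """Preliminary test estimator: use the pooled slope for every batch if
    the test of equal slopes does not reject (stat < crit); otherwise keep
    the batch-specific slopes."""
    return [pooled] * len(slopes) if stat < crit else list(slopes)


def positive_part_shrinkage(slopes, pooled, stat, c):
    """Stein-type estimator: move each batch slope toward the pooled slope
    by the factor 1 - c / stat, truncated at 0."""
    factor = max(0.0, 1 - c / stat)
    return [pooled + factor * (b - pooled) for b in slopes]


# Hypothetical degradation slopes (percent per month) for two batches.
print(preliminary_test([-0.5, -0.7], -0.6, stat=1.0, crit=3.84))
print(positive_part_shrinkage([-0.5, -0.7], -0.6, stat=4.0, c=1.0))
```

The preliminary test makes an all-or-nothing pooling decision. The shrinkage estimator pools partially, more strongly when the test statistic is small, which avoids an abrupt jump in the estimated shelf-life at the critical value.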

### Nonparametric Empirical Bayes Estimation via Wavelets

Marianna Pensky, University of Central Florida

We consider a traditional nonparametric empirical Bayes setting in which the parameter $\theta$ is a location or a scale parameter. EB estimators are constructed and are shown to adapt to the unknown degree of smoothness of the prior density $g(\theta)$. Examples are given for familiar families of conditional distributions. The advantages of using wavelets are discussed.

### A Simulation Study of a Proposed Graphical Diagnostic for Assessing Goodness of Fit

Luiz A. Peternelli, Iowa State University

Carlos H. O. Silva, North Carolina State University

The question we address is how one can choose a final model from among suitable models being evaluated. Kaiser et al. (JASA, 1994) proposed a graphical goodness-of-fit diagnostic to address the question of whether a model provides an acceptable description of the data. The method is based on the probability integral transformation and on measures such as the one-sample Kolmogorov-Smirnov goodness-of-fit test. Working under the Generalized Linear Models framework, we simulate data from lognormal and gamma distributions with the same mean and variance. For each distribution, a correct and a misspecified model, in terms of random component and link function, are fitted. We evaluate the proposed graphical diagnostic tool as a guide for model selection, specifically when used along with the one-sample Kolmogorov-Smirnov goodness-of-fit measure as a guide to decision making.
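The two ingredients named above can be sketched directly. Under a correctly specified continuous model, the transformed values $u_i = F(y_i;\hat\theta)$ are approximately Uniform(0, 1), and the Kolmogorov-Smirnov distance summarizes their departure from uniformity. The lognormal fitted CDF and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_cdf(y, mu, sigma):
    """Fitted lognormal CDF, used as the probability integral transform."""
    return NormalDist().cdf((math.log(y) - mu) / sigma)


def ks_uniform(u):
    """One-sample Kolmogorov-Smirnov distance between the empirical CDF of
    the transformed values u and the Uniform(0, 1) CDF."""
    u = sorted(u)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


y = [0.4, 0.9, 1.3, 2.2, 3.5]  # hypothetical positive responses
u = [lognormal_cdf(v, mu=0.0, sigma=1.0) for v in y]
print(ks_uniform(u))
```

When the parameters are estimated from the same data, the standard Kolmogorov-Smirnov null distribution is no longer exact. This is one reason simulation studies like this one are needed to calibrate the diagnostic.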