
SAS/STAT 9.2 User's Guide

The LIFEREG Procedure
(Book Excerpt)
SAS Documentation
This document is an individual chapter from SAS/STAT 9.2 User's Guide.

The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2008. SAS/STAT 9.2 User's Guide. Cary, NC: SAS Institute Inc.
Copyright 2008, SAS Institute Inc., Cary, NC, USA
All rights reserved. Produced in the United States of America.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by the vendor
at the time you acquire this publication.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation
by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19,
Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.
1st electronic book, March 2008
2nd electronic book, February 2009
SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to
its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the
SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute
Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Chapter 48
Contents
Overview: LIFEREG Procedure  2990
Getting Started: LIFEREG Procedure  2993
  Modeling Right-Censored Failure Time Data  2993
  Bayesian Analysis of Right-Censored Data  2997
Syntax: LIFEREG Procedure  3003
  PROC LIFEREG Statement  3004
  BAYES Statement  3005
  BY Statement  3015
  CLASS Statement  3015
  INSET Statement  3016
  MODEL Statement  3018
  OUTPUT Statement  3023
  PROBPLOT Statement  3025
  WEIGHT Statement  3034
Details: LIFEREG Procedure  3034
  Missing Values  3034
  Model Specification  3034
  Computational Method  3035
  Supported Distributions  3037
  Predicted Values  3041
  Confidence Intervals  3042
  Fit Statistics  3043
  Probability Plotting  3044
  INEST= Data Set  3048
  OUTEST= Data Set  3049
  XDATA= Data Set  3049
  Computational Resources  3050
  Bayesian Analysis  3051
  Displayed Output for Classical Analysis  3054
  Displayed Output for Bayesian Analysis  3056
  ODS Table Names  3058
  ODS Graphics  3060
Examples: LIFEREG Procedure  3061
  Example 48.1: Motorette Failure  3061
2990 ! Chapter 48: The LIFEREG Procedure
  Example 48.2: Computing Predicted Values for a Tobit Model  3067
  Example 48.3: Overcoming Convergence Problems by Specifying Initial Values  3071
  Example 48.4: Analysis of Arbitrarily Censored Data with Interaction Effects  3076
  Example 48.5: Probability Plotting - Right Censoring  3081
  Example 48.6: Probability Plotting - Arbitrary Censoring  3083
  Example 48.7: Bayesian Analysis of Clinical Trial Data  3086
References  3094
Overview: LIFEREG Procedure
The LIFEREG procedure fits parametric models to failure time data that can be uncensored, right
censored, left censored, or interval censored. The models for the response variable consist of a linear
effect composed of the covariates and a random disturbance term. The distribution of the random
disturbance can be taken from a class of distributions that includes the extreme value, normal,
logistic, and, by using a log transformation, the exponential, Weibull, lognormal, loglogistic, and
three-parameter gamma distributions.
The model assumed for the response y is

$$ y = X\beta + \sigma\epsilon $$

where y is a vector of response values, often the log of the failure times, X is a matrix of covariates
or independent variables (usually including an intercept term), $\beta$ is a vector of unknown regression
parameters, $\sigma$ is an unknown scale parameter, and $\epsilon$ is a vector of errors assumed to come from a
known distribution (such as the standard normal distribution). If an offset variable O is specified,
the form of the model is $y = X\beta + O + \sigma\epsilon$, where O is a vector of values of the offset variable
O. The distribution might also depend on additional shape parameters. These models are equivalent
to accelerated failure time models when the log of the response is the quantity being modeled. The
effect of the covariates in an accelerated failure time model is to change the scale, and not the
location, of a baseline distribution of failure times.
The LIFEREG procedure estimates the parameters by maximum likelihood with a Newton-Raphson
algorithm. PROC LIFEREG estimates the standard errors of the parameter estimates from the
inverse of the observed information matrix.
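To make the Newton-Raphson description concrete, here is a minimal Python sketch for the simplest special case. It is not the procedure's implementation. The model is exponential, has only an intercept, and is fit to right-censored times. The parameter is mu = log(mean). Up to a constant, the log likelihood is -d*mu - exp(-mu)*sum(t_i), where d is the number of failures. Each iteration adds score/information to mu, and the standard error comes from the inverse of the observed information, as described above. The function name and arguments are hypothetical.

```python
import math


def fit_exponential_intercept(times, censored, mu=0.0, tol=1e-10, max_iter=200):
    """Newton-Raphson MLE for an intercept-only exponential model.

    The parameter is mu = log(mean). Up to a constant, the log likelihood is
    l(mu) = -d * mu - exp(-mu) * sum(times), where d is the number of failures.
    Returns (mu_hat, standard_error), with the standard error taken from the
    inverse of the observed information at mu_hat.
    """
    d = sum(1 for c in censored if not c)    # failures contribute a density term
    if d == 0:
        raise ValueError("no uncensored failures: the MLE does not exist")
    total = sum(times)                        # every unit contributes exposure time
    for _ in range(max_iter):
        w = total * math.exp(-mu)
        score = -d + w                        # first derivative dl/dmu
        info = w                              # observed information -d2l/dmu2
        step = score / info
        mu += step
        if abs(step) < tol:
            break
    info = total * math.exp(-mu)
    return mu, 1.0 / math.sqrt(info)


# Four units; the third was still running (right censored) when observed.
mu_hat, se = fit_exponential_intercept([2.0, 3.0, 5.0, 10.0],
                                       [False, False, True, False])
```

For this particular model the iteration converges to the closed form log(sum(t_i)/d), which is log(20/3) here, with standard error 1/sqrt(d). That makes the sketch easy to check. The general case iterates the same way with a vector of parameters and a Hessian matrix.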
The accelerated failure time model assumes that the effect of independent variables on an event time
distribution is multiplicative on the event time. Usually, the scale function is $\exp(x_c'\beta_c)$, where $x_c$
is the vector of covariate values (not including the intercept term) and $\beta_c$ is a vector of unknown
parameters. Thus, if $T_0$ is an event time sampled from the baseline distribution corresponding to
values of zero for the covariates, then the accelerated failure time model specifies that, if the vector
of covariates is $x_c$, the event time is $T = \exp(x_c'\beta_c)\,T_0$. If $Y = \log(T)$ and $Y_0 = \log(T_0)$, then

$$ Y = x_c'\beta_c + Y_0 $$

This is a linear model with $Y_0$ as the error term.
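The following is a minimal numeric illustration of this relationship. It uses arbitrary values for T_0, x_c, and beta_c, and none of them come from the document. The covariates multiply the baseline time by exp(x_c'beta_c), and taking logs turns that product into the additive linear model above.

```python
import math


def aft_event_time(t0, x_c, beta_c):
    """Event time for covariates x_c under an accelerated failure time model.

    t0 is an event time from the baseline distribution (all covariates zero);
    the covariates multiply it by exp(x_c' beta_c).
    """
    eta = sum(x * b for x, b in zip(x_c, beta_c))   # x_c' beta_c
    return math.exp(eta) * t0


t0 = 12.0
x_c, beta_c = [1.0, 0.5], [0.3, -0.4]     # hypothetical covariates and coefficients
t = aft_event_time(t0, x_c, beta_c)

# On the log scale the model is linear: Y = x_c' beta_c + Y_0.
y, y0 = math.log(t), math.log(t0)
eta = 1.0 * 0.3 + 0.5 * (-0.4)
```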
In terms of survival or exceedance probabilities, this model is

$$ \Pr(T > t \mid x_c) = \Pr\bigl(T_0 > \exp(-x_c'\beta_c)\,t\bigr) $$

The probability on the left-hand side of the equal sign is evaluated given the value $x_c$ for the covariates,
and the right-hand side is computed using the baseline probability distribution but at a scaled
value of the argument. The right-hand side of the equation represents the value of the baseline
survival function evaluated at $\exp(-x_c'\beta_c)\,t$.
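The identity can be checked by simulation. The Python sketch below assumes a standard Weibull baseline for T_0 with shape 1.5 and scale 1, chosen only for illustration. The symbol eta stands for x_c'beta_c. The sketch draws T = exp(eta)*T_0 and compares the empirical Pr(T > t) with the baseline survival function evaluated at exp(-eta)*t.

```python
import math
import random


def baseline_survival(t, shape=1.5):
    """Survival function of a standard Weibull baseline: S0(t) = exp(-t**shape)."""
    return math.exp(-t ** shape)


def simulate_survival(t, eta, shape=1.5, n=100_000, seed=1):
    """Monte Carlo estimate of Pr(T > t) when T = exp(eta) * T0."""
    rng = random.Random(seed)
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape
    hits = sum(1 for _ in range(n)
               if math.exp(eta) * rng.weibullvariate(1.0, shape) > t)
    return hits / n


eta, t = 0.7, 2.0
empirical = simulate_survival(t, eta)
analytic = baseline_survival(math.exp(-eta) * t)   # S0(exp(-x_c' beta_c) t)
```

The two values agree up to Monte Carlo error. The minus sign in exp(-x_c'beta_c) is essential: a positive linear predictor stretches event times, so the baseline survival function must be evaluated at a shrunken time.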
Models usually have an intercept parameter and a scale parameter. In terms of the original untransformed
event times, the effects of the intercept term and the scale term are to scale the event time
and to raise the event time to a power, respectively. That is, if

$$ \log(T_0) = \mu + \sigma \log(T_e) $$

then

$$ T_0 = \exp(\mu)\,T_e^{\sigma} $$
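The step from the log-scale relation to the multiplicative form is exponentiation of both sides:

```latex
\log T_0 = \mu + \sigma \log T_e
\quad\Longrightarrow\quad
T_0 = e^{\mu}\, e^{\sigma \log T_e} = \exp(\mu)\, T_e^{\sigma}
```

So exp(mu) rescales the event time, and sigma is the power to which T_e is raised.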
Although it is possible to fit these models to the original response variable by using the NOLOG
option, it is more common to model the log of the response variable. Because of this log transformation,
zero values for the observed failure times are not allowed unless the NOLOG option is
specified. Similarly, small values for the observed failure times lead to large negative values for
the transformed response. The NOLOG option should be used only if you want to fit a distribution
appropriate for the untransformed response, such as the extreme value instead of the Weibull. If
you specify the normal or logistic distributions, the responses are not log transformed; that is, the
NOLOG option is implicitly assumed.
Parameter estimates for the normal distribution are sensitive to large negative values, and care must
be taken that the fitted model is not unduly influenced by them. Large negative values for the normal
distribution can occur when you fit the lognormal distribution by log-transforming a response that has
values near zero. Likewise, values that are extremely large after the log
transformation have a strong influence in fitting the Weibull distribution (that is, the extreme value
distribution for log responses). You should examine the residuals and check the effects of removing
observations with large residuals or extreme values of covariates on the model parameters. The
logistic distribution gives robust parameter estimates in the sense that the estimates have a bounded
influence function.
The standard errors of the parameter estimates are computed from large-sample normal approximations
by using the observed information matrix. In small samples, these approximations might
be poor. Refer to Lawless (2003) for additional discussion and references. You can sometimes
construct better confidence intervals by transforming the parameters. For example, large-sample
theory is often more accurate for log(σ) than for σ. Therefore, it might be more accurate to construct
confidence intervals for log(σ) and transform these into confidence intervals for σ. The parameter
estimates and their estimated covariance matrix are available in an output SAS data set and can be
used to construct additional tests or confidence intervals for the parameters. Alternatively, tests of
parameters can be based on log-likelihood ratios. Refer to Cox and Oakes (1984) for a discussion
of the merits of some possible test methods including score, Wald, and likelihood ratio tests. Likelihood
ratio tests are generally more reliable for small samples than tests based on the information
matrix.
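To illustrate the log-scale construction for σ, the Python sketch below compares a symmetric Wald interval with an interval built for log(σ) and transformed back. It uses the delta-method approximation SE(log σ) ≈ SE(σ)/σ. The inputs are the Scale estimate 0.2122 and standard error 0.0304 reported in Figure 48.3 later in this chapter. The helper names are hypothetical, and this is a sketch rather than a statement of how LIFEREG computes its limits.

```python
import math
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval


def wald_ci(estimate, se):
    """Symmetric Wald interval: estimate +/- z * se."""
    return estimate - Z * se, estimate + Z * se


def log_scale_ci(sigma, se_sigma):
    """Interval for a positive parameter built on the log scale.

    By the delta method, SE(log sigma) is approximately se_sigma / sigma.
    The interval for log(sigma) is exponentiated back to the sigma scale.
    """
    half_width = Z * se_sigma / sigma
    return sigma * math.exp(-half_width), sigma * math.exp(half_width)


# Scale estimate and standard error as reported in Figure 48.3.
lower, upper = log_scale_ci(0.2122, 0.0304)
```

The log-scale interval comes out at about (0.160, 0.281), in line with the Scale limits 0.1603 and 0.2809 printed in Figure 48.3. The symmetric interval from wald_ci(0.2122, 0.0304) is about (0.153, 0.272). The log-scale interval is asymmetric and can never extend below zero.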
The log-likelihood function is computed using the log of the failure time as a response. This log
likelihood differs from the log likelihood obtained using the failure time as the response by an additive
term of $\sum \log(t_i)$, where the sum is over the uncensored failure times. This term does not
depend on the unknown parameters and does not affect parameter or standard error estimates. However,
many published values of log likelihoods use the failure time as the basic response variable
and, hence, differ by the additive term from the value computed by the LIFEREG procedure.
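The size and sign of the additive term follow from the change of variables Y = log(T). Each uncensored observation contributes a density, and f_T(t) = f_Y(log t)/t. Survival probabilities for censored observations are unchanged. The Python sketch below checks that identity for the lognormal case. It then converts a log-scale log likelihood to the time scale by subtracting the sum of log(t_i) over failures. The function names are hypothetical.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def lognormal_logpdf(t, mu, sigma):
    """Log density of T when log(T) ~ Normal(mu, sigma), written directly in t."""
    z = (math.log(t) - mu) / sigma
    return -math.log(t * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def time_scale_loglik(loglik_log_scale, times, censored):
    """Convert a log likelihood computed on log(time) to the time scale.

    Only uncensored observations contribute a density, and each density picks
    up the Jacobian 1/t_i, so the difference is -sum(log t_i) over failures.
    """
    return loglik_log_scale - sum(math.log(t) for t, c in zip(times, censored) if not c)
```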
The classic Tobit model also fits into this class of models but with data usually censored on the left.
The data considered by Tobin (1958) in his original paper came from a survey of consumers where
the response variable is the ratio of expenditures on durable goods to the total disposable income.
The two explanatory variables are the age of the head of household and the ratio of liquid assets
to total disposable income. Because many observations in this data set have a value of zero for the
response variable, the model fit by Tobin is

$$ y = \max(x'\beta + \epsilon,\ 0) $$

which is a regression model with left censoring, where $x' = (1, x_c')$.
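The following sketch shows the likelihood that Tobin's model implies, assuming normal errors with standard deviation σ and censoring at zero. An observation with y > 0 contributes the normal density of y. An observation with y = 0 contributes Pr(x'β + ε ≤ 0) = Φ(-x'β/σ). The Python below is illustrative only: the names are hypothetical, and it is not LIFEREG syntax.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(y, xb, sigma):
    """Log likelihood of y = max(x'beta + eps, 0) with eps ~ Normal(0, sigma^2).

    y     : observed responses (0 means left censored at zero)
    xb    : linear predictors x'beta for the same observations
    sigma : error standard deviation
    """
    total = 0.0
    for yi, mi in zip(y, xb):
        if yi <= 0:
            # Left censored: we only know that x'beta + eps <= 0
            total += math.log(STD_NORMAL.cdf(-mi / sigma))
        else:
            # Uncensored: normal density of the observed value
            total += math.log(STD_NORMAL.pdf((yi - mi) / sigma) / sigma)
    return total
```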
Bayesian analysis of parametric survival models can be requested by using the BAYES statement
in the LIFEREG procedure. In Bayesian analysis, the model parameters are treated as random variables,
and inference about parameters is based on the posterior distribution of the parameters, given
the data. The posterior distribution is obtained using Bayes' theorem as the likelihood function of
the data weighted with a prior distribution. The prior distribution enables you to incorporate knowledge
or experience of the likely range of values of the parameters of interest into the analysis. If
you have no prior knowledge of the parameter values, you can use a noninformative prior distribution,
and the results of the Bayesian analysis are then typically very similar to a classical analysis based on
maximum likelihood. A closed form of the posterior distribution is often not feasible, and a Markov
chain Monte Carlo method by Gibbs sampling is used to simulate samples from the posterior distribution.
See Chapter 7, "Introduction to Bayesian Analysis Procedures," for an introduction to
the basic concepts of Bayesian statistics. Also see the section "Bayesian Analysis: Advantages and
Disadvantages" on page 149 for a discussion of the advantages and disadvantages of Bayesian analysis.
Refer to Ibrahim, Chen, and Sinha (2001) and Gilks, Richardson, and Spiegelhalter (1996) for
more information about Bayesian analysis, including guidance in choosing prior distributions.
For Bayesian analysis, PROC LIFEREG generates a Gibbs chain for the posterior distribution of the
model parameters. Summary statistics (mean, standard deviation, quartiles, HPD and credible intervals,
and the correlation matrix of the posterior sample) and convergence diagnostics (autocorrelations;
Gelman-Rubin, Geweke, Raftery-Lewis, and Heidelberger and Welch tests; and the effective sample size)
are computed for each parameter. Trace plots, posterior density plots, and autocorrelation function
plots that are created using ODS Graphics are also provided for each parameter.
The LIFEREG procedure now uses ODS Graphics to create graphs as part of its output. For general
information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS."
Getting Started: LIFEREG Procedure
The following examples demonstrate how you can use the LIFEREG procedure to fit a parametric
model to failure time data.
Suppose you have a response variable y that represents failure time; a binary variable, censor, with
censor=0 indicating censored values; and two linearly independent variables, x1 and x2. The following
statements perform a typical accelerated failure time model analysis. Higher-order effects
such as interactions and nested effects are allowed in the independent variables list, but they are not
shown in this example.
proc lifereg;
   model y*censor(0) = x1 x2;
run;
PROC LIFEREG can fit models to interval-censored data. The syntax for specifying interval-censored
data is as follows:
proc lifereg;
   model (begin, end) = x1 x2;
run;
You can also model binomial data by using the events/trials syntax for the response, as illustrated
in the following statements:
proc lifereg;
   model r/n = x1 x2;
run;
The variable n represents the number of trials, and the variable r represents the number of events.
Modeling Right-Censored Failure Time Data
The following example demonstrates how you can use the LIFEREG procedure to fit a model to
right-censored failure time data.
Suppose you conduct a study of two headache pain relievers. You divide patients into two groups,
with each group receiving a different type of pain reliever. You record the time taken (in minutes)
for each patient to report headache relief. Because some of the patients never report relief for the
entire study, some of the observations are censored.
The following DATA step creates the SAS data set headache:
data Headache;
input Minutes Group Censor @@;
datalines;
11 1 0 12 1 0 19 1 0 19 1 0
19 1 0 19 1 0 21 1 0 20 1 0
21 1 0 21 1 0 20 1 0 21 1 0
20 1 0 21 1 0 25 1 0 27 1 0
30 1 0 21 1 1 24 1 1 14 2 0
16 2 0 16 2 0 21 2 0 21 2 0
23 2 0 23 2 0 23 2 0 23 2 0
25 2 1 23 2 0 24 2 0 24 2 0
26 2 1 32 2 1 30 2 1 30 2 0
32 2 1 20 2 1
;
The data set Headache contains the variable Minutes, which represents the reported time to headache
relief; the variable Group, the group to which the patient is assigned; and the variable Censor, a
binary variable indicating whether the observation is censored. Valid values of the variable Censor
are 0 (no) and 1 (yes). Figure 48.1 shows the first five records of the data set Headache.
Figure 48.1 Headache Data
Obs Minutes Group Censor
1 11 1 0
2 12 1 0
3 19 1 0
4 19 1 0
5 19 1 0
The following statements invoke the LIFEREG procedure:
proc lifereg data=Headache;
   class Group;
   model Minutes*Censor(1) = Group;
   output out=New cdf=Prob;
run;
The CLASS statement specifies the variable Group as the classification variable. The MODEL statement
syntax indicates that the response variable Minutes is right censored when the variable Censor
takes the value 1. The MODEL statement specifies the variable Group as the single explanatory variable.
Because the MODEL statement does not specify the DISTRIBUTION= option, the LIFEREG
procedure fits the default type 1 extreme-value distribution by using log(Minutes) as the response.
This is equivalent to fitting the Weibull distribution.
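The extreme-value/Weibull equivalence can be checked numerically. Suppose log(T) follows a type 1 extreme-value (minimum) distribution with location mu and scale sigma. Then Pr(T > t) = exp(-exp((log t - mu)/sigma)) = exp(-(t/exp(mu))^(1/sigma)). That is a Weibull survival function with scale exp(mu) and shape 1/sigma. The Python sketch below uses illustrative values of mu and sigma, which are not fitted estimates.

```python
import math


def extreme_value_survival(y, mu, sigma):
    """Pr(Y > y) for a type 1 extreme-value (minimum) distribution of Y = log(T)."""
    return math.exp(-math.exp((y - mu) / sigma))


def weibull_survival(t, scale, shape):
    """Weibull survival function: Pr(T > t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


mu, sigma = 3.3, 0.2    # illustrative location and scale on the log scale
pairs = [
    (extreme_value_survival(math.log(t), mu, sigma),
     weibull_survival(t, scale=math.exp(mu), shape=1 / sigma))
    for t in (10.0, 20.0, 30.0)
]
```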
The OUTPUT statement creates the output data set New. In addition to containing the variables
in the original data set Headache, the SAS data set New also contains the variable Prob. This
new variable is created by the CDF= option to contain the estimates of the cumulative distribution
function evaluated at the observed response.
The results of this analysis are displayed in the following figures.
Figure 48.2 Model Fitting Information from the LIFEREG Procedure
Model Information
Data Set WORK.HEADACHE
Dependent Variable Log(Minutes)
Censoring Variable Censor
Censoring Value(s) 1
Number of Observations 38
Noncensored Values 30
Right Censored Values 8
Left Censored Values 0
Interval Censored Values 0
Name of Distribution Weibull
Log Likelihood -9.37930239
Class Level Information
Name Levels Values
Group 2 1 2
Figure 48.2 displays the class level information and model fitting information. There are 30 uncensored
observations and 8 right-censored observations. The log likelihood for the Weibull distribution
is -9.3793. The log-likelihood value can be used to compare the goodness of fit for different
models.
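As a quick cross-check of the counts in Figure 48.2, the following Python sketch re-reads the Headache data lines the way the @@ line-hold specifier does, as consecutive (Minutes, Group, Censor) triples. It then tallies the censoring indicator. This is an illustration outside SAS, and the names are hypothetical.

```python
# Data lines from the Headache DATA step: (Minutes, Group, Censor) triples.
RAW = """
11 1 0 12 1 0 19 1 0 19 1 0
19 1 0 19 1 0 21 1 0 20 1 0
21 1 0 21 1 0 20 1 0 21 1 0
20 1 0 21 1 0 25 1 0 27 1 0
30 1 0 21 1 1 24 1 1 14 2 0
16 2 0 16 2 0 21 2 0 21 2 0
23 2 0 23 2 0 23 2 0 23 2 0
25 2 1 23 2 0 24 2 0 24 2 0
26 2 1 32 2 1 30 2 1 30 2 0
32 2 1 20 2 1
"""


def parse_triples(text):
    """Read whitespace-separated integers three at a time, like INPUT ... @@."""
    values = [int(v) for v in text.split()]
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


records = parse_triples(RAW)
n_obs = len(records)
n_right_censored = sum(1 for _, _, censor in records if censor == 1)
n_uncensored = n_obs - n_right_censored
```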
Figure 48.3 Model Parameter Estimates from the LIFEREG Procedure
Analysis of Maximum Likelihood Parameter Estimates
Parameter      DF  Estimate  Standard Error  95% Confidence Limits  Chi-Square  Pr > ChiSq
Intercept       1    3.3091          0.0589   3.1938    3.4245         3161.70      <.0001
Group 1         1   -0.1933          0.0786  -0.3473   -0.0393            6.05      0.0139
Group 2         0    0.0000               .        .         .               .           .
Scale           1    0.2122          0.0304   0.1603    0.2809
Weibull Shape   1    4.7128          0.6742   3.5604    6.2381
The table of parameter estimates is displayed in Figure 48.3. Both the intercept and the slope
parameter for the variable Group are significantly different from 0 at the 0.05 level. Because the
variable Group has two levels and therefore only one degree of freedom, a parameter estimate is given
for only one level (Group=1); the Group=2 row is fixed at 0 as the reference level. The estimate for
the intercept parameter therefore provides the baseline for Group=2.
The resulting model is as follows:

$$ \log(\text{minutes}) = \begin{cases} 3.30911843 - 0.1933025 & \text{for group} = 1 \\ 3.30911843 & \text{for group} = 2 \end{cases} $$
Note that the Weibull shape parameter for this model is the reciprocal of the extreme-value scale
parameter estimate shown in Figure 48.3 (1/0.21219 = 4.7128).
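The fitted model can be evaluated directly. The Python sketch below plugs the estimates into the extreme-value form for log(Minutes), computes the Weibull shape 1/σ, and evaluates the fitted CDF at an observed time. It assumes that the CDF= values in the New data set are this fitted CDF evaluated at each observed Minutes value, which is how the OUTPUT statement is described earlier in this section. The function names are hypothetical.

```python
import math

# Estimates from Figure 48.3 (Scale uses the extra digit quoted in the text).
INTERCEPT = 3.30911843
GROUP1_EFFECT = -0.1933025
SCALE = 0.21219


def linear_predictor(group):
    """Location of log(Minutes) for a group; Group 2 is the reference level."""
    return INTERCEPT + (GROUP1_EFFECT if group == 1 else 0.0)


def fitted_cdf(minutes, group):
    """Fitted Weibull (extreme value on the log scale) CDF at an observed time."""
    z = (math.log(minutes) - linear_predictor(group)) / SCALE
    return 1.0 - math.exp(-math.exp(z))


weibull_shape = 1.0 / SCALE     # about 4.7128
```

At any fixed time, the fitted CDF is larger for Group 1 than for Group 2. That is consistent with the negative Group 1 coefficient, which means shorter times to relief.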
The following statements produce a graph of the cumulative distribution values versus the variable
Minutes.
proc sgplot data=New;
   scatter x=Minutes y=Prob / group=Group;
   discretelegend;
run;
Figure 48.4 displays the estimated cumulative distribution function values contained in the output
data set New for each group.
Figure 48.4 Plot of the Estimated Cumulative Distribution Function
Bayesian Analysis of Right-Censored Data
Nelson (1982) describes a study of the lifetimes of locomotive engine fans. This example shows
how to use PROC LIFEREG to carry out a Bayesian analysis of the engine fan data. In this example,
a lognormal distribution is used to model the engine lifetimes, but other survival time distributions,
such as the Weibull, can also be used.
The following SAS statements create the SAS data set Fan. This data set contains a censoring
indicator variable and right-censored survival times for the 70 locomotive engine fans in the study.
data Fan;
input Lifetime Censor@@;
datalines;
450 0 460 1 1150 0 1150 0 1560 1
1600 0 1660 1 1850 1 1850 1 1850 1
1850 1 1850 1 2030 1 2030 1 2030 1
2070 0 2070 0 2080 0 2200 1 3000 1
3000 1 3000 1 3000 1 3100 0 3200 1
3450 0 3750 1 3750 1 4150 1 4150 1
4150 1 4150 1 4300 1 4300 1 4300 1
4300 1 4600 0 4850 1 4850 1 4850 1
4850 1 5000 1 5000 1 5000 1 6100 1
6100 0 6100 1 6100 1 6300 1 6450 1
6450 1 6700 1 7450 1 7800 1 7800 1
8100 1 8100 1 8200 1 8500 1 8500 1
8500 1 8750 1 8750 0 8750 1 9400 1
9900 1 10100 1 10100 1 10100 1 11500 1
;
run;
Some of the fans had not failed at the time the data were collected, and the unfailed units have
right-censored lifetimes. The variable Lifetime represents either a failure time or a censoring time.
The variable Censor is equal to 0 if the value of Lifetime is a failure time, and it is equal to 1 if the
value is a censoring time.
The following SAS statements specify a Bayesian analysis that uses a lognormal model for the
engine lifetimes. There are no covariates, so the model is an intercept-only model. The OUTPOST=
option saves the samples from the posterior distribution in the SAS data set Post for further
processing.
ods graphics on;
proc lifereg data=Fan;
   model Lifetime*Censor(1) = / dist=lognormal;
   bayes seed=1 outpost=Post;
run;
ods graphics off;
The SEED= option is specified to maintain reproducibility; no other options are specified in the
BAYES statement. By default, a uniform prior distribution is assumed for the intercept coefficient.
The uniform prior is a flat prior on the real line with a distribution that reflects ignorance of the location
of the parameter, placing equal probability on all possible values the regression coefficient can
take. With the uniform prior in this example, you would expect the Bayesian estimates
to resemble the classical results of maximizing the likelihood. If you can elicit an informative prior
on the regression coefficients, you should use the COEFFPRIOR= option to specify it. A default
noninformative gamma prior is used for the lognormal scale parameter σ.
You should make sure that the posterior distribution samples have achieved convergence before
using them for Bayesian inference. If you do not specify additional options, PROC LIFEREG produces
three convergence diagnostics by default: autocorrelations of the posterior sample, effective
sample size, and the Geweke statistic. See the section "Assessing Markov Chain Convergence"
on page 156 for information about assessing the convergence of the chain of posterior samples.
Trace plots, posterior density plots, and autocorrelation function plots that are created using ODS
Graphics are also provided for each parameter. See the section "Visual Analysis via Trace Plots"
on page 156 for help in interpreting these plots.
The "Analysis of Maximum Likelihood Parameter Estimates" table in Figure 48.5 summarizes maximum
likelihood estimates of the lognormal intercept and scale parameters.
Figure 48.5 Maximum Likelihood Estimates from the LIFEREG Procedure
Bayesian Analysis
Parameter  DF  Estimate  Standard Error  95% Confidence Limits
Intercept   1   10.1432          0.5211   9.1219   11.1646
Scale       1    1.6796          0.3893   1.0664    2.6453
Since no prior distribution for the intercept was specified, the default improper uniform distribution
shown in the "Uniform Prior for Regression Coefficients" table in Figure 48.6 is used.
Noninformative prior distributions are appropriate if you have no prior knowledge of the likely range
of values of the parameters, and if you want to make probability statements about the parameters
or functions of the parameters. Refer, for example, to Ibrahim, Chen, and Sinha (2001) for more
information about choosing prior distributions.
The default noninformative gamma prior distribution for the lognormal scale parameter is shown in
the "Independent Prior Distributions for Model Parameters" table in Figure 48.6.
Figure 48.6 Noninformative Prior Distributions
Bayesian Analysis
Uniform Prior for Regression Coefficients
Parameter Prior
Intercept Constant
Independent Prior Distributions for Model Parameters
Prior
Parameter Distribution Hyperparameters
Scale Gamma Shape 0.001 Inverse Scale 0.001
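Why is this gamma prior called noninformative? With shape a and inverse scale (rate) b, a gamma distribution has mean a/b and variance a/b². The short Python sketch below evaluates these moments for the default hyperparameters shown above. The prior mean is 1 and the variance is 1000, so the prior is extremely diffuse over plausible values of the scale parameter.

```python
def gamma_moments(shape, inverse_scale):
    """Mean and variance of a gamma distribution with the given shape and rate."""
    mean = shape / inverse_scale
    variance = shape / inverse_scale ** 2
    return mean, variance


mean, variance = gamma_moments(0.001, 0.001)
```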
By default, posterior mode estimates of the model parameters are used as the starting values for the
simulation. These are listed in the "Initial Values of the Chain" table in Figure 48.7.
Figure 48.7 Markov Chain Initial Values
Initial Values of the Chain
Chain Seed Intercept Scale
1 1 10.0501 1.59544
Summary statistics for the posterior sample are displayed in the "Fit Statistics," "Descriptive Statistics
for the Posterior Sample," "Interval Statistics for the Posterior Sample," and "Posterior Correlation
Matrix" tables in Figure 48.8. Since noninformative prior distributions were used, these results
are consistent with the maximum likelihood estimates shown in Figure 48.5.
Figure 48.8 Posterior Sample Summary Statistics
Fit Statistics
DIC (smaller is better) 87.245
pD (effective number of parameters) 1.823
Bayesian Analysis
Posterior Summaries
Parameter  N      Mean     Standard Deviation  25th Pctl  50th Pctl  75th Pctl
Intercept  10000  10.4196              0.6172     9.9670    10.3259    10.7959
Scale      10000   1.9196              0.4809     1.5675     1.8476     2.1931
Posterior Intervals
Parameter Alpha Equal-Tail Interval HPD Interval
Intercept 0.050 9.4477 11.8994 9.3216 11.6752
Scale 0.050 1.1906 3.0570 1.1104 2.8834
Posterior Correlation Matrix
Parameter Intercept Scale
Intercept 1.0000 0.8297
Scale 0.8297 1.0000
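The two kinds of interval in the "Posterior Intervals" table are computed differently. An equal-tail interval cuts α/2 of the posterior draws from each tail. An HPD interval is the shortest interval that contains a fraction 1 - α of the draws. The Python sketch below computes both from a list of draws. It uses linear-interpolation percentiles and a sliding-window search over the sorted draws. These are common textbook choices, assumed here, and not necessarily the exact rules PROC LIFEREG uses.

```python
import math


def percentile(sorted_draws, p):
    """Linear-interpolation percentile of already sorted draws (0 <= p <= 1)."""
    pos = p * (len(sorted_draws) - 1)
    i = math.floor(pos)
    frac = pos - i
    if i + 1 < len(sorted_draws):
        return sorted_draws[i] + frac * (sorted_draws[i + 1] - sorted_draws[i])
    return sorted_draws[i]


def equal_tail_interval(draws, alpha=0.05):
    """Cut alpha/2 of the draws from each tail."""
    s = sorted(draws)
    return percentile(s, alpha / 2), percentile(s, 1 - alpha / 2)


def hpd_interval(draws, alpha=0.05):
    """Shortest window of sorted draws that covers at least (1 - alpha) of them."""
    s = sorted(draws)
    n = len(s)
    k = max(1, math.ceil((1 - alpha) * n))          # draws the interval must contain
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]
```

For a right-skewed posterior such as Scale, the HPD interval is shorter than the equal-tail interval and shifted toward smaller values. The Scale row above follows this pattern: the HPD interval runs from 1.1104 to 2.8834, and the equal-tail interval runs from 1.1906 to 3.0570.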
By default, PROC LIFEREG computes three convergence diagnostics: the lag1, lag5, lag10, and
lag50 autocorrelations; the Geweke diagnostic; and the effective sample size. These are displayed
in Figure 48.9. There is no indication that the Markov chain has not converged. See the section
"Assessing Markov Chain Convergence" on page 156 for more information about convergence
diagnostics and their interpretation.
Figure 48.9 Posterior Sample Summary Statistics
Bayesian Analysis
Posterior Autocorrelations
Parameter Lag 1 Lag 5 Lag 10 Lag 50
Intercept 0.6973 0.1765 0.0190 -0.0017
Scale 0.6955 0.1713 0.0172 -0.0002
Geweke Diagnostics
Parameter z Pr > |z|
Intercept -0.9183 0.3585
Scale -0.9233 0.3559
Effective Sample Sizes
Parameter  ESS     Correlation Time  Efficiency
Intercept  1772.8            5.6408      0.1773
Scale      1805.0            5.5400      0.1805
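The columns of the "Effective Sample Sizes" table are linked. The correlation time is τ = 1 + 2Σρ_k, the sum running over lag-k autocorrelations. ESS = N/τ, and efficiency = ESS/N. The Python sketch below writes the general formula and checks the Intercept row against N = 10000 posterior draws. Where to truncate the sum over lags is left to the caller, an assumption of this sketch, since methods differ.

```python
def effective_sample_size(n_draws, autocorrelations):
    """ESS = N / tau, where tau = 1 + 2 * (sum of lag-k autocorrelations).

    The caller decides which lags to include; truncation rules vary between
    implementations, so this is a simplified sketch.
    """
    tau = 1.0 + 2.0 * sum(autocorrelations)
    return n_draws / tau, tau


# Cross-check of the Intercept row in Figure 48.9 (N = 10000 draws).
N = 10000
ess_intercept = N / 5.6408           # ESS = N / correlation time
efficiency_intercept = ess_intercept / N
```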
Summary statistics of the posterior distribution samples are produced by default. However, these
statistics might not be sufficient for carrying out your Bayesian inference. The samples from the
posterior distribution saved in the SAS data set Post created with the OUTPOST= option can be
used for further analysis.
Trace, autocorrelation, and density plots for the two model parameters (Intercept and Scale), shown in
Figure 48.10 and Figure 48.11, are useful in diagnosing whether the Markov chain of posterior samples
has converged. These plots show no evidence that the chain has not converged. See the section "Visual
Analysis via Trace Plots" on page 156 for more information about interpreting these types of diagnostic plots.
Figure 48.10 Diagnostic Plots
Figure 48.11 Diagnostic Plots
The fraction failing in the first 8000 hours of operation might be a quantity of interest. This kind
of information could be useful, for example, in determining whether to improve the reliability of
the engine components due to warranty considerations. The following SAS statements compute
the mean and percentiles of the distribution of the fraction failing in the first 8000 hours from the
posterior sample data set Post:
data Prob;
   set Post;
   /* lognormal CDF at 8000 hours for each posterior draw */
   Frac = ProbNorm((log(8000) - Intercept) / Scale);
   label Frac = "Fraction Failing in 8000 Hours";
run;

proc means data=Prob(keep=Frac) n mean p10 p25 p50 p75 p90;
run;
The mean fraction of failures in the first 8000 hours, shown in Figure 48.12, is about 0.24, which
could be used in further analysis of warranty costs. The 10th percentile is about 0.16 and the 90th
percentile is about 0.32, which gives an assessment of the probable range of the fraction failing in
the first 8000 hours.
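The DATA step above applies the lognormal CDF, Pr(T ≤ 8000) = Φ((log 8000 - Intercept)/Scale), to every posterior draw. The Python sketch below does the same for a list of (Intercept, Scale) draws. For comparison, it also computes a single plug-in value at the maximum likelihood estimates in Figure 48.5. The draw list is made up for illustration, and the function names are hypothetical. The percentile rule of statistics.quantiles need not match PROC MEANS exactly.

```python
import math
from statistics import NormalDist, fmean, quantiles


def fraction_failing(hours, intercept, scale):
    """Lognormal CDF: Pr(T <= hours) = Phi((log(hours) - intercept) / scale)."""
    return NormalDist().cdf((math.log(hours) - intercept) / scale)


def summarize_fraction(draws, hours=8000):
    """Mean and 10th/50th/90th percentiles of the per-draw fraction failing."""
    fracs = [fraction_failing(hours, b0, s) for b0, s in draws]
    deciles = quantiles(fracs, n=10)            # 9 cut points: 10th ... 90th
    return {"mean": fmean(fracs), "p10": deciles[0],
            "p50": deciles[4], "p90": deciles[8]}


# Plug-in value at the maximum likelihood estimates (Figure 48.5).
ml_fraction = fraction_failing(8000, 10.1432, 1.6796)

# Hypothetical posterior draws of (Intercept, Scale), for illustration only.
summary = summarize_fraction([(10.0, 1.6), (10.4, 1.9), (10.9, 2.3), (9.8, 1.5)])
```

The plug-in value at the maximum likelihood estimates is roughly 0.25, close to the posterior mean of about 0.24. Only the posterior draws, however, give the percentile range.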
Figure 48.12 Fraction Failing in 8000 Hours
The MEANS Procedure
Analysis Variable : Frac Fraction Failing in 8000 Hours
N      Mean       10th Pctl  25th Pctl  50th Pctl  75th Pctl  90th Pctl
--------------------------------------------------------------------------
10000  0.2381467  0.1628591  0.1953691  0.2336756  0.2766051  0.3190883
--------------------------------------------------------------------------
Syntax: LIFEREG Procedure
The following statements are available in PROC LIFEREG:
PROC LIFEREG <options > ;
BAYES <options > ;
BY variables ;
CLASS variables ;
INSET <keyword-list > </ options > ;
MODEL response=<effects > </ options > ;
OUTPUT <OUT=SAS-data-set > <keyword=name . . . keyword=name > <options > ;
PROBPLOT </ options > ;
WEIGHT variable ;
The PROC LIFEREG statement invokes the procedure. The MODEL statement is required and
specifies the variables used in the regression part of the model as well as the distribution used for
the error, or random, component of the model. Only a single MODEL statement can be used with
one invocation of the LIFEREG procedure. If multiple MODEL statements are present, only the
last is used. Main effects and interaction terms can be specified in the MODEL statement, as in
the GLM procedure. Initial values can be specified in the MODEL statement or in an INEST=
data set. If no initial values are specified, the starting estimates are obtained by ordinary least
squares. The CLASS statement determines which explanatory variables are treated as categorical.
The WEIGHT statement identifies a variable with values that are used to weight the observations.
Observations with zero or negative weights are not used to fit the model, although predicted values
can be computed for them. The OUTPUT statement creates an output data set containing predicted
values and residuals.
PROC LIFEREG Statement
PROC LIFEREG <options > ;
The PROC LIFEREG statement invokes the procedure. You can specify the following options in
the PROC LIFEREG statement.
COVOUT
writes the estimated covariance matrix to the OUTEST= data set if convergence is attained.
DATA=SAS-data-set
specifies the input SAS data set used by PROC LIFEREG. By default, the most recently
created SAS data set is used.
GOUT=graphics-catalog
specifies a graphics catalog in which to save graphics output.
INEST=SAS-data-set
specifies an input SAS data set that contains initial estimates for all the parameters in the
model. See the section "INEST= Data Set" on page 3048 for a detailed description of the
contents of the INEST= data set.
NAMELEN=n
specifies the length of effect names in tables and output data sets to be n characters, where n
is a value between 20 and 200. The default length is 20 characters.
NOPRINT
suppresses the display of the output. Note that this option temporarily disables the Output
Delivery System (ODS). For more information, see Chapter 20, "Using the Output Delivery
System."
ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the sorting order for the levels of the classification variables (specified in the CLASS
statement). This ordering determines which parameters in the model correspond to each level
in the data. The following table illustrates how PROC LIFEREG interprets values of the
ORDER= option.
Value of ORDER= Levels Sorted By
DATA order of appearance in the input data set
FORMATTED formatted value
FREQ descending frequency count; levels with the
most observations come rst in the order
INTERNAL unformatted value
By default, ORDER=FORMATTED. For FORMATTED and INTERNAL, the sort order is
machine dependent. For more information about sorting order, see the chapter on the SORT
procedure in the Base SAS Procedures Guide, and the discussion of BY-group processing in
SAS Language Reference: Concepts.
OUTEST=SAS-data-set
specifies an output SAS data set containing the parameter estimates, the maximized log likelihood,
and, if the COVOUT option is specified, the estimated covariance matrix. See the
section "OUTEST= Data Set" on page 3049 for a detailed description of the contents of the
OUTEST= data set.
PLOTS=NONE | PROBPLOT
specifies the following graphics options:
NONE suppresses any graphics specified in other LIFEREG statements,
such as the BAYES or PROBPLOT statement.
PROBPLOT creates a default probability plot based on information in the
MODEL statement. If a PROBPLOT statement is also specified, the
probability plot specified in the PROBPLOT statement is created,
and this option is ignored.
XDATA=SAS-data-set
specifies an input SAS data set that contains values for all the independent variables in the
MODEL statement and variables in the CLASS statement for probability plotting. If there
are covariates specified in a MODEL statement and a probability plot is requested with a
PROBPLOT statement, you specify fixed values for the effects in the MODEL statement
with the XDATA= data set. See the section "XDATA= Data Set" on page 3049 for a detailed
description of the contents of the XDATA= data set.
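The following sketch shows how several PROC LIFEREG statement options combine. It assumes a hypothetical data set named epidemic with a response life, a numeric covariate dose, and a classification variable group. The name epidemic and the model life = dose come from the INSET statement example in this chapter. The variable group and the output data set name Est are illustrative only. OUTEST= together with COVOUT saves the estimates and their covariance matrix. ORDER=DATA keeps the CLASS levels in their order of appearance, and NAMELEN=32 allows longer effect names.

```sas
/* Hypothetical input data set: epidemic (life, dose, group) */
proc lifereg data=epidemic outest=Est covout order=data namelen=32;
   class group;                     /* levels ordered as they first appear in epidemic */
   model life = dose group / dist=weibull;
run;

/* Est holds the estimates, the maximized log likelihood, and,  */
/* because of COVOUT, the estimated covariance matrix           */
proc print data=Est;
run;
```

COVOUT adds the covariance matrix to Est only if convergence is attained.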
BAYES Statement
BAYES <options > ;
The BAYES statement requests a Bayesian analysis of the regression model by using Gibbs sampling.
The posterior samples (also known as the chain) for the model parameters are not displayed,
but you can save them to a SAS data set by using the OUTPOST= option.
Table 48.1 summarizes the options available in the BAYES statement.
Table 48.1 BAYES Statement Options
Option                    Description
Monte Carlo Options
INITIAL=                  specifies initial values of the chain
INITIALMLE                specifies that maximum likelihood estimates be used as initial values of the chain
METROPOLIS=               specifies the use of a Metropolis step
NBI=                      specifies the number of burn-in iterations
NMC=                      specifies the number of iterations after burn-in
SEED=                     specifies the random number generator seed
THINNING=                 controls the thinning of the Markov chain
Model and Prior Options
COEFFPRIOR=               specifies the prior of the regression coefficients
EXPONENTIALSCALEPRIOR=    specifies the prior of the exponential scale parameter
GAMMASHAPEPRIOR=          specifies the prior of the three-parameter gamma shape parameter
SCALEPRIOR=               specifies the prior of the scale parameter
WEIBULLSCALEPRIOR=        specifies the prior of the Weibull scale parameter
WEIBULLSHAPEPRIOR=        specifies the prior of the Weibull shape parameter
Summary Statistics and Convergence Diagnostics
DIAGNOSTICS=              displays convergence diagnostics
PLOTS=                    displays diagnostic plots
STATISTICS=               displays summary statistics of the posterior samples
Posterior Samples
OUTPOST=                  names a SAS data set for the posterior samples
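For orientation, the sketch below shows a minimal BAYES request that uses several of the Monte Carlo options in Table 48.1. The data set epidemic and the model are hypothetical. The seed and the output data set name Post are arbitrary. The NBI= and NMC= values are the documented defaults, written out here only for clarity.

```sas
proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   bayes seed=27513        /* fixed seed so the chain can be reproduced    */
         nbi=2000          /* burn-in iterations (the default)              */
         nmc=10000         /* iterations after burn-in (the default)        */
         thinning=2        /* keep one of every two samples                 */
         outpost=Post;     /* save the posterior samples in data set Post   */
run;
```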
The following list describes these options and their suboptions.
COEFFPRIOR=UNIFORM | NORMAL <(normal-options) >
CPRIOR=UNIFORM | NORMAL <(normal-options) >
COEFF=UNIFORM | NORMAL <(normal-options) >
specifies the prior distribution for the regression coefficients. The default is
COEFFPRIOR=UNIFORM. The available prior distributions are as follows:
NORMAL<(normal-options) >
specifies a normal distribution. The normal-options include the following:
CONDITIONAL
specifies that the normal prior, conditional on the current Markov chain value of
the location-scale model precision parameter $\tau = 1/\sigma^2$, is $N(\mu, \tau^{-1}\Sigma)$, where
$\mu$ and $\Sigma$ are the mean and covariance of the normal prior specified by other normal-options.
INPUT=SAS-data-set
specifies a SAS data set that contains the mean and covariance information of the
normal prior. The data set must have a _TYPE_ variable to represent the type of
each observation and a variable for each regression coefficient. If the data set also
contains a _NAME_ variable, the values of this variable are used to identify the
covariances for the _TYPE_=COV observations; otherwise, the _TYPE_=COV
observations are assumed to be in the same order as the explanatory variables in
the MODEL statement. PROC LIFEREG reads the mean vector from the observation
with _TYPE_=MEAN and reads the covariance matrix from observations
with _TYPE_=COV. For an independent normal prior, the variances can
be specified with _TYPE_=VAR; alternatively, the precisions (inverses of the
variances) can be specified with _TYPE_=PRECISION.
RELVAR<=c >
specifies the normal prior $N(0, cJ)$, where $J$ is a diagonal matrix with diagonal
elements equal to the variances of the corresponding ML estimators. By default,
$c = 10^6$.
VAR<=c >
specifies the normal prior $N(0, cI)$, where $I$ is the identity matrix.
If you do not specify an option, the normal prior $N(0, 10^6 I)$, where $I$ is the identity
matrix, is used. See the section "Normal Prior" on page 3052 for more details.
UNIFORM
specifies a flat prior, that is, a prior that is proportional to a constant:
$p(\beta_1, \ldots, \beta_k) \propto 1$ for all $-\infty < \beta_i < \infty$.
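The DATA step below sketches the INPUT= layout for an independent normal prior. It contains one _TYPE_=MEAN observation and one _TYPE_=VAR observation. The sketch assumes a model with an intercept and a single covariate dose. It also assumes that the coefficient variables are named after the model parameters (here Intercept and dose); confirm the naming rules in the section "Normal Prior" before relying on them. The data set epidemic, the prior values, and the seed are illustrative only.

```sas
/* Hypothetical independent normal prior: mean 0 for both coefficients, */
/* variance 1E6 for the intercept and 100 for the dose coefficient      */
data NormalPrior;
   input _TYPE_ $ Intercept dose;
   datalines;
MEAN 0   0
VAR  1e6 100
;

proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   bayes seed=27513 coeffprior=normal(input=NormalPrior);
run;
```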
DIAGNOSTICS=ALL | NONE | (keyword-list)
DIAG=ALL | NONE | (keyword-list)
controls the number of diagnostics produced. You can request all the following diagnostics
by specifying DIAGNOSTICS=ALL. If you do not want any of these diagnostics, specify
DIAGNOSTICS=NONE. If you want some but not all of the diagnostics, or if you want to
change certain settings of these diagnostics, specify a subset of the following keywords. The
default is DIAGNOSTICS=(AUTOCORR ESS GEWEKE).
AUTOCORR <(LAGS= numeric-list ) >
computes the autocorrelations of lags given by the LAGS= list for each parameter. Elements
in the list are truncated to integers, and repeated values are removed. If the
LAGS= option is not specified, autocorrelations of lags 1, 5, 10, and 50 are computed
for each variable. See the section "Autocorrelations" on page 169 for details.
ESS
computes Carlin's estimate of the effective sample size, the correlation time, and the
efficiency of the chain for each parameter. See the section "Effective Sample Size" on
page 169 for details.
MCERROR
computes an estimate of the Monte Carlo standard error for each parameter. See the
section "Standard Error of the Mean Estimate" on page 170 for details.
HEIDELBERGER <(heidel-options) >
computes the Heidelberger and Welch diagnostic for each variable, which consists of a
stationarity test of the null hypothesis that the sample values form a stationary process.
If the stationarity test is not rejected, a halfwidth test is then carried out. Optionally,
you can specify one or more of the following heidel-options:
SALPHA=value
specifies the $\alpha$ level ($0 < \alpha < 1$) for the stationarity test.
HALPHA=value
specifies the $\alpha$ level ($0 < \alpha < 1$) for the halfwidth test.
EPS=value
specifies a positive number $\epsilon$ such that if the halfwidth is less than $\epsilon$ times the
sample mean of the retained iterates, the halfwidth test is passed.
See the section "Heidelberger and Welch Diagnostics" on page 165 for details.
GELMAN <(gelman-options) >
computes the Gelman and Rubin convergence diagnostics. You can specify one or more
of the following gelman-options:
NCHAIN=number
N=number
specifies the number of parallel chains used to compute the diagnostic; it must
be 2 or larger. The default is NCHAIN=3. If an INITIAL= data set is used,
NCHAIN defaults to the number of rows in the INITIAL= data set. If any number
other than this is specified with the NCHAIN= option, the NCHAIN= value is
ignored.
ALPHA=value
specifies the significance level for the upper bound. The default is ALPHA=0.05,
resulting in a 97.5% bound.
See the section "Gelman and Rubin Diagnostics" on page 161 for details.
GEWEKE <(geweke-options) >
computes the Geweke spectral density diagnostics, which are essentially a two-sample
t test between the first $f_1$ portion and the last $f_2$ portion of the chain. The default is
$f_1 = 0.1$ and $f_2 = 0.5$, but you can choose other fractions by using the following
geweke-options:
FRAC1=value
specifies the fraction $f_1$ for the first window.
FRAC2=value
specifies the fraction $f_2$ for the second window.
See the section "Geweke Diagnostics" on page 163 for details.
RAFTERY<(raftery-options) >
computes the Raftery and Lewis diagnostics that evaluate the accuracy of the estimated
quantile ($\hat{\theta}_Q$ for a given $Q \in (0, 1)$) of a chain. $\hat{\theta}_Q$ can achieve any degree of accuracy
when the chain is allowed to run for a long time. A stopping criterion is when the
estimated probability $\hat{P}_Q = \Pr(\theta \le \hat{\theta}_Q)$ reaches within $\pm R$ of the value $Q$ with
probability $S$; that is, $\Pr(Q - R \le \hat{P}_Q \le Q + R) = S$. The following raftery-options
enable you to specify $Q$, $R$, $S$, and a precision level $\epsilon$ for the test:
QUANTILE | Q=value
specifies the order (a value between 0 and 1) of the quantile of interest. The
default is 0.025.
ACCURACY | R=value
specifies a small positive number as the margin of error for measuring the accuracy
of estimation of the quantile. The default is 0.005.
PROBABILITY | S=value
specifies the probability of attaining the accuracy of the estimation of the quantile.
The default is 0.95.
EPSILON | EPS=value
specifies the tolerance level (a small positive number) for the stationary test. The
default is 0.001.
See the section "Raftery and Lewis Diagnostics" on page 166 for details.
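The following BAYES statement is an illustrative sketch; the data set, model, and seed are hypothetical. It replaces the default diagnostics with three requests: the effective sample size, Monte Carlo standard errors, and a Geweke test that compares the first 20% of the chain with the last 40%.

```sas
proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   bayes seed=1
         diagnostics=(ess mcerror geweke(frac1=0.2 frac2=0.4));
run;
```

The two Geweke windows are kept disjoint here (0.2 + 0.4 < 1), so the test compares two separate parts of the chain.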
EXPSCALEPRIOR=GAMMA<(options) > | IMPROPER
ESCALEPRIOR=GAMMA<(options) > | IMPROPER
ESCPRIOR=GAMMA<(options) > | IMPROPER
specifies that Gibbs sampling be performed on the exponential distribution scale parameter
and specifies the prior distribution for the scale parameter. This prior distribution applies only when
the exponential distribution and no covariates are specified.
A gamma prior $G(a, b)$ with density $f(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by
EXPSCALEPRIOR=GAMMA, which can be followed by one of the following gamma-options
enclosed in parentheses. The hyperparameters $a$ and $b$ are the shape and inverse-scale
parameters of the gamma distribution, respectively. See the section "Gamma Prior" on page 3052
for more details. The default is $G(10^{-4}, 10^{-4})$.
RELSHAPE<=c >
specifies independent $G(c\hat{\alpha}, c)$ distribution, where $\hat{\alpha}$ is the MLE of the exponential
scale parameter. With this choice of hyperparameters, the mean of the prior distribution
is $\hat{\alpha}$ and the variance is $\hat{\alpha}/c$. By default, $c = 10^{-4}$.
SHAPE=a and ISCALE=b
together specify the $G(a, b)$ prior.
SHAPE=c
ISCALE=c
each specifies the $G(c, c)$ prior when used alone.
An improper prior with density $f(t)$ proportional to $t^{-1}$ is specified with
EXPSCALEPRIOR=IMPROPER.
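The RELSHAPE= mean and variance follow directly from the gamma density above, which has mean $a/b$ and variance $a/b^2$. The short derivation below is written for a generic MLE $\hat{\theta}$ (a symbol introduced here for illustration). It therefore applies equally to the RELSHAPE= suboptions of the exponential scale, location-scale scale, Weibull scale, and Weibull shape priors.

```latex
\begin{aligned}
t \sim G(a, b),\quad f(t) = \frac{b\,(bt)^{a-1} e^{-bt}}{\Gamma(a)}
  &\;\Rightarrow\; \mathrm{E}(t) = \frac{a}{b}, \qquad \mathrm{Var}(t) = \frac{a}{b^{2}} \\
a = c\,\hat{\theta},\; b = c
  &\;\Rightarrow\; \mathrm{E}(t) = \frac{c\,\hat{\theta}}{c} = \hat{\theta}, \qquad
     \mathrm{Var}(t) = \frac{c\,\hat{\theta}}{c^{2}} = \frac{\hat{\theta}}{c}
\end{aligned}
```

With a small $c$, such as the default $c = 10^{-4}$, the prior variance is $10^{4}\hat{\theta}$. The prior is then centered at the MLE but very diffuse.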
GAMMASHAPEPRIOR=NORMAL<(options) >
GAMASHAPEPRIOR=NORMAL<(options) >
SHAPE1PRIOR=NORMAL<(options) >
specifies the prior distribution for the gamma distribution shape parameter. If you do not
specify any options in a gamma model, the $N(0, 10^6)$ prior for the shape is used. You can
specify the MEAN= and VAR= or RELVAR= options, either alone or together, to specify the
mean and variance of the normal prior for the gamma shape parameter.
MEAN=a
specifies a normal prior $N(a, 10^6)$. By default, $a = 0$.
RELVAR<=b >
specifies the normal prior $N(0, bJ)$, where $J$ is the variance of the MLE of the shape
parameter. By default, $b = 10^6$.
VAR=c
specifies the normal prior $N(0, c)$. By default, $c = 10^6$.
INITIAL=SAS-data-set
species the SAS data set that contains the initial values of the Markov chains. The INITIAL=
data set must contain all the variables of the model. You can specify multiple rows as the
initial values of the parallel chains for the Gelman-Rubin statistics, but posterior summaries,
diagnostics, and plots are computed only for the first chain. If the data set also contains the
variable _SEED_, the value of the _SEED_ variable is used as the seed of the random number
generator for the corresponding chain.
INITIALMLE
specifies that maximum likelihood estimates of the model parameters be used as initial values
of the Markov chain. If this option is not specified, estimates of the mode of the posterior
distribution obtained by optimization are used as initial values.
METROPOLIS=YES
METROPOLIS=NO
specifies the use of a Metropolis step to generate Gibbs samples for posterior distributions
that are not log concave. The default value is METROPOLIS=YES.
NBI=number
specifies the number of burn-in iterations before the chains are saved. The default is 2000.
NMC=number
specifies the number of iterations after the burn-in. The default is 10000.
OUTPOST=SAS-data-set
OUT=SAS-data-set
names the SAS data set that contains the posterior samples. See the section "OUTPOST=
Output Data Set" on page 3054 for more information. Alternatively, you can create the output
data set by specifying an ODS OUTPUT statement as follows:
ODS OUTPUT PosteriorSample = SAS-data-set ;
PLOTS<(global-plot-options) >= plot-request
PLOTS<(global-plot-options) >= (plot-request < . . . plot-request >)
controls the display of diagnostic plots. Three types of plots can be requested: trace plots,
autocorrelation function plots, and kernel density plots. By default, the plots are displayed
in panels unless the global plot option UNPACK is specified. Also, when you request more
than one type of plot, the plots are displayed by parameter unless the global plot option
GROUPBY=TYPE is specified. When you specify only one plot request, you can omit the
parentheses around the plot request. For example:
plots=none
plots(unpack)=trace
plots=(trace autocorr)
You must enable ODS Graphics before requesting plots. For example, the following SAS
statements enable ODS Graphics:
ods graphics on;
proc lifereg;
   model y=x;
   bayes plots=trace;
run;
ods graphics off;
The global plot options are as follows:
FRINGE
creates a fringe plot on the X axis of the density plot.
GROUPBY=PARAMETER
GROUPBY=TYPE
specifies how the plots are grouped when there is more than one type of plot.
GROUPBY=TYPE
specifies that the plots be grouped by type.
GROUPBY=PARAMETER
specifies that the plots be grouped by parameter.
GROUPBY=PARAMETER is the default.
LAGS=n
specifies that autocorrelations be plotted up to lag n. If this option is not specified,
autocorrelations are plotted up to lag 50.
SMOOTH
displays a fitted penalized B-spline curve for each trace plot.
UNPACKPANEL
UNPACK
specifies that all paneled plots be unpacked, meaning that each plot in a panel is displayed
separately.
The plot requests include the following:
ALL
specifies all types of plots. PLOTS=ALL is equivalent to specifying PLOTS=(TRACE
AUTOCORR DENSITY).
AUTOCORR
displays the autocorrelation function plots for the parameters.
DENSITY
displays the kernel density plots for the parameters.
NONE
suppresses all diagnostic plots.
TRACE
displays the trace plots for the parameters. See the section "Visual Analysis via Trace
Plots" on page 156 for details.
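The sketch below combines global plot options with several plot requests; the data set and model are hypothetical. PLOTS= produces ODS Graphics output, so ODS Graphics is enabled first.

```sas
ods graphics on;
proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   /* trace, autocorrelation, and density plots; smoothed traces,        */
   /* autocorrelations up to lag 20, and plots grouped by plot type      */
   bayes seed=1 plots(smooth lags=20 groupby=type)=(trace autocorr density);
run;
ods graphics off;
```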
SCALEPRIOR=GAMMA<(options) >
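specifies that Gibbs sampling be performed on the location-scale model scale parameter and
specifies the prior distribution for the scale parameter.
A gamma prior $G(a, b)$ with density $f(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by
SCALEPRIOR=GAMMA, which can be followed by one of the following gamma-options
enclosed in parentheses. The hyperparameters $a$ and $b$ are the shape and inverse-scale
parameters of the gamma distribution, respectively. See the section "Gamma Prior" on page 3052
for details. The default is $G(10^{-4}, 10^{-4})$.
RELSHAPE<=c >
specifies independent $G(c\hat{\sigma}, c)$ distribution, where $\hat{\sigma}$ is the MLE of the scale parameter.
With this choice of hyperparameters, the mean of the prior distribution is $\hat{\sigma}$ and the
variance is $\hat{\sigma}/c$. By default, $c = 10^{-4}$.
SHAPE=a and ISCALE=b
together specify the $G(a, b)$ prior.
SHAPE=c
ISCALE=c
each specifies the $G(c, c)$ prior when used alone.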
SEED=number
specifies an integer seed in the range 1 to $2^{31} - 1$ for the random number generator in the
simulation. Specifying a seed enables you to reproduce identical Markov chains for the same
specification. If the SEED= option is not specified, or if you specify a nonpositive seed, a
random seed is derived from the time of day.
STATISTICS <(global-options) > = ALL | NONE | keyword | (keyword-list)
STATS <(global-options) > = ALL | NONE | keyword | (keyword-list)
controls the number of posterior statistics produced. Specifying STATISTICS=ALL is equivalent
to specifying STATISTICS=(SUMMARY INTERVAL COV CORR). If you do not
want any posterior statistics, specify STATISTICS=NONE. The default is
STATISTICS=(SUMMARY INTERVAL). See the section "Summary Statistics" on page 170 for
details. The global-options include the following:
ALPHA=numeric-list
controls the probabilities of the credible intervals. The ALPHA= values must be between
0 and 1. Each ALPHA= value produces a pair of 100(1 - ALPHA)% equal-tail
and HPD intervals for each parameter. The default is ALPHA=0.05, which yields the
95% credible intervals for each parameter.
PERCENT=numeric-list
requests the percentile points of the posterior samples. The PERCENT= values must
be between 0 and 100. The default is PERCENT=25, 50, 75, which yields the 25th,
50th, and 75th percentile points, respectively, for each parameter.
The list of keywords includes the following:
CORR
produces the posterior correlation matrix.
COV
produces the posterior covariance matrix.
SUMMARY
produces the means, standard deviations, and percentile points for the posterior samples.
The default is to produce the 25th, 50th, and 75th percentile points, but you can
use the global PERCENT= option to request specific percentile points.
INTERVAL
produces equal-tail credible intervals and HPD intervals. The default is to produce the
95% equal-tail credible intervals and 95% HPD intervals, but you can use the global
ALPHA= option to request intervals of any probabilities.
NONE
suppresses printing all summary statistics.
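The following hypothetical sketch changes the default statistics. It requests 90% intervals through the global ALPHA= option and adds the posterior correlation matrix to the default summary and interval tables.

```sas
proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   /* 90% equal-tail and HPD intervals, plus the posterior correlation matrix */
   bayes seed=1 statistics(alpha=0.1)=(summary interval corr);
run;
```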
THINNING=number
THIN=number
controls the thinning of the Markov chain. Only one in every k samples is used when
THINNING=k, and if NBI=$n_0$ and NMC=$n$, the number of samples kept is
$$\left[\frac{n_0 + n}{k}\right] - \left[\frac{n_0}{k}\right]$$
where $[a]$ represents the integer part of the number $a$. The default is THINNING=1.
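As a worked example of this count, assume the default NBI=2000 and NMC=10000 together with THINNING=3. The THINNING= value is chosen only for illustration.

```latex
\left[\frac{2000 + 10000}{3}\right] - \left[\frac{2000}{3}\right]
  = [4000] - [666.\overline{6}] = 4000 - 666 = 3334
```

With THINNING=1, the same formula gives $12000 - 2000 = 10000$, so every post-burn-in sample is kept.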
WEIBULLSCALEPRIOR=GAMMA<(options) >
WSCALEPRIOR=GAMMA<(options) >
WSCPRIOR=GAMMA<(options) >
specifies that Gibbs sampling be performed on the Weibull model scale parameter and
specifies the prior distribution for the scale parameter. This option applies only when a Weibull
distribution and no covariates are specified. When this option is specified, PROC LIFEREG performs
Gibbs sampling on the Weibull scale parameter, which is defined as $\exp(\mu)$, where $\mu$ is the
intercept term.
A gamma prior $G(a, b)$ is specified by WEIBULLSCALEPRIOR=GAMMA, which can be
followed by one of the following gamma-options enclosed in parentheses. The gamma probability
density is given by $g(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$. The hyperparameters $a$ and $b$ are the shape and
inverse-scale parameters of the gamma distribution, respectively. See the section "Gamma
Prior" on page 3052 for details about the gamma prior. The default is $G(10^{-4}, 10^{-4})$.
RELSHAPE<=c >
specifies independent $G(c\hat{\alpha}, c)$ distribution, where $\hat{\alpha}$ is the MLE of the Weibull scale
parameter. With this choice of hyperparameters, the mean of the prior distribution is
$\hat{\alpha}$ and the variance is $\hat{\alpha}/c$. By default, $c = 10^{-4}$.
SHAPE=a and ISCALE=b
together specify the $G(a, b)$ prior.
SHAPE=c
ISCALE=c
each specifies the $G(c, c)$ prior when used alone.
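WEIBULLSCALEPRIOR= applies only to a Weibull model without covariates, so the sketch below fits an intercept-only model. The data set epidemic, the response life, the output data set name WeibPost, and the hyperparameter values are all hypothetical illustrations, not recommendations. The sketch also assumes that the Weibull scale and shape priors can be requested in the same BAYES statement.

```sas
/* Hypothetical intercept-only Weibull model (no covariates after "=") */
proc lifereg data=epidemic;
   model life = / dist=weibull;
   bayes seed=1
         weibullscaleprior=gamma(relshape)    /* G(c*MLE, c) with the default c */
         weibullshapeprior=gamma(shape=0.01)  /* G(0.01, 0.01) on the shape     */
         outpost=WeibPost;
run;
```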
WEIBULLSHAPEPRIOR=GAMMA<(options) >
WSHAPEPRIOR=GAMMA<(options) >
WSHPRIOR=GAMMA<(options) >
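specifies that Gibbs sampling be performed on the Weibull model shape parameter and
specifies the prior distribution for the shape parameter. When this option is specified, PROC LIFEREG
performs Gibbs sampling on the Weibull shape parameter, which is defined as $\sigma^{-1}$, where $\sigma$
is the location-scale model scale parameter.
A gamma prior $G(a, b)$ with density $g(t) = \frac{b (bt)^{a-1} e^{-bt}}{\Gamma(a)}$ is specified by
WEIBULLSHAPEPRIOR=GAMMA, which can be followed by one of the following gamma-options
enclosed in parentheses. The hyperparameters $a$ and $b$ are the shape and inverse-scale
parameters of the gamma distribution, respectively. See the section "Gamma Prior" on page 3052
for details about the gamma prior. The default is $G(10^{-4}, 10^{-4})$.
RELSHAPE<=c >
specifies independent $G(c\hat{\beta}, c)$ distribution, where $\hat{\beta}$ is the MLE of the Weibull shape
parameter. With this choice of hyperparameters, the mean of the prior distribution is
$\hat{\beta}$ and the variance is $\hat{\beta}/c$. By default, $c = 10^{-4}$.
SHAPE=a and ISCALE=b
together specify the $G(a, b)$ prior.
SHAPE=c
ISCALE=c
each specifies the $G(c, c)$ prior when used alone.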
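The following derivation shows why the Weibull scale and shape parameters are $\exp(\mu)$ and $\sigma^{-1}$. Consider the location-scale form fit to the log response when no covariates are present. Assume $\log T = \mu + \sigma W$, where $W$ has the standard type 1 extreme-value distribution with survival function $\Pr(W > w) = \exp(-e^{w})$. Then

```latex
\begin{aligned}
\Pr(T > t) &= \Pr\!\left(W > \frac{\log t - \mu}{\sigma}\right)
            = \exp\!\left[-\exp\!\left(\frac{\log t - \mu}{\sigma}\right)\right] \\
           &= \exp\!\left[-\left(\frac{t}{e^{\mu}}\right)^{1/\sigma}\right]
\end{aligned}
```

This is the Weibull survival function with scale $e^{\mu}$ and shape $1/\sigma$, which matches the definitions that WEIBULLSCALEPRIOR= and WEIBULLSHAPEPRIOR= use.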
BY Statement
BY variables ;
You can specify a BY statement with PROC LIFEREG to obtain separate analyses on observations
in groups defined by the BY variables. When a BY statement appears, the procedure expects the
input data set to be sorted in order of the BY variables.
If your input data set is not sorted in ascending order, use one of the following alternatives:
- Sort the data by using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for
the LIFEREG procedure. The NOTSORTED option does not mean that the data are unsorted
but rather that the data are arranged in groups (according to values of the BY variables) and
that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables by using the DATASETS procedure.
For more information about the BY statement, see SAS Language Reference: Concepts. For more
information about the DATASETS procedure, see the Base SAS Procedures Guide.
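The sketch below uses the first alternative: it sorts the data with PROC SORT and then fits one model per BY group. The data set epidemic, the BY variable group, and the output name EpiSorted are hypothetical.

```sas
/* Sort by the BY variable, writing the result to a new data set */
proc sort data=epidemic out=EpiSorted;
   by group;
run;

/* One separate Weibull fit for each level of group */
proc lifereg data=EpiSorted;
   by group;
   model life = dose / dist=weibull;
run;
```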
CLASS Statement
CLASS variables ;
Variables that are classification variables rather than quantitative numeric variables must be listed
in the CLASS statement. For each explanatory variable listed in the CLASS statement, indicator
variables are generated for the levels assumed by the CLASS variable. If the CLASS statement is
used, it must appear before the MODEL statement.
INSET Statement
INSET <keyword-list > </ options > ;
The box or table of summary information produced on plots made with the PROBPLOT statement
is called an inset. You can use the INSET statement to customize the information that is displayed
in the inset box as well as to customize the appearance of the inset box. To supply the information
that is displayed in the inset box, you specify keywords corresponding to the information that you
want shown. For example, the following statements produce a probability plot with the number
of observations, the number of right-censored observations, the name of the distribution, and the
estimated Weibull shape parameter in the inset:
proc lifereg data=epidemic;
model life = dose / dist = Weibull;
probplot ;
inset nobs right dist shape;
run;
By default, inset entries are identified with appropriate labels. However, you can provide a customized
label by specifying the keyword for that entry followed by the equal sign (=) and the label
in quotes. For example, the following INSET statement produces an inset containing the number
of observations and the name of the distribution, labeled "Sample Size" and "Distribution" in the
inset:
inset nobs='Sample Size' dist='Distribution';
If you specify a keyword that does not apply to the plot you are creating, then the keyword is
ignored.
If you specify more than one INSET statement, only the first one is used.
Table 48.2 lists keywords available in the INSET statement to display summary statistics, distribution
parameters, and distribution fitting information.
Table 48.2 INSET Statement Keywords
Keyword       Description
CONFIDENCE    confidence coefficient for all confidence intervals
DIST          name of the distribution
INTERVAL      number of interval-censored observations
LEFT          number of left-censored observations
NOBS          number of observations
NMISS         number of observations with missing values
RIGHT         number of right-censored observations
SCALE         value of the scale parameter
SHAPE         value of the shape parameter
UNCENSORED    number of uncensored observations
The following options control the appearance of the box when you use traditional graphics. These
options are not available if ODS Graphics is enabled. All options are specified after the slash (/) in
the INSET statement.
CFILL=color
specifies the color for filling the box.
CFILLH=color
specifies the color for filling the box header.
CFRAME=color
specifies the color for the frame.
CHEADER=color
specifies the color for text in the header.
CTEXT=color
specifies the color for the text.
FONT=font
specifies the software font for the text.
HEIGHT=value
specifies the height of the text.
HEADER=quoted string
specifies the text for the header or box title.
NOFRAME
omits the frame around the box.
POS=value <DATA | PERCENT>
determines the position of the inset. The value can be a compass point (N, NE, E, SE, S,
SW, W, NW) or a pair of coordinates (x, y) enclosed in parentheses. The coordinates can be
specified in screen percentage units or axis data units. The default is screen percentage units.
REFPOINT=name
specifies the reference point for an inset that is positioned by a pair of coordinates with the
POS= option. You use the REFPOINT= option in conjunction with the POS= coordinates.
The REFPOINT= option specifies which corner of the inset frame you have specified with
coordinates (x, y), and it can take the value of BR (bottom right), BL (bottom left), TR (top
right), or TL (top left). The default is REFPOINT=BL. If the inset position is specified as a
compass point, then the REFPOINT= option is ignored.
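The appearance options above apply only to traditional graphics, so the hypothetical sketch below first disables ODS Graphics. It then places a titled, white-filled inset in the northeast corner of the probability plot from the earlier epidemic example.

```sas
ods graphics off;                 /* appearance options require traditional graphics */
proc lifereg data=epidemic;
   model life = dose / dist=weibull;
   probplot;
   inset nobs right dist shape / pos=ne header='Weibull Fit' cfill=white;
run;
```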
MODEL Statement
<label:> MODEL response<*censor(list) >=effects </ options > ;
<label:> MODEL (lower,upper)=effects </ options > ;
<label:> MODEL events/trials=effects </ options > ;
Only a single MODEL statement can be used with one invocation of the LIFEREG procedure. If
multiple MODEL statements are present, only the last is used. The optional label is used to label
the model estimates in the output SAS data set and OUTEST= data set.
The first MODEL syntax is appropriate for right censoring. The variable response is possibly right
censored. If the response variable can be right censored, then a second variable, denoted censor,
must appear after the response variable with a list of parenthesized values, separated by commas or
blanks, to indicate censoring. That is, if the censor variable takes on a value given in the list, the
response is a right-censored value; otherwise, it is an observed value.
The second MODEL syntax specifies two variables, lower and upper, that contain values of the
endpoints of the censoring interval. If the two values are the same (and not missing), it is assumed
that there is no censoring and the actual response value is observed. If the lower value is missing,
then the upper value is used as a left-censored value. If the upper value is missing, then the lower
value is taken as a right-censored value. If both values are present and the lower value is less than the
upper value, it is assumed that the values specify a censoring interval. If the lower value is greater
than the upper value or both values are missing, then the observation is not used in the analysis,
although predicted values can still be obtained if none of the covariates are missing. The following
table summarizes the ways of specifying censoring.
lower          upper          Comparison       Interpretation
not missing    not missing    equal            no censoring
not missing    not missing    lower < upper    censoring interval
missing        not missing                     upper used as left-censoring value
not missing    missing                         lower used as right-censoring value
not missing    not missing    lower > upper    observation not used
missing        missing                         observation not used
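A small hypothetical data set makes the rules in this table concrete. Each row falls into one of the listed cases, as noted in the comments. The data set only illustrates the input layout; it is far too small for a meaningful fit.

```sas
/* Row 1: lower = upper  -> uncensored value 4.2                  */
/* Row 2: lower < upper  -> censored between 2.0 and 3.5          */
/* Row 3: lower missing  -> left censored at 1.5                  */
/* Row 4: upper missing  -> right censored at 6.0                 */
/* Row 5: lower > upper  -> observation not used in the fit       */
/* Row 6: both missing   -> observation not used in the fit       */
data Intervals;
   input lower upper x;
   datalines;
4.2 4.2 1
2.0 3.5 0
.   1.5 1
6.0 .   0
5.0 3.0 1
.   .   0
;

proc lifereg data=Intervals;
   model (lower, upper) = x / dist=weibull;
run;
```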
The third MODEL syntax specifies two variables that contain count data for a binary response. The
value of the first variable, events, is the number of successes. The value of the second variable,
trials, is the number of tries. The values of both events and (trials-events) must be nonnegative, and
trials must be positive for the response to be valid. The values of the two variables do not need to
be integers and are not modified to be integers.
The effects following the equal sign are the covariates in the model. Higher-order effects, such as
interactions and nested terms, are allowed in the list, similar to the GLM procedure. Variable names
and combinations of variable names representing higher-order terms are allowed to appear in this
list. Classification, or CLASS, variables can be used as effects, and indicator variables are generated
for the class levels. If you do not specify any covariates following the equal sign, an intercept-only
model is fit.
Examples of three valid MODEL statements follow:
a: model time*flag(1,3)=temp;
b: model (start, finish)=;
c: model r/n=dose;
MODEL statement a indicates that the response is contained in a variable named time and that, if the
variable flag takes on the values 1 or 3, the observation is right censored. The explanatory variable
is temp, which could be a CLASS variable. MODEL statement b indicates that the response is
known to be in the interval between the values of the variables start and finish and that there are no
covariates except for a default intercept term. MODEL statement c indicates a binary response, with
the variable r containing the number of responses and the variable n containing the number of trials.
The following options can appear in the MODEL statement.
Task                                                                     Option
Model specification
set the significance level                                               ALPHA=
specify distribution type for failure time                               DISTRIBUTION=
request no log transformation of response                                NOLOG
initial estimate for intercept term                                      INTERCEPT=
hold intercept term fixed                                                NOINT
initial estimates for regression parameters                              INITIAL=
initialize scale parameter                                               SCALE=
hold scale parameter fixed                                               NOSCALE
initialize first shape parameter                                         SHAPE1=
hold first shape parameter fixed                                         NOSHAPE1
Model fitting
set convergence criterion                                                CONVERGE=
set maximum iterations                                                   MAXITER=
set tolerance for testing singularity                                    SINGULAR=
Output
display estimated correlation matrix                                     CORRB
display estimated covariance matrix                                      COVB
display iteration history, final gradient, and second derivative matrix  ITPRINT
ALPHA=value
sets the significance level for the confidence intervals for regression parameters and estimated
survival probabilities. The value must be between 0 and 1. By default, ALPHA=0.05.
CONVERGE=value
sets the convergence criterion. Convergence is declared when the maximum change in the
parameter estimates between Newton-Raphson steps is less than the value specified. The
change is a relative change if the parameter is greater than 0.01 in absolute value; otherwise,
it is an absolute change. By default, CONVERGE=1E-8.
CONVG=value
sets the relative Hessian convergence criterion; value must be between 0 and 1. After convergence
is determined with the change in parameter criterion specified with the CONVERGE=
option, the quantity $tc = \frac{g' H^{-1} g}{|f|}$ is computed and compared to value, where $g$ is the
gradient vector, $H$ is the Hessian matrix for the model parameters, and $f$ is the log-likelihood
function. If $tc$ is greater than value, a warning that the relative Hessian convergence criterion
has been exceeded is displayed. This criterion detects the occasional case where the change
in parameter convergence criterion is satisfied, but a maximum in the log-likelihood function
has not been attained. By default, CONVG=1E-4.
CORRB
produces the estimated correlation matrix of the parameter estimates.
COVB
produces the estimated covariance matrix of the parameter estimates.
DISTRIBUTION=distribution-type
DIST=distribution-type
D=distribution-type
specifies the distribution type assumed for the failure time. By default, PROC LIFEREG fits
a type 1 extreme-value distribution to the log of the response. This is equivalent to fitting the
Weibull distribution, since the scale parameter for the extreme-value distribution is related
to a Weibull shape parameter and the intercept is related to the Weibull scale parameter in
this case. When the NOLOG option is specified, PROC LIFEREG models the untransformed
response with a type 1 extreme-value distribution as the default. See the section "Supported
Distributions" on page 3037 for descriptions of the distributions. The following are valid
values for distribution-type:
EXPONENTIAL  the exponential distribution, which is treated as a restricted Weibull distribution
GAMMA        a generalized gamma distribution (Lawless 2003, p. 240). The standard
             two-parameter gamma distribution is not available in PROC LIFEREG.
LLOGISTIC    a loglogistic distribution
LNORMAL      a lognormal distribution
LOGISTIC     a logistic distribution (equivalent to LLOGISTIC when the NOLOG option is specified)
NORMAL       a normal distribution (equivalent to LNORMAL when the NOLOG option is specified)
WEIBULL      a Weibull distribution. If NOLOG is specified, it fits a type 1 extreme-value
             distribution to the raw, untransformed data.
By default, PROC LIFEREG transforms the response with the natural logarithm before fitting the
specified model when you specify the EXPONENTIAL, GAMMA, LLOGISTIC, LNORMAL, or WEIBULL
option. You can suppress the log transformation with the NOLOG option. The following table
summarizes the resulting distributions when the preceding distribution options are used in
combination with the NOLOG option.
DISTRIBUTION=   NOLOG Specified?   Resulting Distribution
EXPONENTIAL     No                 Exponential
EXPONENTIAL     Yes                One-parameter extreme value
GAMMA           No                 Generalized gamma
GAMMA           Yes                Generalized gamma with untransformed responses
LOGISTIC        No                 Logistic
LOGISTIC        Yes                Logistic (NOLOG has no effect)
LLOGISTIC       No                 Log-logistic
LLOGISTIC       Yes                Logistic
LNORMAL         No                 Lognormal
LNORMAL         Yes                Normal
NORMAL          No                 Normal
NORMAL          Yes                Normal (NOLOG has no effect)
WEIBULL         No                 Weibull
WEIBULL         Yes                Extreme value
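The equivalence between the extreme-value fit on the log scale and the Weibull fit can be checked numerically. The following Python sketch assumes an intercept-only model with extreme-value location μ and scale σ. It uses the standard reparameterization Weibull scale = exp(μ) and Weibull shape = 1/σ. The function names are illustrative and are not part of PROC LIFEREG.

```python
import math


def ev_survival(w, mu, sigma):
    """Type 1 extreme-value (minimum) survival function, evaluated at w = log(t)."""
    return math.exp(-math.exp((w - mu) / sigma))


def weibull_survival(t, scale, shape):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def ev_to_weibull(mu, sigma):
    """Map extreme-value location/scale (log scale) to Weibull scale/shape."""
    return math.exp(mu), 1.0 / sigma


mu, sigma = 2.0, 0.5          # hypothetical log-scale estimates
scale, shape = ev_to_weibull(mu, sigma)
for t in (0.5, 3.0, 7.4, 20.0):
    # Both columns describe the same survival probability.
    print(t, ev_survival(math.log(t), mu, sigma), weibull_survival(t, scale, shape))
```

Setting σ = 1 gives a Weibull shape of 1, which is the exponential distribution. This agrees with the note under SCALE= that the exponential model is a Weibull model with the scale fixed at 1.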
INITIAL=values
sets initial values for the regression parameters. This option can be helpful in the case of
convergence difficulty. Specified values are used to initialize the regression coefficients for
the covariates specified in the MODEL statement. The intercept parameter is initialized with
the INTERCEPT= option and is not included here. The values are assigned to the variables in
the MODEL statement in the same order in which they are listed in the MODEL statement.
Note that a CLASS variable requires k - 1 values when the CLASS variable takes on k
different levels. The order of the CLASS levels is determined by the ORDER= option. If
there is no intercept term, the first CLASS variable requires k initial values. If a BY statement
is used, all CLASS variables must take on the same number of levels in each BY group or no
meaningful initial values can be specified. The INITIAL= option can be specified as follows.
Type of List                 Specification
list separated by blanks     initial=3 4 5
list separated by commas     initial=3,4,5
x to y                       initial=3 to 5
x to y by z                  initial=3 to 5 by 1
combination of methods       initial=1,3 to 5,9
By default, PROC LIFEREG computes initial estimates with ordinary least squares. See the
section "Computational Method" on page 3035 for details.
NOTE: The INITIAL= option is overwritten by the INEST= option. See the section "INEST=
Data Set" on page 3048 for details.
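The value-list forms accepted by INITIAL= (and by the QUANTILES= keyword of the OUTPUT statement) can be expanded with a small parser. The following Python sketch has the hypothetical names `expand_value_list` and `n_initial_values`. It handles blanks, commas, `x to y`, and `x to y by z`. It assumes a default increment of 1 and rounds values to avoid floating-point drift, which may differ from how SAS itself treats these lists. The counting helper applies the k - 1 rule stated above for main-effect CLASS variables only.

```python
def expand_value_list(spec, ndigits=10):
    """Expand a value list such as '1,3 to 5,9' into a list of floats."""
    tokens = spec.replace(",", " ").split()
    values = []
    i = 0
    while i < len(tokens):
        start = float(tokens[i])
        if i + 2 < len(tokens) and tokens[i + 1].lower() == "to":
            stop = float(tokens[i + 2])
            step = 1.0                          # assumed default increment
            i += 3
            if i + 1 < len(tokens) and tokens[i].lower() == "by":
                step = float(tokens[i + 1])
                i += 2
            if step <= 0:
                raise ValueError("this sketch requires a positive increment")
            k = 0
            while start + k * step <= stop + 1e-9:
                values.append(round(start + k * step, ndigits))
                k += 1
        else:
            values.append(round(start, ndigits))
            i += 1
    return values


def n_initial_values(n_continuous, class_levels, intercept=True):
    """One value per continuous covariate and k - 1 per CLASS variable
    (k for the first CLASS variable when there is no intercept)."""
    total = n_continuous
    for j, k in enumerate(class_levels):
        total += k if (j == 0 and not intercept) else k - 1
    return total


print(expand_value_list("1,3 to 5,9"))
print(n_initial_values(2, [3, 4]))
```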
INTERCEPT=value
initializes the intercept term to value. By default, the intercept is initialized by an ordinary
least squares estimate.
ITPRINT
displays the iteration history for computing maximum likelihood estimates, the final evaluation
of the gradient, and the final evaluation of the negative of the second derivative matrix, that is,
the negative of the Hessian. If you perform a Bayesian analysis by specifying the
BAYES statement, the iteration history for computing the mode of the posterior distribution
is also displayed.
MAXITER=n
sets the maximum allowable number of iterations during the model estimation. By default,
MAXITER=50.
NOINT
holds the intercept term fixed. Because of the usual log transformation of the response, the
intercept parameter is usually a scale parameter for the untransformed response, or a location
parameter for a transformed response.
NOLOG
requests that no log transformation of the response variable be performed. By default, PROC
LIFEREG models the log of the response variable for the EXPONENTIAL, GAMMA, LLOGISTIC,
LOGNORMAL, and WEIBULL distribution options. NOLOG is implicitly assumed for the NORMAL
and LOGISTIC distribution options.
NOSCALE
holds the scale parameter fixed. Note that if the log transformation has been applied to the
response, the effect of the scale parameter is a power transformation of the original response.
If no SCALE= value is specified, the scale parameter is fixed at the value 1.
NOSHAPE1
holds the first shape parameter, SHAPE1, fixed. If no SHAPE1= value is specified, SHAPE1
is fixed at a value that depends on the DISTRIBUTION type.
OFFSET=variable
specifies a variable in the input data set to be used as an offset variable. This variable cannot be
a CLASS variable, and it cannot be the response variable or one of the explanatory variables.
SCALE=value
initializes the scale parameter to value. If the Weibull distribution is specified, this scale
parameter is the scale parameter of the type 1 extreme-value distribution, not the Weibull
scale parameter. Note that, with a log transformation, the exponential model is the same as a
Weibull model with the scale parameter fixed at the value 1.
SHAPE1=value
initializes the first shape parameter to value. If the specified distribution does not depend on
this parameter, then this option has no effect. The only distribution that depends on this shape
parameter is the generalized gamma distribution. See the section "Supported Distributions"
on page 3037 for descriptions of the parameterizations of the distributions.
SINGULAR=value
sets the tolerance for testing singularity of the information matrix and the crossproducts matrix
for the initial least squares estimates. Roughly, the test requires that a pivot be at least
this value times the original diagonal value. By default, SINGULAR=1E-12.
OUTPUT Statement
OUTPUT <OUT=SAS-data-set > <keyword=name > . . . <keyword=name > ;
The OUTPUT statement creates a new SAS data set containing statistics calculated after fitting the
model. At least one specification of the form keyword=name is required.
All variables in the original data set are included in the new data set, along with the variables created
as options for the OUTPUT statement. These new variables contain fitted values and estimated
quantiles. If you want to create a permanent SAS data set, you must specify a two-level name (see
SAS Language Reference: Concepts for more information about permanent SAS data sets). Each
OUTPUT statement applies to the preceding MODEL statement. See Example 48.1 for illustrations
of the OUTPUT statement.
The following specifications can appear in the OUTPUT statement:
OUT=SAS-data-set specifies the new data set. By default, the procedure uses the DATAn convention
to name the new data set.
keyword=name specifies the statistics to include in the output data set and gives names to the
new variables. Specify a keyword for each desired statistic (see the following
list of keywords), an equal sign, and the variable to contain the statistic.
The keywords allowed and the statistics they represent are as follows:
CENSORED specifies an indicator variable to signal censoring. The variable takes on the
value 1 if the observation is censored; otherwise, it is 0.
CDF specifies a variable to contain the estimates of the cumulative distribution function
evaluated at the observed response. See the section "Predicted Values" on
page 3041 for more information.
CONTROL specifies a variable in the input data set to control the estimation of quantiles.
See Example 48.1 for an illustration. If the specified variable has the value 1,
estimates for all the values listed in the QUANTILE= list are computed for that
observation in the input data set; otherwise, no estimates are computed. If no
CONTROL= variable is specified, all quantiles are estimated for all observations.
If the response variable in the MODEL statement is binomial, then this option
has no effect.
CRESIDUAL | CRES specifies a variable to contain the Cox-Snell residuals
$$ -\log\big(S(u_i)\big) $$
where S is the standard survival function and
$$ u_i = \frac{y_i - x_i' b}{\sigma} $$
If the response variable in the corresponding model statement is binomial, then
the residuals are not computed, and this variable contains missing values.
SRESIDUAL | SRES specifies a variable to contain the standardized residuals
$$ \frac{y_i - x_i' b}{\sigma} $$
If the response variable in the corresponding model statement is binomial, then
the residuals are not computed, and this variable contains missing values.
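For the default Weibull model (a type 1 extreme-value distribution on the log scale), the standard survival function is S(u) = exp(-exp(u)). The Cox-Snell residual -log(S(u_i)) therefore reduces to exp(u_i). The following Python sketch computes both residuals for one observation. It assumes the response has already been log transformed and that `xb` holds x_i'b. The function names and inputs are illustrative only.

```python
import math


def standardized_residual(y, xb, sigma):
    """u_i = (y_i - x_i'b) / sigma, with y on the (log) scale used for fitting."""
    return (y - xb) / sigma


def ev_survival_std(u):
    """Standard type 1 extreme-value survival function."""
    return math.exp(-math.exp(u))


def cox_snell_residual(y, xb, sigma, survival=ev_survival_std):
    """Cox-Snell residual -log(S(u_i))."""
    return -math.log(survival(standardized_residual(y, xb, sigma)))


t, xb, sigma = 12.0, 2.2, 0.8  # hypothetical event time, linear predictor, scale
u = standardized_residual(math.log(t), xb, sigma)
print(u, cox_snell_residual(math.log(t), xb, sigma))
```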
PREDICTED | P specifies a variable to contain the quantile estimates. If the response variable in
the corresponding model statement is binomial, then this variable contains the
estimated probabilities, $1 - F(-x'b)$.
QUANTILES | QUANTILE | Q gives a list of values for which quantiles are calculated. The values
must be between 0 and 1, noninclusive. For each value, a corresponding
quantile is estimated. This option is not used if the response variable in the corresponding
MODEL statement is binomial. The QUANTILES option can be
specified as follows.
Type of List                 Specification
list separated by blanks     .2 .4 .6 .8
list separated by commas     .2,.4,.6,.8
x to y                       .2 to .8
x to y by z                  .2 to .8 by .1
combination of methods       .1,.2 to .8 by .2
By default, QUANTILES=0.5. When the response is not binomial, a numeric
variable, _PROB_, is added to the OUTPUT data set whenever the QUANTILES=
option is specified. The variable _PROB_ gives the probability value
for the quantile estimates. These are the values taken from the QUANTILES=
list and are given as values between 0 and 1, not as values between 0 and 100.
STD_ERR | STD specifies a variable to contain the estimates of the standard errors of the
estimated quantiles or $x'b$. If the response used in the MODEL statement is a
binomial response, then these are the standard errors of $x'b$. Otherwise, they
are the standard errors of the quantile estimates. These estimates can be used to
compute confidence intervals for the quantiles. However, if the model is fit to
the log of the event time, better confidence intervals can usually be computed by
transforming the confidence intervals for the log response. See Example 48.1 for
such a transformation.
XBETA specifies a variable to contain the computed value of $x'b$, where x is the covariate
vector and b is the vector of parameter estimates.
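Under the default log-scale model, a quantile of the event time is obtained by back-transforming a quantile on the log scale. The following Python sketch assumes the extreme-value (Weibull) case, for which the standard quantile is u_p = log(-log(1 - p)). As the STD_ERR description recommends, it forms the confidence interval on the log scale before exponentiating. The standard error of the log-scale quantile (`se_log`) is taken as given here, because computing it requires the covariance matrix of the estimates. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def ev_std_quantile(p):
    """p-th quantile of the standard type 1 extreme-value (minimum) distribution."""
    return math.log(-math.log(1.0 - p))


def time_quantile(xb, sigma, p):
    """Event-time quantile exp(x'b + sigma * u_p) for a log-scale model."""
    return math.exp(xb + sigma * ev_std_quantile(p))


def time_quantile_ci(xb, sigma, p, se_log, level=0.95):
    """Confidence limits formed on the log scale, then exponentiated."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    log_q = xb + sigma * ev_std_quantile(p)
    return math.exp(log_q - z * se_log), math.exp(log_q + z * se_log)


# Hypothetical median estimate for x'b = 2.0, sigma = 0.5, se of log quantile = 0.1.
print(time_quantile(2.0, 0.5, 0.5), time_quantile_ci(2.0, 0.5, 0.5, 0.1))
```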
PROBPLOT Statement
PROBPLOT | PPLOT </ options > ;
You can use the PROBPLOT statement to create a probability plot from lifetime data. The data can
be uncensored, right censored, or arbitrarily censored. You can specify any number of PROBPLOT
statements after a MODEL statement. The syntax used for the response in the MODEL statement
determines the type of censoring assumed in creating the probability plot. The model fit with the
MODEL statement is plotted along with the data. If there are covariates in the model, they are
set to constant values specified in the XDATA= data set when creating the probability plot. If
no XDATA= data set is specified, continuous variables are set to their overall mean values and
categorical variables specified in the CLASS statement are set to their highest levels.
You can specify the following options to control the content, layout, and appearance of a probability
plot.
Traditional Graphics
The following options are available if you use traditional graphics, that is, if ODS Graphics is not
enabled.
ANNOTATE=SAS-data-set
ANNO=SAS-data-set
specifies an Annotate data set, as described in SAS/GRAPH Software: Reference, that enables
you to add features to the probability plot. The data set you specify with the ANNOTATE=
option in the PROBPLOT statement provides the Annotate data set for all plots created by the
statement.
CAXIS=color
CAXES=color
specifies the color used for the axes and tick marks. This option overrides any COLOR=
specifications in an AXIS statement. The default is the first color in the device color list.
CCENSOR=color
specifies the color for filling the censor plot area. The default is the first color in the device
color list.
CENBIN
plots censored data as frequency counts (rounding for noninteger frequency) rather than as
individual points.
CENCOLOR=color
specifies the color for the censor symbol. The default is the first color in the device color list.
CENSYMBOL=symbol | (symbol-list)
specifies symbols for censored values. The symbol is one of the symbol names (plus, star,
square, diamond, triangle, hash, paw, point, dot, and circle) or a letter (A-Z). If you do not
specify the CENSYMBOL= option, the symbol used for censored values is the same as for
failures.
CFIT=color
specifies the color for the fitted probability line and confidence curves. The default is the first
color in the device color list.
CFRAME=color
CFR=color
specifies the color for the area enclosed by the axes and frame. This area is not shaded by
default.
CGRID=color
specifies the color for grid lines. The default is the first color in the device color list.
CHREF=color
CH=color
specifies the color for lines requested by the HREF= option. The default is the first color in
the device color list.
CTEXT=color
specifies the color for tick mark values and axis labels. The default is the color specified for
the CTEXT= option in the most recent GOPTIONS statement.
CVREF=color
CV=color
specifies the color for lines requested by the VREF= option. The default is the first color in
the device color list.
DESCRIPTION=string
DES=string
specifies a description, up to 40 characters, that appears in the PROC GREPLAY master
menu. The default is the variable name.
FONT=font
specifies a software font for reference line and axis labels. You can also specify fonts for
axis labels in an AXIS statement. The FONT= font takes precedence over the FTEXT= font
specified in the most recent GOPTIONS statement. Hardware characters are used by default.
HCL
computes and draws confidence limits for the predicted probabilities in the horizontal direction.
HEIGHT=value
specifies the height of text used outside framed areas. The default value is 3.846 (in percentage).
HLOWER=value
specifies the lower limit on the lifetime axis scale. The HLOWER= option specifies value
as the lower lifetime axis tick mark. The tick mark interval and the upper axis limit are
determined automatically.
HOFFSET=value
specifies the offset for the horizontal axis. The default value is 1.
HUPPER=value
specifies value as the upper lifetime axis tick mark. The tick mark interval and the lower axis
limit are determined automatically.
HREF <(INTERSECT)> =value-list
requests reference lines perpendicular to the horizontal axis. If (INTERSECT) is specified, a
second reference line perpendicular to the vertical axis is drawn that intersects the fit line at
the same point as the horizontal axis reference line. If a horizontal axis reference line label
is specified, the intersecting vertical axis reference line is labeled with the vertical axis value.
See also the CHREF=, HREFLABELS=, and LHREF= options.
HREFLABELS=label1 ... labeln
HREFLABEL=label1 ... labeln
HREFLAB=label1 ... labeln
specifies labels for the lines requested by the HREF= option. The number of labels must
equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.
HREFLABPOS=n
specifies the vertical position of labels for HREF= lines. The following table shows the valid
values for n and the corresponding label placements.
n   Label Placement
1   top
2   staggered from top
3   bottom
4   staggered from bottom
5   alternating from top
6   alternating from bottom
INBORDER
requests a border around probability plots.
INTERTILE=value
specifies the distance between tiles.
ITPRINTEM
displays the iteration history for the Turnbull algorithm.
JITTER=value
specifies the amount to jitter overlaying plot symbols, in units of symbol width.
LFIT=linetype
specifies a line style for fitted curves and confidence limits. By default, fitted curves are
drawn by connecting solid lines (linetype = 1), and confidence limits are drawn by connecting
dashed lines (linetype = 3).
LGRID=linetype
specifies a line style for all grid lines; linetype is between 1 and 46. The default is 35.
LHREF=linetype
LH=linetype
specifies the line type for lines requested by the HREF= option. The default is 2, which
produces a dashed line.
LVREF=linetype
LV=linetype
specifies the line type for lines requested by the VREF= option. The default is 2, which
produces a dashed line.
MAXITEM=n1 <,n2>
specifies the maximum number of iterations, n1, allowed for the Turnbull algorithm. Iteration
history will be displayed in increments of n2 if requested with the ITPRINTEM option. See
the section "Arbitrarily Censored Data" on page 3046 for details.
NAME=string
specifies a name for the plot, up to eight characters, that appears in the PROC GREPLAY
master menu. The default is LIFEREG.
NOCENPLOT
suppresses the plotting of censored data points.
NOCONF
suppresses the default percentile confidence bands on the probability plot.
NODATA
suppresses plotting of the estimated empirical probability plot.
NOFIT
suppresses the fitted probability (percentile) line and confidence bands.
NOFRAME
suppresses the frame around plotting areas.
NOGRID
suppresses grid lines.
NOHLABEL
suppresses horizontal labels.
NOHTICK
suppresses horizontal tick marks.
NOPOLISH
suppresses setting small interval probabilities to zero in the Turnbull algorithm.
NOVLABEL
suppresses vertical labels.
NOVTICK
suppresses vertical tick marks.
NPINTERVALS=interval-type
specifies one of the two kinds of confidence limits for the estimated cumulative probabilities,
pointwise (NPINTERVALS=POINT) or simultaneous (NPINTERVALS=SIMUL), requested
by the PPOUT option to be displayed in the tabular output.
PCTLIST=value-list
specifies the list of percentages for which to compute percentile estimates; value-list must be
a list of values separated by blanks or commas. Each value in the list must be between 0 and
100.
PLOWER=value
specifies the lower limit on the probability axis scale. The PLOWER= option specifies value
as the lower probability axis tick mark. The tick mark interval and the upper axis limit are
determined automatically.
PRINTPROBS
displays intervals and associated probabilities for the Turnbull algorithm.
PUPPER=value
specifies the upper limit on the probability axis scale. The PUPPER= option specifies value
as the upper probability axis tick mark. The tick mark interval and the lower axis limit are
determined automatically.
PPOS=character-list
specifies the plotting position type. See the section "Probability Plotting" on page 3044 for
details.
PPOS       Method
EXPRANK    expected ranks
MEDRANK    median ranks
MEDRANK1   median ranks (exact formula)
KM         Kaplan-Meier
MKM        modified Kaplan-Meier (default)
PPOUT
specifies that a table of the cumulative probabilities plotted on the probability plot be
displayed. Kaplan-Meier estimates of the cumulative probabilities are also displayed, along
with standard errors and confidence limits. The confidence limits can be pointwise or
simultaneous, as specified by the NPINTERVALS= option.
PROBLIST=value-list
specifies the list of initial values for the Turnbull algorithm.
ROTATE
requests probability plots with probability scale on the horizontal axis.
SQUARE
makes the layout of the probability plots square.
TOLLIKE=value
specifies the criterion for convergence in the Turnbull algorithm.
TOLPROB=value
specifies the criterion for setting the interval probability to zero in the Turnbull algorithm.
VAXISLABEL=string
specifies a label for the vertical axis.
VREF <(INTERSECT)> =value-list
requests reference lines perpendicular to the vertical axis. If (INTERSECT) is specified, a
second reference line perpendicular to the horizontal axis is drawn that intersects the fit line
at the same point as the vertical axis reference line. If a vertical axis reference line label is
specified, the intersecting horizontal axis reference line is labeled with the horizontal axis
value. See also the entries for the CVREF=, LVREF=, and VREFLABELS= options.
VREFLABELS=label1 ... labeln
VREFLABEL=label1 ... labeln
VREFLAB=label1 ... labeln
specifies labels for the lines requested by the VREF= option. The number of labels must
equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.
VREFLABPOS=n
specifies the horizontal position of labels for VREF= lines. The valid values for n and the
corresponding label placements are shown in the following table.
n   Label Placement
1   left
2   right
WAXIS=n
specifies line thickness for axes and frame. The default value is 1.
WFIT=n
specifies line thickness for fitted curves. The default value is 1.
WGRID=n
specifies line thickness for grids. The default value is 1.
WREFL=n
specifies line thickness for reference lines. The default value is 1.
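For complete (uncensored) data, the PPOS= choices can be illustrated as simple functions of the rank i and the sample size n. The following Python sketch uses the common textbook formulas: expected rank i/(n+1), Benard's median-rank approximation (i - 0.3)/(n + 0.4), Kaplan-Meier i/n, and modified Kaplan-Meier (i - 0.5)/n. These are assumptions about what each option computes when nothing is censored. The exact-formula MEDRANK1 variant is omitted, and censored data require the adjustments described in the section "Probability Plotting." The function name is hypothetical.

```python
def plotting_positions(n, method="MKM"):
    """Cumulative-probability plotting positions for ranks 1..n of uncensored data."""
    formulas = {
        "EXPRANK": lambda i: i / (n + 1),            # expected ranks
        "MEDRANK": lambda i: (i - 0.3) / (n + 0.4),  # Benard's median-rank approximation
        "KM": lambda i: i / n,                       # Kaplan-Meier with no censoring
        "MKM": lambda i: (i - 0.5) / n,              # modified Kaplan-Meier
    }
    f = formulas[method.upper()]
    return [f(i) for i in range(1, n + 1)]


for method in ("EXPRANK", "MEDRANK", "KM", "MKM"):
    print(method, [round(p, 3) for p in plotting_positions(4, method)])
```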
ODS Graphics
The following options are available if ODS Graphics is enabled.
HCL
computes and draws confidence limits for the predicted probabilities in the horizontal direction.
HLOWER=value
specifies the lower limit on the lifetime axis scale. The HLOWER= option specifies value
as the lower lifetime axis tick mark. The tick mark interval and the upper axis limit are
determined automatically.
HUPPER=value
specifies value as the upper lifetime axis tick mark. The tick mark interval and the lower axis
limit are determined automatically.
HREF <(INTERSECT)> =value-list
requests reference lines perpendicular to the horizontal axis. If (INTERSECT) is specified, a
second reference line perpendicular to the vertical axis is drawn that intersects the fit line at
the same point as the horizontal axis reference line. If a horizontal axis reference line label
is specified, the intersecting vertical axis reference line is labeled with the vertical axis value.
See also the CHREF=, HREFLABELS=, and LHREF= options.
HREFLABELS=label1 ... labeln
HREFLABEL=label1 ... labeln
HREFLAB=label1 ... labeln
specifies labels for the lines requested by the HREF= option. The number of labels must
equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.
ITPRINTEM
displays the iteration history for the Turnbull algorithm.
MAXITEM=n1 <,n2>
specifies the maximum number of iterations, n1, allowed for the Turnbull algorithm. Iteration
history will be displayed in increments of n2 if requested with the ITPRINTEM option. See
the section "Arbitrarily Censored Data" on page 3046 for details.
NOCENPLOT
suppresses the plotting of censored data points.
NOCONF
suppresses the default percentile confidence bands on the probability plot.
NODATA
suppresses plotting of the estimated empirical probability plot.
NOFIT
suppresses the fitted probability (percentile) line and confidence bands.
NOFRAME
suppresses the frame around plotting areas.
NOGRID
suppresses grid lines.
NOPOLISH
suppresses setting small interval probabilities to zero in the Turnbull algorithm.
NPINTERVALS=interval type
specifies one of the two kinds of confidence limits for the estimated cumulative probabilities,
pointwise (NPINTERVALS=POINT) or simultaneous (NPINTERVALS=SIMUL), requested
by the PPOUT option to be displayed in the tabular output.
PCTLIST=value-list
specifies the list of percentages for which to compute percentile estimates; value-list must be
a list of values separated by blanks or commas. Each value in the list must be between 0 and
100.
PLOWER=value
specifies the lower limit on the probability axis scale. The PLOWER= option specifies value
as the lower probability axis tick mark. The tick mark interval and the upper axis limit are
determined automatically.
PRINTPROBS
displays intervals and associated probabilities for the Turnbull algorithm.
PUPPER=value
specifies the upper limit on the probability axis scale. The PUPPER= option specifies value
as the upper probability axis tick mark. The tick mark interval and the lower axis limit are
determined automatically.
PPOS=plotting-position-type
specifies the plotting position type. See the section "Probability Plotting" on page 3044 for
details.
PPOS       Method
EXPRANK    expected ranks
MEDRANK    median ranks
MEDRANK1   median ranks (exact formula)
KM         Kaplan-Meier
MKM        modified Kaplan-Meier (default)
PPOUT
specifies that a table of the cumulative probabilities plotted on the probability plot be
displayed. Kaplan-Meier estimates of the cumulative probabilities are also displayed, along
with standard errors and confidence limits. The confidence limits can be pointwise or
simultaneous, as specified by the NPINTERVALS= option.
PROBLIST=value-list
specifies the list of initial values for the Turnbull algorithm.
ROTATE
requests probability plots with probability scale on the horizontal axis.
SQUARE
makes the layout of the probability plots square.
TOLLIKE=value
specifies the criterion for convergence in the Turnbull algorithm.
TOLPROB=value
specifies the criterion for setting the interval probability to zero in the Turnbull algorithm.
VREF <(INTERSECT)> =value-list
requests reference lines perpendicular to the vertical axis. If (INTERSECT) is specified, a
second reference line perpendicular to the horizontal axis is drawn that intersects the fit line
at the same point as the vertical axis reference line. If a vertical axis reference line label is
specified, the intersecting horizontal axis reference line is labeled with the horizontal axis
value. See also the entries for the CVREF=, LVREF=, and VREFLABELS= options.
VREFLABELS=label1 ... labeln
VREFLABEL=label1 ... labeln
VREFLAB=label1 ... labeln
specifies labels for the lines requested by the VREF= option. The number of labels must
equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.
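The Turnbull options (MAXITEM=, TOLLIKE=, TOLPROB=, NOPOLISH, PROBLIST=) control a self-consistency (EM) iteration for arbitrarily censored data. The following Python sketch shows the basic idea for half-open intervals (L, R]. It is not PROC LIFEREG's implementation. Its stopping rule (the largest change in any probability falls below `tol`), its default tolerances, the uniform starting values, and the optional polishing step are all assumptions chosen for illustration. Exact (uncensored) observations are not handled, and the function names are hypothetical.

```python
def turnbull_intervals(data):
    """Innermost intervals (q, p]: q a left endpoint, p a right endpoint, no endpoint in between."""
    lefts = sorted({l for l, _ in data})
    rights = sorted({r for _, r in data})
    endpoints = set(lefts) | set(rights)
    return [(q, p) for q in lefts for p in rights
            if q < p and not any(q < e < p for e in endpoints)]


def turnbull(data, max_iter=500, tol=1e-10, tolprob=None):
    """Self-consistency estimates of the probabilities on the innermost intervals."""
    support = turnbull_intervals(data)
    # alpha[i][j] = 1 when support interval j lies inside observation interval i
    alpha = [[1.0 if l <= q and p <= r else 0.0 for (q, p) in support]
             for (l, r) in data]
    m, n = len(support), len(data)
    probs = [1.0 / m] * m                      # uniform start (the role PROBLIST= plays)
    for _ in range(max_iter):                  # MAXITEM= analogue
        new = [0.0] * m
        for row in alpha:
            denom = sum(a * pj for a, pj in zip(row, probs))
            for j in range(m):
                new[j] += row[j] * probs[j] / denom / n
        change = max(abs(a - b) for a, b in zip(new, probs))
        probs = new
        if change < tol:
            break
    if tolprob is not None:                    # optional polishing of tiny probabilities
        probs = [pj if pj >= tolprob else 0.0 for pj in probs]
        total = sum(probs)
        probs = [pj / total for pj in probs]
    return support, probs


# Hypothetical inspection data: two failures in (0,1], one in (1,2], one in (0,2].
print(turnbull([(0, 1), (0, 1), (1, 2), (0, 2)]))
```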
WEIGHT Statement
WEIGHT variable ;
If you want to use weights for each observation in the input data set, place the weights in a variable
in the data set and specify the name in a WEIGHT statement. The values of the WEIGHT variable
can be nonintegral and are not truncated. Observations with nonpositive or missing values for the
weight variable do not contribute to the fit of the model. The WEIGHT variable multiplies the
contribution to the log likelihood for each observation.
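The effect of the WEIGHT statement on the fit can be sketched as a weighted sum of per-observation log-likelihood contributions. In the following Python illustration, the contributions are assumed to be already computed, `None` stands in for a missing weight, and the function name is hypothetical.

```python
def weighted_loglik(contributions, weights):
    """Sum of w_i * l_i, skipping observations whose weight is missing or nonpositive."""
    total = 0.0
    for ll_i, w in zip(contributions, weights):
        if w is None or w <= 0:
            continue  # such observations do not contribute to the fit
        total += w * ll_i  # weights need not be integers and are not truncated
    return total


print(weighted_loglik([-1.2, -0.4, -2.0], [1.5, 0, None]))
```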
Details: LIFEREG Procedure
Missing Values
Any observation with missing values for the dependent variable is not used in the model estimation
unless it is one and only one of the values in an interval specification. Also, if one of the explanatory
variables or the censoring variable is missing, the observation is not used. For any observation to be
used in the estimation of a model, only the variables needed in that model have to be nonmissing.
Predicted values are computed for all observations with no missing explanatory variable values.
If the censoring variable is missing, the CENSORED= variable in the OUT= SAS data set is also
missing.
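With the interval form of the response (lower and upper bounds), the rule that an observation is still used when exactly one bound is missing can be illustrated by classifying each pair of bounds. The following Python sketch uses `None` for a missing value. Its classification follows the usual interpretation of the two-variable LIFEREG response: a missing lower bound means left censored, a missing upper bound means right censored, and equal bounds mean an exact observation. That interpretation is stated here as an assumption, because the response syntax is described elsewhere in the chapter. The function name is hypothetical.

```python
def classify_interval(lower, upper):
    """Classify an interval-form response; None marks a missing value."""
    if lower is None and upper is None:
        return "unused"             # both bounds missing: observation is dropped
    if lower is None:
        return "left censored"      # event occurred at or before `upper`
    if upper is None:
        return "right censored"     # event occurred after `lower`
    if lower == upper:
        return "uncensored"
    if lower < upper:
        return "interval censored"
    raise ValueError("lower bound exceeds upper bound")


for pair in [(None, None), (None, 5.0), (3.0, None), (4.0, 4.0), (2.0, 6.0)]:
    print(pair, classify_interval(*pair))
```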
Model Specification
Main effects as well as interaction terms are allowed in the model specification, similar to the GLM
procedure. For numeric variables, a main effect is a linear term equal to the value of the variable
unless the variable appears in the CLASS statement. For variables listed in the CLASS statement,
PROC LIFEREG creates indicator variables (variables taking the values zero or one) for every level
of the variable except the last level. If there is no intercept term, the first CLASS variable has
indicator variables created for all levels including the last level. The levels are ordered according
to the ORDER= option. Estimates of a main effect depend upon other effects in the model and,
therefore, are adjusted for the presence of other effects in the model.
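The indicator coding described above can be sketched in Python as follows. The level ordering is assumed to be already determined (for example, by ORDER=), and the function name and levels are hypothetical.

```python
def class_indicators(levels, value, drop_last=True):
    """Indicator (0/1) columns for one CLASS variable.

    levels: ordered list of the variable's levels.
    drop_last: True when the model has an intercept (or for CLASS variables after
    the first), so the last level is the reference with all-zero indicators.
    """
    used = levels[:-1] if drop_last else levels
    return [1 if value == level else 0 for level in used]


levels = ["high", "low", "medium"]                          # hypothetical ORDER= sequence
print(class_indicators(levels, "low"))                      # [0, 1]
print(class_indicators(levels, "medium"))                   # [0, 0]
print(class_indicators(levels, "medium", drop_last=False))  # [0, 0, 1]
```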
Computational Method
By default, the LIFEREG procedure computes initial values for the parameters by using ordinary
least squares (OLS) and ignoring censoring. This might not be the best set of starting values for
a given set of data. For example, if there are extreme values in your data, the OLS fit might be
excessively influenced by the extreme observations, causing an overflow or convergence problems.
See Example 48.3 for one way to deal with convergence problems.
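The default starting values can be sketched as an ordinary least squares line fit to the log responses with censoring ignored. This Python illustration handles a single covariate only. It does not show how PROC LIFEREG initializes the scale parameter, which the text does not describe. The function name and data are hypothetical.

```python
import math


def ols_start(times, x):
    """Intercept and slope from OLS of log(time) on one covariate, ignoring censoring."""
    y = [math.log(t) for t in times]
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Censored and uncensored times are treated alike here, so an extreme
# observation can pull the starting values far from the maximum likelihood estimates.
print(ols_start([10.0, 25.0, 60.0, 400.0], [1.0, 2.0, 3.0, 4.0]))
```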
You can specify the INITIAL= option in the MODEL statement to override these starting values.
You can also specify the INTERCEPT=, SCALE=, and SHAPE1= options to set initial values of the
intercept, scale, and shape parameters. For models with multilevel interaction effects, it can be
difficult to use the INITIAL= option to provide starting values for all parameters. In this case, you
can use the INEST= data set. See the section "INEST= Data Set" on page 3048 for details. The
INEST= data set overrides all previous specifications for starting values of parameters.
The rank of the design matrix X is estimated before the model is fit. Columns of X that are judged
linearly dependent on other columns have the corresponding parameters set to zero. The test for
linear dependence is controlled by the SINGULAR= option in the MODEL statement. Variables
are included in the model in the order in which they are listed in the MODEL statement, with the
continuous variables included in the model before any classification variables.
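The rank check can be pictured as Gaussian elimination on the crossproducts matrix X'X. Columns are taken in model order, and any column whose pivot falls below SINGULAR= times its original diagonal value is dropped, following the rough description given for the SINGULAR= option. The following Python sketch illustrates this under that assumption. PROC LIFEREG's actual sweep implementation may differ in detail, and the function names are hypothetical.

```python
def crossproducts(columns):
    """X'X from a list of columns (each a list of numbers)."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]


def dependent_columns(xtx, singular=1e-12):
    """Indices of columns judged linearly dependent on earlier columns."""
    n = len(xtx)
    a = [list(row) for row in xtx]
    dependent = []
    for j in range(n):
        pivot = a[j][j]
        # Column j is judged dependent when its pivot is not at least
        # SINGULAR times its original diagonal value.
        if pivot <= singular * xtx[j][j]:
            dependent.append(j)  # the corresponding parameter would be set to zero
            continue
        for i in range(j + 1, n):
            factor = a[i][j] / pivot
            for k in range(j, n):
                a[i][k] -= factor * a[j][k]
    return dependent


# Hypothetical design: intercept, x1, and x2 = 2 * x1 (exactly collinear).
cols = [[1, 1, 1, 1], [1, 2, 3, 4], [2, 4, 6, 8]]
print(dependent_columns(crossproducts(cols)))  # [2]
```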
The log-likelihood function is maximized by means of a ridge-stabilized Newton-Raphson algo-
rithm. The maximized value of the log likelihood can take positive or negative values, depending
on the specified model and the values of the maximum likelihood estimates of the model parameters.
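The following Python sketch shows one common form of ridge stabilization for a single parameter. When a full Newton step fails to increase the log likelihood, a ridge constant is added to the negative second derivative, which shortens the step, and the step is retried. The ridging schedule (start at 1e-4, multiply by 10) and the stopping rule are assumptions for illustration; the text does not describe the exact scheme PROC LIFEREG uses. The example maximizes the hypothetical log likelihood θs - n·exp(θ), whose maximum is at log(s/n).

```python
import math


def _safe_eval(f, x):
    """Evaluate f(x), treating overflow or domain errors as minus infinity."""
    try:
        return f(x)
    except (OverflowError, ValueError):
        return -math.inf


def ridge_newton(loglik, grad, hess, x0, max_iter=50, tol=1e-10):
    """Maximize a one-parameter log likelihood by Newton-Raphson with ridging."""
    x = x0
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        current = loglik(x)
        ridge = 0.0
        while True:
            curvature = max(-h, 0.0) + ridge  # negative second derivative plus ridge
            if curvature > 0.0:
                step = g / curvature
                if _safe_eval(loglik, x + step) >= current:
                    break                     # accept: the log likelihood did not decrease
            ridge = 1e-4 if ridge == 0.0 else ridge * 10.0
            if ridge > 1e12:
                return x                      # no improving step found; stop here
        x += step
        if abs(step) < tol:
            break
    return x


s, n = 10.0, 4.0
theta_hat = ridge_newton(lambda t: t * s - n * math.exp(t),
                         lambda t: s - n * math.exp(t),
                         lambda t: -n * math.exp(t),
                         x0=0.0)
print(theta_hat, math.log(s / n))
```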
If convergence of the maximum likelihood estimates is attained, a Type III chi-square test statistic
is computed for each effect, testing whether there is any contribution from any of the levels of the
effect. This statistic is computed as a quadratic form in the appropriate parameter estimates by using
the corresponding submatrix of the asymptotic covariance matrix estimate. See Chapter 39, "The
GLM Procedure," and Chapter 15, "The Four Types of Estimable Functions," for more information
about Type III estimable functions.
The asymptotic covariance matrix is computed as the inverse of the observed information matrix.
Note that if the NOINT option is specified and CLASS variables are used, the first CLASS variable
contains a contribution from an intercept term. The results are displayed in an ODS table named
Type3Analysis.
Chi-square tests for individual parameters are Wald tests based on the observed information matrix
and the parameter estimates. If an effect has a single degree of freedom in the parameter estimates
table, the chi-square test for this parameter is equivalent to the Type III test for this effect.
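Both the Type III statistics and the per-parameter chi-square statistics are Wald quadratic forms b'V^{-1}b. Here b holds the estimates for an effect and V is the corresponding block of the estimated covariance matrix. The following Python sketch computes such a statistic with a small linear solver and attaches a chi-square p-value. It is illustrative only. It assumes the estimates and covariance block have already been extracted, so the Type III hypothesis matrix for models with interactions is not constructed. The p-value helper covers only 1 and 2 degrees of freedom, for which closed forms exist. The names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def wald_chi_square(b, cov):
    """Wald quadratic form b' V^{-1} b for the estimates b of one effect."""
    v_inv_b = solve(cov, b)
    return sum(bi * wi for bi, wi in zip(b, v_inv_b))


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("this sketch handles df = 1 or 2 only")


# Hypothetical two-parameter effect (for example, a three-level CLASS variable).
b = [0.5, -0.3]
cov = [[0.04, 0.01], [0.01, 0.09]]
stat = wald_chi_square(b, cov)
print(stat, chi2_sf(stat, df=len(b)))
```

With a single parameter, the quadratic form reduces to (estimate / standard error) squared, which is why the per-parameter test and the Type III test coincide for one-degree-of-freedom effects.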
Before SAS 8.2, a multiple-degree-of-freedom statistic was computed for each effect to test for
contribution from any level of the effect. In general, the Type III test statistic in a main-effect-only
model (no interaction terms) will be equal to the previously computed effect statistic, unless there
are collinearities among the effects. If there are collinearities, the Type III statistic will adjust for
them, and the value of the Type III statistic and the number of degrees of freedom might not be
equal to those of the previous effect statistic.
Suppose there are n observations from the model $y = X\beta + \sigma\epsilon$ (or $y = X\beta + O + \sigma\epsilon$ if there
is an offset variable), where X is an $n \times k$ matrix of covariate values (including the intercept), y
is a vector of responses, O is a vector of offset variable values, and $\epsilon$ is a vector of errors with
survival function S, cumulative distribution function F, and probability density function f. That
is, $S(t) = \Pr(\epsilon_i > t)$, $F(t) = \Pr(\epsilon_i \le t)$, and $f(t) = dF(t)/dt$, where $\epsilon_i$ is a component of the
error vector. Then, if all the responses are observed, the log likelihood, L, can be written as
$$ L = \sum \log\left(\frac{f(u_i)}{\sigma}\right) $$
where $u_i = \frac{1}{\sigma}(y_i - x_i'\beta)$.
If some of the responses are left, right, or interval censored, the log likelihood can be written as
$$ L = \sum \log\left(\frac{f(u_i)}{\sigma}\right) + \sum \log\big(S(u_i)\big) + \sum \log\big(F(u_i)\big) + \sum \log\big(F(u_i) - F(v_i)\big) $$
with the first sum over uncensored observations, the second sum over right-censored observations,
the third sum over left-censored observations, the last sum over interval-censored observations, and
$$ v_i = \frac{1}{\sigma}(z_i - x_i'\beta) $$
where $z_i$ is the lower end of a censoring interval.
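The four kinds of contributions in the censored log likelihood can be sketched in Python for the extreme-value case (the default Weibull model on the log scale). In that case f(u) = exp(u - exp(u)), S(u) = exp(-exp(u)), and F(u) = 1 - S(u). The record layout used here, a kind label, a single covariate, the log response y (the upper end for intervals), and the lower end z, is an assumption for illustration. The offset term is omitted.

```python
import math


def ev_pdf(u):
    return math.exp(u - math.exp(u))


def ev_sf(u):
    return math.exp(-math.exp(u))


def ev_cdf(u):
    return -math.expm1(-math.exp(u))  # 1 - S(u), accurate in the left tail


def censored_loglik(records, beta0, beta1, sigma):
    """Log likelihood for records (kind, x, y, z); z is used only for intervals."""
    total = 0.0
    for kind, x, y, z in records:
        xb = beta0 + beta1 * x
        u = (y - xb) / sigma
        if kind == "exact":
            total += math.log(ev_pdf(u) / sigma)   # density term, divided by sigma
        elif kind == "right":
            total += math.log(ev_sf(u))
        elif kind == "left":
            total += math.log(ev_cdf(u))
        elif kind == "interval":
            v = (z - xb) / sigma
            total += math.log(ev_cdf(u) - ev_cdf(v))
        else:
            raise ValueError("unknown censoring kind: " + kind)
    return total


data = [("exact", 1.0, 1.2, None), ("right", 0.0, 2.0, None),
        ("left", 1.0, 0.3, None), ("interval", 0.0, 1.5, 0.8)]  # hypothetical
print(censored_loglik(data, 1.0, 0.2, 0.7))
```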
If the response is specified in the binomial format, events/trials, then the log-likelihood function is
$$ L = \sum \Big[ r_i \log(P_i) + (n_i - r_i) \log(1 - P_i) \Big] $$
where $r_i$ is the number of events and $n_i$ is the number of trials for the ith observation. In this
case, $P_i = 1 - F(-x_i'\beta)$. For the symmetric distributions, logistic and normal, this is the same
as $F(x_i'\beta)$. Additional information about censored and limited dependent variable models can be
found in Kalbfleisch and Prentice (1980) and Maddala (1983).
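For the binomial (events/trials) response, the following Python sketch evaluates the log likelihood above with the logistic distribution, F(w) = 1/(1 + exp(-w)). With this symmetric choice, P_i = 1 - F(-x_i'β) equals F(x_i'β). The single-covariate linear predictor and the function names are assumptions for illustration.

```python
import math


def logistic_cdf(w):
    return 1.0 / (1.0 + math.exp(-w))


def binomial_loglik(rows, beta0, beta1, cdf=logistic_cdf):
    """rows: (x, r, n) with r events out of n trials."""
    total = 0.0
    for x, r, n in rows:
        p = 1.0 - cdf(-(beta0 + beta1 * x))  # P_i = 1 - F(-x_i'beta)
        total += r * math.log(p) + (n - r) * math.log(1.0 - p)
    return total


print(binomial_loglik([(0.0, 3, 10), (1.0, 6, 10)], -0.8, 1.2))  # hypothetical data
```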
The estimated covariance matrix of the parameter estimates is computed as the negative inverse
of I, which is the information matrix of second derivatives of L with respect to the parameters
evaluated at the final parameter estimates. If I is not positive definite, a positive-definite submatrix
of I is inverted, and the remaining rows and columns of the inverse are set to zero. If some of
the parameters, such as the scale and intercept, are restricted, the corresponding elements of the
estimated covariance matrix are set to zero. The standard error estimates for the parameter estimates
are taken as the square roots of the corresponding diagonal elements.
For restrictions placed on the intercept, scale, and shape parameters, one-degree-of-freedom Lagrange multiplier test statistics are computed. These statistics are computed as
$$\chi^2 = \frac{g^2}{V}$$
where $g$ is the derivative of the log likelihood with respect to the restricted parameter at the restricted maximum and
$$V = I_{11} - I_{12} I_{22}^{-1} I_{21}$$
where the 1 subscripts refer to the restricted parameter and the 2 subscripts refer to the unrestricted parameters. The information matrix is evaluated at the restricted maximum. These statistics are asymptotically distributed as chi-squares with one degree of freedom under the null hypothesis that the restrictions are valid, provided that some regularity conditions are satisfied. Refer to Rao (1973, p. 418) for a more complete discussion. It is possible for these statistics to be missing if the observed information matrix is not positive definite. Higher-degree-of-freedom tests for multiple restrictions are not currently computed.
When the scale parameter is restricted to one, a Lagrange multiplier test statistic is computed to test this constraint. Notice that this test statistic is comparable to the Wald test statistic for testing that the scale is one. The Wald statistic is the squared difference between the scale estimate and one, divided by the square of its estimated standard error: $(\hat{\sigma} - 1)^2 / \mathrm{SE}_{\hat{\sigma}}^2$.
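The two one-degree-of-freedom statistics can be sketched for the simplest case: one restricted and one unrestricted parameter, so every block of the partitioned information matrix is a scalar. The sketch assumes the information matrix is taken as the negative of the Hessian, so that $V$ is positive. The function names are hypothetical. The p-value uses the identity $P(\chi^2_1 > s) = 2(1 - \Phi(\sqrt{s}))$.

```python
from statistics import NormalDist

def chi2_1df_pvalue(stat):
    # For one degree of freedom, P(chi2 > s) = 2 * (1 - Phi(sqrt(s))).
    return 2.0 * (1.0 - NormalDist().cdf(stat ** 0.5))

def lm_statistic(g, i11, i12, i22):
    """One-df Lagrange multiplier statistic g^2 / V with scalar blocks.

    Assumes the information is the negative Hessian (positive definite)
    and that I21 = I12 because the matrix is symmetric.
    """
    v = i11 - i12 * (1.0 / i22) * i12
    return g * g / v

def wald_scale_statistic(scale_hat, se_scale):
    """Wald statistic for H0: scale = 1."""
    return ((scale_hat - 1.0) / se_scale) ** 2
```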
Supported Distributions
For most distributions, the baseline survival function ($S$) and the probability density function ($f$) are listed for the additive random disturbance ($y_0$ or $\log(T_0)$) with location parameter $\mu$ and scale parameter $\sigma$. See the section "Overview: LIFEREG Procedure" on page 2990 for more information. These distributions apply when the log of the response is modeled (this is the default analysis). The corresponding survival function ($G$) and its density function ($g$) are given for the untransformed baseline distribution ($T_0$).
For the normal and logistic distributions, the response is not log transformed by PROC LIFEREG, and the survival functions and probability density functions listed apply to the untransformed response.
For example, for the WEIBULL distribution, $S(w)$ and $f(w)$ are the survival function and the probability density function for the extreme-value distribution (distribution of the log of the response), while $G(t)$ and $g(t)$ are the survival function and the probability density function of a Weibull distribution (using the untransformed response).
The chosen baseline functions define the meaning of the intercept, scale, and shape parameters. Only the gamma distribution has a free shape parameter in the following parameterizations. Notice that some of the distributions do not have mean zero and that $\sigma$ is not, in general, the standard deviation of the baseline distribution.
For the Weibull distribution, the accelerated failure time model is also a proportional-hazards model.
However, the parameterization for the covariates differs by a multiple of the scale parameter from
the parameterization commonly used for the proportional hazards model.
The distributions supported in the LIFEREG procedure follow. If there are no covariates in the model, $\mu$ = Intercept in the output; otherwise, $\mu = x'\beta$. $\sigma$ = Scale in the output.
Exponential
$$S(w) = \exp(-\exp(w - \mu))$$
$$f(w) = \exp(w - \mu)\exp(-\exp(w - \mu))$$
$$G(t) = \exp(-\alpha t)$$
$$g(t) = \alpha\exp(-\alpha t)$$
where $\exp(-\mu) = \alpha$.
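Generalized Gamma
$$S(w) = S_0(u), \quad f(w) = \sigma^{-1} f_0(u), \quad G(t) = G_0(v), \quad g(t) = \frac{v}{t\sigma}\, g_0(v), \quad u = \frac{w - \mu}{\sigma}, \quad v = \exp\left(\frac{\log(t) - \mu}{\sigma}\right)$$
and
$$S_0(u) = \begin{cases} 1 - \dfrac{\Gamma\left(\delta^{-2}, \delta^{-2}\exp(\delta u)\right)}{\Gamma(\delta^{-2})} & \text{if } \delta > 0 \\ \dfrac{\Gamma\left(\delta^{-2}, \delta^{-2}\exp(\delta u)\right)}{\Gamma(\delta^{-2})} & \text{if } \delta < 0 \end{cases}$$
$$f_0(u) = \frac{|\delta|}{\Gamma(\delta^{-2})}\left(\delta^{-2}\exp(\delta u)\right)^{\delta^{-2}}\exp\left(-\exp(\delta u)\,\delta^{-2}\right)$$
$$G_0(v) = \begin{cases} 1 - \dfrac{\Gamma\left(\delta^{-2}, \delta^{-2}v^{\delta}\right)}{\Gamma(\delta^{-2})} & \text{if } \delta > 0 \\ \dfrac{\Gamma\left(\delta^{-2}, \delta^{-2}v^{\delta}\right)}{\Gamma(\delta^{-2})} & \text{if } \delta < 0 \end{cases}$$
$$g_0(v) = \frac{|\delta|}{v\,\Gamma(\delta^{-2})}\left(\delta^{-2}v^{\delta}\right)^{\delta^{-2}}\exp\left(-v^{\delta}\,\delta^{-2}\right)$$
where $\Gamma(a)$ denotes the complete gamma function, $\Gamma(a, z)$ denotes the (lower) incomplete gamma function $\int_0^z s^{a-1}e^{-s}\,ds$, and $\delta$ is a free shape parameter. The $\delta$ parameter is called Shape by PROC LIFEREG. See Lawless (2003, p. 240), and Klein and Moeschberger (1997, p. 386) for a description of the generalized gamma distribution.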
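Logistic
$$S(w) = \left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{-1}$$
$$f(w) = \frac{\exp\left(\frac{w - \mu}{\sigma}\right)}{\sigma\left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{2}}$$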
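Loglogistic
$$S(w) = \left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{-1}$$
$$f(w) = \frac{\exp\left(\frac{w - \mu}{\sigma}\right)}{\sigma\left(1 + \exp\left(\frac{w - \mu}{\sigma}\right)\right)^{2}}$$
$$G(t) = \frac{1}{1 + \alpha t^{\gamma}}$$
$$g(t) = \frac{\alpha\gamma t^{\gamma - 1}}{\left(1 + \alpha t^{\gamma}\right)^{2}}$$
where $\gamma = 1/\sigma$ and $\alpha = \exp(-\mu/\sigma)$.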
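Lognormal
$$S(w) = 1 - \Phi\left(\frac{w - \mu}{\sigma}\right)$$
$$f(w) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{w - \mu}{\sigma}\right)^{2}\right)$$
$$G(t) = 1 - \Phi\left(\frac{\log(t) - \mu}{\sigma}\right)$$
$$g(t) = \frac{1}{\sqrt{2\pi}\,\sigma t}\exp\left(-\frac{1}{2}\left(\frac{\log(t) - \mu}{\sigma}\right)^{2}\right)$$
where $\Phi$ is the cumulative distribution function for the normal distribution.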
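Normal
$$S(w) = 1 - \Phi\left(\frac{w - \mu}{\sigma}\right)$$
$$f(w) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{1}{2}\left(\frac{w - \mu}{\sigma}\right)^{2}\right)$$
where $\Phi$ is the cumulative distribution function for the normal distribution.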
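Weibull
$$S(w) = \exp\left(-\exp\left(\frac{w - \mu}{\sigma}\right)\right)$$
$$f(w) = \frac{1}{\sigma}\exp\left(\frac{w - \mu}{\sigma}\right)\exp\left(-\exp\left(\frac{w - \mu}{\sigma}\right)\right)$$
$$G(t) = \exp\left(-\alpha t^{\gamma}\right)$$
$$g(t) = \gamma\alpha t^{\gamma - 1}\exp\left(-\alpha t^{\gamma}\right)$$
where $\sigma = 1/\gamma$ and $\alpha = \exp(-\mu/\sigma)$.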
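If your parameterization is different from the ones shown here, you can still use the procedure to fit your model. For example, a common parameterization for the Weibull distribution is
$$g(t; \lambda, \gamma) = \frac{\gamma}{\lambda}\left(\frac{t}{\lambda}\right)^{\gamma - 1}\exp\left(-\left(\frac{t}{\lambda}\right)^{\gamma}\right)$$
$$G(t; \lambda, \gamma) = \exp\left(-\left(\frac{t}{\lambda}\right)^{\gamma}\right)$$
so that $\lambda = \exp(\mu)$ and $\gamma = 1/\sigma$.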
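As a quick numerical check that the three Weibull forms above describe the same survival function, the following sketch evaluates $S(\log t)$, $\exp(-\alpha t^{\gamma})$, and $\exp(-(t/\lambda)^{\gamma})$ for the same $\mu$ and $\sigma$. The function names are illustrative only.

```python
import math

def weibull_sf_extreme_value(t, mu, sigma):
    """G(t) computed as S(log t) with the extreme-value S(w)."""
    return math.exp(-math.exp((math.log(t) - mu) / sigma))

def weibull_sf_alpha_gamma(t, mu, sigma):
    """G(t) = exp(-alpha t^gamma), gamma = 1/sigma, alpha = exp(-mu/sigma)."""
    gamma = 1.0 / sigma
    alpha = math.exp(-mu / sigma)
    return math.exp(-alpha * t ** gamma)

def weibull_sf_lambda_gamma(t, mu, sigma):
    """G(t; lambda, gamma) = exp(-(t/lambda)^gamma), lambda = exp(mu)."""
    lam = math.exp(mu)
    gamma = 1.0 / sigma
    return math.exp(-(t / lam) ** gamma)
```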
Again note that the expected value of the baseline log response is, in general, not zero and that the distributions are not symmetric in all cases. Thus, for a given set of covariates, $x$, the expected value of the log response is not always $x'\beta$.
Some relations among the distributions are as follows:
• The gamma with Shape=1 is a Weibull distribution.
• The gamma with Shape=0 is a lognormal distribution.
• The Weibull with Scale=1 is an exponential distribution.
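The first relation can be checked numerically. The sketch below evaluates the generalized gamma baseline survival function $S_0(u)$. It assumes that $\Gamma(a, z)$ in the formulas above is the lower incomplete gamma function, so that $\Gamma(a, z)/\Gamma(a)$ is the regularized lower incomplete gamma, computed here with its power series. With shape $\delta = 1$, $S_0(u)$ reduces to the extreme-value survival function $\exp(-\exp(u))$ used by the Weibull model. The names are hypothetical, and no claim is made about how PROC LIFEREG evaluates this function.

```python
import math

def reg_lower_gamma(a, x, tol=1e-15, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x) via its power series:
    P(a, x) = x^a e^-x / Gamma(a) * sum_k x^k / (a (a+1) ... (a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gengamma_sf0(u, delta):
    """Baseline S_0(u) of the generalized gamma for a nonzero shape delta."""
    a = delta ** -2
    p = reg_lower_gamma(a, a * math.exp(delta * u))
    return 1.0 - p if delta > 0 else p
```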
Predicted Values
For a given set of covariates, $x$ (including the intercept term), the $p$th quantile of the log response, $y_p$, is given by
$$y_p = x'\beta + \sigma u_p$$
if no offset variable has been specified, or
$$y_p = x'\beta + o + \sigma u_p$$
for a given value $o$ of an offset variable, where $u_p$ is the $p$th quantile of the baseline distribution.
The estimated quantile is computed by replacing the unknown parameters with their estimates, including any shape parameters on which the baseline distribution might depend. The estimated quantile of the original response is obtained by taking the exponential of the estimated log quantile unless the NOLOG option is specified in the preceding MODEL statement.
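The following table shows how $u_p$ is computed from the baseline distribution $F(u)$:
Table 48.3 Baseline Probability Functions and $u_p$

| Distribution | $F(u)$ | $u_p$ |
|---|---|---|
| Exponential | $1 - \exp(-\exp(u))$ | $\log(-\log(1 - p))$ |
| Generalized Gamma | $\dfrac{\Gamma(\delta^{-2}, \delta^{-2}\exp(\delta u))}{\Gamma(\delta^{-2})}$ if $\delta > 0$; $1 - \dfrac{\Gamma(\delta^{-2}, \delta^{-2}\exp(\delta u))}{\Gamma(\delta^{-2})}$ if $\delta < 0$ | $F^{-1}(p)$ |
| Logistic | $1 - (1 + \exp(u))^{-1}$ | $\log(p/(1 - p))$ |
| Loglogistic | $1 - (1 + \exp(u))^{-1}$ | $\log(p/(1 - p))$ |
| Lognormal | $\Phi(u)$ | $\Phi^{-1}(p)$ |
| Normal | $\Phi(u)$ | $\Phi^{-1}(p)$ |
| Weibull | $1 - \exp(-\exp(u))$ | $\log(-\log(1 - p))$ |

For the generalized gamma distribution, $u_p$ is computed numerically.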
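The sketch below implements Table 48.3 for a few baselines. It also includes a generic bisection inverse, the kind of numerical inversion that could serve when no closed form exists, as for the generalized gamma. The bisection bracket of $[-50, 50]$ and all names are assumptions of this sketch; PROC LIFEREG's actual numerical method is not described here. The exponential and loglogistic baselines share the Weibull and logistic entries, and the lognormal shares the normal entry.

```python
import math
from statistics import NormalDist

# Baseline CDFs F(u) and closed-form quantiles u_p from Table 48.3.
BASELINES = {
    "weibull": (lambda u: 1.0 - math.exp(-math.exp(u)),
                lambda p: math.log(-math.log(1.0 - p))),
    "logistic": (lambda u: 1.0 - 1.0 / (1.0 + math.exp(u)),
                 lambda p: math.log(p / (1.0 - p))),
    "normal": (NormalDist().cdf, NormalDist().inv_cdf),
}

def numeric_quantile(cdf, p, lo=-50.0, hi=50.0, tol=1e-12):
    """Invert a continuous increasing CDF on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```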
The standard errors of the quantile estimates are computed using the estimated covariance matrix of the parameter estimates and a Taylor series expansion of the quantile estimate. The standard error is computed as
$$\mathrm{STD} = \sqrt{z'Vz}$$
where $V$ is the estimated covariance matrix of the parameter vector $(\beta', \sigma, \delta')'$, and $z$ is the vector
$$z = \begin{bmatrix} x \\ \hat{u}_p \\ \hat{\sigma}\,\dfrac{\partial u_p}{\partial \delta} \end{bmatrix}$$
where $\delta$ is the vector of the shape parameters. Unless the NOLOG option is specified, this standard error estimate is converted into a standard error estimate for $\exp(y_p)$ as $\exp(\hat{y}_p)\,\mathrm{STD}$. It might be more desirable to compute confidence limits for the log response and convert them back to the original response variable than to use the standard error estimates for $\exp(y_p)$ directly. See Example 48.1 for a 90% confidence interval of the response constructed by exponentiating a confidence interval for the log response.
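The following sketch computes $\hat{y}_p$, its delta-method standard error $\sqrt{z'Vz}$, and the transformed standard error $\exp(\hat{y}_p)\,\mathrm{STD}$. For simplicity it assumes a baseline without shape parameters, so $z = (x', \hat{u}_p)'$ and $V$ covers only $(\beta', \sigma)'$. It also shows the alternative that the text recommends: exponentiating a confidence interval built on the log scale. All names are hypothetical.

```python
import math

def quantile_and_se(x, beta, sigma, u_p, cov, offset=0.0, nolog=False):
    """Estimated p-th quantile and its delta-method standard error.

    cov: covariance matrix of (beta', sigma)' as a list of lists; this
    sketch assumes no shape parameters, so the shape term of z is absent.
    """
    y_p = sum(xi * bi for xi, bi in zip(x, beta)) + offset + sigma * u_p
    z = list(x) + [u_p]                      # gradient of y_p
    var = sum(z[i] * cov[i][j] * z[j]
              for i in range(len(z)) for j in range(len(z)))
    std = math.sqrt(var)
    if nolog:
        return y_p, std
    # Delta method for exp(y_p): SE is approximately exp(y_p) * STD.
    return math.exp(y_p), math.exp(y_p) * std

def log_scale_ci(y_p, std, z):
    """Exponentiate a symmetric interval computed on the log scale."""
    return math.exp(y_p - z * std), math.exp(y_p + z * std)
```

Passing `z = NormalDist().inv_cdf(0.95)` to `log_scale_ci` would give a 90% interval, the level used in Example 48.1.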
The variable CDF is computed as
$$\mathrm{CDF}_i = F(u_i)$$
where the residual is defined by
$$u_i = \frac{y_i - x_i'b}{\hat{\sigma}}$$
and $F$ is the baseline cumulative distribution function.
Confidence Intervals
Confidence intervals are computed for all model parameters and are reported in the "Analysis of Parameter Estimates" table. The confidence coefficient can be specified with the ALPHA= MODEL statement option, resulting in a $(1 - \alpha) \times 100\%$ two-sided confidence coefficient. The default confidence coefficient is 95%, corresponding to $\alpha = 0.05$.
Regression Parameters
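A two-sided $(1 - \alpha) \times 100\%$ confidence interval $[\beta_{iL}, \beta_{iU}]$ for the regression parameter $\beta_i$ is based on the asymptotic normality of the maximum likelihood estimator $\hat{\beta}_i$ and is computed by
$$\beta_{iL} = \hat{\beta}_i - z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\beta}_i})$$
$$\beta_{iU} = \hat{\beta}_i + z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\beta}_i})$$
where $\mathrm{SE}_{\hat{\beta}_i}$ is the estimated standard error of $\hat{\beta}_i$, and $z_p$ is the $p \times 100\%$ percentile of the standard normal distribution.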
Scale Parameter
A two-sided $(1 - \alpha) \times 100\%$ confidence interval $[\sigma_L, \sigma_U]$ for the scale parameter $\sigma$ in the location-scale model is based on the asymptotic normality of the logarithm of the maximum likelihood estimator $\log(\hat{\sigma})$, and is computed by
$$\sigma_L = \hat{\sigma}\exp\left[-z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\sigma}})/\hat{\sigma}\right]$$
$$\sigma_U = \hat{\sigma}\exp\left[z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\sigma}})/\hat{\sigma}\right]$$
Refer to Meeker and Escobar (1998) for more information.
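Here is a minimal sketch of both intervals, using the standard normal quantile from Python's `statistics.NormalDist`. The function names are hypothetical. Because the scale interval is symmetric on the log scale, $\sigma_L \sigma_U = \hat{\sigma}^2$.

```python
import math
from statistics import NormalDist

def wald_ci(estimate, se, alpha=0.05):
    """Symmetric interval for a regression (or gamma shape) parameter."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return estimate - z * se, estimate + z * se

def scale_ci(sigma_hat, se_sigma, alpha=0.05):
    """Interval for sigma from asymptotic normality of log(sigma_hat)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    factor = math.exp(z * se_sigma / sigma_hat)
    return sigma_hat / factor, sigma_hat * factor
```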
Weibull Scale and Shape Parameters
The Weibull distribution scale parameter $\eta$ and shape parameter $\beta$ are obtained by transforming the extreme-value location parameter $\mu$ and scale parameter $\sigma$:
$$\eta = \exp(\mu)$$
$$\beta = 1/\sigma$$
Consequently, two-sided $(1 - \alpha) \times 100\%$ confidence intervals for the Weibull scale and shape parameters are computed as
$$[\eta_L, \eta_U] = [\exp(\mu_L), \exp(\mu_U)]$$
$$[\beta_L, \beta_U] = [1/\sigma_U, 1/\sigma_L]$$
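Because $\exp$ is increasing and $1/\sigma$ is decreasing, the endpoints of the shape interval swap places. A small sketch, with a hypothetical function name:

```python
import math

def weibull_ci_from_ev(mu_ci, sigma_ci):
    """Map extreme-value intervals for (mu, sigma) to Weibull (eta, beta).

    eta = exp(mu) is increasing in mu, so its endpoints keep their order;
    beta = 1/sigma is decreasing in sigma, so the endpoints swap.
    """
    mu_lo, mu_hi = mu_ci
    sigma_lo, sigma_hi = sigma_ci
    eta_ci = (math.exp(mu_lo), math.exp(mu_hi))
    beta_ci = (1.0 / sigma_hi, 1.0 / sigma_lo)
    return eta_ci, beta_ci
```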
Gamma Shape Parameter
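A two-sided $(1 - \alpha) \times 100\%$ confidence interval for the three-parameter gamma shape parameter $\delta$ is computed by
$$[\delta_L, \delta_U] = \left[\hat{\delta} - z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\delta}}),\ \hat{\delta} + z_{1-\alpha/2}\,(\mathrm{SE}_{\hat{\delta}})\right]$$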
Fit Statistics
Suppose that the model contains $p$ parameters and that $n$ observations are used in model fitting. The fit criteria displayed by the LIFEREG procedure are calculated as follows:
• −2 log likelihood:
$$-2\log(L)$$
where $L$ is the maximized likelihood for the model.
• Akaike information criterion:
$$\mathrm{AIC} = -2\log(L) + 2p$$
• corrected Akaike information criterion:
$$\mathrm{AICC} = \mathrm{AIC} + \frac{2p(p + 1)}{n - p - 1}$$
• Bayesian information criterion:
$$\mathrm{BIC} = -2\log(L) + p\log(n)$$
Refer to Akaike (1981, 1979) for details of AIC and BIC. Refer to Simonoff (2003) for a discussion
of using AIC, AICC, and BIC in statistical modeling.
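The four criteria are simple functions of the maximized log likelihood. In the sketch below, which uses a hypothetical function name, `loglik` is $\log(L)$.

```python
import math

def fit_statistics(loglik, p, n):
    """-2 log L, AIC, AICC and BIC for p parameters and n observations."""
    m2ll = -2.0 * loglik
    aic = m2ll + 2 * p
    aicc = aic + 2 * p * (p + 1) / (n - p - 1)
    bic = m2ll + p * math.log(n)
    return {"-2logL": m2ll, "AIC": aic, "AICC": aicc, "BIC": bic}
```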
Probability Plotting
Probability plots are useful tools for the display and analysis of lifetime data. Probability plots use an inverse distribution scale so that a cumulative distribution function (CDF) plots as a straight line. A nonparametric estimate of the CDF of the lifetime data will plot approximately as a straight line, thus providing a visual assessment of goodness of fit.
You can use the PROBPLOT statement in PROC LIFEREG to create probability plots of data that are complete, right censored, interval censored, or a combination of censoring types (arbitrarily censored). A line representing the maximum likelihood fit from the MODEL statement and pointwise parametric confidence bands for the cumulative probabilities are also included in the plot.
A random variable $Y$ belongs to a location-scale family of distributions if its CDF $F$ is of the form
$$\Pr\{Y \le y\} = F(y) = G\left(\frac{y - \mu}{\sigma}\right)$$
where $\mu$ is the location parameter and $\sigma$ is the scale parameter. Here, $G$ is a CDF that cannot depend on any unknown parameters, and $G$ is the CDF of $Y$ if $\mu = 0$ and $\sigma = 1$. For example, if $Y$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$,
$$G(u) = \Phi(u) = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{s^2}{2}\right) ds$$
and
$$F(y) = \Phi\left(\frac{y - \mu}{\sigma}\right)$$
The normal, extreme-value, and logistic distributions are location-scale models. The three-parameter gamma distribution is a location-scale model if the shape parameter is fixed. If $T$ has a lognormal, Weibull, or log-logistic distribution, then $\log(T)$ has a distribution that is a location-scale model. Probability plots are constructed for lognormal, Weibull, and log-logistic distributions by using $\log(T)$ instead of $T$ in the plots.
Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ be ordered observations of a random sample with distribution function $F(y)$. A probability plot is a plot of the points $y_{(i)}$ against $m_i = G^{-1}(a_i)$, where $a_i = \hat{F}(y_{(i)})$ is an estimate of the CDF $F(y_{(i)}) = G\left(\frac{y_{(i)} - \mu}{\sigma}\right)$. The nonparametric CDF estimates $a_i$ are sometimes called plotting positions. The axis on which the points $m_i$ are plotted is usually labeled with a probability scale (the scale of $a_i$).
If $F$ is one of the location-scale distributions, then $y$ is the lifetime; otherwise, the log of the lifetime is used to transform the distribution to a location-scale model.
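If the data actually have the stated distribution, then $\hat{F} \approx F$,
$$m_i = G^{-1}\left(\hat{F}(y_{(i)})\right) \approx G^{-1}\left(G\left(\frac{y_{(i)} - \mu}{\sigma}\right)\right) = \frac{y_{(i)} - \mu}{\sigma}$$
and points $(y_{(i)}, m_i)$ should fall approximately in a straight line.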
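The sketch below computes the plotting coordinates for the normal case; the same code applies to the lognormal case when it is given log lifetimes. When the data sit exactly at the quantiles of a $N(\mu, \sigma^2)$ distribution, each $m_i$ equals $(y_{(i)} - \mu)/\sigma$, which is the straight line described above. The function name is hypothetical.

```python
from statistics import NormalDist

def probplot_points(sorted_y, plotting_positions, inv_g=NormalDist().inv_cdf):
    """Return (y_(i), m_i) pairs with m_i = G^{-1}(a_i)."""
    return [(y, inv_g(a)) for y, a in zip(sorted_y, plotting_positions)]
```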
There are several ways to compute the nonparametric CDF estimates used in probability plots from
lifetime data. These are discussed in the next two sections.
Complete and Right-Censored Data
The censoring times must be taken into account when you compute plotting positions for right-censored data. The modified Kaplan-Meier method described in the following section is the default method for computing nonparametric CDF estimates for display on probability plots. Refer to Abernethy (1996), Meeker and Escobar (1998), and Nelson (1982) for discussions of the methods described in the following sections.
Expected Ranks, Kaplan-Meier, and Modified Kaplan-Meier Methods
Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ be ordered observations of a random sample including failure times and censor times. Order the data in increasing order. Label all the data with reverse ranks $r_i$, with $r_1 = n, \ldots, r_n = 1$. For the lifetime (not censoring time) corresponding to reverse rank $r_i$, compute the survival function estimate
$$S_i = \frac{r_i}{r_i + 1}\, S_{i-1}$$
with $S_0 = 1$. The expected rank plotting position is computed as $a_i = 1 - S_i$. The option PPOS=EXPRANK specifies the expected rank plotting position.
For the Kaplan-Meier method,
$$S_i = \frac{r_i - 1}{r_i}\, S_{i-1}$$
The Kaplan-Meier plotting position is then computed as $a'_i = 1 - S_i$. The option PPOS=KM specifies the Kaplan-Meier plotting position.
For the modified Kaplan-Meier method, use
$$S'_i = \frac{S_i + S_{i-1}}{2}$$
where $S_i$ is computed from the Kaplan-Meier formula with $S_0 = 1$. The plotting position is then computed as $a''_i = 1 - S'_i$. The option PPOS=MKM specifies the modified Kaplan-Meier plotting position. If the PPOS option is not specified, the modified Kaplan-Meier plotting position is used as the default method.
For complete samples, $a_i = i/(n + 1)$ for the expected rank method, $a'_i = i/n$ for the Kaplan-Meier method, and $a''_i = (i - 0.5)/n$ for the modified Kaplan-Meier method. If the largest observation is a failure for the Kaplan-Meier estimator, then $F_n = 1$ and the point is not plotted.
Median Ranks
Let $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ be ordered observations of a random sample including failure times and censor times. A failure order number $j_i$ is assigned to the $i$th failure: $j_i = j_{i-1} + \Delta$, where $j_0 = 0$. The increment $\Delta$ is initially 1 and is modified when a censoring time is encountered in the ordered sample. The new increment is computed as
$$\Delta = \frac{(n + 1) - \text{previous failure order number}}{1 + \text{number of items beyond previous censored item}}$$
The plotting position is computed for the $i$th failure time as
$$a_i = \frac{j_i - 0.3}{n + 0.4}$$
For complete samples, the failure order number $j_i$ is equal to $i$, the order of the failure in the sample. In this case, the preceding equation for $a_i$ is an approximation of the median plotting position computed as the median of the $i$th-order statistic from the uniform distribution on (0, 1). In the censored case, $j_i$ is not necessarily an integer, but the preceding equation still provides an approximation to the median plotting position. The PPOS=MEDRANK option specifies the median rank plotting position.
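The following sketch computes the failure order numbers and median rank positions. It interprets "number of items beyond previous censored item" as the number of observations that follow the censoring time just encountered in the ordered sample. That interpretation and the function name are assumptions.

```python
def median_rank_positions(times, failed):
    """Failure order numbers j_i and positions (j_i - 0.3) / (n + 0.4)."""
    data = sorted(zip(times, failed), key=lambda pair: pair[0])
    n = len(data)
    j_prev, increment = 0.0, 1.0          # j_0 = 0, initial increment 1
    out = []
    for k, (t, is_failure) in enumerate(data, start=1):
        if is_failure:
            j_prev += increment
            out.append((t, j_prev, (j_prev - 0.3) / (n + 0.4)))
        else:
            items_beyond = n - k          # observations after this censoring time
            increment = ((n + 1) - j_prev) / (1 + items_beyond)
    return out
```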
Arbitrarily Censored Data
The LIFEREG procedure can create probability plots for data that consist of combinations of exact, left-censored, right-censored, and interval-censored lifetimes, that is, arbitrarily censored data. The LIFEREG procedure uses an iterative algorithm developed by Turnbull (1976) to compute a nonparametric maximum likelihood estimate of the cumulative distribution function for the data. Since the technique is maximum likelihood, standard errors of the cumulative probability estimates are computed from the inverse of the associated Fisher information matrix. This algorithm is an example of the expectation-maximization (EM) algorithm. The default initial estimate assigns equal probabilities to each interval. You can specify different initial values with the PROBLIST= option. Convergence is determined if the change in the log likelihood between two successive iterations is less than delta, where the default value of delta is $10^{-8}$. You can specify a different value for delta with the TOLLIKE= option. Iterations are terminated if the algorithm does not converge after a fixed number of iterations. The default maximum number of iterations is 1000. Some data might require more iterations for convergence. You can specify the maximum allowed number of iterations with the MAXITEM= option in the PROBPLOT statement. The iteration history of the log likelihood is displayed if you specify the ITPRINTEM option. The iteration history of the estimated interval probabilities is also displayed if you specify both the ITPRINTEM and PRINTPROBS options.
If an interval probability is smaller than a tolerance ($10^{-6}$ by default) after convergence, the probability is set to zero, the interval probabilities are renormalized so that they add to one, and iterations are restarted. Usually the algorithm converges in just a few more iterations. You can change the default value of the tolerance with the TOLPROB= option. You can specify the NOPOLISH option to avoid setting small probabilities to zero and restarting the algorithm.
If you specify the ITPRINTEM option, a table summarizing the Turnbull estimate of the interval probabilities is displayed. The columns labeled "Reduced Gradient" and "Lagrange Multiplier" are used in checking final convergence of the maximum likelihood estimate. The Lagrange multipliers must all be greater than or equal to zero, or the solution is not maximum likelihood. Refer to Gentleman and Geyer (1994) for more details of the convergence checking. Also refer to Meeker and Escobar (1998, Chapter 3) for more information.
See Example 48.6 for an illustration.
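The core EM (self-consistency) step can be sketched as below. The sketch assumes that the candidate intervals of Turnbull's estimator have already been found and are supplied as a 0/1 matrix. It does not construct those intervals, compute standard errors, or perform the small-probability polishing step described above. It starts from equal probabilities and stops when the log-likelihood change falls below a tolerance (1e-8 in this sketch) or after a maximum number of iterations. All names are hypothetical.

```python
import math

def turnbull_em(alpha, p=None, tol_like=1e-8, max_iter=1000):
    """Self-consistency (EM) iterations for Turnbull's estimator.

    alpha[i][j] is 1 if observation i's censoring interval contains the
    j-th candidate interval, else 0. Returns (probabilities, log likelihood).
    """
    n, m = len(alpha), len(alpha[0])
    p = list(p) if p is not None else [1.0 / m] * m   # equal initial probabilities

    def loglik(probs):
        return sum(math.log(sum(a * q for a, q in zip(row, probs)))
                   for row in alpha)

    ll = loglik(p)
    for _ in range(max_iter):
        new_p = [0.0] * m
        for row in alpha:
            denom = sum(a * q for a, q in zip(row, p))
            for j in range(m):
                # expected share of this observation falling in interval j
                new_p[j] += row[j] * p[j] / denom / n
        p = new_p
        new_ll = loglik(p)
        converged = abs(new_ll - ll) < tol_like
        ll = new_ll
        if converged:
            break
    return p, ll
```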
Nonparametric Confidence Intervals
You can use the PPOUT option in the PROBPLOT statement to create a table containing the nonparametric CDF estimates computed by the selected method, Kaplan-Meier CDF estimates, standard errors of the Kaplan-Meier estimator, and nonparametric confidence limits for the CDF. The confidence limits are either pointwise or simultaneous, depending on the value of the NPINTERVALS= option in the PROBPLOT statement. The method used in the LIFEREG procedure for computation of approximate pointwise and simultaneous confidence intervals for cumulative failure probabilities relies on the Kaplan-Meier estimator of the cumulative distribution function of failure time and approximate standard deviation of the Kaplan-Meier estimator. For the case of arbitrarily censored data, the Turnbull algorithm, discussed previously, provides an extension of the Kaplan-Meier estimator. Both the Kaplan-Meier and the Turnbull estimators provide an estimate of the standard error of the CDF estimator, $\widehat{\mathrm{se}}_{\hat{F}}$, that is used in computing confidence intervals.
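Pointwise Confidence Intervals
Approximate $(1 - \alpha)100\%$ pointwise confidence intervals are computed as in Meeker and Escobar (1998, Section 3.6) as
$$[F_L, F_U] = \left[\frac{\hat{F}}{\hat{F} + (1 - \hat{F})w},\ \frac{\hat{F}}{\hat{F} + (1 - \hat{F})/w}\right]$$
where
$$w = \exp\left[\frac{z_{1-\alpha/2}\,\widehat{\mathrm{se}}_{\hat{F}}}{\hat{F}(1 - \hat{F})}\right]$$
where $z_p$ is the $p$th quantile of the standard normal distribution.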
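Here is a sketch of the logit-based pointwise interval; the helper names are hypothetical. Unlike a plain $\hat{F} \pm z\,\widehat{\mathrm{se}}$ interval, these limits always stay inside (0, 1).

```python
import math
from statistics import NormalDist

def logit_ci(f_hat, se, crit):
    """Interval [F_L, F_U] from the logit transformation with factor w."""
    w = math.exp(crit * se / (f_hat * (1.0 - f_hat)))
    lower = f_hat / (f_hat + (1.0 - f_hat) * w)
    upper = f_hat / (f_hat + (1.0 - f_hat) / w)
    return lower, upper

def pointwise_ci(f_hat, se, alpha=0.05):
    """Pointwise interval using the standard normal quantile z_{1-alpha/2}."""
    return logit_ci(f_hat, se, NormalDist().inv_cdf(1.0 - alpha / 2.0))
```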
Simultaneous Confidence Intervals
Approximate $(1 - \alpha)100\%$ simultaneous confidence bands valid over the lifetime interval $(t_a, t_b)$ are computed as the "Equal Precision" case of Nair (1984) and Meeker and Escobar (1998, Section 3.8) as
$$[F_L, F_U] = \left[\frac{\hat{F}}{\hat{F} + (1 - \hat{F})w},\ \frac{\hat{F}}{\hat{F} + (1 - \hat{F})/w}\right]$$
where
$$w = \exp\left[\frac{e_{a,b,1-\alpha/2}\,\widehat{\mathrm{se}}_{\hat{F}}}{\hat{F}(1 - \hat{F})}\right]$$
where the factor $x = e_{a,b,1-\alpha/2}$ is the solution of
$$\frac{x\exp(-x^2/2)}{\sqrt{8\pi}}\,\log\left[\frac{(1 - a)b}{(1 - b)a}\right] = \frac{\alpha}{2}$$
The time interval $(t_a, t_b)$ over which the bands are valid depends in a complicated way on the constants $a$ and $b$ defined in Nair (1984), $0 < a < b < 1$. The constants $a$ and $b$ are chosen by default so that the confidence bands are valid between the lowest and highest times corresponding to failures in the case of multiply censored data, or between the lowest and highest intervals for which probabilities are computed for arbitrarily censored data. You can optionally specify $a$ and $b$ directly with the NPINTERVALS=SIMULTANEOUS(a, b) option in the PROBPLOT statement.
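The factor $e_{a,b,1-\alpha/2}$ has no closed form. The sketch below solves the defining equation by bisection. It assumes a root exists above $x = 1$, where the left side is decreasing. The function name and bracket are assumptions of this sketch, not a description of PROC LIFEREG's solver.

```python
import math

def nair_factor(a, b, alpha=0.05, lo=1.0, hi=10.0, tol=1e-12):
    """Solve x exp(-x^2/2) log[(1-a)b / ((1-b)a)] / sqrt(8 pi) = alpha/2."""
    c = math.log((1.0 - a) * b / ((1.0 - b) * a)) / math.sqrt(8.0 * math.pi)

    def h(x):
        return x * math.exp(-x * x / 2.0) * c - alpha / 2.0

    if h(lo) < 0:
        raise ValueError("no root above x = 1 for these a, b, alpha")
    while hi - lo > tol:                 # h decreases on [1, inf)
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $a = 0.1$, $b = 0.9$, and $\alpha = 0.05$, the factor is noticeably larger than the pointwise value $z_{0.975} \approx 1.96$, which is why the simultaneous bands are wider.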
INEST= Data Set
If specified, the INEST= data set specifies initial estimates for all the parameters in the model. The INEST= data set must contain the intercept variable (named Intercept) and all independent variables in the MODEL statement.
If BY processing is used, the INEST= data set should also include the BY variables, and there must be at least one observation for each BY group. If there is more than one observation in one BY group, the first observation read is used for that BY group.
If the INEST= data set also contains the _TYPE_ variable, only observations with _TYPE_ value "PARMS" are used as starting values. By combining the INEST= data set with the MAXITER= option in the MODEL statement, you can do partial scoring, such as predicting on a validation data set by using the model built from a training data set.
You can specify starting values for the iterative algorithm in the INEST= data set. This data set overrides the INITIAL= option in the MODEL statement, which is a little difficult to use for models including multilevel interaction effects. The INEST= data set has the same structure as the OUTEST= data set but is not required to have all the variables or observations that appear in the OUTEST= data set. One simple use of the INEST= option is passing the previous OUTEST= data set directly to the next model as an INEST= data set, assuming that the two models have the same parameterization. See Example 48.3 for an illustration.
OUTEST= Data Set
The OUTEST= data set contains parameter estimates and the log likelihood for the model. You can specify a label in the MODEL statement to distinguish between the estimates for different models fit with the LIFEREG procedure. If the COVOUT option is specified, the OUTEST= data set also contains the estimated covariance matrix of the parameter estimates. Note that, if the LIFEREG procedure does not converge, the parameter estimates are set to missing in the OUTEST= data set.
The OUTEST= data set contains all variables specified in the MODEL statement and the BY statement. One observation consists of parameter values for the model with the dependent variable having the value -1. If the COVOUT option is specified, there are additional observations containing the rows of the estimated covariance matrix. For these observations, the dependent variable contains the parameter estimate for the corresponding row variable. The following variables are also added to the data set:
_MODEL_ a character variable containing the label of the MODEL statement, if present. Otherwise, the variable's value is blank.
_NAME_ a character variable containing the name of the dependent variable for the parameter estimates observations or the name of the row for the covariance matrix estimates
_TYPE_ a character variable containing the type of the observation, either PARMS for parameter estimates or COV for covariance estimates
_DIST_ a character variable containing the name of the distribution modeled
_LNLIKE_ a numeric variable containing the last computed value of the log likelihood
INTERCEPT a numeric variable containing the intercept parameter estimates and covariances
_SCALE_ a numeric variable containing the scale parameter estimates and covariances
_SHAPE1_ a numeric variable containing the first shape parameter estimates and covariances if the specified distribution has additional shape parameters
Any BY variables specified are also added to the OUTEST= data set.
XDATA= Data Set
The XDATA= data set is used for plotting the predicted probability when there are covariates specified in a MODEL statement and a probability plot is specified with a PROBPLOT statement. See Example 48.4 for an illustration.
The XDATA= data set is an input SAS data set that contains values for all the independent variables in the MODEL statement and variables in the CLASS statement. The XDATA= data set has the same structure as the DATA= data set but is not required to have all the variables or observations that appear in the DATA= data set.
The XDATA= data set must contain all the independent variables in the MODEL statement and variables in the CLASS statement. Even though variables in the CLASS statement might not be used, valid values are required for these variables in the XDATA= data set. Missing values are not allowed in the XDATA= data set for these CLASS variables or for any of the independent variables. Missing values are allowed for the dependent variables and other variables if they are included in the XDATA= data set.
If BY processing is used, the XDATA= data set should also include the BY variables, and there must
be at least one valid observation for each BY group. If there is more than one valid observation in a
BY group, the last one read is used for that BY group.
If there is no XDATA= data set in the PROC LIFEREG statement, by default, the LIFEREG procedure uses the overall mean for effects containing a continuous variable (or variables) and the highest level of a single classification variable as reference level. The rules are summarized as follows:
• If the effect contains a continuous variable (or variables), the overall mean of this effect (not the variables) is used.
• If the effect is a single classification variable, the highest level of the variable is used.
Computational Resources
Let $p$ be the number of parameters estimated in the model. The minimum working space (in bytes) needed is
$$16p^2 + 100p$$
However, if sufficient space is available, the input data set is also kept in memory; otherwise, the input data set is reread for each evaluation of the likelihood function and its derivatives, with the resulting execution time of the procedure substantially increased.
Let $n$ be the number of observations used in the model estimation. Each evaluation of the likelihood function and its first and second derivatives requires $O(np^2)$ multiplications and additions, $n$ individual function evaluations for the log density or log distribution function, and $n$ evaluations of the first and second derivatives of the function. The calculation of each updating step from the gradient and Hessian requires $O(p^3)$ multiplications and additions. The $O(v)$ notation means that, for large values of the argument, $v$, $O(v)$ is approximately a constant times $v$.
Bayesian Analysis
Gibbs Sampling
This section provides details about Bayesian analysis by Gibbs sampling in the location-scale models for survival data available in PROC LIFEREG. See the section "Gibbs Sampler" on page 154 for a general discussion of Gibbs sampling. PROC LIFEREG fits parametric location-scale survival models. That is, the probability density of the response $Y$ can be expressed in the general form
$$f(y) = \frac{1}{\sigma}\, g\left(\frac{y - \mu}{\sigma}\right)$$
where $Y = \log(T)$ for lifetimes $T$. The function $g$ determines the specific distribution. The location parameter $\mu_i$ is modeled through regression parameters as $\mu_i = x_i'\beta$. The LIFEREG procedure can provide Bayesian estimates of the regression parameters and $\sigma$. The OUTPUT and PROBPLOT statements, if specified, are ignored. The PLOTS=PROBPLOT option in the PROC LIFEREG statement and the CORRB and COVB options in the MODEL statement are also ignored.
For the Weibull distribution, you can specify that Gibbs sampling be performed on the Weibull shape parameter $\gamma = \sigma^{-1}$ instead of the scale parameter $\sigma$ by specifying a prior distribution for the shape parameter with the WEIBULLSHAPEPRIOR= option. In addition, if there are no covariates in the model, you can specify Gibbs sampling on the Weibull scale parameter $\lambda = \exp(\mu)$, where $\mu$ is the intercept term, with the WEIBULLSCALEPRIOR= option.
In the case of the exponential distribution with no covariates, you can specify Gibbs sampling on the exponential scale parameter $\lambda = \exp(\mu)$, where $\mu$ is the intercept term, with the EXPSCALEPRIOR= option.
Let $\theta = (\theta_1, \ldots, \theta_k)'$ be the parameter vector. For location-scale models, the $\theta_i$'s are the regression coefficients $\beta_i$'s and the scale parameter $\sigma$. In the case of the three-parameter gamma distribution, there is an additional gamma shape parameter $\tau$. Let $L(D \mid \theta)$ be the likelihood function, where $D$ is the observed data. Let $p(\theta)$ be the prior distribution. The full conditional distribution of $[\theta_i \mid \theta_j, i \ne j]$ is proportional to the joint distribution; that is,
$$\pi(\theta_i \mid \theta_j, i \ne j, D) \propto L(D \mid \theta)\, p(\theta)$$
For instance, the one-dimensional conditional distribution of $\theta_1$ given $\theta_j = \theta_j^*$, $2 \le j \le k$, is computed as
$$\pi(\theta_1 \mid \theta_j = \theta_j^*, 2 \le j \le k, D) \propto L\left(D \mid \theta = (\theta_1, \theta_2^*, \ldots, \theta_k^*)'\right)\, p\left(\theta = (\theta_1, \theta_2^*, \ldots, \theta_k^*)'\right)$$
Suppose you have a set of arbitrary starting values $\{\theta_1^{(0)}, \ldots, \theta_k^{(0)}\}$. Using the ARMS (adaptive rejection Metropolis sampling) algorithm of Gilks and Wild (1992) and Gilks, Best, and Tan (1995), you can do the following:
draw $\theta_1^{(1)}$ from $[\theta_1 \mid \theta_2^{(0)}, \ldots, \theta_k^{(0)}]$
draw $\theta_2^{(1)}$ from $[\theta_2 \mid \theta_1^{(1)}, \theta_3^{(0)}, \ldots, \theta_k^{(0)}]$
...
draw $\theta_k^{(1)}$ from $[\theta_k \mid \theta_1^{(1)}, \ldots, \theta_{k-1}^{(1)}]$
This completes one iteration of the Gibbs sampler. After one iteration, you have $\{\theta_1^{(1)}, \ldots, \theta_k^{(1)}\}$. After $n$ iterations, you have $\{\theta_1^{(n)}, \ldots, \theta_k^{(n)}\}$. PROC LIFEREG implements the ARMS algorithm based on a program provided by Gilks (2003) to draw a sample from a full conditional distribution. See the section "Assessing Markov Chain Convergence" on page 156 for information about assessing the convergence of the chain of posterior samples.
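The systematic scan above can be illustrated with a toy target whose full conditionals are known exactly: a standard bivariate normal with correlation $\rho$. This is not the ARMS algorithm and not a LIFEREG model. It only shows the update order, in which each draw conditions on the most recent values of the other parameters. The function name is hypothetical.

```python
import random

def gibbs_bivariate_normal(rho, n_iter, start=(0.0, 0.0), seed=1):
    """Systematic-scan Gibbs sampler for a standard bivariate normal.

    Each full conditional is normal: theta1 | theta2 ~ N(rho*theta2, 1-rho^2),
    and symmetrically for theta2, so it can be sampled exactly.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    t1, t2 = start
    draws = []
    for _ in range(n_iter):
        t1 = rng.gauss(rho * t2, sd)   # draw theta1 given the current theta2
        t2 = rng.gauss(rho * t1, sd)   # draw theta2 given the new theta1
        draws.append((t1, t2))
    return draws
```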
You can output these posterior samples into a SAS data set. The following option in the BAYES statement outputs the posterior samples into the SAS data set Post:
OUTPOST=Post
The data set also includes the variable LogPost, representing the log of the posterior likelihood.
Priors for Model Parameters
The model parameters are the regression coefcients and the dispersion parameter (or the precision
or scale), if the model has one. The priors for the dispersion parameter and the priors for the re-
gression coefcients are assumed to be independent, while you can have a joint multivariate normal
prior for the regression coefcients.
Scale and Shape Parameters
Gamma Prior The gamma distribution $G(a, b)$ has a pdf
$$f_{a,b}(u) = \frac{b(bu)^{a-1}e^{-bu}}{\Gamma(a)}, \quad u > 0$$
where $a$ is the shape parameter and $b$ is the inverse-scale parameter. The mean is $a/b$ and the variance is $a/b^2$.
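Software differs in whether the second gamma parameter is a rate or a scale, so it helps to check the parameterization. In the sketch below, the log density follows the formula above with inverse-scale $b$. Python's `random.gammavariate(alpha, beta)` treats `beta` as a scale, so the draws pass `1/b`. The function names are hypothetical.

```python
import math
import random

def gamma_prior_logpdf(u, a, b):
    """log f_{a,b}(u) for the density b (b u)^(a-1) e^(-b u) / Gamma(a)."""
    if u <= 0:
        return float("-inf")
    return math.log(b) + (a - 1.0) * math.log(b * u) - b * u - math.lgamma(a)

def gamma_prior_draws(a, b, n, seed=1):
    # random.gammavariate(alpha, beta) uses beta as a scale, so pass 1/b.
    rng = random.Random(seed)
    return [rng.gammavariate(a, 1.0 / b) for _ in range(n)]
```

With $a = 3$ and $b = 2$, the sample mean of many draws should be close to $a/b = 1.5$ and the sample variance close to $a/b^2 = 0.75$.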
Improper Prior The joint prior density is given by
$$p(u) \propto u^{-1}, \quad u > 0$$
Regression Coefficients
Let $\beta$ be the regression coefficients.
Normal Prior Assume $\beta$ has a multivariate normal prior with mean vector $\beta_0$ and covariance matrix $\Sigma_0$. The joint prior density is given by
$$p(\beta) \propto e^{-\frac{1}{2}(\beta - \beta_0)'\Sigma_0^{-1}(\beta - \beta_0)}$$
Uniform Prior The joint prior density is given by
$$p(\beta) \propto 1$$
Posterior Distribution
Denote the observed data by $D$.
The posterior distribution is
$$\pi(\theta \mid D) \propto L(D \mid \theta)\, p(\theta)$$
where $L(D \mid \theta)$ is the likelihood function with the regression coefficients and any additional parameters, such as scale or shape, as the parameters $\theta$; and $p(\theta)$ is the joint prior distribution of the parameters.
Deviance Information Criterion
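Let $\theta_i$ be the model parameters at iteration $i$ of the Gibbs sampler, let $LL(\theta_i)$ be the corresponding model log likelihood, and let $D(\theta) = -2\,LL(\theta)$ be the deviance. PROC LIFEREG computes the following fit statistics defined by Spiegelhalter et al. (2002):
• effective number of parameters:
$$p_D = \overline{D(\theta)} - D(\bar{\theta})$$
• deviance information criterion (DIC):
$$\mathrm{DIC} = \overline{D(\theta)} + p_D$$
where
$$\overline{D(\theta)} = \frac{1}{n}\sum_{i=1}^{n} D(\theta_i), \qquad \bar{\theta} = \frac{1}{n}\sum_{i=1}^{n} \theta_i$$
and $n$ is the number of Gibbs samples.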
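Here is a sketch of the computation from stored Gibbs output, written with the deviance $D = -2\,LL$ of Spiegelhalter et al. (2002). The function name and inputs are hypothetical.

```python
def dic_statistics(loglik_draws, loglik_at_mean):
    """Effective number of parameters p_D and DIC from Gibbs output.

    loglik_draws:   LL(theta_i) for each posterior draw.
    loglik_at_mean: LL evaluated at the posterior mean of theta.
    """
    d_bar = sum(-2.0 * ll for ll in loglik_draws) / len(loglik_draws)
    d_at_mean = -2.0 * loglik_at_mean
    p_d = d_bar - d_at_mean
    return p_d, d_bar + p_d
```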
Starting Values of the Markov Chains
When the BAYES statement is specified, PROC LIFEREG generates one Markov chain containing the approximate posterior samples of the model parameters. Additional chains are produced when the Gelman-Rubin diagnostics are requested. Starting values (or initial values) can be specified in the INITIAL= data set in the BAYES statement. If the INITIAL= option is not specified, PROC LIFEREG picks its own initial values for the chains.
Denote $\lfloor x \rfloor$ as the integral value of x. Denote $s(\hat X)$ as the estimated standard error of the estimator $\hat X$.
Regression Coefficients and Gamma Shape Parameter
For the first chain that the summary statistics and regression diagnostics are based on, the default initial values are estimates of the mode of the posterior distribution. If the INITIALMLE option is specified, the initial values are the maximum likelihood estimates; that is,
$$\theta_i^{(0)} = \hat\theta_i$$
Initial values for the rth chain ($r \ge 2$) are given by
$$\theta_i^{(0)} = \hat\theta_i \pm \left\lfloor \frac{r}{2} \right\rfloor s(\hat\theta_i)$$
with the plus sign for odd r and minus sign for even r.
Scale, Exponential Scale, Weibull Scale, or Weibull Shape Parameter z
Let z be the parameter sampled.
For the first chain that the summary statistics and diagnostics are based on, the initial values are estimates of the mode of the posterior distribution, or the maximum likelihood estimates if the INITIALMLE option is specified; that is,
$$z^{(0)} = \hat z$$
The initial values of the rth chain ($r \ge 2$) are given by
$$z^{(0)} = \hat z\, \exp\left\{\pm\left(\left\lfloor \frac{r}{2} \right\rfloor + 2\right) s(\hat z)\right\}$$
with the plus sign for odd r and minus sign for even r.
OUTPOST= Output Data Set
The OUTPOST= data set contains the generated posterior samples. There are 2+n variables, where
n is the number of model parameters. The variable Iteration represents the iteration number and the
variable LogPost contains the log posterior likelihood values. The other n variables represent the
draws of the Markov chain for the model parameters.
Displayed Output for Classical Analysis
For each model, PROC LIFEREG displays the following.
Model Information
The Model Information table displays the two-level name of the input data set, the distribution
name, and the name and label of the dependent variable; the name and label of the censor indicator
variable, for right-censored data; if you specify the WEIGHT statement, the name and label of the
weight variable; and the maximum value of the log likelihood.
Number of Observations
The Number of Observations table displays the number of observations read from the input data
set, and the number of observations used in the analysis.
The Class Level Information table displays the levels of classification variables if you specify a
CLASS statement.
Fit Statistics
The Fit Statistics table displays the negative of twice the log likelihood, the Akaike information
criterion (AIC), the corrected Akaike information criterion (AICC), and the Bayesian information
criterion (BIC).
Type III Analysis of Effects
The Type III Analysis of Effects table displays, for each effect in the model, the effect name, the
degrees of freedom associated with the type III contrast for the effect, the chi-square statistic for the
contrast, and the p-value for the statistic.
The Analysis of Maximum Likelihood Parameter Estimates table displays the parameter name, the degrees of freedom for each parameter, the maximum likelihood estimate of each parameter, the estimated standard error of the parameter estimator, confidence limits for each parameter, a chi-square statistic for testing whether the parameter is zero, and the associated p-value for the statistic.
Lagrange Multiplier Statistics
If there are constrained parameters in the model, such as the scale or intercept, then the Lagrange
Multiplier Statistics table displays a Lagrange multiplier test for the constraint.
Displayed Output for Bayesian Analysis
If a Bayesian analysis is requested with a BAYES statement, the displayed output includes the
following.
Model Information
The Model Information table displays the two-level name of the input data set, the number of
burn-in iterations, the number of iterations after the burn-in, the number of thinning iterations, the
distribution name, and the name and label of the dependent variable; the name and label of the
censor indicator variable, for right-censored data; if you specify the WEIGHT statement, the name
and label of the weight variable; and the maximum value of the log likelihood.
The Class Level Information table displays the levels of classification variables if you specify a
CLASS statement.
Maximum Likelihood Estimates
The Analysis of Maximum Likelihood Parameter Estimates table displays the maximum likelihood estimate of each parameter, the estimated standard error of the parameter estimator, and confidence limits for each parameter.
Coefficient Prior
The Coefficient Prior table displays the prior distribution of the regression coefficients.
The Independent Prior Distributions for Model Parameters table displays the prior distributions of
additional model parameters (scale, exponential scale, Weibull scale, Weibull shape, gamma shape).
Initial Values and Seeds
The Initial Values and Seeds table displays the initial values and random number generator seeds
for the Gibbs chains.
Fit Statistics
The Fit Statistics table displays the deviance information criterion (DIC) and the effective number
of parameters.
Posterior Summaries
The Posterior Summaries table contains the size of the sample, the mean, the standard deviation,
and the quartiles for each model parameter.
Posterior Intervals
The Posterior Intervals table contains the HPD intervals and the credible intervals for each model
parameter.
Correlation Matrix of the Posterior Samples
The Correlation Matrix of the Posterior Samples table is produced if you include the CORR
suboption in the SUMMARY= option in the BAYES statement. This table displays the sample
correlation of the posterior samples.
Covariance Matrix of the Posterior Samples
The Covariance Matrix of the Posterior Samples table is produced if you include the COV suboption in the SUMMARY= option in the BAYES statement. This table displays the sample covariance
of the posterior samples.
Autocorrelations of the Posterior Samples
The Autocorrelations of the Posterior Samples table displays the lag1, lag5, lag10, and lag50
autocorrelations for each parameter.
Gelman and Rubin Diagnostics
The Gelman and Rubin Diagnostics table is produced if you include the GELMAN suboption
in the DIAGNOSTIC= option in the BAYES statement. This table displays the estimate of the
potential scale reduction factor and its 97.5% upper confidence limit for each parameter.
Geweke Diagnostics
The Geweke Diagnostics table displays the Geweke statistic and its p-value for each parameter.
Raftery and Lewis Diagnostics
The Raftery Diagnostics table is produced if you include the RAFTERY suboption in the DIAGNOSTIC= option in the BAYES statement. This table displays the Raftery and Lewis diagnostics for each variable.
Heidelberger and Welch Diagnostics
The Heidelberger and Welch Diagnostics table is displayed if you include the HEIDELBERGER
suboption in the DIAGNOSTIC= option in the BAYES statement. This table shows the results of a
stationary test and a halfwidth test for each parameter.
Effective Sample Size
The Effective Sample Size table displays, for each parameter, the effective sample size, the correlation time, and the efficiency.
Monte Carlo Standard Errors
The Monte Carlo Standard Errors table displays, for each parameter, the Monte Carlo standard
error, the posterior sample standard deviation, and the ratio of the two.
ODS Table Names
PROC LIFEREG assigns a name to each table it creates. You can use these names to reference
the table when using the Output Delivery System (ODS) to select tables and create output data
sets. These names are listed separately in Table 48.4 for a maximum likelihood analysis and in
Table 48.5 for a Bayesian analysis. For more information about ODS, see Chapter 20, "Using the Output Delivery System."
Table 48.4 ODS Tables Produced in PROC LIFEREG for a Classical Analysis
ODS Table Name        Description                                     Statement   Option
ClassLevels           Classification variable levels                  CLASS       default
ConvergenceStatus     Convergence status                              MODEL       default
CorrB                 Parameter estimate correlation matrix           MODEL       CORRB
CovB                  Parameter estimate covariance matrix            MODEL       COVB
IterEM                Iteration history for Turnbull algorithm        PROBPLOT    ITPRINTEM
FitStatistics         Fit statistics                                  MODEL       default
IterHistory           Iteration history                               MODEL       ITPRINT
LagrangeStatistics    Lagrange statistics                             MODEL       NOINT | NOSCALE
LastGrad              Last evaluation of the gradient                 MODEL       ITPRINT
LastHess              Last evaluation of the Hessian                  MODEL       ITPRINT
ModelInfo             Model information                               MODEL       default
NObs                  Number of observations                          MODEL       default
ParameterEstimates    Parameter estimates                             MODEL       default
ParmInfo              Parameter indices                               MODEL       default
ProbabilityEstimates  Nonparametric CDF estimates                     PROBPLOT    PPOUT
TConvergenceStatus    Convergence status for Turnbull algorithm       PROBPLOT    default
Turnbull              Probability estimates from Turnbull algorithm   PROBPLOT    ITPRINTEM
Type3Analysis         Type 3 tests                                    MODEL       default
Depends on data.
Table 48.5 ODS Tables Produced in PROC LIFEREG for a Bayesian Analysis
ODS Table Name        Description                                             Statement   Option
AutoCorr              Autocorrelations of the posterior samples               BAYES       default
ClassLevels           Classification variable levels                          CLASS       default
CoeffPrior            Prior distribution of the regression coefficients       BAYES       default
ConvergenceStatus     Convergence status of maximum likelihood estimation     MODEL       default
Corr                  Correlation matrix of the posterior samples             BAYES       SUMMARY=CORR
ESS                   Effective sample size                                   BAYES       default
FitStatistics         Fit statistics                                          BAYES       default
Gelman                Gelman and Rubin convergence diagnostics                BAYES       DIAG=GELMAN
Geweke                Geweke convergence diagnostics                          BAYES       default
Heidelberger          Heidelberger and Welch convergence diagnostics          BAYES       DIAG=HEIDELBERGER
InitialValues         Initial values of the Markov chains                     BAYES       default
ModelInfo             Model information                                       MODEL       default
NObs                  Number of observations                                  MODEL       default
ParameterEstimates    Maximum likelihood estimates of model parameters        MODEL       default
ParmPrior             Prior distribution for scale and shape                  BAYES       default
PostIntervals         HPD and equal-tail intervals of the posterior samples   BAYES       default
PosteriorSample       Posterior samples (for output data set only)            BAYES
PostSummaries         Summary statistics of the posterior samples             BAYES       default
Raftery               Raftery and Lewis convergence diagnostics               BAYES       DIAG=RAFTERY
Depends on data.
ODS Graphics
To request graphics with PROC LIFEREG, you must first enable ODS Graphics by specifying the ods graphics on; statement. See Chapter 21, "Statistical Graphics Using ODS," for more
information. Some graphs are produced by default; other graphs are produced by using statements
and options. You can reference every graph produced through ODS Graphics with a name. The
names of the graphs that PROC LIFEREG generates are listed in Table 48.6, along with the required
statements and options.
ODS Graph Names
PROC LIFEREG assigns a name to each graph it creates using ODS. You can use these names to
reference the graphs when using ODS. The names are listed in Table 48.6.
To request these graphs, you must specify the ods graphics on; statement in addition to the
options indicated in Table 48.6.
Table 48.6 ODS Graphics Produced by PROC LIFEREG
ODS Graph Name   Description                                          Statement   Option
ADPanel          Autocorrelation function and density panel           BAYES       PLOTS=(AUTOCORR DENSITY)
AutocorrPanel    Autocorrelation function panel                       BAYES       PLOTS=AUTOCORR
AutocorrPlot     Autocorrelation function plot                        BAYES       PLOTS(UNPACK)=AUTOCORR
ProbPlot         Probability plot                                     PROBPLOT    default
TAPanel          Trace and autocorrelation function panel             BAYES       PLOTS=(TRACE AUTOCORR)
TADPanel         Trace, autocorrelation, and density function panel   BAYES       default
TDPanel          Trace and density panel                              BAYES       PLOTS=(TRACE DENSITY)
TracePanel       Trace panel                                          BAYES       PLOTS=TRACE
TracePlot        Trace plot                                           BAYES       PLOTS(UNPACK)=TRACE
Examples: LIFEREG Procedure
Example 48.1: Motorette Failure
This example fits a Weibull model and a lognormal model to the example given in Kalbfleisch and Prentice (1980, p. 5). An output data set called models is specified to contain the parameter estimates. By default, the natural log of the variable time is used by the procedure as the response. After this log transformation, the Weibull model is fit using the extreme-value baseline distribution, and the lognormal is fit using the normal baseline distribution.
Since the extreme-value and normal distributions do not contain any shape parameters, the variable SHAPE1 is missing in the models data set. An additional output data set, out, is created that contains the predicted quantiles and their standard errors for values of the covariate corresponding to temp=130 and temp=150. This is done with the control variable, which is set to 1 for only two observations.
Using the standard error estimates obtained from the output data set, approximate 90% confidence limits for the predicted quantiles are then created in a subsequent DATA step for the log response. The logs of the predicted values are obtained because the values of the P= variable in the OUT= data set are in the same units as the original response variable, time. The standard error of each quantile of log(time) is approximated (using a first-order Taylor series approximation) by the standard error of the predicted time quantile divided by the predicted quantile itself. These confidence limits are then converted back to the original scale by the exponential function.
The following statements produce Output 48.1.1:
title 'Motorette Failures With Operating Temperature as a Covariate';
data motors;
   input time censor temp @@;
   if _N_=1 then
      do;
         temp=130;
         time=.;
         control=1;
         z=1000/(273.2+temp);
         output;
         temp=150;
         time=.;
         control=1;
         z=1000/(273.2+temp);
         output;
      end;
   if temp>150;
   control=0;
   z=1000/(273.2+temp);
   output;
datalines;
8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
1764 1 170 2772 1 170 3444 1 170 3542 1 170 3780 1 170
4860 1 170 5196 1 170 5448 0 170 5448 0 170 5448 0 170
408 1 190 408 1 190 1344 1 190 1344 1 190 1440 1 190
1680 0 190 1680 0 190 1680 0 190 1680 0 190 1680 0 190
408 1 220 408 1 220 504 1 220 504 1 220 504 1 220
528 0 220 528 0 220 528 0 220 528 0 220 528 0 220
;
run;
proc print data=motors;
run;
Output 48.1.1 Motorette Failure Data
Motorette Failures With Operating Temperature as a Covariate
Obs time censor temp control z
1 . 0 130 1 2.48016
2 . 0 150 1 2.36295
3 1764 1 170 0 2.25632
4 2772 1 170 0 2.25632
5 3444 1 170 0 2.25632
6 3542 1 170 0 2.25632
7 3780 1 170 0 2.25632
8 4860 1 170 0 2.25632
9 5196 1 170 0 2.25632
10 5448 0 170 0 2.25632
11 5448 0 170 0 2.25632
12 5448 0 170 0 2.25632
13 408 1 190 0 2.15889
14 408 1 190 0 2.15889
15 1344 1 190 0 2.15889
16 1344 1 190 0 2.15889
17 1440 1 190 0 2.15889
18 1680 0 190 0 2.15889
19 1680 0 190 0 2.15889
20 1680 0 190 0 2.15889
21 1680 0 190 0 2.15889
22 1680 0 190 0 2.15889
23 408 1 220 0 2.02758
24 408 1 220 0 2.02758
25 504 1 220 0 2.02758
26 504 1 220 0 2.02758
27 504 1 220 0 2.02758
28 528 0 220 0 2.02758
29 528 0 220 0 2.02758
30 528 0 220 0 2.02758
31 528 0 220 0 2.02758
32 528 0 220 0 2.02758
The following statements produce Output 48.1.2 and Output 48.1.3:
proc lifereg data=motors outest=modela covout;
   a: model time*censor(0)=z;
   output out=outa quantiles=.1 .5 .9 std=std p=predtime
          control=control;
run;
proc lifereg data=motors outest=modelb covout;
   b: model time*censor(0)=z / dist=lnormal;
   output out=outb quantiles=.1 .5 .9 std=std p=predtime
          control=control;
run;
Output 48.1.2 Motorette Failure: Model A
Model Information
Data Set WORK.MOTORS
Dependent Variable Log(time)
Censoring Variable censor
Wald
Effect DF Chi-Square Pr > ChiSq
z 1 99.5239 <.0001
Parameter DF Estimate Standard Error 95% Confidence Limits Chi-Square Pr > ChiSq
Intercept 1 -11.8912 1.9655 -15.7435 -8.0389 36.60 <.0001
z 1 9.0383 0.9060 7.2626 10.8141 99.52 <.0001
Scale 1 0.3613 0.0795 0.2347 0.5561
Weibull Shape 1 2.7679 0.6091 1.7982 4.2605
Output 48.1.3 Motorette Failure: Model B
Model Information
Data Set WORK.MOTORS
Dependent Variable Log(time)
Name of Distribution Lognormal
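Wald
Effect DF Chi-Square Pr > ChiSq
z 1 42.0001 <.0001
Parameter DF Estimate Standard Error 95% Confidence Limits Chi-Square Pr > ChiSq
Intercept 1 -10.4706 2.7719 -15.9034 -5.0377 14.27 0.0002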
z 1 8.3221 1.2841 5.8052 10.8389 42.00 <.0001
Scale 1 0.6040 0.1107 0.4217 0.8652
data models;
   set modela modelb;
run;
proc print data=models;
   id _model_;
   title 'fitted models';
run;
Output 48.1.4 Motorette Failure: Fitted Models
fitted models
_MODEL_ _NAME_ _TYPE_ _DIST_ _STATUS_ _LNLIKE_
a time PARMS Weibull 0 Converged -22.9515
a Intercept COV Weibull 0 Converged -22.9515
a z COV Weibull 0 Converged -22.9515
a Scale COV Weibull 0 Converged -22.9515
b time PARMS Lognormal 0 Converged -24.4738
b Intercept COV Lognormal 0 Converged -24.4738
b z COV Lognormal 0 Converged -24.4738
b Scale COV Lognormal 0 Converged -24.4738
_MODEL_ time Intercept z _SCALE_
a -1.0000 -11.8912 9.03834 0.36128
a -11.8912 3.8632 -1.77878 0.03448
a 9.0383 -1.7788 0.82082 -0.01488
a 0.3613 0.0345 -0.01488 0.00632
b -1.0000 -10.4706 8.32208 0.60403
b -10.4706 7.6835 -3.55566 0.03267
b 8.3221 -3.5557 1.64897 -0.01285
b 0.6040 0.0327 -0.01285 0.01226
data out;
   set outa outb;
run;
data out1;
   set out;
   ltime=log(predtime);
   stde=std/predtime;
   upper=exp(ltime+1.64*stde);
   lower=exp(ltime-1.64*stde);
run;
title 'quantile estimates and confidence limits';
proc print data=out1;
   id temp;
run;
title;
Output 48.1.5 Motorette Failure: Quantile Estimates and Confidence Limits
quantile estimates and confidence limits
temp time censor control z _PROB_ predtime std ltime stde upper lower
130 . 0 1 2.48016 0.1 16519.27 5999.85 9.7123 0.36320 29969.51 9105.47
130 . 0 1 2.48016 0.5 32626.65 9874.33 10.3929 0.30265 53595.71 19861.63
130 . 0 1 2.48016 0.9 50343.22 15044.35 10.8266 0.29884 82183.49 30838.80
150 . 0 1 2.36295 0.1 5726.74 1569.34 8.6529 0.27404 8976.12 3653.64
150 . 0 1 2.36295 0.5 11310.68 2299.92 9.3335 0.20334 15787.62 8103.28
150 . 0 1 2.36295 0.9 17452.49 3629.28 9.7672 0.20795 24545.37 12409.24
130 . 0 1 2.48016 0.1 12033.19 5482.34 9.3954 0.45560 25402.68 5700.09
130 . 0 1 2.48016 0.5 26095.68 11359.45 10.1695 0.43530 53285.36 12779.95
130 . 0 1 2.48016 0.9 56592.19 26036.90 10.9436 0.46008 120349.65 26611.42
150 . 0 1 2.36295 0.1 4536.88 1443.07 8.4200 0.31808 7643.71 2692.83
150 . 0 1 2.36295 0.5 9838.86 2901.15 9.1941 0.29487 15957.38 6066.36
150 . 0 1 2.36295 0.9 21336.97 7172.34 9.9682 0.33615 37029.72 12294.62
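As a check on the DATA step that produced these limits, the following Python sketch redoes the computation for the first row of Output 48.1.5 (Weibull model, temp=130, _PROB_=0.1). It takes predtime and std as printed, forms ltime and stde exactly as the DATA step does, and applies the same 1.64 multiplier; the function name is hypothetical. Small differences in the last printed digit come from rounding of the displayed inputs.

```python
import math

Z = 1.64  # multiplier used in the DATA step for approximate 90% limits


def log_scale_limits(predtime, std, z=Z):
    """Mirror the DATA step: ltime, stde, upper, lower for one predicted quantile."""
    ltime = math.log(predtime)   # log of the predicted time quantile
    stde = std / predtime        # first-order (delta-method) standard error of log(quantile)
    upper = math.exp(ltime + z * stde)
    lower = math.exp(ltime - z * stde)
    return ltime, stde, upper, lower


# First row of Output 48.1.5: Weibull model, temp=130, _PROB_=0.1
ltime, stde, upper, lower = log_scale_limits(16519.27, 5999.85)
print(round(ltime, 4), round(stde, 5), round(upper, 2), round(lower, 2))
```

Building the interval on the log scale and exponentiating keeps both limits positive and makes them asymmetric around predtime, as in the printed table.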
Example 48.2: Computing Predicted Values for a Tobit Model
The LIFEREG procedure can be used to perform a Tobit analysis. The Tobit model, described by
Tobin (1958), is a regression model for left-censored data assuming a normally distributed error
term. The model parameters are estimated by maximum likelihood. PROC LIFEREG provides
estimates of the parameters of the distribution of the uncensored data. See Greene (1993) and
Maddala (1983) for a more complete discussion of censored normal data and related distributions.
This example shows how you can use PROC LIFEREG and the DATA step to compute two of the
three types of predicted values discussed there.
Consider a continuous random variable Y and a constant C. If you were to sample from the distribution of Y but discard values less than (greater than) C, the distribution of the remaining observations
would be truncated on the left (right). If you were to sample from the distribution of Y and report
values less than (greater than) C as C, the distribution of the sample would be left (right) censored.
The probability density function of the truncated random variable $Y'$ is given by
$$f_{Y'}(y) = \frac{f_Y(y)}{\Pr(Y > C)} \quad \text{for } y > C$$
where $f_Y(y)$ is the probability density function of Y. PROC LIFEREG cannot compute the proper likelihood function to estimate parameters or predicted values for a truncated distribution. Suppose the model being fit is specified as follows:
$$Y_i^* = x_i'\beta + \epsilon_i$$
where $\epsilon_i$ is a normal error term with zero mean and standard deviation $\sigma$.
Define the censored random variable $Y_i$ as
$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ Y_i^* & \text{if } Y_i^* > 0 \end{cases}$$
This is the Tobit model for left-censored normal data. $Y_i^*$ is sometimes called the latent variable. PROC LIFEREG estimates parameters of the distribution of $Y_i^*$ by maximum likelihood.
You can use the LIFEREG procedure to compute predicted values based on the mean functions of the latent and observed variables. The mean of the latent variable $Y_i^*$ is $x_i'\beta$, and you can compute values of the mean for different settings of $x_i$ by specifying XBETA=variable-name in an OUTPUT statement. Estimates of $x_i'\beta$ for each observation are written to the OUT= data set. Predicted values of the observed variable $Y_i$ can be computed based on the mean
$$E(Y_i) = \Phi\left(\frac{x_i'\beta}{\sigma}\right)\left(x_i'\beta + \sigma\lambda_i\right)$$
where
$$\lambda_i = \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}$$
and $\phi$ and $\Phi$ represent the normal probability density and cumulative distribution functions.
Although the distribution of $\epsilon_i$ in the Tobit model is often assumed to be normal, you can use other distributions for the Tobit model in the LIFEREG procedure by specifying a distribution with the DISTRIBUTION= option in the MODEL statement. One distribution that should be mentioned is the logistic distribution. For this distribution, the MLE has a bounded influence function with respect to the response variable, but not with respect to the design variables. If you believe your data have outliers in the response direction, you might try this distribution for some robust estimation of the Tobit model.
With the logistic distribution, the predicted values of the observed variable $Y_i$ can be computed based on the mean of $Y_i$,
$$E(Y_i) = \sigma \ln\left(1 + \exp(x_i'\beta/\sigma)\right)$$
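The two mean functions for the observed (censored) variable can be sketched in Python. This is an illustration, not LIFEREG's implementation, and the function names are hypothetical. For the normal case the code uses the algebraically equivalent form $\Phi(z)\,x_i'\beta + \sigma\phi(z)$ with $z = x_i'\beta/\sigma$, which avoids dividing by $\Phi(z)$ when it underflows. With the _SCALE_ estimate 1582.870 from Output 48.2.1 and two Xbeta values from Output 48.2.2, it should reproduce the corresponding MEAN OF CENSORED VARIABLE entries (73.46 and 2106.81) up to rounding.

```python
import math


def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def tobit_mean_normal(xbeta, sigma):
    """E(Y_i) for a normal Tobit model left-censored at zero.

    Phi(z) * (xbeta + sigma * lambda) with lambda = phi(z) / Phi(z)
    simplifies to Phi(z) * xbeta + sigma * phi(z).
    """
    z = xbeta / sigma
    return norm_cdf(z) * xbeta + sigma * norm_pdf(z)


def tobit_mean_logistic(xbeta, sigma):
    """E(Y_i) = sigma * ln(1 + exp(xbeta / sigma)) for a logistic latent error."""
    t = xbeta / sigma
    # Overflow-safe evaluation of ln(1 + e^t).
    softplus = t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))
    return sigma * softplus


SCALE = 1582.870  # _SCALE_ estimate from Output 48.2.1
for xb in (-2043.42, 2032.24):  # Xbeta values from two rows of Output 48.2.2
    print(xb, round(tobit_mean_normal(xb, SCALE), 2))
```

Both means exceed max(x'β, 0), because censoring at zero replaces negative latent values with 0 rather than dropping them.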
The following table shows a subset of the Mroz (1987) data set. In these data, Hours is the number
of hours the wife worked outside the household in a given year, Yrs_Ed is the years of education,
and Yrs_Exp is the years of work experience. A Tobit model will be fit to the hours worked with years of education and experience as covariates.
Hours Yrs_Ed Yrs_Exp
0 8 9
0 8 12
0 9 10
0 10 15
0 11 4
0 11 6
1000 12 1
1960 12 29
0 13 3
2100 13 36
3686 14 11
1920 14 38
0 15 14
1728 16 3
1568 16 19
1316 17 7
0 17 15
If the wife was not employed (worked 0 hours), her hours worked will be left censored at zero.
In order to accommodate left censoring in PROC LIFEREG, you need two variables to indicate
censoring status of observations. You can think of these variables as lower and upper endpoints of
interval censoring. If there is no censoring, set both variables to the observed value of Hours. To
indicate left censoring, set the lower endpoint to missing and the upper endpoint to the censored
value, zero in this case.
The following statements create a SAS data set with the variables Hours, Yrs_Ed, and Yrs_Exp from
the preceding data. A new variable, Lower, is created such that Lower=. if Hours=0 and Lower=Hours
if Hours>0.
data subset;
input Hours Yrs_Ed Yrs_Exp @@;
if Hours eq 0
then Lower=.;
else Lower=Hours;
datalines;
0 8 9 0 8 12 0 9 10 0 10 15 0 11 4 0 11 6
1000 12 1 1960 12 29 0 13 3 2100 13 36
3686 14 11 1920 14 38 0 15 14 1728 16 3
1568 16 19 1316 17 7 0 17 15
;
run;
The following statements fit a normal regression model to the left-censored Hours data with Yrs_Ed and Yrs_Exp as covariates. You need the estimated standard deviation of the normal distribution to compute the predicted values of the censored distribution from the preceding formulas. The data set OUTEST contains the standard deviation estimate in a variable named _SCALE_. You also need estimates of $x_i'\beta$. These are contained in the data set OUT as the variable Xbeta.
proc lifereg data=subset outest=OUTEST(keep=_scale_);
model (lower, hours) = yrs_ed yrs_exp / d=normal;
output out=OUT xbeta=Xbeta;
run;
Output 48.2.1 shows the results of the model fit. These tables show parameter estimates for the uncensored, or latent variable, distribution.
Output 48.2.1 Parameter Estimates from PROC LIFEREG
Model Information
Data Set WORK.SUBSET
Dependent Variable Lower
Dependent Variable Hours
Name of Distribution Normal
Parameter DF Estimate Standard Error 95% Confidence Limits Chi-Square Pr > ChiSq
Intercept 1 -5598.64 2850.248 -11185.0 -12.2553 3.86 0.0495
Yrs_Ed 1 373.1477 191.8872 -2.9442 749.2397 3.78 0.0518
Yrs_Exp 1 63.3371 38.3632 -11.8533 138.5276 2.73 0.0987
Scale 1 1582.870 442.6732 914.9433 2738.397
The following statements combine the two data sets created by PROC LIFEREG to compute predicted values for the censored distribution. The OUTEST= data set contains the estimate of the standard deviation from the uncensored distribution, and the OUT= data set contains estimates of $x_i'\beta$.
data predict;
   drop lambda _scale_ _prob_;
   set out;
   if _n_ eq 1 then set outest;
   lambda = pdf('NORMAL',Xbeta/_scale_)
            / cdf('NORMAL',Xbeta/_scale_);
   Predict = cdf('NORMAL', Xbeta/_scale_)
             * (Xbeta + _scale_*lambda);
   label Xbeta='MEAN OF UNCENSORED VARIABLE'
         Predict = 'MEAN OF CENSORED VARIABLE';
run;
Output 48.2.2 shows the original variables, the predicted means of the uncensored distribution, and
the predicted means of the censored distribution.
Output 48.2.2 Predicted Means from PROC LIFEREG
MEAN OF MEAN OF
UNCENSORED CENSORED
Hours Lower Yrs_Ed Yrs_Exp VARIABLE VARIABLE
0 . 8 9 -2043.42 73.46
0 . 8 12 -1853.41 94.23
0 . 9 10 -1606.94 128.10
0 . 10 15 -917.10 276.04
0 . 11 4 -1240.67 195.76
0 . 11 6 -1113.99 224.72
1000 1000 12 1 -1057.53 238.63
1960 1960 12 29 715.91 1052.94
0 . 13 3 -557.71 391.42
2100 2100 13 36 1532.42 1672.50
3686 3686 14 11 322.14 805.58
1920 1920 14 38 2032.24 2106.81
0 . 15 14 885.30 1170.39
1728 1728 16 3 561.74 951.69
1568 1568 16 19 1575.13 1708.24
1316 1316 17 7 1188.23 1395.61
0 . 17 15 1694.93 1809.97
Example 48.3: Overcoming Convergence Problems by Specifying Initial Values
This example illustrates the use of parameter initial value specification to help overcome convergence difficulties.
The following statements create a SAS data set.
data raw;
input censor x c1 @@;
datalines;
0 16 0.00 0 17 0.00 0 18 0.00
0 17 0.04 0 18 0.04 0 18 0.04
0 23 0.40 0 22 0.40 0 22 0.40
0 33 4.00 0 34 4.00 0 35 4.00
1 54 40.00 1 54 40.00 1 54 40.00
1 54 400.00 1 54 400.00 1 54 400.00
;
run;
Output 48.3.1 shows the contents of the data set raw.
Output 48.3.1 Contents of the Data Set
Obs censor x c1
1 0 16 0.00
2 0 17 0.00
3 0 18 0.00
4 0 17 0.04
5 0 18 0.04
6 0 18 0.04
7 0 23 0.40
8 0 22 0.40
9 0 22 0.40
10 0 33 4.00
11 0 34 4.00
12 0 35 4.00
13 1 54 40.00
14 1 54 40.00
15 1 54 40.00
16 1 54 400.00
17 1 54 400.00
18 1 54 400.00
The following SAS statements request that a Weibull regression model be fit to the data:
title 'OLS (default) initial values';
proc lifereg data=raw;
   model x*censor(1) = c1 / distribution = Weibull itprint;
run;
Convergence was not attained in 50 iterations for this model, as the following messages to the log
indicate:
WARNING: Convergence was not attained in 50 iterations. You might want to
increase the maximum number of iterations (MAXITER= option) or
change the convergence criteria (CONVERGE = value) in the MODEL
statement.
WARNING: The procedure is continuing in spite of the above warning. Results
shown are based on the last maximum likelihood iteration. Validity
of the model fit is questionable.
The first line (iter=0) of the iteration history table, shown in Output 48.3.2, shows the default initial ordinary least squares (OLS) estimates of the parameters.
Output 48.3.2 Initial Least Squares
OLS (default) initial values
Iteration History for Parameter Estimates
Iter Ridge Loglikelihood Intercept c1 Scale
0 0 -22.891088 3.2324769714 0.0020664542 0.3995754195
1 0 -16.427074 3.5337141598 0.0028713635 0.3283544365
2 0 -13.216768 3.4480787541 0.0052801225 0.3816964358
3 0 -5.0786635 3.1966395335 0.0191439929 0.2325418958
4 0 -2.0018885 3.1848047525 0.0275425402 0.1963590539
5 0 -0.1814984 3.1478989655 0.0374731819 0.2103607621
6 0 2.90712131 3.0858183316 0.0659946149 0.1818245261
7 0.063 2.9991781 3.1014479187 0.0661096622 0.1648677081
8 0.063 3.01557837 3.0995493638 0.0662333056 0.1670552505
9 0.063 3.0301815 3.0992317977 0.0663580659 0.1669529486
10 0.063 3.0448013 3.0989901232 0.0664827053 0.1667371524
11 0.063 3.05941254 3.0987507448 0.0666071514 0.1665197313
12 0.063 3.07401474 3.0985118143 0.0667314052 0.1663026517
13 0.063 3.08860788 3.0982732928 0.066855467 0.1660859472
14 0.063 3.10319193 3.0980351787 0.0669793371 0.1658696184
15 0.063 3.11776689 3.0977974713 0.0671030156 0.1656536651
16 0.063 3.13233272 3.0975601698 0.0672265029 0.1654380873
17 0.063 3.1468894 3.0973232737 0.0673497993 0.165222885
18 0.063 3.16143692 3.0970867821 0.0674729049 0.1650080579
19 0.063 3.17597526 3.0968506943 0.06759582 0.1647936061
20 0.063 3.19050439 3.0966150098 0.0677185449 0.1645795293
21 0.063 3.2050243 3.0963797277 0.0678410799 0.1643658275
22 0.063 3.21953496 3.0961448474 0.0679634252 0.1641525006
23 0.063 3.23403635 3.0959103682 0.068085581 0.1639395483
24 0.063 3.24852845 3.0956762896 0.0682075476 0.1637269705
25 0.063 3.26301123 3.0954426107 0.0683293253 0.1635147672
26 0.063 3.27748468 3.095209331 0.0684509143 0.163302938
27 0.063 3.29194878 3.0949764498 0.0685723149 0.1630914829
28 0.063 3.3064035 3.0947439665 0.0686935273 0.1628804017
29 0.063 3.32084881 3.0945118805 0.0688145517 0.1626696942
30 0.063 3.3352847 3.0942801911 0.0689353885 0.1624593601
31 0.063 3.34971114 3.0940488977 0.0690560378 0.1622493994
32 0.063 3.36412812 3.0938179997 0.0691765 0.1620398118
33 0.063 3.3785356 3.0935874965 0.0692967752 0.1618305971
34 0.063 3.39293356 3.0933573875 0.0694168637 0.161621755
35 0.063 3.40732199 3.093127672 0.0695367658 0.1614132855
36 0.063 3.42170085 3.0928983495 0.0696564816 0.1612051882
37 0.063 3.43607013 3.0926694194 0.0697760116 0.1609974629
38 0.063 3.45042979 3.0924408811 0.0698953558 0.1607901095
39 0.063 3.46477983 3.092212734 0.0700145146 0.1605831276
40 0.063 3.4791202 3.0919849776 0.0701334882 0.160376517
41 0.063 3.4934509 3.0917576112 0.0702522768 0.1601702775
42 0.063 3.50777188 3.0915306343 0.0703708808 0.1599644088
43 0.063 3.52208314 3.0913040464 0.0704893002 0.1597589108
44 0.063 3.53638465 3.0910778468 0.0706075354 0.159553783
45 0.063 3.55067637 3.0908520349 0.0707255867 0.1593490254
46 0.063 3.5649583 3.0906266104 0.0708434542 0.1591446376
47 0.063 3.57923039 3.0904015725 0.0709611382 0.1589406193
48 0.063 3.59349263 3.0901769207 0.0710786389 0.1587369703
49 0.063 3.607745 3.0899526546 0.0711959567 0.1585336903
50 0.063 3.62198746 3.0897287734 0.0713130916 0.1583307791
The log-logistic distribution is more robust to large values of the response than the Weibull distribution, so one approach to improving the convergence performance is to fit a log-logistic distribution, and if this converges, use the resulting parameter estimates as initial values in a subsequent fit of a model with the Weibull distribution.
The following statements fit a log-logistic distribution to the data:
proc lifereg data=raw;
   model x*censor(1) = c1 / distribution = llogistic;
run;
The algorithm converges, and the maximum likelihood estimates for the log-logistic distribution are shown in Output 48.3.3.
Output 48.3.3 Estimates from the Log-Logistic Distribution
Parameter DF Estimate Standard Error 95% Confidence Limits Chi-Square Pr > ChiSq
Intercept 1 2.8983 0.0318 2.8360 2.9606 8309.43 <.0001
c1 1 0.1592 0.0133 0.1332 0.1852 143.85 <.0001
Scale 1 0.0498 0.0122 0.0308 0.0804
The following statements refit the Weibull model by using the maximum likelihood estimates from the log-logistic fit as initial values:
proc lifereg data=raw outest=outest;
   model x*censor(1) = c1 / itprint distribution = weibull
                            intercept=2.898 initial=0.16 scale=0.05;
   output out=out xbeta=xbeta;
run;
Examination of the resulting output in Output 48.3.4 shows that the convergence problem has been
solved by specifying different initial values.
Output 48.3.4 Final Estimates from the Weibull Distribution
Model Information
Data Set WORK.RAW
Dependent Variable Log(x)
Log Likelihood 11.232023272
Algorithm converged.
Parameter DF Estimate Standard Error 95% Confidence Limits Chi-Square Pr > ChiSq
Intercept 1 2.9699 0.0326 2.9059 3.0338 8278.86 <.0001
c1 1 0.1435 0.0165 0.1111 0.1758 75.43 <.0001
Scale 1 0.0844 0.0189 0.0544 0.1308
Weibull Shape 1 11.8526 2.6514 7.6455 18.3749
As an example of an alternative way of specifying initial values, the following invocation of PROC LIFEREG, using the INEST= data set to provide starting values for the three parameters, is equivalent to the previous invocation:
data in;
input intercept c1 scale;
datalines;
2.898 0.16 0.05
;
run;
proc lifereg data=raw inest=in outest=outest;
model x*censor(1) = c1 / itprint distribution = weibull;
output out=out xbeta=xbeta;
run;
Example 48.4: Analysis of Arbitrarily Censored Data with Interaction Effects
The artificial data in this example are from a study of the natural recovery time of mice after injection of a certain toxin. Twenty mice were divided by sex (sex: 1 = Male, 2 = Female) into two groups of equal size. Their ages (in days) at the injection and their recovery times (in minutes) were recorded. Toxin density in blood was used to decide whether a mouse had recovered. Mice were checked at two times for recovery.

- If a mouse had recovered at the first check, the observation is left-censored, and no further measurement is made. The variable time1 is set to missing and time2 is set to the measurement time to indicate left censoring.
- If a mouse had not recovered at the first check, it was checked again at a second time. If it had recovered by the second measurement time, the observation is interval-censored: time1 is set to the first measurement time and time2 to the second measurement time.
- If there was no recovery at the second measurement, the observation is right-censored: time1 is set to the second measurement time and time2 is set to missing to indicate right censoring.
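The coding rules above can be sketched as a small Python function (illustrative only; the names censoring_type and MICE are hypothetical). In addition to the three cases in the text, it treats equal, nonmissing endpoints as an exactly observed recovery time. That is how PROC LIFEREG interprets such (lower, upper) response pairs, and many rows in the data set that follows are coded that way.

```python
from collections import Counter


def censoring_type(time1, time2):
    """Classify a (time1, time2) pair; None stands for a SAS missing value."""
    if time1 is None and time2 is None:
        raise ValueError("both endpoints are missing")
    if time1 is None:
        return "left"        # recovered before the first check at time2
    if time2 is None:
        return "right"       # not recovered by the second check at time1
    if time1 == time2:
        return "uncensored"  # recovery time observed exactly
    if time1 < time2:
        return "interval"    # recovered between the two checks
    raise ValueError("time1 must not exceed time2")


# (sex, age, time1, time2) rows of the data set mice
MICE = [
    (1, 57, 631, 631), (1, 45, None, 170), (1, 54, 227, 227), (1, 43, 143, 143),
    (1, 64, 916, None), (1, 67, 691, 705), (1, 44, 100, 100), (1, 59, 730, None),
    (1, 47, 365, 365), (1, 74, 1916, 1916),
    (2, 79, 1326, None), (2, 75, 837, 837), (2, 84, 1200, 1235), (2, 54, None, 365),
    (2, 74, 1255, 1255), (2, 71, 1823, None), (2, 65, 537, 637), (2, 33, 583, 683),
    (2, 77, 955, None), (2, 46, 577, 577),
]

counts = Counter(censoring_type(t1, t2) for _, _, t1, t2 in MICE)
print(counts)
```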
The following statements create a SAS data set containing the data from the experiment:
title 'Natural Recovery Time';
data mice;
input sex age time1 time2 ;
datalines;
1 57 631 631
1 45 . 170
1 54 227 227
1 43 143 143
1 64 916 .
1 67 691 705
1 44 100 100
1 59 730 .
1 47 365 365
1 74 1916 1916
2 79 1326 .
2 75 837 837
2 84 1200 1235
2 54 . 365
2 74 1255 1255
2 71 1823 .
2 65 537 637
2 33 583 683
2 77 955 .
2 46 577 577
;
The following SAS statements create the SAS data sets xrow1 and xrow2:
data xrow1;
input sex age time1 time2;
datalines;
1 50 . .
;
data xrow2;
input sex age time1 time2;
datalines;
2 60.6 . .
;
The following SAS statements fit a Weibull model with age, sex, and an age-by-sex interaction term as covariates, and create a plot of predicted probabilities against recovery time for the fixed values of age and sex specified in the SAS data set xrow1:
ods graphics on;
proc lifereg data=mice xdata=xrow1;
class sex;
model (time1, time2) = age sex age*sex / dist=Weibull;
probplot / nodata
plower=.5
vref(intersect) = 75
vreflab = '75 Percent'
;
inset;
run;
Standard output is shown in Output 48.4.1. Tables containing general model information, Type III
tests for the main effects and interaction terms, and parameter estimates are created.
Output 48.4.1 Parameter Estimates for the Interaction Model
Natural Recovery Time
Model Information
Data Set WORK.MICE
Dependent Variable Log(time1)
Dependent Variable Log(time2)
Effect  DF  Wald Chi-Square  Pr > ChiSq
age 1 33.8496 <.0001
sex 1 14.0245 0.0002
age*sex 1 10.7196 0.0011
Parameter  DF  Estimate  Standard Error  Lower 95% Limit  Upper 95% Limit  Chi-Square  Pr > ChiSq
Intercept 1 5.4110 0.5549 4.3234 6.4986 95.08 <.0001
age 1 0.0250 0.0086 0.0081 0.0419 8.42 0.0037
sex 1 1 -3.9808 1.0630 -6.0643 -1.8974 14.02 0.0002
sex 2 0 0.0000 . . . . .
age*sex 1 1 0.0613 0.0187 0.0246 0.0980 10.72 0.0011
age*sex 2 0 0.0000 . . . . .
Scale 1 0.4087 0.0900 0.2654 0.6294
Weibull Shape 1 2.4468 0.5391 1.5887 3.7682
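The estimates in Output 48.4.1 can be turned into predicted recovery-time percentiles for a covariate profile. The Python sketch below is illustrative and makes these assumptions:

- The Weibull parameterization is log(T) = xβ + σW with P(W ≤ w) = 1 - exp(-exp(w)).
- The coding is the reference-level coding shown in the table, where the sex = 2 rows are fixed at 0.

The 0.75 quantile is the recovery time at which the fitted line meets the 75% reference line requested by VREF(INTERSECT)=75. The constant and function names are hypothetical, and the rounded estimates make the results approximate.

```python
import math

# Rounded estimates from Output 48.4.1; sex = 2 is the reference level (effects fixed at 0)
INTERCEPT = 5.4110
B_AGE = 0.0250
B_SEX1 = -3.9808
B_AGE_SEX1 = 0.0613
SCALE = 0.4087


def linear_predictor(sex, age):
    """x'beta for one covariate profile under the reference-level coding above."""
    xbeta = INTERCEPT + B_AGE * age
    if sex == 1:
        xbeta += B_SEX1 + B_AGE_SEX1 * age
    return xbeta


def weibull_cdf(t, xbeta, scale=SCALE):
    """Predicted probability of recovery by time t: F(t) = 1 - exp(-exp(z))."""
    z = (math.log(t) - xbeta) / scale
    return 1.0 - math.exp(-math.exp(z))


def weibull_quantile(p, xbeta, scale=SCALE):
    """Time t_p with F(t_p) = p: log(t_p) = xbeta + scale * log(-log(1 - p))."""
    return math.exp(xbeta + scale * math.log(-math.log(1.0 - p)))


for sex, age in [(1, 50), (2, 60.6)]:
    xb = linear_predictor(sex, age)
    print(f"sex={sex} age={age}: xbeta={xb:.4f}, "
          f"median={weibull_quantile(0.5, xb):.0f}, "
          f"75th percentile={weibull_quantile(0.75, xb):.0f} minutes")
```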
The following two plots display the predicted probability against the recovery time for two different populations. Output 48.4.2 is created by the PROBPLOT statement when XDATA=xrow1 is specified in the PROC LIFEREG statement; xrow1 specifies the population with sex = 1 and age = 50. Output 48.4.3 is created in the same way with XDATA=xrow2, which specifies the population with sex = 2 and age = 60.6. The values in xrow2 (the reference level of sex and the sample mean age of the 20 mice) are the default values that the LIFEREG procedure would use for the probability plot if the XDATA= option had not been specified. Reference lines are used to display specified predicted probability points and their relative locations in the plot.
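The age value in xrow2 can be reproduced as the sample mean age of the 20 mice. The short Python check below uses hypothetical variable names and only the sex and age columns of the data set mice. It also confirms the equal group sizes described at the start of this example.

```python
# (sex, age) for the 20 mice in the data set mice
MICE = [
    (1, 57), (1, 45), (1, 54), (1, 43), (1, 64),
    (1, 67), (1, 44), (1, 59), (1, 47), (1, 74),
    (2, 79), (2, 75), (2, 84), (2, 54), (2, 74),
    (2, 71), (2, 65), (2, 33), (2, 77), (2, 46),
]

mean_age = sum(age for _, age in MICE) / len(MICE)
n_per_sex = {s: sum(1 for sex, _ in MICE if sex == s) for s in (1, 2)}
print(mean_age)  # 60.6
print(n_per_sex)
```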
Output 48.4.2 Probability Plot for Recovery Time with sex = 1, age = 50
The following SAS statements fit the same Weibull model with age, sex, and an age-by-sex interaction term as covariates, and create the plot of predicted probabilities against recovery time shown in Output 48.4.3 for the fixed values of age and sex specified in the SAS data set xrow2:
proc lifereg data=mice xdata=xrow2;
class sex;
model (time1, time2) = age sex age*sex / dist=Weibull;
probplot / nodata
plower=.5
vref(intersect) = 75
vreflab = '75 Percent'
;
inset;
run;
title;
ods graphics off;
Output 48.4.3 Probability Plot for Recovery Time with sex = 2, age = 60.6
Example 48.5: Probability Plotting - Right Censoring
The following statements create a SAS data set containing observed and right-censored lifetimes of
70 diesel engine fans (Nelson 1982):
data Fan;
input Lifetime Censor@@;
Lifetime = Lifetime / 1000;
datalines;
450 0 460 1 1150 0 1150 0 1560 1
1600 0 1660 1 1850 1 1850 1 1850 1
1850 1 1850 1 2030 1 2030 1 2030 1
2070 0 2070 0 2080 0 2200 1 3000 1
3000 1 3000 1 3000 1 3100 0 3200 1
3450 0 3750 1 3750 1 4150 1 4150 1
4150 1 4150 1 4300 1 4300 1 4300 1
4300 1 4600 0 4850 1 4850 1 4850 1
4850 1 5000 1 5000 1 5000 1 6100 1
6100 0 6100 1 6100 1 6300 1 6450 1
6450 1 6700 1 7450 1 7800 1 7800 1
8100 1 8100 1 8200 1 8500 1 8500 1
8500 1 8750 1 8750 0 8750 1 9400 1
9900 1 10100 1 10100 1 10100 1 11500 1
;
run;
Some of the fans had not failed at the time the data were collected, and the unfailed units have right-
censored lifetimes. The variable LIFETIME represents either a failure time or a censoring time, in
thousands of hours. The variable CENSOR is equal to 0 if the value of LIFETIME is a failure time,
and it is equal to 1 if the value is a censoring time. The following statements use the LIFEREG
procedure to produce the probability plot with an inset for the engine lifetimes:
ods graphics on;
proc lifereg data=Fan;
model Lifetime*Censor( 1 ) = / d = Weibull;
probplot
ppout
npintervals=simul
;
inset;
run;
ods graphics off;
The resulting graphical output is shown in Output 48.5.1. The estimated CDF, a line representing the maximum likelihood fit, and pointwise parametric confidence bands are plotted in the body of Output 48.5.1. The values of right-censored observations are plotted along the bottom of the graph. Because the PPOUT option is specified, the "Cumulative Probability Estimates" table in Output 48.5.2 is also created.
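The Kaplan-Meier columns of Output 48.5.2 can be reproduced with a few lines of Python. The sketch makes these assumptions:

- CENSOR is coded as described above (0 = failure, 1 = censored).
- Fans censored at a failure time are still at risk at that time.
- The standard error comes from Greenwood's formula.

The table lists tied failures on separate rows, while this sketch reports one row per distinct failure time. For example, only the second 1.15 row (0.0433) appears. The names RAW, FAN, and kaplan_meier_cdf are hypothetical.

```python
import math

# Fan data in hours: (lifetime, censor) with censor = 1 for an unfailed (censored) fan
RAW = [
    (450, 0), (460, 1), (1150, 0), (1150, 0), (1560, 1),
    (1600, 0), (1660, 1), (1850, 1), (1850, 1), (1850, 1),
    (1850, 1), (1850, 1), (2030, 1), (2030, 1), (2030, 1),
    (2070, 0), (2070, 0), (2080, 0), (2200, 1), (3000, 1),
    (3000, 1), (3000, 1), (3000, 1), (3100, 0), (3200, 1),
    (3450, 0), (3750, 1), (3750, 1), (4150, 1), (4150, 1),
    (4150, 1), (4150, 1), (4300, 1), (4300, 1), (4300, 1),
    (4300, 1), (4600, 0), (4850, 1), (4850, 1), (4850, 1),
    (4850, 1), (5000, 1), (5000, 1), (5000, 1), (6100, 1),
    (6100, 0), (6100, 1), (6100, 1), (6300, 1), (6450, 1),
    (6450, 1), (6700, 1), (7450, 1), (7800, 1), (7800, 1),
    (8100, 1), (8100, 1), (8200, 1), (8500, 1), (8500, 1),
    (8500, 1), (8750, 1), (8750, 0), (8750, 1), (9400, 1),
    (9900, 1), (10100, 1), (10100, 1), (10100, 1), (11500, 1),
]
FAN = [(hours / 1000, censor) for hours, censor in RAW]  # thousands of hours


def kaplan_meier_cdf(data):
    """Return (time, F(t), Greenwood standard error) at each distinct failure time."""
    data = sorted(data)
    at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = removed = 0
        # Group all units with lifetime t; censored ones are still at risk at t
        while i < len(data) and data[i][0] == t:
            if data[i][1] == 0:
                failures += 1
            removed += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            if at_risk > failures:
                greenwood_sum += failures / (at_risk * (at_risk - failures))
            rows.append((t, 1.0 - surv, surv * math.sqrt(greenwood_sum)))
        at_risk -= removed
    return rows


for t, cdf, se in kaplan_meier_cdf(FAN):
    print(f"{t:5.2f}  {cdf:.4f}  {se:.4f}")
```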
Output 48.5.1 Probability Plot for the Fan Data
Output 48.5.2 CDF Estimates
Cumulative Probability Estimates
Lifetime  Cumulative Probability  Simultaneous 95% Lower Limit  Simultaneous 95% Upper Limit  Kaplan-Meier Estimate  Kaplan-Meier Standard Error
0.45 0.0071 0.0007 0.2114 0.0143 0.0142
1.15 0.0215 0.0033 0.2114 0.0288 0.0201
1.15 0.0360 0.0073 0.2168 0.0433 0.0244
1.6 0.0506 0.0125 0.2304 0.0580 0.0282
2.07 0.0666 0.0190 0.2539 0.0751 0.0324
2.07 0.0837 0.0264 0.2760 0.0923 0.0361
2.08 0.1008 0.0344 0.2972 0.1094 0.0392
3.1 0.1189 0.0436 0.3223 0.1283 0.0427
3.45 0.1380 0.0535 0.3471 0.1477 0.0460
4.6 0.1602 0.0653 0.3844 0.1728 0.0510
6.1 0.1887 0.0791 0.4349 0.2046 0.0581
8.75 0.2488 0.0884 0.6391 0.2930 0.0980
Example 48.6: Probability Plotting - Arbitrary Censoring
Table 48.7 contains microprocessor failure data (Nelson 1990). Units were inspected at predetermined time intervals. The data consist of inspection interval endpoints (in hours) and the number of units failing in each interval. A missing (.) lower endpoint indicates left censoring, and a missing upper endpoint indicates right censoring. These can be thought of as semi-infinite intervals with a lower (upper) endpoint of negative (positive) infinity for left (right) censoring.
Table 48.7 Interval-Censored Data
Lower Endpoint Upper Endpoint Number Failed
. 6 6
6 12 2
24 48 2
24 . 1
48 168 1
48 . 839
168 500 1
168 . 150
500 1000 2
500 . 149
1000 2000 1
1000 . 147
2000 . 122
The following SAS statements create the SAS data set Micro:
data Micro;
input t1 t2 f ;
datalines;
. 6 6
6 12 2
12 24 0
24 48 2
24 . 1
48 168 1
48 . 839
168 500 1
168 . 150
500 1000 2
500 . 149
1000 2000 1
1000 . 147
2000 . 122
;
run;
The following SAS statements compute the nonparametric Turnbull estimate of the cumulative
distribution function and create a lognormal probability plot:
ods graphics on;
proc lifereg data=Micro;
model ( t1 t2 ) = / d=lognormal intercept=25 scale=5;
weight f;
probplot
pupper = 10
itprintem
printprobs
maxitem = (1000,25)
ppout;
inset;
run;
ods graphics off;
The two initial values INTERCEPT=25 and SCALE=5 in the MODEL statement are used to aid convergence in the model-fitting algorithm.
The following tables are created by the PROBPLOT statement in addition to the standard tabular output from the MODEL statement. Output 48.6.1 shows the iteration history for the Turnbull estimate of the CDF for the microprocessor data. With both the ITPRINTEM and PRINTPROBS options specified in the PROBPLOT statement, this table contains the log likelihoods and interval probabilities for every 25th iteration and for the last iteration. It would contain only the log likelihoods if the PRINTPROBS option were not specified.
Output 48.6.1 Iteration History for the Turnbull Estimate
Iteration History for the Turnbull Estimate of the CDF
Iteration Loglikelihood (., 6) (6, 12) (24, 48) (48, 168)
(168, 500) (500, 1000) (1000, 2000) (2000, .)
0 -1133.4051 0.125 0.125 0.125 0.125
0.125 0.125 0.125 0.125
25 -104.16622 0.00421644 0.00140548 0.00140648 0.00173338
0.00237846 0.00846094 0.04565407 0.93474475
50 -101.15151 0.00421644 0.00140548 0.00140648 0.00173293
0.00234891 0.00727679 0.01174486 0.96986811
75 -101.06641 0.00421644 0.00140548 0.00140648 0.00173293
0.00234891 0.00727127 0.00835638 0.9732621
100 -101.06534 0.00421644 0.00140548 0.00140648 0.00173293
0.00234891 0.00727125 0.00801814 0.97360037
125 -101.06533 0.00421644 0.00140548 0.00140648 0.00173293
0.00234891 0.00727125 0.00798438 0.97363413
130 -101.06533 0.00421644 0.00140548 0.00140648 0.00173293
0.00234891 0.00727125 0.007983 0.97363551
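The iteration history above is consistent with the self-consistency (EM) form of Turnbull's algorithm. That form starts from equal probabilities of 0.125 on the eight intervals shown in the column headings. The following Python sketch is a plain EM version under that assumption, and the names MICRO, SUPPORT, and turnbull_em are hypothetical. It has these limits:

- It runs a fixed number of iterations rather than reproducing LIFEREG's convergence criteria.
- It does not compute the reduced gradients or Lagrange multipliers.
- The support intervals are copied from the table headings rather than computed.

```python
import math

INF = math.inf

# (lower, upper, count) rows of the data set Micro; None stands for a SAS missing value
MICRO = [
    (None, 6, 6), (6, 12, 2), (12, 24, 0), (24, 48, 2), (24, None, 1),
    (48, 168, 1), (48, None, 839), (168, 500, 1), (168, None, 150),
    (500, 1000, 2), (500, None, 149), (1000, 2000, 1), (1000, None, 147),
    (2000, None, 122),
]

# Support intervals copied from the column headings of Output 48.6.1
SUPPORT = [(-INF, 6), (6, 12), (24, 48), (48, 168), (168, 500),
           (500, 1000), (1000, 2000), (2000, INF)]


def turnbull_em(obs, support, n_iter=2000):
    """Plain EM (self-consistency) iterations for the interval probabilities."""
    # Missing endpoints become -inf/+inf; rows with zero count carry no information
    rows = [(-INF if lo is None else lo, INF if hi is None else hi, w)
            for lo, hi, w in obs if w > 0]
    # alpha[i][j] = 1 when support interval j lies inside observed interval i
    alpha = [[1 if lo <= a and b <= hi else 0 for a, b in support]
             for lo, hi, _ in rows]
    weights = [w for _, _, w in rows]
    n = sum(weights)

    def loglik(p):
        return sum(w * math.log(sum(a * pj for a, pj in zip(al, p)))
                   for w, al in zip(weights, alpha))

    p = [1.0 / len(support)] * len(support)  # equal starting probabilities
    history = [loglik(p)]
    for _ in range(n_iter):
        expected = [0.0] * len(p)  # expected number of failures in each interval
        for w, al in zip(weights, alpha):
            denom = sum(a * pj for a, pj in zip(al, p))
            for j, (a, pj) in enumerate(zip(al, p)):
                expected[j] += w * a * pj / denom
        p = [e / n for e in expected]
        history.append(loglik(p))
    return p, history


probs, history = turnbull_em(MICRO, SUPPORT)
print(round(history[0], 4), round(history[-1], 4))
print([round(x, 4) for x in probs])
```

At the equal starting probabilities, the log likelihood is about -1133.405, matching iteration 0 in the table. After the first update, the probability on (., 6) is 6/1423, because only the six left-censored units are consistent with that interval.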
The table in Output 48.6.2 summarizes the Turnbull estimates of the interval probabilities, the reduced gradients, and the Lagrange multipliers as described in the section "Arbitrarily Censored Data" on page 3046.
Output 48.6.2 Summary for the Turnbull Algorithm
Lower Upper Reduced Lagrange
Lifetime Lifetime Probability Gradient Multiplier
. 6 0.0042 0 0
6 12 0.0014 0 0
24 48 0.0014 0 0
48 168 0.0017 0 0
168 500 0.0023 0 0
500 1000 0.0073 -7.219342E-9 0
1000 2000 0.0080 -0.037063236 0
2000 . 0.9736 0.0003038877 0
Output 48.6.3 shows the final estimate of the CDF, along with standard errors and nonparametric confidence limits. Two kinds of nonparametric confidence limits, pointwise or simultaneous, are available. The default is pointwise nonparametric confidence limits. You can request simultaneous nonparametric confidence limits by using the NPINTERVALS=SIMUL option.
Output 48.6.3 Final CDF Estimates for Turnbull Algorithm
Cumulative Probability Estimates
Lower Lifetime  Upper Lifetime  Cumulative Probability  Pointwise 95% Lower Limit  Pointwise 95% Upper Limit  Standard Error
6 6 0.0042 0.0019 0.0094 0.0017
12 24 0.0056 0.0028 0.0112 0.0020
48 48 0.0070 0.0038 0.0130 0.0022
168 168 0.0088 0.0047 0.0164 0.0028
500 500 0.0111 0.0058 0.0211 0.0037
1000 1000 0.0184 0.0094 0.0357 0.0063
2000 2000 0.0264 0.0124 0.0553 0.0101
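The cumulative probabilities above are running sums of the interval probabilities at the last Turnbull iteration. The estimate is flat between 12 and 24 hours, which appears to be because no support interval falls in that range. The Python check below uses the iteration-130 probabilities from Output 48.6.1; the variable names are hypothetical.

```python
from itertools import accumulate

# Right endpoints of the finite support intervals and their final probabilities
RIGHT_ENDS = [6, 12, 48, 168, 500, 1000, 2000]
PROBS = [0.00421644, 0.00140548, 0.00140648, 0.00173293,
         0.00234891, 0.00727125, 0.007983]

cdf = list(accumulate(PROBS))  # running sums give the estimated CDF
for t, f in zip(RIGHT_ENDS, cdf):
    print(t, round(f, 4))
```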
Output 48.6.4 shows the CDF estimates, the maximum likelihood fit, and pointwise parametric confidence limits plotted on a lognormal probability plot.
Output 48.6.4 Lognormal Probability Plot for the Microprocessor Data
Example 48.7: Bayesian Analysis of Clinical Trial Data
Consider the data on melanoma patients from a clinical trial described in Ibrahim, Chen, and Sinha
(2001). A partial listing of the data is shown in Output 48.7.1.
The survival time is modeled by a Weibull regression model with three covariates. An analysis of the right-censored survival data is performed with PROC LIFEREG to obtain Bayesian estimates of the regression coefficients by using the following SAS statements:
ods graphics on;
proc lifereg data=e1684;
class Sex;
model Survtime*Survcens(1)=Age Sex Perform / dist=Weibull;
bayes WeibullShapePrior=gamma;
run;
ods graphics off;
Output 48.7.1 Clinical Trial Data
Obs survtime survcens age sex perform
1 1.57808 2 35.9945 1 0
2 1.48219 2 41.9014 1 0
3 7.33425 1 70.2164 2 0
4 0.65479 2 58.1753 2 1
5 2.23288 2 33.7096 1 0
6 9.38356 1 47.9726 1 0
7 3.27671 2 31.8219 2 0
8 0.00000 1 72.3644 2 0
9 0.80274 2 40.7151 2 0
10 9.64384 1 32.9479 1 0
11 1.66575 2 35.9205 1 0
12 0.94247 2 40.5068 2 0
13 1.68767 2 57.0384 1 0
14 5.94247 2 63.1452 1 0
15 2.34247 2 62.0630 1 0
16 0.89863 2 56.5342 1 1
17 9.03288 1 22.9945 2 0
18 9.63014 1 18.4712 1 0
19 0.52603 2 41.2521 1 0
20 1.82192 2 29.5178 1 0
Maximum likelihood estimates of the model parameters shown in Output 48.7.2 are displayed by
default.
Output 48.7.2 Maximum Likelihood Parameter Estimates
Bayesian Analysis
Standard 95% Confidence
Parameter DF Estimate Error Limits
Intercept 1 2.4402 0.3716 1.7119 3.1685
age 1 -0.0115 0.0070 -0.0253 0.0023
sex 1 1 -0.1170 0.1978 -0.5046 0.2707
sex 2 0 0.0000 . . .
perform 1 0.2905 0.3222 -0.3411 0.9220
Scale 1 1.2537 0.0824 1.1021 1.4260
Weibull Shape 1 0.7977 0.0524 0.7012 0.9073
Since no prior distributions for the regression coefficients were specified, the default uniform improper distributions shown in the "Uniform Prior for Regression Coefficients" table in Output 48.7.3 are used. The specified gamma prior for the Weibull shape parameter is also shown in Output 48.7.3.
Output 48.7.3 Model Parameter Priors
Bayesian Analysis
Uniform Prior for Regression Coefficients
Parameter Prior
Intercept Constant
age Constant
sex1 Constant
perform Constant
Prior
Parameter Distribution Hyperparameters
Weibull Shape Gamma Shape 0.001 Inverse Scale 0.001
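The gamma prior in this table can be read as a density proportional to x^(a-1) exp(-b x), with shape a = 0.001 and inverse scale b = 0.001. Under that parameterization, the prior mean a/b is 1 and the variance a/b^2 is 1000, so the prior is very diffuse. The tiny Python check below assumes that shape/inverse-scale reading; the function name is hypothetical.

```python
def gamma_moments(shape, inverse_scale):
    """Mean and variance of a gamma density proportional to x**(shape - 1) * exp(-inverse_scale * x)."""
    mean = shape / inverse_scale
    variance = shape / inverse_scale ** 2
    return mean, variance


print(gamma_moments(0.001, 0.001))
```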
Fit statistics, descriptive statistics, interval statistics, and the sample parameter correlation matrix for the posterior sample are displayed in the tables in Output 48.7.4. Since noninformative prior distributions for the regression coefficients were used, the means and standard deviations of the posterior distributions for the model parameters are close to the maximum likelihood estimates and standard errors.
Output 48.7.4 Posterior Sample Statistics
Fit Statistics
DIC (smaller is better) 875.156
pD (effective number of parameters) 4.935
Bayesian Analysis
Posterior Summaries
Standard Percentiles
Parameter N Mean Deviation 25% 50% 75%
Intercept 10000 2.4762 0.3794 2.2140 2.4697 2.7303
age 10000 -0.0117 0.00717 -0.0165 -0.0117 -0.00693
sex1 10000 -0.1261 0.2024 -0.2622 -0.1253 0.0125
perform 10000 0.3279 0.3352 0.0966 0.3152 0.5407
WeibShape 10000 0.7826 0.0517 0.7467 0.7820 0.8167
Posterior Intervals
Parameter Alpha Equal-Tail Interval HPD Interval
Intercept 0.050 1.7530 3.2344 1.7558 3.2366
age 0.050 -0.0256 0.00231 -0.0255 0.00236
sex1 0.050 -0.5235 0.2654 -0.5217 0.2666
perform 0.050 -0.3079 1.0233 -0.3291 0.9937
WeibShape 0.050 0.6850 0.8864 0.6803 0.8807
Posterior Correlation Matrix
Weib
Parameter Intercept age sex1 perform Shape
Intercept 1.0000 -.8983 -.2988 -.0759 -.1422
age -.8983 1.0000 -.0467 -.0401 0.0685
sex1 -.2988 -.0467 1.0000 0.0874 0.0485
perform -.0759 -.0401 0.0874 1.0000 -.0320
WeibShape -.1422 0.0685 0.0485 -.0320 1.0000
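In the "Fit Statistics" table, DIC and pD relate to two deviance quantities (Spiegelhalter et al. 2002):

- pD = Dbar - D(theta-bar), where Dbar is the posterior mean deviance and D(theta-bar) is the deviance at the posterior means.
- DIC = Dbar + pD.

The two reported numbers therefore determine both deviance quantities. The Python sketch below only rearranges these definitions; the function name is hypothetical.

```python
def dic_components(dic, p_d):
    """Recover the posterior mean deviance and the deviance at the posterior mean."""
    dbar = dic - p_d        # DIC = Dbar + pD
    d_at_mean = dbar - p_d  # pD = Dbar - D(theta-bar)
    return dbar, d_at_mean


print(dic_components(875.156, 4.935))
```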
The default diagnostic statistics are displayed in Output 48.7.5. See the section "Assessing Markov Chain Convergence" on page 156 for more details on Bayesian convergence diagnostics.
Output 48.7.5 Convergence Diagnostics
Bayesian Analysis
Posterior Autocorrelations
Parameter Lag 1 Lag 5 Lag 10 Lag 50
Intercept 0.0660 0.0089 0.0194 -0.0107
age 0.0035 -0.0134 0.0180 -0.0175
sex1 0.6230 0.0645 -0.0057 -0.0121
perform 0.6594 0.1132 0.0199 -0.0100
WeibShape 0.0923 0.0322 0.0050 0.0063
Geweke Diagnostics
Parameter z Pr > |z|
Intercept 1.4526 0.1463
age -1.5941 0.1109
sex1 -0.6555 0.5121
perform -0.1008 0.9197
WeibShape 0.0127 0.9899
Effective Sample Sizes
Correlation
Parameter ESS Time Efficiency
Intercept 7262.3 1.3770 0.7262
age 10000.0 1.0000 1.0000
sex1 2521.7 3.9656 0.2522
perform 2071.8 4.8266 0.2072
WeibShape 6585.9 1.5184 0.6586
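The columns of the "Effective Sample Sizes" table are linked by simple ratios, with N = 10000 posterior draws:

- Correlation Time = N/ESS.
- Efficiency = ESS/N.
- ESS itself takes the general form N/(1 + 2 × sum of the lag-k autocorrelations).

The Python sketch below illustrates these relationships. The rule PROC LIFEREG uses to decide how many lags enter the sum is not reproduced, so the caller supplies the lags, and the function names are hypothetical. Small differences from the table come from the rounded ESS values.

```python
def ess_from_autocorrelations(n, rhos):
    """ESS = n / (1 + 2 * sum of the supplied lag-k autocorrelations)."""
    return n / (1.0 + 2.0 * sum(rhos))


def correlation_time_and_efficiency(n, ess):
    """Correlation time n / ESS and efficiency ESS / n."""
    return n / ess, ess / n


for name, ess in [("Intercept", 7262.3), ("sex1", 2521.7), ("perform", 2071.8)]:
    time, eff = correlation_time_and_efficiency(10000, ess)
    print(f"{name}: time={time:.4f} efficiency={eff:.4f}")
```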
Trace, autocorrelation, and density plots for the five model parameters are shown in Output 48.7.6 through Output 48.7.10. These plots show no indication that the Markov chains have failed to converge. See the sections "Assessing Markov Chain Convergence" on page 156 and "Visual Analysis via Trace Plots" on page 156 for more information about assessing the convergence of the chain of posterior samples.
Output 48.7.6 Diagnostic Plots
References
Abernethy, R. B. (1996), The New Weibull Handbook, Second Edition, North Palm Beach, FL: Robert B. Abernethy.
Akaike, H. (1979), "A Bayesian Extension of the Minimum AIC Procedure of Autoregressive Model Fitting," Biometrika, 66, 237-242.
Akaike, H. (1981), "Likelihood of a Model and Information Criteria," Journal of Econometrics, 16, 3-14.
Cox, D. R. and Oakes, D. (1984), Analysis of Survival Data, London: Chapman & Hall.
Gentleman, R. and Geyer, C. J. (1994), "Maximum Likelihood for Interval Censored Data: Consistency and Computation," Biometrika, 81, 618-623.
Gilks, W. (2003), "Adaptive Metropolis Rejection Sampling (ARMS)," software from MRC Biostatistics Unit, Cambridge, UK, http://www.maths.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html.
Gilks, W. R., Best, N. G., and Tan, K. K. C. (1995), "Adaptive Rejection Metropolis Sampling with Gibbs Sampling," Applied Statistics, 44, 455-472.
Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996), Markov Chain Monte Carlo in Practice, London: Chapman & Hall.
Gilks, W. R. and Wild, P. (1992), "Adaptive Rejection Sampling for Gibbs Sampling," Applied Statistics, 41, 337-348.
Greene, W. H. (1993), Econometric Analysis, Second Edition, New York: Macmillan.
Ibrahim, J. G., Chen, M. H., and Sinha, D. (2001), Bayesian Survival Analysis, New York: Springer-Verlag.
Kalbfleisch, J. D. and Prentice, R. L. (1980), The Statistical Analysis of Failure Time Data, New York: John Wiley & Sons.
Klein, J. P. and Moeschberger, M. L. (1997), Survival Analysis: Techniques for Censored and Truncated Data, New York: Springer-Verlag.
Lawless, J. F. (2003), Statistical Models and Methods for Lifetime Data, Second Edition, New York: John Wiley & Sons.
Maddala, G. S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, New York: Cambridge University Press.
Meeker, W. Q. and Escobar, L. A. (1998), Statistical Methods for Reliability Data, New York: John Wiley & Sons.
Mroz, T. A. (1987), "The Sensitivity of an Empirical Model of Married Women's Work to Economic and Statistical Assumptions," Econometrica, 55, 765-799.
Nair, V. N. (1984), "Confidence Bands for Survival Functions with Censored Data: A Comparative Study," Technometrics, 26, 265-275.
Nelson, W. (1982), Applied Life Data Analysis, New York: John Wiley & Sons.
Nelson, W. (1990), Accelerated Testing: Statistical Models, Test Plans, and Data Analyses, New York: John Wiley & Sons.
Rao, C. R. (1973), Linear Statistical Inference, New York: John Wiley & Sons.
Simonoff, J. S. (2003), Analyzing Categorical Data, New York: Springer-Verlag.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van der Linde, A. (2002), "Bayesian Measures of Model Complexity and Fit," Journal of the Royal Statistical Society, Series B, 64(4), 583-616, with discussion.
Tobin, J. (1958), "Estimation of Relationships for Limited Dependent Variables," Econometrica, 26, 24-36.
Turnbull, B. W. (1976), "The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data," Journal of the Royal Statistical Society, Series B, 38, 290-295.
Subject Index
accelerated failure time models
LIFEREG procedure, 2990
annotating
pplot plots, 3026
censored
data (LIFEREG), 2990
censoring, 2990
computational details
computational resources
Confidence intervals
cumulative distribution function, 3040
deviance information criterion, 3053
DIC, 3053
effective number of parameters, 3053
failure time
gamma distribution, 2990, 3020, 3037
graphics catalog, specifying
INEST= data sets
information matrix
LIFEREG procedure, 2990, 2991, 3035
initial estimates
inset
Lagrange multiplier
test statistics (LIFEREG), 3036
least-squares estimation
lifereg analysis
insets, 3016
accelerated failure time models, 2990
censoring, 3018
computational details, 3035
computational resources, 3050
Confidence intervals, 3042
failure time, 2990
INEST= data sets, 3048
information matrix, 2990, 2991, 3035
initial estimates, 3035
inset, 3016
Lagrange multiplier test statistics, 3036
least-squares estimation, 3035
log-likelihood function, 2991, 3035
log-likelihood ratio tests, 2991
main effects, 3034
maximum likelihood estimates, 2990
missing values, 3034
Newton-Raphson algorithm, 2990
OUTEST= data sets, 3048
output data sets, 3054
output ODS Graphics table names, 3060
output table names, 3058
predicted values, 3040
supported distributions, 3037
survival function, 2991, 3037
Tobit model, 2992, 3067
XDATA= data sets, 3049
log-likelihood function
LIFEREG procedure, 2991, 3035
log-likelihood ratio tests
logistic distribution, 2990, 3020, 3037
loglogistic distribution, 2990, 3020, 3037
lognormal distribution, 2990, 3020, 3037
main effects
maximum likelihood
estimates (LIFEREG), 2990
missing values
Newton-Raphson algorithm
normal distribution, 2990, 3020, 3037
OUTEST= data sets
output data sets
output ODS Graphics table names
output table names
parameter estimates
pplot plots
annotating, 3026
axes, color, 3026
font, specifying, 3027
reference lines, options, 3026-3028, 3030-3034
predicted values
proportional hazards model
distribution (LIFEREG), 3037
standard error
survival function
survival models, parametric, 2990
Tobit model
Weibull distribution, 2990, 3020, 3037
XDATA= data sets
Syntax Index
ALPHA= option
MODEL statement (LIFEREG), 3020
BAYES statement
BY statement
CDF keyword
OUTPUT statement (LIFEREG), 3024
CENSORED keyword
CLASS statement
CONTROL keyword
CONVERGE= option
CONVG= option
CORRB option
COVB option
COVOUT option
PROC LIFEREG statement, 3004
DATA= option
DISTRIBUTION= option
GOUT= option
INEST= option
INITIAL= option
INSET statement
INTERCEPT= option
ITPRINT option
keyword= option
LIFEREG procedure
syntax, 3003
LIFEREG PROCEDURE, BAYES statement,
3005
LIFEREG procedure, BAYES statement
STATISTICS= option, 3013
THINNING= option, 3013
LIFEREG procedure, BY statement, 3015
LIFEREG procedure, CLASS statement, 3015
LIFEREG procedure, INSET statement, 3016
keywords, 3016
LIFEREG procedure, MODEL statement, 3018
ALPHA= option, 3020
CONVERGE= option, 3020
CONVG= option, 3020
CORRB option, 3020
COVB option, 3020
DISTRIBUTION= option, 3021
INITIAL= option, 3022
INTERCEPT= option, 3022
ITPRINT option, 3022
MAXITER= option, 3022
NOINT option, 3022
NOLOG option, 3022
NOSCALE option, 3023
NOSHAPE1 option, 3023
OFFSET= option, 3023
SCALE= option, 3023
SHAPE1= option, 3023
SINGULAR= option, 3023
LIFEREG procedure, OUTPUT statement, 3023
CDF keyword, 3024
CENSORED keyword, 3024
CONTROL keyword, 3024
keyword= option, 3024
OUT= option, 3024
PREDICTED keyword, 3024
QUANTILES keyword, 3024
STD_ERR keyword, 3025
XBETA keyword, 3025
LIFEREG procedure, PPLOT statement
ANNOTATE= option, 3026
CAXIS= option, 3026
CCENSOR option, 3026
CENBIN, 3026
CENCOLOR option, 3026
CENSYMBOL option, 3026
CFIT= option, 3026
CFRAME= option, 3026
CGRID= option, 3026
CHREF= option, 3026
CTEXT= option, 3026
CVREF= option, 3027
DESCRIPTION= option, 3027
FONT= option, 3027
HCL, 3027, 3031
HEIGHT= option, 3027
HLOWER= option, 3027, 3031
HOFFSET= option, 3027
HREF= option, 3027, 3031
HREFLABELS= option, 3027, 3032
HREFLABPOS= option, 3028
HUPPER= option, 3027, 3031
INBORDER option, 3028
INTERTILE option, 3028
ITPRINTEM option, 3028, 3032
JITTER option, 3028
LFIT option, 3028
LGRID option, 3028
LHREF= option, 3028
LVREF= option, 3028
MAXITEM= option, 3028, 3032
NAME= option, 3028
NOCENPLOT option, 3029, 3032
NOCONF option, 3029, 3032
NODATA option, 3029, 3032
NOFIT option, 3029, 3032
NOFRAME option, 3029, 3032
NOGRID option, 3029, 3032
NOHLABEL option, 3029
NOHTICK option, 3029
NOPOLISH option, 3029, 3032
NOVLABEL option, 3029
NOVTICK option, 3029
NPINTERVALS option, 3029, 3032
PCTLIST option, 3029, 3032
PLOWER= option, 3029, 3032
PPOS option, 3030, 3033
PPOUT option, 3030, 3033
PRINTPROBS option, 3029, 3033
PROBLIST option, 3030, 3033
PUPPER= option, 3030, 3033
ROTATE option, 3030, 3033
SQUARE option, 3030, 3033
TOLLIKE option, 3030, 3033
TOLPROB option, 3030, 3033
VAXISLABEL= option, 3030
VREF= option, 3030, 3033
VREFLABELS= option, 3031, 3034
VREFLABPOS= option, 3031
WAXIS= option, 3031
WFIT= option, 3031
WGRID= option, 3031
WREFL= option, 3031
LIFEREG procedure, PROBPLOT statement,
3025
LIFEREG procedure, PROC LIFEREG
statement, 3004
COVOUT option, 3004
DATA= option, 3004
GOUT= option, 3004
INEST= option, 3004
NAMELEN= option, 3004
NOPRINT option, 3004
ORDER= option, 3004
OUTEST= option, 3005
XDATA= option, 3005
LIFEREG procedure, WEIGHT statement, 3034
MAXITER= option
MODEL statement
NAMELEN= option
NOINT option
NOLOG option
NOPRINT option
NOSCALE option
NOSHAPE1 option
OFFSET= option
ORDER= option
OUT= option
OUTEST= option
OUTPUT statement
PREDICTED keyword
PROBPLOT statement
PROC LIFEREG statement, see LIFEREG
procedure
QUANTILES keyword
SCALE= option
SHAPE1= option
SINGULAR= option
STATISTICS= option
BAYES statement(PHREG), 3013
STD_ERR keyword
THINNING= option
BAYES statement(LIFEREG), 3013
WEIGHT statement
XBETA keyword
XDATA= option