ABSTRACT
Cure fraction models are generally used to model lifetime data with long-term survivors. In
a cohort of cancer patients, the development of new drugs means that some patients are cured
permanently while others are not. Patients who are cured permanently are called cured or
long-term survivors, while patients who experience recurrence of the disease are termed
susceptible or uncured. The proportion of cured individuals after a treatment is known as the
cure fraction (cure rate). The population is thus divided into two groups: cured individuals and
susceptible individuals. In this paper, we introduce a three-parameter Gompertz distribution
(scale, shape and acceleration parameters), or Generalized Gompertz Distribution (GGD), in the
presence of a cure fraction, censored data and covariates, and estimate the cure fraction using a
Bayesian approach. Inferences are obtained using standard Markov Chain Monte Carlo techniques in
the OpenBUGS software.
Keywords: Cure rate, long-term survivor, Generalized Gompertz Distribution, Bayesian
Analysis, mixture cure model.
1. INTRODUCTION
In lifetime data analysis, standard survival analysis techniques implicitly assume that all
subjects have the same susceptibility to the disease and will eventually experience the event over
a sufficiently long period of follow-up. However, when the studied population is a mixture of
uncured individuals (susceptible individuals, who may experience the event of interest) and cured
individuals (non-susceptible individuals, who will never experience the event), standard survival
models are usually not appropriate because they do not account for the possibility of cure. Many
patients with diseases like cancer can be long-term survivors, and thus cure models can be a
useful tool to analyze and describe their survival data. Progress in the treatment of cancer has led
to a spate of statistical research to develop cure models. These models are generally used to
model lifetime data with long-term survivors, with the objective of estimating the cure rate, the
survival distribution and the effect of covariates. The cure fraction is a useful measure to monitor
trends and differences in survival of curable diseases. The first cure model, which is still widely
used in survival analysis, was constructed by Boag in 1949 and later developed by Berkson and
Gage in 1952. This model is formulated as a mixture model, which introduces a component
representing the proportion of immunes in the population and a distribution representing the
survival experience of the susceptibles, called the latency distribution. There are many choices
for the latency distribution, such as the Weibull, Gamma, Lognormal, Gompertz and Exponential
distributions. Many analyses of cancer survival data are based on overall survival or
progression-free survival (PFS). No patient can be “cured” of death, so in these situations cure
models can be used to model long-term survivors rather than cured patients (Othus et al. 2012).
These models can be used to investigate heterogeneity between cured and uncured patients. They
are also suitable for modeling censored and uncensored lifetime data. They can incorporate several
distributions widely used in lifetime data analysis, allowing flexibility in modeling monotone
and non-monotone hazard rates, and they serve as a good alternative for the analysis of real
data sets. Generally, cure is said to occur in a population when the mortality rate in the diseased
group of individuals returns to the same level as that of the general population. A straightforward
way to identify whether a particular dataset has long-term survivors is to look at its survival
curve. If the survival curve has a plateau at the end, then a cure model may be appropriate for
analyzing that dataset. These models can be a useful alternative to Cox proportional hazards
models, as they can be used in situations where the assumption of proportionality fails. They are
also helpful in determining the covariates associated with long-term as well as short-term
effects. Cure models provide simultaneous estimates of the proportion of patients cured of the
disease and the distribution of the survival times for uncured patients (latency distribution).
Achcar et al. (2012) estimated the cure fraction using a two-parameter Weibull distribution under
both mixture and non-mixture cure models. A Bayesian analysis of the four-parameter generalized
modified Weibull (GMW) distribution in the presence of a cure fraction, censored data and
covariates was presented by Martinez et al. (2013). Kundu et al. (2010) used a generalized
exponential cure rate model with covariates to estimate the cure fraction under mixture and
non-mixture setups. Several authors (Rahimzadeh et al. (2014)) used the Poisson distribution as the
latent distribution in analyzing long-term survivors. Yamaguchi (1992) used the generalized gamma
distribution to model the latency distribution and the logistic function to model the cure fraction
in terms of covariates. Yu et al. (2004) established that, among the lognormal, loglogistic,
Weibull and generalized gamma distributions, the estimate of the cure fraction was robust with the
generalized gamma distribution.
The Gompertz distribution is one of the commonly used distributions in survival analysis, as it is
a flexible distribution that can be skewed to the right or to the left; it is a continuous
probability distribution on (0, ∞) with an increasing failure rate. Chien-Lin Su et al. used
a 2-parameter Gompertz distribution (scale and shape parameters) for the survival analysis of
smoking-cessation data in the presence of a cure fraction. In addition to the shape and scale
parameters of the Gompertz distribution, we consider an acceleration parameter (which is related to
an accelerating factor in the imperfection time and works as a factor of fragility in the survival
of the individual as time increases) for determining the changes in the hazard rate with
respect to time. We also estimate the cure fraction using a Bayesian approach, assuming beta and
gamma priors on the parameters under mixture and non-mixture cure models. Bayesian methods make it
easier to estimate and analyze complicated problems for which classical inference methods are
quite cumbersome. The Bayesian approach also allows us to include any prior information that we
have on the parameters in the model and hence obtain more refined posterior estimates. The
proposed methodology is illustrated on a real dataset of melanoma cancer patients in the presence
of a cure fraction, censored observations and covariates.
The rest of the paper is organized as follows. In the next section we describe (i) the classes of
cure models, namely mixture and non-mixture cure models, (ii) the 3-parameter Gompertz
distribution and (iii) its likelihood under both mixture and non-mixture models. Section 3
presents the results, and Section 4 presents the discussion.
2. MATERIALS AND METHODS
2.1 Mixture Cure fraction Model
A mixture cure fraction model (Maller and Zhou (1996)), as the name suggests, is a mixture of
two types of individuals. In this model, the population is divided into two segments: cured
individuals (long-term survivors) and uncured individuals (susceptibles).
Let p be the probability of an individual being cured (0 < p < 1) and (1 − p) be the probability
of an individual being susceptible.
Then the survival function at time t can be defined as:
$$S(t) = p + (1-p)\,S_0(t) \tag{1}$$
where $S_0(t)$ is the baseline survival function for the susceptible individuals, which is assumed
to follow the Generalized Gompertz Distribution with parameters λ, c, θ, i.e.
$S_0(t) \sim GGD(\lambda, c, \theta)$.
The cdf of the lifetime T can be defined as
$$F(t) = 1 - S(t) = (1-p)\,(1 - S_0(t)) = (1-p)\,F_0(t)$$
implying that the pdf is
$$f(t) = (1-p)\,f_0(t)$$
where $f_0(t)$ is the baseline probability density function for the susceptible individuals.
Let $(t_i, \delta_i)$, i = 1, 2, ..., n, be a random sample of size n from the cancer data set,
where $t_i$ is the survival time for the i-th cancer patient and $\delta_i$ is the indicator
variable defined as:
$$\delta_i = \begin{cases} 1, & \text{for uncensored lifetime} \\ 0, & \text{for censored lifetime} \end{cases} \qquad i = 1, 2, \ldots, n$$
Therefore, the contribution of the i-th cancer patient to the likelihood is given by
$$L_i = [f(t_i)]^{\delta_i}\,[S(t_i)]^{1-\delta_i} = [(1-p)\,f_0(t_i)]^{\delta_i}\,[p + (1-p)\,S_0(t_i)]^{1-\delta_i} \tag{2}$$
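As an illustration of equations (1) and (2), the following Python sketch evaluates the mixture survival function and the likelihood contribution of a single subject. The exponential baseline, its rate of 0.5 and the function names are assumptions made only to keep the example short; they are not the baseline used in this paper.

```python
import math

def mixture_survival(t, p, s0):
    """Population survival S(t) = p + (1 - p) * S0(t), as in equation (1)."""
    return p + (1.0 - p) * s0(t)

def mixture_likelihood_contribution(t, delta, p, f0, s0):
    """Contribution of one subject, as in equation (2).

    delta is 1 for an uncensored lifetime and 0 for a censored one.
    """
    density_part = ((1.0 - p) * f0(t)) ** delta
    survival_part = (p + (1.0 - p) * s0(t)) ** (1 - delta)
    return density_part * survival_part

# Placeholder baseline (assumption): exponential distribution with rate 0.5.
RATE = 0.5

def f0(t):
    return RATE * math.exp(-RATE * t)

def s0(t):
    return math.exp(-RATE * t)

# The survival curve starts at 1 and levels off at the cure fraction p.
for t in (0.0, 2.0, 10.0, 50.0):
    print(t, mixture_survival(t, 0.3, s0))
```

For large t the printed values approach p = 0.3, which is the plateau that a mixture cure model places on the survival curve.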
2.2 Non-Mixture Cure fraction Model
A non-mixture cure fraction model (Achcar et al. (2012)) defines an asymptote for the cumulative
hazard and hence for the cure fraction. In this case the survival function is defined as:
$$S(t) = p^{F_0(t)} = \exp[F_0(t)\,\log p] \tag{3}$$
where $F_0(t) = 1 - S_0(t)$. Under this model, the contribution of the i-th subject to the
likelihood function is given by:
$$L_i = [h(t_i)]^{\delta_i}\,S(t_i) = [-(\log p)\,f_0(t_i)]^{\delta_i}\,p^{F_0(t_i)} \tag{4}$$
where $h(t) = f(t)/S(t)$ is the hazard function.
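A short Python sketch of the non-mixture model may help to see why the hazard reduces to a multiple of the baseline density. Differentiating S(t) = exp[F_0(t) log p] gives f(t) = -(log p) f_0(t) S(t), so h(t) = -(log p) f_0(t). The exponential baseline and all names below are illustrative assumptions, not part of the paper.

```python
import math

def nonmixture_survival(t, p, F0):
    """S(t) = p ** F0(t) = exp(F0(t) * log p), as in equation (3)."""
    return math.exp(F0(t) * math.log(p))

def nonmixture_hazard(t, p, f0):
    """h(t) = f(t) / S(t) = -log(p) * f0(t)."""
    return -math.log(p) * f0(t)

def nonmixture_likelihood_contribution(t, delta, p, f0, F0):
    """L_i = h(t_i) ** delta_i * S(t_i), as in equation (4)."""
    return nonmixture_hazard(t, p, f0) ** delta * nonmixture_survival(t, p, F0)

# Placeholder baseline (assumption): exponential distribution with rate 0.5.
RATE = 0.5

def f0(t):
    return RATE * math.exp(-RATE * t)

def F0(t):
    return 1.0 - math.exp(-RATE * t)

# As F0(t) tends to 1, the survival function tends to the cure fraction p.
for t in (0.0, 2.0, 10.0, 60.0):
    print(t, nonmixture_survival(t, 0.3, F0))
```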
2.3 Generalized Gompertz Distribution For The Susceptible Individuals
As a special case, let us assume the three-parameter Gompertz distribution introduced by
El-Gohary et al. (2013) for the susceptible individuals, with probability density function given by:
$$f_0(t) = \theta\lambda\,e^{ct}\exp\left[-\frac{\lambda}{c}\left(e^{ct}-1\right)\right]\left\{1-\exp\left[-\frac{\lambda}{c}\left(e^{ct}-1\right)\right]\right\}^{\theta-1} \tag{5}$$
where $t, \lambda, c, \theta > 0$. Here λ is a scale parameter, c is an acceleration parameter and
θ is the shape parameter.
Its survival function is given as:
$$S_0(t) = 1-\left\{1-\exp\left[-\frac{\lambda}{c}\left(e^{ct}-1\right)\right]\right\}^{\theta}$$
Then the cumulative distribution function is $F_0(t) = 1 - S_0(t)$. The 2-parameter Gompertz
distribution is a particular case of this distribution with shape parameter θ = 1.
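The density, distribution and survival functions of the Generalized Gompertz Distribution can be written down directly from the formulas above. The following Python sketch is one possible transcription; the function names are hypothetical, and `math.expm1` is used only for numerical accuracy when the exponent is small.

```python
import math

def _gompertz_cumulative_hazard(t, lam, c):
    """(lam / c) * (exp(c * t) - 1), the exponent shared by all GGD formulas."""
    return (lam / c) * math.expm1(c * t)

def ggd_cdf(t, lam, c, theta):
    """F0(t) = {1 - exp[-(lam/c)(e^{ct} - 1)]} ** theta."""
    g = _gompertz_cumulative_hazard(t, lam, c)
    return (-math.expm1(-g)) ** theta

def ggd_survival(t, lam, c, theta):
    """S0(t) = 1 - F0(t)."""
    return 1.0 - ggd_cdf(t, lam, c, theta)

def ggd_pdf(t, lam, c, theta):
    """f0(t) as in equation (5); intended for t > 0 and lam, c, theta > 0."""
    g = _gompertz_cumulative_hazard(t, lam, c)
    return (theta * lam * math.exp(c * t) * math.exp(-g)
            * (-math.expm1(-g)) ** (theta - 1.0))

# With theta = 1 the survival function is that of the 2-parameter Gompertz.
t, lam, c = 1.3, 0.4, 0.2
print(ggd_survival(t, lam, c, 1.0))
print(math.exp(-(lam / c) * (math.exp(c * t) - 1.0)))
```

The two printed values agree up to rounding, which is the special case θ = 1 mentioned in the text. The parameter values are arbitrary and chosen only for the demonstration.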
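Now the log likelihood functions of this distribution under mixture and non-mixture cure fraction
models can be written as
$$
\begin{aligned}
l(\boldsymbol{\beta}) ={}& \log(1-p)\sum_{i=1}^{n}\delta_i + \log\theta\sum_{i=1}^{n}\delta_i + \log\lambda\sum_{i=1}^{n}\delta_i + c\sum_{i=1}^{n}\delta_i t_i - \frac{\lambda}{c}\sum_{i=1}^{n}\delta_i\left(e^{ct_i}-1\right) \\
&+ (\theta-1)\sum_{i=1}^{n}\delta_i\log\left(1-\exp\left[-\frac{\lambda}{c}\left(e^{ct_i}-1\right)\right]\right) \\
&+ \sum_{i=1}^{n}(1-\delta_i)\log\left[p+(1-p)\left\{1-\left(1-\exp\left[-\frac{\lambda}{c}\left(e^{ct_i}-1\right)\right]\right)^{\theta}\right\}\right]
\end{aligned}
\tag{7}
$$
under mixture cure models, and
$$
\begin{aligned}
l(\boldsymbol{\beta}) ={}& \log(-\log p)\sum_{i=1}^{n}\delta_i + \log\theta\sum_{i=1}^{n}\delta_i + \log\lambda\sum_{i=1}^{n}\delta_i + c\sum_{i=1}^{n}\delta_i t_i - \frac{\lambda}{c}\sum_{i=1}^{n}\delta_i\left(e^{ct_i}-1\right) \\
&+ (\theta-1)\sum_{i=1}^{n}\delta_i\log\left(1-\exp\left[-\frac{\lambda}{c}\left(e^{ct_i}-1\right)\right]\right) \\
&+ (\log p)\sum_{i=1}^{n}\left(1-\exp\left[-\frac{\lambda}{c}\left(e^{ct_i}-1\right)\right]\right)^{\theta}
\end{aligned}
\tag{8}
$$
under non-mixture cure models, where $\boldsymbol{\beta} = (\lambda, c, \theta, p)$.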
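The two log likelihood functions can be evaluated numerically by summing the logarithm of each subject's likelihood contribution. The Python sketch below does this for the Generalized Gompertz baseline; the function names and the four toy observations are hypothetical and serve only to show the calculation, not to reproduce the melanoma analysis.

```python
import math

def _pieces(t, lam, c):
    """Return g = (lam/c)(e^{ct} - 1) and u = 1 - exp(-g) for one time point."""
    g = (lam / c) * math.expm1(c * t)
    return g, -math.expm1(-g)

def _log_f0(t, lam, c, theta):
    """Log of the GGD density f0(t)."""
    g, u = _pieces(t, lam, c)
    return math.log(theta) + math.log(lam) + c * t - g + (theta - 1.0) * math.log(u)

def loglik_mixture(times, deltas, p, lam, c, theta):
    """Log likelihood under the mixture cure model."""
    total = 0.0
    for t, d in zip(times, deltas):
        if d == 1:   # uncensored: log[(1 - p) f0(t)]
            total += math.log(1.0 - p) + _log_f0(t, lam, c, theta)
        else:        # censored: log[p + (1 - p) S0(t)]
            _, u = _pieces(t, lam, c)
            total += math.log(p + (1.0 - p) * (1.0 - u ** theta))
    return total

def loglik_nonmixture(times, deltas, p, lam, c, theta):
    """Log likelihood under the non-mixture cure model."""
    total = 0.0
    for t, d in zip(times, deltas):
        _, u = _pieces(t, lam, c)
        if d == 1:   # uncensored: log[-log(p) f0(t)]
            total += math.log(-math.log(p)) + _log_f0(t, lam, c, theta)
        total += math.log(p) * u ** theta   # F0(t) log p, for every subject
    return total

# Hypothetical toy data: survival times and censoring indicators.
toy_times = [0.5, 1.2, 3.0, 4.5]
toy_deltas = [1, 0, 1, 0]
print(loglik_mixture(toy_times, toy_deltas, 0.3, 0.4, 0.2, 1.5))
print(loglik_nonmixture(toy_times, toy_deltas, 0.3, 0.4, 0.2, 1.5))
```

A function of this form is what an optimizer or a hand-written sampler would call repeatedly; in OpenBUGS the same quantity is built internally from the model specification.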
The joint posterior distribution for the parameters of the model is obtained by combining the
joint prior distribution with the likelihood function for β. Although the joint posterior
distribution for the parameters of the proposed model is of great complexity, samples from it can
be generated using existing MCMC (Markov Chain Monte Carlo) simulation methods. Simulating these
samples is greatly simplified by using the OpenBUGS software, where we only need to specify the
distribution for the data and the prior distributions for the parameters. For a Bayesian analysis
of the mixture and non-mixture models not including covariates, we assume a beta prior
distribution Beta(a, b) for the proportion p of long-term survivors, as p takes values in the
interval (0, 1). We also assume Gamma prior distributions for the parameters of the Generalized
Gompertz Distribution, where Gamma(a, b) denotes a gamma distribution with mean a/b and variance
$a/b^2$. Posterior summaries of interest are obtained from simulated samples from the joint
posterior distribution using standard MCMC procedures. The mixture and non-mixture cure models are
compared using the Akaike information criterion (AIC), which provides a means for model selection.
The model with the minimum value of AIC is preferred. It is defined as:
AIC = -2 ln(likelihood) + 2K, where K is the number of parameters in the model.
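The AIC comparison amounts to a one-line formula followed by picking the smallest value. The Python sketch below shows both steps; the helper names are hypothetical, and the log likelihood passed to `aic` in the last line is a made-up number used only to show the arithmetic. The two AIC values in the dictionary are those reported in Table 2.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 * ln(likelihood) + 2K, with K the number of parameters."""
    return -2.0 * log_likelihood + 2.0 * n_params

def preferred_model(aic_by_model):
    """Return the name of the model with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)

# AIC values reported in Table 2 for the two cure models.
reported = {"mixture": 193.9, "non-mixture": 224.5}
print(preferred_model(reported))

# Arithmetic check with a hypothetical log likelihood and K = 4 parameters.
print(aic(-100.0, 4))
```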
To obtain inferences regarding the predictors, we consider the following regression model:
$$\lambda_i = \exp(\alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_3 x_{3i}) \quad \text{for overall survival, and}$$
$$\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} \quad \text{for cure fraction,}$$
where $x_{1i}$ is the sex of the patient (0 = male, 1 = female), $x_{2i}$ denotes the treatment
(1 = IFN treatment group, 0 = control group) and $x_{3i}$ is the patient age, for i = 1, 2, ..., n.
Assuming the mixture and non-mixture models based on the Generalized Gompertz Distribution, we
consider normal prior distributions N(0, 100) for the parameters of the regression models. Thus,
we are assuming approximately non-informative priors for these parameters. Note that the parameter
$\beta_2$ is related to the effect of the treatment on the cure fraction. If the credible interval
for $\beta_2$ includes zero, we can conclude that there is no evidence of a treatment effect on
the cure fraction.
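The regression model uses a log link for the scale parameter and a logit link for the cure fraction, so that λ stays positive and p stays in (0, 1) for any covariate values. The Python sketch below shows the two transformations. The function names and coefficient values are hypothetical, and labelling the scale coefficients `alpha` and the cure fraction coefficients `beta` follows the order of Table 3, which is an assumption about the notation.

```python
import math

def linear_predictor(coefs, x):
    """coefs[0] + coefs[1] * x[0] + coefs[2] * x[1] + ..."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

def scale_parameter(alpha, x):
    """Log link: lambda_i = exp(alpha_0 + alpha_1 x_1i + ...)."""
    return math.exp(linear_predictor(alpha, x))

def cure_probability(beta, x):
    """Logit link: log(p_i / (1 - p_i)) = beta_0 + beta_1 x_1i + ..."""
    eta = linear_predictor(beta, x)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients and one patient: female, IFN arm, age 40.
alpha = [0.1, -0.2, -0.3, 0.01]
beta = [-0.5, 0.2, 0.8, -0.01]
patient = [1, 1, 40.0]
print(scale_parameter(alpha, patient))
print(cure_probability(beta, patient))
```

With all coefficients set to zero the sketch returns λ = 1 and p = 0.5, which is the point around which N(0, 100) priors on the coefficients are centred.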
3. RESULTS
To illustrate the methodology, mixture and non-mixture cure models are fitted to the melanoma
dataset from the ECOG phase III clinical trial e1684, which is also used to illustrate the PSPMCM
SAS macro (Corbiere et al. (2007)). This was a two-arm clinical trial in which patients were
randomized to one of two treatment arms: high-dose interferon (IFN) or observation. The aim of
the trial was to evaluate the high-dose interferon alpha-2b (IFN) regimen against the placebo as
postoperative adjuvant therapy. After deleting missing data, a total of 284 observations were
used in the analysis. Three covariates, namely treatment (0 = control group, 1 = IFN group), gender
(0 = male, 1 = female) and age, are taken into account in both the incidence and latency parts. A
total of 69% of patients were censored in the trial. The Kaplan-Meier estimate of the
survival function is given in Figure 1, where the presence of a plateau near 0.3 suggests that a
cure model is suitable for this dataset.
Figure 1. Kaplan-Meier survival curve (horizontal axis: Time, 0 to 10)
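For readers unfamiliar with how a plateau shows up in a Kaplan-Meier curve, the following Python sketch computes the product-limit estimate by hand. The function name and the eight toy observations are hypothetical and are not taken from the e1684 data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: a list of (time, survival) at each event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects that share the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve

# Hypothetical toy data: early events followed by long censored follow-up.
toy_times = [1, 2, 2, 3, 5, 8, 9, 10]
toy_events = [1, 1, 0, 1, 0, 0, 0, 0]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 3))
```

In this toy example no event occurs after time 3, so the estimated curve stays flat from then on. A long flat tail of this kind, supported by many censored observations, is the pattern that motivates a cure model.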
We first obtained the posterior summaries under the Standard Weibull, Standard Gompertz and
Generalized Gompertz distributions. The parameter estimates from all three distributions
are quite similar (Table 1). Moreover, the Generalized Gompertz Distribution has the smallest AIC
value, indicating that it is the best of the three. Table 1 gives the posterior summaries of the
estimates under the Bayesian approach for each of these probability distributions. The Bayesian
estimates were obtained using the OpenBUGS software.
Table 1. Posterior summaries not including the cure fraction p

Model                                  Parameter   Posterior Mean (SD)   95% Credible Interval   AIC
Standard Weibull (2 parameters)        λ           8.347 (81.34)         (0, 37.9)               267.5
                                       θ           8.6 (81.86)           (0, 38.4)
Standard Gompertz (2 parameters)       λ           8.961 (89.01)         (0, 44.07)              392.5
                                       θ           9.945 (102.6)         (0, 47.76)
Generalized Gompertz (3 parameters)    λ           8.63 (84.32)          (0, 36.9)               196.6
                                       θ           10.54 (110.6)         (0, 44.3)
                                       c           9.19 (91.16)          (0, 49.9)
As the results in Table 1 indicate that GGD is the best, we now fit cure models based on this
distribution to our dataset of melanoma cancer patients. To analyze this dataset we consider the
mixture and non-mixture cure fraction models defined earlier, with and without covariates.
As a first analysis, we assume the cure fraction models without covariates. Table 2
presents the posterior summaries of the parameters based on GGD under mixture and non-mixture
cure models.
Table 2. Posterior summaries including the cure fraction p (in the absence of covariates)

Model                     Parameter   Posterior Mean (SD)   95% Credible Interval   AIC
Mixture Cure Model        λ           9.407 (103)           (0, 30.97)              193.9
                          θ           8.757 (86.15)         (0, 41.12)
                          c           10.63 (98.85)         (0, 67.23)
                          p           0.499 (0.291)         (0.024, 0.976)
Non-Mixture Cure Model    λ           6.785 (63.32)         (0, 23.52)              224.5
                          θ           7.137 (81.58)         (0, 32.1)
                          c           9.037 (86.89)         (0, 48.85)
                          p           0.498 (0.291)         (0.023, 0.975)
From the fitted cure models in the absence of covariates, it is observed that the mixture
and non-mixture GGD models fit the survival times well. The results indicate that the cure fraction
(p) is significant under both models. Also, the AIC value of the mixture cure model (193.9) is
lower than that of the non-mixture cure model (224.5). Table 3 gives the estimates of the
regression model under the Bayesian approach for both models in the presence of covariates.
Table 3. Posterior summaries including cure fraction p (in the presence of covariates)

Model                     Parameter         Posterior Mean   SD       95% Credible Interval
Mixture Cure Model        α0 (intercept)     0.0002          0.102    (-0.192, 0.195)
                          α1 (sex)           0.0013          0.099    (-0.196, 0.191)
                          α2 (treatment)    -0.0005          0.100    (0.189, 0.199)
                          α3 (age)           0.0029          0.098    (-0.189, 0.198)
                          β0 (intercept)     0.0009          0.101    (-0.199, 0.199)
                          β1 (sex)          -0.0032          0.098    (-0.198, 0.186)
                          β2 (treatment)    -0.0011          0.101    (0.194, 0.201)
                          β3 (age)          -0.0007          0.098    (-0.188, 0.192)
                          c                  11.55           113.10   (0, 55.97)
                          θ                  7.037           60.71    (0, 29.75)
Non-Mixture Cure Model    α0 (intercept)    -0.0010          0.100    (-0.194, 0.193)
                          α1 (sex)           0.0007          0.100    (-0.196, 0.198)
                          α2 (treatment)    -0.0008          0.099    (0.196, 0.194)
                          α3 (age)           0.0007          0.100    (-0.195, 0.201)
                          β0 (intercept)    -0.0015          0.101    (-0.199, 0.199)
                          β1 (sex)           0.0002          0.099    (-0.195, 0.196)
                          β2 (treatment)    -0.0002          0.100    (0.197, 0.200)
                          β3 (age)           0.0008          0.100    (-0.194, 0.195)
                          c                  10.11           96.03    (0, 43.94)
                          θ                  12.95           130.8    (0, 64.01)
From Table 3, we observe that similar results are obtained for both models. A comparison
reveals that the 95% credible intervals for α2 and β2 do not include zero, suggesting that
the IFN treatment has a significant effect on survival and cure probability, while the covariates
age and sex have no significant effect on survival and cure probability. Figure 2
shows the dynamic trace and posterior density plots of the cure fraction under the mixture and
non-mixture cure models. The trace plots indicate that the Markov chains have stabilized with good
mixing and hence that the MCMC algorithm has converged, and the kernel density plots estimate the
posterior marginal distribution.
Figure 2: Trace plots for convergence diagnostics and marginal posterior kernel density plots.
(a) Dynamic trace and posterior density plot of the cure fraction under the mixture model (p sample: 4000).
(b) Dynamic trace and posterior density plot of the cure fraction under the non-mixture model (p sample: 6000).
4. DISCUSSION
The purpose of this study is to show the utility of the Generalized Gompertz Distribution under
the cure model. Cure models have been well developed in the statistical literature but are not as
common in the clinical literature. For diseases like cancer in which some patients are long-term
survivors, cure models can provide an interesting way to characterize and study survival. These
models are generally used to model lifetime data with long-term survivors. Although cure models
were first proposed by Boag in 1949, they are still widely used in survival analysis, as they
provide a measure to monitor trends and differences in survival of curable diseases. These are the
only models which provide an estimate of the proportion of cured patients and of the distribution
of survival times of uncured patients simultaneously. They are appropriate when the studied
population is a mixture of cured individuals (who do not experience the event) and susceptible
individuals (who may experience the event). There are two classes of cure models, mixture and
non-mixture models, both of which can describe short-term and long-term effects. One advantage of
these models, besides estimating the cure rate, is that they reduce to a common survival model in
the absence of cured patients, i.e. if the study or follow-up time is long enough, these models
remain reliable. They also incorporate several distributions that are widely used in lifetime data
analysis, allowing flexibility in modeling monotone and non-monotone hazard rates.
To show the utility of the Generalized Gompertz Distribution under the cure model, we have
developed a cure fraction regression model using this distribution. As the Kaplan-Meier survival
curve shows a long and stable plateau with heavy censoring at the tail, it may be taken as
evidence of a cured fraction. Mixture and non-mixture cure models are fitted to a dataset of 284
melanoma cancer patients with the Generalized Gompertz Distribution as the baseline survival
function. A descriptive comparison reveals that the mixture cure model fits better than the
non-mixture cure model, which matches the findings of Achcar et al. (2012). The treatment effect
is significant under both models, implying that IFN treatment improves the cure rate and survival
of patients, which was also shown by Kirkwood et al. (1996).
As discussed above, some alternative parametric distributions could also be considered that
provide more flexibility in the shape of excess mortality/relative survival functions, while still
giving reliable estimates of the cure fraction.