Accelerated Failure Time with Smooth Rank Regression

Fits a semiparametric accelerated failure time (AFT) model using a rank-based approach. General weights, additional sampling weights, and fast sandwich variance estimation are also incorporated. Estimating equations are solved with the Barzilai-Borwein spectral method, implemented as BBsolve in package BB.

aftsrr(
  formula,
  data,
  subset,
  id = NULL,
  contrasts = NULL,
  weights = NULL,
  B = 100,
  rankWeights = c("gehan", "logrank", "PW", "GP", "userdefined"),
  eqType = c("is", "ns", "mis", "mns"),
  se = c("NULL", "bootstrap", "MB", "ZLCF", "ZLMB", "sHCF", "sHMB", "ISCF", "ISMB"),
  control = list()
)

Arguments

formula

a formula expression, of the form response ~ predictors. The response is a Surv object with right censoring. See the documentation of lm, coxph and formula for details.

data

an optional data frame in which to interpret the variables occurring in the formula.

subset

an optional vector specifying a subset of observations to be used in the fitting process.

id

an optional vector used to identify the clusters. If missing, each row of data is presumed to represent a distinct subject. The length of id should equal the number of observations.

contrasts

an optional list specifying the contrasts to be used for factor covariates.

weights

an optional vector of observation weights.

B

a numeric value specifying the number of resamples. When B = 0 or se = NULL, only the beta estimates are displayed.

rankWeights

a character string specifying the type of general weights. The following are permitted:

logrank: logrank weight
gehan: Gehan's weight
PW: Prentice-Wilcoxon weight
GP: GP class weight
userdefined: a user-defined weight provided as a vector with length equal to the number of subjects. This option is still under development.
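
For orientation, the first three weights can be written in the usual weighted log-rank form. The notation is an assumption made for this sketch and does not appear elsewhere on this page: $T_i$ is the observed time, $\Delta_i$ the event indicator, $X_i$ the covariate vector, and $\hat{S}_{\beta}$ the Kaplan-Meier estimate of the residual survival function. The GP and userdefined options are left out because their exact form is not stated here.

```latex
\begin{aligned}
U_{\phi}(\beta) &= \sum_{i=1}^{n} \Delta_i \, \phi_i(\beta)
  \left\{ X_i - \frac{\sum_{j=1}^{n} X_j \, I\{e_j(\beta) \ge e_i(\beta)\}}
                     {\sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\}} \right\},
  \qquad e_i(\beta) = \log T_i - X_i^{\top}\beta, \\
\phi_i(\beta) &= 1 && \text{(logrank)}, \\
\phi_i(\beta) &= n^{-1} \sum_{j=1}^{n} I\{e_j(\beta) \ge e_i(\beta)\} && \text{(Gehan)}, \\
\phi_i(\beta) &= \hat{S}_{\beta}\{e_i(\beta)\} && \text{(Prentice-Wilcoxon)}.
\end{aligned}
```

With the Gehan weight the ratio cancels, and the estimating function reduces to $n^{-1} \sum_i \sum_j \Delta_i (X_i - X_j) I\{e_j(\beta) \ge e_i(\beta)\}$.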

eqType

a character string specifying the type of the estimating equation used to obtain the regression parameters. The following are permitted:

is: Regression parameters are estimated by directly solving the induced-smoothing estimating equations. This is the default and recommended method.
ns: Regression parameters are estimated by directly solving the nonsmooth estimating equations.
mis: Regression parameters are estimated by iterating the monotonic smoothed Gehan-based estimating equations. This is typical when rankWeights = "PW" or rankWeights = "GP".
mns: Regression parameters are estimated by iterating the monotonic non-smoothed Gehan-based estimating equations. This is typical when rankWeights = "PW" or rankWeights = "GP".
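
To make the difference between the "ns" and "is" options concrete, here is a minimal base R sketch of a Gehan-type estimating function in both nonsmooth and induced-smoothed form. It is an illustration under stated assumptions, not the package's implementation: gehan_ee is a hypothetical helper, the smoothing uses an identity working covariance, and the single-covariate root is found with uniroot rather than BBsolve.

```r
## Hypothetical helper, not part of aftgee: Gehan-type estimating function
## for log(T) = X beta + error with right censoring (delta = 1 for events).
gehan_ee <- function(beta, logT, delta, X, smooth = TRUE) {
    X <- as.matrix(X)
    n <- nrow(X)
    e <- as.vector(logT - X %*% beta)       # residuals e_i(beta)
    U <- numeric(ncol(X))
    for (i in which(delta == 1)) {
        dX <- sweep(X, 2, X[i, ])           # row j holds X_j - X_i
        if (smooth) {
            r <- sqrt(rowSums(dX^2) / n)    # r_ij, assuming identity working covariance
            w <- pnorm((e - e[i]) / r)      # smooth stand-in for I(e_j >= e_i)
            w[r == 0] <- 0                  # those rows of dX are zero anyway
        } else {
            w <- as.numeric(e >= e[i])      # indicator I(e_j >= e_i)
        }
        U <- U - colSums(dX * w)            # adds sum_j (X_i - X_j) * w_ij
    }
    U / n^2
}

## Simulated data with one covariate and a true slope of 1
set.seed(1)
n <- 200
x <- rnorm(n)
tt <- exp(2 + x + rnorm(n))
cen <- runif(n, 0, 100)
logT <- log(pmin(tt, cen))
delta <- as.numeric(tt < cen)

## With one covariate, a bracketing root finder is enough for the smooth version
root <- uniroot(function(b) gehan_ee(b, logT, delta, x), c(-10, 10))$root
root
gehan_ee(root, logT, delta, x, smooth = FALSE)  # nonsmooth function at the same point
```

The smoothed version replaces the indicator with a normal distribution function, so it is continuous in beta and can be handed to a derivative-free equation solver. The nonsmooth version is a step function of beta.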

se

a character string specifying the estimating method for the variance-covariance matrix. The following are permitted:

NULL: if se is specified as NULL, the variance-covariance matrix will not be computed.
bootstrap: nonparametric bootstrap.
MB: multiplier resampling.
ZLCF: Zeng and Lin's approach with closed form $V$, see Details.
ZLMB: Zeng and Lin's approach with empirical $V$, see Details.
sHCF: Huang's approach with closed form $V$, see Details.
sHMB: Huang's approach with empirical $V$, see Details.
ISCF: Johnson and Strawderman's sandwich variance estimates with closed form $V$, see Details.
ISMB: Johnson and Strawderman's sandwich variance estimates with empirical $V$, see Details.

control

a list of control parameters for the equation solver, maxiter, tolerance, and the resampling variance estimation.

The available equation solvers are BBsolve and dfsane of the BB package. These functions are called with their default algorithm control parameters, except that users can specify the monotonicity parameter, M, via the control list. When M is specified, the merit parameter, noimp, is set to $10 \times M$. Readers are referred to the BB package for details. Instead of searching for the zero crossing, options including BBoptim and optim return the solution from maximizing the corresponding objective function.

When se = "bootstrap" or se = "MB", an additional argument parallel = TRUE can be specified to enable parallel computation. The number of CPU cores can be specified with parCl; the default is the integer value of detectCores() / 2.
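
As a hedged illustration of the parallel option described for control, the call below requests multiplier resampling on two cores. The simulated data frame and its column names are made up for this sketch, and only the control entries named in this section (parallel and parCl) are used.

```r
library(aftgee)
library(survival)

## Small simulated data set (hypothetical, for illustration only)
set.seed(1)
n <- 50
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
tt <- exp(2 + x1 + x2 + rnorm(n))
cen <- runif(n, 0, 100)
dat <- data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
                  x1 = x1, x2 = x2)

## Multiplier resampling, spread over two CPU cores
fit_par <- aftsrr(Surv(Time, status) ~ x1 + x2, data = dat,
                  se = "MB", B = 20,
                  control = list(parallel = TRUE, parCl = 2))
summary(fit_par)
```

Parallel computation only applies to the resampling-based choices se = "bootstrap" and se = "MB", so it is unlikely to pay off for small B.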

Value

aftsrr returns an object of class "aftsrr" representing the fit. An object of class "aftsrr" is a list containing at least the following components:

beta: A vector of beta estimates
covmat: A list of covariance estimates
convergence: An integer code indicating type of convergence.
bhist: When se = "MB", bhist gives the bootstrap samples.
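
The components listed above can be inspected directly on the fitted object. The snippet is a small sketch on simulated data (the data frame is hypothetical); it relies only on the component names documented in this section.

```r
library(aftgee)
library(survival)

## Small simulated data set (hypothetical, for illustration only)
set.seed(1)
n <- 50
x1 <- rbinom(n, 1, 0.5)
x2 <- rnorm(n)
tt <- exp(2 + x1 + x2 + rnorm(n))
cen <- runif(n, 0, 100)
dat <- data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
                  x1 = x1, x2 = x2)

fit <- aftsrr(Surv(Time, status) ~ x1 + x2, data = dat, se = "ISMB", B = 10)

class(fit)        # "aftsrr"
fit$beta          # vector of beta estimates
fit$convergence   # integer convergence code
str(fit$covmat)   # list of covariance estimates
```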

Details

When se = "bootstrap" or se = "MB", the variance-covariance matrix is estimated by resampling. Bootstrap samples that fail to converge are removed when computing the empirical variance matrix.

When the bootstrap is not called, we assume the variance-covariance matrix has a sandwich form $$\Sigma = A^{-1}V(A^{-1})^T,$$ where $V$ is the asymptotic variance of the estimating function and $A$ is the slope matrix. The package provides several methods to estimate the variance-covariance matrix via this sandwich form, depending on how $V$ and $A$ are estimated. Specifically, the asymptotic variance, $V$, can be estimated either by a closed-form formulation (CF) or by bootstrapping the estimating equations (MB). The methods to estimate the slope matrix $A$ are the induced smoothing approach (IS), Zeng and Lin's approach (ZL), and the smoothed Huang's approach (sH).
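
The sandwich form can be assembled in a few lines of base R once estimates of $A$ and $V$ are available. The matrices below are made-up 2 x 2 values used only to show the algebra; they do not come from any fitted model.

```r
## Hypothetical inputs, for illustration only
A <- matrix(c(2.0, 0.3,
              0.1, 1.5), nrow = 2, byrow = TRUE)   # slope matrix estimate
V <- matrix(c(1.0, 0.2,
              0.2, 0.5), nrow = 2, byrow = TRUE)   # variance of the estimating function

Ainv <- solve(A)
Sigma <- Ainv %*% V %*% t(Ainv)   # Sigma = A^{-1} V (A^{-1})^T

sqrt(diag(Sigma))                 # standard errors implied by Sigma
```

The six non-resampling choices of se differ only in how the two inputs are obtained: the prefix (IS, ZL, sH) selects the estimate of $A$ and the suffix (CF, MB) selects the estimate of $V$.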

References

Chiou, S., Kang, S. and Yan, J. (2014) Fast Accelerated Failure Time Modeling for Case-Cohort Data. Statistics and Computing, 24(4): 559--568.

Chiou, S., Kang, S. and Yan, J. (2014) Fitting Accelerated Failure Time Models in Routine Survival Analysis with R Package aftgee. Journal of Statistical Software, 61(11): 1--23.

Huang, Y. (2002) Calibration Regression of Censored Lifetime Medical Cost. Journal of the American Statistical Association, 97, 318--327.

Johnson, L. M. and Strawderman, R. L. (2009) Induced Smoothing for the Semiparametric Accelerated Failure Time Model: Asymptotics and Extensions to Clustered Data. Biometrika, 96, 577--590.

Varadhan, R. and Gilbert, P. (2009) BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function. Journal of Statistical Software, 32(4): 1--26.

Zeng, D. and Lin, D. Y. (2008) Efficient Resampling Methods for Nonsmooth Estimating Functions. Biostatistics, 9, 355--363.

Examples

## Simulate data from an AFT model
datgen <- function(n = 100) {
    x1 <- rbinom(n, 1, 0.5)
    x2 <- rnorm(n)
    e <- rnorm(n)
    tt <- exp(2 + x1 + x2 + e)
    cen <- runif(n, 0, 100)
    data.frame(Time = pmin(tt, cen), status = 1 * (tt < cen),
               x1 = x1, x2 = x2, id = 1:n)
}
set.seed(1); dat <- datgen(n = 50)
summary(aftsrr(Surv(Time, status) ~ x1 + x2, data = dat, se = c("ISMB", "ZLMB"), B = 10))
#> Call:
#> aftsrr(formula = Surv(Time, status) ~ x1 + x2, data = dat, B = 10, 
#>     se = c("ISMB", "ZLMB"))
#> 
#> Variance Estimator: ISMB
#>    Estimate StdErr z.value p.value    
#> x1    1.012  0.328   3.084   0.002 ** 
#> x2    0.817  0.079  10.301  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Variance Estimator: ZLMB
#>    Estimate StdErr z.value p.value    
#> x1    1.012  0.294   3.443   0.001 ***
#> x2    0.817  0.077  10.675  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

## Data set with sampling weights
data(nwtco, package = "survival")
subinx <- sample(1:nrow(nwtco), 668, replace = FALSE)
nwtco$subcohort <- 0
nwtco$subcohort[subinx] <- 1
pn <- mean(nwtco$subcohort)
nwtco$hi <- nwtco$rel + ( 1 - nwtco$rel) * nwtco$subcohort / pn
nwtco$age12 <- nwtco$age / 12
nwtco$study <- factor(nwtco$study)
nwtco$histol <- factor(nwtco$histol)
sub <- nwtco[subinx,]
fit <- aftsrr(Surv(edrel, rel) ~ histol + age12 + study, id = seqno,
              weights = hi, data = sub, B = 10, se = c("ISMB", "ZLMB"),
              subset = stage == 4)
summary(fit)
#> Call:
#> aftsrr(formula = Surv(edrel, rel) ~ histol + age12 + study, data = sub, 
#>     subset = stage == 4, id = seqno, weights = hi, B = 10, se = c("ISMB", 
#>         "ZLMB"))
#> 
#> Variance Estimator: ISMB
#>         Estimate StdErr z.value p.value    
#> histol2   -4.824  1.081  -4.462  <2e-16 ***
#> age12     -0.147  0.170  -0.864   0.387    
#> study4     1.999  0.697   2.868   0.004 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Variance Estimator: ZLMB
#>         Estimate StdErr z.value p.value    
#> histol2   -4.824  0.401 -12.024  <2e-16 ***
#> age12     -0.147  0.060  -2.458   0.014 *  
#> study4     1.999  0.331   6.034  <2e-16 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1