Fits a "rocTree" model.
rocTree(formula, data, id, subset, ensemble = TRUE, splitBy = c("dCON", "CON"), control = list())
formula | is a formula object, with the response on the left of a '~' operator, and the terms on the right. The response must be a survival object returned by the 'Surv' function. |
---|---|
data | is an optional data frame in which to interpret the variables occurring in the 'formula'. |
id | is an optional vector of subject identifiers used to group the longitudinal observations that belong to the same subject. The length of 'id' should be the same as the total number of observations. If 'id' is missing, each row of `data` is treated as a distinct subject and all covariates are treated as baseline covariates. |
subset | is an optional vector specifying a subset of observations to be used in the fitting process. |
ensemble | is an optional logical value. If 'TRUE' (the default), the ensemble method is fitted; if 'FALSE', a single survival tree is fitted. |
splitBy | is a character string specifying the splitting algorithm. The available options are 'CON' and 'dCON' corresponding to the splitting algorithm based on the total concordance measure or the difference in concordance measure, respectively. The default value is 'dCON'. |
control | a list of control parameters. See 'details' for important special features of control parameters. |
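
The vector default `splitBy = c("dCON", "CON")` in the usage line follows a common R idiom in which the first element acts as the default. The sketch below illustrates that idiom with base R's `match.arg()`. The function `pick_split()` is a hypothetical stand-in, and whether `rocTree()` resolves the argument this way internally is an assumption.

```r
# Hypothetical stand-in for the argument handling; not rocTree's actual code.
pick_split <- function(splitBy = c("dCON", "CON")) {
  # match.arg() returns the first choice when the argument is left at its
  # vector default, and rejects values that are not among the choices.
  match.arg(splitBy)
}

default_choice <- pick_split()        # resolves to the first listed option
explicit_choice <- pick_split("CON")  # an explicit, valid choice
```

Under this idiom, a value outside the listed options (for example `pick_split("abc")`) raises an error instead of being silently accepted.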
An object of S4 class "rocTree" representing the fit, with the following components:
The argument "control" defaults to a list with the following values:
tau
is the maximum follow-up time; default value is the 90th percentile of the unique observed survival times.
maxTree
is the number of survival trees to be used in the ensemble method (when ensemble = TRUE).
maxNode
is the maximum node number allowed to be in the tree; the default value is 500.
numFold
is the number of folds used in the cross-validation. When numFold > 0, the survival tree will be pruned; when numFold = 0, the unpruned survival tree will be presented. The default value is 10.
h
is the smoothing parameter used in the kernel; the default value is tau / 20.
minSplitTerm
is the minimum number of baseline observations in each terminal node; the default value is 15.
minSplitNode
is the minimum number of baseline observations in each splittable node; the default value is 30.
disc
is a logical vector specifying whether the covariates in formula are discrete (TRUE) or continuous (FALSE). The length of disc should be the same as the number of covariates in formula. When not specified, the rocTree() function assumes that all covariates are continuous.
K
is the number of time points on which the concordance measure is computed. A coarser time grid (smaller K) generally runs faster, but a very small K is not recommended. The default value is 20.
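
The defaults listed above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: `make_control()` and the toy `time` vector are hypothetical, the sketch assumes that the vector passed as `time` already contains the observed survival times meant in the description of tau, and `maxTree` is left out because no default is stated for it.

```r
# Hypothetical helper (not part of rocTree): build a control list from the
# documented defaults, then overlay any user-supplied values.
make_control <- function(time, n_covariates, control = list()) {
  # tau: 90th percentile of the unique observed times (documented default),
  # unless the user supplies a value.
  tau <- if (is.null(control$tau)) {
    unname(quantile(unique(time), 0.9))
  } else {
    control$tau
  }
  defaults <- list(
    tau = tau,
    maxNode = 500,
    numFold = 10,
    h = tau / 20,                     # smoothing parameter default
    minSplitTerm = 15,
    minSplitNode = 30,
    disc = rep(FALSE, n_covariates),  # all covariates continuous
    K = 20
  )
  out <- modifyList(defaults, control)
  # disc must have one entry per covariate in the formula
  stopifnot(length(out$disc) == n_covariates)
  out
}

time <- c(1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10)  # toy data for illustration only
ctrl <- make_control(time, n_covariates = 2, control = list(numFold = 0))
```

Here `ctrl$numFold` is 0, which per the description above requests the unpruned tree, while every other entry keeps its documented default.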
Sun, Y. and Wang, M.C. (2018+). ROC-guided classification and survival trees. Technical report.
See print.rocTree and plot.rocTree for printing and plotting an rocTree object, respectively.
library(survival)  # provides Surv()
library(rocTree)
data(simDat)

## Fitting a pruned survival tree
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = FALSE)
#> ROC-guided survival tree
#>
#> node), split
#> * denotes terminal node
#>
#> Root
#>  ¦--2) z1 <= 0.32338*
#>  °--3) z1 > 0.32338
#>      ¦--6) z2 <= 0.60199*
#>      °--7) z2 > 0.60199*
#>

## Fitting an unpruned survival tree
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = FALSE,
        control = list(numFold = 0))
#> ROC-guided survival tree
#>
#> node), split
#> * denotes terminal node
#>
#> Root
#>  ¦--2) z1 <= 0.32338
#>  ¦   ¦--4) z1 <= 0.16418*
#>  ¦   °--5) z1 > 0.16418*
#>  °--3) z1 > 0.32338
#>      ¦--6) z2 <= 0.60199
#>      ¦   ¦--12) z2 <= 0.22388*
#>      ¦   °--13) z2 > 0.22388*
#>      °--7) z2 > 0.60199*
#>

# NOT RUN {
## Fitting the ensemble algorithm (default)
rocTree(Surv(Time, death) ~ z1 + z2, id = id, data = simDat, ensemble = TRUE)
# }