rocTree package
vignettes/rocTree-sim.Rmd
In this vignette, we demonstrate how to use the simu function in the rocTree package to generate simulated data from various scenarios.
Let \(Z\) be a \(p\)-dimensional vector of possibly time-dependent covariates and \(\beta\) be the vector of regression coefficients. The function simu generates survival times (\(T\)) under the following scenarios:
Scenario 1.1, proportional hazards model:
Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t)\exp\{-0.5Z_1 + 0.5Z_2 - 0.5Z_3 + \ldots + 0.5Z_{10}\},\] with \(\lambda_0(t)=2t\).
Scenario 1.2, proportional hazards model with noise variables:
Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t)\exp\{2Z_1 + 2Z_2 + 0\cdot Z_3 + 0\cdot Z_4 + \ldots + 0\cdot Z_{10}\},\] with \(\lambda_0(t)=2t\).
Scenario 1.3, proportional hazards model with nonlinear covariate effects:
Survival times are generated from the hazard function \[\lambda(t|Z) = \lambda_0(t) \exp\{2\sin(2\pi Z_1) + 2 |Z_2 - 0.5|\}, \] with \(\lambda_0(t)=2t\).
Scenario 1.4, accelerated failure time model:
Survival times are generated from \[\log(T) = -2 + 2Z_1 + 2Z_2 + \epsilon, \] where \(\epsilon\sim\mbox{N}(0, 0.5^2)\).
Scenario 1.5, generalized gamma family:
Survival times are generated from \[T = e^{\sigma\omega}, \] where \(\omega = \log(Q^2g)/Q\), \(g\) follows gamma\((Q^{-2}, 1)\), \(\sigma = 2Z_1\), \(Q=2Z_2\).
Scenario 2.1, dichotomous time-dependent covariate with at most one change in value:
Survival times are generated from the hazard function \[\lambda(t|Z(t)) = e^{2Z_1(t) + 2Z_2}, \] where \(Z_1(t) = \theta I(t\ge U_0) + (1 - \theta)I(t<U_0)\), \(\theta\) is a Bernoulli variable with equal probability, and \(U_0\) follows a uniform distribution over \([0,1]\).
Scenario 2.2, dichotomous time-dependent covariate with multiple jumps:
Survival times are generated from the hazard function \[\lambda(t|Z(t)) = e^{2Z_1(t) + 2Z_2}, \] where \(Z_1(t) = \theta\left[I(U_1 \le t < U_2) + I(U_3\le t)\right] + (1 - \theta)\left[I(t < U_1) + I(U_2\le t < U_3)\right]\), \(\theta\) is a Bernoulli variable with equal probability and \(U_1\le U_2\le U_3\) are the first three terms of a stationary Poisson process with rate 10.
Scenario 2.3, proportional hazards model with a continuous time-dependent covariate:
Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 e^{Z_1(t) + Z_2}, \] where \(Z_1(t)=kt+b\), \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).
Scenario 2.4, non-proportional hazards model with a continuous time-dependent covariate:
Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 \cdot\left[1 + \sin\{Z_1(t) + Z_2\}\right],\] where \(Z_1(t)=kt+b\), \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).
Scenario 2.5, non-proportional hazards model with a nonlinear time-dependent covariate:
Survival times are generated from the hazard function \[\lambda(t|Z(t)) = 0.1 \cdot\left[1 + \sin\{Z_1(t) + Z_2\}\right],\] where \(Z_1(t)=2kt\cdot\{I(t>5) - 1\}\), \(k\) and \(b\) are independent uniform random variables over \([1, 2]\).
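To illustrate how survival times can be drawn under the proportional hazards scenarios, note that with \(\lambda_0(t)=2t\) the cumulative hazard is \(\Lambda(t|Z) = t^2\exp(\beta^\top Z)\). Solving \(\Lambda(T|Z) = -\log V\) for \(V\sim\mbox{Uniform}(0,1)\) gives \(T = \sqrt{-\log V/\exp(\beta^\top Z)}\). The sketch below applies this inverse-transform idea to Scenario 1.2. It is not the internal code of simu: the helper name sim_ph is hypothetical, and the covariates are assumed to be independent Uniform(0, 1) draws, which this vignette does not state.

```r
# Hypothetical helper (not part of rocTree): draw uncensored survival times
# under Scenario 1.2 by inverse-transform sampling.
sim_ph <- function(n, beta = c(2, 2, rep(0, 8))) {
  p <- length(beta)
  # Assumption: covariates are independent Uniform(0, 1) draws
  Z <- matrix(runif(n * p), nrow = n, ncol = p)
  colnames(Z) <- paste0("z", seq_len(p))
  eta <- drop(Z %*% beta)              # linear predictor beta' Z
  V <- runif(n)
  # Lambda(t | Z) = t^2 * exp(eta); solve Lambda(T | Z) = -log(V) for T
  Time <- sqrt(-log(V) / exp(eta))
  data.frame(id = seq_len(n), Time = Time, Z)
}

set.seed(2019)
dat <- sim_ph(200)
head(dat[, c("id", "Time", "z1", "z2")])
```

Scenarios 1.1 and 1.3 could be handled the same way by replacing eta with the corresponding log relative hazard. Censoring is not included in this sketch.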
simu function
The simu function can be used to generate survival times from the above scenarios. The complete list of arguments in simu is as follows:
n is an integer value indicating the number of subjects.
cen is a numeric value indicating the censoring percentage; three levels, 0%, 25%, and 50%, are allowed.
scenario can be either a numeric value or a character string. This indicates the simulation scenario noted above.
summary is a logical value indicating whether a brief data summary will be printed.
The simu function returns the simulated data as a data frame with the following columns:
id is the subject id.
Time is the observed follow-up time.
death is the death indicator; death = 1 if an event (death) occurs and death = 0 if censored.
z1–z10 are the possibly time-dependent covariates.
k, b, U are the latent variables used to generate \(Z_1(t)\) in Scenarios 2.1–2.5.
We first generate a small dataset with n = 5 and a 25% censoring rate under Scenario 1.2.
> set.seed(2019)
> dat1 <- simu(n = 5, cen = 0.25, sce = 1.2, summary = TRUE)
Summary results:
Number of subjects: 5
Number of subjects experienced death: 4
Number of covariates: 10
Time independent covaraites: z1 z2 z3 z4 z5 z6 z7 z8 z9 z10
Number of unique observation times: 5
Median survival time: 0.4231
> dat1
id Time death z1 z2 z3 z4
1 1 0.0931342 0 0.76990155 0.043218804 0.7698180 0.63545297
2 1 0.1464104 0 0.76990155 0.043218804 0.7698180 0.63545297
3 1 0.3397603 0 0.76990155 0.043218804 0.7698180 0.63545297
4 1 0.4231000 1 0.76990155 0.043218804 0.7698180 0.63545297
5 2 0.0931342 0 0.71283973 0.820176206 0.6605425 0.06812013
6 2 0.1464104 0 0.71283973 0.820176206 0.6605425 0.06812013
7 2 0.3397603 1 0.71283973 0.820176206 0.6605425 0.06812013
8 3 0.0931342 0 0.30336020 0.009614496 0.2169243 0.70031486
9 3 0.1464104 0 0.30336020 0.009614496 0.2169243 0.70031486
10 3 0.3397603 0 0.30336020 0.009614496 0.2169243 0.70031486
11 3 0.4231000 0 0.30336020 0.009614496 0.2169243 0.70031486
12 3 1.2479563 1 0.30336020 0.009614496 0.2169243 0.70031486
13 4 0.0931342 0 0.61823636 0.102491504 0.1950175 0.37479527
14 4 0.1464104 0 0.61823636 0.102491504 0.1950175 0.37479527
15 5 0.0931342 1 0.05048374 0.608572199 0.6947276 0.46909425
z5 z6 z7 z8 z9 z10
1 0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
2 0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
3 0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
4 0.4059750 0.01103272 0.644595051 0.08595205 0.6123926 0.4948702
5 0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
6 0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
7 0.5814981 0.59250859 0.009253307 0.37178613 0.2572539 0.9159192
8 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
9 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
10 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
11 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
12 0.9472358 0.94830891 0.809916487 0.44626045 0.8335848 0.9305124
13 0.1710192 0.62405627 0.170301794 0.01181090 0.6711542 0.2094706
14 0.1710192 0.62405627 0.170301794 0.01181090 0.6711542 0.2094706
15 0.1318687 0.45561955 0.038503417 0.78263931 0.8071130 0.1992966
> class(dat1)
[1] "data.frame"
In this scenario, the covariate information was observed at Time = 0.0931, 0.146, 0.340, and 0.423 for subject #1, who died (death = 1) at Time = 0.423. Since the covariates are time-independent, their values do not change over time.
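Because each subject contributes one row per observation time, an analysis that needs a single record per subject can keep only each subject's last row, which carries the follow-up time and the final death indicator. A minimal base-R sketch on a small hand-made data frame with the same id, Time, and death columns (the values below are made up for illustration and are not simu output):

```r
# Toy data in the same long format as the simu output (values are made up)
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3),
  Time  = c(0.1, 0.3, 0.4, 0.1, 0.3, 0.1),
  death = c(0, 0, 1, 0, 0, 1)
)

# Sort by subject and time, then keep the last row of each subject
toy <- toy[order(toy$id, toy$Time), ]
last <- toy[!duplicated(toy$id, fromLast = TRUE), ]
last
```

The same two lines should apply to a data frame such as dat1, provided it has the id and Time columns shown above.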
The following code generates a small dataset with n = 5 and a 50% censoring rate under Scenario 2.1.
> set.seed(2019)
> dat2 <- simu(n = 5, cen = 0.5, sce = 2.1, summary = TRUE)
Summary results:
Number of subjects: 5
Number of subjects experienced death: 1
Number of covariates: 2
Time independent covaraites: z1.
Time dependent covaraites: z2.
Number of unique observation times: 5
Median survival time: NA
> dat2
id Time death z1 z2 e u
1 1 0.008826172 0 0 0.76990155 0 0.10792722
2 1 0.101672586 1 0 0.76990155 0 0.10792722
3 2 0.008826172 0 1 0.71283973 1 0.06421699
4 2 0.101672586 0 0 0.71283973 1 0.06421699
5 2 0.105494961 0 0 0.71283973 1 0.06421699
6 2 0.136815371 0 0 0.71283973 1 0.06421699
7 3 0.008826172 0 0 0.30336020 0 0.30429404
8 3 0.101672586 0 0 0.30336020 0 0.30429404
9 3 0.105494961 0 0 0.30336020 0 0.30429404
10 4 0.008826172 0 0 0.61823636 0 0.05418119
11 5 0.008826172 0 1 0.05048374 1 0.43387271
12 5 0.101672586 0 1 0.05048374 1 0.43387271
13 5 0.105494961 0 1 0.05048374 1 0.43387271
14 5 0.136815371 0 1 0.05048374 1 0.43387271
15 5 0.474006872 0 0 0.05048374 1 0.43387271
In this scenario, the covariate information was observed at Time = 0.00883 and 0.102 for subject #1, who died (death = 1) at Time = 0.102. Similarly, the covariate information was observed at Time = 0.00883, 0.102, 0.105, and 0.137 for subject #2, who was censored (death = 0) at Time = 0.137. Moreover, z1 is a time-dependent covariate and its value changed from 1 (at Time = 0.00883) to 0 (at Time \(\ge\) 0.102) for subject #2.
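The single jump in z1 makes the Scenario 2.1 hazard piecewise constant: it equals \(e^{2(1-\theta) + 2Z_2}\) before \(U_0\) and \(e^{2\theta + 2Z_2}\) from \(U_0\) onward. Survival times can therefore be drawn by generating a unit exponential variable and inverting the piecewise-linear cumulative hazard. The sketch below follows the hazard formula as stated for Scenario 2.1. It is not the internal code of simu: the helper name sim_td is hypothetical, \(Z_2\) is assumed to be Uniform(0, 1), and censoring is omitted.

```r
# Hypothetical helper (not part of rocTree): uncensored survival times under
# the Scenario 2.1 hazard exp(2 * Z1(t) + 2 * Z2).
sim_td <- function(n) {
  theta <- rbinom(n, 1, 0.5)       # Bernoulli variable with probability 0.5
  U0 <- runif(n)                   # change point, Uniform(0, 1)
  Z2 <- runif(n)                   # assumption: Uniform(0, 1)
  h_before <- exp(2 * (1 - theta) + 2 * Z2)   # hazard for t <  U0
  h_after  <- exp(2 * theta + 2 * Z2)         # hazard for t >= U0
  E <- rexp(n)                     # cumulative hazard reached at the event
  # Invert the piecewise-linear cumulative hazard
  Time <- ifelse(E < h_before * U0,
                 E / h_before,
                 U0 + (E - h_before * U0) / h_after)
  data.frame(id = seq_len(n), Time = Time, theta = theta, U0 = U0, Z2 = Z2)
}

set.seed(2019)
d <- sim_td(500)
head(d)
```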