In this vignette, we demonstrate how to create a recurrent event object with the Recur() function from the reda package (Wang et al. 2021). The Recur() function is imported when the reReg package is loaded. The Recur object bundles together a set of recurrent event times, a failure time, and a censoring status, with the convenience that it can be used as the response in model formulas in the reReg package. We illustrate the usage of Recur() with the cgd data set from the survival package (Therneau 2021) and the readmission data set from the frailtypack package (Rondeau, Mazroui, and González 2012; González et al. 2005).

> library(reReg)
> packageVersion("reReg")
[1] '1.4.6'
> data(readmission, package = "frailtypack")
> head(readmission)
  id enum t.start t.stop time event      chemo    sex dukes charlson death
1  1    1       0     24   24     1    Treated Female     D        3     0
2  1    2      24    457  433     1    Treated Female     D        0     0
3  1    3     457   1037  580     0    Treated Female     D        0     0
4  2    1       0    489  489     1 NonTreated   Male     C        0     0
5  2    2     489   1182  693     0 NonTreated   Male     C        0     0
6  3    1       0     15   15     1 NonTreated   Male     C        3     0
> readmission <- subset(readmission, !(id %in% c(60, 109, 280)))

The Recur interface

The Recur() function is modeled after the Surv() function in the survival package (Therneau 2021). The function interface of Recur() is

> args(Recur)
function (time, id, event, terminal, origin, check = c("hard", 
    "soft", "none"), ...) 
NULL

The six arguments are

  • time: event and censoring times.

    It can be a vector of recurrent event and censoring times, or a list of time intervals containing the start and end time of each interval. In the latter case, the intervals are assumed to be open on the left and closed on the right, where the right endpoints are the times of recurrent events and censoring.

  • id: subject’s id.

    It can be a numeric, character, or factor vector.
    If it is left unspecified, Recur() will assume that each row represents a subject.

  • event: event indicator of recurrent events.

    This is a numeric vector that represents the types of the recurrent events. A logical vector is also allowed and is converted to a numeric vector. Non-positive values are internally converted to zero, indicating censoring status.

  • terminal: event indicator of terminal events.

    This is a numeric vector that represents the status of the terminal event. A logical vector is also allowed and is converted to a numeric vector. Non-positive values are internally converted to zero, indicating censoring status. If a scalar value is specified, all subjects will have the same terminal event status at their last recurrent episodes. The length of the specified terminal should equal either the number of subjects or the number of data rows. In the latter case, each subject may have at most one positive entry of terminal, at the last recurrent episode.

  • origin: time origin of subjects.

    This is a numeric vector indicating the time origin of each subject. If a scalar value is specified, all subjects will have the same origin at the specified value. The length of the specified origin should equal either the number of subjects or the number of data rows. In the latter case, different subjects may have different origins, but all records of the same subject must share the same origin. In addition to numeric values, Date and difftime values are also supported and converted to numeric values.

  • check: indicates how to run the data checking procedure.

    This is a character value specifying how to perform the checks for recurrent event data. Errors or warnings will be thrown, respectively, if the check is specified to be "hard" (default) or "soft". If check = "none" is specified, no data checking procedure will be run.
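To make the argument list concrete, the sketch below re-enters the first five rows of the readmission data printed earlier (subjects 1 and 2) and passes time in the two documented forms. The helper expand_origin() is hypothetical and not part of reda; it only mirrors the documented length rule for origin (a scalar, one value per subject, or one value per row). The Recur() calls run only when reda is installed.

```r
# The first five rows of readmission printed above (subjects 1 and 2).
toy <- data.frame(
  id      = c(1, 1, 1, 2, 2),
  t.start = c(0, 24, 457, 0, 489),
  t.stop  = c(24, 457, 1037, 489, 1182),
  event   = c(1, 1, 0, 1, 0),  # recurrent event indicator
  death   = c(0, 0, 0, 0, 0)   # terminal event indicator
)

# Hypothetical helper (not part of reda) mirroring the documented length
# rule for `origin`: a scalar, one value per subject, or one value per row.
expand_origin <- function(origin, id) {
  subjects <- unique(id)
  if (length(origin) == 1) return(rep(origin, length(id)))
  if (length(origin) == length(id)) return(origin)
  if (length(origin) == length(subjects)) return(origin[match(id, subjects)])
  stop("`origin` must have length 1, the number of subjects, or the number of rows")
}

expand_origin(c(5, 10), toy$id)  # one origin per subject, expanded to rows

# Requires reda: two ways of passing `time` that describe the same intervals.
if (requireNamespace("reda", quietly = TRUE)) {
  library(reda)
  print(with(toy, Recur(t.stop, id, event, death)))
  print(with(toy, Recur(t.start %to% t.stop, id, event, death)))
}
```

With one origin per subject, expand_origin(c(5, 10), toy$id) returns 5, 5, 5, 10, 10, so every record of a subject shares that subject's origin.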

The Recur object

When the time origin is zero for all subjects, as in the readmission data set, the time argument can be specified either as time = t.stop or as time = t.start %to% t.stop, where the infix operator %to% creates a list of two elements containing the endpoints of the time intervals. When check = "hard" or check = "soft", the Recur() function performs an internal check for possible issues in the data structure. If check = "hard" (default), Recur() stops with an error message as soon as a check fails. In contrast, Recur() proceeds with a warning message when check = "soft", and without any message when check = "none". The checking criteria include the following:
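Two points from this paragraph can be sketched in base R, assuming rows are sorted by time within each subject. First, start_from_stop() (a hypothetical helper, not reda's code) recovers interval start times from stop times under a zero origin, which is why time = t.stop and time = t.start %to% t.stop describe the same intervals. Second, check_intervals() (also hypothetical) shows how the "hard", "soft", and "none" modes can be dispatched; the single rule it checks, start < stop, is an assumed example and not reda's full list of criteria.

```r
# Hypothetical helper (not reda's code): rebuild interval start times from
# stop times. Assumes rows are sorted by time within subject; each subject's
# first interval starts at `origin`, later ones at the previous stop time.
start_from_stop <- function(stop_time, id, origin = 0) {
  prev <- ave(stop_time, id, FUN = function(x) c(NA, head(x, -1)))
  ifelse(is.na(prev), origin, prev)
}

# Hypothetical sketch of the three `check` modes for ONE assumed rule:
# every interval must satisfy start < stop.
check_intervals <- function(start, stop_time, check = c("hard", "soft", "none")) {
  check <- match.arg(check)
  if (check == "none") return(invisible(TRUE))   # skip all checking
  bad <- which(!(start < stop_time))
  if (length(bad) == 0) return(invisible(TRUE))
  msg <- paste("start >= stop in row(s):", paste(bad, collapse = ", "))
  if (check == "hard") stop(msg, call. = FALSE)  # "hard": error, stop
  warning(msg, call. = FALSE)                    # "soft": warn, continue
  invisible(FALSE)
}

# t.stop and id of the six rows printed by head(readmission)
t_stop  <- c(24, 457, 1037, 489, 1182, 15)
ids     <- c(1, 1, 1, 2, 2, 3)
t_start <- start_from_stop(t_stop, ids)
check_intervals(t_start, t_stop)  # passes silently
```

Here t_start equals 0, 24, 457, 0, 489, 0, which is exactly the t.start column of those six rows.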

The Recur() function matches the arguments by position when the argument names are not specified. Among all the arguments, only time has no default value and must be specified by users. The default value for the argument id is seq_along(time); thus, when id is not specified, Recur() assumes that each row gives the time point for a different subject. However, relying on the default id defeats the purpose of using recurrent event methods. The default value for the argument event is a numeric vector, where the values 0 and 1 indicate whether the endpoint of each time interval in time is a non-recurrent event or a recurrent event, respectively. The event argument can accommodate more than one type of recurrent event; in this case, the reference level (value 0) indicates a non-recurrent event. On the other hand, a zero vector is used as the default value for the arguments terminal and origin.
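The defaults can be mimicked in a few lines of base R. The function default_recur_columns() is hypothetical and only illustrates the documented behavior: the last record of each subject becomes a censoring time (event = 0), earlier records become recurrent events (event = 1), and terminal and origin default to zero. It assumes rows are sorted by time within each subject.

```r
# Hypothetical illustration of the documented defaults (not reda's code).
# Assumes rows are sorted by time within each subject.
default_recur_columns <- function(time, id = seq_along(time)) {
  is_last <- !duplicated(id, fromLast = TRUE)  # last record of each subject
  data.frame(
    time     = time,
    id       = id,
    event    = as.numeric(!is_last),  # 1 = recurrent event, 0 = censoring
    terminal = 0,                     # documented default
    origin   = 0                      # documented default
  )
}

# Stop times of the first cgd subject, with and without an id.
with_id    <- default_recur_columns(c(219, 373, 414), id = c(1, 1, 1))
without_id <- default_recur_columns(c(219, 373, 414))
with_id$event     # 1 1 0
without_id$event  # 0 0 0
```

With a shared id, the event column is 1, 1, 0, matching the first cgd subject printed below. With the default id, every row is its own subject, so every record is treated as censored, which is why the default id defeats the purpose of a recurrent event analysis.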

The default values in Recur() are chosen so that Recur() can be conveniently adopted in common situations. For example, when the recurrent events are observed continuously and there are no terminal events, the event and terminal arguments can be left unspecified. In this case, the last entry within each subject is treated as a censoring time. One example is the cgd data from the survival package, where the recurrent event is a serious infection observed in a placebo-controlled trial of gamma interferon in chronic granulomatous disease. No terminal event is defined in the cgd data, and the patients were observed through the end of the study. For this data set, the Recur object can be constructed as below:

> data(cgd, package = "survival")
> (recur1 <- with(cgd, Recur(tstart %2% tstop, id)))
...
  [1] 1: (0, 219], (219, 373], (373, 414+]      
  [2] 2: (0, 8], (8, 26], ..., (350, 439+]      
  [3] 3: (0, 382+]                              
  [4] 4: (0, 388+]                              
  [5] 5: (0, 246], (246, 253], (253, 383+]      
  [6] 6: (0, 364+]                              
  [7] 7: (0, 292], (292, 364+]                  
  [8] 8: (0, 363+]                              
  [9] 9: (0, 294], (294, 349+]                  
 [10] 10: (0, 371+]                             
...

For each subject, the printed Recur object shows the intervals from the time origin or the previous event to the next event, which may be a recurrent event, a terminal event, or censoring. The Recur object for the readmission data set can be constructed as below:

> (recur2 <- with(readmission, Recur(t.stop, id, event, death)))
...
  [1] 1: (0, 24], (24, 457], (457, 1037+]           
  [2] 2: (0, 489], (489, 1182+]                     
  [3] 3: (0, 15], (15, 783*]                        
  [4] 4: (0, 163], (163, 288], ..., (686, 2048+]    
  [5] 5: (0, 1134], (1134, 1144+]                   
  [6] 6: (0, 627], (627, 1190], ..., (1406, 1407+]  
  [7] 7: (0, 38], (38, 42], ..., (63, 1049+]        
  [8] 8: (0, 1466*]                                 
  [9] 9: (0, 148], (148, 1474+]                     
 [10] 10: (0, 1113+]                                
...

The readmission example above shows that patient id #1 experienced two hospital readmissions and was censored at t = 1037 (days). The + at t = 1037 indicates that the follow-up time was censored, i.e., this patient did not experience the terminal event (death) by t = 1037. Similarly, patient id #3 had one readmission and died at t = 783 (days), as indicated by the * at 783. On the other hand, patient id #4 had at least three readmissions and was censored at t = 2048 (days); some of these intervals were replaced by ... to avoid printing results wider than the screen. The number of intervals to be printed can be tuned through options() with the option reda.Recur.maxPrint.
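The printed format can be reproduced with a small formatter. format_subject() below is a hypothetical sketch inferred from the output above, not reda's print method: + marks a censored last interval, * marks a terminal event, and, as an assumption, a subject with more than max_print intervals keeps its first max_print - 1 intervals, then ..., then the last interval. Setting max_print = 3 reproduces the lines shown above; in reda itself the corresponding setting is the reda.Recur.maxPrint option.

```r
# Hypothetical formatter (not reda's print method), inferred from the
# printed output: "+" = censored last interval, "*" = terminal event.
format_subject <- function(id, start, end, event, terminal, max_print = 3) {
  n <- length(end)
  mark <- rep("", n)
  # Only the last interval of a subject carries a "+" or "*" marker.
  if (terminal[n] > 0) {
    mark[n] <- "*"
  } else if (event[n] <= 0) {
    mark[n] <- "+"
  }
  intervals <- sprintf("(%s, %s%s]", start, end, mark)
  # Assumed elision rule: first (max_print - 1) intervals, "...", last one.
  if (n > max_print) {
    intervals <- c(head(intervals, max_print - 1), "...", tail(intervals, 1))
  }
  paste0(id, ": ", paste(intervals, collapse = ", "))
}

# Subjects 1 and 3 of the readmission data, as printed above.
cat(format_subject(1, c(0, 24, 457), c(24, 457, 1037), c(1, 1, 0), c(0, 0, 0)), "\n")
cat(format_subject(3, c(0, 15), c(15, 783), c(1, 0), c(0, 1)), "\n")
```

The two calls print "1: (0, 24], (24, 457], (457, 1037+]" and "3: (0, 15], (15, 783*]", matching the first and third lines of the readmission printout.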

The Recur output

Recur() returns an S4 object of class Recur representing the model response for recurrent event data. The following shows the structure of the Recur object created for the cgd data.

> str(recur1)
Formal class 'Recur' [package "reda"] with 9 slots
  ..@ .Data     : num [1:203, 1:6] 0 219 373 0 8 26 152 241 249 322 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:6] "time1" "time2" "id" "event" ...
  ..@ call      : language Recur(time = tstart %2% tstop, id = id)
  ..@ ID        : chr [1:128] "1" "2" "3" "4" ...
  ..@ ord       : int [1:203] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ rev_ord   : int [1:203] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ first_idx : int [1:128] 1 4 12 13 14 17 18 20 21 23 ...
  ..@ last_idx  : int [1:128] 3 11 12 13 16 17 19 20 22 23 ...
  ..@ check     : chr "hard"
  ..@ time_class: chr "integer"
  ..$ dim     : int [1:2] 203 6
  ..$ dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "time1" "time2" "id" "event" ...

The slots of the Recur S4-class are

  • .Data: a numerical matrix with columns time1, time2, id, event, terminal, and origin.
  • call: a function call producing the Recur object.
  • ID: a character vector storing the original subject IDs.
  • ord: indices that sort the rows of the response matrix. Sorting is in increasing order by id, time2, and -event.
  • rev_ord: indices that revert the response matrix, sorted increasingly by ord, to its original ordering.
  • first_idx: indices that indicate the first record of each subject in the sorted matrix.
  • last_idx: indices that indicate the last record of each subject in the sorted matrix.
  • check: a character string that records the check argument specified in Recur().
  • time_class: a character string recording the class of the original time values (e.g., "integer" for the cgd data), which identifies times originally specified as calendar dates.
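The index slots can be illustrated with base R's order(). The sketch below uses a shuffled copy of the first five readmission rows; it is not reda's internal code, but it follows the slot descriptions above: ord sorts by id, time2, and -event, rev_ord = order(ord) undoes that sort, and first_idx and last_idx locate each subject's first and last rows in the sorted data.

```r
# Shuffled copy of the first five readmission rows (original row order).
resp <- data.frame(
  id    = c(2, 1, 1, 2, 1),
  time2 = c(489, 457, 24, 1182, 1037),
  event = c(1, 1, 1, 0, 0)
)

# Sort increasingly by id, then time2, then -event (events before
# censoring at tied times), as described for the `ord` slot.
ord <- order(resp$id, resp$time2, -resp$event)
sorted <- resp[ord, ]

# order(ord) is the inverse permutation: sorted[rev_ord, ] restores resp.
rev_ord <- order(ord)

# Positions of each subject's first and last record in the sorted rows.
first_idx <- which(!duplicated(sorted$id))
last_idx  <- which(!duplicated(sorted$id, fromLast = TRUE))

print(ord); print(rev_ord); print(first_idx); print(last_idx)
```

This gives ord = 3 2 5 1 4, rev_ord = 4 2 1 5 3, first_idx = 1 4, and last_idx = 3 5, the same pattern as the first_idx and last_idx slots shown for the cgd data.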

A summary of a Recur object can be printed with summary().

> summary(recur1)
Call: 
Recur(time = tstart %2% tstop, id = id)

Sample size:                                    128 
Number of recurrent event observed:             75 
Average number of recurrent event per subject:  0.586 
Proportion of subjects with a terminal event:   0 
Median follow-up time:                          293 
> summary(recur2)
Call: 
Recur(time = t.stop, id = id, event = event, terminal = death)

Sample size:                                    400 
Number of recurrent event observed:             452 
Average number of recurrent event per subject:  1.13 
Proportion of subjects with a terminal event:   0.265 
Median follow-up time:                          1143 
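The summary quantities are simple functions of the response. summarize_recur() is a hypothetical sketch, not reda's summary method. The sample size, event count, average, and terminal proportion follow from the printed labels, while defining follow-up as each subject's last observed time minus its origin is an assumption. As an arithmetic check on the output above, 75 / 128 ≈ 0.586 and 452 / 400 = 1.13, and a terminal proportion of 0.265 among 400 subjects corresponds to 106 subjects.

```r
# Hypothetical sketch (not reda's summary method). Assumes rows are sorted
# by time within subject, so each subject's last row holds its follow-up end.
summarize_recur <- function(id, time2, event, terminal, origin = 0) {
  last <- !duplicated(id, fromLast = TRUE)
  n_subjects <- sum(last)
  n_events <- sum(event > 0)
  list(
    sample_size     = n_subjects,
    n_recurrent     = n_events,
    avg_per_subject = n_events / n_subjects,
    prop_terminal   = mean(terminal[last] > 0),
    # Assumed definition: last observed time minus origin, per subject.
    median_followup = median((time2 - origin)[last])
  )
}

# Subjects 1 to 3 of the readmission data, as printed earlier.
s <- summarize_recur(
  id       = c(1, 1, 1, 2, 2, 3, 3),
  time2    = c(24, 457, 1037, 489, 1182, 15, 783),
  event    = c(1, 1, 0, 1, 0, 1, 0),
  terminal = c(0, 0, 0, 0, 0, 0, 1)
)
str(s)
```

For these three subjects, the sketch reports 3 subjects, 4 recurrent events, an average of 4/3 events per subject, a terminal proportion of 1/3, and a median follow-up of 1037 days.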

Addendum

Readers are referred to a separate vignette on Recur() for a detailed introduction to Recur(). The reSurv() function has been deprecated since version 1.2.0. In the current version, reSurv() can still be used, but a reSurv object is automatically converted to the corresponding Recur object.

References

González, Juan Ramón, Esteve Fernandez, Víctor Moreno, Josepa Ribes, Mercè Peris, Matilde Navarro, Maria Cambray, and Josep Maria Borrás. 2005. “Sex Differences in Hospital Readmission Among Colorectal Cancer Patients.” Journal of Epidemiology & Community Health 59 (6): 506–11.

Rondeau, Virginie, Yassin Mazroui, and Juan Ramón González. 2012. “frailtypack: An R Package for the Analysis of Correlated Survival Data with Frailty Models Using Penalized Likelihood Estimation or Parametrical Estimation.” Journal of Statistical Software 47 (4): 1–28.

Therneau, Terry M. 2021. A Package for Survival Analysis in R. https://CRAN.R-project.org/package=survival.

Wang, Wenjie, Haoda Fu, Sy Han Chiou, and Jun Yan. 2021. reda: Recurrent Event Data Analysis. https://github.com/wenjie2wang/reda.