|
a/README.md |
|
b/README.md |
1 |
<div class="sc-cmRAlD dkqmWS"><div class="sc-UEtKG dGqiYy sc-flttKd cguEtd"><div class="sc-fqwslf gsqkEc"><div class="sc-cBQMlg kAHhUk"><h2 class="sc-dcKlJK sc-cVttbi gqEuPW ksnHgj">About Dataset</h2></div></div></div><div class="sc-jgvlka jFuPjz"><div class="sc-gzqKSP tNtjD"><div style="min-height: 80px;"><div class="sc-etVRix jqYJaa sc-bMmLMY ZURWJ"><pre class="uc-code-block"> |
1 |
## About Dataset |
2 |
|
2 |
Data on recurrences of bladder cancer, used by many people to demonstrate methodology for recurrent event modelling. |
3 |
<span class="hljs-built_in">Data</span> <span class="hljs-keyword">on</span> recurrences of bladder cancer, used <span class="hljs-keyword">by</span> many people <span class="hljs-keyword">to</span> demonstrate methodology for recurrent event modelling. |
3 |
|
4 |
|
4 |
<table> |
5 |
|
5 |
<thead> |
6 |
<table> |
6 |
<tr> |
7 |
<thead> |
7 |
<th>Column</th> |
8 |
<tr> |
8 |
<th>Description</th> |
9 |
<th>Column</th> |
9 |
<th>Format</th> |
10 |
<th>Description</th> |
10 |
</tr> |
11 |
<th>Format</th> |
11 |
</thead> |
12 |
</tr> |
12 |
<tbody> |
13 |
</thead> |
13 |
<tr> |
14 |
<tbody> |
14 |
<td><strong>Bladder Dataset 1</strong></td> |
15 |
<tr> |
15 |
<td></td> |
16 |
<td><strong>Bladder Dataset 1</strong></td> |
16 |
<td></td> |
17 |
<td></td> |
17 |
</tr> |
18 |
<td></td> |
18 |
<tr> |
19 |
</tr> |
19 |
<td>id</td> |
20 |
<tr> |
20 |
<td>Patient ID</td> |
21 |
<td>id</td> |
21 |
<td></td> |
22 |
<td>Patient ID</td> |
22 |
</tr> |
23 |
<td></td> |
23 |
<tr> |
24 |
</tr> |
24 |
<td>treatment</td> |
25 |
<tr> |
25 |
<td>Treatment received</td> |
26 |
<td>treatment</td> |
26 |
<td>Placebo, pyridoxine (vitamin B6), or thiotepa</td> |
27 |
<td>Treatment received</td> |
27 |
</tr> |
28 |
<td>Placebo, pyridoxine (vitamin B6), or thiotepa</td> |
28 |
<tr> |
29 |
</tr> |
29 |
<td>number</td> |
30 |
<tr> |
30 |
<td>Initial number of tumors</td> |
31 |
<td>number</td> |
31 |
<td>8=8 or more</td> |
32 |
<td>Initial number of tumors</td> |
32 |
</tr> |
33 |
<td>8=8 or more</td> |
33 |
<tr> |
34 |
</tr> |
34 |
<td>size</td> |
35 |
<tr> |
35 |
<td>Size (cm) of largest initial tumor</td> |
36 |
<td>size</td> |
36 |
<td></td> |
37 |
<td>Size (cm) of largest initial tumor</td> |
37 |
</tr> |
38 |
<td></td> |
38 |
<tr> |
39 |
</tr> |
39 |
<td>recur</td> |
40 |
<tr> |
40 |
<td>Number of recurrences</td> |
41 |
<td>recur</td> |
41 |
<td></td> |
42 |
<td>Number of recurrences</td> |
42 |
</tr> |
43 |
<td></td> |
43 |
<tr> |
44 |
</tr> |
44 |
<td>start</td> |
45 |
<tr> |
45 |
<td>Start time of each interval</td> |
46 |
<td>start</td> |
46 |
<td></td> |
47 |
<td>Start time of each interval</td> |
47 |
</tr> |
48 |
<td></td> |
48 |
<tr> |
49 |
</tr> |
49 |
<td>stop</td> |
50 |
<tr> |
50 |
<td>End time of each interval</td> |
51 |
<td>stop</td> |
51 |
<td></td> |
52 |
<td>End time of each interval</td> |
52 |
</tr> |
53 |
<td></td> |
53 |
<tr> |
54 |
</tr> |
54 |
<td>status</td> |
55 |
<tr> |
55 |
<td>End of interval code</td> |
56 |
<td>status</td> |
56 |
<td>0=censored, 1=recurrence, 2=death from bladder disease, 3=death other/unknown cause</td> |
57 |
<td>End of interval code</td> |
57 |
</tr> |
58 |
<td>0=censored, 1=recurrence, 2=death from bladder disease, 3=death other/unknown cause</td> |
58 |
<tr> |
59 |
</tr> |
59 |
<td>rtumor</td> |
60 |
<tr> |
60 |
<td>Number of tumors found at recurrence</td> |
61 |
<td>rtumor</td> |
61 |
<td></td> |
62 |
<td>Number of tumors found at recurrence</td> |
62 |
</tr> |
63 |
<td></td> |
63 |
<tr> |
64 |
</tr> |
64 |
<td>rsize</td> |
65 |
<tr> |
65 |
<td>Size of largest tumor at recurrence</td> |
66 |
<td>rsize</td> |
66 |
<td></td> |
67 |
<td>Size of largest tumor at recurrence</td> |
67 |
</tr> |
68 |
<td></td> |
68 |
<tr> |
69 |
</tr> |
69 |
<td>enum</td> |
70 |
<tr> |
70 |
<td>Event number (observation number within patient)</td> |
71 |
<td>enum</td> |
71 |
<td></td> |
72 |
<td>Event number (observation number within patient)</td> |
72 |
</tr> |
73 |
<td></td> |
73 |
<tr> |
74 |
</tr> |
74 |
<td><strong>Bladder Dataset 0</strong></td> |
75 |
<tr> |
75 |
<td></td> |
76 |
<td><strong>Bladder Dataset 0</strong></td> |
76 |
<td></td> |
77 |
<td></td> |
77 |
</tr> |
78 |
<td></td> |
78 |
<tr> |
79 |
</tr> |
79 |
<td>id</td> |
80 |
<tr> |
80 |
<td>Patient ID</td> |
81 |
<td>id</td> |
81 |
<td></td> |
82 |
<td>Patient ID</td> |
82 |
</tr> |
83 |
<td></td> |
83 |
<tr> |
84 |
</tr> |
84 |
<td>rx</td> |
85 |
<tr> |
85 |
<td>Treatment received</td> |
86 |
<td>rx</td> |
86 |
<td>1=placebo, 2=thiotepa</td> |
87 |
<td>Treatment received</td> |
87 |
</tr> |
88 |
<td>1=placebo, 2=thiotepa</td> |
88 |
<tr> |
89 |
</tr> |
89 |
<td>number</td> |
90 |
<tr> |
90 |
<td>Initial number of tumors</td> |
91 |
<td>number</td> |
91 |
<td>8=8 or more</td> |
92 |
<td>Initial number of tumors</td> |
92 |
</tr> |
93 |
<td>8=8 or more</td> |
93 |
<tr> |
94 |
</tr> |
94 |
<td>size</td> |
95 |
<tr> |
95 |
<td>Size (cm) of largest initial tumor</td> |
96 |
<td>size</td> |
96 |
<td></td> |
97 |
<td>Size (cm) of largest initial tumor</td> |
97 |
</tr> |
98 |
<td></td> |
98 |
<tr> |
99 |
</tr> |
99 |
<td>stop</td> |
100 |
<tr> |
100 |
<td>Recurrence or censoring time</td> |
101 |
<td>stop</td> |
101 |
<td></td> |
102 |
<td>Recurrence or censoring time</td> |
102 |
</tr> |
103 |
<td></td> |
103 |
<tr> |
104 |
</tr> |
104 |
<td>enum</td> |
105 |
<tr> |
105 |
<td>Which recurrence (up to 4)</td> |
106 |
<td>enum</td> |
106 |
<td></td> |
107 |
<td>Which recurrence (up to 4)</td> |
107 |
</tr> |
108 |
<td></td> |
108 |
<tr> |
109 |
</tr> |
109 |
<td><strong>Bladder Dataset 2</strong></td> |
110 |
<tr> |
110 |
<td></td> |
111 |
<td><strong>Bladder Dataset 2</strong></td> |
111 |
<td></td> |
112 |
<td></td> |
112 |
</tr> |
113 |
<td></td> |
113 |
<tr> |
114 |
</tr> |
114 |
<td>id</td> |
115 |
<tr> |
115 |
<td>Patient ID</td> |
116 |
<td>id</td> |
116 |
<td></td> |
117 |
<td>Patient ID</td> |
117 |
</tr> |
118 |
<td></td> |
118 |
<tr> |
119 |
</tr> |
119 |
<td>rx</td> |
120 |
<tr> |
120 |
<td>Treatment received</td> |
121 |
<td>rx</td> |
121 |
<td>1=placebo, 2=thiotepa</td> |
122 |
<td>Treatment received</td> |
122 |
</tr> |
123 |
<td>1=placebo, 2=thiotepa</td> |
123 |
<tr> |
124 |
</tr> |
124 |
<td>number</td> |
125 |
<tr> |
125 |
<td>Initial number of tumors</td> |
126 |
<td>number</td> |
126 |
<td>8=8 or more</td> |
127 |
<td>Initial number of tumors</td> |
127 |
</tr> |
128 |
<td>8=8 or more</td> |
128 |
<tr> |
129 |
</tr> |
129 |
<td>size</td> |
130 |
<tr> |
130 |
<td>Size (cm) of largest initial tumor</td> |
131 |
<td>size</td> |
131 |
<td></td> |
132 |
<td>Size (cm) of largest initial tumor</td> |
132 |
</tr> |
133 |
<td></td> |
133 |
<tr> |
134 |
</tr> |
134 |
<td>start</td> |
135 |
<tr> |
135 |
<td>Start of interval (0 or previous recurrence time)</td> |
136 |
<td>start</td> |
136 |
<td></td> |
137 |
<td>Start of interval (0 or previous recurrence time)</td> |
137 |
</tr> |
138 |
<td></td> |
138 |
<tr> |
139 |
</tr> |
139 |
<td>stop</td> |
140 |
<tr> |
140 |
<td>Recurrence or censoring time</td> |
141 |
<td>stop</td> |
141 |
<td></td> |
142 |
<td>Recurrence or censoring time</td> |
142 |
</tr> |
143 |
<td></td> |
143 |
<tr> |
144 |
</tr> |
144 |
<td>enum</td> |
145 |
<tr> |
145 |
<td>Which recurrence (up to 4)</td> |
146 |
<td>enum</td> |
146 |
<td></td> |
147 |
<td>Which recurrence (up to 4)</td> |
147 |
</tr> |
148 |
<td></td> |
148 |
</tbody> |
149 |
</tr> |
149 |
</table> |
150 |
</tbody> |
150 |
|
151 |
</table> |
151 |
Bladder is the data set that appears most commonly in the literature. It uses only the 85 subjects with nonzero follow-up who were assigned to either thiotepa or placebo, and only the first four recurrences for any patient. The status variable is 1 for recurrence and 0 for everything else (including death for any reason). The data set is laid out in the competing risks format of the paper by Wei, Lin, and Weissfeld (WLW). |
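The layout described above can be illustrated with a minimal sketch. It assumes the usual marginal (WLW) arrangement in which every subject contributes one row per recurrence stratum, with time measured from study entry; the function names, the tuple layout, and the example patient are hypothetical and are not taken from the data set itself.

```python
def recode_status(code):
    """Collapse the four-level end-of-interval code to a binary indicator.

    1 = recurrence; everything else (censoring or death from any cause) = 0.
    """
    return 1 if code == 1 else 0


def to_wlw_rows(recurrence_times, last_followup, max_events=4):
    """Return one (stop, event, enum) row per stratum for a single subject.

    Assumption: stratum k holds the time of the k-th recurrence if it was
    observed, otherwise the subject's last follow-up time as a censored row.
    """
    times = sorted(recurrence_times)[:max_events]
    rows = []
    for enum in range(1, max_events + 1):
        if enum <= len(times):
            rows.append((times[enum - 1], 1, enum))   # k-th recurrence observed
        else:
            rows.append((last_followup, 0, enum))     # censored in this stratum
    return rows


# Hypothetical patient: recurrences at months 5 and 12, followed to month 30.
for row in to_wlw_rows([5, 12], 30):
    print(row)
```

Under these assumptions each subject always yields exactly four rows, which is why recurrences beyond the fourth never appear in this format.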
152 |
<pre class="uc-code-block"><code>Bladder is <span class="hljs-keyword">the</span> data <span class="hljs-built_in">set</span> that appears most commonly <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> literature. It uses only <span class="hljs-keyword">the</span> <span class="hljs-number">85</span> subjects <span class="hljs-keyword">with</span> nonzero follow-up who were assigned <span class="hljs-built_in">to</span> either thiotepa <span class="hljs-keyword">or</span> placebo, <span class="hljs-keyword">and</span> only <span class="hljs-keyword">the</span> <span class="hljs-keyword">first</span> <span class="hljs-literal">four</span> recurrences <span class="hljs-keyword">for</span> <span class="hljs-keyword">any</span> patient. The status <span class="hljs-built_in">variable</span> is <span class="hljs-number">1</span> <span class="hljs-keyword">for</span> recurrence <span class="hljs-keyword">and</span> <span class="hljs-number">0</span> <span class="hljs-keyword">for</span> everything <span class="hljs-keyword">else</span> (including death <span class="hljs-keyword">for</span> <span class="hljs-keyword">any</span> reason). The data <span class="hljs-built_in">set</span> is laid out <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> competing risks <span class="hljs-built_in">format</span> <span class="hljs-keyword">of</span> <span class="hljs-keyword">the</span> paper <span class="hljs-keyword">by</span> Wei, Lin, <span class="hljs-keyword">and</span> Weissfeld. |
152 |
153 |
</code> |
153 |
Bladder1 is the full data set from the study. It contains all three treatment arms and all recurrences for 118 subjects; the maximum observed number of recurrences is 9. |
154 |
|
154 |
Bladder2 uses the same subset of subjects as bladder, but formatted in the (start, stop] or Andersen-Gill (AG) style. Note that in transforming from the WLW to the AG-style data set there is a fairly common programming mistake that leads to extra follow-up time for 12 subjects: all those with follow-up beyond their 4th recurrence. This "follow-up" is a side effect of throwing away all events after the fourth while retaining the last follow-up time variable from the original data. The bladder2 data set found here does not make this mistake, but some analyses in the literature have done so; it results in the addition of a small amount of immortal time bias and shrinks the fitted coefficients towards zero. |
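The mistake described above can be made concrete with a short sketch. This is an illustration under stated assumptions, not the code used to build bladder2: the function name, the `buggy` flag, and the example patient are hypothetical. It builds (start, stop] rows for one subject, keeping at most four recurrences, and shows how retaining the original last follow-up time appends a spurious censored interval after the fourth recurrence.

```python
def to_ag_rows(recurrence_times, last_followup, max_events=4, buggy=False):
    """Return (start, stop, event, enum) rows in (start, stop] form.

    With buggy=False, follow-up ends at the max_events-th recurrence.
    With buggy=True, the original last follow-up time is kept even though
    later events were discarded, which adds event-free time at risk.
    """
    kept = sorted(recurrence_times)[:max_events]
    rows = []
    start = 0
    for enum, stop in enumerate(kept, start=1):
        rows.append((start, stop, 1, enum))
        start = stop
    reached_cap = len(kept) == max_events
    # A trailing censored interval is legitimate only if the cap was not hit.
    if last_followup > start and (buggy or not reached_cap):
        rows.append((start, last_followup, 0, len(kept) + 1))
    return rows


def total_time(rows):
    """Total time at risk contributed by a subject's rows."""
    return sum(stop - start for start, stop, _, _ in rows)


# Hypothetical patient: five recurrences, last seen at month 40.
times, followup = [3, 9, 21, 23, 30], 40
correct = to_ag_rows(times, followup)
wrong = to_ag_rows(times, followup, buggy=True)
print(total_time(correct), total_time(wrong))  # 23 40
```

In this example the buggy version credits the patient with 17 extra months during which no event can be recorded, because the fifth recurrence was already dropped. Subjects with fewer than four recurrences are unaffected, since their trailing censored interval is genuine follow-up.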
155 |
<pre class="uc-code-block"><code>Bladder1 is <span class="hljs-keyword">the</span> full data <span class="hljs-built_in">set</span> <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> study. It <span class="hljs-keyword">contains</span> all <span class="hljs-literal">three</span> treatment arms <span class="hljs-keyword">and</span> all recurrences <span class="hljs-keyword">for</span> <span class="hljs-number">118</span> subjects; <span class="hljs-keyword">the</span> maximum observed <span class="hljs-built_in">number</span> <span class="hljs-keyword">of</span> recurrences is <span class="hljs-number">9.</span> |
|
|
156 |
</code>] |
|
|
157 |
<pre class="uc-code-block"> |
|
|
158 |
|
|
|
159 |
<code>Bladder2 uses <span class="hljs-keyword">the</span> same subset <span class="hljs-keyword">of</span> subjects <span class="hljs-keyword">as</span> bladder, but formatted <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> (<span class="hljs-built_in">start</span>, <span class="hljs-built_in">stop</span>] <span class="hljs-keyword">or</span> Anderson-Gill style. Note that <span class="hljs-keyword">in</span> transforming <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> WLW <span class="hljs-built_in">to</span> <span class="hljs-keyword">the</span> AG style data <span class="hljs-built_in">set</span> there is <span class="hljs-keyword">a</span> quite common programming mistake that leads <span class="hljs-built_in">to</span> extra follow-up <span class="hljs-built_in">time</span> <span class="hljs-keyword">for</span> <span class="hljs-number">12</span> subjects: all those <span class="hljs-keyword">with</span> follow-up beyond their <span class="hljs-number">4</span>th recurrence. This <span class="hljs-string">"follow-up"</span> is <span class="hljs-keyword">a</span> side effect <span class="hljs-keyword">of</span> throwing away all events <span class="hljs-keyword">after</span> <span class="hljs-keyword">the</span> <span class="hljs-keyword">fourth</span> <span class="hljs-keyword">while</span> retaining <span class="hljs-keyword">the</span> <span class="hljs-keyword">last</span> follow-up <span class="hljs-built_in">time</span> <span class="hljs-built_in">variable</span> <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> original data. 
The bladder2 data <span class="hljs-built_in">set</span> found here does <span class="hljs-keyword">not</span> make this mistake, but some analyses <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> literature have done so; <span class="hljs-keyword">it</span> results <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> addition <span class="hljs-keyword">of</span> <span class="hljs-keyword">a</span> small amount <span class="hljs-keyword">of</span> immortal <span class="hljs-built_in">time</span> bias <span class="hljs-keyword">and</span> shrinks <span class="hljs-keyword">the</span> fitted coefficients towards <span class="hljs-literal">zero</span>. |
|
|
160 |
</code><div class="uc-code-block-copy-button-wrapper"> |
|
|