--- a/README.md +++ b/README.md @@ -1,7 +1,5 @@ -<div class="sc-cmRAlD dkqmWS"><div class="sc-UEtKG dGqiYy sc-flttKd cguEtd"><div class="sc-fqwslf gsqkEc"><div class="sc-cBQMlg kAHhUk"><h2 class="sc-dcKlJK sc-cVttbi gqEuPW ksnHgj">About Dataset</h2></div></div></div><div class="sc-jgvlka jFuPjz"><div class="sc-gzqKSP tNtjD"><div style="min-height: 80px;"><div class="sc-etVRix jqYJaa sc-bMmLMY ZURWJ"><pre class="uc-code-block"> - -<span class="hljs-built_in">Data</span> <span class="hljs-keyword">on</span> recurrences of bladder cancer, used <span class="hljs-keyword">by</span> many people <span class="hljs-keyword">to</span> demonstrate methodology for recurrent event modelling. - +## About Dataset +Data on recurrences of bladder cancer, used by many people to demonstrate methodology for recurrent event modelling. <table> <thead> @@ -149,12 +147,8 @@ </tr> </tbody> </table> -<pre class="uc-code-block"><code>Bladder is <span class="hljs-keyword">the</span> data <span class="hljs-built_in">set</span> that appears most commonly <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> literature. It uses only <span class="hljs-keyword">the</span> <span class="hljs-number">85</span> subjects <span class="hljs-keyword">with</span> nonzero follow-up who were assigned <span class="hljs-built_in">to</span> either thiotepa <span class="hljs-keyword">or</span> placebo, <span class="hljs-keyword">and</span> only <span class="hljs-keyword">the</span> <span class="hljs-keyword">first</span> <span class="hljs-literal">four</span> recurrences <span class="hljs-keyword">for</span> <span class="hljs-keyword">any</span> patient. The status <span class="hljs-built_in">variable</span> is <span class="hljs-number">1</span> <span class="hljs-keyword">for</span> recurrence <span class="hljs-keyword">and</span> <span class="hljs-number">0</span> <span class="hljs-keyword">for</span> everything <span class="hljs-keyword">else</span> (including death <span class="hljs-keyword">for</span> <span class="hljs-keyword">any</span> reason). The data <span class="hljs-built_in">set</span> is laid out <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> competing risks <span class="hljs-built_in">format</span> <span class="hljs-keyword">of</span> <span class="hljs-keyword">the</span> paper <span class="hljs-keyword">by</span> Wei, Lin, <span class="hljs-keyword">and</span> Weissfeld. -</code> -<pre class="uc-code-block"><code>Bladder1 is <span class="hljs-keyword">the</span> full data <span class="hljs-built_in">set</span> <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> study. It <span class="hljs-keyword">contains</span> all <span class="hljs-literal">three</span> treatment arms <span class="hljs-keyword">and</span> all recurrences <span class="hljs-keyword">for</span> <span class="hljs-number">118</span> subjects; <span class="hljs-keyword">the</span> maximum observed <span class="hljs-built_in">number</span> <span class="hljs-keyword">of</span> recurrences is <span class="hljs-number">9.</span> -</code>] -<pre class="uc-code-block"> - -<code>Bladder2 uses <span class="hljs-keyword">the</span> same subset <span class="hljs-keyword">of</span> subjects <span class="hljs-keyword">as</span> bladder, but formatted <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> (<span class="hljs-built_in">start</span>, <span class="hljs-built_in">stop</span>] <span class="hljs-keyword">or</span> Anderson-Gill style. Note that <span class="hljs-keyword">in</span> transforming <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> WLW <span class="hljs-built_in">to</span> <span class="hljs-keyword">the</span> AG style data <span class="hljs-built_in">set</span> there is <span class="hljs-keyword">a</span> quite common programming mistake that leads <span class="hljs-built_in">to</span> extra follow-up <span class="hljs-built_in">time</span> <span class="hljs-keyword">for</span> <span class="hljs-number">12</span> subjects: all those <span class="hljs-keyword">with</span> follow-up beyond their <span class="hljs-number">4</span>th recurrence. This <span class="hljs-string">"follow-up"</span> is <span class="hljs-keyword">a</span> side effect <span class="hljs-keyword">of</span> throwing away all events <span class="hljs-keyword">after</span> <span class="hljs-keyword">the</span> <span class="hljs-keyword">fourth</span> <span class="hljs-keyword">while</span> retaining <span class="hljs-keyword">the</span> <span class="hljs-keyword">last</span> follow-up <span class="hljs-built_in">time</span> <span class="hljs-built_in">variable</span> <span class="hljs-built_in">from</span> <span class="hljs-keyword">the</span> original data. The bladder2 data <span class="hljs-built_in">set</span> found here does <span class="hljs-keyword">not</span> make this mistake, but some analyses <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> literature have done so; <span class="hljs-keyword">it</span> results <span class="hljs-keyword">in</span> <span class="hljs-keyword">the</span> addition <span class="hljs-keyword">of</span> <span class="hljs-keyword">a</span> small amount <span class="hljs-keyword">of</span> immortal <span class="hljs-built_in">time</span> bias <span class="hljs-keyword">and</span> shrinks <span class="hljs-keyword">the</span> fitted coefficients towards <span class="hljs-literal">zero</span>. -</code><div class="uc-code-block-copy-button-wrapper"> \ No newline at end of file +Bladder is the data set that appears most commonly in the literature. It uses only the 85 subjects with nonzero follow-up who were assigned to either thiotepa or placebo, and only the first four recurrences for any patient. The status variable is 1 for recurrence and 0 for everything else (including death for any reason). The data set is laid out in the competing risks format of the paper by Wei, Lin, and Weissfeld. +content_copy +Bladder1 is the full data set from the study. It contains all three treatment arms and all recurrences for 118 subjects; the maximum observed number of recurrences is 9. +Bladder2 uses the same subset of subjects as bladder, but formatted in the (start, stop] or Anderson-Gill style. Note that in transforming from the WLW to the AG style data set there is a quite common programming mistake that leads to extra follow-up time for 12 subjects: all those with follow-up beyond their 4th recurrence. This "follow-up" is a side effect of throwing away all events after the fourth while retaining the last follow-up time variable from the original data. The bladder2 data set found here does not make this mistake, but some analyses in the literature have done so; it results in the addition of a small amount of immortal time bias and shrinks the fitted coefficients towards zero. \ No newline at end of file