<div class="sc-jegwdG lhLRCf"><div class="sc-UEtKG dGqiYy sc-flttKd cguEtd"><div class="sc-fqwslf gsqkEc"><div class="sc-cBQMlg kAHhUk"><h2 class="sc-dcKlJK sc-cVttbi gqEuPW ksnHgj">About Dataset</h2></div></div></div><div class="sc-davvxH eCVTlP"><div class="sc-jCNfQM dTyvWO"><div style="min-height: 80px;"><div class="sc-etVRix jqYJaa sc-gVIFzB gQKGyV"><p>This is a <strong>brand-new</strong> (!) dataset from an open-access paper <a rel="noreferrer nofollow" aria-label="published December 10, 2020 (opens in a new tab)" target="_blank" href="https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003489">published December 10, 2020</a>. The paper and the full dataset are open-access (<a rel="noreferrer nofollow" aria-label="CC-BY (opens in a new tab)" target="_blank" href="https://creativecommons.org/licenses/by/4.0/">CC-BY</a>), so please give attribution to the original authors in your work.   </p>
<h3>Background</h3>
<p>Pancreatic cancer is an extremely deadly type of cancer. Once it is diagnosed, the five-year survival rate is less than 10%. However, if pancreatic cancer is caught early, the odds of surviving are much better. Unfortunately, many cases of pancreatic cancer show no symptoms until the cancer has spread throughout the body. A diagnostic test that identifies people with pancreatic cancer early, before it has spread, could be enormously helpful.</p>
<h3>The paper</h3>
<p>In a <a rel="noreferrer nofollow" aria-label="paper (opens in a new tab)" target="_blank" href="https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003489">paper</a> by Silvana Debernardi and colleagues, published in 2020 in the journal PLOS Medicine, a multinational team of researchers sought to develop an accurate diagnostic test for the most common type of pancreatic cancer, pancreatic ductal adenocarcinoma (PDAC). They measured a panel of biomarkers in urine samples from three groups of patients:</p>
<ul>
<li>Healthy controls</li>
<li>Patients with non-cancerous pancreatic conditions, like chronic pancreatitis</li>
<li>Patients with pancreatic ductal adenocarcinoma </li>
</ul>
<p>When possible, these patients were age- and sex-matched. The goal was to develop an accurate way to identify patients with pancreatic cancer.</p>
<h3>The data</h3>
<p>The key features are four urinary biomarkers: creatinine, LYVE1, REG1B, and TFF1. </p>
<ul>
<li><strong>Creatinine</strong> is a waste product of muscle metabolism that is often used as an indicator of kidney function.</li>
<li><strong>LYVE1</strong> is lymphatic vessel endothelial hyaluronan receptor 1, a protein that may play a role in tumor metastasis</li>
<li><strong>REG1B</strong> is a protein that may be associated with pancreas regeneration</li>
<li><strong>TFF1</strong> is trefoil factor 1, which may be related to regeneration and repair of the urinary tract</li>
</ul>
<p><strong>Age</strong> and <strong>sex</strong>, both included in the dataset, may also play a role in who gets pancreatic cancer. The dataset includes a few other biomarkers as well, but these were not measured in all patients (they were collected partly to measure how various blood biomarkers compared to urine biomarkers).  </p>
<p>I have not changed any of the data from the paper, other than renaming the columns for easy importing and use. The file <code>Debernardi et al 2020 data.csv</code> contains the raw data. The file <code>Debernardi et al 2020 documentation.csv</code> contains detailed documentation of what each column represents, along with the original column names from the paper.</p>
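<p>As a quick first look, here is a minimal Python sketch (standard library only) that reads the data file and counts the missing values in each column. This shows which of the extra biomarkers were measured in only some patients. It assumes that missing measurements are stored as empty cells or the text <code>NA</code>. That is an assumption about the file format, not something stated in the paper. The helper names <code>count_missing</code> and <code>report_missing</code> are hypothetical.</p>

```python
import csv
from collections import Counter
from typing import Iterable

# File name as given in this README.
DATA_FILE = "Debernardi et al 2020 data.csv"

# Assumption: missing measurements are stored as empty cells or the text "NA".
MISSING_MARKERS = {"", "NA"}


def count_missing(lines: Iterable[str]) -> tuple[int, Counter]:
    """Return (number of data rows, count of missing cells per column)."""
    reader = csv.DictReader(lines)
    # Start every column at zero so fully populated columns still appear.
    missing = Counter({name: 0 for name in reader.fieldnames or []})
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name, value in row.items():
            if name is None:
                continue  # extra cells beyond the header row; ignore them
            if value is None or value.strip() in MISSING_MARKERS:
                missing[name] += 1
    return n_rows, missing


def report_missing(path: str = DATA_FILE) -> None:
    """Print how many values are missing in each column of the CSV at `path`."""
    with open(path, newline="") as handle:
        n_rows, missing = count_missing(handle)
    for name, count in missing.items():
        print(f"{name}: {count} of {n_rows} values missing")
```

<p>Calling <code>report_missing()</code> from the directory that contains the data file prints one line per column. Because <code>count_missing</code> accepts any iterable of CSV lines, it can also be tried on a few hand-written rows before pointing it at the real file.</p>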
<h3>Prediction task</h3>
<p>The goal in this dataset is predicting <code>diagnosis</code>. More specifically, the task is to distinguish 3 (pancreatic cancer) from 2 (non-cancerous pancreas condition) and 1 (healthy). The dataset also includes the stage of pancreatic cancer and the specific diagnosis for non-cancerous patients. Remember, though, that this information would not be available to a predictive model in practice. The goal, after all, is to predict the presence of disease <em>before</em> it's diagnosed, not after!</p>
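<p>The Python sketch below (standard library only) shows one way to set up this task. It converts <code>diagnosis</code> into a binary label: 1 for pancreatic cancer (code 3) and 0 otherwise. It reads only the four urinary biomarkers plus age and sex, so post-diagnosis columns such as cancer stage cannot leak into the model. Two things here are assumptions: the feature column names in <code>FEATURES</code>, and the <code>"M"</code>/<code>"F"</code> coding of sex. Check both against <code>Debernardi et al 2020 documentation.csv</code> before relying on them. The function name <code>build_dataset</code> is hypothetical.</p>

```python
import csv
from typing import Iterable

# Assumed column names; verify them against the documentation file.
FEATURES = ["age", "sex", "creatinine", "LYVE1", "REG1B", "TFF1"]
CANCER = "3"  # diagnosis code for pancreatic cancer, per this README


def build_dataset(lines: Iterable[str]) -> tuple[list[list[float]], list[int]]:
    """Return (X, y) where y is 1 for pancreatic cancer and 0 otherwise.

    Only the columns in FEATURES are read, so post-diagnosis columns
    (e.g. cancer stage) never reach the model. An empty numeric cell
    raises ValueError rather than being silently imputed.
    """
    X: list[list[float]] = []
    y: list[int] = []
    for row in csv.DictReader(lines):
        features = []
        for name in FEATURES:
            value = row[name]
            if name == "sex":
                # Assumed coding: "M" -> 1.0, anything else -> 0.0
                features.append(1.0 if value == "M" else 0.0)
            else:
                features.append(float(value))
        X.append(features)
        y.append(1 if row["diagnosis"] == CANCER else 0)
    return X, y
```

<p>This sketch merges classes 1 and 2 into a single negative class, following the "cancer versus everything else" framing above. Keeping all three classes and training a multi-class model would be an equally reasonable choice.</p>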
<h3>Acknowledgements</h3>
<p>I would like to thank the authors of this paper for graciously sharing their raw data with the research community.</p></div></div></div>