Diff of /README.md [000000] .. [4bdf3e]

# Machine-Learning-for-Disease-Treatment-Response-Prediction
# Background
Breast cancer is the most common cancer among women in the UK. Chemotherapy
is commonly used to shrink locally advanced tumours before surgery. However,
chemotherapy is toxic to the human body and is not always effective for every
patient. Complete tumour resolution at surgery, known as pathological complete
response (PCR), is associated with a high likelihood of cure and a longer
relapse-free survival (RFS) time. RFS is the length of time after primary
treatment for a cancer ends that the patient survives without any signs or
symptoms of that cancer. However, only 25% of patients receiving chemotherapy
achieve a PCR; the remaining 75% have residual disease and a range of
prognoses. Better patient stratification and treatment could be achieved if
PCR and RFS could be predicted using information available prior to
chemotherapy treatment.

# Aim
To use advanced machine learning methods to predict PCR (classification) and
RFS (regression) using both clinically measured features and features derived
from magnetic resonance images (MRI) prior to chemotherapy treatment.

# Data
Based on the public dataset from The American College of Radiology Imaging
Network (I-SPY 2 TRIAL), a simplified dataset was generated for this
assignment. Each patient record in this dataset contains 10 clinical features
(Age, ER, PgR, HER2, TrippleNegative Status, Chemotherapy Grade, Tumour
Proliferation, Histology Type, Lymph node Status and Tumour Stage) and 107
MRI-based features. The image-based features were extracted from the tumour
region of the MRIs using a radiomics feature extraction package (Pyradiomics:
https://pyradiomics.readthedocs.io/en/latest/ ). You do not need to understand
the meaning of the clinical and image-based features to complete this
assignment, but it is worth reading the background information on the I-SPY 2
Trial website. A value of “999” in the spreadsheet denotes a missing data
value.

A training dataset (trainDataset.xls) containing 400 patients is available on
Moodle. A test dataset containing N patients is reserved (hidden from you) for
the final performance evaluation. You can assume that the test set and the
training set are sampled from the same data distribution, but the ratio of
PCR-positive to PCR-negative patients could be different.
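
Because “999” marks a missing value, it should not reach a model as if it were a real measurement. The following is a minimal sketch of one way to handle it, not the assignment's prescribed method. It assumes that 999 never occurs as a genuine feature value and that median imputation is acceptable. The toy rows are made up and stand in for `trainDataset.xls`, and the helper names `mask_missing` and `impute_median` are hypothetical.

```python
from statistics import median

MISSING = 999  # sentinel the spreadsheet uses for a missing value


def mask_missing(rows, sentinel=MISSING):
    """Return a copy of rows with the sentinel replaced by None."""
    return [
        {key: (None if value == sentinel else value) for key, value in row.items()}
        for row in rows
    ]


def impute_median(rows, columns):
    """Fill None entries in the given columns with that column's median."""
    filled = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in filled if row[col] is not None]
        if not observed:
            continue  # no observed values to estimate a median from
        fill = median(observed)
        for row in filled:
            if row[col] is None:
                row[col] = fill
    return filled


# Toy rows with made-up values; only two of the clinical features are shown.
rows = [
    {"Age": 41.0, "ER": 1},
    {"Age": 999, "ER": 0},
    {"Age": 55.0, "ER": 999},
    {"Age": 62.0, "ER": 1},
]

masked = mask_missing(rows)
cleaned = impute_median(masked, ["Age", "ER"])
```

The median is used here only for simplicity. For categorical features such as ER, the most frequent value may be a more natural fill. To avoid leaking information, the fill values should be estimated from the training data alone.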