Diff of /README.md [000000] .. [4bdf3e]

Switch to side-by-side view

--- a
+++ b/README.md
@@ -0,0 +1,38 @@
+# Machine-Learning-for-Disease-Treatment-Response-Prediction
+# Background
+Breast cancer is the most common cancer in the UK for women. Chemotherapy
+is a commonly used treatment strategy to reduce the size of locally advanced
+tumour before surgery. However, chemotherapy is a toxic process to human
+body and it is not aways effective to everyone. Complete tumour resolution at
+surgery, known as pathological complete response (PCR), has a high
+likelihood of achieving cure and longer relapse-free survival (RFS) time. RFS
+is the length of time after primary treatment for a cancer ends that the patient
+survives without any signs or symptoms of that cancer. However, only 25% of
+patients receiving chemotherapy will achieve a PCR, with the remaining 75%
+having residual disease and a range of prognosis. Better patient stratification
+and treatment could be achieved if PCR and RFS could be predicted using
+information prior to chemotherapy treatment.
+
+# Aim
+To use advanced machine learning method to predict PCR
+(classification) and RFS (regression) using both clinically measured features
+and features derived from magnetic resonance images (MRI) prior to
+chemotherapy treatment.
+
+# Data
+Based on the public dataset from The American College of Radiology Imaging
+Network (I-SPY 2 TRIAL), a simplified dataset is generated for this assignment.
+Each patient in this dataset contains 10 clinical features (Age, ER, PgG, HER2,
+TrippleNegative Status, Chemotherapy Grade, Tumour Proliferation, Histology 2
+Type, Lymph node Status and Tumour Stage) and 107 MRI-based features.
+The image-based features were extracted from the tumour region of MRIs using
+a radiomics feature extraction package (known as Pyradiomics:
+https://pyradiomics.readthedocs.io/en/latest/ ). You do not need to understand
+the meaning of these clinical feature and image-based features to complete this
+assignment but worth reading background information on the I-SPY 2 Trial
+website. “999” in the spreadsheet means a missing data value. A training
+dataset (trainDataset.xls) is provided and available on Moodle that contains
+400 patients. A test dataset that contains N patients is reserved (hidden from
+you) for final performance evaluation. You can assume that the test set and
+training set are sampled from the same data distribution, but the ratio of PCR
+positive and negative could be different.