## MRNet Baseline Models
This notebook seeks to replicate and extend basic CNN baseline models on the MRNet data.

Before running this notebook, run the `save_middle_slices_as_images.ipynb` notebook. It generates directories containing just the center slice of each scan from the three planes (directory `mid1`), as well as directories of RGB images built from three slices centered on the middle of each scan, skipping d = 0, 1, or 2 slices between the slices taken (e.g. directory `mid3d2`).
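
As a rough illustration of the slice selection described above, here is a minimal sketch. It assumes each scan is a NumPy array of shape `(num_slices, height, width)` and that skipping d slices means a stride of `d + 1` around the middle slice. The function name `middle_slices_rgb` is hypothetical and is not taken from `save_middle_slices_as_images.ipynb`, which may differ in detail.

```python
import numpy as np

def middle_slices_rgb(volume, d=0):
    """Stack three slices centered on the middle slice into an (H, W, 3) array.

    Assumption: `volume` has shape (num_slices, H, W), and d is the number
    of slices skipped between the slices taken.
    """
    mid = volume.shape[0] // 2
    step = d + 1
    idx = [mid - step, mid, mid + step]
    if idx[0] < 0 or idx[-1] >= volume.shape[0]:
        raise ValueError('scan has too few slices for d={}'.format(d))
    # each selected slice becomes one color channel
    return np.stack([volume[i] for i in idx], axis=-1)

# toy scan: 9 slices of 4x4 pixels, where every pixel holds its slice index
toy = np.repeat(np.arange(9), 16).reshape(9, 4, 4)
rgb = middle_slices_rgb(toy, d=2)
print(rgb.shape, rgb[0, 0])
```

With the toy scan, the three channels of any pixel show which slice indices were picked, which makes it easy to check the spacing for a given d.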

For each model architecture (e.g. AlexNet or ResNet50), there are thus nine models to fit, one for each combination of outcome and plane. Each architecture therefore has a corresponding 3x3 model performance grid. The competition measure is the average AUC across the three outcomes (Abnormal, Meniscus tear, and ACL tear). A simple first approach is to predict each outcome using only the best single-plane model for that outcome.

Outcomes predicted:
- Abnormal/Normal
- Meniscus tear/Not
- ACL tear/Not

Input images from planes:
- Axial
- Coronal
- Sagittal

### Load libraries

In [1]:
import numpy as np
import pandas as pd
import os
import pickle
from time import strftime, gmtime 

import matplotlib.pyplot as plt
from fastai.vision import *
import torch

# for AUC ROC score calculations
from sklearn import metrics

from operator import itemgetter 

#from mrnet_orig import *
from mrnet_itemlist import *

#from ipywidgets import interact, Dropdown, IntSlider

%matplotlib notebook
plt.style.use('grayscale')

In [2]:
data_path = Path('../data/mid3d0') 

### Prepare the labels

In [3]:
train_abnl = pd.read_csv(data_path/'train-abnormal.csv', header=None,
                       names=['Case', 'Abnormal'], 
                       dtype={'Case': str, 'Abnormal': np.int64})
train_abnl['axial'] = 'train/axial/' + train_abnl.Case + '.png'
train_abnl['coronal'] = 'train/coronal/' + train_abnl.Case + '.png'
train_abnl['sagittal'] = 'train/sagittal/' + train_abnl.Case + '.png'

valid_abnl = pd.read_csv(data_path/'valid-abnormal.csv', header=None,
                       names=['Case', 'Abnormal'], 
                       dtype={'Case': str, 'Abnormal': np.int64})
valid_abnl['axial'] = 'valid/axial/' + valid_abnl.Case + '.png'
valid_abnl['coronal'] = 'valid/coronal/' + valid_abnl.Case + '.png'
valid_abnl['sagittal'] = 'valid/sagittal/' + valid_abnl.Case + '.png'
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
abnl = pd.concat([train_abnl, valid_abnl], ignore_index=True)

In [4]:
train_meni = pd.read_csv(data_path/'train-meniscus.csv', header=None,
                       names=['Case', 'Meniscus'], 
                       dtype={'Case': str, 'Meniscus': np.int64})

valid_meni = pd.read_csv(data_path/'valid-meniscus.csv', header=None,
                       names=['Case', 'Meniscus'], 
                       dtype={'Case': str, 'Meniscus': np.int64})
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
meni = pd.concat([train_meni, valid_meni], ignore_index=True)
abnl = pd.merge(abnl, meni, on='Case')

In [5]:
train_acl = pd.read_csv(data_path/'train-acl.csv', header=None,
                       names=['Case', 'ACL'], 
                       dtype={'Case': str, 'ACL': np.int64})

valid_acl = pd.read_csv(data_path/'valid-acl.csv', header=None,
                       names=['Case', 'ACL'], 
                       dtype={'Case': str, 'ACL': np.int64})
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
acl = pd.concat([train_acl, valid_acl], ignore_index=True)
abnl = pd.merge(abnl, acl, on='Case')

In [6]:
abnl.head()

|   | Case | Abnormal | axial | coronal | sagittal | Meniscus | ACL |
|---|------|----------|-------|---------|----------|----------|-----|
| 0 | 0000 | 1 | train/axial/0000.png | train/coronal/0000.png | train/sagittal/0000.png | 0 | 0 |
| 1 | 0001 | 1 | train/axial/0001.png | train/coronal/0001.png | train/sagittal/0001.png | 1 | 1 |
| 2 | 0002 | 1 | train/axial/0002.png | train/coronal/0002.png | train/sagittal/0002.png | 0 | 0 |
| 3 | 0003 | 1 | train/axial/0003.png | train/coronal/0003.png | train/sagittal/0003.png | 1 | 0 |
| 4 | 0004 | 1 | train/axial/0004.png | train/coronal/0004.png | train/sagittal/0004.png | 0 | 0 |


In [7]:
abnl.shape

(1250, 7)

#### Restrict attention to Cases for which you have data
For local development, you might be working just with a subset of the MRNet data. If so, get the list of Cases you have data for and subset the dataframe accordingly.

In [8]:
train_cases = [e[:-4] for e in os.listdir(data_path/'train/axial') if e[-4:] == '.png']
valid_cases = [e[:-4] for e in os.listdir(data_path/'valid/axial') if e[-4:] == '.png']
cases_w_data = sorted(train_cases + valid_cases)

In [9]:
len(cases_w_data)

1250

In [10]:
df = abnl.loc[abnl.Case.isin(cases_w_data),:]

In [11]:
df.shape

(1250, 7)

## Fit Models

In [12]:
# if necessary, create /models by running the following
#!mkdir models

In [13]:
models_path = Path('./models')

## Cycling through planes and outcomes

In [None]:
accresults = []
aucresults = []

for outcome in ('Abnormal','Meniscus','ACL'):
    for plane in ('axial','coronal','sagittal'):
        print('---- Using AlexNet to predict {} using {} middle slice(s) [data in {}] ----'.format(outcome, plane, data_path))
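        # NOTE: from_df holds out a random 20% of rows for validation by default
        # (valid_pct=0.2); the train/ and valid/ folders in the paths do not define the split
        data  = ImageDataBunch.from_df(path=data_path, df=df, 
                                       fn_col=plane, label_col=outcome, bs=64)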
        learn = cnn_learner(data, models.alexnet, metrics=accuracy)
        learn.fit_one_cycle(10)
        # collect accuracy metrics
        acc     = [float(e[0]) for e in learn.recorder.metrics]
        accresults.append([outcome,plane,acc])
        # collect AUC ROC scores
        yhat, y = learn.get_preds(ds_type=DatasetType.Valid)
        # TODO: save y and predicted y
        
        auc     = metrics.roc_auc_score(to_np(y), to_np(yhat)[:,1])
        aucresults.append([outcome,plane,auc])    
        
# use one timestamp so both result files share the same suffix
timestamp = strftime('%Y%m%d_%H%M', gmtime())

# save accuracy metrics
with open(models_path/('accresults_' + data_path.stem + '_AlexNet_' + timestamp), 'wb') as f:
    pickle.dump(accresults, f)

# save AUC scores
with open(models_path/('aucresults_' + data_path.stem + '_AlexNet_' + timestamp), 'wb') as f:
    pickle.dump(aucresults, f)

---- Using AlexNet to predict ACL using axial middle slice(s) [data in ../data/mid3d0] ----


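| epoch | train_loss | valid_loss | accuracy | time |
|-------|------------|------------|----------|-------|
| 0 | 0.970498 | 0.68994 | 0.548 | 00:12 |
| 1 | 0.818568 | 0.504058 | 0.784 | 00:13 |
| 2 | 0.663493 | 0.458726 | 0.784 | 00:13 |
| 3 | 0.564507 | 0.48603 | 0.792 | 00:12 |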
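
The training loop leaves saving the labels and predicted scores as a TODO. One possible way to do it is sketched below using only NumPy. The helper name `save_preds` and the `preds_<outcome>_<plane>.npz` file naming are assumptions made for illustration, not part of the original notebook. Inside the loop it could be called with `to_np(y)` and `to_np(yhat)[:,1]`.

```python
import tempfile
from pathlib import Path

import numpy as np

def save_preds(out_dir, outcome, plane, y_true, y_score):
    """Save labels and predicted scores for one outcome/plane pair as .npz.

    Hypothetical helper: the file naming scheme is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fname = out_dir / 'preds_{}_{}.npz'.format(outcome, plane)
    np.savez(str(fname), y_true=np.asarray(y_true), y_score=np.asarray(y_score))
    return fname

# toy usage with made-up values, written to a temporary directory
tmp_dir = tempfile.mkdtemp()
saved = save_preds(tmp_dir, 'ACL', 'axial', [0, 1, 1], [0.2, 0.7, 0.6])
loaded = np.load(str(saved))
print(loaded['y_true'], loaded['y_score'])
```

Keeping the raw scores, rather than only the AUC, would make it possible to recompute metrics or combine planes later without refitting.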


In [31]:
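# sort by outcome, then AUC, both descending, so each outcome's best plane comes first
sorted_auc = sorted(aucresults, key=itemgetter(0,2), reverse=True)
print(sorted_auc)
# with exactly three planes per outcome, every third entry is an outcome's best model
high_auc_per_task = sorted_auc[0::3]
print(high_auc_per_task)
# competition-style score: mean of the best single-plane AUC for each outcome
ave_auc_across_tasks = np.mean([e[2] for e in high_auc_per_task])
print('Average AUC across tasks: {}'.format(np.round(ave_auc_across_tasks,2)))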

[['Meniscus', 'axial', 0.7259965468057953], ['Meniscus', 'coronal', 0.6765074962023645], ['Meniscus', 'sagittal', 0.6744101103075463], ['Abnormal', 'axial', 0.8662922455414326], ['Abnormal', 'sagittal', 0.8261897723913686], ['Abnormal', 'coronal', 0.7856420626895854], ['ACL', 'axial', 0.756035077347522], ['ACL', 'sagittal', 0.738572295949345], ['ACL', 'coronal', 0.7013403263403263]]
[['Meniscus', 'axial', 0.7259965468057953], ['Abnormal', 'axial', 0.8662922455414326], ['ACL', 'axial', 0.756035077347522]]
Average AUC across tasks: 0.78
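
The `[0::3]` slicing only works when every outcome has exactly three plane results. As an alternative, the 3x3 performance grid described in the introduction can be built explicitly with pandas. This is a sketch, not the notebook's method: the AUC values are copied from the AlexNet output above and rounded to four decimals, and the names `auc_df`, `grid`, `best_plane`, and `best_auc` are illustrative.

```python
import pandas as pd

# AUC values copied from the AlexNet output above, rounded to 4 decimals
aucresults = [
    ['Abnormal', 'axial', 0.8663], ['Abnormal', 'coronal', 0.7856], ['Abnormal', 'sagittal', 0.8262],
    ['Meniscus', 'axial', 0.7260], ['Meniscus', 'coronal', 0.6765], ['Meniscus', 'sagittal', 0.6744],
    ['ACL', 'axial', 0.7560], ['ACL', 'coronal', 0.7013], ['ACL', 'sagittal', 0.7386],
]
auc_df = pd.DataFrame(aucresults, columns=['outcome', 'plane', 'auc'])

# 3x3 performance grid: one row per outcome, one column per plane
grid = auc_df.pivot(index='outcome', columns='plane', values='auc')
print(grid)

# best plane and its AUC for each outcome
best_plane = grid.idxmax(axis=1)
best_auc = grid.max(axis=1)
print(best_plane)
print('Average AUC across tasks: {:.2f}'.format(best_auc.mean()))
```

Here the axial plane gives the highest AUC for all three outcomes, and the mean of the three best AUCs agrees with the 0.78 reported above.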


