{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "annotated_tutorial.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mLjN-UNzR7Sq"
      },
      "source": [
        "Setup: These commands need to be run before using our program."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fienv8YmR33s"
      },
      "source": [
        "!pip install pytorch_lightning\r\n",
        "!pip install torchsummaryX\r\n",
        "!pip install webdataset\r\n",
        "!git clone --branch master https://github.com/McMasterAI/Radiology-and-AI.git\r\n",
        "!git clone https://github.com/black0017/MedicalZooPytorch.git"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "muAuDLkxSKV9"
      },
      "source": [
        "We can get set up with Google Colab if we're using it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SZKKEduBSJ6w"
      },
      "source": [
        "from google.colab import drive\r\n",
        "drive.mount('/content/drive', force_remount=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "m-r2Ki5nTBJJ"
      },
      "source": [
        "cd drive/MyDrive/MacAI"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lmmHx4NYTUF4"
      },
      "source": [
        "Imports"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tnMPgjbKTC8h"
      },
      "source": [
        "import sys\r\n",
        "sys.path.append('./Radiology-and-AI/Radiology_and_AI')\r\n",
        "sys.path.append('./MedicalZooPytorch')\r\n",
        "import os\r\n",
        "import torch\r\n",
        "import numpy as np\r\n",
        "from torch.utils.data import Dataset, DataLoader, random_split\r\n",
        "from pytorch_lightning.loggers import WandbLogger, TensorBoardLogger\r\n",
        "import pytorch_lightning as pl\r\n",
        "import nibabel as nb\r\n",
        "from skimage import transform\r\n",
        "import matplotlib.pyplot as plt\r\n",
        "import webdataset as wds\r\n",
        "from collators.brats_collator import col_img\r\n",
        "from lightning_modules.segmentation import TumourSegmentation\r\n",
        "from scipy.interpolate import RegularGridInterpolator\r\n",
        "from scipy.ndimage.filters import gaussian_filter\r\n",
        "from time import time"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QwqglKvgTZZR"
      },
      "source": [
        "Loading datasets.\r\n",
        "Because neuroimages are very large files, we use the webdataset library to handle them during training. Essentially, we package the dataset as a tar archive stored at some file path (a sketch of how such an archive can be created follows the loading cell below). However, any PyTorch dataset object will also work (see the PyTorch dataset documentation for details)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "spU-XeiFT3AL"
      },
      "source": [
        "train_dataset = wds.Dataset(\"macai_datasets/brats/train/brats_train.tar.gz\")\r\n",
        "eval_dataset = wds.Dataset(\"macai_datasets/brats/validation/brats_validation.tar.gz\")"
      ],
      "execution_count": null,
      "outputs": []
    },
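    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For reference, here is a rough sketch of how such a tar archive might be created with webdataset's TarWriter. The directory layout, file names, and keys below are assumptions for illustration, not the exact script used to build the archives above; raw bytes are written so the original .nii.gz files are stored unchanged."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative sketch: packaging BraTS-style cases into a webdataset tar archive.\r\n",
        "# Paths, keys, and file names are assumptions for demonstration only.\r\n",
        "import glob\r\n",
        "\r\n",
        "def write_brats_tar(case_dirs, out_path):\r\n",
        "    with wds.TarWriter(out_path) as sink:\r\n",
        "        for case_dir in case_dirs:\r\n",
        "            case = os.path.basename(case_dir)\r\n",
        "            sample = {'__key__': case}\r\n",
        "            # One entry per modality plus the segmentation; bytes are written as-is.\r\n",
        "            for suffix in ['t1', 't1ce', 't2', 'flair', 'seg']:\r\n",
        "                with open(os.path.join(case_dir, case + '_' + suffix + '.nii.gz'), 'rb') as f:\r\n",
        "                    sample[suffix + '.nii.gz'] = f.read()\r\n",
        "            sink.write(sample)\r\n",
        "\r\n",
        "# write_brats_tar(sorted(glob.glob('BraTS_cases/*')), 'brats_train.tar')"
      ],
      "execution_count": null,
      "outputs": []
    },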
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gMGMCMzwU67B"
      },
      "source": [
        "To modify and load the dataset, we use a *collator function*, imported above as col_img. Wrap col_img in a lambda that takes only the DataLoader batch as its argument, with every other argument fixed, as in the examples below.\r\n",
        "\r\n",
        "A few notes:\r\n",
        "- Image augmentations randomly alter training images to artificially increase the effective sample size. The available augmentations, shown to be effective in the literature, are the power-law (gamma) transformation and the elastic transformation; the elastic transformation is currently relatively slow. Set the augmentation probabilities (pl_prob and elastic_prob) to 0 during evaluation; for training, set them anywhere between 0 and 1.\r\n",
        "- Image normalization makes intensity distributions more similar across images. We currently support two types: Nyul normalization and Z-score normalization. To use Z-score normalization, set use_zscore to True. To use Nyul normalization, the *standard_scales* and *percs* must be trained first (more details later).\r\n",
        "\r\n",
        "Note: both Nyul and Z-score normalization compute their statistics over the non-background (non-black) voxels of the entire image, including the tumour region. A standalone sketch of the power-law transform and Z-score normalization follows below."
      ]
    },
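    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The collator applies these operations internally; the standalone sketch below just illustrates the two ideas on a single image array. The gamma range and the use of non-zero voxels as \"non-background\" are assumptions for illustration."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative sketches of a power-law (gamma) augmentation and Z-score\r\n",
        "# normalization; not the collator's exact implementation.\r\n",
        "def power_law_transform(img, gamma_range=(0.8, 1.2)):\r\n",
        "    # Rescale intensities to [0, 1], raise to a random power, rescale back.\r\n",
        "    lo, hi = img.min(), img.max()\r\n",
        "    scaled = (img - lo) / (hi - lo + 1e-8)\r\n",
        "    gamma = np.random.uniform(*gamma_range)\r\n",
        "    return scaled ** gamma * (hi - lo) + lo\r\n",
        "\r\n",
        "def zscore_normalize(img):\r\n",
        "    # Statistics come from non-background (non-zero) voxels only.\r\n",
        "    foreground = img[img > 0]\r\n",
        "    return (img - foreground.mean()) / (foreground.std() + 1e-8)"
      ],
      "execution_count": null,
      "outputs": []
    },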
    {
      "cell_type": "code",
      "metadata": {
        "id": "q7YRJmylU5ZO"
      },
      "source": [
        "training_collator_function = lambda batch: col_img(batch, to_tensor=True, nyul_params=None, use_zscore=True, pl_prob=0.5, elastic_prob=0)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0Kz7Q1rpXNjy"
      },
      "source": [
        "eval_collator_function = lambda batch: col_img(batch, to_tensor=True, nyul_params=None, use_zscore=True, pl_prob=0, elastic_prob=0)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MOm66YuoXeZo"
      },
      "source": [
        "Nyul normalization is trained on the training dataset. We first create a dataloader whose collator makes no changes to the images, then feed it to the imported nyul_train_dataloader function. The current implementation ignores both the segmented region and the background (for more accurate use in radiomics); we will add an option to include the segmented region as well, since no segmentation is available before automated segmentation is performed."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Wfh9ia72X7gb"
      },
      "source": [
        "# nyul_train_dataloader is provided by the Radiology-and-AI repo cloned above\r\n",
        "train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=5, collate_fn=lambda batch: col_img(batch, to_tensor=False))\r\n",
        "standard_scales, percss = nyul_train_dataloader(train_dataloader, step=5)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4a_iN2jXaNN9"
      },
      "source": [
        "After training, we can apply Nyul normalization to our dataset."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "5VnWBly7aDx9"
      },
      "source": [
        "nyul_collator_function = lambda batch: col_img(batch, to_tensor=True, nyul_params={'percs': percss, 'standard_scales': standard_scales})"
      ],
      "execution_count": null,
      "outputs": []
    },
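    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For intuition: Nyul normalization maps each image's intensity histogram onto a learned standard scale, where percs are percentile positions and the standard scale gives the target intensities at those percentiles. The sketch below is a simplification of the repo's implementation (treating non-zero voxels as foreground is an assumption), shown for a single image and a single modality's scale."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative sketch of applying a trained Nyul mapping to one image;\r\n",
        "# simplified relative to the repo's implementation.\r\n",
        "def nyul_apply(img, percs, standard_scale):\r\n",
        "    # This image's intensity landmarks at the trained percentile positions.\r\n",
        "    foreground = img[img > 0]\r\n",
        "    landmarks = np.percentile(foreground, percs)\r\n",
        "    # Piecewise-linear map from the image's landmarks onto the standard scale.\r\n",
        "    return np.interp(img, landmarks, standard_scale)"
      ],
      "execution_count": null,
      "outputs": []
    },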
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AtYAju4MUCDM"
      },
      "source": [
        "There is a lot going on in this line.\r\n",
        "Our model is a PyTorch Lightning module called TumourSegmentation, imported above. This line instantiates a new instance of the model, which is then used during training.\r\n",
        "- The learning rate controls how quickly the model learns. Too high, and the model won't converge; too low, and training will take too long.\r\n",
        "- The collator is described above.\r\n",
        "- The train_dataset is what the model trains on; the eval_dataset checks that the model is truly learning (rather than memorizing the train_dataset).\r\n",
        "- batch_size must equal the number of images in each series, including the segmentation image. Here we have 4 modalities (T1, T1ce, T2, FLAIR) plus a segmentation, for a total of 5.\r\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Hb72wEgyT_3D"
      },
      "source": [
        "model = TumourSegmentation(learning_rate=4e-4, collator=training_collator_function, batch_size=5, train_dataset=train_dataset, eval_dataset=eval_dataset)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sZ3udcmzZezT"
      },
      "source": [
        "This code handles training. During and after training we can check TensorBoard to see how the run is going; any other logger also works. We use TensorBoard here, but WandB is another option that updates automatically on Colab."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "GA6zv9gGZcGi"
      },
      "source": [
        "%load_ext tensorboard"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "S1fasCzOZZGq"
      },
      "source": [
        "# Training\r\n",
        "# wandb_logger = WandbLogger(project='macai', name='test_run', offline=True)\r\n",
        "# The logs/ directory matches the %tensorboard cell below\r\n",
        "tensorboard_logger = TensorBoardLogger('logs/')\r\n",
        "trainer = pl.Trainer(\r\n",
        "    accumulate_grad_batches=1,\r\n",
        "    gpus=1,\r\n",
        "    max_epochs=10,\r\n",
        "    precision=16,\r\n",
        "    check_val_every_n_epoch=1,\r\n",
        "    logger=tensorboard_logger,\r\n",
        "    log_every_n_steps=10,\r\n",
        ")\r\n",
        "trainer.fit(model)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "2yY14W3RZdFs"
      },
      "source": [
        "%tensorboard --logdir logs/"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y7xR-qZ3ZxGs"
      },
      "source": [
        "The trainer creates checkpoints automatically, but we can also interrupt training and save a checkpoint manually whenever we wish, like so."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JP05cPb_Zyuh"
      },
      "source": [
        "trainer.save_checkpoint(\"last_ckpt.ckpt\")"
      ],
      "execution_count": null,
      "outputs": []
    },
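    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To resume training from a saved checkpoint, the checkpoint path can be handed back to the Trainer. The resume_from_checkpoint argument matches the PyTorch Lightning 1.x API assumed by this notebook; newer releases use trainer.fit(model, ckpt_path=...) instead."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Resume training from the saved checkpoint (PyTorch Lightning 1.x style).\r\n",
        "trainer = pl.Trainer(\r\n",
        "    gpus=1,\r\n",
        "    max_epochs=10,\r\n",
        "    precision=16,\r\n",
        "    resume_from_checkpoint='last_ckpt.ckpt',\r\n",
        ")\r\n",
        "# trainer.fit(model)"
      ],
      "execution_count": null,
      "outputs": []
    },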
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qv1lcol5aVoj"
      },
      "source": [
        "Finally, we can load saved models and inspect their outputs. We can visualize them directly in the notebook, or save the segmentation to disk and open it in neuroimaging software (I use 3D Slicer, but any viewer should do)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xRCEfRG_afOo"
      },
      "source": [
        "# Load the model from the checkpoint\r\n",
        "model = TumourSegmentation.load_from_checkpoint('last_ckpt.ckpt').cuda().half()\r\n",
        "i = 0\r\n",
        "\r\n",
        "for z in model.val_dataloader():\r\n",
        "  print('======================================================')\r\n",
        "  # Run inference without tracking gradients\r\n",
        "  with torch.no_grad():\r\n",
        "    prediction = model(torch.unsqueeze(z[0], dim=0).cuda().half())\r\n",
        "\r\n",
        "  # Convert to a float32 numpy array before saving or plotting\r\n",
        "  prediction = prediction[0].cpu().numpy().astype('float32')\r\n",
        "\r\n",
        "  # Save the prediction to file for further visualization\r\n",
        "  prediction_img = nb.Nifti1Image(prediction, np.eye(4))\r\n",
        "  nb.save(prediction_img, 'prediction_' + str(i) + '.nii.gz')\r\n",
        "\r\n",
        "  # Simple visualization of one slice; Cameron's visualization method\r\n",
        "  # can be used to improve on this process.\r\n",
        "  sl = z[1][0, :, 100]\r\n",
        "\r\n",
        "  plt.title('Label')\r\n",
        "  plt.imshow(sl, vmin=0, vmax=4)\r\n",
        "  plt.show()\r\n",
        "\r\n",
        "  plt.title('Prediction core')\r\n",
        "  plt.imshow(prediction[0, :, 100], vmin=0, vmax=1)\r\n",
        "  plt.show()\r\n",
        "\r\n",
        "  plt.title('Prediction enhancing')\r\n",
        "  plt.imshow(prediction[1, :, 100], vmin=0, vmax=1)\r\n",
        "  plt.show()\r\n",
        "\r\n",
        "  plt.title('Prediction edema')\r\n",
        "  plt.imshow(prediction[2, :, 100], vmin=0, vmax=1)\r\n",
        "  plt.show()\r\n",
        "\r\n",
        "  i += 1\r\n",
        "  if i >= 10:\r\n",
        "    break"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}