# A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for Explainability

![architecture_AttriDet](https://github.com/intelligentMachines-ITU/Blood-Cancer-Dataset/assets/155678287/e2004432-3411-4eea-bc27-cf2a6a6daab9)

**Authors:** Abdul Rehman, Talha Meraj, Aiman Mahmood Minhas, Ayisha Imran, Mohsen Ali, Waqas Sultani

**MICCAI 2024**

**Paper:** [ArXiv](https://arxiv.org/abs/2405.10803)

**Hugging Face:** [Demo](https://huggingface.co/spaces/ryhm/AttriDet)

**Abstract:** _Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction._

*The **journal version** of this paper is "Leveraging Sparse Annotations for Leukemia Diagnosis on the Large Leukemia Dataset".*
# Installation

We recommend the use of a Linux machine equipped with CUDA-compatible GPUs. The execution environment can be installed through Conda.

Clone repo:
```
git clone https://github.com/AttriDet/AttriDet
cd AttriDet
```

**Conda**

Install the dependencies from requirements.txt in a Python >= 3.7.16 environment; PyTorch 1.13.1 with CUDA 11.7 support is required. The environment can be created and activated with:
```
conda create --name AttriDet python=3.7.16
conda activate AttriDet
pip install -r requirements.txt  # install
```
# Dataset 
The LeukemiaAttri dataset can be downloaded from the following link: [link](https://drive.google.com/drive/folders/1J5ld-tK6cewj9wXWUi3rs6UdlHnDBe8U?usp=sharing)

# JSON COCO Format
```
|- COCO Dataset
   |- Annotations
   |  |- train.json
   |  |- test.json
   |- Images
      |- train
      |- test
```

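The annotation files follow the standard COCO JSON schema, so they can be loaded without pycocotools. A minimal sketch for grouping annotations by image (file paths follow the layout above; adjust them to wherever you extracted the dataset):

```python
import json
from collections import defaultdict

def index_coco(coco):
    """Group COCO annotations by image_id and map category ids to names."""
    anns_by_image = defaultdict(list)
    for ann in coco["annotations"]:
        anns_by_image[ann["image_id"]].append(ann)
    cat_names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    return anns_by_image, cat_names

# Usage, following the directory layout above:
# with open("COCO Dataset/Annotations/train.json") as f:
#     anns_by_image, cat_names = index_coco(json.load(f))
```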
# YOLO Format

We construct the training and testing sets for the YOLO format; this version of the dataset can be downloaded from the link in the Dataset section above.

Labels are prepared in YOLO format but extended with attribute information: cls x y w h a1 a2 a3 a4 a5 a6, whereas the standard YOLO label format is cls x y w h.
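One hypothetical way to read such an extended label line (the field order follows the description above; treating the attributes as numeric values is an assumption):

```python
def parse_attri_label(line):
    """Parse an extended YOLO label 'cls x y w h a1 a2 a3 a4 a5 a6'."""
    fields = line.split()
    cls_id = int(fields[0])
    box = tuple(float(v) for v in fields[1:5])     # normalized x, y, w, h
    attrs = tuple(float(v) for v in fields[5:11])  # six morphological attributes
    return cls_id, box, attrs

# Example label line (values are illustrative only):
cls_id, box, attrs = parse_attri_label("3 0.51 0.48 0.12 0.15 1 0 1 1 0 1")
```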

data/WBC_v1.yaml:
```
train: ../images/train
test: ../images/test

# number of classes
nc: 14

# class names
names: ["None", "Myeloblast", "Lymphoblast", "Neutrophil", "Atypical lymphocyte", "Promonocyte", "Monoblast", "Lymphocyte", "Myelocyte", "Abnormal promyelocyte", "Monocyte", "Metamyelocyte", "Eosinophil", "Basophil"]
```
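For convenience, the class list can be mirrored in Python to map between YOLO class ids and names (names copied verbatim from the yaml above):

```python
# Class names from data/WBC_v1.yaml; list index = YOLO class id.
names = ["None", "Myeloblast", "Lymphoblast", "Neutrophil", "Atypical lymphocyte",
         "Promonocyte", "Monoblast", "Lymphocyte", "Myelocyte", "Abnormal promyelocyte",
         "Monocyte", "Metamyelocyte", "Eosinophil", "Basophil"]
name_to_id = {name: i for i, name in enumerate(names)}

assert len(names) == 14  # must match nc in the yaml
```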
# Training

To reproduce the experimental results, we recommend training the model with the following steps.

Before training, please check data/WBC_v1.yaml and enter the correct data paths.

The model is trained in two successive phases:

- Phase 1: model pre-training (100 epochs)
- Phase 2: further training from the pre-trained weights (200 epochs)

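The two phases can also be chained from a small driver script, e.g. via subprocess (a sketch only; the flags mirror the commands in the two sections below):

```python
import subprocess

def train_cmd(name, epochs, weights):
    """Build the train.py command line for one phase (flags mirror the README)."""
    return ["python", "train.py",
            "--name", name,
            "--batch", "8",
            "--imgsz", "640",
            "--epochs", str(epochs),
            "--data", "data/WBC_v1.yaml",
            "--hyp", "data/hyps/hyp.scratch-high.yaml",
            "--weights", weights]

phase1 = train_cmd("AttriDet_Phase1", 100, "yolov5x.pt")
phase2 = train_cmd("AttriDet_Phase2", 200, "runs/AttriDet_Phase1/weights/last.pt")
# subprocess.run(phase1, check=True)  # then, after Phase 1 finishes:
# subprocess.run(phase2, check=True)
```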
# Phase 1: Model pre-training
The first phase consists of pre-training the model. Training can be performed by running the following bash script:

```
python train.py \
 --name AttriDet_Phase1 \
 --batch 8 \
 --imgsz 640 \
 --epochs 100 \
 --data data/WBC_v1.yaml \
 --hyp data/hyps/hyp.scratch-high.yaml \
 --weights yolov5x.pt
```
# Phase 2: Further training from pre-trained weights
The pre-trained weights from Phase 1 are used for further training. Training can be performed by running the following bash script:

```
python train.py \
 --name AttriDet_Phase2 \
 --batch 8 \
 --imgsz 640 \
 --epochs 200 \
 --data data/WBC_v1.yaml \
 --hyp data/hyps/hyp.scratch-high.yaml \
 --weights runs/AttriDet_Phase1/weights/last.pt
```
# Testing phase

Once model training is done, an Attribute_model directory is created. It contains CSV files of ground-truth vs. predicted attributes, along with the attribute model weights: the best checkpoint (selected by F1 score) as well as last.pt. These files and weights are saved based on model validation. For testing, the last.pt of YOLO and the last.pt of the attribute model are used to run test.py. As a result, a test subdirectory is created inside Attribute_model, containing test.csv with ground-truth vs. predicted attributes. The YOLO weights and test outputs are saved correspondingly in runs/val/exp.

```
python test.py \
 --weights runs/train/AttriDet/weights/last.pt \
 --data data/WBC_v1.yaml \
 --save-csv \
 --imgsz 640
```
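Since the attribute checkpoint is selected by F1 score, the ground-truth vs. predicted attribute CSVs can be sanity-checked with a simple per-attribute binary F1 (a sketch; it assumes attributes are encoded as 0/1):

```python
def f1_binary(y_true, y_pred):
    """Binary F1 over paired 0/1 labels for one attribute column."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

score = f1_binary([1, 1, 0, 0], [1, 0, 0, 1])  # precision 0.5, recall 0.5 -> F1 = 0.5
```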