# Automated Radiology Report Generation from Chest X-Rays

## Table of Contents
- [Introduction](#introduction)
- [Dataset Description](#dataset-description)
- [Methodology](#methodology)
  - [1. Data Collection and Preprocessing](#1-data-collection-and-preprocessing)
    - [a. Data Extraction](#a-data-extraction)
    - [b. Data Pre-processing](#b-data-pre-processing)
    - [c. Dataset Split](#c-dataset-split)
  - [2. Extracting Labels Using CheXbert](#2-extracting-labels-using-chexbert)
  - [3. ChexNet for Structural Findings Extraction](#3-chexnet-for-structural-findings-extraction)
  - [4. Model Architectures](#4-model-architectures)
    - [Model 1: BioVilt + Alignment + BioGPT](#model-1-biovilt--alignment--biogpt)
    - [Model 2: BioVilt + ChexNet + Alignment + BioGPT](#model-2-biovilt--chexnet--alignment--biogpt)
- [Results](#results)
- [Challenges Faced](#challenges-faced)
- [Deployment](#deployment)
- [References](#references)

---

## Introduction

In modern healthcare, radiology plays an essential role in diagnosing and managing numerous medical conditions. **Chest X-rays** are among the most widely used diagnostic tools for detecting abnormalities such as **Pneumonia, Hernia, and Cardiomegaly**.

**Project Motivation:**  
This project aims to **automate the generation of preliminary radiology reports** from chest X-ray images by combining computer vision techniques with large language models. The system is intended to aid radiologists by:
- Enhancing productivity
- Reducing delays
- Minimizing errors due to workload fatigue

The report covers:
- An overview of the dataset structure and features
- Detailed methodology and preprocessing steps
- Model design, training, evaluation, and optimization techniques
- Performance metrics and analysis
- Potential further improvements

---

## Dataset Description

The project uses the **MIMIC-CXR** dataset, which includes:
- **15,000 chest X-ray images** (originally in DICOM format, converted to PNG)
- Associated radiology reports in XML format

**Key Dataset Features:**
- **Image File Path:** Location/link of the corresponding chest X-ray image.
- **Findings:** Textual descriptions of abnormalities or observations.
- **Impression:** A concise summary of the primary conclusions.

**Pathology Labels (14 Total):**
- Atelectasis
- Cardiomegaly
- Consolidation
- Edema
- Enlarged Cardiomediastinum
- Fracture
- Lung Lesion
- Lung Opacity
- Pleural Effusion
- Pleural Other
- Pneumonia
- Pneumothorax
- Support Devices
- No Finding

---

## Methodology

The project is structured into several key stages:

### 1. Data Collection and Preprocessing

#### a. Data Extraction
- **DICOM to PNG Conversion:**  
  A custom script converts the original DICOM images to PNG format, reducing file size while preserving image quality for efficient loading and processing.

- **CSV Creation:**  
  A dedicated script extracts the following fields:
  - `image_ID`: Unique identifier for each image.
  - `image_path`: Consolidated file paths to each PNG image.
  - `findings` and `impressions`: Parsed from XML reports.
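A minimal sketch of the CSV-creation step, using only the Python standard library. The report layout (a `<report>` root with `<findings>` and `<impression>` children), the helper names `build_row` and `rows_to_csv`, and the sample path are hypothetical and chosen for illustration; only the four column names come from the list above.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column names follow the fields listed above.
FIELDS = ["image_ID", "image_path", "findings", "impressions"]


def build_row(image_id: str, image_path: str, report_xml: str) -> dict:
    """Parse one report and return a CSV row (tag names are hypothetical)."""
    root = ET.fromstring(report_xml)
    # findtext returns the default when the element is missing.
    findings = (root.findtext("findings", default="") or "").strip()
    impressions = (root.findtext("impression", default="") or "").strip()
    return {
        "image_ID": image_id,
        "image_path": image_path,
        "findings": findings,
        "impressions": impressions,
    }


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up report used only to exercise the helpers.
sample_xml = """<report>
  <findings> Heart size is normal. Lungs are clear. </findings>
  <impression> No acute cardiopulmonary process. </impression>
</report>"""

row = build_row("img_0001", "images/img_0001.png", sample_xml)
csv_text = rows_to_csv([row])
```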

#### b. Data Pre-processing
- **Text Cleaning:**  
  - Expanding abbreviations (e.g., "lat" → "lateral")
  - Removing special characters
  - Fixing spacing around punctuation

- **Filtering and Label Mapping:**  
  Invalid or missing entries are removed, and findings are mapped to a list of specific disease labels.

- **Image Augmentation:**  
  Applied techniques include:
  - Resizing to (224, 224)
  - Random rotations and flips
  - Noise addition
  - Normalization
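The text-cleaning steps above can be sketched with the standard `re` module. The function name `clean_report_text`, the set of characters treated as "special", and the abbreviation table are assumptions for illustration; only the "lat" → "lateral" expansion comes from the list above.

```python
import re

# Hypothetical abbreviation table; only "lat" -> "lateral" is taken from the text.
ABBREVIATIONS = {"lat": "lateral"}


def clean_report_text(text: str) -> str:
    """Expand abbreviations, drop special characters, and tidy punctuation spacing."""
    # Expand whole-word abbreviations, case-insensitively.
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b"
    text = re.sub(pattern, lambda m: ABBREVIATIONS[m.group(0).lower()], text,
                  flags=re.IGNORECASE)
    # Keep letters, digits, whitespace, and basic punctuation only.
    text = re.sub(r"[^A-Za-z0-9\s.,;:()/-]", " ", text)
    # Remove spaces before punctuation, then add one after it when a letter follows.
    text = re.sub(r"\s+([.,;:])", r"\1", text)
    text = re.sub(r"([.,;:])(?=[A-Za-z])", r"\1 ", text)
    # Collapse repeated whitespace.
    return re.sub(r"\s+", " ", text).strip()


cleaned = clean_report_text("PA and lat views ** obtained .No   effusion ,no pneumothorax.")
```

Because a space is added after punctuation only when a letter follows, decimal measurements such as `1.5` are left intact.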

#### c. Dataset Split
A custom function `get_dataloaders` creates PyTorch DataLoader objects for training and validation with the following parameters:
- **Batch Size:** Default is 8.
- **Train Split:** 85% training, 15% validation.
- **Num Workers:** Default is 4 for faster loading.
- **Collate Function:** Custom function to merge samples, particularly for variable-length inputs like text.
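As a framework-free illustration of what the split and the collate function do, the sketch below shuffles indices into an 85/15 split and pads variable-length token lists to a common length. It is not the project's `get_dataloaders`; the names `split_indices` and `pad_collate`, the seed, and the padding id are assumptions. In PyTorch the same roles are typically played by `torch.utils.data.random_split` and the `collate_fn` argument of `DataLoader`.

```python
import random


def split_indices(n_samples: int, train_split: float = 0.85, seed: int = 42):
    """Shuffle sample indices and split them into train and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_split)
    return indices[:n_train], indices[n_train:]


def pad_collate(batch, pad_id: int = 0):
    """Merge (image, token_ids) samples, padding token lists to the batch maximum."""
    images = [image for image, _ in batch]
    max_len = max(len(tokens) for _, tokens in batch)
    padded = [tokens + [pad_id] * (max_len - len(tokens)) for _, tokens in batch]
    # The mask marks real tokens (1) versus padding (0).
    mask = [[1] * len(tokens) + [0] * (max_len - len(tokens)) for _, tokens in batch]
    return images, padded, mask


# 15,000 matches the image count given in the Dataset Description.
train_idx, val_idx = split_indices(15000)
images, padded, mask = pad_collate([("img_a", [5, 6, 7]), ("img_b", [8])])
```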

---

### 2. Extracting Labels Using CheXbert

**CheXbert** is a BERT-based transformer model fine-tuned for medical text classification. It extracts multi-label classifications from chest X-ray radiology reports.

#### Process:
1. **Text Processing:**  
   - Extract "Findings" and "Impressions" from reports.
   - Tokenize and format text for CheXbert.
   - Generate high-dimensional contextual embeddings.

2. **Label Extraction:**  
   - A classification layer predicts probabilities for each clinical condition.
   - Probabilities are thresholded at **0.5** to produce binary labels.

3. **Dataset Preparation:**  
   The binary labels are integrated into a CSV file to enrich the dataset for multi-label classification.
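The thresholding in step 2 can be sketched as follows. The probabilities are made up, the function name `probabilities_to_labels` is hypothetical, and treating a value of exactly 0.5 as positive is an assumption, since the text above does not say which side the boundary falls on.

```python
# The 14 pathology labels listed in the Dataset Description.
LABELS = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
    "Enlarged Cardiomediastinum", "Fracture", "Lung Lesion", "Lung Opacity",
    "Pleural Effusion", "Pleural Other", "Pneumonia", "Pneumothorax",
    "Support Devices", "No Finding",
]


def probabilities_to_labels(probabilities, threshold: float = 0.5) -> dict:
    """Turn per-condition probabilities into a {label: 0 or 1} row."""
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    # Assumption: a probability equal to the threshold counts as positive.
    return {label: int(p >= threshold) for label, p in zip(LABELS, probabilities)}


# Made-up probabilities for one report, in LABELS order.
example_probs = [0.10, 0.92, 0.03, 0.61, 0.20, 0.01, 0.05,
                 0.48, 0.77, 0.02, 0.30, 0.04, 0.50, 0.06]
row = probabilities_to_labels(example_probs)
positive = [label for label, value in row.items() if value == 1]
```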

![CheXbert Workflow](https://github.com/user-attachments/assets/29b4921c-d5e8-431d-ba86-8b73ca16b8b6)

---

### 3. ChexNet for Structural Findings Extraction

**ChexNet** (based on DenseNet-121) is fine-tuned for multi-label classification of chest X-rays, focusing on structural abnormalities.

#### Key Points:
- **Base Model:** DenseNet-121 with pre-trained ImageNet weights.
- **Layer Freezing:**  
  Initial layers are frozen; only the last two dense blocks and the classifier head are fine-tuned.
- **Custom Classifier:**  
  - **Input:** 1024 features from DenseNet-121.
  - **Hidden Layer:** 512 units with ReLU activation.
  - **Dropout:** 0.3 for regularization.
  - **Output:** 14 sigmoid-activated nodes for multi-label classification.
- **Training Procedure:**  
  - **Loss Function:** Custom Weighted Binary Cross-Entropy Loss (WeightedBCELoss)
  - **Optimizer:** Adam with differential learning rates.
  - **Scheduler:** ReduceLROnPlateau.
  - **Metric:** Micro-averaged F1 score of **0.70**.
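The README does not define `WeightedBCELoss`, so the following is only one plausible reading: binary cross-entropy in which the positive term of each label is scaled by a per-label weight, here taken as the ratio of negative to positive examples. Both the weighting scheme and the function names are assumptions, written in plain Python so the arithmetic is easy to follow.

```python
import math


def weighted_bce(probabilities, targets, pos_weights, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over labels, with the positive term scaled per label."""
    total = 0.0
    for p, y, w in zip(probabilities, targets, pos_weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)


def positive_weights(label_matrix):
    """Hypothetical weighting: negatives / positives per label column."""
    n_rows = len(label_matrix)
    weights = []
    for column in zip(*label_matrix):
        positives = sum(column)
        weights.append((n_rows - positives) / max(positives, 1))
    return weights


# Toy data: 4 samples, 3 labels (the classifier above has 14 outputs).
labels = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
weights = positive_weights(labels)
loss = weighted_bce([0.9, 0.2, 0.1], labels[0], weights)
unweighted = weighted_bce([0.9, 0.2, 0.1], labels[0], [1.0, 1.0, 1.0])
```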

![ChexNet Workflow](https://github.com/user-attachments/assets/eaf445e8-5696-43d0-998b-4905b36507e6)

---

### 4. Model Architectures

Two model architectures were evaluated for generating medical reports:

#### Model 1: BioVilt + Alignment + BioGPT

1. **Components:**
   - **BioVilt:**  
     - Uses a ResNet backbone (ResNet-50/ResNet-18) for feature extraction.
     - Produces a 512-dimensional global embedding.
   - **Alignment Module:**  
     - Bridges image embeddings with textual representations.
   - **BioGPT:**  
     - A GPT-2-based language model pre-trained on biomedical literature (approx. 347M parameters).

2. **Configuration:**
   - **BioVilt:**  
     - Backbone: ResNet-50  
     - Output: 512-dimensional embedding.
   - **Alignment Module:**  
     - Text encoder: Microsoft BioGPT.
     - Projection layers map image embeddings to BioGPT's 768-dimensional space.
     - **Loss Function:** Contrastive Loss.
   - **BioGPT (PEFT via LoRA):**  
     - **Rank:** 16  
     - **Alpha:** 32  
     - **Dropout:** 0.1  
   - **Generation Parameters:**  
     - `max_length`: 150 tokens  
     - `temperature`: 0.8  
     - `top_k`: 50  
     - `top_p`: 0.85  

3. **Integration and Flow:**
   - **Image Preprocessing:** Resize and augment PNG images.
   - **Image Encoding:** BioVilt extracts image features.
   - **Alignment:** Projects image embeddings to align with BioGPT's text embeddings.
   - **Report Generation:** The aligned embeddings are fed into BioGPT to generate the final report.
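To show how the generation parameters above interact, here is a plain-Python sketch of temperature scaling followed by top-k and then top-p (nucleus) filtering on a toy vocabulary. The function name `filter_distribution` and the toy logits are assumptions; real decoding libraries apply the same ideas to the model's logits but may differ in detail.

```python
import math


def filter_distribution(logits, temperature=0.8, top_k=50, top_p=0.85):
    """Return {token_index: probability} after temperature, top-k, and top-p filtering."""
    # A temperature below 1 sharpens the distribution.
    scaled = [value / temperature for value in logits]
    # Top-k: keep the k highest-scoring tokens.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the surviving tokens (shifted by the max for stability).
    peak = scaled[ranked[0]]
    exps = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for index, prob in zip(ranked, probs):
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    kept_total = sum(prob for _, prob in kept)
    return {index: prob / kept_total for index, prob in kept}


# Toy 5-token vocabulary; real logits come from the language model.
candidates = filter_distribution([2.0, 1.0, 0.5, -1.0, -3.0])
```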

---

#### Model 2: BioVilt + ChexNet + Alignment + BioGPT

![Model 2 Overview](https://github.com/user-attachments/assets/f7da053d-97ec-43d3-b66c-c976ecd269ed)

1. **Components:**
   - **BioVilt:**  
     - ResNet-50 based image encoder.
   - **ChexNet:**  
     - Multi-label classifier (DenseNet-121) for structural findings.
   - **Alignment Module:**  
     - Integrates image and label embeddings with text embeddings.
   - **BioGPT:**  
     - Fine-tuned for biomedical report generation.

2. **Configuration:**
   - **BioVilt:**  
     - Backbone: ResNet-50  
     - Output: 512-dimensional embedding.
   - **ChexNet:**  
     - Backbone: DenseNet-121  
     - Output: Multi-label predictions for 14 clinical findings.
   - **Alignment Module:**  
     - Text encoder: Microsoft BioGPT.
     - Projection layers map image embeddings to 768 dimensions and separately project text from the ground-truth reports.
     - **Loss Function:** Contrastive Loss.
   - **BioGPT (PEFT via LoRA):**  
     - **Rank:** 16  
     - **Alpha:** 32  
     - **Dropout:** 0.1  
   - **Generation Parameters:**  
     - `max_length`: 150 tokens  
     - `temperature`: 0.8  
     - `top_k`: 50  
     - `top_p`: 0.85  

3. **Integration and Flow:**
   - **Image Preprocessing:** Resize and augment PNG images.
   - **Image Encoding:** BioVilt extracts image features.
   - **ChexNet Classification:** Identifies structural findings and generates binary labels.
   - **Alignment:** Combines image embeddings with label information and projects them to align with BioGPT's text embeddings.
   - **Concatenation:** The image embeddings and prompt text embeddings (with a `<SEP>` token separator) are concatenated.
   - **Report Generation:** The concatenated embeddings are fed into BioGPT to generate the final report.
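Both configurations name a contrastive loss for the alignment module without specifying its form. The sketch below assumes a CLIP-style symmetric InfoNCE objective, in which the i-th image and the i-th report are the only positive pair in a batch; the temperature of 0.07 and all function names are assumptions, and the 2-D vectors stand in for the 768-dimensional projections described above.

```python
import math


def normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy_row(similarities, target_index):
    """Cross-entropy of one row of similarity logits against its matching index."""
    peak = max(similarities)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in similarities))
    return log_sum - similarities[target_index]


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric InfoNCE: pair i of images and texts is the only positive."""
    images = [normalize(v) for v in image_embeddings]
    texts = [normalize(v) for v in text_embeddings]
    # Cosine similarity matrix scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
              for img in images]
    n = len(logits)
    image_to_text = sum(cross_entropy_row(logits[i], i) for i in range(n)) / n
    columns = [list(col) for col in zip(*logits)]
    text_to_image = sum(cross_entropy_row(columns[i], i) for i in range(n)) / n
    return 0.5 * (image_to_text + text_to_image)


# Toy embeddings: matched pairs give a low loss, swapped pairs a high one.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
```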

---

## Results

The two models are compared using **ROUGE** (Recall-Oriented Understudy for Gisting Evaluation) as the primary evaluation metric. ROUGE measures the overlap between generated and reference text and is reported as recall, precision, and F1-score.

**ROUGE-L (Longest Common Subsequence):**  
This metric evaluates the longest common subsequence between the generated and reference texts, giving credit for correctly ordered content even when that content is not contiguous.
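A minimal sketch of ROUGE-L as described above, using lowercase whitespace tokens and an unweighted F1. Published implementations differ in tokenization, stemming, and how precision and recall are weighted, so scores from this sketch are illustrative and will not necessarily match the reported results; the example sentences are made up.

```python
def lcs_length(reference, candidate):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(candidate) + 1)
    for ref_token in reference:
        current = [0]
        for j, cand_token in enumerate(candidate, start=1):
            if ref_token == cand_token:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(reference: str, candidate: str) -> dict:
    """ROUGE-L precision, recall, and F1 on whitespace tokens."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    lcs = lcs_length(ref_tokens, cand_tokens)
    precision = lcs / len(cand_tokens) if cand_tokens else 0.0
    recall = lcs / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if lcs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# The common subsequence here is "no ... acute ... process" (3 tokens).
scores = rouge_l(
    "no acute cardiopulmonary process",
    "no evidence of acute process",
)
```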

Graph snippets for **(BioGPT + Image Encoder)** and **(BioGPT + Image Encoder + ChexNet Labels)** are provided below:

![image](https://github.com/user-attachments/assets/7db55f12-ca80-4f3d-8d9c-7ac39579754e)

- **Model 1: BioVilt + Alignment + BioGPT**

  ![Model 1 Results](https://github.com/user-attachments/assets/a45cb640-50bc-4556-89f6-06e068e8a24a)

- **Model 2: BioVilt + ChexNet + Alignment + BioGPT**

  ![Model 2 Results](https://github.com/user-attachments/assets/598b4263-2dc2-4620-9587-648e3701a79b)

---

## Challenges Faced

- **Limited Computation Power:**  
  Resource constraints limited both training and the choice of model size.

- **Model Complexity:**  
  Smaller models failed to capture detailed findings; improving accuracy required larger models.

- **Error Propagation:**  
  Errors made by the clinical findings extraction model can carry over into the final report and reduce its quality.

---

276
## Deployment
277
278
The model is deployed using **Streamlit** on an **AWS EC2** instance for real-time inference.
279
280
---
281
## References

- **CheXbert:** [CheXbert GitHub Repository](https://github.com/stanfordmlgroup/CheXbert)
- **ChexNet:** [CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning (arXiv)](https://arxiv.org/abs/1711.05225)
- **BioVilt:** [BioViLT: A Vision-Language Transformer for Medical Image Report Generation (arXiv)](https://arxiv.org/abs/2206.09993)
- **BioGPT:** [BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining (arXiv)](https://arxiv.org/abs/2210.10341)
- **PEFT Techniques (LoRA):** [LoRA: Low-Rank Adaptation of Large Language Models (arXiv)](https://arxiv.org/abs/2106.09685)

---

*This project demonstrates a synergistic approach combining computer vision and natural language processing to assist radiologists by generating detailed preliminary reports from chest X-ray images.*

Feel free to explore the repository for code, experiments, and further documentation.