In modern healthcare, radiology plays an essential role in diagnosing and managing numerous medical conditions. Chest X-rays are among the most widely used diagnostic tools to detect abnormalities such as Pneumonia, Hernia, and Cardiomegaly.
Project Motivation:
This project aims to automate the generation of preliminary radiology reports from chest X-ray images by leveraging advanced computer vision techniques and large language models. This system serves as an aid for radiologists by:
- Enhancing productivity
- Reducing delays
- Minimizing errors due to workload fatigue
The report covers:
- An overview of the dataset structure and features
- Detailed methodology and preprocessing steps
- Model design, training, evaluation, and optimization techniques
- Performance metrics and analysis
- Potential further improvements
The project uses the MIMIC-CXR dataset, which includes:
- 15,000 chest X-ray images (originally in DICOM format, converted to PNG)
- Associated radiology reports in XML format
Key Dataset Features:
- Image File Path: Location/link of the corresponding chest X-ray image.
- Findings: Textual descriptions of abnormalities or observations.
- Impression: A concise summary of the primary conclusions.
Pathology Labels (14 Total):
- Atelectasis
- Cardiomegaly
- Consolidation
- Edema
- Enlarged Cardiomediastinum
- Fracture
- Lung Lesion
- Lung Opacity
- Pleural Effusion
- Pleural Other
- Pneumonia
- Pneumothorax
- Support Devices
- No Finding
The project is structured into several key stages:
DICOM to PNG Conversion:
A custom script converts the original DICOM images to PNG format, reducing file size while preserving image quality for efficient loading and processing.
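A minimal sketch of such a conversion step, assuming pydicom and Pillow are available (library choices, normalization, and directory names here are illustrative, not taken from the project code):

```python
import numpy as np
import pydicom
from pathlib import Path
from PIL import Image

def dicom_to_png(dicom_path: Path, png_path: Path) -> None:
    """Read a DICOM file, rescale its pixel data to 8-bit, and save it as a PNG."""
    ds = pydicom.dcmread(dicom_path)
    pixels = ds.pixel_array.astype(np.float32)
    pixels -= pixels.min()                        # min-max normalize to the 0..255 range
    pixels /= max(float(pixels.max()), 1e-6)
    Image.fromarray((pixels * 255).astype(np.uint8)).save(png_path)

if __name__ == "__main__":
    out_dir = Path("png_images")                  # hypothetical output directory
    out_dir.mkdir(exist_ok=True)
    for dcm in Path("dicom_images").glob("*.dcm"):
        dicom_to_png(dcm, out_dir / f"{dcm.stem}.png")
```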
CSV Creation:
A dedicated script extracts the following fields:
- image_ID: Unique identifier for each image.
- image_path: Consolidated file path to each PNG image.
- findings and impressions: Parsed from the XML reports, with spacing around punctuation normalized.
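A hedged sketch of what such an extraction script could look like; the XML tag names (findings, impression) and directory layout are assumptions, since the exact report schema is not shown here:

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

def build_csv(report_dir: str, image_dir: str, out_csv: str) -> None:
    """Collect image paths plus findings/impression text into a single CSV."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["image_ID", "image_path", "findings", "impression"]
        )
        writer.writeheader()
        for report in Path(report_dir).glob("*.xml"):
            root = ET.parse(str(report)).getroot()
            # Tag names are assumptions; adjust to the actual report schema.
            findings = (root.findtext(".//findings") or "").strip()
            impression = (root.findtext(".//impression") or "").strip()
            image_id = report.stem
            writer.writerow({
                "image_ID": image_id,
                "image_path": str(Path(image_dir) / f"{image_id}.png"),
                "findings": " ".join(findings.split()),      # normalize whitespace/spacing
                "impression": " ".join(impression.split()),
            })
```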
Filtering and Label Mapping:
Invalid or missing entries are removed, and findings are mapped to a list of specific disease labels.
Image Augmentation:
Standard augmentation techniques are applied to the training images to improve generalization.
A custom function get_dataloaders creates PyTorch DataLoader objects for training and validation with the following parameters (a minimal sketch follows the list below):
- Batch Size: Default is 8.
- Train Split: 85% training, 15% validation.
- Num Workers: Default is 4 for faster loading.
- Collate Function: Custom function to merge samples, particularly for variable-length inputs like text.
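Assuming a standard PyTorch setup, get_dataloaders might look like this sketch; the dataset object and the custom collate function are supplied by the caller:

```python
from torch.utils.data import DataLoader, random_split

def get_dataloaders(dataset, batch_size=8, train_split=0.85, num_workers=4, collate_fn=None):
    """Split a dataset 85/15 and wrap both parts in PyTorch DataLoaders."""
    train_len = int(len(dataset) * train_split)
    train_set, val_set = random_split(dataset, [train_len, len(dataset) - train_len])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True,
                              num_workers=num_workers, collate_fn=collate_fn)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False,
                            num_workers=num_workers, collate_fn=collate_fn)
    return train_loader, val_loader
```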
CheXbert is a transformer-based model fine-tuned for medical text classification using the BERT architecture. It extracts multi-label classifications from chest X-ray radiology reports.
It generates high-dimensional contextual embeddings of the report text.
Label Extraction:
Probabilities are thresholded at 0.5 to produce binary labels.
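A small illustration of this thresholding step; the pathology names and probability values below are made up for the example:

```python
import torch

def probabilities_to_labels(probs: torch.Tensor, pathologies: list, threshold: float = 0.5):
    """Convert per-pathology probabilities into binary labels using a 0.5 cutoff."""
    binary = (probs >= threshold).int()
    return {name: int(flag) for name, flag in zip(pathologies, binary.tolist())}

# Example with three hypothetical pathology probabilities.
print(probabilities_to_labels(torch.tensor([0.91, 0.12, 0.55]),
                              ["Cardiomegaly", "Edema", "Pneumonia"]))
# {'Cardiomegaly': 1, 'Edema': 0, 'Pneumonia': 1}
```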
Dataset Preparation:
The binary labels are integrated into a CSV file to enrich the dataset for multi-label classification.
ChexNet (based on DenseNet-121) is fine-tuned for multi-label classification of chest X-rays, focusing on structural abnormalities.
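As a sketch, a ChexNet-style classifier can be assembled from torchvision's DenseNet-121 by swapping the ImageNet head for a 14-way sigmoid output; the project's exact fine-tuning code may differ:

```python
import torch.nn as nn
from torchvision import models

NUM_PATHOLOGIES = 14

def build_chexnet(num_labels: int = NUM_PATHOLOGIES) -> nn.Module:
    """DenseNet-121 backbone with a multi-label sigmoid head, in the spirit of ChexNet."""
    model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
    in_features = model.classifier.in_features
    # Replace the 1000-class ImageNet classifier with one output per pathology.
    model.classifier = nn.Sequential(nn.Linear(in_features, num_labels), nn.Sigmoid())
    return model
```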
Two model architectures were explored for generating the medical reports: BioGPT with an image encoder, and BioGPT with an image encoder plus ChexNet labels.
BioGPT + Image Encoder:
Configuration:
Generation Parameters:
- max_length: 150 tokens
- temperature: 0.8
- top_k: 50
- top_p: 0.85
Integration and Flow:
BioGPT + Image Encoder + ChexNet Labels:
Configuration:
Generation Parameters:
- max_length: 150 tokens
- temperature: 0.8
- top_k: 50
- top_p: 0.85
Integration and Flow:
The image-encoder features and the ChexNet label inputs (joined with a <SEP> token separator) are concatenated before being passed to BioGPT.
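The sketch below applies the decoding parameters listed above with the Hugging Face microsoft/biogpt checkpoint. It conditions only on a text prompt, whereas the project also injects image-encoder features, so the prompt and wiring here are purely illustrative:

```python
from transformers import BioGptForCausalLM, BioGptTokenizer

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Hypothetical prompt: image-derived context and ChexNet labels joined by <SEP>.
prompt = "Findings context <SEP> Cardiomegaly, Pleural Effusion"
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(
    **inputs,
    max_length=150,      # generation parameters listed in the configuration above
    do_sample=True,
    temperature=0.8,
    top_k=50,
    top_p=0.85,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```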
In this analysis, a comprehensive comparison is conducted between the two model variants. The ROUGE metric (Recall-Oriented Understudy for Gisting Evaluation) serves as the primary evaluation metric, measuring the overlap between generated and reference text in terms of precision, recall, and F1-score.
ROUGE-L (Longest Common Subsequence):
This metric evaluates the longest common subsequence between the generated and reference texts, giving credit for correctly ordered content even if the content is spread out.
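One way to compute these scores is with the rouge-score package (an assumption about tooling; any ROUGE implementation would do); the reference and generated reports below are invented examples:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

reference = "Mild cardiomegaly with small bilateral pleural effusions."
generated = "The heart is mildly enlarged and small pleural effusions are present."

scores = scorer.score(reference, generated)
for name, score in scores.items():
    print(f"{name}: precision={score.precision:.2f} "
          f"recall={score.recall:.2f} f1={score.fmeasure:.2f}")
```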
Graph snippets for (BioGPT + Image Encoder) and (BioGPT + Image Encoder + ChexNet Labels) are provided below:
Limited Computation Power:
Resource constraints affected training and model size selection.
Model Complexity:
Smaller models failed to capture detailed findings, while larger models were required for improved accuracy.
Error Propagation:
The clinical findings extraction model introduces some errors that can impact the final report quality.
The model is deployed using Streamlit on an AWS EC2 instance for real-time inference.
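A minimal Streamlit front end along these lines might look as follows; generate_report is a hypothetical placeholder for the trained pipeline, not the project's actual inference code:

```python
import streamlit as st
from PIL import Image

def generate_report(image: Image.Image) -> str:
    """Placeholder for the trained pipeline (image encoder + ChexNet labels + BioGPT)."""
    return "Model output would appear here."

st.title("Chest X-ray Report Generator")
uploaded = st.file_uploader("Upload a chest X-ray", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Uploaded X-ray")
    if st.button("Generate report"):
        st.subheader("Preliminary Report")
        st.write(generate_report(image))
```

On the EC2 instance, an app like this would typically be launched with streamlit run app.py.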
This project demonstrates a synergistic approach combining computer vision and natural language processing to assist radiologists by generating detailed preliminary reports from chest X-ray images.
Feel free to explore the repository for code, experiments, and further documentation.