# BSc Thesis research in Diagnostic Captioning
## Thesis paper
[Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning](http://nlp.cs.aueb.gr/theses/g_zachariadis_bsc_thesis.pdf)
## Abstract
Recent years have witnessed an increase in studies on image captioning, but little of that knowledge has been utilised in the biomedical field. This thesis addresses medical image captioning, referred to as Diagnostic Captioning (DC): the task of assisting medical experts in diagnosis and report drafting. We present uni-modal, cross-modal, and multi-modal deep learning methods that aim to generate a representative "diagnostic text" for a given medical image. The multi-modal approaches utilise the radiology concepts (tags) used by clinicians to describe a patient's image (e.g., X-ray, CT scan, etc.) as additional input data. Such methods have not been adequately applied to biomedical research. We also experimented with a novel technique that utilises the captions generated by all the systems implemented as part of this thesis. Lastly, this thesis covers the participation of AUEB's NLP Group, with the author as the main driver, in the 2022 ImageCLEFmedical Caption Prediction task. Out of 10 teams, ours came second on the primary evaluation metric, using an encoder-decoder approach, and first on the secondary metric, using an ensemble technique applied to our generated captions. More about our paper can be found [here](http://ceur-ws.org/Vol-3180/paper-101.pdf).

## Environment setup
If you have a GPU installed on your system, it is highly recommended to use conda as your virtual environment manager to run the code. You can download conda from [here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html).

After the installation is complete, open a terminal inside this project and run the following commands to set up a conda environment compatible with TensorFlow:
```
conda create --name tf_gpu
conda activate tf_gpu
conda install tensorflow-gpu
pip install -r requirements.txt
```
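
To verify that TensorFlow detects your GPU inside the `tf_gpu` environment, here is a quick sanity check (if the printed list is empty, TensorFlow cannot see the GPU and training will fall back to the CPU):

```py
import tensorflow as tf

# Should list at least one PhysicalDevice entry if the GPU setup succeeded
print(tf.config.list_physical_devices("GPU"))
```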

If you decide to use **ClinicalBERT** as the main text-embedding extraction model, you have to execute `dc.py` in a PyTorch-based environment. To set it up, follow the next steps:
```
conda create --name torch_gpu
conda activate torch_gpu
conda install pytorch -c pytorch
pip install -r requirements.txt
```
Now your environment will be compatible with PyTorch. Then, comment out the imports from `models/__init__.py` and `models/kNN.py`.
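
Similarly, you can verify that PyTorch detects the GPU inside the `torch_gpu` environment:

```py
import torch

# Should print True if CUDA is available to PyTorch
print(torch.cuda.is_available())
```
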
## Dataset Instructions
As mentioned in the `Abstract` section, I participated in the ImageCLEFmedical 2022 Caption Prediction task. The code also handles the ImageCLEF dataset, but neither that dataset nor its evaluation measures are provided here, because we, as a group, signed an End User Agreement. Thus, only the IU X-Ray dataset is available. Go to [Datasets](https://github.com/zaaachos/Thesis-Diagnostic-Captioning/tree/main/data), download the dataset (i.e. IU X-Ray) and store it in the `data` directory.

*Your directory structure should look like this*:
```
.
├── data
│   ├── iu_xray
│   │   ├── two_captions.json
│   │   ├── two_images.json
│   │   ├── two_tags.json
│   │   └── densenet121.pkl
│   │
│   ├── fasttext_voc.pkl
│   └── fasttext.npy
```
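
To sanity-check the downloaded files, here is a minimal loading sketch. Going by the file names, it assumes that the JSON files map image IDs to captions/tags, that `densenet121.pkl` holds pickled, pre-extracted DenseNet-121 image embeddings, and that `fasttext.npy` holds FastText word vectors; adjust it if the actual structures differ:

```py
import json
import pickle

import numpy as np

# Assumed: a JSON mapping of image IDs to diagnostic captions
with open("data/iu_xray/two_captions.json") as f:
    captions = json.load(f)

# Assumed: pickled, pre-extracted DenseNet-121 image embeddings
with open("data/iu_xray/densenet121.pkl", "rb") as f:
    image_embeddings = pickle.load(f)

# Assumed: FastText word vectors stored as a NumPy array
fasttext_vectors = np.load("data/fasttext.npy")

print(f"Loaded {len(captions)} captions")
```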
## Execution Instructions
### Disclaimer
Throughout my research for this thesis, I experimented with models that had state-of-the-art (SOTA) performance on several biomedical datasets (such as IU X-Ray, MIMIC-III, etc.). These models are provided in the `SOTA_models` directory as git submodule repositories (run `git submodule update --init` after cloning to fetch them). More details about each model are provided in my thesis paper. I do not provide the additional data loaders that I created for these models. Thus, if you want to further experiment with them, please do so according to the guidelines provided in each of those repositories.
### Main applications
Follow the aforementioned steps to set up conda, then run the following command to train my implemented methods (i.e. CNN-RNN, kNN). Default arguments are set.
```
python3 dc.py
```

To see all available arguments, run the following command:
```
python3 dc.py -h
```
### Particular training procedures
It is suggested to use a Unix-like OS (e.g. Linux), or WSL on Windows, to execute the following training procedures:
* Cross-modal CNN-RNN: `bash cross_modal_cnn_rnn.sh`
* Multi-modal CNN-RNN: `bash multi_modal_cnn_rnn.sh`
* Cross-modal k-NN: `bash cross_modal_kNN.sh`
* Multi-modal k-NN: `bash multi_modal_kNN.sh`
## Citations
If you use or extend my work, please cite my paper.
```
@unpublished{Zachariadis2022,
  author = "G. Zachariadis",
  title = "Exploring Uni-modal, Cross-modal, and Multi-modal Diagnostic Captioning",
  year = "2022",
  note = "B.Sc. thesis, Department of Informatics, Athens University of Economics and Business"
}
```
You can read our publication ***"AUEB NLP Group at ImageCLEFmedical Caption 2022"***, published in the Proceedings of CLEF 2022, at this [link](https://ceur-ws.org/Vol-3180/paper-101.pdf). If you use or extend our work, please cite our paper:
```
@article{charalampakos2022aueb,
  title={{AUEB} {NLP} Group at {ImageCLEFmedical} Caption 2022},
  author={Charalampakos, Foivos and Zachariadis, Giorgos and Pavlopoulos, John and Karatzas, Vasilis and Trakas, Christoforos and Androutsopoulos, Ion},
  year={2022}
}
```
## License
[MIT License](https://github.com/zaaachos/bsc-thesis-in-diagnostic-captioning/blob/main/LICENSE)