Development Status :: 3 - Alpha <br>
*Copyright (c) 2023 MinWoo Park*
<br>

# GPT-BERT Medical QA Chatbot
[![Contributor Covenant](https://img.shields.io/badge/contributor%20covenant-v2.0%20adopted-black.svg)](code_of_conduct.md)
[![Python Version](https://img.shields.io/badge/python-3.6%2C3.7%2C3.8-black.svg)](code_of_conduct.md)
![Code convention](https://img.shields.io/badge/code%20convention-pep8-black)
![Black Formatter](https://img.shields.io/badge/code%20style-black-000000.svg)

> **Be careful when cloning this repository**: it contains large NLP model weights (>0.45 GB, tracked with [`git-lfs`](https://git-lfs.com/)). <br>
> If you want to clone without git-lfs, run the following commands before `git clone`. *git-lfs provides only 1 GB of free bandwidth per month, so a 0.45 GB download through git-lfs is very unlikely to succeed; please download the weights manually instead.*
```
git lfs install --skip-smudge
export GIT_LFS_SKIP_SMUDGE=1
```
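
If your bandwidth budget does allow it, you can later fetch individual LFS-tracked files instead of everything at once; `git lfs pull` accepts an include filter (the path below is a placeholder, substitute the actual weight file):
```
git lfs pull --include="path/to/weight-file"
```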

![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot_walle.png?raw=true)

Since the advent of GPT-4, there have been significant changes in the field. Nevertheless, GPT-2 and GPT-3 remain effective in specific domains as large-scale auto-regressive language models. This repository aims to qualitatively compare the performance of GPT-2 and GPT-4 in the medical domain, to estimate the resources and costs needed to fine-tune GPT-2 to the performance level of GPT-4, and to assess how well up-to-date information can be incorporated and applied.

Although a few years behind GPT-4, the ultimate goal of this repository is to minimize the costs and resources required to update the weights and keep them usable after acquiring them. We plan to design few-shot learning experiments for large-scale language models and to test existing research. Please note that this repository is intended for research and practice purposes only, and we assume no responsibility for any use of it.

Additionally, this repository ultimately aims to achieve qualitative and quantitative performance similar to GPT-4 in certain domains through model lightweighting and optimization. For more details, please refer to my technical blog.

*Keywords: GPT-2, Streamlit, Vector DB, Medical*

<br><br><br><br><br><br>

# Contents
- [GPT-BERT Medical QA Chatbot](#gpt-bert-medical-qa-chatbot)
- [Contents](#contents)
- [Quick Start](#quick-start)
  - [Command-Line Interface](#command-line-interface)
  - [Streamlit application](#streamlit-application)
- [Docker](#docker)
  - [Build from Docker Image](#build-from-docker-image)
  - [Build from Docker Compose](#build-from-docker-compose)
  - [Build from Docker Hub](#build-from-docker-hub)
  - [Pre-trained model information](#pre-trained-model-information)
- [Dataset](#dataset)
- [Pretrained Models](#pretrained-models)
- [Cites](#cites)
- [How to cite this project](#how-to-cite-this-project)
- [Tips](#tips)
  - [About data handling](#about-data-handling)
  - [About Tensorflow-GPU handling](#about-tensorflow-gpu-handling)
  - [Remark](#remark)
- [References](#references)

<br><br><br><br><br><br>

<br>

# Quick Start
## Command-Line Interface
You can chat with the chatbot through the command-line interface using the following commands.
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
python main.py
```
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/medichatbot.png?raw=true)

<br>

## Streamlit application
A simple application can be implemented with Streamlit as follows: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit_app2.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
pip install -e .
streamlit run chatbot.py
```
<!-- ![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/streamlit3.png?raw=true) -->

# Docker
Check Docker Hub: https://hub.docker.com/r/parkminwoo91/medical-chatgpt-streamlit-v1 <br>
Docker version 20.10.24, build 297e128

## Build from Docker Image
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt
docker build -t chatgpt .
docker run -p 8501:8501 -v ${PWD}/:/usr/src/app/data chatgpt     # No git-lfs cost: download the weights manually and mount them.
```
##### Since `git clone` skips the files tracked by git-lfs, the volume must be mounted as shown above; alternatively, modify `chatbot/config.py` to mount a different folder.

## Build from Docker Compose
You can also run it in a Docker container like this: <br>
![](https://github.com/DSDanielPark/medical-qa-bert-chatgpt/blob/main/assets/imgs/docker_build.gif)
```
git clone https://github.com/DSDanielPark/medical-qa-bert-chatgpt.git
cd medical-qa-bert-chatgpt

docker compose up
```

## Build from Docker Hub

```
docker pull parkminwoo91/medical-chatgpt-streamlit-v1:latest
docker compose up
```
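
If you prefer plain `docker run` over Compose, a sketch like the following should be equivalent, assuming the image serves the Streamlit app on port 8501 and that you mount your manually downloaded weights as in the build instructions above:
```
docker run -p 8501:8501 -v ${PWD}/:/usr/src/app/data parkminwoo91/medical-chatgpt-streamlit-v1:latest
```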
Then open http://localhost:8501/ in your browser.

###### Streamlit makes it quick and convenient to stand up a landing page, but it offers limited design flexibility and little control over application layout. Also, because the entire script re-runs on every change or interaction, large applications or datasets can run into speed issues. This landing page will be replaced by Flask, with further optimizations. Streamlit's chat support is also quite new, so for now this page serves only as a simple demo.

## Pre-trained model information
`Pre-trained model weights needed`
Datasets and model weights are downloaded through the Hugging Face Hub, but some TensorFlow models must be downloaded manually and placed at the top level of the project folder. The downloadable models are listed below; you can also visit my Hugging Face repository to check them. <br>
<br>
`modules/chatbot/config.py`
```python
class Config:
    chat_params = {"gpt_tok": "danielpark/medical-QA-chatGPT2-tok-v1",
                   "tf_gpt_model": "danielpark/medical-QA-chatGPT2-v1",
                   "bert_tok": "danielpark/medical-QA-BioRedditBERT-uncased-v1",
                   "tf_q_extractor": "question_extractor_model",
                   "data": "danielpark/MQuAD-v1",
                   "max_answer_len": 20,
                   "isEval": False,
                   # git-lfs bandwidth is limited, so mount local storage and resolve the folder path there (uses the python utilfunction package).
                   "runDocker": True,
                   "container_mounted_folder_path": "/usr/src/app/data"}
```
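
As a quick sanity check that the Hugging Face identifiers in `Config.chat_params` resolve, you can load the tokenizer and the TensorFlow GPT-2 weights with `transformers`. This is a minimal sketch, assuming the checkpoints above are standard `transformers`-compatible artifacts:
```python
from transformers import AutoTokenizer, TFGPT2LMHeadModel

# Identifiers taken from Config.chat_params above.
gpt_tok = AutoTokenizer.from_pretrained("danielpark/medical-QA-chatGPT2-tok-v1")
tf_gpt_model = TFGPT2LMHeadModel.from_pretrained("danielpark/medical-QA-chatGPT2-v1")
```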

<br>

# Dataset
The Medical Question and Answering dataset (MQuAD) has been refined from the datasets listed below. You can download it from the Hugging Face Hub with the `datasets` library as follows; you can find more information [here](https://huggingface.co/datasets/danielpark/MQuAD-v1).

```python
from datasets import load_dataset
dataset = load_dataset("danielpark/MQuAD-v1")
```

The medical Q/A data was gathered from the following websites:
- eHealth Forum
- iCliniq
- Question Doctors
- WebMD

Data was gathered on the 5th of May 2017.

<br>

# Pretrained Models
Hugging Face pretrained models:
- GPT2 pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-v1)
- GPT2 tokenizer [[download]](https://huggingface.co/danielpark/medical-QA-chatGPT2-tok-v1)
- BIO Reddit BERT pretrained model [[download]](https://huggingface.co/danielpark/medical-QA-BioRedditBERT-uncased-v1)

TensorFlow models for extracting context from QA; I temporarily share these model weights through my personal Google Drive.
- Q extractor [[download]](https://drive.google.com/drive/folders/1VjljBW_HXXIXoh0u2Y1anPCveQCj9vnQ?usp=share_link)
- A extractor [[download]](https://drive.google.com/drive/folders/1iZ6jCiZPqjsNOyVoHcagEf3hDC5H181j?usp=share_link)

<br>

# Cites
```BibTex
@misc{hf_canonical_model_maintainers_2022,
        author       = { {HF Canonical Model Maintainers} },
        title        = { gpt2 (Revision 909a290) },
        year         = 2022,
        url          = { https://huggingface.co/gpt2 },
        doi          = { 10.57967/hf/0039 },
        publisher    = { Hugging Face }
}

@misc{vaswani2017attention,
      title = {Attention Is All You Need},
      author = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
      year = {2017},
      eprint = {1706.03762},
      archivePrefix = {arXiv},
      primaryClass = {cs.CL}
}
```
<br>

# How to cite this project
```BibTex
@misc{medical_qa_bert_chatgpt,
      title  = {Medical QA Bert Chat GPT},
      author = {Minwoo Park},
      year   = {2023},
      url    = {https://github.com/dsdanielpark/medical-qa-bert-chatgpt},
}
```

<br>

# Tips

## About data handling
MQuAD provides the embedded question and answer arrays as strings, so it is recommended to convert these string-formatted arrays to float format as follows. This measure was applied to save the resources and time spent on embedding.

```python
from datasets import load_dataset
from utilfunction import col_convert
import pandas as pd

qa = load_dataset("danielpark/MQuAD-v1", "csv")
df_qa = pd.DataFrame(qa['train'])
df_qa = col_convert(df_qa, ['Q_FFNN_embeds', 'A_FFNN_embeds'])
```
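
If you prefer not to depend on `utilfunction`, a minimal equivalent conversion is sketched below, assuming each cell holds a bracketed, comma-separated string such as "[0.1, 0.2, ...]":
```python
import ast

import numpy as np
import pandas as pd
from datasets import load_dataset

qa = load_dataset("danielpark/MQuAD-v1", "csv")
df_qa = pd.DataFrame(qa["train"])

def to_float_array(cell: str) -> np.ndarray:
    # Parse the string literal into a Python list, then cast to a float32 array.
    return np.asarray(ast.literal_eval(cell), dtype=np.float32)

for col in ["Q_FFNN_embeds", "A_FFNN_embeds"]:
    df_qa[col] = df_qa[col].apply(to_float_array)
```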

## About Tensorflow-GPU handling
Since the NVIDIA GPU driver now fully supports WSL2, the way TensorFlow provides GPU support has changed. Please refer to the following pages to install it; a quick verification snippet follows the links.
- https://docs.nvidia.com/cuda/wsl-user-guide/index.html
- https://www.tensorflow.org/install/pip?hl=ko
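
After installation, TensorFlow's standard device query confirms whether the GPU is visible from within WSL2:
```python
import tensorflow as tf

# An empty list means TensorFlow sees no GPU and will run CPU-only.
print(tf.config.list_physical_devices("GPU"))
```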

<br>

## Remark
I have trained the model for 2 epochs on the dataset mentioned above, using 40 computing units from Google Colab Pro. Training took about 12 hours on an A100 multi-GPU setup with 56 GB of RAM or more. The relatively simple question-extractor and answer-extractor models, which perform summarization and indexing, take minimal time to train; they are included in the inference module to evaluate whether learning has proceeded appropriately. If the model only needs to respond to simple questions, the inference module should be changed; however, it is currently included in the evaluation to check performance and to measure the time and resources consumed. I plan to update this information once sufficient training is completed (by incorporating additional datasets) or once funding for experiments and adequate training resources is secured. <br>

- Training: 2 epochs on the `MQuAD` dataset, consuming 40 Google Colab Pro computing units and taking about 12 hours on an A100 multi-GPU with 56 GB of RAM or more.

<br>

# References
1. [Paper: Attention Is All You Need](https://arxiv.org/abs/1706.03762)
2. [Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
3. [Paper: BioBERT: a pre-trained biomedical language representation model for biomedical text mining](https://arxiv.org/ftp/arxiv/papers/1901/1901.08746.pdf)
4. [Paper: Language Models are Unsupervised Multitask Learners](https://d4mucfpksywv.cloudfront.net/better-language-models/languagemodels.pdf)
5. [GitHub Repository: DocProduct](https://github.com/ash3n/DocProduct#start-of-content)
6. [Applied AI Course](https://appliedaicourse.com)
7. [Medium Article: Medical Chatbot using BERT and GPT-2](https://suniljammalamadaka.medium.com/medical-chatbot-using-bert-and-gpt2-62f0c973162f)
8. [GitHub Repository: Medical Question Answer Data](https://github.com/LasseRegin/medical-question-answer-data)
9. [Hugging Face Model Hub: GPT-2](https://huggingface.co/gpt2)
10. [GitHub Repository: Streamlit Chat](https://github.com/AI-Yash/st-chat)
11. [Streamlit Documentation](https://streamlit.io/)
12. [Streamlit Tutorial: Deploying Streamlit Apps with Docker](https://docs.streamlit.io/knowledge-base/tutorials/deploy/docker)
13. [ChatterBot Documentation](https://chatterbot.readthedocs.io/en/stable/logic/index.html)
14. [Blog Post: 3 Steps to Fix App Memory Leaks](https://blog.streamlit.io/3-steps-to-fix-app-memory-leaks/)
15. [Blog Post: Common App Problems & Resource Limits](https://blog.streamlit.io/common-app-problems-resource-limits/)
16. [GitHub Gist: Streamlit Chatbot Example](https://gist.github.com/DSDanielPark/5d34b2f53709a7007b0d3a5e9f23c0a6) (lightweight and optimized)
17. [Databricks Blog: Democratizing Magic: ChatGPT and Open Models](https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html)
18. [GitHub Repository: Pyllama](https://github.com/juncongmoo/pyllama)