Our exploration into Social Determinants of Health (SDOH) classification using AI models has led to several insightful findings:

1. Fine-tuned Flan-T5 XL and XXL models outperform the traditional BERT model and various GPT models.
2. Augmenting the training data with synthetic data improves model performance and data efficiency.
3. On synthetic sentences with altered demographic information, the fine-tuned Flan-T5 models consistently outperformed the GPT models in both robustness and overall performance.
5. We will make our synthetic training data and out-of-domain performance and robustness evaluation datasets available to the broader community for further research and development.

## Models

Our research uses two primary models for the classification tasks:

1. A model that classifies the full SDOH label set.

...

The figure below shows how we created the synthetic SDoH Human Annotated Demographic Robustness dataset (SHADR), `Partial_Iteration_2_demographic_annotated.csv`.

**To evaluate your model on this dataset,** first run inference on the ***original sentences***, then run the same model on the ***demographic-modified sentences*** and compare the two sets of predictions to assess robustness, as shown in the figure below.
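The two-step robustness comparison can be sketched in Python as follows. This is a minimal illustration, not our evaluation code. It assumes a hypothetical model wrapper `classify` that maps a sentence to a set of SDOH labels. It reports the share of demographic-modified sentences whose prediction matches the prediction on the original sentence. This agreement rate is an illustrative choice of metric and may differ from the comparison used in our analysis. How you extract the original/modified sentence pairs from `Partial_Iteration_2_demographic_annotated.csv` depends on its column layout, so this sketch does not assume one.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

# Hypothetical representation: one prediction = the set of SDOH labels
# the model assigns to a sentence.
Labels = FrozenSet[str]


def robustness_consistency(
    pairs: Iterable[Tuple[str, List[str]]],
    classify: Callable[[str], Labels],
) -> float:
    """Return the fraction of demographic-modified sentences whose predicted
    labels equal the prediction for their original sentence.

    pairs:    (original_sentence, [modified_sentence, ...]) tuples
    classify: wrapper around your model (hypothetical); returns a label set
    """
    matches = 0
    total = 0
    for original, modified_versions in pairs:
        # Step 1: run inference on the original sentence.
        reference = classify(original)
        # Step 2: run the *same* model on every demographic-modified version.
        for modified in modified_versions:
            total += 1
            if classify(modified) == reference:
                matches += 1
    if total == 0:
        raise ValueError("no demographic-modified sentences to compare")
    return matches / total


def toy_classify(sentence: str) -> Labels:
    # Deliberately brittle stand-in for a real model: it flags housing
    # instability from the keyword "homeless", but drops the label whenever
    # the pronoun "she" appears, so it is not demographically robust.
    words = sentence.lower().split()
    if "homeless" in sentence.lower() and "she" not in words:
        return frozenset({"HOUSING"})
    return frozenset()


# Illustrative, made-up sentence pairs (not taken from SHADR).
example_pairs = [
    ("He is currently homeless.",
     ["She is currently homeless.", "They are currently homeless."]),
    ("The patient lives alone.", ["The elderly patient lives alone."]),
]

if __name__ == "__main__":
    # 2 of the 3 modified sentences keep the original prediction.
    print(round(robustness_consistency(example_pairs, toy_classify), 3))  # 0.667
```

A demographically robust model should score close to 1.0. The toy classifier drops its label when only the pronoun changes, so it scores about 0.667.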
- The code and prompts used for synthetic data generation can be found in the Jupyter notebook `synthetic_data_generation_GPT.ipynb`.
- JSON files containing the prompts fed to GPT-3.5 Turbo.

## Model Comparison