Diff of /README.md [6a771f] .. [7c8370]



...
Our exploration of Social Determinants of Health (SDoH) classification with AI models has led to several findings:

1. Fine-tuned Flan-T5 XL and XXL models outperform the traditional BERT model and various GPT models.
2. Adding synthetic data during training improves both model performance and data efficiency.
3. In a test on synthetic sentences with altered demographic data, the fine-tuned Flan-T5 models consistently outperformed the GPT models in both robustness and overall performance.
   ![fig1](https://github.com/AIM-Harvard/SDoH/blob/main/resource/fig1.png?raw=true)
4. We will make the synthetic training data and the out-of-domain performance and robustness evaluation datasets available to the broader community for further research and development.
   ![fig2](https://github.com/AIM-Harvard/SDoH/blob/main/resource/fig3.png?raw=true)

## Models

Our research applies two primary models to the classification tasks:

1. A model that classifies the full SDoH label set.
...

The figure below shows how the synthetic SDoH Human Annotated Demographic Robustness dataset (SHADR), `Partial_Iteration_2_demographic_annotated.csv`, was created.

**To evaluate your model on this dataset,** first run inference on the ***original sentences***, then use the same model to run inference on the ***demographically modified sentences*** and compare the two sets of predictions to assess robustness, as shown in the figure below.

![data flow Diagram](https://github.com/AIM-Harvard/SDoH/blob/main/resource/fig2.png?raw=true)
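The two-step evaluation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's evaluation code: it assumes you have already grouped each original sentence with its demographically modified variants, and that your model is wrapped in a `predict` function returning a set of SDoH labels. The function name `demographic_consistency`, the toy classifier, and labels such as `HOUSING` are hypothetical. Exact agreement between predictions is only one possible robustness measure; comparing each set of predictions against the human annotations is another.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical label representation: the set of SDoH labels a model assigns
# to one sentence (multi-label, so a sentence may get several labels or none).
Labels = FrozenSet[str]


def demographic_consistency(
    predict: Callable[[str], Labels],
    pairs: Sequence[Tuple[str, Sequence[str]]],
) -> float:
    """Fraction of demographically modified sentences whose predicted labels
    exactly match the prediction on the corresponding original sentence.

    `pairs` holds (original_sentence, [modified_sentence, ...]) tuples, and
    the same `predict` function (i.e. the same model) is used for both steps.
    """
    matches = 0
    total = 0
    for original, variants in pairs:
        reference = predict(original)  # step 1: inference on the original
        for variant in variants:  # step 2: same model on each variant
            total += 1
            if predict(variant) == reference:
                matches += 1
    if total == 0:
        raise ValueError("no demographically modified sentences to compare")
    return matches / total


def toy_predict(sentence: str) -> Labels:
    """Hypothetical keyword classifier, deliberately biased for illustration."""
    words = sentence.lower().replace(".", "").split()
    labels = set()
    if "homeless" in words:
        labels.add("HOUSING")
    # Deliberate flaw: ignores unemployment when the sentence mentions "she".
    if "unemployed" in words and "she" not in words:
        labels.add("EMPLOYMENT")
    return frozenset(labels)


pairs = [
    (
        "He is homeless and unemployed.",
        ["She is homeless and unemployed.", "They are homeless and unemployed."],
    ),
]
print(demographic_consistency(toy_predict, pairs))  # → 0.5
```

Here the deliberately biased toy model drops the employment label when a sentence mentions "she", so one of the two variants disagrees with the original prediction and the score is 0.5. Loading `Partial_Iteration_2_demographic_annotated.csv` is left out because its column layout is not described here.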

- The code and prompts used for synthetic data generation can be found in the Jupyter notebook `synthetic_data_generation_GPT.ipynb`.
- The prompts fed into GPT-3.5 Turbo are stored in JSON files.

## Model Comparison
