# Diversity in Head and Neck Cancer Clinical Trials

This repository contains code and data for analyzing diversity in head and neck cancer clinical trials, specifically focusing on the inclusion of non-white participants in these studies.

## Project Overview

Head and neck cancer disproportionately affects certain racial and ethnic groups. This analysis aims to identify factors associated with higher diversity in clinical trials and patterns that could lead to more inclusive research.

### Key Components

1. **Diversity Metric**: Diversity is measured as the percentage of non-white participants in each study:
   - Score = (# non-white participants) / (# total participants) × 100
   - Total participants = # white participants + # non-white participants

2. **Comparative Analysis**: Studies are divided into a high-diversity group (top 20% of scores) and a low-diversity group (bottom 20% of scores).

3. **Factor Identification**: Study characteristics and eligibility criteria are compared between the two groups to identify factors associated with more diverse trials.
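
The metric and the 20% split can be sketched in plain Python as follows. This is a minimal illustration, not the code in `src/`. The `Study` record and its `white`/`non_white` fields are hypothetical. The split assumes studies are ranked by score, with a fixed count taken from each end; the repository may instead use percentile cut-offs, which can treat ties differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Study:
    # Hypothetical record layout; the field names are illustrative only.
    nct_id: str
    white: int      # participants reported as white
    non_white: int  # participants reported under any other race


def diversity_score(study: Study) -> float:
    """Percentage of non-white participants among white + non-white participants."""
    total = study.white + study.non_white
    if total == 0:
        raise ValueError(f"{study.nct_id}: no participants with reported race")
    return study.non_white / total * 100


def split_top_bottom(
    studies: List[Study], fraction: float = 0.2
) -> Tuple[List[Study], List[Study]]:
    """Return (high_diversity, low_diversity): the top and bottom `fraction` of studies.

    Assumption: groups are formed by ranking on the score and taking
    int(len(studies) * fraction) studies from each end (at least one).
    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(studies, key=diversity_score, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Example with made-up counts (not data from this repository).
studies = [
    Study("A", white=90, non_white=10),  # 10% non-white
    Study("B", white=50, non_white=50),  # 50% non-white
    Study("C", white=80, non_white=20),  # 20% non-white
    Study("D", white=70, non_white=30),  # 30% non-white
    Study("E", white=95, non_white=5),   # 5% non-white
]
high, low = split_top_bottom(studies)
print([s.nct_id for s in high], [s.nct_id for s in low])  # ['B'] ['E']
```

With five studies, 20% is exactly one study per group. Larger datasets give proportionally larger groups.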

## Analysis Methodology

The analysis followed these key steps:

1. **Data Collection**: Collected data on head and neck cancer clinical trials from ClinicalTrials.gov
2. **Diversity Scoring**: Computed a diversity score for each trial from its participant race data
3. **Stratification**: Selected the trials in the top 20% (high diversity) and bottom 20% (low diversity) of diversity scores
4. **Feature Extraction**: Extracted key features from each clinical trial:
   - **Study Characteristics**: Start/end dates, institutional setting, number of participants, location, etc.
   - **Eligibility Criteria**: Restrictions derived from the inclusion/exclusion criteria (see the table below)
5. **Comparative Analysis**: Compared the distribution of each feature between high-diversity and low-diversity trials
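
One way to carry out step 5 for the eligibility restrictions is to compare how often each restriction appears in the two groups. The sketch below assumes each study is a dictionary of hypothetical 0/1 restriction flags. It reports prevalence only and does not reproduce the repository's own statistical analysis.

```python
from typing import Dict, List, Sequence

StudyFlags = Dict[str, int]  # hypothetical: feature name -> 0/1 restriction flag


def prevalence(group: Sequence[StudyFlags], feature: str) -> float:
    """Fraction of studies in `group` that have the restriction `feature` (flag == 1)."""
    if not group:
        raise ValueError("cannot compute prevalence of an empty group")
    return sum(1 for study in group if study[feature] == 1) / len(group)


def compare_features(
    high: Sequence[StudyFlags], low: Sequence[StudyFlags], features: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Prevalence of each feature in the high- and low-diversity groups, plus the difference."""
    comparison = {}
    for feature in features:
        p_high = prevalence(high, feature)
        p_low = prevalence(low, feature)
        comparison[feature] = {"high": p_high, "low": p_low, "difference": p_high - p_low}
    return comparison


def contingency_table(
    high: Sequence[StudyFlags], low: Sequence[StudyFlags], feature: str
) -> List[List[int]]:
    """2x2 counts [[high with, high without], [low with, low without]]."""
    n_high = sum(study[feature] == 1 for study in high)
    n_low = sum(study[feature] == 1 for study in low)
    return [[n_high, len(high) - n_high], [n_low, len(low) - n_low]]


# Example with made-up flags (not data from this repository).
high = [
    {"age_restrict": 0, "lab_values": 1},
    {"age_restrict": 0, "lab_values": 1},
    {"age_restrict": 1, "lab_values": 0},
    {"age_restrict": 0, "lab_values": 1},
]
low = [
    {"age_restrict": 1, "lab_values": 1},
    {"age_restrict": 1, "lab_values": 1},
    {"age_restrict": 0, "lab_values": 1},
    {"age_restrict": 1, "lab_values": 1},
]
print(compare_features(high, low, ["age_restrict", "lab_values"]))
print(contingency_table(high, low, "age_restrict"))  # [[1, 3], [3, 1]]
```

If a significance test is wanted, the 2×2 table from `contingency_table` can be passed to `scipy.stats.fisher_exact` (scipy is already a listed dependency). This README does not state whether the original analysis used that test.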

### Eligibility Features Analyzed

Each trial's eligibility criteria were encoded as the following restriction features to examine their potential impact on diversity:

| Feature | Description |
|---------|-------------|
| `age_restrict` | 0 if the only age restriction is age > 18; 1 for any other age restriction (e.g., 18 < age < 75) |
| `stage_size` | Restrictions on cancer stage and tumor size |
| `cancer_site` | Restrictions on cancer site |
| `histological_type` | Whether enrollment was limited to SCC (squamous cell carcinoma) or to any other histological type |
| `performance_score` | Restrictions on performance status (e.g., ECOG performance status) |
| `comorbidities` | Restrictions on comorbidities |
| `hx_of_tt` | Restrictions on prior cancer treatment history |
| `lab_values` | Restrictions on laboratory test values |
| `pregnancy_or_contraception` | Restrictions related to pregnancy or the use of particular contraceptives |
| `misc` | Other restrictions (e.g., smoking status, ethnicity requirements) |
| `eligibility_score` | Sum of all the restriction scores above; higher values indicate more restrictive criteria |
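
Here is a minimal sketch of `eligibility_score`. It assumes every restriction feature is stored as a 0/1 flag, as `age_restrict` is. The table does not specify how the other features are scored, so any weights other than 1 would change the sum.

```python
from typing import Dict

# Restriction features from the table above (eligibility_score itself is excluded).
RESTRICTION_FEATURES = (
    "age_restrict",
    "stage_size",
    "cancer_site",
    "histological_type",
    "performance_score",
    "comorbidities",
    "hx_of_tt",
    "lab_values",
    "pregnancy_or_contraception",
    "misc",
)


def eligibility_score(study: Dict[str, int]) -> int:
    """Sum of the restriction scores for one study; higher means more restrictive."""
    missing = [name for name in RESTRICTION_FEATURES if name not in study]
    if missing:
        raise KeyError(f"missing restriction features: {missing}")
    return sum(study[name] for name in RESTRICTION_FEATURES)


# Example with made-up values: a trial with an upper age limit, a performance
# score requirement and lab-value cut-offs, and no other restrictions.
example = dict.fromkeys(RESTRICTION_FEATURES, 0)
example.update(age_restrict=1, performance_score=1, lab_values=1)
print(eligibility_score(example))  # 3
```

With a pandas DataFrame holding one column per feature, the equivalent row-wise sum is `df[list(RESTRICTION_FEATURES)].sum(axis=1)`.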

### General Features Analyzed

The analysis also included general study characteristics:

1. Study start date and end date
2. Single- vs. multi-institution study
3. Stringency of eligibility criteria (the composite `eligibility_score`)
4. Modality (Drug/Radiation/Biologic/Combination)
5. Number of participants
6. Geographic location
7. Male/female ratio
8. Trial type (Primary/Palliative/Recurrent/Metastatic)

## Repository Structure

```
├── README.md                         # Project documentation
├── src/                              # Source code directory
│   ├── data_processing.py            # Functions for data loading and preprocessing
│   ├── analysis.py                   # Functions for statistical analysis
│   ├── visualization.py              # Functions for creating visualizations
│   └── main.py                       # Main script that orchestrates the analysis
├── plots/                            # Generated visualizations
│   ├── box_plot_eligbility_score_diverse_vs_non_diverse.png
│   ├── box_plot_num_participants_top_vs_bottom.png
│   ├── distribution_age_restrict.png
│   ├── distribution_comorbidities.png
│   ├── distribution_histological_type.png
│   ├── distribution_hx_of_tt.png
│   ├── distribution_is_single_institution.png
│   ├── distribution_lab_values.png
│   ├── distribution_misc.png
│   ├── distribution_num_participants_top_vs_bottom_studies_strat_gender.png
│   ├── distribution_performance_score.png
│   ├── distribution_site.png
│   ├── distribution_stage_size.png
│   └── geo_distribution.png
├── top_20_studies.csv                # Dataset of top 20% diverse studies
├── bottom_20_studies.csv             # Dataset of bottom 20% diverse studies
├── Diversity in head and neck clinical trials - plots (2).pdf  # PDF with plot descriptions
├── Analysis.ipynb                    # Jupyter notebook with initial analysis
└── Analysis top20 vs bottom20.ipynb  # Jupyter notebook with comparative analysis
```

## Key Findings

### 1. Eligibility Criteria

The analysis of eligibility criteria revealed that more diverse studies tend to have less restrictive eligibility criteria:

![Eligibility scores of diverse vs. non-diverse studies](plots/box_plot_eligbility_score_diverse_vs_non_diverse.png)

*The above plot shows the distribution of eligibility scores for diverse vs. non-diverse studies. Higher scores indicate more restrictive eligibility criteria.*

### 2. Geographic Distribution

The geographic location of studies appears to play an important role in their diversity:

![Geographic distribution of top and bottom diverse studies](plots/geo_distribution.png)

*This map shows the locations of the top and bottom diverse studies, with color indicating the population diversity score of each location.*

### 3. Participant Demographics

Participant demographics also differed between high- and low-diversity studies:

![Male and female participants in top vs. bottom diverse studies](plots/distribution_num_participants_top_vs_bottom_studies_strat_gender.png)

*This plot shows the distribution of male and female participants in top vs. bottom diverse studies.*

### 4. Eligibility Restrictions

The prevalence of specific eligibility restrictions differed between diverse and non-diverse studies:

- **Age Restrictions**:

![Age restrictions](plots/distribution_age_restrict.png)

*This plot compares the prevalence of age restrictions beyond the standard adult age (18+) between high- and low-diversity studies.*

- **Histological Type Restrictions**:

![Histological type restrictions](plots/distribution_histological_type.png)

*This plot compares the prevalence of restrictions on cancer histological type (e.g., SCC only) between high- and low-diversity studies.*

- **Performance Score Restrictions**:

![Performance score restrictions](plots/distribution_performance_score.png)

*This plot compares the prevalence of ECOG or other performance score restrictions between high- and low-diversity studies.*

- **Comorbidity Restrictions**:

![Comorbidity restrictions](plots/distribution_comorbidities.png)

*This plot compares the prevalence of comorbidity restrictions between high- and low-diversity studies.*

- **Laboratory Value Restrictions**:

![Laboratory value restrictions](plots/distribution_lab_values.png)

*This plot compares the prevalence of laboratory value restrictions between high- and low-diversity studies.*

- **Stage/Size Restrictions**:

![Stage/size restrictions](plots/distribution_stage_size.png)

*This plot compares the prevalence of tumor stage or size restrictions between high- and low-diversity studies.*

- **Site Restrictions**:

![Cancer site restrictions](plots/distribution_site.png)

*This plot compares the prevalence of cancer site restrictions between high- and low-diversity studies.*

- **History of Treatment Restrictions**:

![Treatment history restrictions](plots/distribution_hx_of_tt.png)

*This plot compares the prevalence of prior treatment history restrictions between high- and low-diversity studies.*

- **Miscellaneous Restrictions**:

![Miscellaneous restrictions](plots/distribution_misc.png)

*This plot compares the prevalence of other restrictions (such as smoking status or ethnicity requirements) between high- and low-diversity studies.*

- **Institutional Setting**:

![Single- vs. multi-institution studies](plots/distribution_is_single_institution.png)

*This plot shows the distribution of single-institution vs. multi-institution studies among diverse and non-diverse trials.*
|
|
## Data Source

The data for this analysis was extracted from [ClinicalTrials.gov](https://clinicaltrials.gov/), focusing on head and neck cancer clinical trials conducted in the United States. Only studies that reported race information were included in the analysis.
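
The two selection rules above can be expressed as a filter. This sketch is hypothetical. The record fields (`countries`, `white`, `non_white`) are assumptions, not the ClinicalTrials.gov schema or the repository's format. "Conducted in the United States" is interpreted here as having at least one US site.

```python
from typing import Any, Dict, Iterable, List, Optional

Record = Dict[str, Any]  # hypothetical trial record


def reports_race(record: Record) -> bool:
    """True if both race counts are present and at least one participant is counted."""
    white: Optional[int] = record.get("white")
    non_white: Optional[int] = record.get("non_white")
    return white is not None and non_white is not None and white + non_white > 0


def select_studies(records: Iterable[Record]) -> List[Record]:
    """Keep trials with a US site that report participant race.

    Assumptions (hypothetical fields, not the repository's schema):
      - records were already retrieved for head and neck cancer conditions;
      - "countries" lists the countries of the trial's sites;
      - "white"/"non_white" hold participant counts, or None when race was not reported.
    """
    return [
        r for r in records
        if "United States" in r.get("countries", []) and reports_race(r)
    ]


# Example with made-up records.
records = [
    {"nct_id": "X1", "countries": ["United States"], "white": 40, "non_white": 10},
    {"nct_id": "X2", "countries": ["Canada"], "white": 30, "non_white": 5},
    {"nct_id": "X3", "countries": ["United States"], "white": None, "non_white": None},
]
print([r["nct_id"] for r in select_studies(records)])  # ['X1']
```

Requiring a non-zero race total also protects the diversity score from division by zero.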
|
|
## Conclusions

The analysis identified several factors that are associated with more diverse head and neck cancer clinical trials:

1. **Less restrictive eligibility criteria**: Studies with fewer restrictions tend to have more diverse participation.
   - Specific criteria that appear to impact diversity include age restrictions, performance score requirements, and histological type restrictions.

2. **Geographic location**: Studies in areas with more diverse populations tend to have higher diversity scores.

3. **Institutional setting**: Single-institution and multi-institution studies show different levels of success in recruiting diverse participants.

4. **Study size**: There is a relationship between the number of participants and diversity.

These findings suggest potential strategies for improving diversity in future clinical trials, such as revisiting eligibility criteria, focusing on inclusive recruitment strategies, and considering geographic factors when planning trial sites.
|
|
## Running the Analysis

### Prerequisites

- Python 3.7+
- Required packages: pandas, numpy, plotly, scipy
- Jupyter (to open the notebooks)

### Usage

```bash
# Run the main analysis script
python src/main.py
```
|
|
Or explore the Jupyter notebooks for an interactive analysis experience:

```bash
jupyter notebook "Analysis.ipynb"
jupyter notebook "Analysis top20 vs bottom20.ipynb"
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgements

This analysis was conducted as part of a research project examining diversity and inclusion in clinical trials for head and neck cancer.