Diversity in Head and Neck Cancer Clinical Trials

This repository contains code and data for analyzing diversity in head and neck cancer clinical trials, specifically focusing on the inclusion of non-white participants in these studies.

Project Overview

Head and neck cancer disproportionately affects certain racial and ethnic groups. This analysis aims to understand factors that contribute to higher diversity in clinical trials and identify patterns that could lead to more inclusive research.

Key Components

  1. Diversity Metric: Each study's diversity is measured as the percentage of non-white participants:
       Score = (# non-white participants) / (# total participants) × 100
       where # total participants = # white participants + # non-white participants

  2. Comparative Analysis: Studies are categorized into high-diversity (top 20%) and low-diversity (bottom 20%) groups based on this metric.

  3. Factor Identification: Study features are compared between the two groups to identify factors associated with more diverse clinical trials.
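A minimal Python sketch of the diversity metric and the top/bottom 20% split, assuming a hypothetical `Trial` record that holds white and non-white participant counts (the field names are illustrative, not the repository's schema). Ties at the group boundaries are broken by sort order here, which may differ from how the repository handles them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trial:
    # Hypothetical record layout: field names are illustrative, not the repository's schema.
    nct_id: str
    n_white: int
    n_non_white: int


def diversity_score(trial: Trial) -> float:
    """Percentage of non-white participants among those with reported race."""
    total = trial.n_white + trial.n_non_white  # total = white + non-white
    if total == 0:
        raise ValueError(f"{trial.nct_id}: no participants with reported race")
    return trial.n_non_white / total * 100


def split_top_bottom(trials: List[Trial], fraction: float = 0.2) -> Tuple[List[Trial], List[Trial]]:
    """Return (high-diversity, low-diversity): the top and bottom `fraction` of trials by score."""
    ranked = sorted(trials, key=diversity_score)  # ascending diversity score
    k = max(1, round(len(ranked) * fraction))     # number of trials in each group
    return ranked[-k:], ranked[:k]


trials = [
    Trial("A", 90, 10),  # score 10
    Trial("B", 50, 50),  # score 50
    Trial("C", 80, 20),  # score 20
    Trial("D", 70, 30),  # score 30
    Trial("E", 95, 5),   # score 5
]
top, bottom = split_top_bottom(trials)
print([t.nct_id for t in top], [t.nct_id for t in bottom])  # ['B'] ['E']
```

With five trials, 20% rounds to one trial per group, so the highest-scoring trial (B) forms the high-diversity group and the lowest-scoring trial (E) forms the low-diversity group.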

Analysis Methodology

The analysis followed these key steps:

  1. Data Collection: Collected head and neck cancer clinical trials from ClinicalTrials.gov (restricted to U.S. trials that reported participant race; see Data Source)
  2. Diversity Scoring: Computed a diversity score for each trial based on participant demographics
  3. Stratification: Identified the trials in the top 20% (at or above the 80th percentile) and bottom 20% (at or below the 20th percentile) of diversity scores
  4. Feature Extraction: Extracted key features from each clinical trial:
       • Study Characteristics: Start/end dates, institutional setting, number of participants, location, etc.
       • Eligibility Criteria: Detailed analysis of inclusion/exclusion criteria
  5. Comparative Analysis: Compared the distribution of features between high-diversity and low-diversity trials
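The README does not state which summaries or statistical tests were used in the comparative step. The sketch below only computes descriptive summaries for one feature at a time, assuming each study is a dict keyed by feature name; `compare_feature` and `share` are hypothetical helpers, not functions from `src/analysis.py`.

```python
from statistics import median
from typing import Dict, List, Sequence


def share(values: Sequence[float]) -> float:
    """Fraction of studies with a 0/1 flag set."""
    return sum(values) / len(values)


def compare_feature(top: List[Dict[str, float]], bottom: List[Dict[str, float]], feature: str) -> Dict[str, float]:
    """Summarize one feature in the high- and low-diversity groups.

    0/1 features are summarized as the share of studies with the flag set;
    any other numeric feature is summarized by its median.
    """
    values_top = [study[feature] for study in top]
    values_bottom = [study[feature] for study in bottom]
    is_binary = (set(values_top) | set(values_bottom)) <= {0, 1}
    summarize = share if is_binary else median
    return {"top": summarize(values_top), "bottom": summarize(values_bottom)}


top = [
    {"age_restrict": 0, "num_participants": 120},
    {"age_restrict": 0, "num_participants": 80},
    {"age_restrict": 1, "num_participants": 60},
]
bottom = [
    {"age_restrict": 1, "num_participants": 40},
    {"age_restrict": 1, "num_participants": 30},
]
print(compare_feature(top, bottom, "num_participants"))  # {'top': 80, 'bottom': 35.0}
```

A formal test, for example `scipy.stats.mannwhitneyu` on the two lists of a numeric feature, could be applied to the same inputs.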

Eligibility Features Analyzed

The study examined specific eligibility restrictions and their potential impact on diversity:

Feature                     Description
--------------------------  ----------------------------------------------------------------
age_restrict                0 if the only age restriction is age > 18; 1 for any other restriction (e.g., 18 < age < 75)
stage_size                  Restrictions on cancer stage and tumor size
cancer_site                 Restrictions on the cancer site
histological_type           Whether the study was limited to SCC (squamous cell carcinoma) or to another histological type
performance_score           Restrictions on performance status (e.g., ECOG performance status)
comorbidities               Restrictions on comorbidities
hx_of_tt                    Restrictions on prior cancer treatment history
lab_values                  Restrictions on laboratory test values
pregnancy_or_contraception  Restrictions related to pregnancy or the use of particular contraceptives
misc                        Other restrictions (e.g., smoking status, ethnicity requirements)
eligibility_score           Sum of all the restriction scores above
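The README specifies the encoding only for `age_restrict` and states that `eligibility_score` is the sum of the restriction scores. The sketch below assumes every restriction feature is stored as an integer score (0/1 in the example) and that age bounds have already been parsed from the eligibility text; `encode_age_restrict` is a hypothetical helper.

```python
from typing import Mapping, Optional

# Restriction features from the table above, in table order.
RESTRICTION_FEATURES = (
    "age_restrict", "stage_size", "cancer_site", "histological_type",
    "performance_score", "comorbidities", "hx_of_tt", "lab_values",
    "pregnancy_or_contraception", "misc",
)


def encode_age_restrict(min_age: int, max_age: Optional[int]) -> int:
    """0 if the only age criterion is the adult cutoff at 18, otherwise 1.

    Hypothetical input: integer bounds already parsed from the eligibility text,
    with max_age=None when no upper limit is stated.
    """
    return 0 if (min_age == 18 and max_age is None) else 1


def eligibility_score(study: Mapping[str, int]) -> int:
    """Sum of the restriction scores; raises KeyError if a feature is missing."""
    return sum(study[name] for name in RESTRICTION_FEATURES)


study = dict.fromkeys(RESTRICTION_FEATURES, 0)
study["age_restrict"] = encode_age_restrict(18, 75)  # e.g. 18 < age < 75 -> 1
study["lab_values"] = 1
print(eligibility_score(study))  # 2
```

Raising on a missing feature, rather than defaulting it to 0, keeps an incompletely coded study from silently looking less restrictive than it is.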

General Features Analyzed

The analysis also included general study characteristics:

  1. Study start date and end date
  2. Single vs. multi-institutional study
  3. Stringency in eligibility criteria (composite score)
  4. Modality (Drug/Radiation/Biologic/Combination)
  5. Number of participants
  6. Geographic location
  7. Male/female ratio
  8. Trial type (Primary/Palliative/Recurrent/Metastatic)
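Two of the general features reduce to simple computations. This sketch assumes participant counts by sex and a list of institution names per study are available; the function names and the name-matching rule for single vs. multi-institutional studies are hypothetical.

```python
from typing import Sequence


def male_female_ratio(n_male: int, n_female: int) -> float:
    """Ratio of male to female participants; infinity if there are no female participants."""
    return float("inf") if n_female == 0 else n_male / n_female


def is_single_institution(institutions: Sequence[str]) -> int:
    """1 if every listed site belongs to the same institution, else 0.

    Hypothetical rule: names are compared after trimming whitespace and
    lower-casing; an empty list yields 0.
    """
    distinct = {name.strip().lower() for name in institutions}
    return int(len(distinct) == 1)


print(male_female_ratio(30, 10))                              # 3.0
print(is_single_institution(["Hospital A", "hospital a "]))   # 1
print(is_single_institution(["Hospital A", "Hospital B"]))    # 0
```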

Repository Structure

├── README.md                        # Project documentation
├── src/                             # Source code directory
│   ├── data_processing.py           # Functions for data loading and preprocessing
│   ├── analysis.py                  # Functions for statistical analysis
│   ├── visualization.py             # Functions for creating visualizations
│   └── main.py                      # Main script that orchestrates the analysis
├── plots/                           # Generated visualizations
│   ├── box_plot_eligbility_score_diverse_vs_non_diverse.png
│   ├── box_plot_num_participants_top_vs_bottom.png
│   ├── distribution_age_restrict.png
│   ├── distribution_comorbidities.png
│   ├── distribution_histological_type.png
│   ├── distribution_hx_of_tt.png
│   ├── distribution_is_single_institution.png
│   ├── distribution_lab_values.png
│   ├── distribution_misc.png
│   ├── distribution_num_participants_top_vs_bottom_studies_strat_gender.png
│   ├── distribution_performance_score.png
│   ├── distribution_site.png
│   ├── distribution_stage_size.png
│   └── geo_distribution.png
├── top_20_studies.csv               # Dataset of top 20% diverse studies
├── bottom_20_studies.csv            # Dataset of bottom 20% diverse studies
├── Diversity in head and neck clinical trials - plots (2).pdf # PDF with plot descriptions
├── Analysis.ipynb                   # Jupyter notebook with initial analysis
└── Analysis top20 vs bottom20.ipynb # Jupyter notebook with comparative analysis

Data Source

The data for this analysis was extracted from ClinicalTrials.gov, focusing on head and neck cancer clinical trials conducted in the United States. Only studies that reported race information were included in the analysis.
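The inclusion rules above, and the white/non-white split behind the diversity score, could be applied to parsed study records roughly as follows. The record layout (`countries`, `race_counts`), the race category labels, and the choice to drop unreported race from the total are assumptions for illustration, not the repository's code.

```python
from typing import Dict, List, Tuple

# Hypothetical category labels; actual labels depend on how race was reported.
WHITE = "White"
NOT_REPORTED = {"Unknown or Not Reported"}


def select_studies(studies: List[Dict]) -> List[Dict]:
    """Keep studies with a U.S. site that report participant race.

    Assumption: any study listing the United States among its countries counts;
    a stricter U.S.-only rule would be an equally plausible reading.
    """
    return [
        s for s in studies
        if "United States" in s.get("countries", [])
        and s.get("race_counts")  # non-empty mapping: race category -> count
    ]


def white_non_white(race_counts: Dict[str, int]) -> Tuple[int, int]:
    """Split race counts into (white, non-white), ignoring unreported race."""
    white = race_counts.get(WHITE, 0)
    non_white = sum(n for race, n in race_counts.items()
                    if race != WHITE and race not in NOT_REPORTED)
    return white, non_white


studies = [
    {"id": "S1", "countries": ["United States"],
     "race_counts": {"White": 40, "Black or African American": 8,
                     "Asian": 2, "Unknown or Not Reported": 5}},
    {"id": "S2", "countries": ["Canada"], "race_counts": {"White": 30}},
    {"id": "S3", "countries": ["United States"], "race_counts": {}},
]
kept = select_studies(studies)
print([s["id"] for s in kept], white_non_white(kept[0]["race_counts"]))  # ['S1'] (40, 10)
```

Excluding unreported race keeps the denominator equal to white plus non-white participants, matching the total used in the diversity metric.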

Conclusions

The analysis identified several factors that are associated with more diverse head and neck cancer clinical trials:

  1. Less restrictive eligibility criteria: Studies with fewer restrictions tend to have more diverse participation.
     Specific criteria that appear to impact diversity include age restrictions, performance score requirements, and histological type restrictions.

  2. Geographic location: Studies in areas with more diverse populations have higher diversity scores.

  3. Institutional setting: Different types of institutions show varying levels of success in recruiting diverse participants.

  4. Study size: There is a relationship between the number of participants and diversity.

These findings suggest potential strategies for improving diversity in future clinical trials, such as revisiting eligibility criteria, focusing on inclusive recruitment strategies, and considering geographic factors when planning trial sites.

Running the Analysis

Prerequisites

  • Python 3.7+
  • Required packages: pandas, numpy, plotly, scipy

Usage

# Run the main analysis script
python src/main.py

Or explore the Jupyter notebooks for an interactive analysis experience:

jupyter notebook "Analysis.ipynb"
jupyter notebook "Analysis top20 vs bottom20.ipynb"

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

This analysis was conducted as part of a research project examining diversity and inclusion in clinical trials for head and neck cancer.