This project implements an algorithm that matches patient data with clinical trials based on eligibility criteria. The solution uses patient data and clinical trial information, processes them using Natural Language Processing (NLP), and generates a list of eligible trials for each patient.
The algorithm performs the following tasks:
1. Loads patient data.
2. Scrapes clinical trial data from clinicaltrials.gov.
3. Matches patients to clinical trials based on inclusion/exclusion criteria.
4. Outputs eligible trials for each patient in a structured format (JSON and Excel).
git clone https://github.com/your-username/clinical-trial-matching.git
cd clinical-trial-matching
It’s a good idea to isolate the project dependencies in a virtual environment.
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
The dependencies are listed in requirements.txt. You can install them by running:
pip install -r requirements.txt
Make sure that you have a version of scispacy and the associated NLP model downloaded:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz
Make sure you have the following directories:
- Patient Data: Place the patient JSON files in the patient_data/
directory.
- Clinical Trial Data: Place the clinical trial JSON files in the clinical_trials/
directory.
Execute the main script to process the data and generate the output:
python matching_algorithm.py
The script will generate two types of output:
- JSON File: Contains patient-trial matching results.
- Google Sheets (optional): The matching results can also be uploaded to a Google Sheet for easier accessibility.
Example JSON output:
{
"patientId": "P123",
"eligibleTrials": [
{
"trialId": "T001",
"trialName": "Study of XYZ",
"eligibilityCriteriaMet": ["age between 18-65", "diagnosis matches condition X"]
}
]
}
File Structure:
|-- patient_data/ # Folder containing patient JSON files
|-- clinical_trials/ # Folder containing clinical trial JSON files
|-- matching_algorithm.py # Main script to run the algorithm
|-- README.md # Project documentation
|-- requirements.txt # List of dependencies
|-- output/ # Folder for generated output files
To ensure that all functions perform as expected, unit tests can be written using unittest
or pytest
.
Integration testing is necessary to verify the full workflow—loading patient data, processing trials, and generating output.
pytest test_matching_algorithm.py
This project is licensed under the MIT License. See the LICENSE file for details.