# Clinical Trial Matching Algorithm

This project implements an algorithm that matches patients to clinical trials based on eligibility criteria. It processes patient data and clinical trial information with Natural Language Processing (NLP) and generates a list of eligible trials for each patient.

## Table of Contents
- [Project Overview](#project-overview)
- [Features](#features)
- [Installation](#installation)
- [Data Sources](#data-sources)
- [Usage](#usage)
- [File Structure](#file-structure)
- [Testing](#testing)
- [License](#license)

## Project Overview
The algorithm performs the following tasks:
1. Loads patient data.
2. Scrapes clinical trial data from [clinicaltrials.gov](https://clinicaltrials.gov).
3. Matches patients to clinical trials based on inclusion/exclusion criteria.
4. Outputs eligible trials for each patient in a structured format (JSON, with optional upload to Google Sheets).

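Step 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the `Trial` class, the `is_eligible` helper, and all field names (`age`, `gender`, `conditions`, `min_age`, `max_age`, `required_conditions`, `excluded_conditions`) are assumptions made for the example. Real eligibility criteria are free text and would first need to be parsed into a structured form like this.

```python
from dataclasses import dataclass, field


@dataclass
class Trial:
    """Hypothetical, already-structured view of a trial's criteria."""
    trial_id: str
    min_age: int = 0
    max_age: int = 120
    gender: str = "ALL"  # "ALL", "MALE", or "FEMALE" (assumed encoding)
    required_conditions: set = field(default_factory=set)
    excluded_conditions: set = field(default_factory=set)


def is_eligible(patient: dict, trial: Trial) -> bool:
    """Return True if the patient passes every inclusion and exclusion check."""
    conditions = set(patient["conditions"])
    if not trial.min_age <= patient["age"] <= trial.max_age:
        return False
    if trial.gender != "ALL" and patient["gender"].upper() != trial.gender:
        return False
    # Inclusion: the patient must have every required condition.
    if not trial.required_conditions <= conditions:
        return False
    # Exclusion: any overlap with the excluded conditions disqualifies.
    return not (trial.excluded_conditions & conditions)


patient = {"age": 54, "gender": "female", "conditions": ["type 2 diabetes"]}
trial = Trial(
    "T001",
    min_age=18,
    max_age=65,
    required_conditions={"type 2 diabetes"},
    excluded_conditions={"chronic kidney disease"},
)
print(is_eligible(patient, trial))
```

Keeping inclusion and exclusion checks as separate set operations makes it easy to report which criteria a patient met or failed.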
## Features
- **Patient Data Matching**: Automatically matches patients to clinical trials based on age, diagnosis, gender, and other factors.
- **NLP Processing**: Uses spaCy models for processing biomedical data and extracting key information from clinical trials.
- **Concurrent Processing**: Uses parallel processing to handle multiple patients and trials efficiently.
- **Comprehensive Output**: Generates output in both JSON and Google Sheets formats for easy review.

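As a rough illustration of the concurrent processing feature, the sketch below fans patients out over a worker pool from the standard library. The README does not say which mechanism the project uses, so the choice of `ThreadPoolExecutor`, the `match_patient` and `match_all` helpers, and the `minAge`/`maxAge` fields are all assumptions. For CPU-bound NLP work, a `ProcessPoolExecutor` may be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def match_patient(patient: dict, trials: list) -> dict:
    """Hypothetical per-patient worker: keep trials whose age range fits."""
    eligible = [
        t["trialId"]
        for t in trials
        if t["minAge"] <= patient["age"] <= t["maxAge"]
    ]
    return {"patientId": patient["patientId"], "eligibleTrials": eligible}


def match_all(patients: list, trials: list, max_workers: int = 4) -> list:
    """Run match_patient for every patient concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: match_patient(p, trials), patients))


patients = [{"patientId": "P123", "age": 40}, {"patientId": "P456", "age": 70}]
trials = [{"trialId": "T001", "minAge": 18, "maxAge": 65}]
results = match_all(patients, trials)
print(results)
```

`Executor.map` returns results in the same order as the input, so each result lines up with its patient without extra bookkeeping.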
## Installation

### 1. Clone the repository
```bash
git clone https://github.com/your-username/clinical-trial-matching.git
cd clinical-trial-matching
```

### 2. Create a Python virtual environment
It’s a good idea to isolate the project dependencies in a virtual environment.
```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

### 3. Install the necessary dependencies
The dependencies are listed in `requirements.txt`. Install them by running:
```bash
pip install -r requirements.txt
```
Make sure scispacy is installed along with its associated NLP model. The following command installs the `en_ner_bc5cdr_md` model (release v0.5.4):
```bash
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.4/en_ner_bc5cdr_md-0.5.4.tar.gz
```

## Data Sources

- **Patient Data**: Downloaded from [Synthea](https://synthea.mitre.org/downloads). This project assumes the patient data is already formatted in JSON.
- **Clinical Trials**: Scraped from [clinicaltrials.gov](https://clinicaltrials.gov), focusing on actively recruiting trials.

## Usage

### 1. Prepare Input Data
Make sure the following directories exist and contain the input files:
- **Patient Data**: Place the patient JSON files in the `patient_data/` directory.
- **Clinical Trial Data**: Place the clinical trial JSON files in the `clinical_trials/` directory.

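Reading the two input directories could look something like the sketch below. The `load_json_dir` helper is hypothetical and only assumes that each file in `patient_data/` and `clinical_trials/` holds one JSON document; the actual script may load its inputs differently.

```python
import json
from pathlib import Path


def load_json_dir(directory: str) -> list:
    """Load every *.json file in a directory, sorted by file name.

    Returns an empty list if the directory does not exist.
    """
    root = Path(directory)
    if not root.is_dir():
        return []
    records = []
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records


patients = load_json_dir("patient_data")
trials = load_json_dir("clinical_trials")
print(f"Loaded {len(patients)} patients and {len(trials)} trials")
```

Sorting by file name keeps the load order stable between runs, which makes the output easier to compare.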
### 2. Run the matching algorithm
Execute the main script to process the data and generate the output:
```bash
python matching_algorithm.py
```

### 3. Output
The script generates two types of output:
- **JSON File**: Contains patient-trial matching results.
- **Google Sheets (optional)**: The matching results can also be uploaded to a Google Sheet for easier access.

Example JSON output:
```json
{
  "patientId": "P123",
  "eligibleTrials": [
    {
      "trialId": "T001",
      "trialName": "Study of XYZ",
      "eligibilityCriteriaMet": ["age between 18-65", "diagnosis matches condition X"]
    }
  ]
}
```

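A result with the shape of the example JSON above could be assembled and written along these lines. This is a sketch under assumptions: the `build_result` and `write_result` helpers, the `criteriaMet` input key, and the one-file-per-patient naming (`<patientId>.json`) are illustrative and not taken from the project.

```python
import json
import tempfile
from pathlib import Path


def build_result(patient_id: str, matches: list) -> dict:
    """Shape one patient's matches like the example JSON output."""
    return {
        "patientId": patient_id,
        "eligibleTrials": [
            {
                "trialId": m["trialId"],
                "trialName": m["trialName"],
                "eligibilityCriteriaMet": list(m["criteriaMet"]),
            }
            for m in matches
        ],
    }


def write_result(result: dict, output_dir: str = "output") -> Path:
    """Write one result to <output_dir>/<patientId>.json and return the path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{result['patientId']}.json"
    with out_path.open("w", encoding="utf-8") as handle:
        json.dump(result, handle, indent=2)
    return out_path


matches = [{
    "trialId": "T001",
    "trialName": "Study of XYZ",
    "criteriaMet": ["age between 18-65", "diagnosis matches condition X"],
}]
result = build_result("P123", matches)

# Written to a temporary directory here so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = write_result(result, tmp)
    print(saved.read_text(encoding="utf-8"))
```

In a real run, `write_result(result)` would default to the `output/` folder shown in the file structure.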
## File Structure
```text
|-- patient_data/                    # Folder containing patient JSON files
|-- clinical_trials/                 # Folder containing clinical trial JSON files
|-- matching_algorithm.py            # Main script to run the algorithm
|-- README.md                        # Project documentation
|-- requirements.txt                 # List of dependencies
|-- output/                          # Folder for generated output files
```

## Testing

### 1. Unit Testing
To check that individual functions behave as expected, write unit tests using `unittest` or `pytest`.

### 2. Integration Testing
Integration tests verify the full workflow: loading patient data, processing trials, and generating output.

### Example of Running Unit Tests
```bash
pytest test_matching_algorithm.py
```

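A unit test file for `pytest` might start out like the sketch below. The `age_in_range` helper is hypothetical and defined inline so the example is self-contained. In the real project, the function under test would be imported from `matching_algorithm` instead.

```python
# test_matching_algorithm.py (sketch)
# The helper below is a hypothetical stand-in for a function that would
# normally be imported from matching_algorithm.


def age_in_range(age: int, min_age: int, max_age: int) -> bool:
    """Hypothetical helper: inclusive age-range check."""
    return min_age <= age <= max_age


def test_age_inside_range():
    assert age_in_range(40, 18, 65)


def test_age_on_boundaries_is_inclusive():
    assert age_in_range(18, 18, 65)
    assert age_in_range(65, 18, 65)


def test_age_outside_range():
    assert not age_in_range(17, 18, 65)
    assert not age_in_range(66, 18, 65)
```

`pytest` collects any function whose name starts with `test_`, so no extra registration is needed.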
## License
This project is licensed under the MIT License. See the LICENSE file for details.