# Medical Trial Classification System

[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)
[![Flask Version](https://img.shields.io/badge/flask-3.0.2-green.svg)](https://flask.palletsprojects.com/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

## Overview

The Medical Trial Classification System is a machine learning system that classifies medical trial descriptions into five disease categories. It is only partially implemented so far; the goal is to reduce the manual effort of categorizing medical trials.

### Current Implementation Status

✅ Core preprocessing pipeline  
✅ Basic model implementation  
✅ Initial API setup  
✅ Basic testing framework  
❌ Complete unit test coverage  
❌ Advanced preprocessing features  
❌ Model optimization  
❌ Full system integration testing  

### Disease Categories
- Amyotrophic Lateral Sclerosis (ALS)
- Obsessive Compulsive Disorder (OCD)
- Parkinson's Disease
- Dementia
- Scoliosis

## Project Structure

```
root/
├── data/                 # Data storage and processing
├── docs/                 # Project documentation
├── logs/                 # Application logs
├── notebooks/            # Analysis notebooks
├── scripts/              # Utility scripts
├── src/                  # Source code
└── tests/                # Test files
```

### Key Components

- `src/preprocessing/`: Text preprocessing pipeline
- `src/models/`: Model implementation and training
- `src/data/`: Data processing and pipeline
- `src/utils/`: Utility functions and logging
- `tests/`: Test implementations

## Installation

1. Clone the repository:
```bash
git clone https://github.com/fesarikaya/MedicalTrialClassification
cd MedicalTrialClassification
```

2. Run the environment setup:
```bash
python environment_setup.py
```

### Requirements

- Python 3.8+ (the pinned pandas 2.2.0 and scikit-learn 1.4.0 releases require Python 3.9 or newer)
- 8GB+ RAM recommended
- Disk space for model storage
- Internet connection for package installation

### Key Dependencies

- Flask==3.0.2
- pandas==2.2.0
- scikit-learn==1.4.0
- nltk==3.8.1
- spacy==3.7.2
- pytest==8.0.0

Full dependencies are listed in `requirements.txt`.

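To confirm that an environment matches the pins listed under Key Dependencies, a small check like the following may help. It is a sketch only: the helper names `installed_version` and `find_mismatches` are hypothetical and not part of this repository, and the pins are copied from the list above rather than read from `requirements.txt`.

```python
from importlib import metadata

# Pins copied from the Key Dependencies list (not parsed from requirements.txt).
PINNED = {
    "Flask": "3.0.2",
    "pandas": "2.2.0",
    "scikit-learn": "1.4.0",
    "nltk": "3.8.1",
    "spacy": "3.7.2",
    "pytest": "8.0.0",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) pair; `found` is None when not installed."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` returns an empty dictionary when every pinned package is installed at the expected version.
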
## Current Performance

### Model Performance
- Best performer: Bagging Classifier
  - Accuracy: 50.0%
  - F1 Score: 0.490

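For readers unfamiliar with the two metrics reported above, the sketch below shows how accuracy and an F1 score can be computed for a multi-class problem. The README does not state which F1 averaging was used, so macro averaging here is an assumption. In practice scikit-learn's `accuracy_score` and `f1_score` would normally be used; the plain Python version only illustrates the arithmetic, and the labels in the usage note are toy data, not project results.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

With the toy labels `y_true = ["ALS", "ALS", "OCD", "OCD", "Dementia", "Dementia"]` and `y_pred = ["ALS", "OCD", "OCD", "OCD", "Dementia", "ALS"]`, four of six predictions are correct, and the per-class F1 scores are 0.5 (ALS), 0.8 (OCD), and 2/3 (Dementia).
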
### Known Issues

1. Preprocessing Pipeline
   - Performance issues in the current implementation
   - Medical term standardization needs improvement
   - Special character handling requires optimization

2. Model Performance
   - Accuracy is below target because of the preprocessing issues
   - Feature engineering needs enhancement
   - Model tuning is incomplete

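As an illustration of the special character handling mentioned under the preprocessing issues, here is one possible normalization step using only the standard library. It is not the project's actual pipeline: the function name `normalize_text` and the choice of which characters to keep (apostrophes, hyphens, periods) are assumptions made for this sketch.

```python
import re
import unicodedata

# Assumption: keep word characters, whitespace, apostrophes ("parkinson's"),
# hyphens ("double-blind"), and periods (decimals); replace everything else.
_DISALLOWED = re.compile(r"[^\w\s'.\-]")
_WHITESPACE = re.compile(r"\s+")


def normalize_text(text):
    """Lowercase `text`, unify Unicode variants, and drop stray symbols."""
    # NFKC folds compatibility forms, e.g. non-breaking spaces and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # NFKC leaves the curly apostrophe untouched, so map it explicitly.
    text = text.replace("\u2019", "'")
    text = text.lower()
    text = _DISALLOWED.sub(" ", text)
    # Collapse runs of whitespace left behind by the substitutions.
    return _WHITESPACE.sub(" ", text).strip()
```

Replacing disallowed characters with a space rather than deleting them avoids gluing neighboring tokens together, at the cost of needing the final whitespace collapse.
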
## Usage

### API Endpoints

1. Prediction Endpoint:
```http
POST /predict
Content-Type: application/json

{
    "description": "Medical trial description text"
}
```

2. Health Check:
```http
GET /health
```

### Testing

Basic tests are implemented in the `tests/` directory:
- `API_test.py`: API endpoint testing
- `model_evaluation_test.py`: Basic model evaluation
- `prediction_test_results.json`: Latest test results

## Future Work

1. Preprocessing Enhancements
   - Optimize medical term handling
   - Improve text normalization
   - Enhance special character processing

2. Model Optimization
   - Implement advanced feature engineering
   - Optimize model parameters
   - Enhance ensemble methods

3. Testing Completion
   - Implement comprehensive unit tests
   - Add integration tests
   - Complete performance testing

## Important Notes

- The system is only partially implemented
- Use with caution and verify all predictions
- Current accuracy is limited (50.0% for the best model)
- Future updates will address the known issues

## Development Status

The project is currently incomplete due to deadline constraints. Key pending items include:
- Complete unit test coverage
- Advanced preprocessing features
- Model optimization
- Full system integration testing

## Warning

⚠️ This system is only partially implemented, and known preprocessing issues degrade model performance. Use it as an assistance tool only and verify all predictions manually.