The Medical Trial Classification System is an automated machine learning solution that classifies medical trial descriptions into five disease categories. The system is only partially implemented; its goal is to reduce the manual effort required to categorize medical trials.
✅ Core preprocessing pipeline
✅ Basic model implementation
✅ Initial API setup
✅ Basic testing framework
❌ Complete unit test coverage
❌ Advanced preprocessing features
❌ Model optimization
❌ Full system integration testing
root/
├── data/ # Data storage and processing
├── docs/ # Project documentation
├── logs/ # Application logs
├── notebooks/ # Analysis notebooks
├── scripts/ # Utility scripts
├── src/ # Source code
└── tests/ # Test files
- src/preprocessing/: Text preprocessing pipeline
- src/models/: Model implementation and training
- src/data/: Data processing and pipeline
- src/utils/: Utility functions and logging
- tests/: Test implementations
git clone https://github.com/fesarikaya/MedicalTrialClassification
cd MedicalTrialClassification
python environment_setup.py
Full dependencies are listed in requirements.txt.
Special character handling requires optimization
Model Performance
POST /predict
Content-Type: application/json
{
"description": "Medical trial description text"
}
GET /health
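The two endpoints above can be exercised from Python using only the standard library. This is a minimal sketch, not the project's own client: the base URL `http://localhost:8000` and the helper names `build_predict_request`, `build_health_request`, and `send` are assumptions made for illustration, and the response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

# Assumption: the API is served locally on port 8000; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_predict_request(description, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(base_url=BASE_URL):
    """Build (but do not send) a GET /health request."""
    return urllib.request.Request(url=f"{base_url}/health", method="GET")


def send(request, timeout=10.0):
    """Send a prepared request and return the raw response body as text.

    The response schema is not specified in this README, so no parsing is done.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `send(build_health_request())` checks availability and `send(build_predict_request("Medical trial description text"))` submits a description for classification. Building the request separately from sending it makes the payload easy to inspect without a live server.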
Basic tests are implemented in the tests/ directory:
- API_test.py: API endpoint testing
- model_evaluation_test.py: Basic model evaluation
- Latest test results available in prediction_test_results.json
Enhance special character processing
Model Optimization
Enhance ensemble methods
Testing Completion
The project is currently incomplete due to deadline constraints. Key pending items include:
- Complete unit test coverage
- Advanced preprocessing features
- Model optimization
- Full system integration testing
⚠️ This system is currently in partial implementation status with known preprocessing issues affecting model performance. Use as an assistance tool only and verify all predictions manually.