Medical Trial Classification System

Overview

The Medical Trial Classification System is an automated machine learning solution that classifies medical trial descriptions into five disease categories. The system is only partially implemented; its goal is to reduce the manual effort required to categorize medical trials.

Current Implementation Status

✅ Core preprocessing pipeline
✅ Basic model implementation
✅ Initial API setup
✅ Basic testing framework
❌ Complete unit test coverage
❌ Advanced preprocessing features
❌ Model optimization
❌ Full system integration testing

Disease Categories

  • Amyotrophic Lateral Sclerosis (ALS)
  • Obsessive Compulsive Disorder (OCD)
  • Parkinson's Disease
  • Dementia
  • Scoliosis

Project Structure

root/
├── data/                  # Data storage and processing
├── docs/                  # Project documentation
├── logs/                  # Application logs
├── notebooks/             # Analysis notebooks
├── scripts/               # Utility scripts
├── src/                   # Source code
└── tests/                 # Test files

Key Components

  • src/preprocessing/: Text preprocessing pipeline
  • src/models/: Model implementation and training
  • src/data/: Data processing and pipeline
  • src/utils/: Utility functions and logging
  • tests/: Test implementations

Installation

  1. Clone the repository:
     git clone https://github.com/fesarikaya/MedicalTrialClassification
     cd MedicalTrialClassification
  2. Run the environment setup:
     python environment_setup.py

Requirements

  • Python 3.8+
  • 8GB+ RAM recommended
  • Disk space for model storage
  • Internet connection for package installation

Key Dependencies

  • Flask==3.0.2
  • pandas==2.2.0
  • scikit-learn==1.4.0
  • nltk==3.8.1
  • spacy==3.7.2
  • pytest==8.0.0

Full dependencies are listed in requirements.txt.

Current Performance

Model Performance

  • Best performer: Bagging Classifier
  • Accuracy: 50.0%
  • F1 Score: 0.490
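
The figures above do not say how the F1 score is averaged across the five classes. As a minimal sketch, assuming macro averaging (an assumption, not something this README confirms), the two metrics can be computed as follows. The labels are toy values invented for illustration and are not project data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); define it as 0 when undefined.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (not results from this project).
y_true = ["ALS", "OCD", "Dementia", "Scoliosis", "ALS", "OCD"]
y_pred = ["ALS", "ALS", "Dementia", "Scoliosis", "OCD", "OCD"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

With scikit-learn installed, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score(..., average="macro")` should give the same values for this input.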

Known Issues

  1. Preprocessing Pipeline
       • Performance issues in current implementation
       • Medical term standardization needs improvement
       • Special character handling requires optimization

  2. Model Performance
       • Lower than target accuracy due to preprocessing issues
       • Feature engineering needs enhancement
       • Model tuning incomplete
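
To make the two preprocessing issues concrete, here is a minimal sketch of special character handling followed by term standardization. It is not the code in src/preprocessing/; the function name `normalize_description` and the `TERM_MAP` table are hypothetical, and a real vocabulary would need to be curated.

```python
import re
import unicodedata

# Hypothetical abbreviation table, limited to categories named in this README.
TERM_MAP = {
    "als": "amyotrophic lateral sclerosis",
    "ocd": "obsessive compulsive disorder",
}


def normalize_description(text: str) -> str:
    """Lowercase, strip special characters, and expand known abbreviations."""
    # Fold compatibility characters (e.g. full-width forms) and lowercase.
    text = unicodedata.normalize("NFKC", text).lower()
    # Map the typographic apostrophe to a plain one so "parkinson's" survives.
    text = text.replace("\u2019", "'")
    # Replace everything except ASCII letters, digits, apostrophes, hyphens,
    # and whitespace with a space. Note: this also drops accented letters.
    text = re.sub(r"[^a-z0-9'\-\s]", " ", text)
    # Expand abbreviations token by token and collapse repeated whitespace.
    tokens = [TERM_MAP.get(token, token) for token in text.split()]
    return " ".join(tokens)


print(normalize_description("Phase II trial: ALS patients (n=40), OCD co-morbidity!"))
```

Doing the character cleanup before the lookup means abbreviations are matched even when they are wrapped in punctuation, such as "(ALS)".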

Usage

API Endpoints

  1. Prediction Endpoint:
     POST /predict
     Content-Type: application/json
     {
         "description": "Medical trial description text"
     }
  2. Health Check:
     GET /health
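
As a rough sketch, the prediction endpoint could be called from Python using only the standard library. The base URL assumes the app runs locally on Flask's default development port (5000), and the helper names `build_predict_request` and `predict` are hypothetical. The response schema is not documented here, so the sketch only assumes the body is JSON.

```python
import json
import urllib.request

# Assumption: the Flask app is served locally on Flask's default dev port.
BASE_URL = "http://localhost:5000"


def build_predict_request(description: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request with a JSON body."""
    payload = json.dumps({"description": description}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(description: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and decode the response, assumed to be JSON."""
    request = build_predict_request(description, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `predict("A randomized trial of riluzole in ALS patients")` would return whatever JSON the API sends back. Per the warning in this README, any returned category should be verified manually.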

Testing

Basic tests are implemented in the tests/ directory:

  • API_test.py: API endpoint testing
  • model_evaluation_test.py: Basic model evaluation

The latest test results are available in prediction_test_results.json.

Future Work

  1. Preprocessing Enhancements
       • Optimize medical term handling
       • Improve text normalization
       • Enhance special character processing

  2. Model Optimization
       • Implement advanced feature engineering
       • Optimize model parameters
       • Enhance ensemble methods

  3. Testing Completion
       • Implement comprehensive unit tests
       • Add integration tests
       • Complete performance testing

Important Notes

  • System is currently in partial implementation status
  • Use with caution and verify all predictions
  • Current accuracy is limited
  • Future updates will address known issues

Development Status

The project is currently incomplete due to deadline constraints. Key pending items include:

  • Complete unit test coverage
  • Advanced preprocessing features
  • Model optimization
  • Full system integration testing

Warning

⚠️ This system is currently in partial implementation status with known preprocessing issues affecting model performance. Use as an assistance tool only and verify all predictions manually.