Note: This project is not fully open source. Only the portion of the code developed by the owner of this GitHub account is public. To view the original GitLab repo (http://pacegitlab.dhe.duke.edu/dihi/2019_rfa/adult_decompensation.git), please apply for access to the Duke PACE machine at https://pace.ori.duke.edu/.

Adult Decompensation (In progress!)

This project aims to build initial machine learning models that predict adult inpatient decompensation (ICU admission, mortality, RRT events, etc.) in real time. Preliminary work before model building includes data cleaning, data visualization, data quality assurance, and data manipulation. The ultimate goal is to reduce patient deterioration and standardize hospital response protocols.

Architecture

The directory tree, with the function of each folder or file, is summarized below:

Code
    DataPrep
        cohort    //code for cohort generation
        features    //code for pulling and cleaning data elements
        outcome    //code for querying and labelling outcomes
        pull_data    //pulls useful data from the raw db file
        adt_transfer.py    //creates the transfer table and outputs a csv file
        adt_transfer.sql    //transfer table sql query
    db    //code for creating the project database and importing data into it
    Model
        v1.0    //version 1.0 (24-hour prediction window)
            design_matrix
            News    //Python package implementing NEWS (National Early Warning Score)
            visualization    //model visualization
            model_utils.py    //model utils Python package
            run_ann.ipynb
            run_logistic_regression.py
            run_news.py
            run_random_forest.ipynb
            run_xgboost.py
    ockham    //unit conversion package
    utils    //utils Python package (db utils, dataframe utils, etc.)
Data
    db    //project database file(s)
    metadata
    Modeling
        v1.0
            design_matrix    //design matrix file(s)
            Output    //model output data
    Processed
        cohort
        features
        outcome
        adult_decomp_adt_transfer.csv
    Raw    //project raw data subset from datapipeline
Docs    //project documentation and materials
    Project
        code map_v1.xlsx    //outlines the code and associated data files for the "start-to-finish" process of data curation
        code map_v2.xlsx
        code map_supplement.xlsx    //outlines supporting code and data files for feature engineering, modeling, etc.
        code map_supplement_v2.xlsx
        literature_review.pdf
        Perspectives Piece.docx
    Slides    //presentation slides for project milestones
Output    //project output files, figures, etc.
    Figures    //data visualization figures
        cohort    //visualization figures for cohort statistics
        features    //visualization figures for features quality assurance
        Model    //visualization figures for model performance
    gap_analysis    //gap analysis output
.gitignore
README.md
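
The News package under Code/Model/v1.0 implements the National Early Warning Score as a baseline. Its source is not shown here, so the following is only a minimal sketch of how such a score could be computed. The function names and argument names are hypothetical, and the thresholds follow the original NEWS chart (2012) as commonly published; they should be checked against the official chart and against whichever NEWS version the project actually uses.

```python
def _band(value, bands, default):
    """Return the score of the first band whose upper bound is >= value."""
    for upper, score in bands:
        if value <= upper:
            return score
    return default


def news_score(resp_rate, spo2, on_oxygen, temp_c, sbp, heart_rate, avpu):
    """Aggregate NEWS from seven bedside parameters (hypothetical signature).

    avpu is one of "A", "V", "P", "U"; anything other than "A" scores 3.
    """
    score = 0
    # Respiration rate (breaths/min)
    score += _band(resp_rate, [(8, 3), (11, 1), (20, 0), (24, 2)], 3)
    # Oxygen saturation (%)
    score += _band(spo2, [(91, 3), (93, 2), (95, 1)], 0)
    # Supplemental oxygen
    score += 2 if on_oxygen else 0
    # Temperature (degrees Celsius)
    score += _band(temp_c, [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1)], 2)
    # Systolic blood pressure (mmHg)
    score += _band(sbp, [(90, 3), (100, 2), (110, 1), (219, 0)], 3)
    # Heart rate (beats/min)
    score += _band(heart_rate, [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2)], 3)
    # Level of consciousness
    score += 0 if avpu == "A" else 3
    return score


if __name__ == "__main__":
    print(news_score(16, 98, False, 37.0, 120, 70, "A"))
    print(news_score(26, 90, True, 39.5, 85, 135, "V"))
```

A rule-based score like this needs no training data, which is why it is a convenient reference point for the learned models (logistic regression, random forest, XGBoost, ANN) listed in the same folder.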

Getting Started

Instructions on setting up the project locally.

Prerequisites

List of dependencies required for the project:
  • Python 3.7.3
  • SQLite3 2.6.0
  • Git 2.14.1
  • GNU Awk 4.1.4
Additional Python packages required:
  • NumPy 1.16.2
  • pandas 0.24.2
  • TensorFlow 1.14.0
  • Keras 2.2.4
  • scikit-learn 0.21.2
  • XGBoost 0.90
  • imbalanced-learn 0.5.0
  • Matplotlib 3.1.0
  • seaborn 0.9.0
  • Plotly 4.1.1
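
To confirm that a local environment matches the package versions listed above, a small check script can help. This is a sketch only: the distribution names in `PINNED` are the usual PyPI names for the packages above (an assumption, since the README lists display names), and `check_versions` is a hypothetical helper. `importlib.metadata` ships with Python 3.8+; on Python 3.7 the `importlib_metadata` backport would be needed, which the fallback import covers.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: requires the importlib_metadata backport
    import importlib_metadata as metadata

# Versions copied from the prerequisites list; names are assumed PyPI names.
PINNED = {
    "numpy": "1.16.2",
    "pandas": "0.24.2",
    "tensorflow": "1.14.0",
    "Keras": "2.2.4",
    "scikit-learn": "0.21.2",
    "xgboost": "0.90",
    "imbalanced-learn": "0.5.0",
    "matplotlib": "3.1.0",
    "seaborn": "0.9.0",
    "plotly": "4.1.1",
}


def check_versions(pinned):
    """Map each package name to 'ok', 'missing', or a mismatch message."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if installed == wanted:
            report[name] = "ok"
        else:
            report[name] = "mismatch (installed {})".format(installed)
    return report


if __name__ == "__main__":
    for name, status in check_versions(PINNED).items():
        print("{}: {}".format(name, status))
```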

Setup

  1. Get access to Duke EHR (Electronic Health Record) data from the DIHI project folder (dihi_qi) via the PACE machine https://pace.ori.duke.edu/
  2. Clone the repo
git clone http://pacegitlab.dhe.duke.edu/dihi/2019_rfa/adult_decompensation.git
  3. Follow the code maps under ./Docs/Project to run the project from start to end

Data

All the source data comes from the following locations:

  • P:/dihi_qi/data_pipeline/data
  • P:/dihi_qi/data_pipeline/metadata

Visualizations

Data visualizations for the project include:

  • ./Code/DataPrep/cohort/visualization    //cohort statistics visualization
  • ./Code/DataPrep/features/vitals/visualization    //vitals visualization (data element count, data quality assurance, etc)
  • ./Code/DataPrep/outcome/QA    //ICU admission quality assurance
  • ./Code/DataPrep/outcome/patient_flow    //Sankey diagram visualization
  • ./Code/Model/visualization    //Model visualization (model metrics, etc)
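
The patient_flow code that draws the Sankey diagram is not shown here. As a rough illustration of the idea, the sketch below turns per-encounter sequences of hospital units into the node and link lists that Plotly's Sankey trace expects. The function names, the example unit names, and the choice to index nodes by step (so that a patient returning to the same unit does not create a cycle) are all assumptions, not the project's actual implementation.

```python
from collections import Counter


def build_sankey_links(paths):
    """Count unit-to-unit transitions, keeping the step index in each node.

    paths: list of unit sequences, one per encounter (hypothetical input).
    Returns (labels, source, target, value) in the form Plotly's Sankey uses.
    """
    counts = Counter()
    for path in paths:
        for step, (src, dst) in enumerate(zip(path, path[1:])):
            counts[((step, src), (step + 1, dst))] += 1
    nodes = sorted({node for pair in counts for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    labels = ["{} (step {})".format(unit, step) for step, unit in nodes]
    source = [index[src] for src, _dst in counts]
    target = [index[dst] for _src, dst in counts]
    value = list(counts.values())
    return labels, source, target, value


def plot_sankey(labels, source, target, value):
    """Build a Plotly figure; plotly is imported lazily so the rest runs without it."""
    import plotly.graph_objects as go

    return go.Figure(
        go.Sankey(
            node=dict(label=labels),
            link=dict(source=source, target=target, value=value),
        )
    )


if __name__ == "__main__":
    example_paths = [["ED", "Ward", "ICU"], ["ED", "Ward"], ["ED", "ICU"]]
    print(build_sankey_links(example_paths))
```

Calling `plot_sankey(*build_sankey_links(paths)).show()` would render the diagram in an environment where Plotly is installed.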

Status

Project is in progress.

To-do list:

  • Hospital unit labels (LU_hospital_units table) in P:/dihi_qi/data_pipeline/db/data_pipeline.db need to be updated from duh_dep_info_v06 (in next iteration)
  • Encounters that touch the pediatric ICU or neonatal ICU need to be excluded (in next iteration)
  • Re-pull blood culture data, as the grouper has been updated (in next iteration)
  • Re-pull medication data and pull order data
  • Data quality assurance for vitals needs to be refined (break down vitals by each of the three hospitals and by each distinct flo measurement name)
  • Complete unit conversion for vital, analyte, medication data, etc. is pending
  • Current random down sampling needs to be replaced by stratified down sampling
  • Data collection and prediction time windows are subject to change (1-hour prediction window in next iteration)
  • More outcomes (RRT events, mortality, etc.) are to be incorporated
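
The stratified down sampling item in the to-do list could look roughly like the sketch below. It keeps every positive example and draws the same fraction of negatives from each stratum, so the down-sampled negatives preserve the stratum mix. The README does not say which variable to stratify on, so the field names `label` and `hospital`, the `ratio` parameter, and the function itself are hypothetical.

```python
import random
from collections import defaultdict


def stratified_downsample(rows, label_key, stratum_key, ratio=1.0, seed=0):
    """Keep all positives and sample negatives proportionally within each stratum.

    rows: list of dicts; rows[i][label_key] is 1 (positive) or 0 (negative).
    ratio: desired number of negatives per positive after sampling.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    if not negatives:
        return positives

    # Overall fraction of negatives to keep, capped at keeping all of them.
    target = min(len(negatives), int(round(ratio * len(positives))))
    fraction = target / len(negatives)

    by_stratum = defaultdict(list)
    for r in negatives:
        by_stratum[r[stratum_key]].append(r)

    sampled = []
    for stratum in sorted(by_stratum):  # sorted for reproducibility
        group = by_stratum[stratum]
        k = int(round(fraction * len(group)))
        sampled.extend(rng.sample(group, k))
    return positives + sampled


if __name__ == "__main__":
    data = (
        [{"label": 1, "hospital": "A"} for _ in range(10)]
        + [{"label": 0, "hospital": "A"} for _ in range(60)]
        + [{"label": 0, "hospital": "B"} for _ in range(40)]
    )
    print(len(stratified_downsample(data, "label", "hospital")))
```

Because each stratum is rounded independently, the total number of sampled negatives can differ from the target by a few rows when strata are small.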

Docs

License

Copyright 2019 Ziyuan Shen, Duke Institute for Health Innovation (DIHI), Duke University School of Medicine, Durham NC.

All Rights Reserved.

Contact

Ziyuan Shen - ziyuan.shen@duke.edu

Mengxuan Cui - mengxuan.cui@duke.edu

Acknowledgement

This work is funded by the Woo Center for Big Data and Precision Health, in collaboration with DIHI (Duke Institute for Health Innovation). The authors thank Professor Xiling Shen for consistently supporting the project and the DIHI team for guidance and assistance with project specifics (Will Ratliff and Mark Sendak for hospital data resources and modeling support, Michael Gao and Marshall Nichols for technical support).