# Codebase - Predicting Efficacy of Cardiac Resynchronization Therapy Using NLP and ML
## - _Charlotta Lindvall, Josh Haimson, Alex Forsynth, Michael Traub, Austin Freel_

## Description:

The project code falls into three groups:

1. Data management, extraction, and validation
2. Transformers for the data pipeline
3. Methods for queuing, building, and executing tests

Below, the files are grouped by these categories. Any file not mentioned is miscellaneous, superfluous, or unimportant.

(1) DATA MANAGEMENT, EXTRACTION AND VALIDATION

    anonymizer.py -- Anonymizes MRNs, SSNs, and names in free text during initial data transformation
    extract_data.py -- Miscellaneous data extraction functions
    free_text_jsonifyer.py -- Extracts metadata from free text files
    language_processing.py -- Used to clean extracted values
    loader.py -- Loads patient data from disk
    tables.py -- Generates table statistics for the paper
    validate.py -- Generates statistics to validate results
    generateTurkTasks.py -- Creates a CSV file for localturk to make manual extraction efficient
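
As an illustration of the kind of scrubbing anonymizer.py is described as doing, here is a minimal sketch, not the project's actual code. The identifier formats (SSNs as `123-45-6789`, MRNs as the letters "MRN" followed by 6 to 10 digits), the placeholder tokens, and the `anonymize` function name are all assumptions made for this example.

```python
import re

# Assumed formats (not taken from the project): SSNs as 123-45-6789 and
# MRNs as "MRN" followed by optional punctuation and 6-10 digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
MRN_PATTERN = re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE)


def anonymize(text, names=()):
    """Replace SSNs, MRNs, and the given names with placeholder tokens."""
    text = SSN_PATTERN.sub("[SSN]", text)
    text = MRN_PATTERN.sub("[MRN]", text)
    for name in names:
        # Match each known name as a whole word, ignoring case.
        pattern = r"\b" + re.escape(name) + r"\b"
        text = re.sub(pattern, "[NAME]", text, flags=re.IGNORECASE)
    return text


note = "Pt John Smith, MRN: 12345678, SSN 123-45-6789, seen for CRT follow-up."
print(anonymize(note, names=["John Smith"]))
```

A real pipeline would likely need a broader set of identifier formats and a name list drawn from the patient records; this sketch only shows the substitution pattern.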

(2) TRANSFORMERS

    baseline_transformer.py -- Contains all structured data transformers
    doc2vec_trainer.py -- Creates doc2vec models to be used by the doc2vec transformer
    doc2vec_transformer.py -- Transforms arbitrary-length text into a fixed-dimensional semantic representation
    icd_transformer.py -- Transforms ICD9 codes into a hierarchical numeric representation
    value_extractor_transformer.py -- Contains the regex extractors
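
To make "hierarchical numeric representation" more concrete, here is one possible reading, sketched under stated assumptions rather than copied from icd_transformer.py. It splits a numeric ICD9 code into its three-digit category and up to two decimal digits, one number per level. The function name `icd9_to_hierarchy` and the use of -1 for a missing level are hypothetical, and V and E codes are deliberately left out.

```python
def icd9_to_hierarchy(code):
    """Split a numeric ICD9 code into [category, first decimal, second decimal].

    Missing levels are encoded as -1 (a hypothetical convention).
    V and E codes are not handled in this sketch.
    """
    category, _, detail = code.strip().partition(".")
    if not category.isdigit():
        raise ValueError(f"unsupported ICD9 code: {code!r}")
    levels = [int(category)]
    for i in range(2):
        # Use each decimal digit as its own level; pad with -1 when absent.
        levels.append(int(detail[i]) if i < len(detail) else -1)
    return levels


print(icd9_to_hierarchy("428.22"))  # [428, 2, 2]
print(icd9_to_hierarchy("427.3"))   # [427, 3, -1]
print(icd9_to_hierarchy("401"))     # [401, -1, -1]
```

With this encoding, codes that share a category agree on the first number, so a model can pick up on coarse groupings even when the detailed codes differ.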

(3) QUEUING, BUILDING, AND EXECUTING TESTS

    decision_model.py -- Contains the hard-coded clinical guideline
    experiment_runner.py -- Daemon process to continually run tests
    model_builder.py -- Builds ML/NLP models to test
    model_tester.py -- Used by the experiment runner to run a test on a given model
    queue_test.py -- Helper function to queue up tests
    run_test.py -- Standalone code to run an individual model and test
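
The queue-then-run workflow described above could look roughly like the sketch below. This README does not say how tests are actually queued, so the file-based queue (one JSON file per test), the function names `queue_test` and `run_pending`, and the config keys are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


def queue_test(queue_dir, name, config):
    """Write a test description to the queue directory as a JSON file."""
    path = Path(queue_dir) / f"{name}.json"
    path.write_text(json.dumps(config))
    return path


def run_pending(queue_dir, run_fn):
    """Run every queued test once, in name order, and remove it from the queue."""
    results = {}
    for path in sorted(Path(queue_dir).glob("*.json")):
        config = json.loads(path.read_text())
        results[path.stem] = run_fn(config)
        path.unlink()
    return results


with tempfile.TemporaryDirectory() as queue_dir:
    queue_test(queue_dir, "baseline", {"model": "logistic_regression"})
    queue_test(queue_dir, "doc2vec", {"model": "random_forest"})
    # Stand-in for building and testing a model: just echo the model name.
    results = run_pending(queue_dir, lambda config: config["model"])
    print(results)
```

A daemon in the style of experiment_runner.py would call something like `run_pending` in a loop with a short sleep between passes, while a standalone script would call the run function directly on a single config.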