--- a
+++ b/README.md
@@ -0,0 +1,40 @@
+# Codebase - Predicting Efficacy of Cardiac Resynchronization Therapy Using NLP and ML
+## - _Charlotta Lindvall, Josh Haimson, Alex Forsynth, Michael Traub, Austin Freel_
+
+## Description:
+
+The project code can be categorized into 3 groups:
+1. Data management, extraction, and validation
+2. Transformers for data pipeline
+3. Methods for queuing, building, and executing tests
+
+The files are grouped by these categories below. Any file not
+listed is miscellaneous, superfluous, or unimportant.
+
+(1) DATA MANAGEMENT, EXTRACTION AND VALIDATION
+    
+    anonymizer.py -- Anonymizes MRNs, SSNs, and names in free text during initial data transformation
+    extract_data.py -- Miscellaneous data extraction functions
+    free_text_jsonifyer.py -- Extracts metadata from free text files
+    language_processing.py -- Cleans extracted values
+    loader.py -- Loads patient data from disk
+    tables.py -- Generates table statistics for the paper
+    validate.py -- Generates statistics to validate results
+    generateTurkTasks.py -- Creates a CSV file for localturk to make manual extraction efficient
+
+(2) TRANSFORMERS
+
+    baseline_transformer.py -- Contains all structured data transformers
+    doc2vec_trainer.py -- Creates doc2vec models to be used by the doc2vec transformer
+    doc2vec_transformer.py -- Transforms arbitrary-length text into a fixed-dimensional semantic representation
+    icd_transformer.py -- Transforms an ICD-9 code into a hierarchical numeric representation
+    value_extractor_transformer.py -- Contains the regex extractors
+
+(3) QUEUING, BUILDING, AND EXECUTING TESTS
+
+    decision_model.py -- Contains the hard-coded clinical guideline
+    experiment_runner.py -- Daemon process to continually run tests
+    model_builder.py -- Builds ML/NLP models to test
+    model_tester.py -- Used by the experiment runner to run a test on a given model
+    queue_test.py -- Helper function to queue up tests
+    run_test.py -- Standalone code to run an individual model and test