# DeepHeart

DeepHeart is a neural network designed for the
[2016 PhysioNet Challenge](http://physionet.org/physiobank/database/challenge/2016/),
which asks entrants to predict cardiac abnormalities from phonocardiogram
(PCG) data. The challenge provides heart recordings from several patients,
each labeled as normal or abnormal. Predicting patient health from PCG data
is difficult because the recordings contain noise from several sources:
talking, breathing, intestinal sounds, etc.

To combat the excessive noise and the relatively small sample size,
a convolutional neural network is trained using Google's
[TensorFlow](http://github.com/tensorflow/tensorflow). TensorFlow provides
an easy-to-use interface for compiling and efficiently running neural networks.

Ideally, the raw WAV files would be fed into a very deep TensorFlow
network and, with some careful regularization, the model would learn
to accurately separate signal from noise. To reduce the cost of
training, the number of hidden units is instead reduced in favor of
some old-school feature engineering: the fast Fourier transform (FFT).
The FFT is a signal processing technique for converting a signal into
the frequency domain. The original signal is also filtered with a
low-pass Butterworth filter aimed at removing noise above 4 Hz (240 beats
per minute), and the filtered signal is likewise transformed into its
approximate frequency-domain representation. A combination of the Fourier
coefficients from both transforms is fed into the convolutional neural network.

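The feature engineering described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function names, the filter order, and the number of retained coefficients are hypothetical, and it assumes the Butterworth filter is meant to attenuate content above the 4 Hz cutoff (a low-pass configuration). It relies on NumPy and SciPy.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def fft_magnitudes(signal, n_coeffs):
    """Magnitudes of the first n_coeffs real-FFT coefficients of a 1-D signal."""
    return np.abs(np.fft.rfft(signal))[:n_coeffs]


def lowpass(signal, sample_rate, cutoff_hz=4.0, order=4):
    """Zero-phase Butterworth low-pass filter (the order is an assumed value)."""
    nyquist = 0.5 * sample_rate
    sos = butter(order, cutoff_hz / nyquist, btype="low", output="sos")
    return sosfiltfilt(sos, signal)


def build_feature_vector(signal, sample_rate, n_coeffs=128):
    """Concatenate FFT magnitudes of the raw and the low-pass-filtered signal."""
    signal = np.asarray(signal, dtype=np.float64)
    raw = fft_magnitudes(signal, n_coeffs)
    filtered = fft_magnitudes(lowpass(signal, sample_rate), n_coeffs)
    return np.concatenate([raw, filtered])
```

A recording could be loaded with `sample_rate, signal = scipy.io.wavfile.read(path)` and then passed to `build_feature_vector(signal, sample_rate)`. How the two sets of coefficients are actually combined before reaching the network may differ from the simple concatenation shown here.
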
# Installing

To run, set up a virtual environment (ensure python2.7, virtualenv, and
pip are in your PATH):

```
>> cd deepheart
>> virtualenv env
>> source env/bin/activate
>> pip install -r requirements.txt
```

Download the PhysioNet dataset:

```
>> wget http://physionet.org/physiobank/database/challenge/2016/training.zip
>> unzip training.zip
```

Install TensorFlow by following the instructions on [TensorFlow's site](https://www.tensorflow.org/versions/r0.9/get_started/os_setup.html#pip-installation)
(pip install recommended).

Build a feature vector from the raw data and train the CNN:
```
>> python deepheart/train_model.py <path_to_physionet_data> <do load previously saved data>
e.g.,
>> python deepheart/train_model.py training/ f
```

Note: by default this saves TensorBoard statistics to /tmp, which can
be viewed by launching TensorBoard with
```
>> tensorboard --logdir=/tmp/train
```

# Performance
Predictions on the PhysioNet data are currently scored using the mean of
sensitivity and specificity (the true positive rate and the true negative
rate). These summaries are calculated and logged in TensorBoard as well as
printed to the terminal.

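As a rough illustration of this score, the sketch below computes the mean of sensitivity and specificity for binary predictions. The function name and the label encoding (1 for abnormal, 0 for normal) are assumptions made for the example and may not match the project's own code.

```python
def mean_sensitivity_specificity(labels, predictions):
    """Mean of sensitivity and specificity (assumes 1 = abnormal, 0 = normal)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)  # abnormal, flagged
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)  # abnormal, missed
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)  # normal, cleared
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)  # normal, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present to compute the score")
    sensitivity = tp / float(tp + fn)
    specificity = tn / float(tn + fp)
    return (sensitivity + specificity) / 2.0
```

Because the two rates are averaged with equal weight, a model that labels every recording as normal scores 0.5 regardless of how imbalanced the classes are.
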
Currently, the TensorFlow CNN model converges to a mean score of
0.78.

# Disclaimer
This software is not intended for diagnostic purposes. It is designed only
for the PhysioNet data science competition. These statements have not been evaluated by the FDA.
This product is not intended to diagnose, treat, cure, or prevent any disease.