Diff of /README.md [000000] .. [4d064f]

Switch to unified view

a b/README.md
1
# ECG Classification 
2
3
The code contains the implementation of a method for the automatic classification of electrocardiograms (ECG) based on the combination of multiple Support Vector Machines (SVMs). The method relies on the time intervals between consequent beats
4
and their morphology for the ECG characterisation.  Different descriptors based on wavelets, local binary patterns
5
(LBP), higher order statistics (HOS) and several amplitude values were employed. 
6
7
For a detailed explanation refer to the paper: [http://www.sciencedirect.com/science/article/pii/S1746809418301976](http://www.sciencedirect.com/science/article/pii/S1746809418301976)
8
9
If you use this code for your publications, please cite it as:
10
11
    @article{MONDEJARGUERRA201941,
12
    author = {Mond{\'{e}}jar-Guerra, V and Novo, J and Rouco, J and Penedo, M G and Ortega, M},
13
    doi = {https://doi.org/10.1016/j.bspc.2018.08.007},
14
    issn = {1746-8094},
15
    journal = {Biomedical Signal Processing and Control},
16
    pages = {41--48},
17
    title = {{Heartbeat classification fusing temporal and morphological information of ECGs via ensemble of classifiers}},
18
    volume = {47},
19
    year = {2019}
20
    }
21
22
23
24
## Requirements
25
26
Python implementation is the most updated version of the repository. Matlab implementation is independent. Both implementations are tested under Ubuntu 16.04. 
27
28
### [Python](python)
29
30
- [Numpy](https://docs.scipy.org/doc/numpy-1.13.0/user/install.html)
31
- [Scikit learn](http://scikit-learn.org/stable/install.html)
32
- [Matplotlib](https://matplotlib.org/) (Optional)
33
        
34
### [Matlab](matlab)
35
Performed using Matlab 2016b 64 bits 
36
37
- [LibSVM](https://www.csie.ntu.edu.tw/~cjlin/libsvm/#download)
38
39
*Implementation for [TensorFlow](tensorflow) is in early stage and will not be maintained by the author.*
40
41
42
## Steps (How to run)
43
 
44
1. Download the dataset:
45
    - a) Download via Kaggle:
46
47
        The raw signals files (.csv) and annotations files can be downloaded from [kaggle.com/mondejar/mitbih-database](https://www.kaggle.com/mondejar/mitbih-database)
48
49
    - b) Download via WFDB:
50
51
        https://www.physionet.org/faq.shtml#downloading-databases
52
53
        Using the comand **rsync** you can check the datasets availability:
54
55
        ```
56
        rsync physionet.org::
57
        ```
58
        The terminal will show all the available datasets:
59
        ```
60
        physionet       PhysioNet web site, volume 1 (about 23 GB)
61
        physionet-small PhysioNet web site, excluding databases (about 5 GB)
62
        ...
63
        ...
64
        umwdb           Unconstrained and Metronomic Walking Database (1 MB)
65
        vfdb            MIT-BIH Malignant Ventricular Ectopy Database (33 MB)
66
        ```
67
68
        Then select the desired dataset as:
69
        ```
70
        rsync -Cavz physionet.org::mitdb /home/mondejar/dataset/ECG/mitdb
71
        ```
72
73
        ```
74
        rsync -Cavz physionet.org::incartdb /home/mondejar/dataset/ECG/incartdb
75
        ```
76
77
        Finally to convert the data as plain text files use [convert_wfdb_data_2_csv.py](https://github.com/mondejar/WFDB_utils_and_others/blob/master/convert_wfdb_data_2_csv.py). One file with the raw data and one file for annotations ground truth. 
78
79
        Also check the repo [WFDB_utils_and_others](https://github.com/mondejar/WFDB_utils_and_others) for more info about WFDB database conversion and the original site from [Physionet_tools](https://www.physionet.org/physiotools/wag/wag.htm).
80
81
2. Run:
82
83
    Run the file *run_train_SVM.py* and adapt the desired configuration to call *train_SVM.py* file. This call method will train the SVM model using the training set and evaluates the model on a different test set. 
84
85
    Check and adjust the path dirs on *train_SVM.py* file.
86
87
4. Combining multiples classifiers:
88
    
89
    Run the file *basic_fusion.py* to combine the decisions of previously trained SVM models. 
90
91
92
## Methodology
93
94
The data is splited following the **inter-patient** scheme proposed by [Chazal *et al*](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1306572)., i.e the training and eval set not contain any patient in common.
95
96
This code classifies the signal at beat-level following the class labeling of the [AAMI recomendation](###aami-recomendation-for-mit). 
97
98
### 1 Preprocess:
99
First, the baseline of the signal is substracted. Additionally, some noise removal can be done.
100
101
Two median filters are applied for this purpose, of 200-ms and 600-ms.
102
Note that this values depend on the frecuency sampling of the signal.
103
104
105
```python
106
    from scipy.signal import medfilt
107
    ...
108
    
109
    # median_filter1D
110
    baseline = medfilt(MLII, 71) 
111
    baseline = medfilt(baseline, 215) 
112
```
113
            
114
The signal resulting from the second filter operation contains the baseline wanderings and can be subtracted from the original signal.
115
```python        
116
    # Remove Baseline
117
    for i in range(0, len(MLII)):
118
        MLII[i] = MLII[i] - baseline[i]
119
```
120
121
### 2 Segmentation: Beat Detection
122
In this work the annotations of the MIT-BIH arrhyhtmia was used in order to detect the R-peak positions. However, in practise they can be detected using the following software (see [Software references: Beat Detection](#software-references:-beat-detection)). 
123
124
### 3 Feature Descriptor
125
In order to describe the beats for classification purpose, we employ the following  features:
126
127
1. **Morphological**: for this features a window of [-90, 90] was centred along the R-peak:
128
129
    0. **RAW-Signal** (180): is the most simplier descriptor. Just employ the amplitude values from the signal delimited by the window.
130
    
131
    1. **Wavelets** (23): The wavelet transforms have the capability to allow information extraction from both frequency and time domains, which make them suitable for ECG description. The signal is decomposed using *wave_decomposition* function using family *db1*  and 3 levels. 
132
133
    ```python
134
        import pywt
135
        ...
136
137
        db1 = pywt.Wavelet('db1')
138
        coeffs = pywt.wavedec(beat, db1, level=3)
139
        wavel = coeffs[0]
140
    ```
141
142
    2. **HOS** (10): extracted from 3-4th order cumulant, skewness and kurtosis. 
143
    ```python
144
        import scipy.stats
145
        ...
146
        
147
        n_intervals = 6
148
        lag = int(round( (winL + winR )/ n_intervals))
149
        ...
150
        # For each beat 
151
        for i in range(0, n_intervals-1):
152
            pose = (lag * (i+1))
153
            interval = beat[(pose -(lag/2) ):(pose + (lag/2))]
154
            # Skewness  
155
            hos_b[i] = scipy.stats.skew(interval, 0, True)
156
157
            # Kurtosis
158
            hos_b[5+i] = scipy.stats.kurtosis(interval, 0, False, True)
159
    ```
160
    3. **U-LBP 1D (59)** 1D version of the popular LBP descriptor. Using the uniform patterns with neighbours = 8
161
    ```python
162
        import numpy as np
163
        ...
164
165
        hist_u_lbp = np.zeros(59, dtype=float)
166
167
        for i in range(neigh/2, len(signal) - neigh/2):
168
            pattern = np.zeros(neigh)
169
            ind = 0
170
            for n in range(-neigh/2,0) + range(1,neigh/2+1):
171
                if signal[i] > signal[i+n]:
172
                    pattern[ind] = 1          
173
                ind += 1
174
            # Convert pattern to id-int 0-255 (for neigh =8)
175
            pattern_id = int("".join(str(c) for c in pattern.astype(int)), 2)
176
177
            # Convert id to uniform LBP id 0-57 (uniform LBP)  58: (non uniform LBP)
178
            if pattern_id in uniform_pattern_list:
179
                pattern_uniform_id = int(np.argwhere(uniform_pattern_list == pattern_id))
180
            else:
181
                pattern_uniform_id = 58 # Non uniforms patternsuse
182
183
            hist_u_lbp[pattern_uniform_id] += 1.0
184
    ```
185
186
    4. **My Descriptor (4)**: computed from the Euclidean distance of the  R-peak  and  four  points  extracted  from  the 4 following intervals:
187
        - max([0, 40])
188
        - min([75, 85])
189
        - min([95, 105])
190
        - max([150, 180])
191
192
    ```python
193
        import operator
194
        ...
195
196
        R_pos = int((winL + winR) / 2)
197
198
        R_value = beat[R_pos]
199
        my_morph = np.zeros((4))
200
        y_values = np.zeros(4)
201
        x_values = np.zeros(4)
202
        # Obtain (max/min) values and index from the intervals
203
        [x_values[0], y_values[0]] = max(enumerate(beat[0:40]), key=operator.itemgetter(1))
204
        [x_values[1], y_values[1]] = min(enumerate(beat[75:85]), key=operator.itemgetter(1))
205
        [x_values[2], y_values[2]] = min(enumerate(beat[95:105]), key=operator.itemgetter(1))
206
        [x_values[3], y_values[3]] = max(enumerate(beat[150:180]), key=operator.itemgetter(1))
207
        
208
        x_values[1] = x_values[1] + 75
209
        x_values[2] = x_values[2] + 95
210
        x_values[3] = x_values[3] + 150
211
        
212
        # Norm data before compute distance
213
        x_max = max(x_values)
214
        y_max = max(np.append(y_values, R_value))
215
        x_min = min(x_values)
216
        y_min = min(np.append(y_values, R_value))
217
        
218
        R_pos = (R_pos - x_min) / (x_max - x_min)
219
        R_value = (R_value - y_min) / (y_max - y_min)
220
                    
221
        for n in range(0,4):
222
            x_values[n] = (x_values[n] - x_min) / (x_max - x_min)
223
            y_values[n] = (y_values[n] - y_min) / (y_max - y_min)
224
            x_diff = (R_pos - x_values[n]) 
225
            y_diff = R_value - y_values[n]
226
            my_morph[n] =  np.linalg.norm([x_diff, y_diff])
227
    ```
228
229
2. **Interval RR** (4): intervals computed from the time between consequent beats. There are the most common feature employed for ECG classification. 
230
    1. pre_RR
231
    2. post_RR
232
    3. local_RR
233
    4. global_RR
234
235
3. **Normalized RR** (4): RR interval normalized by the division with the AVG value from each patient.
236
    1. pre_RR / AVG(pre_RR)
237
    2. post_RR / AVG(post_RR)
238
    3. local_RR / AVG(local_Python (Scikit-learn)  
239
    4. global_RR / AVG(global_RR)   
240
241
 **NOTE**: 
242
 *Beats having a R–R interval smaller than 150 ms or higher than 2 s most probably involve segmentation errors and are discarded*. "Weighted Conditional Random Fields for Supervised Interpatient Heartbeat Classification"* 
243
244
### 4 Normalization of the features
245
246
Before train the models. All the input data was standardized with [z-score](https://en.wikipedia.org/wiki/Standard_score), i.e., the values
247
of each dimension are divided by its standard desviation and substracted by its mean.
248
```python
249
    import sklearn
250
    from sklearn.externals import joblib
251
    from sklearn.preprocessing import StandardScaler
252
    from sklearn import svm
253
    ...
254
255
    scaler = StandardScaler()
256
    scaler.fit(tr_features)
257
    tr_features_scaled = scaler.transform(tr_features)
258
259
    # scaled: zero mean unit variance ( z-score )
260
    eval_features_scaled = scaler.transform(eval_features)
261
```
262
263
### 5 Training and Test
264
265
In scikit-learn the multiclass SVM support is handled according to a [one-vs-one](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) scheme.
266
267
Since   the   MIT-BIH   database presents  high  imbalanced  data,  several  weights
268
equal  to the  ratio  between  the  two  classes  of  each model  were
269
employed to compensate this differences.
270
271
The  Radial  Basis  Function  (RBF) kernel was employed.
272
273
```python
274
    class_weights = {}
275
    for c in range(4):
276
        class_weights.update({c:len(tr_labels) / float(np.count_nonzero(tr_labels == c))})
277
278
279
    svm_model = svm.SVC(C=C_value, kernel='rbf', degree=3, gamma='auto', 
280
                    coef0=0.0, shrinking=True, probability=use_probability, tol=0.001, 
281
                    cache_size=200, class_weight=class_weights, verbose=False, 
282
                    max_iter=-1, decision_function_shape=multi_mode, random_state=None)
283
284
    svm_model.fit(tr_features_scaled, tr_labels) 
285
```
286
287
288
For evaluating the model, the jk index [Mar et. al](https:doi.org/10.1109/TBME.2011.2113395)) were employed as performance measure
289
290
```python
291
292
    decision_ovo        = svm_model.decision_function(eval_features_scaled)
293
    predict_ovo, counter    = ovo_voting_exp(decision_ovo, 4)
294
295
    perf_measures = compute_AAMI_performance_measures(predict_ovo, labels)
296
```
297
298
### 6 Combining Ensemble of SVM
299
300
Several basic combination rules can be employed to combine the decision from different SVM model configurations in a single prediction (see basic_fusion.py)
301
302
303
### 7 Comparison with state-of-the-art on MITBIH database:
304
305
306
| Classifier          | Acc. | Sens. | jk index |
307
|---------------------|------|-------|----------|
308
| Our Ensemble of SVMs| **0.945**| 0.703 | **0.773** |
309
| [Zhang et al.](https://doi.org/10.1016/j.compbiomed.2013.11.019)        | 0.883| **0.868** | 0.663 |
310
| Out Single SVM      | 0.884| 0.696 | 0.640 |
311
| [Mar et al.](https://doi.org/10.1109/TBME.2011.2113395)| 0.899| 0.802 | 0.649 |
312
| [Chazal et al.](https://doi.org/10.1109/TBME.2004.827359)       | 0.862| 0.832 | 0.612 |
313
314
# About datasets:
315
316
https://physionet.org/cgi-bin/atm/ATM
317
318
# MIT-Arrythmia Database
319
320
<b>360HZ</b>
321
322
48 Samples of 30 minutes, 2 leads 
323
47 Patients:
324
325
* 100 series: 23 samples
326
* 200 series: 25 samples. **Contains uncommon but clinically important arrhythmias**
327
328
| Symbol|   Meaning                                   |
329
|-------|---------------------------------------------|
330
|· or N |   Normal beat                                 |
331
|L      |   Left bundle branch block beat             |
332
|R      |   Right bundle branch block beat              |
333
|A      |   Atrial premature beat                       |
334
|a      |   Aberrated atrial premature beat             |
335
|J      |   Nodal (junctional) premature beat           |
336
|S      |   Supraventricular premature beat             |
337
|V      |   Premature ventricular contraction           |
338
|F      |   Fusion of ventricular and normal beat       |
339
|[      |   Start of ventricular flutter/fibrillation   |
340
|!      |   Ventricular flutter wave                    |
341
|]      |   End of ventricular flutter/fibrillation     |
342
|e      |   Atrial escape beat                          |
343
|j      |   Nodal (junctional) escape beat              |
344
|E      |   Ventricular escape beat                     |
345
|/      |   Paced beat                                  |
346
|f      |   Fusion of paced and normal beat             |
347
|x      |   Non-conducted P-wave (blocked APB)          |
348
|Q      |   Unclassifiable beat                         |
349
||      |   Isolated QRS-like artifact                  |
350
351
[beats and rhythms](https://physionet.org/physiobank/database/html/mitdbdir/tables.htm#allrhythms)
352
353
||Rhythm annotations appear below the level used for beat annotations|
354
|-|-------------------------------------------------------------------|
355
|(AB |      Atrial bigeminy|
356
|(AFIB |    Atrial fibrillation|
357
|(AFL|  Atrial flutter|
358
|(B|        Ventricular bigeminy|
359
|(BII|  2° heart block|
360
|(IVR|  Idioventricular rhythm|
361
|(N|        Normal sinus rhythm|
362
|(NOD|  Nodal (A-V junctional) rhythm|
363
|(P|        Paced rhythm|
364
|(PREX| Pre-excitation (WPW)|
365
|(SBR|  Sinus bradycardia|
366
|(SVTA| Supraventricular tachyarrhythmia|
367
|(T|        Ventricular trigeminy|
368
|(VFL|  Ventricular flutter|
369
|(VT|       Ventricular tachycardia
370
371
### AAMI recomendation for MIT 
372
There are 15 recommended classes for arrhythmia that are classified into 5 superclasses: 
373
374
| SuperClass| | | | | | | 
375
|------|--------|---|---|---|---|-|
376
| N  (Normal)  | N      | L | R |  |  | |
377
| SVEB (Supraventricular ectopic beat) | A      | a | J | S |  e | j |
378
| VEB  (Ventricular ectopic beat)| V      | E |   |   |   | |
379
| F    (Fusion beat) | F      |   |   |   |   | |
380
| Q   (Unknown beat)  | P      | / | f | u |   |    |
381
382
### Inter-patient train/test split ([Chazal *et al*](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1306572)):
383
DS_1 Train: 101, 106, 108, 109, 112, 114, 115, 116, 118, 119, 122, 124, 201, 203, 205, 207, 208, 209, 215, 220, 223, 230
384
385
| Class|N|SVEB|VEB|F|Q|
386
|------|-|-|-|-|-|
387
| instances| 45842|944|3788|414|0|    
388
 
389
DS_2 Test: = 100, 103, 105, 111, 113, 117, 121, 123, 200, 202, 210, 212, 213, 214, 219, 221, 222, 228, 231, 232, 233, 234
390
391
| Class|N|SVEB|VEB|F|Q|
392
|------|-|-|-|-|-|
393
| instances| 44743|1837|3447|388|8|    
394
 
395
# INCART Database 
396
https://www.physionet.org/pn3/incartdb/
397
398
<b>257HZ</b>
399
400
75 records of 30 minutes, 12 leads [-4000, 4000]
401
402
Gains varying from 250 to 1100 analog-to-digital converter units per millivolt.
403
Gains for each record are specified in its .hea file. 
404
405
The reference annotation files contain over 175,000 beat annotations in all.
406
407
The original records were collected from patients undergoing tests for coronary artery disease (17 men and 15 women, aged 18-80; mean age: 58). None of the patients had pacemakers; most had ventricular ectopic beats. In selecting records to be included in the database, preference was given to subjects with ECGs consistent with ischemia, coronary artery disease, conduction abnormalities, and arrhythmias;observations of those selected included:
408
409
410
411
# Software references: Beat Detection
412
1. [*Pan Tompkins*](https://es.mathworks.com/matlabcentral/fileexchange/45840-complete-pan-tompkins-implementation-ecg-qrs-detector)
413
    
414
    [third_party/Pan_Tompkins_ECG_v7/pan_tompkin.m](third_party/Pan_Tompkins_ECG_v7/pan_tompkin.m)
415
416
2. [*ecgpuwave*](third_party/README.md) Also gives QRS onset, ofset, T-wave and P-wave
417
NOTE:*The beats whose Q and S points were not detected are considered as outliers and automatically rejected from our datasets.*
418
419
3. [ex_ecg_sigprocessing](https://es.mathworks.com/help/dsp/examples/real-time-ecg-qrs-detection.html)
420
421
4. osea
422
423
424
# License
425
The code of this repository is available under [GNU GPLv3 license](https://www.gnu.org/licenses/gpl-3.0.html).