Contains 827 ECG tracings from different patients, annotated by several cardiologists, residents and medical students.
It is used as the test set in the paper:
"Automatic diagnosis of the 12-lead ECG using a deep neural network".
https://www.nature.com/articles/s41467-020-15432-4
It contains annotations for 6 different ECG abnormalities:
- 1st degree AV block (1dAVb);
- right bundle branch block (RBBB);
- left bundle branch block (LBBB);
- sinus bradycardia (SB);
- atrial fibrillation (AF); and,
- sinus tachycardia (ST).
Companion Python scripts are available at:
https://github.com/antonior92/automatic-ecg-diagnosis
Citation
Ribeiro, A.H., Ribeiro, M.H., Paixão, G.M.M. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network.
Nat Commun 11, 1760 (2020). https://doi.org/10.1038/s41467-020-15432-4
BibTeX:
@article{ribeiro_automatic_2020,
title = {Automatic Diagnosis of the 12-Lead {{ECG}} Using a Deep Neural Network},
author = {Ribeiro, Ant{\^o}nio H. and Ribeiro, Manoel Horta and Paix{\~a}o, Gabriela M. M. and Oliveira, Derick M. and Gomes, Paulo R. and Canazart, J{\'e}ssica A. and Ferreira, Milton P. S. and Andersson, Carl R. and Macfarlane, Peter W. and Meira Jr., Wagner and Sch{\"o}n, Thomas B. and Ribeiro, Antonio Luiz P.},
year = {2020},
volume = {11},
pages = {1760},
doi = {10.1038/s41467-020-15432-4},
journal = {Nature Communications},
number = {1}
}
- ecg_tracings.hdf5: this file is not available in the GitHub repository because of its size, but it can be downloaded separately. It contains a single dataset named tracings. This dataset is a (827, 4096, 12) tensor. The first dimension corresponds to the 827 different exams from different patients, the second to the 4096 signal samples, and the third to the 12 ECG leads, in the order {DI, DII, DIII, AVL, AVF, AVR, V1, V2, V3, V4, V5, V6}.

The signals are sampled at 400 Hz. Some signals originally have a duration of
10 seconds (10 * 400 = 4000 samples) and others of 7 seconds (7 * 400 = 2800 samples).
To make them all the same size (4096 samples), they are padded with zeros
on both sides. For instance, a 7-second ECG signal with 2800 samples receives 648
zero samples at the beginning and 648 at the end, yielding the 4096 samples that are then saved
in the HDF5 dataset. All signals are represented as floating point numbers at the scale 1e-4V: so they should
be multiplied by 1000 in order to obtain the signals in V.
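The zero-padding described above can be sketched as follows. This is a minimal illustration, not the script used to build the dataset: the helper names `pad_to_target` and `strip_padding` are hypothetical, and the code assumes the padding is split equally between both sides, as in the 7-second example (648 + 2800 + 648 = 4096).

```python
import numpy as np

TARGET_LEN = 4096  # number of samples per tracing stored in the HDF5 file


def pad_to_target(signal, target_len=TARGET_LEN):
    """Zero-pad a (n_samples, n_leads) array on both sides (hypothetical helper)."""
    total = target_len - signal.shape[0]
    if total < 0:
        raise ValueError("signal is longer than the target length")
    before = total // 2       # assumption: padding is split equally
    after = total - before
    return np.pad(signal, ((before, after), (0, 0)), mode="constant")


def strip_padding(padded, original_len):
    """Recover the central original_len samples of a padded tracing."""
    before = (padded.shape[0] - original_len) // 2
    return padded[before:before + original_len]


# Synthetic 7-second, 12-lead signal sampled at 400 Hz (2800 samples).
seven_sec = np.ones((7 * 400, 12), dtype=np.float32)
padded = pad_to_target(seven_sec)        # 648 zeros on each side
recovered = strip_padding(padded, 2800)  # back to the original 2800 samples
```

Under the same assumption, a 10-second signal (4000 samples) would receive 48 zero samples on each side. Stripping the padding requires knowing the original duration of each exam.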
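In Python, one can read this file as follows:

    import h5py
    import numpy as np

    path_to_hdf5 = "ecg_tracings.hdf5"  # adjust to where the file was downloaded
    with h5py.File(path_to_hdf5, "r") as f:
        x = np.array(f['tracings'])  # array of shape (827, 4096, 12)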
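Once the array is loaded, a single lead can be selected by name using the lead order given above. The sketch below is only an illustration: `get_lead` is a hypothetical helper, and a small zero-filled array stands in for the real tracings so that the example runs without the HDF5 file.

```python
import numpy as np

# Lead order as listed in the dataset description.
LEADS = ["DI", "DII", "DIII", "AVL", "AVF", "AVR",
         "V1", "V2", "V3", "V4", "V5", "V6"]
SAMPLING_RATE_HZ = 400


def get_lead(x, exam_index, lead_name):
    """Return the 1-D signal of one lead from a (n_exams, n_samples, 12) array."""
    return x[exam_index, :, LEADS.index(lead_name)]


# Stand-in for the array read from the HDF5 file (synthetic data).
x = np.zeros((3, 4096, 12), dtype=np.float32)
x[1, :, LEADS.index("V1")] = 1.0

v1 = get_lead(x, exam_index=1, lead_name="V1")
time_s = np.arange(v1.shape[0]) / SAMPLING_RATE_HZ  # time axis in seconds
```

With the real data, `x` would be the array read from `ecg_tracings.hdf5` and `exam_index` would range from 0 to 826.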
- attributes.csv: contains basic patient attributes: sex (M or F) and age. The i-th tracing in ecg_tracings.hdf5 corresponds to the i-th line.
- annotations/: folder containing annotations in csv format. Each csv file contains 827 lines (plus the header). The i-th tracing in ecg_tracings.hdf5 corresponds to the i-th line in all csv files. The columns 1dAVb, RBBB, LBBB, SB, AF, ST indicate whether the annotator detected the abnormality (=1) or not (=0).
  - cardiologist[1,2].csv: annotations from two different cardiologists.
  - gold_standard.csv: gold standard annotation for this test dataset. When the cardiologist 1 and cardiologist 2 ...
  - dnn.csv: predictions from the deep neural network described in "Automatic diagnosis of the 12-lead ECG using a deep neural network".
  - cardiology_residents.csv: annotations from two 4th year cardiology residents (each annotated half of the dataset).
  - emergency_residents.csv: annotations from two 3rd year emergency residents (each annotated half of the dataset).
  - medical_students.csv: annotations from two 5th year medical students (each annotated half of the dataset).
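Since all annotation files share the same row order and the same six 0/1 columns, any of them can be compared against the gold standard line by line. The following is a minimal sketch that computes a per-abnormality F1 score using only the standard library. The F1 score is just one possible choice of metric, the helper names are hypothetical, and tiny synthetic tables stand in for the real files. It assumes the csv headers match the column names listed above exactly.

```python
import csv
import io

ABNORMALITIES = ["1dAVb", "RBBB", "LBBB", "SB", "AF", "ST"]


def read_annotations(handle):
    """Read a 0/1 annotation table into a list of {abnormality: int} rows."""
    return [{name: int(row[name]) for name in ABNORMALITIES}
            for row in csv.DictReader(handle)]


def f1_per_abnormality(gold, predicted):
    """Compute the F1 score of `predicted` against `gold` for each column."""
    scores = {}
    for name in ABNORMALITIES:
        pairs = [(g[name], p[name]) for g, p in zip(gold, predicted)]
        tp = sum(1 for g, p in pairs if g == 1 and p == 1)
        fp = sum(1 for g, p in pairs if g == 0 and p == 1)
        fn = sum(1 for g, p in pairs if g == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # F1 is undefined when the abnormality never appears in either table.
        scores[name] = 2 * tp / denom if denom else float("nan")
    return scores


# Synthetic stand-ins; with the real data one would instead open, for example,
# annotations/gold_standard.csv and annotations/dnn.csv.
header = "1dAVb,RBBB,LBBB,SB,AF,ST\n"
gold_csv = io.StringIO(header + "1,0,0,0,0,0\n0,1,0,0,0,0\n0,1,0,0,1,0\n")
pred_csv = io.StringIO(header + "1,0,0,0,0,0\n0,1,0,0,0,0\n0,0,0,0,1,1\n")

gold = read_annotations(gold_csv)
pred = read_annotations(pred_csv)
scores = f1_per_abnormality(gold, pred)
```

For the residents and students files, where each annotator covered half of the dataset, the same comparison applies to the file as a whole.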