{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Atrial Fibrillation Features\n",
"\n",
"As with activity classification, we will featurize our time series signal and throw those features into a classifier. For AF we will build a two-class classifier using the inter-beat-interval time series. This time series can be derived by taking the difference between successive QRS complex locations as provided by the Pan-Tompkins algorithm in the previous videos. We source our features from a couple submissions to the [Computing in Cardiology Challenge 2017](https://physionet.org/content/challenge-2017/1.0.0/). \n",
"\n",
">[Behar, Rosenberg, Yaniv, Oster. Rhythm and Quality Classification from Short ECGs Recorded Using a Mobile Device. Computing in Cardiology Challenge 2017.](http://www.cinc.org/archives/2017/pdf/165-056.pdf) \n",
"\n",
"and \n",
">[Bonizzi, Driessens, Karel. Detection of Atrial Fibrillation Episodes from Short Single Lead Recordings by Means of Ensemble Learning. Computing in Cardiology Challenge 2017.](http://www.cinc.org/archives/2017/pdf/169-313.pdf)\n",
"\n",
"\n",
"Recall that the RR interval time series is irregularly sampled because we get a new measurement at each heart beat, instead of at some fixed interval in time. Many of the features we want can be computed on an irregular time series, but some, especially frequency domain features, cannot. In that case, we will first have to make this uniform by using interpolation.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import glob\n",
"import os\n",
"\n",
"import numpy as np\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fs = 300\n",
"data_dir = '../../datasets/cinc/'\n",
"ref = pd.read_csv(data_dir + 'REFERENCE.csv')\n",
"ref = dict(zip(ref.record, ref.rhythm))\n",
"base = lambda f: os.path.splitext(os.path.basename(f))[0]\n",
"files = sorted(glob.glob(data_dir + '*.npz'))\n",
"qrs = []\n",
"labels = []\n",
"for f in files:\n",
" with np.load(f) as npz:\n",
" qrs.append(npz['qrs'])\n",
" labels.append(ref[base(f)])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Features\n",
"\n",
"The features for the AF detection algorithm are computed from the RR interval time series. We use the time domain and frequency domain features listed below.\n",
"\n",
"### Time domain\n",
" - minimum RR interval\n",
" - maximum RR interval\n",
" - median RR interval\n",
" - average RR interval\n",
" - standard deviation of RR intervals\n",
" - number of RR interval outliers\n",
" - An RR interval is an outlier if it is greater than 1.2x the average RR interval in the ECG record\n",
"\n",
" - root-mean-square of the difference between adjacent RR intervals\n",
" - percent of RR interval differences greater than 50 milliseconds\n",
"\n",
"### Frequency domain\n",
"The RR interval time series is not sampled regularly in time. We only have a datapoint every heart beat. Before we can compute any frequency domain features, the time series must be resampled so that we get uniform data points. Resample the RR interval time series to 4 Hz before computing the features below.\n",
"\n",
" - peak magnitude between 0.04 Hz and 0.15 Hz in the regularized RR interval time series\n",
" - main frequency between 0.04 Hz and 0.15 Hz in the regularized RR interval time series\n",
" - peak magnitude between 0.15 Hz and 0.4 Hz in the regularized RR interval time series\n",
" - main frequency between 0.15 Hz and 0.4 Hz in the regularized RR interval time series"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def Featurize(qrs_inds, fs):\n",
" \"\"\"Featurize the qrs complex locations time series.\n",
"\n",
" Args:\n",
" qrs_inds: (np.array of number) the sample indices of the QRS complex locations\n",
" fs: (number) the sampling rate\n",
"\n",
" Returns:\n",
" n-tuple of features\n",
" \"\"\"\n",
" # Compute time domain features\n",
" min_rr = None\n",
" max_rr = None\n",
" median_rr = None\n",
" mean_rr = None\n",
" std_rr = None\n",
" n_outliers = None\n",
" rmssd = None\n",
" pdrr_50 = None\n",
"\n",
" # Regularly resample the RR interval time series\n",
"\n",
" # Compute the Fourier transform of the regular RR interval time series\n",
"\n",
" # Compute frequency domain features\n",
" lf_mag = None\n",
" lf_freq = None\n",
" hf_mag = None\n",
" hf_freq = None\n",
"\n",
" return (min_rr, max_rr, median_rr, n_outliers, mean_rr, std_rr, rmssd, pdrr_50,\n",
" lf_mag, lf_freq, hf_mag, hf_freq)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}