1 |
|
Title |
Year |
Authors |
Journal / Origin |
Preprint first |
Type of paper |
Lab / School / Company |
Country |
Pages |
|
Domain 1 |
Domain 2 |
Domain 3 |
Domain 4 |
High-level Goal |
Practical Goal |
Task/Paradigm |
Motivation for DL |
|
EEG Hardware |
Neural response pattern |
Dataset name |
Dataset accessibility |
Data |
Data - samples |
Data - time |
Data - subjects |
Nb Channels |
Sampling rate |
Offline / Online |
|
Preprocessing |
Preprocessing (clean) |
Artefact handling |
Artefact handling (clean) |
Features |
Features (clean) |
Normalization |
|
Software |
Architecture |
Architecture (clean) |
Design peculiarities |
EEG-specific design |
Network Schema |
Input format |
Layers |
Layers (clean) |
Activation function |
Regularization |
Regularization (clean) |
Nb Classes |
Classes |
Output format |
Nb Parameters |
Training procedure |
Training procedure (clean) |
Optimizer |
Optimizer (clean) |
Optim parameters |
Minibatch size |
Hyperparameter optim |
Hyperparameter optim (clean) |
Data augmentation |
Loss |
Intra/Inter subject |
Cross validation |
Cross validation (clean) |
Data split |
Performance metrics |
Performance metrics (clean) |
Training hardware |
|
Training time |
Results |
Benchmarks |
Baseline model type |
Statistical analysis of performance |
Analysis of learned parameters |
Model inspection (clean) |
Discussion |
Limitations |
Code available |
Code hosted on |
Limited data |
|
First Reader |
Second Reader |
Validated by Author(s) |
|
Citation |
2 |
|
EEG-signals based cognitive workload detection of vehicle driver using deep learning |
2018 |
Almogbel, Dang & Kameyama |
IEEE Conference on Advanced Communication Technology |
No |
Conference |
Waseda University |
Japan |
4 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
Improve State-of-the-Art |
|
Driving Game (GTA) |
|
|
Muse (InteraXon) |
Raw EEG |
Internal Recordings |
Private |
1 subject x 24 sessions (12 High / 12 Low Workload)
15-30min each session
(Sliding windows [from 30s to 180s], 1/256 overlap) |
216 |
540 |
1 |
4 |
256 |
|
|
None |
No |
No |
No |
Raw EEG |
Raw EEG |
z-score |
|
N/M* |
CNN |
CNN |
|
|
Yes |
38400x4
(38400 = 150s @ 256Hz) |
7 Conv
+ 3 FC |
10 |
ReLU |
Dropout: 50% |
Yes |
|
|
2 (Softmax) |
N/M* |
Standard |
Standard |
RMSProp
|
Other |
lr=0.002 |
64 |
N/M |
N/M |
No |
Binary cross-entropy |
Intra |
Leave-One-Session-Out |
Leave-One-Session-Out |
Train: 92%
Valid: 8%
Test: N/A |
Accuracy |
accuracy |
N/M |
|
N/M |
95.31% |
No |
None |
No |
No |
No |
"This study does not impose in any way a direct comparison with the distinguished previous works because the used data, experimental conditions, classification targets are different in each, but rather explore and introduce the potential of using deep CNN architecture in classifying raw EEG signals without any pre-processing." |
|
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Almogbel2018 |
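
A minimal sketch of the windowing pipeline this entry describes (z-scored sliding windows of raw EEG; 150 s at 256 Hz over the 4 Muse channels yields the 38400x4 input). The 10 s step, helper names, and synthetic data are illustrative assumptions; the entry itself reports a much finer ("1/256") overlap step.

```python
import numpy as np

def epoch_raw_eeg(signal, fs=256, win_s=150, step_s=10):
    """Slice a (n_samples, n_channels) recording into overlapping windows."""
    win, step = int(win_s * fs), int(step_s * fs)
    starts = range(0, signal.shape[0] - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

def zscore(windows):
    """Normalize each window per channel (zero mean, unit variance)."""
    mu = windows.mean(axis=1, keepdims=True)
    sd = windows.std(axis=1, keepdims=True) + 1e-8
    return (windows - mu) / sd

raw = np.random.randn(10 * 60 * 256, 4)   # 10 min of synthetic 4-channel EEG
windows = zscore(epoch_raw_eeg(raw))
print(windows.shape)                       # (46, 38400, 4)
```
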
3 |
|
Automatic ocular artifacts removal in EEG using deep learning |
2018 |
Yang, Duan, Fan, Hu & Wang |
Biomedical Signal Processing and Control |
No |
Journal |
Key Laboratory of Power Station Automation Technology, Shanghai University |
China |
11 |
|
Improvement of processing tools |
Signal cleaning |
Artifact handling |
|
Novel |
|
Motor Imagery |
|
|
WirelessEEG (Neuracle) |
Clean EEG / Ocular artefacts |
BCI Competition IV - I;
Internal Recordings |
Public |
Each subject has 200 trials of motor imagery and each trial lasts for more than 6s.
Subject 1, 2, 6 and 7 from BCI Comp. dataset + 3 (internal recordings) |
1400 |
140 |
7 |
59;
32 |
100 |
|
|
1) Band-Pass Filter: 0.05-200Hz |
Yes |
No |
No |
Raw EEG |
Raw EEG |
min-max |
|
MATLAB |
SAE |
AE |
N/M* |
|
Yes |
100x1 |
3 |
3 |
|
L1 |
Yes |
|
|
100x1 |
N/M* |
Greedy Layer-wise training |
Standard |
N/M* |
N/M |
|
|
N/M |
N/M |
No |
RMSE |
Inter |
No |
No |
16520 training samples
15458 test samples |
RMSE |
RMSE |
Not mentioned |
|
N/M* |
RMSE is lower for the proposed approach than for the benchmarks, and accuracy on the surrogate MI task is higher |
Shallow SAE, ICA, K-ICA, SOBI |
Traditional pipeline |
No |
No |
No |
"Compared with the classical OAs removal methods, the proposed method has many highlights. [...] In the future work, we are going to improve the training method of DLN or try replacing the SAE with other neural networks such as convolutional neural networks (CNN) to strengthen its fitting ability for the details of EEG." |
|
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Yang2018 |
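
A minimal Keras sketch of a small autoencoder with L1 sparsity trained to map 100-sample EEG segments to their clean versions, in the spirit of this entry's SAE (the paper used MATLAB with greedy layer-wise training; layer sizes, optimizer, and the synthetic data here are illustrative assumptions).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inputs = keras.Input(shape=(100,))                       # 100x1 segments
h = layers.Dense(64, activation="sigmoid",
                 activity_regularizer=regularizers.l1(1e-5))(inputs)
code = layers.Dense(32, activation="sigmoid",
                    activity_regularizer=regularizers.l1(1e-5))(h)
h2 = layers.Dense(64, activation="sigmoid")(code)
outputs = layers.Dense(100, activation="linear")(h2)

ae = keras.Model(inputs, outputs)
ae.compile(optimizer="adam", loss="mse")  # RMSE (reported metric) = sqrt(MSE)

# Train to reconstruct clean segments from artifact-contaminated ones.
x_clean = np.random.randn(1024, 100).astype("float32")
x_noisy = x_clean + 0.5 * np.random.randn(1024, 100).astype("float32")
ae.fit(x_noisy, x_clean, epochs=2, batch_size=32, verbose=0)
```
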
4 |
|
An end-to-end framework for real-time automatic sleep stage classification |
2018 |
Patanaik, Ong, Gooley, Ancoli-Israel & Chee |
Sleep |
No |
Journal |
Duke-NUS Medical School, Singapore
University of California, San Diego |
Singapore |
11 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve State-of-the-Art: DL for Sleep
(CNN + MLP) |
Reduce the time necessary to stage sleep recordings by using DL |
Sleep |
No need for feature engineering |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
Four datasets ≈ 1700 polysomnography records
a total of 11,727 hr of PSG data / 1,403,164 epochs |
1403164 |
703620 |
459 |
2 |
N/M |
|
|
1) Pass-Band Filter (FIR): 0.3-45Hz
2) Downsampled to 100Hz (polyphase FIR filter) |
Yes |
No |
No |
Spectrogram |
Frequency-domain |
N/M |
|
TensorFlow |
CNN + MLP
(2 Stages) |
CNN |
Consecutive probabilities outputted by the CNN are aggregated by the MLP |
N/M |
Yes |
32x32x3
(spectrogram 2D x 3 channels) |
CNN: 16
MLP: 1 |
17 |
CNN: ReLU
MLP: tansig |
N/M |
N/M |
|
|
5
(Softmax)
Probability of each Sleep Stage |
dCNN: 177 669 weights
MLP: 445 weights |
Standard optimization |
Standard |
Stochastic gradient descent with Nesterov momentum |
SGD |
Learning rate: 0.001
Momentum: 0.9
Learning rate decay: 10e-6 |
CNN: 300
MLP: 1000 |
Trial and error |
Yes |
No |
Categorical Cross-Entropy |
Inter |
Train-Valid-Test |
Train-Valid-Test |
Train: 75% of DS1 & DS2
Test: 25% of DS1 & DS2
Validation: DS3, DS4 |
Accuracy
Cohen's kappa |
accuracy, Cohen's kappa |
NVidia GTX 1060 |
|
N/M |
Test set: ~89.8%. kappa=0.862
Validation set 1: 81.4%, kappa=0.740
Validation set 2: 72.1%, kappa=0.597 |
Expert rescoring of 50 records |
Traditional pipeline |
t-test on Cohen's kappa (automatic vs. expert rescoring) -> stat. diff. for validation set 3 but not for 4 |
No |
No |
"... our framework provides a practicable, validated, and speedy solution for automatic sleep stage classification that can significantly improve throughput and productivity of sleep labs. It has the potential to play an important role in emerging novel applications of real-time automatic sleep scoring as well as being installed in personal sleep monitors." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Patanaik2018 |
5 |
|
Epileptic Seizure Detection: A Deep Learning Approach |
2018 |
Hussein, Palangi, Ward & Wang |
Arxiv |
Yes |
Preprint |
UBC |
Canada |
12 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve State-of-the-Art: DL for Epilepsy
(LSTM) |
Improve performance on seizure detection with DL, on real conditions (with noise) |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Automatically learns features |
|
N/M |
Raw EEG |
Bonn University |
Public |
Bonn University (A,B,C,D,E)
5x 100 epochs of 23.6s |
500 |
197 |
15 |
1 |
173.6 |
|
|
1) Artifacts Removed
2) Band-Pass Filter: 0.53-40Hz
(before saving the dataset... "hardcoded") |
Yes |
Yes (dataset already cleaned) |
Yes |
Raw EEG |
Raw EEG |
N/M |
|
MATLAB, Python
Keras with TensorFlow backend |
LSTM |
RNN |
N/M |
N/M |
Yes |
100x2 |
3 |
3 |
N/M |
N/M |
N/M |
|
|
2, 3 or 5 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
LR: 0.001 |
64 |
N/M |
N/M |
Added artifacts (EMG, EOG)
and Gaussian white noise |
Categorical Cross-Entropy |
Inter |
10-Fold CV |
k-fold |
Train: 80%
Test: 20% |
Accuracy
Sensitivity
Specificity |
accuracy, sensitivity, specificity |
NVidia K40 |
|
2h |
100% everywhere:
Sensitivity, Specificity & Accuracy for the
2-Class, 3-Class & 5-Class problems.
Robust to artificial artifacts |
Compared with many other SotA using the same dataset.
BNN, ME, SVM, ELM, LDA, SVM, KNN, ANN, etc. |
Traditional pipeline |
No |
No |
No |
Compared to the state-of-the-art methods, this approach can learn the high-level representations, and can effectively discriminate between the normal and seizure EEG activities. Another advantage of this approach lies in its robustness against common EEG artifacts (e.g., muscle activities and eye- blinking) and white noise. |
Unbalanced class distributions |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Hussein2018 |
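
A minimal Keras sketch of an LSTM classifier over raw-EEG windows, consistent with this entry's setup (100x2 input, Adam at lr=0.001, categorical cross-entropy, batch size 64). The number and width of the LSTM layers and the synthetic data are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_classes = 3  # the entry reports 2-, 3- and 5-class variants
model = keras.Sequential([
    layers.Input(shape=(100, 2)),        # (time steps, features)
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

x = np.random.randn(256, 100, 2).astype("float32")
y = keras.utils.to_categorical(np.random.randint(n_classes, size=256))
model.fit(x, y, epochs=1, batch_size=64, verbose=0)
```
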
6 |
|
Development of a brain computer interface using multi-frequency visual stimulation and deep neural networks |
2018 |
Perez-Benitez, Perez-Benitez & Espina-Hernandez |
IEEE Conference on Electronics, Communications and Computers |
No |
Conference |
National Polytechnic Institute, Mexico |
Mexico |
7 |
|
Classification of EEG signals |
BCI |
Reactive |
SSVEP |
Improve State-of-the-Art: SSVEP with CNN |
Increase number of commands and reduce eyestrain in a visual BCI |
SSVEP (with LEDs) |
Just another classifier! |
|
(Custom-made) |
SSVEP |
Internal Recordings |
Private |
11 subjects x 5 stimuli |
N/M |
N/M |
11 |
3 |
250 |
|
|
N/M |
N/M |
N/M |
N/M |
Spectrum |
Frequency-domain |
N/M |
|
N/M |
SAE
(Sparse AutoEncoder) |
AE |
N/M |
N/M |
Yes |
N/M |
SAE: 2
Final: 2 |
4 |
Sigmoid |
L2 regularization
Sparsity loss |
Yes |
|
|
5
(Softmax)
Diff SSVEP Freqs |
N/M |
1) Train SAE
2) Train softmax on top of SAE middle layer |
Pre-training |
N/M |
N/M |
epochs: 50
lambda (L2): 0.16
gamma (sparsity): 1.0
rho: 0.1 |
N/M |
N/M |
N/M |
No |
Mean Squared Error (SAE)
Cross-entropy (softmax layer) |
Intra |
No |
No |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
97.78% (not clear if on training set or something else!) |
k-NN
Naive Bayes, Bayes Kernel
Decision Tree, Random Forest, Gradient Boosted Tree
Rule Induction, MC-SVM,
ML Perceptron |
Traditional pipeline |
No |
Visualization of learned parameters |
Analysis of weights |
The analysis of the DNN first-layer weights reveals two main patterns containing information about the SSVEPs in the power spectra of the measured EEG signals: (i) weights that reinforce the features of the spectrum at frequencies {fst}, 3/2{fst} and 2{fst}, where fst are the frequencies of the MFVS, and (ii) weights that reinforce the features of the spectrum at low frequencies, from 0 Hz to 20 Hz. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Perez-Benitez2018 |
7 |
|
Deep Semantic Architecture with discriminative feature visualization for neuroimage analysis |
2018 |
Ghosh, Dal Maso, Roig, Mitsis & Boudrias |
Arxiv |
Yes |
Preprint |
McGill, UdeM |
Canada |
11 |
|
Classification of EEG signals |
Monitoring |
Physical |
Exercise |
Improve SOTA |
Study the add-on effects of exercise on motor learning |
Hand motor task before and after an acute exercise |
Does not require hand-engineered features |
|
(BrainProducts) |
Brain Rhythms
(SMR) |
Internal Recordings |
Private |
25 subjects
4 x 50 x [3.5 sec (holding) + 3 to 5 sec (rest)] |
5000 |
625 |
25 |
64 |
2500 |
|
|
1) Band-Pass Filter: 0.5-55Hz
2) Re-reference to average.
3) Visual Inspection, noisy signal segment removed
4) ICA to remove eye blinks
5) Morlet Wavelet (wave:7, 1Hz reso) |
Yes |
Visual inspection to reject transient artefacts
ICA for eye blinks |
Yes |
Frequency Bands (55) |
Frequency-domain |
Per-electrode spectral normalization |
|
Brainstorm (MATLAB), Torch |
CNN |
CNN |
1) Base CNN that expects baseline and post-condition data in parallel
2) CNN that predicts class
3) Adversarial Component to penalize subject-dependent training |
Base CNN: spectral-only convolutions |
Yes |
64 x 55
(channels x freq bands)
[x2 since the Base CNN is used twice in a single pass] |
[On TF maps, on topo maps]
BaseCNN: 2 + 1, 3 + 1
Discriminator: 2, 2
Adversary: 2, 2 |
7 |
ReLU |
Dropout, weight decay |
Yes |
|
|
2
Prob of EXE
Prob of CON |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
[On TF maps, on topo maps]
LR: 0.001, 0.001
LR decay: 0.0001, 0.001
Weight decay: 0.001, 0.03 |
N/M |
N/M |
N/M |
No |
Negative Log-Likelihood
(part 1) &
KL-Divergence (part 2) |
Inter |
N/M |
No |
Train: 80%
Validation: 20%
Test: N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
98.70% |
N/M |
None |
No |
Visualization of class activation maps (proposed method) |
Class Activation Maps |
"Importantly, the proposed novel method enabled us to visualize the features learnt by deep networks such as CNNs, which may in turn yield better interpretation of their classification basis." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Ghosh2018 |
8 |
|
Cascade and Parallel Convolutional Recurrent Neural Networks on EEG-based Intention Recognition for Brain Computer Interface |
2018 |
Zhang, Yao, Zhang, Wang, Chen & Boots |
AAAI Conference on Artificial Intelligence |
Yes |
Preprint |
University of New South Wales |
Australia |
8 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Novel Approach: Cascade & Parallel CNN and RNN |
Compare Cascade and Parallel CNN + RNN on Motor Imagery Dataset (eegmmidb) to SOTA |
Motor Imagery
(see eegmmidb dataset) |
To capture temporal and spatial information. |
|
N/M |
Motor Imagery |
eegmmidb;
Internal Recordings |
Both |
(eegmmidb)
108 subjects, 3,145,160 EEG (2808min)
(Internal recordings)
9 subjects x 30 trials (6 per class)
Internal recordings: 10s action, 10s rest |
3145430 |
2898 |
108 |
64 |
160 |
Offline |
|
1) 2D Mesh (Matrix)
2) Sliding Window (clips)
3) Normalize |
Yes |
No |
No |
2D Mesh Clips
(of Raw EEG) |
Raw EEG |
z-score |
|
N/M |
Cascade / Parallel
CNN + RNN (LSTM) |
CNN+RNN |
CNN + LSTM combined (serial or parallel) |
To capture spatial and temporal resolution |
Yes |
2D Data mesh
(time signal x spatial matrix) |
3 CNN + 1 FC (1024)
2 LSTM (64) + 1 FC (1024) |
7 |
N/M |
Dropout
(0.5) |
Yes |
5 |
5 Motor Commands |
5
(Softmax) |
N/M |
N/M |
N/M |
Adam |
Adam |
LR: 0.0001 |
N/M |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
N/M |
No |
Train: 75%
Test: 25% |
Accuracy |
accuracy |
Nvidia Titan X Pascal |
|
N/M |
Cascade: 0.9824
Parallel: 0.9828 |
(Major and Conrad 2017) : 0.72 - ICA
(Shenoy, Vinod, and Guan 2015) : 0.82 - SR-FBCSP
(Pinheiro et al. 2016) : 0.85 - SVM
(Kim et al. 2016) : 0.80 - SUTCCSP
(Zhang et al. 2017) : 0.79 - XGBoost
(Bashivan et al. 2016) : 0.67 - R-CNN |
DL & Trad. |
No |
No |
No |
A large-scale dataset of 108 participants on five categories is used to evaluate the proposed models. The experimental results show that both the cascade and parallel architectures could achieve very competitive accuracy around 98.3%, considerably superior to the state-of-the-art methods. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Zhang2018c |
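
A minimal sketch of the "2D mesh" input transform this entry describes: each time sample's 64 channel values are scattered into a sparse 2D matrix reflecting electrode layout, producing clips for the cascade/parallel CNN+LSTM. The 10x11 grid and random electrode coordinates are illustrative assumptions, not the paper's montage.

```python
import numpy as np

# Hypothetical (row, col) grid position for each of 64 electrodes.
rng = np.random.default_rng(0)
grid_cells = [(r, c) for r in range(10) for c in range(11)]
coords = [tuple(cell) for cell in rng.permutation(grid_cells)[:64]]

def to_mesh_clip(window):
    """Map a (n_times, 64) raw-EEG window to a (n_times, 10, 11) clip."""
    clip = np.zeros(window.shape[:1] + (10, 11), dtype=window.dtype)
    for ch, (r, c) in enumerate(coords):
        clip[:, r, c] = window[:, ch]
    return clip

clip = to_mesh_clip(np.random.randn(160, 64))  # 1 s at 160 Hz (eegmmidb)
print(clip.shape)                               # (160, 10, 11)
```
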
9 |
|
A hierarchical LSTM model with attention for modeling EEG non-stationarity for human decision prediction |
2018 |
Hasib, Nayak & Huang |
IEEE EMBS International Conference on Biomedical & Health Informatics |
No |
Conference |
University of Texas, San Antonio |
USA |
4 |
|
Classification of EEG signals |
BCI |
Active |
Mental tasks |
Improve SOTA |
Novel Approach: H-LSTM with Attention for Decision Classification |
Allow or Deny Access based on ID + Image
(Guard) |
No need for hand-engineered features |
|
ActiveTwo (BioSemi) |
Raw EEG |
BCIT Guard Duty |
Private |
1782 of 5297 sequences selected:
892 Deny + 890 Allow
18 Subjects
(10s windows) |
1782 |
297 |
18 |
64 |
512 |
|
|
1) Downsampled to 128Hz
2) Band-Pass Filter: 0.1-55Hz |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
z-score |
|
N/M |
LSTM |
RNN |
Hierarchical (from samples in first layer to epochs in second layer)
Attention mechanism |
First layer acts on samples
Second layer acts on epochs |
Yes |
0.5s epochs |
2 |
2 |
N/M |
L2 weight decay |
Yes |
|
|
1
Allow / Deny |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
3-Fold CV |
k-fold |
Train: 60%
Validation: 6%
Test: 33% |
ROC AUC |
ROC AUC |
N/M |
|
N/M |
H-LSTM (w/ Attention & 0.5s epochs): 82.6%
H-LSTM (w/ Attention & 2.5s epochs): 81%
H-LSTM (w/ Attention & 5s epochs): 81.6%
H-LSTM (w/out Attention & 0.5s epochs): 80.3%
H-LSTM (w/out Attention & 5s epochs): 73.7% |
SVM: 65%
CNN: 69% |
DL & Trad. |
No |
No |
No |
"Using the attention mechanism does help enhance the discriminate features obtained from these epochs, although it does not help model the EEG non-stationarity"
"Consistent with the observation from LSTM performance, we observed an increase of performance with shorter epoch length." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Hasib2018 |
10 |
|
Deep EEG super-resolution: Upsampling EEG spatial resolution with Generative Adversarial Networks |
2018 |
Corley & Huang |
IEEE EMBS International Conference on Biomedical & Health Informatics |
No |
Conference |
University of Texas, San Antonio |
USA |
4 |
|
Generation of data |
Generating EEG |
Spatial upsampling |
|
Novel Approach: GAN for EEG Upsampling. |
|
BCI Competition III - Dataset V |
GANs previous great results on image super-resolution |
|
N/M |
N/A |
BCI Competition III - V |
Public |
(BCI Comp. III - V)
1,096,192 samples from 3 subjects
(Windows of 1s, 480/512 overlap) |
36397 |
35.7 |
3 |
32 |
512 |
|
|
1) Downsampling the number of channels (from 32 to 16) |
Yes |
No |
No |
None |
Raw EEG |
z-score |
|
N/M |
WGAN |
GAN |
N/M |
Convolutional layers with kernel dimensions that find the relationships between channels |
Yes |
32 x 512
(channels x samples) |
Gen: 6 Conv Layers
Discrim: 4 Conv Layers + 1 FC |
6 |
ELU |
Dropout
(0.1 - 0.25) |
Yes |
|
|
32 x SR
(Channels x Super Resolved)
(upsampled data) |
N/M |
Pre-trained Gen fine-tuned w/
WGAN framework losses w/ gradient penalty weight of 10.
Also, label smoothing technique |
Other |
Adam |
Adam |
lr=10^-4, beta1=0.5, beta2=0.9 |
64 |
N/M |
N/M |
N/M |
Gen: MSE
Discrim: Wasserstein distance |
Inter |
Holdout |
Holdout |
Train: 75%
Valid: 20%
Test: 5% |
MSE
MAE
(mean absolute error)
Accuracy, precision and recall (for classification task) |
MSE, MAE, accuracy, precision, recall |
N/M |
|
N/M |
[Scale 2 - Test] MSE: 2.06E3 | MAE: 24.66
[Scale 4 - Test] MSE: 8.68E3 | MAE: 64.39
~10^4-fold (MSE) and ~10^2-fold (MAE) improvement compared to Bicubic Interpolation |
Bicubic Interpolated Channel Data |
Traditional pipeline |
No |
No |
No |
"Feature scaling techniques besides standard normalization decreased model performance. [...] It was notably difficult and time-consuming to train GANs for EEG data. [...] After testing different variants of GAN: WGAN appeared to be more stable during training." |
"It was notably difficult and time-consuming to train GANs for EEG data" |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Corley2018 |
11 |
|
Spatial and Time Domain Feature of ERP Speller System Extracted via Convolutional Neural Network |
2018 |
Yoon, Lee & Whang |
Computational Intelligence and Neuroscience |
No |
Journal |
Duke University
Sangmyung University |
USA |
12 |
|
Classification of EEG signals |
BCI |
Reactive |
ERP |
Alleviate BCI illiteracy |
Reduce BCI illiteracy in P300 spellers by using CNNs |
Rapid Serial Visual Presentation (P300 speller) |
Uncover new unknown spatial/temporal patterns.
When an optimal filter is applied, the convolution will magnify the feature of interest and reduce the others [25]. |
|
B-Alert X10 (ABM) |
P300 and oddball paradigm-related EEG activity |
Internal Recordings |
Private |
33 subjects, 2 to 4 pairs of sessions (offline + online)
12 trials / session (each trial = 10s + ERP stimuli)
20 times x 6 icons / trial (300ms)
33x3x12x6x20 = 142,560 samples |
142560 |
712 |
33 |
11 |
256 |
Both |
|
N/M |
N/M |
N/M |
N/M |
Raw EEG |
Raw EEG |
N/A |
|
TensorFlow
Python |
CNN |
CNN |
- |
Layer 1: spatial correlation
Layer 2: temporal filter |
Yes |
14 x 300
(channels x samples) |
CNN: 2
FC: 2 |
4 |
ReLU |
Dropout
(0.1 - 0.25) |
Yes |
6 |
6 different icons
(Power On/Off, Volume Up/Down, Channel Up/Down) |
2 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Intra |
No |
No |
Train: 50% (offline)
Test: 50% (online) |
Accuracy, sensitivity, precision, F1 score, ROC (+ ANOVA on metrics) |
accuracy, sensitivity, precision, f1-score, ROC |
N/M |
|
N/M |
Accuracy: 88.9% for the high-performing group, 68.7% for the low-performing group |
No benchmark |
None |
ANOVA |
Visualization of weights |
Analysis of weights |
A P300 is not visible in all subjects, but there seems to be a P700 that is pretty consistent across subjects.
Spatial features seem to play a more important role than temporal features in the classification of an oddball task. |
- |
No |
N/A |
No |
|
Hubert Banville |
Yannick Roy |
Yes |
|
Yoon2018 |
12 |
Spectrographic Seizure Detection Using Deep Learning With Convolutional Neural Networks |
2018 |
Yan, Wang & Grinspan |
Neurology |
No |
Supplement |
Weill Cornell Medical College, New York |
USA |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve State-of-the-Art: Using CNN on Spectrogram for Seizure Detection |
|
Existing dataset, no mention of any task.
(Supposed: Resting State) |
|
|
N/M |
Raw EEG
(Seizure) |
CHB-MIT |
Public |
130 EEGs with 177 total seizures, and 549 EEGs without seizures. >90% of seizures were <2 minutes long. 130 EEGs with seizure and 130 randomly selected EEGs without seizures were converted to the median power spectrogram (MPS). The training set consisted of 16,992 seizure containing images and 16,992 images without seizures (80% of total images). The testing set contained 4,248 seizure containing images and 4,248 images without seizures (20% of total images). |
33984 |
N/M |
N/M |
-1 |
N/M |
|
|
N/M* |
N/M |
N/M |
N/M |
Median Power Spectrogram (MPS) |
Frequency-domain |
|
|
N/M* |
CNN
(4 variants of VGG16) |
CNN |
N/M* |
|
No |
Images
(1s sliding window of MPS) |
16 |
N/M |
|
Dropout: 0.5 |
Yes |
|
|
N/M* |
N/M* |
N/M |
N/M |
N/M |
N/M |
|
|
|
N/M |
No |
N/M* |
Inter |
N/M |
No |
Train: 80%
Test: 20% |
Sensitivity
Specificity |
sensitivity, specificity |
|
|
N/M* |
All four CNN variants achieved >98% sensitivity and specificity |
N/M* |
None |
No |
No |
No |
"Convolutional neural nets can achieve high sensitivity and specificity in detecting seizures within spectrograms. However, generalizability and overfitting remains a concern. Further evaluation with more diverse data sets, images grouped by individual seizures, and additional regularization techniques is warranted." |
|
No |
N/A |
TBD |
|
Yannick Roy |
TBR |
TBC |
|
Yan2018 |
13 |
|
Generating target / non-target images of an RSVP experiment from brain signals by conditional generative adversarial network |
2018 |
Lee & Huang |
IEEE EMBS International Conference on Biomedical & Health Informatics |
No |
Conference |
University of Texas, San Antonio |
USA |
4 |
|
Generation of data |
Generating images conditioned on EEG |
|
|
Novel Approach: generating images conditioned on EEG |
Using EEG from RSVP to generate images (target or non-target) |
RSVP - 5 Images/s |
GAN models. |
|
ActiveTwo (BioSemi) |
RSVP |
Internal Recordings |
Private |
10 subjects, 5 sessions (~1h /session), 880 Epochs
(1s windows) |
880 |
14.6 |
10 |
32 |
-1 |
Offline |
|
- PREP Pipeline (EEGLAB): bandpass (0.1-55 Hz), robust referencing, interpolating bad channels
- Downsampled to 32Hz
- Subset of 32 channels (visual cortex) |
Yes |
Yes |
Yes |
Raw EEG |
Raw EEG |
z-score |
|
EEGLAB |
cGAN |
GAN |
Generates the image, not the EEG data; based on DCGAN |
N/A |
Yes |
32 x 32
(channels x samples) |
Generator: 4
Discriminator: 4 |
4 |
G: Leaky ReLU
D: ReLU |
N/M |
N/M |
N/A |
N/A |
64 x 64 (image) |
N/M |
GAN Style. |
Standard |
N/M |
N/M |
N/M |
16 |
N/M |
N/M |
N/M |
N/M |
Inter |
No |
No |
Train: 704 epochs
Test: 176 epochs |
Visual inspection (making sure generated image is of the right class) |
visual inspection |
N/M |
|
2-3h |
Accuracy: 0.625 |
None |
None |
No |
Occlusion of input EEG and visualization of generated image |
Occlusion of input |
We demonstrated the performance of the proposed cGAN model and showed that generation with raw or normalized EEG produced better performance than that with added noise. We also showed how this model could be used for investigating the EEG and image associations. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Lee2018 |
14 |
|
Cross-Participant EEG-Based Assessment of Cognitive Workload Using Multi-Path Convolutional Recurrent Neural Networks |
2018 |
Hefron, Borghetti, Schubert Kabban, Christensen & Estepp |
Sensors |
No |
Journal |
Air Force Institute of Technology (Ohio) |
USA |
27 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
Novel Approach: Using a Multi-Path Convolutional Recurrent Neural Network (MPCRNN) to improve SOTA on cross-participant classification of cognitive workload |
Tackle cross-subject variability in cognitive workload assessment |
Multi-Attribute Task Battery (MATB) environment |
Assumptions regarding brain activity are better matched by a deep representation that includes multi-path connections. |
|
ActiveTwo (BioSemi) |
None |
Internal Recordings |
Private |
8 subjects * 4 blocks * 6 conditions * 5 min
(1s windows) |
57600 |
960 |
8 |
128 |
4096 |
|
|
1) Trimmed to 303s trials
2) Downsampled to 512Hz
3) Down-selected 64 electrodes
4) PREP Pipeline to identify and interpolate bad channels, calculate a robust average reference, and remove line noise
5) High-Pass Filter: 1Hz
6) PSD 3-55Hz (2s Hanning-Windowed STFT, 1s overlap) |
Yes |
Yes (manual identification of high-variance segments) |
No |
PSD - Frequency Bands (53) |
Frequency-domain |
[-1, 1] |
|
Keras, TensorFlow |
(multi-path, residual)
CNN
+
(bi-directional, residual)
LSTM |
CNN+RNN |
It combines a wide multi-path, residual, convolutional network with a bi-directional, residual LSTM. |
1x1 convolutions to act as cross-channel parametric pooling |
Yes |
20x53x64
(time x frequency bands x channels)
|
[very deep, see schema / paper] |
8 |
ReLU and sigmoid |
Dropout + batch normalization + early stopping + L1 + L2 |
Yes |
|
|
1
(Mental Workload) |
MPCRNN: 6.2M |
Standard. |
Standard |
Adam |
Adam |
LR: from 0.0001 to 0.000001 |
128 |
N/M |
N/M |
N/M |
Binary cross-entropy |
Inter |
Test: Hold-out 1 Participant
Training: 7-Fold Cross-Validation |
k-fold;
Holdout |
Train: 6 participants
Validation: 1 participant
Test: 1 participant |
Mean Accuracy |
accuracy |
N/M |
|
N/M |
between 80-86% (depending on sequence length used as input) |
Simpler DL architectures |
DL |
ANOVA + post-hoc Tukey Honest Significant Difference tests |
No |
No |
We found that while increasing sequence length improves model accuracy, it does not improve generalizability since cross-participant variance increases due to cross-participant distributional differences. Furthermore, longer sequences reduce temporal specificity which decreases a model’s utility in a real-time environment. The only condition among our experiments across sequence lengths, architectures, and training methods which resulted in improved accuracy and decreased cross-participant variance was the multi-path convolutional recurrent architecture. |
N/M |
No |
N/A |
Yes |
|
Yannick Roy |
Isabela Albuquerque |
Yes |
|
Hefron2018 |
15 |
|
Classification of auditory stimuli from EEG signals with a regulated recurrent neural network reservoir |
2018 |
Moinnereau, Brienne, Brodeur, Rouat, Whittingstall & Plourde |
Arxiv |
Yes |
Preprint |
Université de Sherbrooke |
Canada |
5 |
|
Classification of EEG signals |
BCI |
Reactive |
Heard speech decoding |
Improve SOTA |
Classify heard speech (vowels) from EEG |
Auditory Stimuli + Imagined Speech |
Can extract features automatically |
|
BrainAmp (BrainProducts) |
Raw EEG |
Internal Recordings |
Private |
8 subjects x 3 stimuli x 200 times each
(2s windows, onset at 0.5s)
(preprocessing removed 30%!) |
4800 |
9600 |
8 |
64 |
N/M |
Offline |
|
1) Pass-Band Filter: 0.1-45Hz
2) Re-sampled at 500Hz
3) Windows of 2s (stimulus at 0.5s)
4) Trials with Amplitude > +-75uV rejected
5) Re-reference to local average |
Yes |
Yes (amplitude thresholding) |
Yes |
Spike Train from
Ben’s Spike Algorithm (BSA) |
Other |
N/M |
|
Python |
RNN Reservoir |
RNN |
The reservoir comprises 512 neurons placed in a three-dimensional grid where 80% are excitatory and 20% are inhibitory neurons |
N/M |
Yes |
Spike Trains per channel |
N/A |
N/M |
Leaky Integrate-and-Fire |
N/M |
N/M |
3 |
"a", "i", "u" |
1 |
N/M |
Reservoir: unsupervised tuning
Classifier: linear regression |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Intra |
5-Fold CV |
k-fold |
Train: 4/5
Test: 1/5 |
Accuracy |
accuracy |
N/M |
|
N/M |
83.2% (64 electrodes)
1 Electrode: 57.3%
3 Electrodes: 71.4%
10 Electrodes: 81.7%
[Chance: 33%] |
CNN
(3 conv layers of 64 filters) |
DL |
No |
No |
No |
"It’s hard to compare these results with the previ- ous work where many different experimental conditions (e.g. different type and number of stimuli) and preprocessing has been used. However, we show here that excellent classifica- tion results can be obtained with minimal preprocessing of the EEGs." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Moinnereau2018 |
16 |
|
Deep learning for detection of epileptiform discharges from scalp EEG recordings |
2018 |
van Putten, de Carvalho, Tjepkema-Cloostermans |
Clinical Neurophysiology |
No |
Journal |
University of Twente |
Netherlands |
6 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Use CNN and/or LSTM to classify yes/no discharges |
Pre-Recorded EEG, no mention of any task.
(Supposed: Resting State and sleep) |
DL is promising novel approach and is able to learn from large data-sets |
|
N/M |
Raw EEG
|
TBD |
Private |
Training (41,381 epochs), Test (8775 epochs)
For validation we used 7 EEGs (47,122 epochs) with 538 focal epileptiform discharges and 12 normal EEGs (n = 11,782 epochs).
(2s windows, no overlap) |
97278 |
3242.6 |
N/M |
19 |
125 |
Offline |
|
1) Band-Pass Filter: 0.5-35Hz
2) Re-referenced to both a longitudinal bipolar montage and a source Laplacian |
Yes |
No |
No |
Raw EEG |
Raw EEG |
N/M |
|
Keras |
CNN
RNN |
CNN+RNN |
Multiple designs |
N/M |
Yes |
19 (channels) x 250 (2s) |
CNN: 4-9 Layers
LSTM: 50-100 Units
Both w/ 1-3 FC Layers |
12 |
N/M |
Dropout (20-50%) |
Yes |
2 |
Normal
IED (discharge) |
1
(prob [0,1] of discharge) |
9142859 |
Standard |
Standard |
Adam |
Adam |
LR:3e-3
Beta1: 0.91
Beta2: 0.999
Epsilon: 1e-8 |
N/M |
N/M |
N/M |
N/M |
Categorical Cross-Entropy |
Inter |
N/M |
No |
Train: 41,381
Valid: 58,904
Test: 8,775 |
ROC AUC
Sensitivity
Specificity |
ROC AUC, sensitivity, specificity |
NVidia GTX 1080 |
|
2h |
AUC: 0.94
Sensitivity: 0.73
Specificity: 1 |
None |
None |
No |
No |
No |
"We foresee that deep nets may outperform humans both in classification accuracy and speed, leading to a fundamental shift in clinical EEG analysis in the next decade." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
VanPutten2018a |
17 |
|
Cognitive Analysis of Working Memory Load from EEG, by a Deep Recurrent Neural Network |
2018 |
Kuanar, Athitsos, Pradhan, Mishra & Rao |
ICASSP |
No |
Conference |
University of Texas, Arlington |
USA |
5 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
Improve State-of-the-Art: Using RNN to measure levels of cognitive load. |
Extract features less sensitive to
variations along each spatial dimension |
Working memory / workload experiment.
(showing a set of letters and then showing a letter asking if the letter was part of the set) Sets of 4,6,8,10 letters corresponding to mental workload 1,2,3,4. |
ConvNets have demonstrated the ability to extract features that are invariant to changes in input patterns |
|
Neurofax EEG-1200 (Nihon Kohden) |
PSD |
NIMHANS |
Private |
6490 samples, from 22 subjects
Each trial of 4.5s sliced into 0.5s and an image was constructed over each time slice. |
58410 |
486.75 |
22 |
64 |
256 |
|
|
1) From 4.5s Windows to 9 Windows of 0.5s |
Yes |
No |
No |
192 Features: 64 chan x 3 bands
Theta (4-7Hz), Alpha (8-13Hz), Beta (13-30Hz) (FFT)
Converted into images (32x32).
3D electrode positions projected to 2D. |
Frequency-domain |
N/M |
|
Theano, Python |
CNN + BiLSTM |
CNN+RNN |
Transforming channels and frequency bands into images of 0.5s windows, fed to an Hybrid CNN+BiLSTM |
N/M |
Yes |
EEG Images 32x32
(0.5s windows)
(mixing 3 freq bands + 64 channels) |
9 Conv Layers + 1 FC
+ 2 LSTM Layers of 64 units + 1 FC |
13 |
Sigmoid |
Dropout (0.5) + L2 (0.0001) |
Yes |
|
|
4 Classes
(Softmax) |
1.66 Mil |
Standard |
Standard |
Adam |
Adam |
LR: 10^-4
Beta1: 0.9
Beta2: 0.99 |
30 |
N/M |
N/M |
Gaussian noise
(on image) |
Cross-Entropy |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
N/M |
Accuracy |
accuracy |
NVidia K40 |
|
18h |
92.50% |
SVM, Logistic Regression, Random Forest |
Traditional pipeline |
No |
No |
No |
"Our implementation was different from the previous attempts and learned the robust representations from EEG image sequences using a ConvNet and BiLSTM hybrid network. Our proposed hybrid network demonstrated the significant improvements in finding better classification accuracy i.e. up to 92.5% over various existing LSTM models." |
N/M |
Yes |
Website |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Kuanar2018 |
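
A minimal sketch of the band-power features this entry lists: FFT power in the theta, alpha, and beta bands for each of the 64 channels over a 0.5 s window, giving the 192 features per window. The projection onto 32x32 "EEG images" is omitted; the function name and synthetic data are illustrative.

```python
import numpy as np

def band_powers(window, fs=256, bands=((4, 7), (8, 13), (13, 30))):
    """window: (n_samples, n_channels) -> (n_channels, n_bands) band powers."""
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    psd = np.abs(np.fft.rfft(window, axis=0)) ** 2
    return np.stack([psd[(freqs >= lo) & (freqs < hi)].sum(axis=0)
                     for lo, hi in bands], axis=1)

feats = band_powers(np.random.randn(128, 64))  # one 0.5 s window at 256 Hz
print(feats.shape)                              # (64, 3) -> 192 features
```
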
18 |
|
A Deep Learning Approach with an Attention Mechanism for Automatic Sleep Stage Classification |
2018 |
Längkvist & Loutfi |
Arxiv |
Yes |
Preprint |
Orebro University |
Sweden |
18 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
New approach: "Explore the advantages of using a model qith selective attention applied to automatic sleep staging" |
Learn representations of sleep EEG with attention |
Sleep (PSG) |
Learn better features |
|
N/M |
Sleep |
UCDDB |
Public |
25 x 6-9 hours (est. at 7.5h)
(30s windows, no overlap) |
22500 |
11250 |
25 |
1 |
128 |
Offline |
|
1) Notch Filter: 50Hz
2) Down-Sampled to 64Hz
3) Band-Pass Filter: 0.3-32Hz |
Yes |
N/M |
N/M |
28 Features: Relative Power: Delta (0.5 − 4Hz), Theta (4−8Hz), Alpha (8−13Hz), Beta (13−20Hz), & Gamma (20−32Hz), Entropy, Kurtosis, and Spectral Mean of all signals and fractal component of EEG. [+ EOG & EMG features] |
Frequency-domain |
z-score |
|
N/M |
SAE |
AE |
Attention Mechanism
(static & adaptive approaches) |
N/M |
No |
28 |
1 |
1 |
Sigmoid |
L2 normalization
KL term in cost function for sparsity |
Yes |
5 |
SWS, S2, S1, REM, Awake |
5 Classes
(Softmax) |
N/M |
1) Training AE
2) Training softmax layer on learned features |
Pre-training |
SGD with momentum |
SGD |
Momentum: 0.9
LR decay: 0.01 |
30 |
Random grid search |
Yes |
N/M |
MSE |
Inter |
5-Fold CV |
k-fold |
Train: 60%
Valid: 20%
Test: 20% |
Accuracy |
accuracy |
N/M |
|
2-3h |
60-90% on each of the 5 classes. |
DBN
SAE (standard a)
SAE (fixed a)
SAE (adaptive a) |
DL |
No |
Visualization of attention mechanism weights |
Analysis of weights |
"[...] Many of the used features try to capture the most relevant information for the current sleep stage and therefore mimic the standard Rechtschaffen and Kales (R&K) system [38, 18, 17] that is manually used by sleep technicians." |
Unsupervised learning treats all features equally; that's why attention mechanism is useful |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Langkvist2018 |
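
A minimal sketch of relative band-power features like part of this entry's 28-feature set (delta/theta/alpha/beta/gamma power as fractions of total power per 30 s epoch); the entropy, kurtosis, spectral-mean, and fractal features are omitted, and the Welch settings are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 20), "gamma": (20, 32)}

def relative_band_powers(x, fs=128):
    """x: one single-channel 30 s epoch -> dict of relative band powers."""
    f, pxx = welch(x, fs=fs, nperseg=4 * fs)
    total = pxx[(f >= 0.5) & (f < 32)].sum()
    return {name: pxx[(f >= lo) & (f < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

print(relative_band_powers(np.random.randn(30 * 128)))
```
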
19 |
|
On the Classification of SSVEP-Based Dry-EEG Signals via Convolutional Neural Networks |
2018 |
Aznan, Bonner, Connolly, Moubayed & Breckon |
Arxiv |
Yes |
Preprint |
Durham University |
UK |
6 |
|
Classification of EEG signals |
BCI |
Reactive |
SSVEP |
Improve State-of-the-Art |
Apply CNN to SSVEP with dry EEG headset |
SSVEP |
Want an end-to-end system (no need to extract features) |
|
Quick-20 (Cognionics) |
SSVEP |
Internal Recordings |
Private |
4 subjects, 4 classes
640 trials total (160 per class) x 3s |
640 |
32 |
4 |
20 |
500 |
Offline |
|
None |
No |
No |
No |
None |
Raw EEG |
N/A |
|
Pytorch |
CNN |
CNN |
- |
Layer 1: temporal filter |
Yes |
N/M |
2
(Tried 7 in the end) |
2 |
ReLU |
L2 normalization
Dropout (50%) |
Yes |
|
|
4 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
32 |
Grid search |
Yes |
No |
Categorical cross-entropy |
Both |
10-Fold CV
and
Leave-One-Subject-Out |
k-fold;
Leave-One-Subject-Out |
N/M |
Accuracy |
accuracy |
Nvidia GTX 1060 |
|
4 min |
Subject 1 - all data: 96%
Subject 1-3 (individually, only 20 trials each): mean of 89%
Across-subjects: 78%
Leave-one-subject-out: 59% |
Traditional feature-based pipeline (Riemannian Geometry + classifier)
RNN, LSTM, GRU |
DL & Trad. |
No |
No |
No |
Repeating the convolutional layer block increased accuracy on the held-out subject. |
N/M |
No |
N/A |
Yes |
|
Hubert Banville |
Isabela Albuquerque |
TBC |
|
Aznan2018 |
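
A minimal scikit-learn sketch of the two evaluation schemes this entry uses, 10-fold CV and leave-one-subject-out; the data shapes and the placeholder loop bodies are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneGroupOut

X = np.random.randn(640, 20, 1500)        # trials x channels x samples (3 s at 500 Hz)
y = np.random.randint(4, size=640)        # 4 SSVEP classes
subjects = np.repeat(np.arange(4), 160)   # 160 trials per subject

for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    pass  # fit the CNN on X[tr], evaluate on X[te]

for tr, te in LeaveOneGroupOut().split(X, y, groups=subjects):
    pass  # train on 3 subjects, test on the held-out one
```
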
20 |
|
A Long Short-Term Memory deep learning network for the prediction of epileptic seizures using EEG signals |
2018 |
Tsiouris, Pezoulas, Zervakis, Konitsiotis, Koutsouris & Fotiadis |
Computers in Biology and Medicine |
No |
Journal |
National Technical University of Athens |
Greece |
14 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Prediction |
New Approach: Using LSTM for seizure detection. (claiming they are the first ones, but they are not, so it's an Improve SOTA) |
Apply LSTM for seizure detection on CHB-MIT |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Expand from CNN to LSTM. (they claimed to be first, but they are not...) |
|
N/M |
Seizures |
CHB-MIT |
Public |
983h, 185 seizures
(5s windows, no overlap) |
707760 |
58980 |
23 |
23 |
256 |
|
|
1) Selecting only channels that are stable across recordings (for cross-validation)
2) Kept 18 channels. |
Yes |
No |
No |
Time Domain: the 4 Statistical Moments, Standard Dev, Zero Crossings, Peak-to-peak Voltage, Total signal area, decorrelation time.
Frequency Domain: FFT (PSD), DWT.
Cross-Correlation: Max absolute coefficient.
Graph Theory: Local & Global measures.
(all on 5s windows) |
Combination |
N/M |
|
Keras
Tensorflow
Python 3.6 |
LSTM |
RNN |
- |
LSTM Length: predicting seizures from 15 min before onset to 120 min before onset |
Yes |
Features x EEG Segments
643x[5-50] |
LSTM_1: 1 (32 HU)
LSTM_2: 1 (128 HU)
LSTM_3: 2 (128/128 HU)
+ 1 FC (30) |
3 |
ReLU |
Dropout
(finally discarded, because the shuffling of data seems to be enough) |
Yes |
|
|
2
(Softmax, 1 hot encoded: preictal or interictal) |
N/M |
By shuffling the EEG segments that are used as input, the LSTM network is forced to learn more generic preictal patterns as each sequence consists of random, non-adjacent preictal segments that not only come from various locations with different time distances from the actual seizure onset, but also from the preictal activity of different seizures. |
Standard |
Adam |
Adam |
LR: 0.001
B1: 0.9
B2: 0.999
Decay: 0 |
10 |
Manually trying 3 different configurations |
Yes |
Split the minority class into smaller subgroups to balance classes |
Cross-Entropy |
Both |
10-Fold CV |
k-fold |
Eval: 3/24
Train: N/M
(assuming 21/24) |
Sensitivity (SEN)
Specificity (SPEC)
False Prediction Rate (FPR)
Preictal Window |
sensitivity, specificity, false prediction rate |
N/M
(they seem to say CPU) |
|
N/M |
[Segments] SEN, SPEC | [Events] SEN, FPR
15-min Preictal Window: 99.28, 99.28 | 100, 0.107
30-min Preictal Window: 99.37, 99.60 | 100, 0.063
60-min Preictal Window: 99.63, 99.78 | 100, 0.032
120-min Preictal Window: 99.84, 99.86 | 100, 0.02 |
SVM
Decision Trees
Repeated Incremental Pruning to Produce Error Reduction (RIPPER)
(LSTM outperforms all of them on all subjects) |
Traditional pipeline |
No |
No |
No |
In theory, better EEG signal representation could be learned if the size of LSTM network was substantially increased, by adding more layers and memory units, to compensate for the increased input size of directly providing the EEG signals. However, the computational cost of training larger LSTM networks increases rapidly requiring more training time or using arrays of GPUs. Even if computational cost was not a problem, this approach would require even more EEG data to effectively train the millions of network parameters. |
1) Overall amount of Data
2) Number of Seizures |
No |
N/A |
Yes |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Tsiouris2018 |
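
A minimal sketch of a few of the time-domain features this entry lists (the four statistical moments, standard deviation, zero crossings, peak-to-peak voltage) over one 5 s single-channel window; the frequency-domain, cross-correlation, and graph-theory features are omitted.

```python
import numpy as np
from scipy import stats

def time_features(window):
    """window: (n_samples,) single-channel 5 s segment -> feature vector."""
    return np.array([
        window.mean(),                  # 1st moment
        window.var(),                   # 2nd moment
        stats.skew(window),             # 3rd moment
        stats.kurtosis(window),         # 4th moment
        window.std(),                   # standard deviation
        np.sum(np.signbit(window[:-1]) != np.signbit(window[1:])),  # zero crossings
        np.ptp(window),                 # peak-to-peak voltage
    ])

print(time_features(np.random.randn(5 * 256)))  # one 5 s window at 256 Hz
```
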
21 |
|
Joint Classification and Prediction CNN Framework for Automatic Sleep Stage Classification |
2018 |
Phan, Andreotti, Cooray, Chén & De Vos |
Arxiv |
Yes |
Preprint |
University of Oxford |
UK |
11 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve State-of-the-Art
(~New task: predict neighboring classes too) |
Use one-to-many approach with a multi-task softmax to leverage neighboring data to predict sleep stage |
Sleep |
Re-using their previous network.
(H. Phan et al., 2018) |
|
N/M |
Sleep events |
MASS |
Public |
228,870 epochs x 30s
from 200 subjects |
228870 |
114435 |
200 |
1 |
100 |
Offline |
|
1) Convert 20s epochs into 30s epochs
(+5s before + 5s after) |
Yes |
N/M |
N/M |
Spectrogram (STFT)
Hamming window 2s + 50% overlap
Log Spectrum |
Frequency-domain |
N/M* |
|
Tensorflow |
CNN |
CNN |
Conv-Pool-Softmax |
Layer 0: filter bank on spectrogram |
Yes |
129 x 29 x {1, 2, 3}
Bins x time, x channels
30-s epochs |
1
(1xCNN +Pooling +Softmax) |
1 |
ReLU |
L2
Dropout (20%) |
Yes |
5 |
Wake
N1, N2, N3
REM |
5 x (1 + 2 * nb of neighbouring windows) |
N/M* |
Standard optimization |
Standard |
Adam |
Adam |
LR: 0.0001 |
200 |
N/M |
N/M |
Randomly selected batch with balanced classes |
Categorical cross-entropy |
Inter |
Leave-10-Subjects-Out
(20-Fold CV) |
Leave-N-Subjects-Out |
Train: 180 subjects
Valid: 10 subjects
Test: 10 subjects |
Accuracy, Kappa, Specificity, Sensitivity, F1-score |
accuracy, Cohen's kappa, specificity, sensitivity, f1-score |
N/M* |
|
1.36 hours |
Multimodal acc.: 83.6 % |
One-to-one and Many-to-one with same architecture and with a different ConvNet architecture without l-max pooling
DeepCNN
DeepSleepNet |
DL |
No |
No |
No |
Increasing the number of filters in the Conv layer doesn't impact the performance much
Adding other modalities (EOG, EMG) lead to significant improvements
A context size larger than 3 leads to performance degradation
Using recurrent layers might help |
No |
No |
N/A |
No |
|
Hubert Banville |
Yannick Roy |
Yes |
|
Phan2018 |
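
A minimal sketch of the log-spectrogram input this entry describes: a 2 s Hamming window with 50% overlap over one 30 s epoch at 100 Hz. Zero-padding the FFT to 256 points is an assumption chosen to reproduce the reported 129 x 29 input shape.

```python
import numpy as np
from scipy.signal import stft

fs = 100
epoch = np.random.randn(30 * fs)                 # one 30 s single-channel epoch
f, t, Z = stft(epoch, fs=fs, window="hamming",
               nperseg=2 * fs, noverlap=fs,      # 2 s window, 50% overlap
               nfft=256, boundary=None)
log_spec = np.log(np.abs(Z) + 1e-10)
print(log_spec.shape)                            # (129, 29), as in the entry
```
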
22 |
|
Deep Convolution Neural Network and Autoencoders-Based Unsupervised Feature Learning of EEG Signals |
2018 |
Wen & Zhang |
IEEE Access |
No |
Journal |
Xiamen University |
China |
12 |
|
Improvement of processing tools |
Feature learning |
|
|
Improve SOTA |
Learn features for epilepsy detection using unsupervised learning |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Learn features automatically |
|
N/M |
Raw EEG |
Bonn University;
CHB-MIT |
Public |
DS #1 - Bonn University (A,B,C,D,E)
5x 100 epochs of 23.6s
DS #2 - CHB-MIT (first 10 subjects)
200 + 200 examples of 4096 (@ 256Hz = 16s) |
500;
400 |
197;
106.6 |
10;
10 |
1 |
173.61;
256 |
Offline |
|
1) Common average reference
2) Bandpass 0.53-40 Hz |
Yes |
N/M |
N/M |
1) Chose single channel with the most variance |
Raw EEG |
min-max |
|
Scikit-learn
Python |
Convolutional AE |
AE |
Various (tried multiple classifiers on top of the encoder) |
- |
Yes |
4096 x 1 |
9
[YR: not sure how HJB got that 9] |
9 |
ReLU |
N/M |
N/M |
2 |
Seizure
No Seizure
(not explicit) |
4096 x 1 |
N/M |
1) Training AE
2) Training standard classifier on learned features |
Pre-training |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Mean absolute error divided by input mean amplitude |
Both |
5 and 10 -Fold CV |
k-fold |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
No aggregate is reported...
(see paper, they report results per subject and per classifier) |
PCA
Random projection |
Traditional pipeline |
No |
No |
No |
Less than 4 hidden units on the bottleneck layer led to a drop in accuracy as compared to standard dimensionality reduction techniques.
Their approach is flexible to new datasets... |
"It is very difficult to train multiple hidden layers [...]" |
No |
N/A |
No |
|
Hubert Banville |
Yannick Roy |
TBC |
|
Wen2018 |
23 |
|
Deep learning with convolutional neural networks for decoding and visualization of EEG pathology |
2018 |
Schirrmeister, Gemein, Eggensperger, Hutter & Ball |
Arxiv |
Yes |
Preprint |
University of Freiburg |
Germany |
7 |
|
Classification of EEG signals |
Clinical |
Pathological EEG |
|
Improve SOTA
Feature visualization/interpretability |
End-to-end detection of abnormal EEG |
N/M |
Automated EEG diagnosis |
|
N/M |
Raw EEG |
TUH Abnormal EEG Corpus |
Public |
TUH Abnormal Corpus
2740 + 277 = 3017 (x 16min)
(they explored using [1, 16] min)
(6s windows) |
482720 |
48272 |
2132 |
21 |
250 |
|
|
1) Select 21 electrodes common to all subjects
2) Remove 1st minute
3) Crop recording to at most 20 minutes
4) Clip amplitude to +-800uV
5) Resample to 100Hz |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
N/M |
|
Pytorch |
CNN |
CNN |
Tried two architectures: shallow and deep CNNs |
Shallow CNN tailored to decode band powers |
Yes |
600 x 21 |
Deep: 5 conv layers
Shallow: 1 conv layer |
5 |
ELU |
N/M |
N/M |
|
|
2 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
Used SMAC |
N/M |
SMAC |
Yes |
N/M |
Binary cross-entropy |
Inter |
10-Fold CV |
k-fold |
Train: 5480 (~90%)
Test: 554 (~10%) |
Accuracy, Sensitivity, and Specificity |
accuracy, sensitivity, specificity |
N/M |
|
< 3.5 h |
Accuracy: 85.4% (deep), 84.5% (shallow)
Sensitivity: 75.1% (deep), 77.3% (shallow)
Specificity: 94.1% (deep), 90.5% (shallow) |
CNN and linear model with band-power features as input |
DL & Trad. |
Wilcoxon signed-rank test |
Effect of spectral perturbations of the input on the resulting prediction |
Input-perturbation network-prediction correlation maps |
Perturbation visualizations showed that the CNNs used information related to changes in delta and theta bands. Surprisingly, shorter length EEG recordings yielded better accuracies. |
"Still, to yield more clinically useful insights and diagnosis explanations, further improvements in ConvNet visualizations are needed." |
Yes |
GitHub |
No |
|
Isabela Albuquerque |
Hubert Banville |
TBC |
|
Schirrmeister2017a |
24 |
|
Predicting sex from brain rhythms with deep learning |
2018 |
van Putten, Olbrich & Arns |
Scientific Reports (Nature) |
No |
Journal |
University of Twente |
Netherlands |
7 |
|
Classification of EEG signals |
Personal trait/attribute |
Sex |
|
New Approach: Detecting Sex from RS EEG with DL (CNN) |
Predicting an individual's sex from their EEG |
Resting State EEG. |
No need for engineered features, and "have potential to detect subtle differences in otherwise similar patterns". |
|
N/M |
Raw EEG |
Brain Resource Int'l Database |
Public |
1308 subjects x 40 segments x 2s
(2s windows, no overlap) |
52320 |
1744 |
1308 |
24 |
128 |
|
|
1) Downsampled to 128Hz (from 500Hz)
2) Band-Pass Filter: 0.5-25Hz |
Yes |
EOG regression |
Yes |
Raw EEG |
Raw EEG |
N/M |
|
Windows 10
Keras, Tensorflow
Python 3.6 |
CNN |
CNN |
None |
N/M |
Yes |
256 x 24
(Samples x Channels)
2s epoch |
6 |
6 |
ReLU |
Dropout |
Yes |
|
|
1
0: Female | 1: Male
(2 from schema) |
9,051,902 |
Standard optimization |
Standard |
Adamax |
Other |
LR=0.002, B1=0.9, B2=0.999, epsilon=10^-8, decay=0.00 |
70 |
N/M |
N/M |
N/M |
Categorical Cross-Entropy |
Inter |
No |
No |
Train: 1000 subjects
Test: 308 subjects |
Accuracy |
accuracy |
NVidia GTX-1060 |
|
N/M |
81%
(of correct classification over all subjects) |
LR |
Traditional pipeline |
Permutation test |
Visualization of learned filters through Deep Dream-like backprop on inputs |
Generating input to maximize activation |
While not all details of the features used for classification by the deep net have been revealed, our data show that differences in brain rhythms between sexes are mainly in the beta frequency range. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
VanPutten2018b |
25 |
|
Deep learning with EEG spectrograms in rapid eye movement behavior disorder |
2018 |
Ruffini, Ibanez, Castellano, Dubreuil, Gagnon, Montplaisir & Soria-Frisch |
BioarXiv |
Yes |
Preprint |
NeuroElectrics
University of Montreal |
Canada |
10 |
|
Classification of EEG signals |
Clinical |
Sleep |
Abnormality detection |
New Approach |
Using DCNN for Rapid Eye Movement Behavior Disorder |
Resting State EEG. |
Exploiting compositional structure in data |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
(118 + 74) = 192 subjects
148 windows of 1s per subject
(1s windows) |
28416 |
473.6 |
192 |
14 |
256 |
Offline |
|
1) Band-Pass Filter: 0.3-100 Hz [Hardware]
2) Notch Filter: 60Hz [Hardware]
3) FFT after detrending blocks of 1 second with a Hann window (FFT resolution: 2 Hz) |
Yes |
N/M |
N/M |
Spectrogram Frames |
Frequency-domain |
z-score |
|
Tensorflow |
DCNN |
CNN |
Conv-Pooling-Dropout |
N/M |
Yes |
14 x 21 x 20
Channels x FFTBins x Epochs |
5 |
5 |
ReLU |
Dropout |
Yes |
2 |
Parkinson's disease
Healthy |
2 |
N/M* |
Standard optimization |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Random replication of subjects in the minority class |
Cross-Entropy |
Inter |
Leave-Pair-Out
(one subject for each class) |
Leave-One-Subject-Out |
N/M |
Accuracy
ROC AUC |
accuracy, ROC AUC |
N/M |
|
N/M* |
Net: Problem [ N ] ACC (AUC)
DCNN: HC vs PD [2x73 / 2x1] 79% (87%)
RNN: HC vs PD [2x73 / 2x1] 81% (87%)
DCNN: HC+RBD vs PD+DLB [2x159 / 2x1] 73% (78%)
RNN: HC+RBD vs PD+DLB [2x159 / 2x1] 72% (77%) |
Stacked RNN
Shallow CNN |
DL |
No |
Maximizing network outputs for a given class |
Generating input to maximize activation |
Although here, as in [28], we worked with time-frequency pre-processed data, the field will undoubtedly steer towards working with raw data in the future when larger datasets become available—as suggested in [21] |
"We note that one of the potential issues with our dataset is the presence of healthy controls without
follow up, which may be a confound. \We hope to remedy this by enlarging our database and by
improving our diagnosis and follow up methodologies" |
No |
N/A |
Yes |
|
Yannick Roy |
Isabela Albuquerque |
Yes |
|
Ruffini2018a |
26 |
|
Deep transfer learning for error decoding from non-invasive EEG |
2018 |
Völker, Schirrmeister, Fiederer, Burgard & Ball |
IEEE International Conference on Brain-Computer Interface |
Yes |
Conference |
University of Freiburg |
Germany |
6 |
|
Classification of EEG signals |
BCI |
Reactive |
ERP |
New approach: Exploring Transfer Learning for BCI. |
Using CNN on 2 different BCI tasks, can it generalize? Transfer Learning across subjects and across tasks |
1) Eriksen Flanker Task
2) Online GUI to control intelligent robots |
Enables transfer learning |
|
N/M |
1) Error
2) Mental tasks (MI) |
Internal Recordings |
Private |
1) 1000 trials x 1.5s x 31 subjects
2) (3032 +/- 818) x 4 x 1.5s
1.5s / epoch (onset at 0.5s) |
31000;
12128 |
775;
303.2 |
31;
4 |
128;
64 |
N/M |
|
|
1) Re-referenced to Common Average (CAR)
2) Resampled to 250Hz |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
Electrode-wise exponential running standardization |
|
Python
BrainDecode
Scikit-learn |
CNN |
CNN |
N/M |
N/M
(see BrainDecode paper) |
No |
N/M
(Raw EEG windows) |
N/M
(see BrainDecode paper) |
N/M |
N/M |
N/M
(see BrainDecode paper) |
N/M |
|
|
N/M |
N/M |
N/M
(See braindecode paper) |
N/M |
N/M
(see BrainDecode paper) |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M
(see BrainDecode paper) |
Both |
Within-Sub: Leave-One-Session-Out CV
Between-Sub: Leave-One-Subject-Out CV |
Leave-One-Session-Out;
Leave-One-Subject-Out |
Within-Sub Train: 80%
Within-Sub Test: 20%
Between-Sub Train: N-1 Sub.
Between-Sub Test: 1 Sub. |
Normalized Accuracy |
normalized accuracy |
N/M |
|
N/M |
Between-Subject Transfer Learning
Flanker Task: 81.7% Normalized Accuracy
GUI Robots Task: Poor results, because only 4 subjects.
Between-Paradigms Transfer Learning
Both failed. ~50% |
rLDA
(CNN outperforms rLDA)
Also, best result reported to date for Error Detection on the Flanker Task |
Traditional pipeline |
Paired t-tests |
Input-perturbation network-prediction correlation maps |
Input-perturbation network-prediction correlation maps |
(1) As a next step, techniques including data augmentation and automated hyper-parameter and architecture search might help to improve the generalization of deep ConvNets. (2) For a generalization to new subjects, our data suggest that a training subject group of at least 15 subjects might be necessary for reliable error decoding on unknown subjects. (3) In the flanker task, our deep ConvNets achieved the highest to date reported average accuracy. |
N/M |
No |
N/A |
Yes |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Volker2018 |
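
A minimal sketch of electrode-wise exponential running standardization, the normalization named in this entry (the Hao2018 entry below cites a decay factor of 0.999); the initialization and epsilon floor are illustrative assumptions.

```python
import numpy as np

def exp_running_standardize(x, factor=0.999, eps=1e-4):
    """x: (n_samples, n_channels). Standardize with running mean/var per electrode."""
    out = np.empty_like(x, dtype=float)
    mean = x[0].astype(float)
    var = np.ones(x.shape[1])
    for t in range(x.shape[0]):
        mean = factor * mean + (1 - factor) * x[t]
        var = factor * var + (1 - factor) * (x[t] - mean) ** 2
        out[t] = (x[t] - mean) / np.maximum(np.sqrt(var), eps)
    return out

print(exp_running_standardize(np.random.randn(1000, 64)).shape)  # (1000, 64)
```
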
27 |
|
DeepIED: An epileptic discharge detector for EEG-fMRI based on deep learning |
2018 |
Hao, Khoo, von Ellenrieder, Zazubovits & Gotman |
NeuroImage: Clinical |
No |
Journal |
McGill University, Osaka University |
Canada |
14 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Detect interictal epileptic discharges in noisy EEG data collected during an fMRI recording |
Resting state EEG - Seizures. |
Reduce the amount of time it takes to manually label interictal epileptic discharges |
|
BrainAmp (BrainProducts) |
Raw EEG |
Internal Recordings |
Private |
67 patients (148 studies)
Average study time: 50 min (range, 18–72 min)
(~1s windows) |
201000 |
7400 |
67 |
25 |
200 |
Offline |
|
1) Bandpass 0.5-50 Hz
2) fMRI-induced artefact removal
3) Electrode-wise exponential running standardization [6] was applied with a decay factor of 0.999
4) BCG artifact removal (ballistocardiographic) |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
N/M |
|
N/M |
CNN (ResNet) |
CNN |
- |
- |
Yes |
25 x [16 to 256] |
31 |
31 |
ReLU |
Dropout on penultimate layer (50%) |
Yes |
N/M
(different IED types) |
IEDs |
128 (FC)
going to softmax and triplet (real output N/M) |
999,920 |
Standard optimization |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Softmax for multi-class classification
Triplet loss function |
Inter |
No |
No |
Train: 30 subjects
Test: 37 subjects |
ROC curves
Sensitivity
False positive rate |
ROC, sensitivity, false positive rate |
N/M |
|
N/M |
Median sensitivity: 84.2%
False positive rate: 5 events/min |
Cross-correlation (template-based) method for finding similar EEG epochs |
Traditional pipeline |
One-Way Anova + Post Hoc paired t-test |
No |
No |
In their tests, they asked experts to edit the outputs of the net and reject false positives; they argue that it's a necessary step and that it is not too time-consuming. |
- |
No |
N/A |
No |
|
Hubert Banville |
Yannick Roy |
TBC |
|
Hao2018 |
28 |
|
Deep learning for hybrid EEG-fNIRS brain–computer interface: application to motor imagery classification |
2018 |
Chiarelli, Croce, Merla & Zappasodi |
Journal of Neural Engineering |
No |
Journal |
G. d'Annunzio' University |
Italy |
12 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA |
Improving MI classification with DL in multimodal system |
Motor Imagery |
High performance on other tasks |
|
(EGI) |
ERD/ERS |
Internal Recordings |
Private |
40 trials (C1: 20 / C2: 20) of 5s
200 samples x 15 subjects
(1s windows) |
3000 |
50 |
15 |
123 |
250 |
Offline |
|
1) Bandpass 8-30 Hz |
Yes |
N/M |
N/M |
Power in the mu-beta range, averaged across 1-s windows |
Frequency-domain |
N/M |
|
TensorFlow |
Fully-connected NN |
FC |
N/M |
N/M |
Yes |
123 x 1, 16 x1, or 139 x 1 |
5 |
5 |
ReLU |
Dropout (0.75) |
Yes |
2 |
Right-hand MI, Left-hand MI |
2 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
LR=1e-4, B1=0.9, B2=0.999, constant=1e-8 |
90 |
N/M |
N/M |
N/M |
Cross-Entropy |
Intra |
10-Fold CV
(1000x) |
k-fold |
Train: 180
Test: 20 |
Accuracy |
accuracy |
N/M |
|
N/M |
EEG only: ~70%, NIRS only: ~77%, EEG+NIRS: ~83% |
LDA, linear SVM |
Traditional pipeline |
2-way repeated measurement ANOVA
+ post-hoc analysis |
No |
No |
DNN worked better than CNN, RNN not tested. |
RNN was not tested |
No |
N/A |
No |
|
Hubert Banville |
TBR |
TBC |
|
Chiarelli2018 |
29 |
|
Preference Classification Using Electroencephalography (EEG) and Deep Learning |
2018 |
Teo, Hou & Mountstephens |
Journal of Telecommunication, Electronic and Computer Engineering (JTEC) |
No |
Journal |
University Malaysia Sabah |
Malaysia |
5 |
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
Improve SOTA |
Improving classification of preference (like vs. dislike), and overcoming intra- and inter-subject variability |
Rating of 3D Stimulus (1: like very much, 2: like, 3: undecided, 4: do not like, 5: do not like at all) |
N/M |
|
B-Alert X10 (ABM) |
Raw EEG |
Internal Recordings |
Private |
208 trials: 9s + [5-15]s, from 16 subjects
(full trial as windows)
10 other subjects were for kNN (not counted) |
208 |
65.87 |
16 |
9 |
N/M |
Offline |
|
1) Notch Filter: 50Hz |
Yes |
Proprietary artefact rejection and interpolation |
Yes |
45 features
(PSD for each of the 9 channels in 5 bands)
Delta (1-3Hz), Theta (4-6Hz), Alpha (7-12Hz), Beta (13-30Hz), Gamma (31-64Hz) |
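The 45 features (5 band powers x 9 channels) could be computed along these lines; the band limits follow the field above, while the Welch parameters are assumptions:

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 3), "theta": (4, 6), "alpha": (7, 12),
         "beta": (13, 30), "gamma": (31, 64)}

def band_power_features(eeg, fs):
    """eeg: (n_channels, n_times). Returns n_channels * 5 band-power features."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], int(2 * fs)), axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs <= hi)
        feats.append(psd[:, mask].mean(axis=-1))  # mean power per channel
    return np.concatenate(feats)  # 9 channels x 5 bands = 45 features
```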
Frequency-domain |
N/M |
|
R |
DNN |
FC |
N/M |
N/M |
No |
47 |
2 |
2 |
ReLU |
N/M |
N/M |
2 |
Like very much
Do not like at all |
N/M |
N/M |
Standard optimization |
Standard |
Adadelta |
Other |
N/M |
N/M |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
10-Fold CV |
k-fold |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
63.99% |
SVM Linear: 60.19%, SVM Radial: 59.67%, OneR: 59.00%, Adaboost: 58.65%, Random Forest: 57.74%, NNet: 57.71%, JRip: 57.21%, Naive Bayes: 56.79%, C5.0: 56.74%, kNN (k = 5): 56.29% |
Traditional pipeline |
No |
No |
No |
"An initial study using kNN provided sufficiently good results in a 10-subject study. However, when expanded to a larger cohort size of 16 subjects, the results were not encouraging. However, the use of deep learning was able to observably overcome some of the difficulties presented by inter-subject variability posed by larger cohort sizes in EEG-based preference classification." |
Intersubject variability |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Teo2018 |
30 |
|
An Automated System for Epilepsy Detection using EEG Brain Signals based on Deep Learning Approach |
2018 |
Ullah, Hussain, Qazi & Aboalsamh |
Arxiv |
Yes |
Preprint |
National University of Ireland
King Saud University |
Ireland |
18 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Improving ternary classification of ictal vs. normal vs. interictal windows |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Automatic feature learning |
|
N/M |
Raw EEG |
Bonn University |
Public |
Bonn University (5 sets x 100 x 23.6s)
Each set of 100 recordings --> 800 windows
(512-point windows, 6.25% overlap) |
4000 |
197 |
15 |
1 |
173.6 |
Offline |
|
N/M |
N/M |
N/M |
N/M |
Raw EEG |
Raw EEG |
z-score |
|
TensorFlow |
Pyramidal 1D-CNN (P-1D-CNN) |
CNN |
No pooling |
1D convolution motivated by EEG being a "1D signal" |
Yes |
8 EEG windows
Raw EEG (1 channel) |
3 Conv + 2 FC |
5 |
ReLU |
Dropout (0.5)
Batch norm |
Yes |
2 or 3 |
2: Epileptic vs. non-epileptic
3: normal vs. ictal vs. interictal |
2 or 3 Classes
(Softmax) |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
LR=0.001, B1=0.9, B2=0.999, epsilon=0.00000001, locking=false |
N/M |
N/M |
N/M |
Overlapping windows
(87.5% and 25% overlap) |
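A sketch of the overlapping-window augmentation named above; the 512-point window length matches the input format reported for this paper, the helper name is ours:

```python
import numpy as np

def sliding_windows(signal, win_len=512, overlap=0.875):
    """Cut a 1-D signal into overlapping windows.

    overlap is a fraction in [0, 1); 0.875 and 0.25 reproduce the two
    augmentation settings reported above.
    """
    step = max(1, int(win_len * (1 - overlap)))
    starts = range(0, len(signal) - win_len + 1, step)
    return np.stack([signal[s:s + win_len] for s in starts])

# Example: 87.5% overlap -> 64-point step -> 57 windows from 4096 points.
print(sliding_windows(np.random.randn(4096)).shape)
```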
Cross-Entropy |
Inter |
10-Fold CV |
k-fold |
Train: 90%
Test: 10% |
Accuracy, Specificity, Sensitivity, Precision, f-measure, and g-mean. |
accuracy, specificity, sensitivity, precision, f-measure, g-mean |
N/M |
|
N/M |
99.1 ± 0.9% (for the 3-class problem)
The mean accuracy of the proposed system is 99.6% across all sixteen cases.
Many results (see paper) comparing binary / ternary classifications. |
Random forests, Naive Bayes, kNN |
Traditional pipeline |
No |
No |
No |
"According to our knowledge until this date, DL approach has never been used for this problem. The mean accuracy of the proposed system is 99.6% for all the sixteen cases (shown in Table 8 last column), which figures out the generalization power of the proposed system." |
Small datasets |
No |
N/A |
Yes |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Ullah2018 |
31 |
|
A Novel Channel-aware Attention Framework for Multi-channel EEG Seizure Detection via Multi-view Deep Learning |
2018 |
Yuan, Xun, Ma, Suo, Xue, Jia & Zhang |
IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) |
No |
Conference |
Beijing Laboratory of Advanced Information Network
State University of New York at Buffalo |
China |
4 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Use end-to-end model with attention mechanism to select channel and detect seizures. |
N/M |
"Explore inherent EEG representations" |
|
N/M |
Raw EEG |
CHB-MIT |
Public |
CHB-MIT: 9 out of 23 subjects
4302 EEG fragments
(window length: N/M) |
4302 |
N/M |
9 |
23 |
256 |
Offline |
|
N/M |
N/M |
N/M |
N/M |
Spectrogram (STFT) |
Frequency-domain |
N/M |
|
N/M |
2 SAEs |
AE |
Channel Encoders (SAE)
Global Encoder (SAE)
+ Attention |
N/M |
Yes |
N/M |
2 |
2 |
N/M |
Dropout |
Yes |
2 |
No seizure
Seizure |
2 |
N/M |
[Not clear] Likely unsupervised pre-training, followed by fine-tuning with a softmax layer |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
Holdout |
Holdout |
N/M |
F1-Score
Accuracy
AUC of ROC and precision-recall curves |
f1-score, accuracy, ROC AUC, PR AUC |
N/M |
|
N/M |
[F1-score] - Channel Attloc: 0.9781, Channel Attglo: 0.9785
[Accuracy] - Channel Attloc: 0.9651, Channel Attglo: 0.9661 |
PCA+SVM (PSVM)
SAEs + attention
DNN + hard channel selection |
DL & Trad. |
N/M |
Analysis of mean attention score values for a single subject |
Analysis of activations |
"To the best of our knowledge, this is the first work using attention mechanism for biosignal channel selection in healthcare." |
|
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Yuan2018a |
32 |
|
Compact Convolutional Neural Networks for Classification of Asynchronous Steady-state Visual Evoked Potentials |
2018 |
Waytowich, Lawhern, Garcia, Cummings, Faller, Sajda, Vettel |
Journal of Neural Engineering |
Yes |
Journal |
U.S. Army Research Laboratory
Lab for Intelligent Imaging and Neural Comp.
University of Pennsylvania
University of California, Santa Barbara |
USA |
21 |
|
Classification of EEG signals |
BCI |
Reactive |
SSVEP |
Improve SOTA |
Use ConvNet for SSVEP classification |
SSVEP (12 classes!) |
Automatic feature learning without domain-specific information |
|
ActiveTwo (BioSemi) |
SSVEP |
Internal Recordings |
Public |
10 subjects x 15 blocks x 12 trials x 4s
(1s windows) |
7200 |
120 |
10 |
8 |
2048 |
Offline |
|
1) Bandpass 9-30 Hz
2) Downsampled to 256 Hz |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
|
|
TensorFlow, Keras
Original Stimuli (from 2015) on MATLAB with Psychophysics Toolbox |
EEGNet |
CNN |
Filter banks (temporal convolutions) followed by spatial filters |
N/M |
Yes |
8 channels x 256 samples |
3 |
3 |
ELU |
Batch norm
Dropout (0.25) |
Yes |
12 |
12 different combinations of frequency and phase |
12 |
46,476
(45,900 trainable) |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
64 |
N/M |
N/M |
N/M |
Categorical Cross-Entropy |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
Train: 90%
Test: 10% |
Accuracy |
accuracy |
N/M |
|
N/M |
~90% for 7/10 Subjects.
60%, 75%, 30% for the others.
(chance = 8%) |
CCA
(Canonical Correl. Analysis)
C-CCA (Combined CCA) |
Traditional pipeline |
Paired t-tests when comparing to baseline |
Visualization of feature activations with t-SNE |
Analysis of activations |
"Although unexpected, these within-class clusters highlight the strength of the deep learning approaches to learn diagnostic features directly from the data." |
Experiment did not include a non-control state |
Yes |
GitHub |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Waytowich2018 |
33 |
|
Deep Classification of Epileptic Signals |
2018 |
Ahmedt-Aristizabal, Fookes, Nguyen & Sridharan |
Arxiv |
Yes |
Preprint |
Queensland University of Technology |
Australia |
4 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve State-of-the-Art: Using LSTM for Epilepsy classification |
End-to-end seizure detection |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Automatic feature learning |
|
N/M |
Raw EEG |
Bonn University |
Public |
Bonn University (5 sets x 100 x 23.6s)
(full 4096 points / 23.6s as windows, no overlap) |
500 |
197 |
15 |
1 |
173.6 |
Offline |
|
None, but the Bonn University Dataset already has some preprocessing. |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
N/M |
|
Keras |
LSTM |
RNN |
N/M |
N/M |
Yes |
100 x 4096
(100 recordings of 4096 samples each) |
Model 1: 1 LSTM + 1 Dropout
Model 2: 2 LSTM + 2 Dropout
+ 1 FC |
3 |
N/M |
Dropout (0.35) |
Yes |
2 |
No seizure
Seizure |
1 |
Model 1: 16,961
Model 2: 116,033 |
Standard |
Standard |
Adam |
Adam |
LR: 1e-3, b1:0.9, b2:0.999 |
4 |
N/M |
N/M |
No |
Binary Cross-Entropy |
Inter |
10-Fold CV |
k-fold |
Train: 70%
Valid: 20%
Test: 10% |
Accuracy, Sensitivity, Specificity, Precision and the Area Under the Curve (AUC). |
accuracy, sensitivity, specificity, precision, ROC AUC |
N/M |
|
N/M |
Accuracy: [Valid] 95.54% [Test] 91.25%
Sensitivity: [Test] 91.83%
Specificity: [Test] 90.50%
Precision: [Test] 91.50%
AUC: [Test] 0.9582 |
None |
None |
No |
No |
No |
"We experimented with various numbers of memory cells in each layer and obtained the best performance with a network configured with one single layer with 64 hidden units (Model 1) and with 2 hidden layers of 128 and 64 hidden units respectively (Model 2)" |
N/M |
No |
N/A |
Yes |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Ahmedt-Aristizabal2018 |
34 |
|
Emotion Recognition from EEG Using RASM and LSTM |
2018 |
Li, Tian, Shy, Xu & Hu |
International Conference on Internet Multimedia Computing and Service |
No |
Conference |
South China University of Technology
Lanzhou University |
China |
9 |
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
Improve SOTA |
Using rational asymmetry (RASM) features and an LSTM classifier on the DEAP dataset for emotion classification. 2 classes (positive / negative valence) |
Watching emotional movies (clips) |
LSTM to capture temporal dependencies in emotions |
|
N/M |
Raw EEG |
DEAP |
Public |
DEAP
895 trials x 125 windows
(63s per trial split into 125 windows)
(1s windows, 50% overlap) |
111875 |
939.75 |
32 |
32 |
256 |
Offline |
|
None |
No |
No |
No |
RASM14
(STFT + Hanning Window --> 4 Freq Bands)
|
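RASM features are band-power ratios of symmetric left/right electrode pairs; a sketch with placeholder pair indices and band limits (the paper uses 14 pairs and 4 bands):

```python
import numpy as np
from scipy.signal import stft

PAIRS = [(0, 1), (2, 3)]                       # e.g. (Fp1, Fp2), (F3, F4); placeholders
BANDS = [(4, 8), (8, 13), (13, 30), (30, 45)]  # 4 bands; limits are assumptions

def rasm(eeg, fs):
    """eeg: (n_channels, n_times), fs in Hz (int). Returns (n_pairs, n_bands):
    band power of each left electrode divided by that of its right mirror."""
    freqs, _, Z = stft(eeg, fs=fs, window="hann", nperseg=fs)
    power = (np.abs(Z) ** 2).mean(axis=-1)     # (n_channels, n_freqs)
    out = np.empty((len(PAIRS), len(BANDS)))
    for i, (l, r) in enumerate(PAIRS):
        for j, (lo, hi) in enumerate(BANDS):
            m = (freqs >= lo) & (freqs < hi)
            out[i, j] = power[l, m].sum() / power[r, m].sum()
    return out
```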
Frequency-domain |
N/M |
|
N/M |
LSTM |
RNN |
N/M |
"In our assumption, emotions change continuously, and this continuity is reflected in the temporal correlations of EEG signals. To explore the correlations, the classification method of Long Short-TermMemory networks (LSTM) is adopted." |
Yes |
125 * 14 * 4
(segments * pairs * bands) |
1 |
1 |
N/M |
Dropout
(0.5) |
Yes |
2 |
Positive valence
Negative valence |
1 |
N/M |
Standard |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Inter |
10-Fold CV |
k-fold |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
RASM + LSTM: 76.67 (Accuracy)
RASM + SVM: 65.62 (Accuracy)
Zhang, 2016: 69.67 (Accuracy)
Chen, 2015: 73.00 (Accuracy)
Li X, 2016: 72.06 (Accuracy) |
SVM
Zhang [10] (DE + GELM)
Chen [2] (Fusion feature + HMM)
Li [6] (Wavelet energy + CRNN) |
DL & Trad. |
No |
No |
No |
"Although the accuracy of our experiment is more than 75%, it is not good enough for applications. The task of the future work is to improve the recognition accuracy. More features will be tried especially those reflect the characteristics of EEG signals in frequency-space domain." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Li2018 |
35 |
|
EEG detection and de-noising based on convolution neural network and Hilbert-Huang transform |
2018 |
Wang, Guo, Zhang, Bai & Wang |
International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) |
No |
Conference |
Changchun University of Science and Technology
Jilin Engineering Research Center of RFID and Intelligent Information Processing |
China |
6 |
|
Improvement of processing tools |
Signal cleaning |
Artifact handling |
|
New Approach |
Denoising EEG with Hilbert-Huang Transform after a detection of (yes/no) EOG artifact from a CNN classifier |
N/M* |
Nonlinearity of EEG |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
2,000 training samples and 100 test samples
(3s windows) |
2100 |
105 |
N/M |
N/M |
1000 |
|
|
N/M* |
N/M |
N/M* |
N/M |
IMF / HHT |
Other |
N/M |
|
N/M |
CNN |
CNN |
2*2 convolution kernels |
N/M |
Yes |
"characteristic matrix of the extracted instantaneous power" |
1 |
1 |
Softmax |
N/M |
N/M |
|
|
1
EOG artifact yes/no
(softmax) |
N/M |
N/M |
N/M |
N/M* |
N/M |
N/M |
N/M |
N/M |
N/M |
No |
N/M* |
Inter |
No |
No |
Train: 2000
Test: 100 |
Accuracy |
accuracy |
N/M |
|
N/M |
80% |
No |
None |
No |
No |
No |
The results show that the method in this paper takes slightly longer CPU time than traditional wavelet de-noising [4] and HHT de-noising alone, but the signal-to-noise ratio after de-noising is clearly higher than with the other two methods. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Wang2018a |
36 |
|
Data Augmentation for EEG-Based Emotion Recognition with Deep Convolutional Neural Networks |
2018 |
Wang, Zhong, Peng, Jiang & Liu |
International Conference on Multimedia Modeling |
No |
Conference |
Shenzhen University
The Hong Kong Polytechnic University |
China |
12 |
|
Generation of data |
Data augmentation |
|
|
New Approach: Data augmentation on Emotion datasets for Deep learning models |
Data augmentation on SEED & MAHNOB-HCI dataset and evaluation using ResNet & LeNet. |
Watching emotional films/clips |
Data augmentation for deep models with many parameters |
|
N/M |
Emotions
(Frequency Features) |
SEED;
MAHNOB HCI |
Public |
DS #1 - SEED: 630 EEG segments, from 14 subjects
14 subjects x 15 videos x 3 sessions x 4 min
split into 3 x 62s each, for 1890 segments total
DS #2 - MAHNOB-HCI:
between 34.9 and 117s (avg: 75.95s)
188 negative, 208 neutral and 131 positive segments
(1s windows, no overlap) |
117180;
40025 |
1953;
667.1 |
14;
30 |
62;
32 |
N/M |
Offline |
|
1) Downsampled to 200Hz
2) Band-pass filters: 5 freq bands
3) SFTF with non-overlapping Hamming window 1s |
Yes |
Manual removal |
Yes |
Differential Entropy (DE) per band |
Other |
N/M |
|
MATCONVNET |
ResNet
LeNet |
CNN |
This is a data-augmentation paper; the networks themselves are not the focus. |
Data augmentation with Gaussian Noise of various std |
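A sketch of the Gaussian-noise augmentation described above; the std values and copy counts are illustrative, only the "up to 30 times" growth factor comes from this row:

```python
import numpy as np

def augment_gaussian(X, y, stds=(0.01, 0.05, 0.1), copies_per_std=10):
    """Replicate (X, y) with additive Gaussian noise of several widths.

    X: (n_samples, ...) feature array (here: differential-entropy maps).
    3 stds x 10 copies grows the training set 30-fold.
    """
    Xs, ys = [X], [y]
    for std in stds:
        for _ in range(copies_per_std):
            Xs.append(X + np.random.normal(0.0, std, size=X.shape))
            ys.append(y)
    return np.concatenate(Xs), np.concatenate(ys)
```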
No |
n x l x 5
n: electrodes
l: length (time)
5: Freq Bands DE |
LeNet: 5
ResNet: 14 |
14 |
N/M |
N/M |
N/M |
3 |
positive, neutral, negative |
1) 3
2) 3 |
1) 4,000
2) 20,000 |
Standard optimization with augmented data |
Standard |
N/M |
N/M |
lr = 0.1 |
100 |
N/M |
N/M |
Gaussian Noise
(augmented up to 30 times) |
N/M |
Inter |
No |
No |
1 - SEED) Train: 1134
1 - SEED) Test: 756
2) N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
DS #1) LeNet: [Pre] 49.6% | [Post] 74.3%
DS #1) ResNet: [Pre] 34.2% | [Post] 75.0%
DS #2) ResNet: [Pre] 40.8% | [Post] 45.4%
DS #2) LeNet: N/M |
DS #1) SVM: [Pre] 74.2% | [Post] 73.4%
DS #1) PCA-SVM: [Pre] 49.8% | [Post] N/M%
DS #2) SVM: [Pre] 42.5% | [Post] 44.3%
|
Traditional pipeline |
No |
No |
No |
By analyzing the experimental result, we find that the data augmentation method can effectively improve the performance of deep models. In future, we will seek to use other data augmentation methods, such as generative adversarial networks, to generate more effective samples of EEG data and improve the performance of EEG-based emotion recognition. |
N/M |
No |
N/A |
Yes |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Wang2018 |
37 |
|
A convolutional neural network for sleep stage scoring from raw single-channel EEG |
2018 |
Sors, Bonnet, Mirek, Vercueil & Payen |
Biomedical Signal Processing and Control |
No |
Journal |
Université Grenoble Alpes
CEA Leti, MINATEC Campus (Grenoble)
Dijon University Hospital (Dijon)
Grenoble University Hospital |
France |
8 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
New Approach: Sleep Stage Scoring (5 stages) with CNN on Single EEG Channel |
Use CNNs on raw EEG data for 5-class sleep prediction |
Sleep |
CNNs have presented good performance in other domains and other EEG tasks. |
|
N/M |
Raw EEG |
SHHS |
Public |
Dataset SHHS-1 (5793 polysomnographic records)
5,384,401 epochs of 30s
~ 5 years of data!
(30s windows) |
5384401 |
2692200 |
5728 |
1 |
125 |
Offline |
|
None |
No |
No |
No |
Raw EEG |
Raw EEG |
No |
|
TensorFlow |
CNN |
CNN |
(no mention of pooling or dropout) |
1D convolutional layers |
Yes |
(3750 * 4) x 1
30s epoch + 2 preceding + 1 following
30s @ 125Hz = 3750 samples |
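How the network input could be assembled from the 30-s epochs (the current epoch plus its 2 preceding and 1 following neighbours, 3750 samples each); a sketch, not the authors' code:

```python
import numpy as np

def with_context(epochs, before=2, after=1):
    """epochs: (n_epochs, 3750) array of 30-s single-channel windows.

    Concatenates each epoch with its neighbours, yielding inputs of
    length 4 * 3750 = 15000 samples; edge epochs are dropped.
    """
    n = len(epochs)
    return np.stack([np.concatenate(epochs[i - before:i + after + 1])
                     for i in range(before, n - after)])
```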
12 Conv Layers + 1 FC (256) + 1 FC (5 classes) |
14 |
Leaky ReLU |
N/M |
N/M |
5 |
Wake
N1
N2
N3
REM |
5
[prob for each class]
(Softmax) |
N/M* |
Standard optimization |
Standard |
Adam |
Adam |
lr = 3 ×10^−5, b1 = 0.9, b2 = 0.999 |
128 |
N/M |
N/M |
Tried cost-sensitive learning
and oversampling
(didn't improve. didn't use it.) |
Multiclass Cross-Entropy |
Inter |
Train-Valid-Test |
Train-Valid-Test |
Train: 50%
Valid: 20%
Test: 30% |
Accuracy |
accuracy |
NVidia GTX980Ti |
|
N/M* |
87% |
Tsinalis [15] CNN: 0.75
Supratak [16] CNN-LSTM: 0.86
Liang [9] [...] : 0.88
Zhu [10] DVG, SVM: 0.85
Fraiwan [6] T-F, RF: 0.83
Hassan [38] EMD, Ensemble: 0.87
Hassan [11] EMD, [...]: 0.89
Hassan [12] PSD, RF: 0.88
Hassan [39] EMD, [...] : 0.83
Sharma [13] Iterative filtering: 0.88
Hsu [14] Energy, RNN: 0.90 |
DL & Trad. |
No |
Visualization of synthetic inputs that maximize class probability |
Generating input to maximize activation |
"This study shows that it is possible to classify sleep stages using a single EEG channel and a convolutional neural network work- ing on raw signal samples without any feature extraction phase and with performance on par with other state-of-the-art methods."
"Further research is necessary to address class imbalance. Ensemble learning [35] or CNN-specific methods [36] may prove suitable" |
N/M |
Yes |
GitHub |
No |
|
Yannick Roy |
Isabela Albuquerque |
Yes |
|
Sors2018 |
38 |
|
ChronoNet: A Deep Recurrent Neural Network for Abnormal EEG Identification |
2018 |
Roy, Kiral-Kornek & Harrer |
Arxiv |
Yes |
Preprint |
IBM Research - Australia |
Australia |
|
|
Classification of EEG signals |
Clinical |
Pathological EEG |
|
Improve SOTA |
Detect abnormal EEG with a new end-to-end architecture |
? |
Automatic interpretation of EEG from raw data |
|
N/M |
Raw EEG |
TUH Abnormal EEG Corpus |
Public |
TUH Abnormal EEG Corpus
Training set: 1361 abnormal/1379 normal sessions
Test set: 127 abnormal/150 normal session
Became: 14,971 / 15,169 windows for training
(1min windows) |
30417 |
30417 |
N/M |
22 |
250 |
|
|
None |
No |
N/M |
N/M |
None |
Raw EEG |
N/A |
|
N/M |
1) Conv+GRU
2) Inception Conv+GRU
3) Dense Conv+GRU
4) Inception Dense Conv+GRU |
CNN+RNN |
Conv filter sizes grow exponentially inside a given layer (e.g., 2, 4, 8) |
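A Keras sketch of one such multi-scale block with exponentially growing kernel sizes (2, 4, 8); the filter counts, strides and two-block stack are assumptions, not ChronoNet's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def multiscale_conv_block(x, filters=32):
    """Parallel 1-D convolutions with kernel sizes 2, 4 and 8,
    concatenated along the channel axis."""
    branches = [layers.Conv1D(filters, k, strides=2, padding="same",
                              activation="relu")(x)
                for k in (2, 4, 8)]
    return layers.Concatenate()(branches)

inp = tf.keras.Input(shape=(15000, 22))  # 1-min windows @ 250 Hz, 22 channels
out = multiscale_conv_block(multiscale_conv_block(inp))
model = tf.keras.Model(inp, out)
```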
- |
Yes |
15000 x ? |
1) 7
2) 7
3) 7
4) 7 |
7 |
N/M |
N/M |
N/M |
|
|
2 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
500 epochs |
64 |
N/M |
N/M |
N/M |
N/M |
Inter |
5-Fold CV |
k-fold |
Train: 90.8%
Test: 9.2% |
Accuracy |
accuracy |
N/M |
|
N/M |
1) 82.31
2) 84.11
3) 83.89
4) 86.57 |
CNN-MLP: 78.80
DeepCNN: 85.40 |
DL |
No |
No |
No |
The ChronoNet architecture is a general-purpose architecture for time series; it has also been applied to speech data classification. |
- |
No |
N/A |
No |
|
Hubert Banville |
TBR |
Yes |
|
Roy2018 |
39 |
|
EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals |
2018 |
Hartmann, Schirrmeister & Ball |
Arxiv |
Yes |
Preprint |
University of Freiburg |
Germany |
7 |
|
Generation of data |
Generating EEG |
|
|
Generate EEG signals |
Generate EEG signals using GANs |
Motor imagery |
GANs are good at generating data |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
438 EEG signals
The length of these signals is not stated;
plots span -500ms to 2500ms
(probably 3s windows) |
438 |
21.9 |
N/M |
1 |
250 |
|
|
None |
No |
No |
No |
None |
Raw EEG |
Subtract mean then divide by maximum absolute value |
|
N/M |
Wasserstein GAN (modified) |
GAN |
- |
Conv layers instead of an autoregressive model, as this worked well in the authors' other papers |
Yes |
Gen: 200
Discr: 768 |
Gen: 14
Discr: 14 |
14 |
Leaky ReLU |
Gradient penalty |
Yes |
N/A |
N/A |
Gen: 768
Discr: 1 |
N/M |
GAN optimization with increasing resolutions |
Other |
Adam |
Adam |
"Equalized learning rate"
lr = 0.001
beta1 = 9
beta2 = 0.99 |
? |
N/M |
N/M |
No |
Improved Wasserstein distance |
Inter |
No |
No |
Train: 286
Valid: 72
Test: 80 |
Inception score
Frechet inception distance
Euclidean distance
Sliced Wasserstein distance |
inception score, frechet inception distance, euclidean distance, sliced Wasserstein distance |
N/M |
|
N/M |
[Many values] |
WGAN with gradient penalty |
DL |
No |
Visual inspection of generated segments (time series distribution, spectrum distribution, examples) |
Analysis of generated outputs |
The metrics did not correlate with visual performance, and so the authors recommend using many metrics to obtain a balanced view |
Mode collapse in GANs |
No |
N/A |
No |
|
Hubert Banville |
Isabela Albuquerque |
TBC |
|
Hartmann2018 |
40 |
|
Know Your Mind: Adaptive Brain Signal Classification with Reinforced Attentive Convolutional Neural Networks |
2018 |
Zhang, Yao, Wang, Zhang, Zhang & Liu |
Arxiv |
Yes |
Preprint |
University of New South Wales, Tsinghua University, Michigan State University |
Australia |
|
|
Classification of EEG signals |
Multi-purpose architecture |
|
|
Make general framework for EEG classification |
Apply a single architecture (reinforced attentive CNN) to EEG classification |
1 & 2: Motor imagery
3: Person identification
4: Pathology (seizure detection) |
Skip time-consuming feature engineering and avoid task-specific classifiers. |
|
EPOC (Emotiv), N/M |
1 & 2) Motor Imagery
3) None
4) Seizures |
eegmmidb;
Internal Recordings;
EEG-S;
TUH |
Both |
DS #1 - eegmmidb: 20 x 28000 points (@160Hz)
DS #2 - Internal: 7 x 34560 points (@128Hz)
DS #3 - EEG-S: 8 x 7000 points (@160Hz)
DS #4 - TUH: 5 x 12000 points (@250Hz)
(windows of 1 point) |
560000;
241910;
56000;
60000 |
58.33;
31.5;
5.8;
4 |
20;
7;
8;
5 |
64;
14;
64;
22 |
160;
128;
160;
250 |
|
|
None |
No |
N/M |
N/M |
None |
Raw EEG |
N/A |
|
TensorFlow |
CNN with attention + DQN |
CNN |
1) Replicating and shuffling incoming samples
2) Attention mechanism trained with RL
3) CNN
4) Nearest-neighbour classifier |
A) Replicate and shuffle operation intended to randomly unveil interesting spatial patterns |
Yes |
1 x nb_channel |
CNN: 3
DQN: 2 |
3 |
ReLU & Sigmoid |
L2 |
Yes |
|
|
1) 5
2) 6
3) 8
4) 2 |
N/M |
Standard optimization (including reinforcement learning) |
Standard |
Adam |
Adam |
Learning rate: 0.001 |
N/M |
N/M |
N/M |
No |
Cross-entropy |
Inter |
N/M |
No |
N/M |
Accuracy, Precision, Recall, F1-score
Latency
Resilience |
accuracy, precision, recall, f1-score, latency, resilience |
N/M |
|
10 min |
Accuracy
1) 0.9932
2) 0.9708
3) 0.9984
4) 0.9975 |
Not clear what they were trained on (samples? features?):
Linear SVM, Random Forest, kNN, LSTM, GRU, Adaptive boosting, LDA
+ 5 state-of-the-art papers for each (20 total) |
DL & Trad. |
No |
No |
No |
Latency is comparable to other methods
The number of channels used affects the performance. |
- |
Yes |
GitHub |
No |
|
Hubert Banville |
Yannick |
TBC |
|
Zhang2018a |
41 |
|
Gated Recurrent Networks for Seizure Detection |
2018 |
Golmohammadi, Ziyabari, Shah, Von Weltin, Campbell, Obeid & Picone |
Arxiv |
Yes |
Preprint |
Neural Engineering Data Consortium, Temple University |
USA |
5 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA (their previous work) |
Explore gated RNNs (LSTM & GRU); explore initialization and regularization of these networks |
(see TUH dataset paper) |
Improve their last results |
|
N/M |
Seizures |
TUH Seizure Corpus |
Public |
TUH EEG Corpus
(Train + Test | in sec)
Seizures: 51,140 + 53,930
Non-Seizures: 877,821 + 547,728
(21s windows, no overlap) |
72886 |
25510.35 |
246 |
22 |
250 |
|
|
None |
No |
N/M |
N/M |
LFCCs + First & Second Derivative of LFCCs |
Other |
N/A |
|
N/M |
1) CNN + LSTM
2) CNN + GRU |
CNN+RNN |
2D CNN to 1D CNN to bi-LSTM
First LSTM output: 128 (1s data / epoch)
Second LSTM output: 2-way sigmoid
(classification of a 1s epoch) |
1) Gated units to avoid vanishing gradient.
2) RNNs to capture long-term dependencies. |
Yes |
210 x 22 x 26
(Windows * Channels * Features) |
3x 2D CNN
+ 1x 1D CNN
+ LSTM |
5 |
ELU |
1) L1
2) L2
3) L1/L2
4) Dropout
5) Gaussian Noise |
Yes |
|
|
1
(classification - sigmoid) |
N/M |
Initialization:
The best performance is achieved using orthogonal initialization |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
No |
MSE |
Inter |
No |
No |
Train: 928,962s
Test: 601,659s |
Sensitivity, Specificity |
sensitivity, specificity |
N/M |
|
N/M |
CNN + GRU - Sensitivity: 30.83% | Specificity: 91.49%
CNN + LSTM - Sensitivity: 30.83% | Specificity: 97.10%
Best regularization: L1/L2
Best initialization: Orthogonal |
Compared CNN+GRU vs CNN+LSTM
Compared 10 different initialization methods (see comments)
Compared 5 different regularization methods
(L1/L2, L1, L2, Gaussian noise, Dropout)
|
DL |
No |
No |
No |
LSTMs outperformed GRUs. We also studied initialization and regularizations of these networks. In future research, we are designing a more powerful architecture based on reinforcement learning concepts. We are also optimizing regularization and initialization algorithms for these approaches. Our goal is to approach human performance which is in the range of 75% sensitivity with a false alarm rate of 1 per 24 hours [11]. |
Not enough labeled data. Having certified specialists label the data is very expensive, and it is hard to find people to do it. |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Golmohammadi2017b |
42 |
|
Optimizing Channel Selection for Seizure Detection |
2018 |
Shah, Golmohammadi, Ziyabari, Weltin, Obeid & Picone |
Arxiv |
Yes |
Preprint |
Neural Engineering Data Consortium, Temple University |
USA |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Study the Impact of Number of Channels |
Explore the impact of using/having from 2 to 22 channels with same network |
(see TUH dataset paper) |
Lower the number of EEG channels required
(also save disk space) |
|
N/M |
Seizures |
TUH Seizure Corpus |
Public |
TUH EEG Seizure Corpus (TUSZ)
No more information about samples/time
(1s windows) |
N/M |
N/M |
N/M |
22 |
250 |
|
|
None |
No |
N/M |
N/M |
LFCCs + First & Second Derivative of LFCCs |
Other |
N/A |
|
N/M |
CNN + LSTM |
CNN+RNN |
(same as their previous paper: Gated Recurrent Networks for Seizure Detection) |
(same as their previous paper: Gated Recurrent Networks for Seizure Detection) |
Yes |
210 x 22 x 26
(Windows * Channels * Features) |
3x 2D CNN
+ 1x 1D FC CNN
+ 2x Bi-LSTM |
5 |
ELU & Sigmoid |
Dropout |
Yes |
|
|
1*
(classification - sigmoid) |
N/M |
N/M |
N/M |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
No |
MSE |
Inter |
N/M |
No |
N/M |
Sensitivity, Specificity |
sensitivity, specificity |
N/M |
|
N/M |
22 Channels - Sensitivity: 39.15% | Specificity: 90.37%
20 Channels - Sensitivity: 34.54% | Specificity: 82.07%
16 Channels - Sensitivity: 36.54% | Specificity: 80.48%
8 Channels - Sensitivity: 33.44% | Specificity: 85.51%
4 Channels - Sensitivity: 33.11% | Specificity: 39.32% |
No |
None |
No |
No |
No |
The results presented in this paper use the Any Overlap scoring method [11] in which true positives are counted when the hypothesis overlaps with one or more reference annotations. False positives correspond to events in which the hypothesis annotations do not overlap with any of the reference annotations. This method of scoring is popular in the EEG research community. |
- |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Shah2017 |
43 |
|
Improving brain computer interface performance by data augmentation with conditional Deep Convolutional Generative Adversarial Networks |
2018 |
Zhang & Liu |
Arxiv |
Yes |
Preprint |
Beijing Institute of Technology |
China |
|
|
Generation of data |
Data augmentation |
|
|
Generate EEG signals |
Generate EEG signals using GANs for data augmentation |
Motor Imagery (Left/Right Hand) |
To increase amount of data available for training |
|
N/M |
Motor imagery |
BCI Competition II - III |
Public |
BCI Competition II - III
(1 Subject x 7 runs x 40 trials x 9 seconds)
Used only 280 trials (140 training / 140 testing)
Took 5s from each trial: 4s-9s
(5s windows) |
280 |
23.3 |
1 |
3 |
128 |
Offline |
|
None |
No |
No |
No |
Continuous Wavelet transform (Morlet)
Only keep 7-15 Hz
(Time-Frequency Domain) |
Frequency-domain |
z-score |
|
N/M |
Augmentation: cDCGAN
Classification: CNN |
CNN |
Conditional Deep Convolutional GAN (cDCGAN) + label information as input to both generator and discriminator |
2D kernel to accommodate the input TFR |
Yes |
N/M |
N/M |
N/M |
ReLU + Leaky
ReLU + Sigmoid |
N/M |
N/M |
2 |
Left Hand
Right Hand |
Same as input (not mentioned) |
N/M |
cDCGAN optimization
Training CNN with real and artificial data |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
GAN
[0.5 - 2x]
(artificial EEG data) |
N/M |
Intra |
No |
No |
Train: 50%
Test: 50% |
Accuracy |
accuracy |
N/M |
|
N/M |
No augmentation: ~83%
50% augmentation: ~84%
150% augmentation: ~84%
200% augmentation: ~85.5% |
None |
None |
No |
No |
No |
Data augmentation with a GAN does help increase accuracy when limited data is available. |
Limited amount of data available per subject when training a BCI. |
No |
N/A |
Yes |
|
Hubert Banville |
Yannick |
Yes |
|
Zhang2018b |
44 |
|
Convolutional neural networks for seizure prediction using intracranial and scalp electroencephalogram |
2018 |
Truong, Nguyen, Kuhlmann, Bonyadi, Yang, Ippolito & Kavehei |
Neural Networks |
Yes |
Journal |
University of Sydney
Royal Melbourne Institute of Technology
Swinburne University
University of Melbourne
University of Queensland
University of Adelaide |
Australia |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Prediction |
Improve SOTA |
Use CNN to improve SOTA in seizure prediction |
Ongoing recording with and without seizures |
Test CNN on different epilepsy datasets |
|
N/M |
None
(Seizures) |
Freiburg Hospital;
CHB-MIT;
Kaggle: AESSPC |
Public |
DS #1 - Freiburg: 311h (59 seizures)
DS #2 - CHB-MIT: 209h (64 seizures)
DS #3 - AESSPC: 627h (48 seizures)
(30s windows, no overlap) |
37320;
25080;
75240 |
18660;
12540;
37620 |
13;
13;
2 |
6;
22;
16 |
N/M |
|
|
1) Removed Powerline: 47-53Hz + 97-103Hz
2) Removed DC Component (0Hz) |
Yes |
N/M |
N/M |
STFT
(2D Freq x Time)
30s EEG windows |
Frequency-domain |
N/A |
|
Python
Keras
Tensorflow |
CNN |
CNN |
Batch Norm + Pooling
2 Dense |
First, we keep the CNN architecture simple and shallow as described above (Ba & Caruana, 2014) |
Yes |
n x 59 x 114
(electrodes x time x freq) |
CNN: 3
FC: 2 |
5 |
ReLU
Sigmoid
Softmax
|
Dropout
(50%) |
Yes |
|
|
2 |
N/M |
Cost-sensitive learning was applied by changing the cost function so that the misclassification cost of preictal samples is multiplied by the ratio of interictal to preictal samples for each patient.
The minority class was also over-sampled; cost-sensitive learning served as a comparison, and the two methods achieved comparable performance. |
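The cost-sensitive weighting described above amounts to a per-patient class weight; a minimal sketch (the dictionary format matches, e.g., Keras' class_weight argument):

```python
import numpy as np

def preictal_class_weights(y):
    """y: binary labels (1 = preictal, 0 = interictal). Weights the
    preictal class by the interictal-to-preictal ratio."""
    n_inter = np.sum(y == 0)
    n_pre = np.sum(y == 1)
    return {0: 1.0, 1: float(n_inter) / float(n_pre)}

# Example: 9 interictal samples per preictal sample -> weight 9 for class 1.
print(preictal_class_weights(np.array([0] * 9 + [1])))
```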
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Overlapping windows
(overlap % subject-specific to match classes) |
N/M |
Intra |
Leave-One-Seizure-Out |
Leave-One-Sample-Out |
Train: N/M
Valid: 25% of training
Test: N/M |
Sensitivity
FPR (/h) |
sensitivity, FPR |
NVidia K80 |
|
N/M |
Measures (Epilepsy Specific):
SOP of 30 min | SPH of 5 min
DS #1) Sensitivity : 81.4% | FPR: 0.06/h
DS #2) Sensitivity : 81.2% | FPR: 0.16/h
DS #3) Sensitivity : 75.0% | FPR: 0.21/h |
Compares on 3 Datasets
Compares to 14 other SOTA (papers) |
DL & Trad. |
Wilcoxon signed-rank test |
No |
No |
|
(1) Unbalanced Classes.
(2) Comparing results with SOTA is complicated because each approach was tested with one dataset that is limited in the amount of data. |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
Yes |
|
Truong2018 |
45 |
|
Semi-supervised Seizure Prediction with Generative Adversarial Networks |
2018 |
Truong, Kuhlmann, Bonyadi & Kavehei |
Arxiv |
Yes |
Preprint |
University of Sydney
University of Melbourne
University of Queensland |
Australia |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Prediction |
Improve SOTA |
Use unlabelled data and data fusion to improve SOTA in seizure prediction |
Ongoing recording with and without seizures |
Leverage unlabelled data |
|
N/M |
Raw EEG |
CHB-MIT;
Freiburg Hospital |
Public |
DS #1 - Freiburg: 311h
DS #2 - CHB-MIT: 209h
(28s windows, no overlap) |
39985;
26871 |
18660;
12540 |
13;
13 |
16;
6 |
256 |
|
|
STFT on 28-s windows with 50% overlap
Removal of power line noise frequencies |
Yes |
N/M |
N/M |
STFT |
Frequency-domain |
N/M |
|
Tensorflow |
1) GAN
2) CNN |
Other |
- |
- |
Yes |
1) GAN generator: 100 x1
2) GAN discriminator: n x 56 x 112
3) CNN: Same as discriminator |
1) GAN generator: 4
2) GAN discriminator: 3
3) Classifier: 2 |
4 |
Softmax, Sigmoid |
Dropout (50%) |
Yes |
|
|
1) GAN generator:
2) GAN discriminator:
3) CNN |
N/M |
1) Train GAN
2) Train 2 new FC layers on top of discriminator using labelled data |
Other |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Overlapping windows
(overlap % subject-specific to match classes) |
N/M |
Both |
Leave-One-Seizure-Out |
Leave-One-Sample-Out |
Train: N/M
Valid: 25% of training
Test: N/M |
ROC AUC |
ROC AUC |
Nvidia P100 |
|
N/M |
AUC: 77.68% (CHB-MIT), 75.47% (Freiburg)
[6% and 12% lower than the benchmark] |
CNN |
DL |
No |
No |
No |
Although the performance decreased as compared to a standard CNN, the authors argue this can reduce the effort put into labelling the data. |
- |
No |
N/A |
No |
|
Hubert Banville |
TBR |
Yes |
|
Truong2018a |
46 |
|
Time Series Segmentation through Automatic Feature Learning |
2018 |
Lee, Ortiz, Ko & Lee |
Arxiv |
Yes |
Preprint |
Princeton University |
USA |
|
|
Classification of EEG signals |
Multi-purpose architecture |
|
|
Improve SOTA |
Detect changepoints/breakpoints in data (changes in signal) and apply to different types of time series data |
Eye movements |
Deep learning models for changepoint detection don't make assumptions about the underlying processes, as opposed to standard models |
|
EPOC (Emotiv) |
Eyes open vs. eyes closed |
EEG Eye State |
Public |
EEG Eye State Dataset
117 seconds from 1 subject with Emotiv
(14980 points, using windows of 25 points) |
600 |
2 |
1 |
14 |
256 |
|
|
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
|
N/M |
Stacked Autoencoder |
AE |
- |
- |
No |
N/M |
2 (encoder) |
2 |
N/M |
Tied weights in encoder and decoder
L2 weight decay |
Yes |
|
|
2 |
N/M |
Standard optimization |
Standard |
Stochastic gradient descent |
SGD |
N/M |
N/M |
N/M |
N/M |
No |
Cross-entropy (or square loss?) |
Intra |
No |
No |
N/M |
ROC
Prediction loss (specific to task)
MSE
Prediction ratio |
ROC, prediction loss, mse, prediction ratio |
N/M |
|
N/M |
ROC curves... |
Bayesian changepoint detection (based on Gamma or Gaussian priors)
Pruned Exact Linear Time method
Density-ratio estimation method |
Traditional pipeline |
No |
No |
No |
Deep learning avoids typical problems in modelling changepoints. |
- |
No |
N/A |
No |
|
Hubert Banville |
TBR |
TBC |
|
Lee2018a |
47 |
|
Investigating the Impact of CNN Depth on Neonatal Seizure Detection Performance |
2018 |
O’Shea, Lightbody, Boylan & Temko |
Arxiv |
Yes |
Preprint |
Irish Centre for Fetal and Neonatal Translational Research, University College Cork |
Ireland |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Use CNN to improve SOTA in neonatal seizure detection |
Ongoing recording with and without seizures |
Improve SOTA with CNN-11 based on their CNN-6 (2017) |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
18 babies: over 800 hours of multichannel unedited EEG
containing 1389 seizures
(8s windows, 7s overlap) |
N/M |
N/M |
18 |
8 |
256 |
|
|
Down-sample to 32Hz
Filtered between 0.5 and 12.8Hz |
Yes |
No |
No |
8 sec windows (1 sec shift) |
Raw EEG |
N/A |
|
N/M |
CNN |
CNN |
Conv - Batch Norm - Pooling
Output not Dense layer but
Global Average Pooling |
"The 11-layer network can learn more simple features in the first layer (3 samples wide) and more complex features in the final layers (212 samples wide)." |
Yes |
256x1
(8 sec x 1 channel) |
11 |
11 |
Softmax |
Batch norm |
Yes |
|
|
2
Seizure / Non-Seizure |
28,642 |
The network was trained for 100 epochs; after each epoch the validation AUC was calculated. |
Standard |
Stochastic Gradient Descent |
SGD |
LR: 0.01
Momentum: 0.9 |
2048 |
N/M |
N/M |
Sliding Window
(Shifted by 1s, 7/8 overlap) |
N/M |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
"The training data contains less than 2% of the validation dataset" |
ROC AUC |
ROC AUC |
N/M |
|
N/M |
AUC: 97.61%
AUC90: 86.85% |
CNN - 6 layers (O'Shea et al., 2017)
SVM |
DL & Trad. |
No |
No |
No |
This represents a substantial improvement over a shallower 6-layer CNN network which has a smaller range of receptive fields. These results represent the current best results for this task obtained using a single classifier. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
OShea2018 |
48 |
|
Removing Confounding Factors Associated Weights in Deep Neural Networks Improves the Prediction Accuracy for Healthcare Applications |
2018 |
Wang, Wu, Xing |
Pacific Symposium on Biocomputing 2019 |
Yes |
Preprint |
Carnegie Mellon University
University of Illinois Urbana-Champaign
Petuum Inc. |
USA |
12 |
|
Improvement of processing tools |
Reduce effect of confounders |
|
|
New approach: Reduce effect of confounders in medical data |
Reduce the effect of confounders in medical data (e.g., gender bias in training data) |
Students watching MOOC videos |
Learn representations from scratch |
|
Mindset (NeuroSky) |
Raw EEG |
Internal Recordings |
Private |
10 students x 20 videos x 2min
10 confusing / 10 not confusing
(window length: N/M) |
N/M |
400 |
10 |
1 |
N/M |
N/M |
|
N/M |
N/M |
No |
No |
Raw EEG |
Raw EEG |
z-score |
|
TensorFlow |
Bi-LSTM |
RNN |
Use of Confounder Filtering |
N/M |
No |
N/M |
N/M |
N/M |
Tanh |
N/M |
N/M |
2 |
Confused
Not-confused |
1 (sigmoid) |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
20 |
N/M |
N/M |
N/M |
Binary Cross-Entropy* |
Inter |
5-Fold CV |
k-fold |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
CF-Bidirectional LSTM acc: 75.0% |
SVM: 67.2%
K-Nearest Neighbors: 51.9%
Convolutional Neural Network: 64.0%
Deep Belief Network: 52.7%
RNN-LSTM: 69.0%
Bidirectional LSTM: 73.3% |
DL & Trad. |
No |
No |
No |
The use of confounder filtering improves the predictive performance. |
N/M |
Yes |
GitHub |
No |
|
Isabela Albuquerque |
TBR |
Yes |
|
Wu2018 |
49 |
|
HAMLET: Interpretable Human And Machine co-LEarning Technique |
2018 |
Deiss, Biswal, Jin, Sun, Westover & Sun |
Arxiv |
Yes |
Preprint |
Georgia Institute of Technology
Massachusetts General Hospital |
USA |
9 |
|
Classification of EEG signals |
Multi-purpose architecture |
|
|
New approach |
Help experts generate high quality labels |
Tested on Epilepsy data, could be used for different tasks |
Features can be automatically extracted to help experts label the data |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
D: Using 140,000 of 390,486 x 16s sequences (unbalanced, 5 classes)
1) 20,000 (89h of EEG): 80/20 train/test;
patients in the test set are absent from the training set (testing is performed on unseen patients)
2) 20,000 (89h of EEG): 80/20 train/test;
patients in the test set are also present in the training set (testing is performed on known patients)
3) 100,000 sequences from D |
140000 |
37333 |
155 |
19 |
200 |
|
|
1) Low-Pass filter: 60Hz
2) Computation of montages*
(not sure what that means)
3) 16s windows |
Yes |
No |
No |
Raw EEG
(None) |
Raw EEG |
N/M |
|
Python
Tensorflow |
CNN
CAE
(Conv AutoEncoder) |
Other |
1D CNN
FC Layer only for training
|
One advantage of CNNs is the automated feature selection that happens during training. Without additional work, the model learns the features that it finds most relevant for its given task, from the raw signals. |
Yes |
16x |
Classifier: 6 Conv + 1FC |
7 |
ELU |
Dropout
(20%) |
Yes |
|
|
5
(softmax) |
N/M |
Co-Learning
Supervised & Unsupervised |
Other |
Adam |
Adam |
N/M |
128 |
N/M |
N/M |
Flipped electrodes left <-> right side of the brain, keeping midline references (Fz, Cz, Pz) the same;
almost doubles the dataset. |
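A sketch of the left-right flip augmentation; the channel-pair indices are placeholders for the actual montage:

```python
import numpy as np

# Each tuple swaps a left-hemisphere channel index with its right-hemisphere
# mirror; midline references (Fz, Cz, Pz) keep their positions.
SWAPS = [(0, 1), (2, 3), (4, 5)]  # e.g. (Fp1, Fp2), (F3, F4), (C3, C4)

def flip_left_right(eeg):
    """eeg: (n_channels, n_times). Returns a hemisphere-mirrored copy."""
    flipped = eeg.copy()
    for l, r in SWAPS:
        flipped[[l, r]] = flipped[[r, l]]
    return flipped
```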
N/M |
Inter |
No |
No |
Train: 80%
Test: 20% |
Accuracy |
accuracy |
Intel(R) Xeon(R) E5-2630 2.40 GHz
32 cores
256 GB of RAM
4x Tesla K80 GPUs |
|
13h |
Accuracy before re-labeling | after re-labeling (full) | after re-labeling (re-evaluated only)
HAMLET-CNN: 39.36% | 40.75% | 68.75%
HAMLET-CAE: 38.46% | 39.06% | 67.97%
CNN: 38.89% | 41.58% | 68.75%
MLP: 21.04% | 23.14% | 14.06% |
CNN
MLP |
DL |
No |
1) Retrieval of closest labelled example to explain the decision on a specific input
2) Analysis of weights |
Retrieval of closest examples, Analysis of weights |
To summarize, first, we have introduced a novel technique, HAMLET, for human and machine co-learning that is suited for creating high-quality labeled datasets on challenging tasks with a limited budget. This technique has benefits that can be appreciated in many deep learning applications. |
N/M |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Deiss2018 |
50 |
|
Addressing Class Imbalance in Classification Problems of Noisy Signals by using Fourier Transform Surrogates |
2018 |
Schwabedal, Snyder, Cakmak, Nemati & Clifford |
Arxiv |
Yes |
Preprint |
Emory University |
USA |
7 |
|
Generation of data |
Data augmentation |
|
|
Improve SOTA |
Use FT Surrogates for Data Augmentation. (Tested with a CNN on Sleep Data) |
Sleep Dataset (CAP) |
Some EEG problems are unbalanced (e.g. sleep stages, epilepsy). For DL to perform well, we need data augmentation techniques.
|
|
N/M |
Sleep |
CAP Sleep |
Public |
CAPSLPDB: 94 out of 101 overnight PSGs x ~8h
(30s windows, no overlap) |
90240 |
45120 |
94 |
2 |
N/M |
|
|
Low-pass filter: 13Hz (4th order Butterworth)
Downsampling to 32Hz |
Yes |
No |
No |
Raw EEG
(None) |
Raw EEG |
N/M |
|
N/M |
CNN |
CNN |
1D CNN for each channel:
2xEEG + 1xEOG + 1xEMG |
|
Yes |
30s Raw EEG |
Conv 1D: 4
Conv 2D: 1
FC: 3 |
8 |
|
Dropout |
Yes |
6 |
Wake
S1, S2, S3, S4
REM |
6
(softmax) |
N/M |
N/M |
N/M |
RMSProp |
Other |
LR: 0.0016
Momentum: None
Decay: 0.9 |
128 |
Bayesian hyperparameter optimization |
Yes |
FT Surrogates |
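Fourier-transform (FT) surrogates keep a signal's amplitude spectrum and randomize its phases; a minimal NumPy sketch (the IAAFT variant also tested in the paper additionally restores the amplitude distribution iteratively):

```python
import numpy as np

def ft_surrogate(x):
    """Phase-randomized surrogate of a 1-D signal."""
    spec = np.fft.rfft(x)
    phases = np.random.uniform(0, 2 * np.pi, size=spec.shape)
    phases[0] = 0.0                     # keep the DC component real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                # keep the Nyquist component real
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))
```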
N/M |
Inter |
5-Fold CV |
k-fold |
Train: 4/5
Validation: 1/5
Test: N/M |
F1-Score
Accuracy |
f1-score, accuracy |
Google Cloud |
|
N/M |
Accuracy (no augmentation): 67% | 73% | 51% | 64% | 75% | 70%
Accuracy (FT surrogate): 83% | 86% | 38% | 75% | 97% | 46%
Accuracy (IAAFT surrogates): 91% | 83% | 48% | 79% | 96% | 81% |
(all internal, no external)
No data augmentation
FT surrogates
IAAFT surrogates |
None |
No |
No |
No |
Increases in the S2-accuracy seemed to be at the expense of stages S1 and S3 for larger values of α. Based on these results, we hypothesize that the effect of surrogate augmentation on an individual class accuracy does not directly depend on their conditional prediction accuracies, which are on the diagonal of the conditional confusion matrix (cf. Fig. 4(a)); instead, augmentation may introduce mixing between class labels indicated by a large off-diagonal element upon which the accuracy of one of the mixed labels will dominate. |
Unfortunately, we were not yet able to evaluate and compare IAAFT surrogates with these results due to temporal and budget constraints. |
Yes |
GitHub |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Schwabedal2018 |
51 |
|
EEG Classification Based on Sparse Representation and Deep Learning |
2018 |
Gao, Shang, Xiong, Fang, Zhang, & Gu |
NeuroQuantology |
No |
Journal |
Zhejiang University City College |
China |
7 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA |
Use CNN + Sparse coding on top of CSP features |
Motor Imagery |
N/M |
|
N/M |
CSP |
BCI Competition III - IVa |
Public |
BCI Competition III - IVa
140 + 140 = 280 samples (length = 6s) |
280 |
28 |
2 |
118 |
100 |
Offline |
|
Band-pass filter 8-15Hz |
Yes |
No |
No |
CSP (32 CSP filters) |
Frequency-domain |
N/M |
|
N/M |
CNN |
CNN |
CNN's input is a sparse representation of CSP features |
N/M |
Yes |
28 x 28 |
CNN: 2
FC: 1 |
3 |
ReLU |
N/M |
N/M |
2 |
Right Hand
Right Foot |
2 (softmax) |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Binary cross-entropy |
Inter |
N/M |
No |
Train: 280
Test: N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
Accuracy Class 1: 98%
Accuracy Class 2: 99% |
Sparse representations (not clear what the classifier is) |
Traditional pipeline |
No |
No |
No |
Performance of CNN+sparse representations is less affected when the number of training samples decreases. |
N/M |
No |
N/A |
No |
|
Isabela Albuquerque |
Yannick Roy |
TBC |
|
Gao2018 |
52 |
|
Use of features from RR-time series and EEG signals for automated classification of sleep stages in deep neural network framework |
2018 |
Tripathy, & Rajendra Acharya |
Biocybernetics and Biomedical Engineering |
No |
Journal |
Siksha 'O' Anusandhan, India
Ngee Ann Polytechnic, Singapore
SUSS University, Singapore |
India |
13 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve SOTA |
Use DNNs on EEG + ECG for sleep stage scoring |
Sleep Dataset (MIT-BIH) |
They don't mention why DL. |
|
N/M |
Raw EEG
(Sleep) |
MIT-BIH |
Public |
MIT-BIH
18 records * 100000 points / 7500 points per window
(30s windows) |
240 |
120 |
18 |
1 |
250 |
|
|
1) 5 Band-pass filters to 5 freq bands |
Yes |
No |
No |
14 EEG-HRV Features (out of 19)
(The dispersion entropy and the variance features are evaluated from the different bands of EEG signal)
(the RQA and dispersion entropy features are evaluated from the IMFs of RR-time series) |
Other |
N/M |
|
Matlab 2015a |
SAE |
AE |
3 DNNs
EEG features + HRV features combined
as inputs. Outputs = 2 classes (x3 DNNs) |
N/M |
Yes |
14 EEG Features
ECG Features
(30s window) |
2 AE |
2 |
Sigmoid |
L2
(N/M... Assumed from the formula) |
Yes |
|
|
2 (softmax)
3 DNN Networks
Classifying 2 classes each |
N/M |
Greedy Layer Wise |
Pre-training |
SGD |
SGD |
N/M |
N/M |
N/M |
N/M |
N/M |
(See Formula) |
Inter |
10-Fold CV |
k-fold |
N/M |
Accuracy (Acc)
Sensitivity (Sen)
Specificity (Spe) |
accuracy, sensitivity, specificity |
CPU 2 GHz
2 GB RAM |
|
1 Instance:
EEG: 4.89s
RR: 0.03s |
Acc Sleep vs Wake: 85.51%
Acc Light vs Deep Sleep: 94.03%
Acc REM vs NREM: 95.71% |
Hayet and Slim [55] (ELM), Werteni et al. [56] (SVM), Adnane et al. [16] (SVM), Rossow et al. [57] (HMM), Redmond and Heneghan [58] (QDA), Song et al. [59] (Multivariate Discrim. Analysis), Prucnal et al. [12] (NN), Hasan et al. [11] (RUSBoost), Da Silveira et al. [13] (RF) |
Traditional pipeline |
No |
No |
No |
The dispersion entropy values for the delta (δ), theta (θ) and alpha (α) bands are found to be more discriminatory for the classification of the wake and sleep classes. |
The limitation of this work is that we have used only 18 subjects. The performance of this work can be improved using more subjects from diverse races. The number of REM sleep stage instances in the MIT-BIH polysomnography database is small compared to the deep sleep, light sleep and wake classes. |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
Yes |
|
Tripathy2018 |
53 |
|
Emotion stress detection using EEG signal and deep learning technologies |
2018 |
Liao, Chen & Tai |
IEEE International Conference on Applied System Invention (ICASI) |
No |
Conference |
Department of Information Management Chaoyang University of Technology |
Taiwan |
|
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
New approach |
Use CNN to classify Attention & Meditation from raw EEG |
Listening to music |
Exploring the use of DL for stress detection via EEG |
|
Mindwave Mobile (Neurosky) |
None |
Internal Recordings |
Private |
7 subjects x 10 min
(1s windows, no overlap) |
4300 |
70 |
7 |
1 |
512 |
|
|
N/M |
N/M |
N/M |
N/M |
Frequency Bands |
Frequency-domain |
N/A |
|
N/M |
CNN |
CNN |
N/M |
N/M |
No |
1s
(N/M, assuming 512 samples) |
7 |
7 |
ReLU |
N/M |
N/M |
|
|
1
0: Meditation
1: Attention |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Grid Search |
Yes |
N/M |
N/M |
Inter |
N/M |
No |
Train: 80%
Test: 20% |
Accuracy
F1-Score |
accuracy, f1-score |
N/M |
|
N/M |
Accuracy: 80.13% |
None |
None |
No |
No |
No |
The F1-score shows that their system is better at predicting class 1 than class 0. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Liao2018 |
54 |
|
Hierarchical internal representation of spectral features in deep convolutional networks trained for EEG decoding |
2018 |
Hartmann, Schirrmeister & Ball |
BCI Conference |
Yes |
Conference |
University of Freiburg |
Germany |
6 |
|
Improvement of processing tools |
Model interpretability |
Model visualization |
|
Improve interpretability of CNNs |
Study the most-activating inputs; study the effect of input-signal variations on internal representations |
Motor imagery |
End-to-end learning |
|
N/M |
Raw EEG |
Internal Recordings |
Private |
14 subjects x 1000 trials x 4s
(4s windows) |
14000 |
933 |
14 |
128 |
5000 |
|
|
1) Downsample to 250 Hz
2) Common average re-reference |
Yes |
No |
No |
Raw EEG
(None) |
Raw EEG |
N/M |
|
Pytorch |
CNN |
CNN |
See Schirrmeister et al. (2017) |
See Schirrmeister et al. (2017) |
Yes |
522 x 128 (samples x channels) |
CNN: 5
FC: 1 |
6 |
ELU |
N/M |
N/M |
|
|
4 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Cross-entropy |
Intra |
N/M |
No |
Train: 80%
Test: 20% |
Accuracy
F1-Score |
accuracy, f1-score |
N/M |
|
N/M |
Mean accuracy over 14 subjects: 88.6%
(but this is not the focus of paper) |
None |
None |
No |
1) Signal perturbation (amplitude & phase)
2) Most-activating input windows |
Input-perturbation network-prediction correlation maps, Analysis of most-activating input windows |
Analyzed the effect of perturbations in phase and amplitude of input signals. Earlier layers focus on frequency-related information while later layers focus on amplitude. |
N/M |
No |
N/A |
No |
|
Isabela Albuquerque |
Hubert Banville |
TBC |
|
Hartmann2018b |
55 |
|
Spatial-Temporal Recurrent Neural Network for Emotion Recognition |
2018 |
Zhang, Zheng, Cui, Zong & Li |
IEEE Transactions on Cybernetics |
Yes |
Journal |
Southeast University, Nanjing, China
Nanjing University of Science and Technology, China |
China |
9 |
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
New Approach: Stacking 2 RNN layers for spatial and temporal resolution, for EEG & Facial Expression for emotion classification |
Stacking 2 RNN layers for spatial and temporal resolution, for EEG & Facial Expression for emotion classification |
Emotion Classification for short emotional films/clips
(SEED dataset) |
Leverage RNN for both spatial and temporal features |
|
(NeuroScan) |
Emotions |
SEED |
Public |
SEED: 15 subjects
Assumed: 15 subjects x 15 movies x 4min x 2 exp
(9s windows, no overlap) |
12000 |
1800 |
15 |
62 |
1000 |
|
|
None |
No |
No |
No |
Differential entropy (DE) descriptors per frequency band
(256-point FFT + 1s Hanning window, for 5 frequency bands) |
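Under a Gaussian assumption, the differential entropy of a band-passed signal reduces to 0.5 * log(2 * pi * e * variance); a sketch with assumed band limits and filter order:

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = [(1, 4), (4, 8), (8, 14), (14, 31), (31, 50)]  # illustrative limits

def differential_entropy(eeg, fs):
    """eeg: (n_channels, n_times). Returns (n_channels, n_bands) DE values."""
    feats = []
    for lo, hi in BANDS:
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, eeg, axis=-1)
        feats.append(0.5 * np.log(2 * np.pi * np.e * filtered.var(axis=-1)))
    return np.stack(feats, axis=-1)
```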
Frequency-domain |
N/M |
|
N/M |
STRNN
(Spatial-Temporal RNN) |
RNN |
Spatial & Temporal features representation with stacked RNNs |
1) To learn spatial dependencies, a quad-directional spatial RNN (SRNN) layer is first employed
2) Then, a bi-directional temporal RNN (TRNN) layer is further stacked on SRNN to capture long-term temporal dependencies |
Yes |
Not clear...
(to be reviewed) |
SRNN: 1
TRNN: 1 |
2 |
ReLU
Sigmoid |
N/M |
N/M |
|
|
3
(Softmax) |
N/M |
N/M |
N/M |
Back Propagation
Through Time
(BPTT) |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Cross-entropy |
Inter |
No |
No |
Train: 9 Sessions
Test: 6 Sessions |
Accuracy |
accuracy |
N/M |
|
N/M |
Accuracy: 89.5% |
None |
None |
No |
No |
No |
A multidirection SRNN layer and a bi-direction TRNN layer are hierarchically employed to learn spatial and temporal dependencies layer by layer. To adapt the multichannel EEG signals to the proposed STRNN framework, the spatial scanning order of electrodes is specified by spatial coordinates and temporal variation information is involved by slicing a window on the extracted DE feature sequences. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Zhang2018 |
56 |
|
Individual Recognition in Schizophrenia using Deep Learning Methods with Random Forest and Voting Classifiers: Insights from Resting State EEG Streams |
2018 |
Chu, Qiu, Liu, Ling, Zhang & Wang |
IEEE Transactions on Neural Systems and Rehabilitation Engineering |
Yes |
Journal |
Big Data and AI Research Center of Shanghai Jiaotong University |
China |
7 |
|
Classification of EEG signals |
Clinical |
Schizophrenia |
Detection |
New Approach |
Using Random Forest and Voting Classifiers with a CNN for Individual Recognition in Schizophrenia |
Resting State, Eyes Open. (300s each) |
Automatic feature extraction |
|
(BrainProducts) |
Raw EEG |
Internal Recordings |
Private |
120 Subjects x 300 seconds |
360000 |
600 |
120 |
64 |
1000 |
Offline |
|
1) Ocular correction (with Brain Vision Analyzer's algorithms)
2) Re-referenced to common average
3) Band-pass filter (IIR): 0.01 - 50Hz |
Yes |
Yes* |
Yes |
1) Raw EEG
2) Freq Bands |
Raw EEG |
Divide by max |
|
N/M |
CNN, RNN, and MLP |
CNN |
3 Conv Layers, ELU, 3 Dropout 0.5, 3 Max Pooling + Dropout 0.25, 3 FCs, 1 voting (RF, softmax or SVM) |
N/M |
Yes |
Not clear |
CNN: 6
MLP: 4
RNN: 2 |
6 |
ELU |
Dropout (0.5, 0.25) |
Yes |
3 |
High risk
Schizophrenia
Healthy |
3 Classes
(replaced Softmax with Random Forest) |
N/M |
Standard |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
No |
N/M |
Inter |
Yes |
Yes (no detail) |
Train: 50%
Test: 50% |
Accuracy |
accuracy |
NVIDIA GeForce GTX 750 |
|
N/M |
FES (first-episode schizophrenia): 96.7%
CHR (clinical high risk): 81.6%
HC (healthy controls): 99.2% |
ANNV, RNNV, CNNV,
ANNV+mSVM, RNN+mSVM, CNN+mSVM,
ANN+RF, RNN+RF, CNN+RF |
DL |
No |
No |
No |
"In conclusion, we have shown that CNNV-RF performs better than softmax and CNNV-mSVM on a well-known dataset (mnist) and resting state EEG streams used in this paper. Switching from softmax or mSVM to RF is incredibly simple and appears ro be helpful for classification problems." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Chu2017 |
57 |
|
An EEG-based Image Annotation System |
2018 |
Parekh, Subramanian, Roy & Jawahar |
National Conference on Computer Vision, Pattern Recognition, Image Processing, and Graphics |
Yes |
Conference |
IIIT Hyderabad, India
University of Glasgow, Singapore
National Brain Research Centre, Manesar, India |
India |
11 |
|
Classification of EEG signals |
BCI |
Reactive |
RSVP |
Novel Approach: Image classification based on subject's P300 |
Using CNN (EEGNet) to classify images based on P300. RSVP with Oddball. |
RSVP. Images from Caltech101 and VOC 2012.
Oddball Paradigm for P300 |
Not mentioned why DL... |
|
EPOC (Emotiv) |
RSVP
P300 |
Internal Recordings |
Private |
5 subjects x 3 sessions x 2 (test/train) x 6 min
25 blocks of 100 images (2x test/train) x 100ms/image
(1s windows) |
7500 |
125 |
5 |
14 |
128 |
|
|
1) Baseline power removal using the 0.5 second pre-stimulus samples
2) Band-Pass filter: 0.1 - 45 Hz
3) ICA to remove artifacts
(eye-blinks, and eye and muscle movements) |
Yes |
Yes |
Yes |
P300 |
Other |
N/A |
|
Braindecode |
CNN
(EEGNet) |
CNN |
They add an outlier-removal step.
They use a pre-trained VGG-16 on the predicted target images
(to reduce false positives due to class imbalance) |
see EEGNet & Braindecode |
No |
1s Windows
(Raw EEG) |
3 |
3 |
ELU |
N/M
(see Braindecode / EEGNet) |
N/M |
|
|
2
Target / Non-Target |
N/M |
N/M |
N/M |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Categorical Cross-Entropy |
Inter |
5-Fold CV |
k-fold |
Train: 2500 images / subject
Test: 2500 images / subject |
F1-Score
(Due to a heavy class imbalance between T/non-T, we use F1-score) |
f1-score |
NVIDIA GEFORCE
GTX 1080 Ti |
|
N/M |
[DS: CT101] Before outliers removal: F1: 0.71 Precision: 0.66 Recall: 0.81
[DS: CT101] After outliers removal: F1: 0.68 Precision: 0.63 Recall: 0.72
[DS: VOC2012] Before outliers removal: F1: 0.88 Precision: 0.99 Recall: 0.81
[DS: VOC2012] After outliers removal: F1: 0.83 Precision: 0.97 Recall: 0.72 |
None |
None |
No |
No |
No |
Our annotation system exclusively relies on the P300 ERP signature, which is elicited upon the viewer detecting a pre-specified object class in the displayed image. A further outlier removal procedure based on binary feature-based clustering significantly improves annotation performance. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Parekh2018 |
58 |
|
EEGNet: A Compact Convolutional Neural Network for EEG-based Brain-Computer Interfaces |
2018 |
Lawhern, Solon, Waytowich, Gordon, Hung & Lance |
Journal of Neural Engineering |
Yes |
Journal |
U.S. Army Lab, DCS Corporation, Columbia University, Georgetown University Medical Center |
USA |
17 |
|
Classification of EEG signals |
BCI |
Active & Reactive |
MI & ERP |
Novel Approach: DN that can be used for different BCI paradigms |
Compare EEGNet with SOTA ML for different BCI Paradigms |
Visual P300
ERN
Movement-related cortical potentials
Sensory Motor Rhythms |
Allows robust feature extraction |
|
ActiveTwo (BioSemi), N/M, ActiveTwo (BioSemi), N/M |
1) P300
2) ERN
3) Movement-related cortical potentials
4) SMR |
Internal Recordings;
Kaggle: Inria BCI challenge;
Internal Recordings;
BCI Competition IV - IIa |
Both |
P300: 15 subject x 2000 trials [1s windows]
ERN: 26 subjects x 340 trials [1.25s windows]
MRCP: 13 subjects x 1100 trials [1.5s windows]
SMR: 9 subjects x 288 trials [3s windows] |
30000;
8840;
14300;
2592 |
500;
184.2;
357.5;
129.6 |
15;
26;
13;
9 |
64;
56;
64;
22 |
512;
600;
1024;
250 |
Offline |
|
1) Rereferencing (linked mastoids or earlobes)
2) Bandpass filter (1 - 40 Hz, 0.1-40 Hz or 4-40 Hz)
3) Downsampled to 128 Hz
(** Different approaches! e.g. Used PREP Pipeline for #3) |
Yes |
No |
No |
Raw EEG |
Raw EEG |
Exponential moving average |
|
Keras + Tensorflow |
CNN |
CNN |
Layer 1: 1D Temporal Filters
Layer 2: Depthwise 2D Conv
Layer 3: Separable 2D Conv
(illustrative sketch after this entry) |
1D temp. conv. at L1 to learn frequency filters.
Depthwise: Inspired in part by the Filter-Bank Common Spatial Pattern (FBCSP) algorithm.
Separable: explicitly decoupling the relationship within and across feature maps by first learning a kernel summarizing each feature map individually, then optimally merging the outputs afterwards |
Yes |
Channels x Time |
3 |
3 |
ELU |
Dropout, weight decay |
Yes |
P300: 2
ERN: 2
MRCP: 2
SMR: 4 |
P300: Target/Non-target
ERN: Error/No error
MRCP: Left/Right hand
SMR: left hand/right hand/feet/tongue |
(depends on the task)
(softmax) |
1) 1,066
2) 1,082
3) 1,098
4) 796 |
Within-Subject and Cross-Subject.
Class weights are applied to the loss function whenever classes are imbalanced |
Standard |
Adam |
Adam |
N/M |
64 |
N/M |
N/M |
N/M |
Categorical cross-entropy
+ Class weight if unbalanced |
Both |
Intrasubject: 4-Fold CV
Intersubject: Leave some subjects out
(different number of folds and ratios per task) |
k-fold;
Leave-N-Subjects-Out |
[Intra] Train: 50%
[Intra] Valid: 25%
[Intra] Test: 25%
[Inter] Different ratios of subjects for training, for validation and for test |
Accuracy
ROC AUC |
accuracy, ROC AUC |
NVidia Quadro M6000 |
|
N/M |
See paper for full breakdown. In short:
EEGNet neither outperforms nor underperforms the baselines by a large margin
(however, it uses two orders of magnitude fewer parameters) |
DeepConvNet (Schirrmeister, 2017)
ShallowConvNet (Schirrmeister, 2017)
Riemannian EEG (Barachant, 2015)
FBCSP |
DL & Trad. |
Repeated-measures ANOVA |
1) Summarizing averaged outputs of hidden unit activations.
2) Visualizing the convolutional kernel weights.
3) Calculating single-trial feature relevance on the classification decision
Also used DeepLIFT (Shrikumar 2017) |
Analysis of activations, Analysis of weights, Ablation of filters, DeepLIFT |
In this work we proposed EEGNet, a compact convolutional neural network for EEG-based BCIs that can generalize across different BCI paradigms (e.g. ERP and oscillatory-based) in the presence of limited data and can produce interpretable features. To the best of our knowledge, this represents the first work that has validated the use of a single network architecture across multiple BCI datasets, each with their own feature characteristics and data set sizes. Through the use of feature visualization and ablation analysis, we show that neurophysiologically interpretable features can be extracted from the EEGNet model |
N/M |
Yes |
GitHub |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Lawhern2018 |
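
Below is a minimal tf.keras sketch of the three EEGNet-style blocks summarized in this entry (1D temporal convolution, depthwise spatial convolution, separable convolution, ELU, dropout). Filter counts, kernel lengths, and pooling sizes are illustrative placeholders, not necessarily the paper's exact values.

import tensorflow as tf
from tensorflow.keras import layers, models

n_channels, n_samples, n_classes = 64, 128, 4

model = models.Sequential([
    layers.Input(shape=(n_channels, n_samples, 1)),
    # Block 1: 1D temporal convolution -- learns frequency filters
    layers.Conv2D(8, (1, 64), padding='same', use_bias=False),
    layers.BatchNormalization(),
    # Depthwise convolution across electrodes -- one spatial filter per
    # temporal filter, the FBCSP-inspired step described above
    layers.DepthwiseConv2D((n_channels, 1), depth_multiplier=2, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation('elu'),
    layers.AveragePooling2D((1, 4)),
    layers.Dropout(0.5),
    # Block 2: separable convolution -- decouples relationships within and
    # across feature maps
    layers.SeparableConv2D(16, (1, 16), padding='same', use_bias=False),
    layers.BatchNormalization(),
    layers.Activation('elu'),
    layers.AveragePooling2D((1, 8)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(n_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
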
59 |
|
A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series |
2018 |
Chambon, Galtier, Arnal, Wainrib & Gramfort |
IEEE Transactions on Neural Systems and Rehabilitation Engineering |
Yes |
Journal |
Telecom ParisTech, Inria, Université Paris-Saclay |
France |
12 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve State-of-the-Art |
|
Sleep |
|
|
N/M |
Sleep events |
MASS |
Public |
MASS (61 out of 62)
61 nights (8h) each from diff subjects
(30s windows, no overlap) |
58560 |
29280 |
61 |
20 |
128 |
|
|
1) Low-pass @30Hz |
Yes |
N/M |
N/M |
Raw EEG + EOG and raw EMG |
Raw EEG |
z-score |
|
Keras + Tensorflow |
ConvNet |
CNN |
3 conv layers + dense (per modality) |
Layer 1: spatial filter
Layers 2, 3: temporal filters
(illustrative sketch after this entry) |
Yes |
Nb channels * 30 s |
4 |
4 |
Linear, ReLU, Softmax |
Dropout: 25% (last layer) |
Yes |
|
|
5 |
<10^5 |
1) Training on a single 30-s epoch
2) Freezing net, and train last layer on multi-epochs |
Pre-training |
Adam |
Adam |
|
|
Random search with the hyperopt Python package |
Yes |
No |
Categorical cross-entropy |
Inter |
Leave-p-subject-out
5 random permutations |
Leave-N-Subjects-Out |
Train: 41 records
Valid: 10 records
Test: 10 records |
Balanced accuracy
F1-score, Precision, Sensitivity, Specificity, Confusion matrix |
balanced accuracy, f1-score, precision, sensitivity, specificity, confusion matrix |
N/M* |
|
~250 s |
Acc: ~80%
Bal. acc.: ~80%
Kappa: ~0.7
F1 score: ~0.71 |
Gradient boosting on time domain and freq. domain features
Univariate ConvNets from Tsinalis et al. (2016) and Supratak et al. (2017)
|
DL & Trad. |
No |
Occlusion sensitivity |
Occlusion of input |
1D convolution provided a speed-up vs. 2D convolutions
Smaller number of parameters than other studies
Temporal context helps for some classes, but not for others; recurrent architectures could help
Size of dataset matters |
|
No |
N/A |
Yes |
|
Hubert Banville |
TBR |
TBC |
|
Chambon2018 |
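
A minimal tf.keras sketch of the per-modality structure described in this entry (a linear spatial filter followed by temporal convolutions, 25% dropout before the output); kernel and pooling sizes here are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

n_channels, n_times, n_classes = 20, 3840, 5   # e.g. 30 s of EEG at 128 Hz

model = models.Sequential([
    layers.Input(shape=(n_channels, n_times, 1)),
    # linear spatial filter: mixes the electrodes, no nonlinearity
    layers.Conv2D(n_channels, (n_channels, 1), activation=None),
    # two temporal convolution blocks
    layers.Conv2D(8, (1, 64), activation='relu', padding='same'),
    layers.MaxPooling2D((1, 16)),
    layers.Conv2D(8, (1, 64), activation='relu', padding='same'),
    layers.MaxPooling2D((1, 16)),
    layers.Flatten(),
    layers.Dropout(0.25),                      # 25% dropout on the last layer
    layers.Dense(n_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
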
60 |
|
Converting Your Thoughts to Texts: Enabling Brain Typing via Deep Feature Learning of EEG Signals |
2018 |
Zhang, Yao, Sheng, Kanhere, Gu, Zhang |
IEEE International Conference on Pervasive Computing and Communications (PerCom) |
Yes |
Conference |
University of New South Wales
Macquarie University
RMIT University |
Australia |
10 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA |
Joint CNN & LSTM + AE for Motor Imagery (5 classes) |
Motor Imagery (5 classes)
(see eegmmidb dataset) |
EEG processing is time-consuming and depends on human expertise.
SOTA models achieve 70-80% accuracy, which is not enough.
|
|
N/M, EPOC (Emotiv) |
Motor Imagery |
eegmmidb;
Internal Recordings |
Public |
1) eegmmidb: 28,000 samples x 10 subjects
(28000 points @ 160Hz = 175s/subject)
2) Internal (Emotiv): 34,560 samples x 7 subjects
(34560 points @ 128Hz = 270s/subject)
(window length = 1 point) |
28000;
34560 |
29.2;
31.5 |
10;
7 |
64;
14 |
160;
128 |
Offline and Online |
|
N/M |
N/M |
No |
No |
Raw EEG
(None) |
Raw EEG |
N/M |
|
N/M |
CNN + LSTM + linear AE
+ XGB (classification) |
Other |
CNN & LSTM are parallel, then combined for the AE then XGB classifier |
CNN for Spatial and RNN for Sequential info |
Yes |
1 x 64
(sample x channels) |
LSTM: 6 layers
CNN: 2 Conv + 2 FC |
6 |
ReLU, Sigmoid, tanh |
L2 |
Yes |
5 |
eegmmidb: eye closed, left hand, right hand, both hands, both feet
emotiv: up arrow, down arrow, left arrow, right arrow, eye closed |
5 |
N/M |
Standard optimization |
Standard |
LSTM & CNN: Adam
AE: RMSProp |
Adam |
Full table of optimizer parameters given in the paper |
7000 |
N/M
(they tried many configurations, apparently manually) |
N/M |
N/M |
LSTM + CNN: Cross-Entropy
AE: MSE |
Intra |
No |
No |
Train: 75%
Test: 25% |
accuracy, precision, recall, F1 score, ROC curve, and ROC AUC |
accuracy, precision, recall, f1-score, ROC, ROC AUC |
N/M |
|
2000 s |
DS #1 - Accuracy: 0.955
DS #2 - Accuracy: 0.9427 |
Baselines: KNN, SVM, RF, LDA, AdaBoost, RNN, CNN
Externals: Almoari, Sun, Mohammad, Major, Shenoy, Tonic, Rashid, Ward, Sita, Pinheiro.
(all different papers, see Table IV) |
DL & Trad. |
No |
No |
No |
The classification accuracy on the public dataset (eegmmidb) is consistently higher than on the local real-world dataset (emotiv). Our future work will focus on improving accuracy in the person-independent scenario, where some subjects participate in training and the remaining subjects are used for testing. |
N/M |
Yes |
GitHub |
No |
|
Yannick Roy |
Hubert Banville |
Yes |
|
Zhang2017g |
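
A rough sketch of the parallel CNN/LSTM feature extractor feeding a linear autoencoder, with the bottleneck code then passed to XGBoost, as described in this entry. Layer sizes and the 8x8 electrode reshape are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

inp = layers.Input(shape=(1, 64))                  # one time point x 64 channels
rnn = layers.LSTM(64)(inp)                         # sequential branch
cnn = layers.Reshape((8, 8, 1))(inp)               # spatial branch over an electrode grid
cnn = layers.Conv2D(16, (3, 3), padding='same', activation='relu')(cnn)
cnn = layers.Flatten()(cnn)
joint = layers.Concatenate()([rnn, cnn])           # combine both branches

code = layers.Dense(32, activation=None)(joint)    # linear autoencoder bottleneck
recon = layers.Dense(int(joint.shape[-1]), activation=None)(code)

autoencoder = models.Model(inp, recon)
autoencoder.compile(optimizer='rmsprop', loss='mse')   # AE trained with RMSProp / MSE

encoder = models.Model(inp, code)
# codes = encoder.predict(X)
# xgboost.XGBClassifier().fit(codes, y)   # final 5-class decision
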
61 |
|
MindID: Person Identification from Brain Waves through Attention-based Recurrent Neural Network |
2018 |
Zhang, Yao, Kanhere, Liu, Gu & Chen |
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies |
Yes |
Conference |
University of New South Wales, Australia
Tsinghua University
RMIT University, Australia |
Australia |
20 |
|
Classification of EEG signals |
Personal trait/attribute |
Person identification |
|
Improve SOTA: EEG for Person Identification |
Use RNN on EEG for Person Identification |
3 Different Datasets.
(they claim that Delta has the most personal info) |
The DL motivation is not clear. They want to improve SOTA. |
|
EPOC (Emotiv), N/M |
Delta Band* |
Internal Recordings;
eegmmidb |
Both |
DS1 (EID-M): 21,000 Samples/Subject. Total: 168,000
DS2 (EID-S): 7,000 Samples/Subject. Total: 56,000
DS3 (EEG-S): 8 subjects x 7,000 samples
(window length = 1 point) |
168000;
56000;
56000 |
21.9;
7.3;
5.8 |
8;
8;
8 |
14;
14;
64 |
128;
128;
160 |
|
|
1) Remove DC Offset (subtract)
2) Band-Pass Filter: 0.5 - 4Hz (using only Delta) |
Yes |
No |
No |
Delta Band |
Frequency-domain |
z-score |
|
Matlab |
Attention-based Encoder-Decoder RNN
+ XGB Classifier |
RNN |
Encoder, Decoder, Attention Module
+ XGB Classifier |
N/M |
Yes |
1x14
Delta Bands / Channel
(not clear about the dimensionality) |
Encoder:
3 FC (164) + 1 LSTM (164)
Decoder:
1 FC (164) |
4 |
N/M |
L2 |
Yes |
|
|
8
One-Hot Label
(ID - 8 Subjects) |
N/M |
N/M |
N/M |
Adam |
Adam |
LR: |
21,000 samples
(?) |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
No |
No |
Train / Test
DS1: 147,000 / 21,000
DS2: 49,000 / 7,000
DS3: 49,000 / 7,000 |
Precision
Recall
F1-Score |
precision, recall, f1-score |
Nvidia Titan X Pascal
768G memory
145 TB PCIe SSD |
|
N/M |
Precision | Recall | F1-Score
DS #1: 0.982 | 0.982 | 0.982
DS #2: 0.988 | 0.988 | 0.988
DS #3: 0.999 | 0.999 | 0.999 |
SVM, RF, KNN, AdaBoost, LDA, XGB, RNN |
DL & Trad. |
No |
No |
No |
Moreover, the pre-trained model should be updated for a period of time since the user’s EEG data is gradually changed with the environmental factors such as age, mental state, and living style. One of our future work is to develop an online learning system which is enabled to automatically update the training dataset based on the testing data which is collected during the operating period. |
Limited by the local experimental conditions, our study only gathered EEG data from 8 subjects with few trials. The dataset is only divided into two categories (Multi and Single), which is not enough to explore the change trend of the identification accuracy with the increase of data trials. |
Yes |
GDrive |
Yes |
|
Yannick Roy |
TBR |
Yes |
|
Zhang2017e |
63 |
|
A convolutional neural network for steady state visual evoked potential classification under ambulatory environment |
2017 |
Kwak, Muller & Lee |
PLOS One |
No |
Journal |
Korea University, TU Berlin |
South Korea |
|
|
Classification of EEG signals |
BCI |
Reactive |
SSVEP |
Improve SOTA |
Improve robustness of SSVEP BCIs for exoskeleton control in ambulatory conditions |
SSVEP |
- |
|
MOVE (BrainProducts) |
SSVEP |
Internal Recordings |
Private |
2 datasets (50x5s + 250x5s) x 7 subjects
5s trial into 300 x 2s trials
(2s sliding window, 10ms shift size) |
630000 |
175 |
7 |
8 |
1000 |
Offline |
|
1) Notch filter @60Hz
2) Band pass from 4-40 Hz |
Yes |
No |
No |
120 FFT bins from 5-35 Hz |
Frequency-domain |
min-max |
|
N/M |
1,2) CNN
3) MLP |
CNN |
1) CNN (3 layers)
2) CNN (4 layers)
3) MLP (3 layers) |
First conv layer: spatial filter
Second conv layer: spectral filter
(illustrative sketch after this entry) |
Yes |
120 x 8
Freq x channels |
1) 3
2) 4
3) 3 |
4 |
Sigmoid |
N/M |
N/M |
5 |
Walk Forward
Turn Left
Turn Right
Stand Up
Sit Down |
5 |
N/M |
Standard optimization |
Standard |
SGD |
SGD |
Learning rate: 0.1 |
N/M |
N/M |
N/M |
Sliding Window
(Shifted by [10-60ms] over 2s win) |
N/M |
Intra |
10-Fold |
k-fold |
Train: 90%
Test: 10%
Chronological split |
Accuracy |
accuracy |
N/M |
|
N/M |
Static condition: up to 99.28%
Ambulatory condition: up to 94.03% |
CCA, MSI, CCA + kNN |
Traditional pipeline |
Yes (not clear what method) |
Visualization of activations |
Analysis of activations |
CNN-1 (3 layers) was the most robust.
Since the architecture is fairly simple, no regularization is used. |
Artefacts in ambulatory settings |
No |
N/A |
No |
|
Hubert Banville |
Yannick Roy |
TBC |
|
kwak2017 |
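
A small tf.keras sketch matching the CNN-1 description in this entry (a spatial filter across the 8 channels, then a spectral filter along the 120 FFT bins, sigmoid units, SGD with learning rate 0.1). Filter counts and the spectral kernel length are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(120, 8, 1)),                  # FFT bins x channels
    layers.Conv2D(4, (1, 8), activation='sigmoid'),   # spatial filter across channels
    layers.Conv2D(4, (10, 1), activation='sigmoid'),  # spectral filter along frequency
    layers.Flatten(),
    layers.Dense(5, activation='softmax'),            # 5 exoskeleton commands
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss='categorical_crossentropy')
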
64 |
|
Mental Tasks Classification using EEG signal, Discrete Wavelet Transform and Neural Network |
2017 |
Padmanabh, Shastri & Biradar |
Discovery |
No |
Journal |
Savitribai Phule Pune University |
India |
|
|
Classification of EEG signals |
BCI |
Active |
Mental tasks |
[Classification of 5 different mental tasks, via Wavelet & ANNs (PNN & MLP)] |
|
5 Mental Tasks (Baseline, Multiplication, Rotation, Counting, Letter composition) |
|
|
7P511 (Grass Instruments) |
|
Keirn & Aunon (1989) |
Public |
5 subjects x 5 tasks x 5 trials x 10s @250Hz
(1s windows) |
1250 |
20.8 |
5 |
6 |
250 |
|
|
1) Band-Pass filter: 0.1-100Hz |
Yes |
N/M |
N/M |
|
Frequency-domain |
|
|
MATLAB & NNtool |
MLP
PNN |
FC |
|
|
No |
200x1 |
2
(20; 15) |
2 |
|
N/M |
N/M |
|
|
|
|
|
Standard |
N/M |
N/M |
Learning Rate: 0.9 |
|
N/M |
N/M |
No |
MSE |
N/M |
N/M |
No |
N/M |
accuracy |
accuracy |
N/M |
|
N/M |
MLP: 92%
PNN: 100% |
None |
None |
No |
No |
No |
|
|
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Padmanabh2017 |
65 |
|
Cross-session classification of mental workload levels using EEG and an adaptive deep learning model |
2017 |
Yin & Zhang |
Biomedical Signal Processing and Control |
No |
Journal |
University of Shanghai for Science and Technology
East China University of Science and Technology |
China |
|
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
New approach |
|
ACAMS (Automation-enhanced Cabin Air Management System) |
|
|
(Nihon Kohden) |
PSD |
Internal Recordings |
Private |
7 subjects x (5min + 6x15min + 5min) x 2 sessions
(2s windows, no overlap) |
42000 |
1400 |
7 |
11 |
500 |
|
|
1) Low-Pass filter: 40Hz
2) ICA for EOG artifacts |
Yes |
Yes |
Yes |
PSD
Avg. Power: T (5–7.5 Hz), A (8–13.5 Hz), B1 (14–20 Hz),
B2 (20.5–30 Hz), G (30.5–40 Hz) |
Frequency-domain |
|
|
MATLAB |
SDAE |
AE |
Adaptive Stacked Denoising AutoEncoder (illustrative sketch after this entry) |
|
Yes |
55x1
EEG PSD Features |
6 |
6 |
|
N/M |
N/M |
|
|
2 |
N/M* |
|
Other |
N/M |
N/M |
|
|
Grid search |
Yes |
Gaussian noise
on Freq features |
N/M* |
Intra |
N/M |
No |
Train: 66%
Test: 33% |
accuracy, confusion matrix, sensitivity, specificity |
accuracy, confusion matrix, sensitivity, specificity |
|
|
N/M* |
SDAE outperforms the state of the art
(evaluation is involved; see paper for details) |
ANN, NB, kNN, SVMlin, SVMrbf, BSV, SDAE |
DL & Trad. |
Wilcoxon sign-rank test |
3D scatter plots of layer activations |
Analysis of activations |
It is evident that the proposed method is superior to those shallow and static classifiers when the comprehensive cortical information is adopted as the network inputs. |
|
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Yin2017a |
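
A minimal sketch of one denoising-autoencoder layer of the kind stacked in the SDAE above, with Gaussian noise injected on the PSD features as this entry describes; the hidden size and noise level are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 55                                  # PSD feature vector per 2 s window

inp = layers.Input(shape=(n_features,))
noisy = layers.GaussianNoise(0.1)(inp)           # corruption applied during training only
hidden = layers.Dense(32, activation='sigmoid')(noisy)
recon = layers.Dense(n_features)(hidden)

dae = models.Model(inp, recon)
dae.compile(optimizer='adam', loss='mse')
# dae.fit(X, X, ...)  -- reconstruct the clean features; stack several such
# layers, then fine-tune the stack with a 2-class softmax head.
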
66 |
|
Generative Adversarial Networks Conditioned by Brain Signals |
2017 |
Palazzo, Spampinato, Kavasidis, Giordano & Shah |
ICCV |
No |
Conference |
University of Catania, University of Central Florida |
Italy |
9 |
|
Generation of data |
Generating images conditioned on EEG |
|
|
New approach: generating images conditioned on EEG |
Generating images using GANs conditioned by EEG representation |
Visual presentation of images |
Allows image generation |
|
BrainAmp (BrainProducts) |
Raw EEG |
Internal Recordings |
Private |
6 Subjects x 50 images x 40 classes
1400s per subject (4 sessions of 350s)
2000 images x 6 subjects = 12000
minus exclusions = 11,466 valid samples |
11466 |
140 |
6 |
128 |
1000 |
|
|
1) Hardware notch filter: 49-51 Hz
2) Band-pass filter: 14-70 Hz
3) Non-uniform quantization of the voltage values |
Yes |
N/M |
N/M |
Raw EEG |
Raw EEG |
N/M |
|
N/M* |
1) LSTM for EEG encoder
2) DCGAN for image generation |
Other |
Conditional DCGAN (conditioning G and D) |
N/M |
Yes |
Nb channels * 0.5 s |
1) 2
2) 5 (generator), 6 (discriminator) |
6 |
1) ReLU
2) ReLU |
N/M |
N/M |
|
|
1) 40
2) 64 x 64 |
N/M |
1) Train encoder to predict image category from raw EEG
2a) Train GAN on images without EEG features
2b) Train GAN conditioned on average (across subs) EEG representation learned by the encoder |
Other |
Adam (lr=0.001) |
Adam |
1) N/M
2) Batch normalization |
1) 16
2) N/M |
N/M |
N/M |
1) Nothing on EEG
2) Resizing + Flipping on Images |
1) categorical cross-entropy.
2) non-saturating |
Inter |
N/M;
[TBD] |
No |
Train: 80%
Valid: 10%
Test: 10% |
1) Accuracy
2) Inception score, Inception accuracy |
accuracy, inception score, inception accuracy |
2 Titan X Pascal |
|
N/M |
Encoder: 83.9%
GAN: IS: 4-6.5, acc: 43% |
No |
None |
No |
No |
No |
Conditioning vectors (i.e., EEG representations) are noisy, which makes it harder to learn an appropriate conditioning.
|
Suffers from classes with high internal variability
Dataset is small |
No |
N/A |
No |
|
Isabela Albuquerque |
Hubert Banville |
TBC |
|
Palazzo2017 |
67 |
|
The effects of pre-filtering and individualizing components for electroencephalography neural network classification |
2017 |
Major & Conrad |
IEEE SoutheastCon |
No |
Conference |
University of North Carolina (Charlotte) |
USA |
6 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve State-of-the-Art: Exploring impact of ICA preprocessing |
Analyze effectiveness of using ICA to enhance EEG that will be processed by a neural network |
Motor imagery |
"Since every brain computer interface (BCI) has to be tailored for each person it is advantageous to use a neural network" |
|
N/M |
Raw EEG |
eegmmidb |
Public |
109 subjects x 14 experiments (12x2min + 2x1min)
Not clear how many samples they used...
(??s windows) |
N/M |
2834 |
109 |
64 |
160 |
Offline |
|
1) Band pass filter: 8-30Hz |
Yes |
Yes |
Yes |
Raw EEG |
Raw EEG |
N/M |
|
Matlab |
MLP |
FC |
N/M |
N/M |
Yes |
16x ? |
10 |
10 |
N/M |
N/M |
N/M |
2 |
Left Grasp
Right Grasp |
2 |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Inter |
No |
No |
Train: 2/3
Test: 1/3 |
Accuracy |
accuracy |
N/M |
|
N/M |
With ICA: 68%
Without ICA: 56% |
No |
None |
No |
No |
No |
Applying ICA to raw data improves the neural network performance. |
N/M |
No |
N/A |
No |
|
Isabela Albuquerque |
Yannick Roy |
TBC |
|
Major2017 |
68 |
|
Convolutional neural network-based transfer learning and knowledge distillation using multi-subject data in motor imagery BCI |
2017 |
Sakhavi & Guan |
IEEE Conference on Neural Engineering |
No |
Conference |
NUS & NTU |
Singapore |
4 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Transfer learning (from one subject to another) |
Reduce calibration time in a BCI using transfer learning |
Motor imagery |
Reduce BCI's calibration time |
|
N/M |
Raw EEG |
BCI Competition IV - IIa |
Public |
BCI competition IV-2a dataset (4 out of 9 subjects)
x 4 classes x 72 samples x 2 sessions x 4 seconds.
After removing data: ∼ 1000 per class per session
(4s windows, no overlap) |
8000 |
153.6 |
9 |
22 |
250 |
|
|
1) Bandpass between 0.5-100 Hz
2) Notch filter @50 Hz |
Yes |
N/M |
N/M |
FBCSP in 9 frequency bands, then extracting envelope |
Frequency-domain |
non-standard z-scoring (see paper) |
|
Torch7 |
CNN + MLP |
CNN |
CNN: 5 layers
MLP: 1 layer |
- |
Yes |
CNN: 32x40
MLP: 32 |
CNN: 4 conv, 1 FC
MLP: 1 FC |
5 |
ReLU |
N/M |
N/M |
|
|
CNN: 128
MLP: 128 |
N/M |
1) Pre-train CNN+MLP on N-1 subjects
2) Fine-tune pre-trained network on 1 subject |
Pre-training |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
KL divergence |
Inter |
Leave-N-Samples-Out |
Leave-N-Samples-Out |
Train: 5, 10, 20 samples / class
Test: Remaining |
Test set accuracy |
accuracy |
N/M |
|
N/M |
Average acc: 69.71% |
SVM |
Traditional pipeline |
Wilcoxon sign-rank test |
No |
No |
Best results (average across subjects) show significant improvement with respect to SVM. However, there is high variability. |
Choosing hyperparameter lambda |
No |
N/A |
No |
|
Hubert Banville |
Isabela Albuquerque |
TBC |
|
Sakhavi2017 |
69 |
|
Single-trial EEG classification of motor imagery using deep convolutional neural networks |
2017 |
Tang, Li & Sun |
Optik - International Journal for Light and Electron Optics |
No |
Journal |
Zhejiang University of Technology |
China |
8 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
New Approach |
CNN for MI on Single Trial |
Motor Imagery |
Automated feature extraction |
|
ActiveTwo (BioSemi) |
SMR - ERD/ERS |
Internal Recordings |
Private |
2 subjects x 460 trials
3s epochs, split into 50ms windows
(50ms windows) |
55200 |
46 |
2 |
28 |
1000 |
Offline |
|
1) [Hardware] Notch Filter: 50Hz
2) [Hardware] Band-Pass Filter: 0.5-100Hz
3) [Software] Band-Pass Filter: 8-30Hz |
Yes |
No |
No |
SMR - ERD/ERS |
Frequency-domain |
N/M |
|
N/M |
CNN |
CNN |
Activation Function: Hyperbolic Tangent |
N/M |
Yes |
28x60
Channels x Time Points |
2 Conv
1 FC |
3 |
Tanh
Sigmoid |
N/M |
N/M |
2 |
Left hand
Right hand |
2 |
N/M |
Standard |
Standard |
GD |
SGD |
N/M |
N/M |
N/M |
N/M |
No |
N/M* |
Intra |
10-Fold CV |
k-fold |
Train: 80%
Test: 20% |
confusion matrix, accuracy, ROC, precision, recall, f-score |
confusion matrix, accuracy, ROC, precision, recall, f-score |
N/M |
|
N/M |
Accuracy: 86.41% |
Power+SVM
CSP+SVM
AR+SVM |
Traditional pipeline |
ANOVA |
No |
No |
"The results demonstrate that CNN can further improve classification performance compared with other three conventional methods." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Isabela Albuquerque |
TBC |
|
Tang2017 |
70 |
|
Pattern Recognition of Momentary Mental Workload Based on Multi-Channel Electrophysiological Data and Ensemble Convolutional Neural Networks |
2017 |
Zhang, Li & Wang |
Frontiers in Neuroscience |
No |
Journal |
East China University of Science and Technology |
China |
16 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
Improve State-of-the-Art |
MWL classification with CNN & ECNN |
ACAMS (Automation-enhanced Cabin Air Management System) |
N/M |
|
(Nihon Kohden) |
PSD |
Internal Recordings |
Private |
6 subjects x 2 sessions x 10 tasks x 5 min
(2s windows) |
18000 |
600 |
6 |
10 |
500 |
Offline |
|
1) Low-Pass filter: 40Hz |
Yes |
No |
No |
PSD (STFT)
Avg. Power: D (1-4Hz), T (5–8 Hz), A (9–13 Hz), B1 (14–16 Hz), B2 (17–30 Hz), G (31–40 Hz) |
Frequency-domain |
N/M |
|
Python
Matlab |
CNN
ECNN |
CNN |
Many architectures tested |
N/M |
Yes |
102x10
(not clear what x what) |
[2, 10]
(tested many) |
10 |
ReLU |
N/M |
N/M |
4 and 7 |
4 classes: Low / Normal / High / Unloaded; 7 classes: Unloaded / Very Low / Low / Medium / High / Very High / Overloaded |
4 and 7 |
N/M |
N/M |
N/M |
Nesterov Momentum
Adagrad
Adadelta
Adam |
Adam |
(see paper; each optimizer's parameters are described there) |
N/M |
N/M |
N/M |
N/M |
Cross-Entropy |
Inter |
5-Fold CV |
k-fold |
Train: 50%
Test: 50% |
Accuracy
Precision
F-Measure
G-Measure |
accuracy, precision, f-measure, g-measure |
Single Intel core i5 CPU, 4-GB memory, Windows |
|
N/M |
93% |
LDA
NB
SDA |
Traditional pipeline |
No |
No |
No |
"It was found that the deeper CNN model with the small convolutional kernels leads to improved classification performance."
[YR] --> Like in other fields... |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Zhang2017 |
71 |
|
Deep RNN learning for EEG based functional brain state inference |
2017 |
Patnaik, Moharkar & Chaudhari |
International Conference on Advances in Computing, Communication and Control (ICAC3) |
No |
Conference |
Xavier Institute of Engineering, Mahim, Mumbai
M G M Inst. of Health Sciences, Navi Mumbai
|
India |
|
|
Classification of EEG signals |
BCI |
Active |
Mental tasks |
New Approach: Brain State Inference with RNN using Alpha Phase Coherence |
|
5 Tasks: Baseline, Multiplications, Rotations, Letter Composition, Visual Counting (not using baseline) |
|
|
N/M |
ERD/ERS
(looking at Alpha Cross Coherence - Occipital/Center) |
Keirn & Aunon (1989) |
Public |
DB: 7 subjects x 10 sessions x 5 tasks x 10s
Then they say they used 65 instances for 4 activities.
Not clear...
(Sliding window of 50 samples) |
N/M |
33 |
7 |
6 |
250 |
|
|
1) Band-Pass Filter: 0.1-100Hz (Hardware)
2) ICA for EOG Artifacts
3) DWT to get Alpha Sub-Bands
4) Hilbert Transform (no-overlap) for Phase Coherence |
Yes |
Yes |
Yes |
Alpha Sub-Bands
Phase Coherence |
Frequency-domain |
|
|
N/M* |
Elman RNN
with bottleneck |
RNN |
A 5-layer network with 53-400 - 50-200-20-T |
|
Yes |
[Shape Not Mentioned] |
5 |
5 |
|
N/M |
N/M |
|
|
4 Classes |
N/M* |
|
Standard |
N/M* |
N/M |
|
|
N/M |
N/M |
No |
MSE |
Inter |
No |
No |
Train: 40 instances (/65)
Test: 25 instances (/65) |
Accuracy |
accuracy |
|
|
N/M* |
90% for two tasks
82% for three tasks
77% for all the four tasks |
No |
None |
No |
No |
No |
"In this research, a RNN model is trained to identify the phase coherence patterns of EEG alpha-bands. Difference between EEG signals from central and occipital (C1-O1 & C2- O2) locations is considered to compute phase coherence patterns for various activities." |
|
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Patnaik2017 |
72 |
|
Deep Convolutional Neural Networks for Interpretable Analysis of EEG Sleep Stage Scoring |
2017 |
Vilamala, Madsen & Hansen |
IEEE International Workshop on Machine Learning for Signal Processing |
Yes |
Conference |
Technical University of Denmark
Danish Research Centre for Magnetic Resonance |
Denmark |
|
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
New Approach: CNN for Sleep Stages |
|
Sleep |
|
|
N/M |
PSD |
Sleep EDF |
Public |
SleepEDF
2 whole nights x 20 subjects (2 x ~10h x 20)
30s windows |
48000 |
24000 |
20 |
1 |
100 |
|
|
Multitaper Spectral Estimation |
Yes |
N/M |
N/M |
Spectrogram log values
(from Multitaper Spectral Estim.) |
Frequency-domain |
|
|
N/M* |
CNN |
CNN |
VGGNET
Activation Function: ReLU & Softmax
Xavier’s initialisation. |
|
No |
224x224
(RGB Image) |
16 |
16 |
|
Dropout |
Yes |
|
|
5
(Sleep Stage) |
N/M* |
|
Pre-training |
Adam |
Adam |
Learning Rate: 10^-5
Mini-batch: 250
Decay Rate 1st & 2nd moments 0.9 & 0.999 |
|
N/M |
N/M |
No |
Categorical cross-entropy |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
Train: 15 subjects
Valid: 4 subjects
Test: 1 subjects |
Precision
Sensitivity
F1-score
Accuracy |
precision, sensitivity, f1-score, accuracy |
|
|
N/M* |
[VGG-FE] Precision: 91, Sensitivity: 73, F1-S: 81, Accuracy: 83
[VGG-FT] Precision: 93, Sensitivity: 78, F1-S: 84, Accuracy: 86 |
SSAE, CNN |
DL |
No |
Sensitivity maps |
Saliency map |
Further improvement of the method includes better hyperparameter optimisation when generating the spectral images |
|
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Vilamala2017 |
73 |
|
Deep long short-term memory structures model temporal dependencies improving cognitive workload estimation |
2017 |
Hefron, Borghetti, Christensen & Kabban |
Pattern Recognition Letters |
No |
Journal |
Air Force Institute
Air Force Research Laboratory |
USA |
9 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload |
Improve State-of-the-Art: MWL classification with RNNs (LSTM). |
|
Multi-Attribute Task Battery (MATB) environment |
Using deep RNNs to account for temporal dependence considerably improves day-to-day feature stationarity |
|
N/M |
PSD
(Raw EEG) |
Internal Recordings |
Private |
6 of 8 subjects x 5 sessions x 6 of 9 trials x 5 min
This process yielded 380 features for each second and approximately 9000 observations per individual for the five day period.
(10s sliding windows, 9s overlap) |
54000 |
900 |
6 |
19 |
256 |
|
|
The power spectral density was determined for 30 points spread out over a logspace from 3 Hz to 55 Hz by extracting power from complex Morlet wavelets [9]. Each wavelet was 2 s in length. |
Yes |
N/M |
N/M |
Mean, Variance, Skewness, Kurtosis of PSD (delta (1–4), theta (4–8), alpha (8–14), beta (15–30), and gamma (30–55)) + all possible combinations of M, V, S, K. |
Frequency-domain |
|
|
Keras, Theano |
LSTM |
RNN |
N/M* |
|
Yes |
600 x 30 x F
(batch size, temporal depth in seconds, and number of features)
(F varies between 90 and 380 features) |
2 LSTM Layers
(50 and 10 units; illustrative sketch after this entry) |
2 |
|
Dropout |
Yes |
|
|
1
(low or high WL) |
N/M* |
|
Standard |
Mini-batch gradient descent (600 obs. per batch)
Adam, Dropout 20% |
Adam |
|
600 |
Random search |
Yes |
No |
Binary Cross-Entropy |
Intra |
4-Fold CV |
k-fold |
Train: 3 days
Valid: 1 day
Test: 1 day |
Accuracy |
accuracy |
|
|
N/M* |
93% (using all measures: M/V/S/K)
|
linear SVM (SVM-L), Radial Basis Function (RBF) SVM (SVM-R), feedforward ANN (ANN), deeply stacked simple RNN (RNN-D), single LSTM (LSTM-S), and deeply stacked LSTM (LSTM-D) |
DL & Trad. |
ANOVA, Tukey HSD |
No |
No |
There is an abundance of future work to be pursued in this area. Due to time constraints and computational complexity, only a select number of deep architectures were examined during this research. A thorough evaluation of different deep RNN architectures to include variations in the depth of hidden layer recurrent connections, stacking of different sized LSTM layers, and interleaving fully-connected feedforward layers between sequence-to-sequence recurrent layers may yield additional improvement. |
|
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Hefron2017 |
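
A minimal tf.keras sketch of the stacked LSTM described in this entry (50- and 10-unit layers, 20% dropout, Adam, binary cross-entropy, one low/high workload output); the feature count is one of the configurations reported above.

import tensorflow as tf
from tensorflow.keras import layers, models

temporal_depth, n_features = 30, 90              # 30 s temporal depth x F features

model = models.Sequential([
    layers.Input(shape=(temporal_depth, n_features)),
    layers.LSTM(50, return_sequences=True),
    layers.Dropout(0.2),
    layers.LSTM(10),
    layers.Dense(1, activation='sigmoid'),       # low vs. high workload
])
model.compile(optimizer='adam', loss='binary_crossentropy')
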
74 |
|
The signature of robot action success in EEG signals of a human observer: Decoding and visualization using deep convolutional neural networks |
2017 |
Behncke, Schirrmeister, Burgard & Ball |
Arxiv |
Yes |
Preprint |
Albert-Ludwigs-University Freiburg
University Medical Center Freiburg |
Germany |
6 |
|
Classification of EEG signals |
BCI |
Reactive |
ERP |
Novel Approach: DL for Robot Error Detection |
Comparing CNN to rLDA and FB-CSP (both state of the art) for error detection in human-robot interaction |
Participant watching short videos of robots "performing naturalistic actions either in a correct or an erroneous manner" |
Deep Learning has been tried for other EEG decoding tasks |
|
N/M |
Error Potential |
Internal Recordings |
Private |
5 subjects x 720 trials + 12 subjects x 800 trials x ~20s/trials (using ~2.8s/trials) |
13200 |
616 |
17 |
128 |
N/M |
Offline |
|
1) Re-reference to common average (CAR)
2) Downsampled to 250 Hz |
Yes |
No |
No |
Raw EEG |
Raw EEG |
Electrode-wise exponential moving standardization (illustrative sketch after this entry) |
|
Braindecode |
CNN |
CNN |
Deep ConvNet from braindecode paper |
Layer 1: temporal filtering, Layer 2: spatial filtering, with no non-linearity in-between (Braindecode) |
No |
Time x channels |
5 |
5 |
ELU |
Dropout
Early stopping |
Yes |
2 |
Error
No error |
2 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
Categorical cross-entropy |
Intra |
No |
No |
N/M |
Accuracy |
accuracy |
N/M |
|
N/M |
KPO Error (2.5-5s): (78.2 ± 8.4) %
KPO Error (3.3-7.5s): (71.9 ± 7.6) %
RGO Error (4.8-6.3s): (59.6 ± 6.4) %
RGO Error (4-7s): (64.6 ± 6.1) % |
rLDA
FB-CSP
(CNN is better) |
Traditional pipeline |
Permutation test on individual decoding results
Wilcoxon signed-rank tests |
Correlation of changes in ConvNet predictions with perturbation changes in 1) input spectral amplitudes and 2) time domain signals to obtain information about what the deep ConvNets learned from the data |
Input-perturbation network-prediction correlation maps |
"Among other recent advances in the field of deep learning research, automatic hyperparameter optimization and architecture search, including recurrent and residual network architectures, data augmentation, using 3-D convolutions, or increasing the amount of training data all have the potential to further increase ConvNet performance." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
Hubert Banville |
TBC |
|
Behncke2017 |
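
A NumPy sketch of electrode-wise exponential moving standardization as used in this entry: each channel is standardized with exponentially weighted running estimates of its mean and variance. The decay factor is a placeholder, and Braindecode's exact recursion may differ slightly.

import numpy as np

def exp_moving_standardize(x, factor_new=1e-3, eps=1e-4):
    """Standardize an (n_channels, n_times) array with exponentially
    weighted running mean and variance, computed per electrode."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    mean = x[:, 0].copy()
    var = np.ones(x.shape[0])
    for t in range(x.shape[1]):
        mean = (1 - factor_new) * mean + factor_new * x[:, t]
        var = (1 - factor_new) * var + factor_new * (x[:, t] - mean) ** 2
        out[:, t] = (x[:, t] - mean) / np.maximum(np.sqrt(var), eps)
    return out
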
75 |
|
Deep learning with convolutional neural networks for EEG decoding and visualization |
2017 |
Schirrmeister, Springenberg, Fiederer, Glasstetter, Eggensperger, Tangermann, Hutter, Burgard, Ball |
Human Brain Mapping |
Yes |
Journal |
University of Freiburg |
Germany |
30 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA
Feature visualization/interpretability |
Find out best CNN architecture for EEG decoding |
Motor imagery/execution |
Can learn from raw data |
|
N/M |
None |
BCI Competition IV - IIa;
Internal Recordings;
BCI Competition IV - IIb;
Mixed Imagery Dataset |
Both |
DS #1 - BCI Comp IV - IIa: 9 * 2 * 288 = 5184 x 4s
DS #2 - Internal Recordings: 14 * 1000 = 14000 x 4s
DS #3 - BCI Competition IV - IIb: 9 * 720 = 6480 x 4s
DS #4 - Mixed Imagery Dataset: 4009 trials / 37830 w
(DS4: 2s window, 1.5s overlap) |
5184;
14000;
6480;
37830 |
345.6;
933.33;
432;
267 |
9;
14;
9;
4 |
22;
44;
3;
64 |
250;
250;
250;
250 |
|
|
BCI Competition Datasets:
1) Lowpass @38 Hz |
Yes |
Yes
(removed trials with at least one channel > 800 uV) |
Yes |
Raw EEG |
Raw EEG |
Electrode-wise exponential moving standardization |
|
Lasagne |
CNN |
CNN |
1) Deep ConvNet
2) Shallow ConvNet
3) Hybrid of 1) and 2) with 2 dense layers
4) ResNet |
1) Layer 1: temporal filtering, Layer 2: spatial filtering, with no non-linearity in-between (illustrative sketch after this entry)
2) Embedding FBCSP in a ConvNet
3) Combining 1 and 2
4) 2 layers like in 1) |
Yes |
|
1) 5
2) 2
3) max(2, 5) + 2 = 7
4) 31 |
31 |
1) ELU
2) Square, log
3) ELU, square & log
4) ELU |
Dropout (0.5)
Early stopping |
Yes |
|
|
2 or 4 |
N/M |
Standard optimization |
Standard |
Adam |
Adam |
Batch norm |
N/M |
N/M |
N/M |
Crops
(sliding windows within 1 trial) |
Categorical cross-entropy
For cropped training: "Tied loss function" |
Intra |
No |
No |
1) 288 - 288
2) 880 - 160
3) 400 - 320
4) Variable per subject |
Accuracy
Confusion matrices |
accuracy, confusion matrix |
Geforce GTX Titan Black
Intel Xeon @2.60 GHz with 32 cores
128 GB RAM |
|
N/M |
|
Filter bank common spatial patterns |
Traditional pipeline |
Wilcoxon sign-rank test |
Input-feature unit-output correlation maps (visualization of correlation between spectral bands and receptive fields)
Input-perturbation network-prediction correlation map (perturbing the input and visualizing change in output of net) |
Input-feature unit-output correlation maps, Input-perturbation network-prediction correlation maps |
ConvNets reached FBCSP accuracies
ConvNet design choices substantially affect decoding accuracies
Recent DL advances substantially increase accuracies
ResNet performed worse than deep ConvNet
Cropped training strategy improves performance on higher frequencies
And much more! |
ConvNets can be too flexible, especially if there is a specific type of brain activity that a user should use |
Yes |
GitHub |
Yes |
|
Hubert Banville |
TBR |
TBC |
|
Schirrmeister2017 |
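
A tf.keras sketch of the deep ConvNet's characteristic first stage described in this entry: a temporal convolution immediately followed by a spatial convolution across all electrodes, with no nonlinearity in between. Filter counts follow the deep ConvNet variant; the later blocks are truncated for brevity, so this is a sketch rather than the full model.

import tensorflow as tf
from tensorflow.keras import layers, models

n_channels, n_times, n_classes = 22, 1000, 4     # e.g. 4 s at 250 Hz

model = models.Sequential([
    layers.Input(shape=(n_channels, n_times, 1)),
    # temporal filtering (linear)
    layers.Conv2D(25, (1, 10), activation=None),
    # spatial filtering across all electrodes (linear, fused with the above)
    layers.Conv2D(25, (n_channels, 1), activation=None, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation('elu'),
    layers.MaxPooling2D((1, 3)),
    layers.Dropout(0.5),
    # ...two to three more conv-pool blocks (50, 100, 200 filters) in the full model
    layers.Conv2D(50, (1, 10), activation=None, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation('elu'),
    layers.MaxPooling2D((1, 3)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(n_classes, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
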
76 |
|
Optimal Feature Selection and Deep Learning Ensembles Method for Emotion Recognition From Human Brain EEG Sensors |
2017 |
Mehmood, Du & Lee |
IEEE Access |
No |
Journal |
Chonbuk National University, Nanjing University of Posts and Telecommunications |
South Korea |
10 |
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
Improve SOTA |
Ensemble Method with DL and others to improve SOTA in EEG emotion classification |
Watching "Emotional" Images from IAPS database. |
Using an ensemble approach. |
|
EPOC (Emotiv) |
Emotions |
Internal Recordings |
Private |
21 subjects x 4 classes x 2 sessions x 45 trials
360 epochs, 368s / session
(1.5s windows, no overlap) |
7560 |
189 |
21 |
14 |
128 |
|
|
1) Artifact Removal (cites: Gómez-Herrero et al., 2006)
2) Filtering (cites: Widmann et al., 2012)
3) Epoching |
Yes |
Yes |
Yes |
Hjorth parameters for different frequency ranges
+ ANOVA feature selection |
Frequency-domain |
N/M |
|
EEGLAB
Matlab
WEKA |
"Deep Learning"
(they don't even specify) |
N/M |
Ensemble: LDA, KNN, SVM, Naive/Bayes-Net, DT, RF, Deep Learning
The DL model itself is not described at all |
N/M |
Yes |
3 Hjorth params for each of the 5 frequencies
(?) |
N/M |
N/M |
N/M |
N/M |
N/M |
|
|
N/M |
N/M |
Pre-Training and Fine Tuning |
Pre-training |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
No |
N/M |
Inter |
10-Fold CV |
k-fold |
Train: 90%
Test: 10% |
Accuracy |
accuracy |
SOTA Server
4 TITAN-X (Pascal) |
|
N/M |
Accuracy: 76.62% |
Jirayucharoensak et al., 2014 (SAE): 46/50%
Chanel et al., 2006 (FDA, Naive Bayes): 72%
Khalili et al., 2008 (LDA, KNN): 61%
Horlings et al., 2008 (SVM): 37/32%
Jenke et al., 2014 (...): 45%
Yin et al., 2017 (SAE, Ensemble): 84/83%
Atkinson et al., 2016 (...): 73/73% |
DL & Trad. |
ANOVA |
No |
No |
Comparatively, the proposed method performs better than existing emotion recognition methods. The proposed feature selection method OF obtained the best emotion recognition rates of 76.6% for Voting ensembles method. Based on our results, we conclude that optimal feature selection is a good choice for enhancing the performance of EEG-based emotion recognition. |
To further improve emotion recognition performance, we need to explore additional feature combinations with more emotional classes in the arousal–valence domain. |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Mehmood2017 |
77 |
|
Emotion Recognition based on EEG using LSTM Recurrent Neural Network |
2017 |
Alhagry, Fahmy & El-Khoribi |
International Journal of Advanced Computer Science and Applications (IJACSA) |
No |
Journal |
Cairo University |
Egypt |
|
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
Improve SOTA: Using LSTM on raw EEG to classify emotions (arousal, valence, liking) |
|
Emotion Classification on DEAP
(Like or Dislike video) |
|
|
N/M |
Raw EEG |
DEAP |
Public |
DEAP: 32 subjects x 40 x 1min
12x 5s windows per video
(5s window, no overlap) |
15360 |
1280 |
32 |
32 |
512 |
|
|
1) Downsampled to 128Hz (in the dataset)
2) Re-reference to Common Average (in the dataset)
3) Eye Artifacts Removed (in the dataset)
4) High-Pass Filter [freq not mentioned] |
Yes |
Yes |
Yes |
Raw EEG
(None) |
Raw EEG |
|
|
Keras, TensorFlow |
LSTM |
RNN |
AF: ReLU and Sigmoid |
|
Yes |
5s segments x 32 channels
(672 x 32) |
2 LSTM Layers (64, 32) + 1 Dropout (0.2) + 1 FC (illustrative sketch after this entry) |
3 |
|
Dropout |
Yes |
|
|
3 Classes |
5534113 |
|
Standard |
RMSProp, LR:0.001 |
Other |
|
|
N/M |
N/M |
No |
N/M* |
Intra |
4-Fold CV |
k-fold |
Train: 75%
Test: 25% |
Average Accuracy |
accuracy |
|
|
N/M* |
Arousal: 85.65%
Valence: 85.45%
Liking: 87.99% |
Traditional pipelines
Koelstra et al., [2]: 62 | 56 | 55 %
Atkinson ... [3]: 73 | 73 | - %
Yoon and Chung [6]: 70 | 70 | - %
Naser and Saha [7]: 66 | 64 | 70 %
proposed method: 86 | 85 | 88 % |
Traditional pipeline |
No |
No |
No |
Results show that the proposed method is a very promising choice for emotion recognition, because of its powerful ability to learn features from raw data directly.
It achieves high average accuracy over participants compared to the traditional feature extraction techniques. |
|
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Alhagry2017 |
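
A minimal tf.keras sketch of the LSTM described in this entry (64- and 32-unit layers, 0.2 dropout, dense output, RMSProp at learning rate 0.001). A single sigmoid output for one emotion dimension (e.g. arousal low/high) is an assumption about how the three targets are handled.

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(672, 32)),               # 5 s segment x 32 channels
    layers.LSTM(64, return_sequences=True),
    layers.LSTM(32),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid'),       # one dimension, e.g. arousal
])
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001),
              loss='binary_crossentropy')
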
78 |
|
Intent Recognition in Smart Living Through Deep Recurrent Neural Networks |
2017 |
Zhang, Yao, Huang, Sheng & Wang |
International Conference on Neural Information Processing (ICONIP) |
Yes |
Conference |
University of New South Wales, AU
Macquarie University, AU
Singapore Management University, Singapore |
Australia |
11 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA |
Using LSTM on multiclass BCI open dataset
Use hyperparameter fine-tuning method |
Motor Imagery
(see eegmmidb dataset) |
Explore multiclass rather than binary classification like many others; BCIs at home will be multiclass. |
|
N/M |
Intent / Motor Imagery |
eegmmidb |
Public |
eegmmidb: 10 subjects x 28,000 samples
(28000 points @ 160Hz = 175s/subject)
(window length = 1 point) |
28000 |
29.2 |
10 |
64 |
160 |
|
|
None |
No |
N/M |
N/M |
Raw EEG
(None) |
Raw EEG |
N/A |
|
N/M |
LSTM |
RNN |
N/A |
N/M |
Yes |
1 x 64
(sample x channels) |
5 |
5 |
Sigmoid |
L2 |
Yes |
5 |
eegmmidb: eye closed, left hand, right hand, both hands, both feet
emotiv: up arrow, down arrow, left arrow, right arrow, eye closed |
5 |
N/M |
N/M |
N/M |
Adam |
Adam |
LR: 0.004
Lambda: 0.005 |
N/M |
Orthogonal Array (OA) experiment method |
Yes |
N/M |
Cross-Entropy |
Inter |
No |
No |
Train: 75%
Test: 25% |
Accuracy
Recall
F1 Score
ROC |
accuracy, recall, f1-score, ROC |
N/M |
|
N/M |
Accuracy: 0.9545
Recall: 0.9228
F1: 0.9382
AUC: 0.9985 |
Almoari [2] 0.7497, Sun [13] 0.65, Major [4] 0.68, Shenoy [12] 0.8206, Tolic [16] 0.6821, Ward [19] 0.8, Pinheiro [10] 0.8505
KNN (k=3) 0.8369, SVM 0.5082, RF 0.7739, LDA 0.5127, AdaBoost 0.3431, CNN 0.8409 |
DL & Trad. |
No |
No |
No |
To achieve optimal recognition accuracy, we employ OA to optimize the hyper-parameters. In this paper, we select five most common hyper-parameters including λ (the coefficient of L2 norm), lr (learning rate), Ki (the hidden layer nodes size), I (the number of layers), and nb (the number of batches). |
N/A |
Yes |
Website |
No |
|
Yannick Roy |
TBR |
TBC |
|
Zhang2017d |
79 |
|
Deep Recurrent Neural Networks for seizure detection and early seizure detection systems |
2017 |
Talathi |
Arxiv |
Yes |
Preprint |
Lawrence Livermore National Lab |
USA |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA: Using RNN for early seizure dectection |
Using GRU-RNN for early seizure detection |
Resting State, Eyes Open, Eyes Closed, Seizures. |
Using available data to test RNNs for seizure detection. |
|
N/M |
Seizures |
Bonn University |
Public |
Bonn University
5 x 100 x 23.6s
173.61 x 23.6 = 4097 --> 51 segments x 80
(0.46s windows) |
25500 |
197 |
15 |
1 |
173.6 |
|
|
None (see dataset preprocessing steps) |
No |
N/M |
N/M |
Raw EEG
(None) |
Raw EEG |
N/A |
|
Keras |
GRU
(RNN) |
RNN |
GRU -> FC -> GRU (illustrative sketch after this entry) |
GRU to capture long-term dependencies while controlling the vanishing gradient |
Yes |
51 x 80 x 1
(51 EEG sub-segment x 80 values x 1 channel) |
GRU: 2
FC: 1 |
3 |
N/M |
N/M |
N/M |
|
|
3
(Logistic Regression with Softmax) |
In the order of 100,000 |
(1) We train the RNN in stateful mode*.
(2) Learning rate rescaled by a factor of 0.1 every 100 epochs |
Standard |
Adam |
Adam |
LR: 0.01 |
N/M |
N/M |
N/M |
N/M |
N/M |
Inter |
No |
No |
Train: 50%
Test: 50% |
Accuracy |
accuracy |
N/M |
|
N/M |
98% Accuracy within the first 5 sec
(3 classes: Healthy vs Ictal vs InterIctal) |
They mentioned (A. T. Tzallas et al., 2007) getting 98% accuracy (ANN). |
Traditional pipeline |
No |
No |
No |
This findings offers a strong support to the utility of GRU-RNN model for use in early-seizure detection system that can be extremely useful for developing closed loop seizure control systems where timely intervention can be leveraged to abate seizure progression |
- |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Talathi2017 |
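
A tf.keras sketch of the GRU -> FC -> GRU stack described in this entry, trained in stateful mode with Adam at learning rate 0.01. Unit counts are placeholders (the entry reports on the order of 100,000 parameters, so the real model is considerably larger).

import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(batch_size=1, shape=(80, 1))     # one 0.46 s sub-segment
h = layers.GRU(10, stateful=True, return_sequences=True)(inp)
h = layers.Dense(10, activation='relu')(h)
h = layers.GRU(10, stateful=True)(h)
out = layers.Dense(3, activation='softmax')(h)        # healthy / interictal / ictal

model = tf.keras.Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              loss='categorical_crossentropy')
# In stateful mode, call model.reset_states() between independent recordings.
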
80 |
|
DeepSleepNet: a Model for Automatic Sleep Stage Scoring based on Raw Single-Channel EEG |
2017 |
Supratak, Dong, Wu & Guo |
Arxiv |
Yes |
Preprint |
Imperial College London |
UK |
11 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve SOTA: Using CNN+LSTM for Sleep Stage Scoring from Raw EEG |
Combining CNN + LSTM for Raw EEG and testing it on 2 different existing datasets |
Sleep |
Using RNN (LSTM) to capture temporal dependencies in sleep stages. |
|
N/M |
Sleep Stages |
MASS;
Sleep EDF |
Public |
MASS: Used SS3, PSG recordings from 62 subjects
Sleep EDF: Used 20 subjects
(30s windows, no overlap) |
58600;
41950 |
29300;
20975 |
62;
20 |
20;
2 |
256;
100 |
|
|
1) Notch filter: 60Hz
2) Band-pass filter: 0.30 - 100Hz |
Yes |
No |
No |
Raw EEG
(None) |
Raw EEG |
N/M |
|
TensorLayer
eTRIKS |
CNN + bi-LSTM |
CNN+RNN |
1D Conv, Batch Norm, Max Pooling (illustrative sketch after this entry) |
First part is representation learning, which can be trained to learn filters to extract time-invariant features from each of raw single-channel EEG epochs. The second part is sequence residual learning, which can be trained to encode the temporal information. |
Yes |
30s EEG Epoch
(2 diff sampling freq) |
2 CNN
2 bi-LSTM |
4 |
ReLU |
L2
Dropout (50%) |
Yes |
|
|
5 Sleep Stages
(Softmax) |
N/M |
A two-step training algorithm (their technique) to mitigate class imbalance:
first pre-train the representation-learning part, then fine-tune the whole model using two different learning rates. |
Pre-training |
Adam |
Adam |
LR: 0.0001
b1: 0.9
b2: 0.999 |
100 |
N/M |
N/M |
Oversampling to balance classes
(duplicating minority sleep stages) |
Cross-Entropy |
Inter |
DS 1 - MASS) 31-Fold
DS 2 - Sleep EDF) 20-Fold |
k-fold |
DS 1 Train: 60 subjects
DS 1 Valid: 2 subjects
DS 2 Train: 30 subjects
DS 2 Valid: 1 subject |
Precision (PR)
Recall (RE)
F1-score (F1)
macro-averaging F1-score (MF1)
Accuracy (ACC)
Cohen’s Kappa coefficient (κ) |
precision, recall, f1-score, macro-averaging f1-score, accuracy, Cohen's kappa |
NVIDIA
GeForce GTX980 |
|
The training time for each validation fold was approximately 3 hours on each node |
Sleep EDF - Acc: 82.0
Sleep EDF - MF1: 76.9
Sleep EDF - k: 0.76
MASS - Acc: 86.2
MASS - MF1: 81.7
MASS - k: 0.80 |
Traditional pipelines & DL
Sleep EDF: Y.-L. Hsu et al., 2013
Sleep EDF: R. Sharma et al., 2017
Sleep EDF: A. R. Hassan et al., 2017
Sleep EDF: O. Tsinalis et al., 2016a
Sleep EDF: O. Tsinalis et al., 2016b
MASS: H. Dong et al., 2016 |
DL & Trad. |
No |
Visualization of filter activations |
Analysis of activations |
It achieved similar overall accuracy and macro F1-score compared to the state-of-the-art hand-engineering methods on both the MASS and Sleep-EDF datasets, which have different properties such as sampling rate and scoring standards (AASM and R&K). |
N/M |
Yes |
GitHub |
No |
|
Yannick Roy |
TBR |
TBC |
|
Supratak2017 |
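
A simplified tf.keras sketch of the DeepSleepNet pattern in this entry: 1D convolutions with batch norm and max pooling for representation learning, then bidirectional LSTMs over the resulting feature sequence. The real model uses two parallel CNN branches (small and large filters), a residual shortcut, and sequences of epochs; sizes here are placeholders.

import tensorflow as tf
from tensorflow.keras import layers

fs = 100                                          # Sleep-EDF sampling rate
inp = tf.keras.Input(shape=(30 * fs, 1))          # one 30 s single-channel epoch
h = layers.Conv1D(64, fs // 2, strides=fs // 16, use_bias=False)(inp)
h = layers.BatchNormalization()(h)
h = layers.Activation('relu')(h)
h = layers.MaxPooling1D(8)(h)
h = layers.Conv1D(128, 8, padding='same', use_bias=False)(h)
h = layers.BatchNormalization()(h)
h = layers.Activation('relu')(h)
h = layers.MaxPooling1D(4)(h)
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(h)
h = layers.Bidirectional(layers.LSTM(64))(h)
h = layers.Dropout(0.5)(h)
out = layers.Dense(5, activation='softmax')(h)    # 5 sleep stages
model = tf.keras.Model(inp, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='categorical_crossentropy')
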
81 |
|
Mixed Neural Network Approach for Temporal Sleep Stage Classification |
2017 |
Dong, Supratak, Pan, Wu, Matthews & Guo |
IEEE Transaction on Neural Systems and Rehabilitation Engineering |
Yes |
Journal |
Imperial College London |
UK |
11 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve SOTA: Using Mixed NN on 1 channel EEG for Sleep Stage Scoring |
Combining MLP + LSTM on 1-Channel Raw EEG from an existing (open) dataset |
Sleep |
Using RNN (LSTM) to capture temporal dependencies in sleep stages, with a single frontal (skin) electrode. |
|
N/M |
Sleep Stages |
MASS |
Public |
MASS: 62 subjects (~ 494h)
(30s windows, no overlap) |
58600 |
29300 |
62 |
1 |
256 |
|
|
N/M
Seems to directly apply STFT for frequency features |
N/M |
No |
No |
PSD Features |
Frequency-domain |
N/M |
|
Theano |
Mixed NN (MNN)
MLP + LSTM |
RNN |
N/M |
Our MNN is composed of a rectifier neural network, which is suitable for detecting naturally sparse patterns [18], and a long short-term memory (LSTM) for detection of temporally sequential patterns [19] |
Yes |
30s EEG Epoch PSD |
MLP: [2,5]
LSTM: 1 (200-1000) |
6 |
ReLU |
Dropout |
Yes |
|
|
5 Sleep Stages
(Softmax) |
N/M |
N/M |
N/M |
SGD |
SGD |
LR: 0.01
Momentum: 0.9
no weight decay |
500 |
Manual fine-tuning |
Yes |
Oversampling to balance classes |
Cross-Entropy |
Inter |
31-Fold CV |
k-fold |
Train: 60 subjects
Valid: 2 subjects |
Macro F1-score (MF1)
Accuracy (ACC)
Recall (RE)
Precision (PR) |
macro f1-score, accuracy, recall, precision |
NVIDIA 630 |
|
2 days |
MF1: 80.50
ACC: 85.92 |
SVM: 75.01 | 79.70 (best with sequence 2)
RF: 72.44 | 81.67 (best with sequence 3)
MLP: 77.23 | 81.43 (best with sequence 4) |
DL & Trad. |
No |
No |
No |
(1) In terms of convenience, wearing the F4 channel near the hair line is imperfect. Other frontal EEG channels such as Fp2 and Fpz are easier to wear, but these channels have lesser information about stage W, N1, N2 and N3. (2) In our experiment, we tried to add fully connected layers between LSTM and softmax, and vary their hidden sizes, but no improvement was found. |
Less information in low frontal (skin) channels
(They've identified 3 challenges)
Challenge 1. Heterogeneity
Challenge 2. Temporal Pattern Recognition
Challenge 3. Comfort |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
dong2018mixed |
82 |
|
SLEEPNET: Automated Sleep Staging System via Deep Learning |
2017 |
Biswal, Kulas, Sun, Goparaju, Westover, Bianchi & Sun |
Arxiv |
Yes |
Preprint |
Georgia Institute of Technology
Nanyang Technological University
Massachusetts General Hospital |
USA |
17 |
|
Classification of EEG signals |
Clinical |
Sleep |
Staging |
Improve SOTA: Using CNN, RNN, CRNN for Sleep Stage Scoring |
Trying CNN, LSTM, RCNN on 10,000 subjects on Raw EEG, Expert Feature Set and Freq Bands for Sleep Stage Scoring |
Sleep |
Leveraging huge dataset (3.2TB) of 10,000 subjects to apply deep learning |
|
N/M |
Sleep Stages |
Internal Recordings |
Private |
10,000 overnight PSGs x ~8h / patient
80000 hours 3.2TB of data!
Each 8h ~ 950-1000 labels (avg at 975)
(30s windows, no overlap) |
9750000 |
4800000 |
10000 |
6 |
200 |
|
|
None |
No |
No |
No |
3 Sets of Features:
1) Raw EEG
2) Experts Defined Features
3) Spectrogram |
Combination |
N/M |
|
Tensorflow
CUDA 8.0 |
1) CNN
2) RNN
3) RCNN |
CNN+RNN |
1) CNN: 1D Conv for Raw EEG / 2D Conv for Freq Features
2) RNN: Look back steps in RNN : [3,5,10,20,30] |
By combining a RNN with CNN, we can have a hybrid model, namely, Recurrent-Convolutional Neural Networks (RCNN), which is able to extract features present in a spectrogram and preserve the long-term temporal relationship present in the EEG data |
No |
30s EEG Epoch
(depending on feature set) |
RNN: 5 LSTM (1000) |
5 |
ReLU |
Dropout |
Yes |
|
|
5 Classes |
N/M |
N/M |
N/M |
N/M |
N/M |
LR: [0.01 - 0.00001] |
N/M |
We performed 50 iterations of random search over a set of parameter choices for hyper-parameter tuning |
Yes |
N/M |
Categorical Cross-Entropy |
Inter |
50 Iterations of random search for hyper-parameter tuning |
Train-Valid-Test |
Train: 8700 patients
Valid: 300 patients
Test: 1000 patients |
Accuracy
Cohen's Kappa |
accuracy, Cohen's kappa |
Intel Xeon E5-2640, 256GB RAM,
four Nvidia Titan X |
|
between 40 -100 min |
[RNN] - Expert Defined Features: [Acc] 85.76 | 79.46 [k]
[RNN] - Spectrogram Features: [Acc] 79.21 | 73.83 [k]
[RNN] - Waveform Features: [Acc] 79.46 | 72.46 [k]
---
[RCNN] - Expert Defined Features: [Acc] 81.67 | 76.38 [k]
[RCNN] - Spectrogram Features: [Acc] 81.47 | 74.37 [k]
[RCNN] - Waveform Features: [Acc] 79.81 | 73.52 [k] |
Logistic Regression
Tree Boosting
MLP
CNN
RNN
RCNN |
DL & Trad. |
No |
No |
No |
On 1000 held-out testing patients, the best performing algorithm achieved an expert-algorithm level of inter-rater agreement of 85.76% with Kappa value 79.46%, exceeding previously reported levels of expert-expert inter-rater agreement for sleep EEG staging. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Biswal2017 |
83 |
|
DeepKey: An EEG and Gait Based Dual-Authentication System |
2017 |
Zhang, Yao, Chen, Wang, Sheng & Gu |
Arxiv |
Yes |
Preprint |
University of New SouthWales
Macquarie University
RMIT University |
Australia |
20 |
|
Classification of EEG signals |
Personal trait/attribute |
Person identification |
|
Improve SOTA: EEG for Person Identification |
Use AR+RNN+SVM on EEG+Gait for Person Identification |
Motor Imagery
(see eegmmidb dataset)
+ Gait (PAMAP2 dataset) |
The DL motivation is not clear. They want to improve SOTA. |
|
N/M |
Raw EEG |
eegmmidb |
Public |
eegmmidb: 8 subjects x 13,500 samples
13,500 samples / 90 per window
= 150 examples per subjects, 1200 total
(90 points windows, no overlap) |
1200 |
11.3 |
8 |
64 |
160 |
|
|
None (AR) |
No |
No |
No |
Raw EEG |
Raw EEG |
N/M |
|
N/M |
AR + RNN + SVM |
RNN |
N/M |
AR for pre-processing, RNN for feature extraction, and SVM for classification. Auto-regressive (AR) coefficients are one of the most widely used pre-processing methods on EEG data |
Yes |
150x13x64
(150 segments, 13 coefficients (AR), 64 features/nodes) |
5 RNN (64) |
5 |
N/M |
L2 |
Yes |
|
|
8
One-Hot Label
(ID - 8 Subjects) |
N/M |
N/M |
N/M |
Adam |
Adam |
lambda is set as 0.004 while learning rate is set as 0.005 |
8 mini-batches with shape [150, 13, 64] |
Orthogonal Array Experiment Method |
Yes |
N/M |
Log Loss Function |
Inter |
No |
No |
Train: 87.5%
Test: 12.5% |
Accuracy |
accuracy |
N/M |
|
N/M |
Highest Accuracy: 0.9841
Gait: 0.999
Combined: 0.983 |
[45]: PSD + cross-correlation values, [8]: Customized Threshold, [17]: Low-pass filter+wavelets+ ANN, [3]: Bandpass FIR filter +ECOC + SVM, [44]: IAF + delta band EEG + Cross-correlation & Mahalanobis, [22]: CSP +LDA, [23]: AR + SVM |
Traditional pipeline |
No |
No |
No |
The Gait Identification Model adopts a 7-layer deep learning model to process gait data and classify subjects’ IDs, achieving an accuracy of 0.999. The EEG Identification Model combines three components (auto-regressive coefficients, the RNN structure, and an SVM classifier) and achieves the accuracy of 0.9841 on a public dataset. Overall, the DeepKey authentication system obtains a FAR of 0 and a FRR of 0.019. |
N/M |
Yes |
GitHub |
No |
|
Yannick Roy |
TBR |
Yes |
|
Zhang2017c |
84 |
|
Multi-Person Brain Activity Recognition via Comprehensive EEG Signal Analysis |
2017 |
Zhang, Yao, Zhang, Wang, Sheng, Gu |
EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services |
Yes |
Conference |
University of New South Wales, Australia
Singapore Management University
Macquarie University, Australia
RMIT University, Australia |
Australia |
10 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA |
Use AE + XGB for BCI-MI 5 classes (eegmmidb + internal recordings) |
Motor Imagery
(see eegmmidb dataset) |
Deep learning should generalize better across subjects and across classes, instead of binary classification. |
|
N/M |
Motor Imagery |
eegmmidb |
Public |
eegmmidb: 20 subjects x 28000 samples
Total: 560,000 EEG samples
(window length = 1 point) |
560000 |
58.3 |
20 |
64 |
160 |
|
|
N/M |
N/M |
No |
No |
Raw EEG
(None) |
Raw EEG |
z-score |
|
N/M |
AE
+ XGB Classifier |
AE |
Encoder, Decoder + XGB Classifier |
N/M |
Yes |
64x??
Channels x Raw EEG time window |
1 (64)
Input - Encoder - Decoder - Classifier (XGB) |
1 |
N/M |
L2 |
Yes |
|
|
5 |
N/M |
N/M |
N/M |
RMSProp |
Other |
LR: 0.01 |
There are 9 mini-batches and the batch size is 17,280. |
N/M |
N/M |
N/M |
MSE |
Inter |
No |
No |
Train: 532,000
Test: 28,000 |
Accuracy
Precision
Recall
F1-Score
ROC
ROC AUC |
accuracy, precision, recall, f1-score, ROC, ROC AUC |
Nvidia Titan X Pascal
768G memory
145 TB PCIe SSD |
|
See charts |
Accuracy: 0.794
Precision: 0.7991
Recall: 0.781
F1 score: 0.7883
AUC: 0.9456 |
SVM, RNN, LDA, RNN+SVM, CNN, DT, AdaBoost, RF
XGBoost, PCA+XGBoost, PCA+AE+XGBoost, EIG+AE+XGBoost, EIG+PCA+XGBoost, DWT+XGBoost, SAE+XGBoost, AE+XGBoost |
DL & Trad. |
No |
No |
No |
As part of our future work, we will build multi-view model of multi-class EEG signals to improve the classification performance. In particular, we plan to establish multiple models with each single model dealing with a single class. Following this philosophy, the correlation between test sample and each model can be calculated in the test stage and the sample can be classified to the class with minimum correlation coefficient. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
Zhang2017a |
85 |
|
Neurology-as-a-Service for the Developing World |
2017 |
Dharamsi, Das, Pedapati, Bramble, Muthusamy, Samulowitz, Varshney, Rajamanickam, Thomas & Dauwels |
Arxiv |
Yes |
Preprint |
IBM Research AI
Nanyang Technological University |
USA |
5 |
|
Classification of EEG signals |
BCI |
Active |
Motor imagery |
Improve SOTA: Use DL on the Cloud |
Use DL on the Cloud for developing countries. Starting with a BCI Tasks (MI) |
MI: Feet and Hands, real / imagined. |
To develop neurology-as-a-service that learns features automatically from the data. This would help developing countries. |
|
N/M |
Motor Imagery |
eegmmidb |
Public |
eegmmidb: 103 out of 109 subjects, 12 out of 14 tasks
Segments 0.8 sec and sliding window of 0.05 sec
The prepared data consisted of 17,232 samples
(window length = 0.8s ??? not clear) |
17232 |
N/M |
103 |
64 |
160 |
|
|
1) Bandpass: 3 - 30Hz
2) Generate Spectrogram: Hanning window & NFFT (128) |
Yes |
No
(mentioned, but only filtering applied) |
No |
Spectrograms |
Frequency-domain |
N/A |
|
N/M
(Cloud) |
CNN |
CNN |
N/M |
N/M |
No |
3D
(channels x freq x time) |
[1-3 3D CNN]
[0-2 FC] |
5 |
N/M |
Dropout |
Yes |
|
|
N/M |
N/M |
N/M |
N/M |
(hyperparameters are automatically fine-tuned using an optimizer) |
N/M |
LR: 0.001 |
N/M |
Random Optimizer |
Yes |
N/M |
N/M |
Inter |
No |
No |
Train: 70%
Test: 30% |
Accuracy |
accuracy |
N/M
(Cloud) |
|
N/M |
Best accuracy: 63.4% |
PCA-SVM |
Traditional pipeline |
No |
No |
No |
As part of our next steps, we plan to use this framework on a dataset aimed at classification of epileptic seizures and/or pathological/normal EEG. We would also like to see how the framework performs using other hyperparameter optimization techniques including Bayesian optimization. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Dharamsi2017 |
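A minimal sketch of the spectrogram front-end this entry reports (3-30 Hz band-pass, then a Hanning-window spectrogram with NFFT = 128). The Butterworth order, nperseg and noverlap are assumptions; the sampling rate, band edges and NFFT follow the entry.
```python
# Hedged sketch of the preprocessing: band-pass filter, then spectrogram.
import numpy as np
from scipy.signal import butter, filtfilt, spectrogram

fs = 160.0                                   # eegmmidb sampling rate
x = np.random.randn(int(0.8 * fs))           # one 0.8 s segment, one channel

b, a = butter(4, [3 / (fs / 2), 30 / (fs / 2)], btype="band")
x_filt = filtfilt(b, a, x)                   # zero-phase 3-30 Hz band-pass

f, t, Sxx = spectrogram(x_filt, fs=fs, window="hann",
                        nperseg=64, noverlap=48, nfft=128)
print(Sxx.shape)                             # (freq bins, time frames) per channel
```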
86 |
|
Deep Architectures for Automated Seizure Detection in Scalp EEGs |
2017 |
Golmohammadi, Ziyabari, Shah, de Diego, Obeid, Picone |
Arxiv |
Yes |
Preprint |
Neural Engineering Data Consortium, Temple University |
USA |
8 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA: Comparing different deep architectures |
Compare HMM+sAE, HMM+LSTM, IPCA+LSTM, CNN+MLP, CNN+LSTM |
Ongoing EEG recording, with and without seizures. |
With big EEG corpus now available we can explore deep learning. |
|
(Natus), (Nihon Kohden) |
Seizures |
TUH;
Duke Seizure Corpus |
Both |
TUSZ & DUSZ: 1,864,012 s ≈ 517.8 h, 159 subjects
Multiple models with different window sizes |
N/M |
31067 |
159 |
22;
-1 |
250;
N/M |
|
|
N/M |
N/M |
N/M |
N/M |
LFCCs + First & Second Derivative of LFCCs |
Other |
N/A |
|
N/M |
1) HMM + sAE
2) HMM + LSTM
3) IPCA + LSTM
4) CNN + MLP
5) CNN + LSTM |
CNN+RNN |
2D Conv Layers -> Flatten -> 1D Conv Layer -> LSTM (output 1s data) -> LSTM -> 2-way sigmoid
(see sketch after this entry) |
They tried different architectures to capture spatio-temporal information.
They also used time-frequency features,
not raw EEG as-is. |
Yes |
210 @ 22 x 26 x 1
(Frames @ Channels * Features * 1)
(to be reviewed) |
3x 2D CNN
+ 1x 1D FC CNN
+ 2x Bi-LSTM
(CNN + LSTM)
(see paper for others) |
6 |
ELU |
Dropout |
Yes |
|
|
2-way Sigmoid |
N/M |
Trained + Eval on TUSZ and
only Eval on DUSZ |
Standard |
Adam |
Adam |
N/M |
N/M |
N/M |
N/M |
N/M |
MSE |
Inter |
Train-Valid-Test |
Train-Valid-Test |
Train: 614,382 (sec)
Valid: 647,948 (sec)
Test: 601,682 (sec) |
Sensitivity
Specificity |
sensitivity, specificity |
N/M |
|
N/M |
CNN + LSTM gave the best results.
TUSZ - Sensitivity: 30.83% | Specificity: 96.86%
DUSZ - Sensitivity: 33.71% | Specificity: 70.72% |
HMM + Gaussian mixture + AE.
They compared 7 optimizers (e.g. Adam, SGD, etc.).
They compared 6 activation functions (e.g. tanh, sigmoid, etc.).
CNN + LSTM with Adam and ELU is the best combination. |
DL & Trad. |
No |
No |
No |
This is a significant finding because the Duke corpus was collected with different instrumentation and at different hospitals. Our work shows that deep learning architectures that integrate spatial and temporal contexts are critical to achieving state of the art performance and will enable a new generation of clinically-acceptable technology. |
Access to labeled data, and funding to label the data and make it public. |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Golmohammadi2017a |
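The best-performing CNN + LSTM hybrid can be sketched as follows in Keras: 2D convolutions over per-frame channel x feature maps, a 1D convolution, then two bidirectional LSTMs and a 2-way sigmoid. Filter counts, kernel sizes and the dropout rate are assumptions; ELU, Adam, the MSE loss and the 210 @ 22 x 26 x 1 input follow the entry.
```python
# Hedged sketch of the CNN + LSTM hybrid architecture from this entry.
from tensorflow import keras
from tensorflow.keras import layers

frames, chans, feats = 210, 22, 26                  # frames @ channels x LFCCs x 1
inp = keras.Input(shape=(frames, chans, feats, 1))

x = inp
for n_filters in (16, 32, 64):                      # 3x 2D CNN (sizes assumed)
    x = layers.TimeDistributed(
        layers.Conv2D(n_filters, (3, 3), padding="same", activation="elu"))(x)
x = layers.TimeDistributed(layers.Flatten())(x)
x = layers.Conv1D(16, 3, padding="same", activation="elu")(x)   # 1D conv stage
x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(32))(x)
x = layers.Dropout(0.5)(x)                          # dropout reported; rate assumed
out = layers.Dense(2, activation="sigmoid")(x)      # 2-way sigmoid output

model = keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")         # Adam + MSE per the entry
```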
87 |
|
Neonatal Seizure Detection using Convolutional Neural Networks |
2017 |
O'Shea, Lightbody, Boylan, Temko |
IEEE 27th International Workshop on Machine Learning for Signal Processing |
Yes |
Conference |
Irish Centre for Fetal and Neonatal Translational Research, University College Cork |
Ireland |
6 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
New Approach |
CNN on (preprocessed) raw EEG for neonatal seizure detection |
Ongoing EEG recording, with and without seizures. |
CNN works well on audio signal, why not on EEG. |
|
N/M |
Seizure |
Internal Recordings |
Private |
835 hours with 1,389 seizures from 18 subjects,
split into 8 s windows with overlap.
(unclear: 50% overlap vs. 7 s overlap / 1 s shift)
(8 s window, 7 s overlap) |
3006000 |
50100 |
18 |
8 |
256 |
|
|
Band-pass filter: 0.5 - 12.8 Hz
Downsampled: 32 Hz
EEG split into 8 s windows (12.5% overlap) |
Yes |
No |
No |
Raw EEG
8 sec windows (1 sec shift) |
Raw EEG |
N/A |
|
Keras |
1D - CNN |
CNN |
Conv - Batch Norm. - Pooling
Output layer: GAP (not dense)
(see sketch after this entry) |
"...1D CNNs wide convolutional filters (1-4s, 32-128 samples) significantly improved the performance". Sample size filters were used. In contrast to larger filter lengths allow the learning the various filters in a hierarchical manner [21]. |
Yes |
256x1
(8s x 1 channel) |
6 |
6 |
ReLU
Softmax (output) |
Batch Norm |
Yes |
|
|
2
(Seizure vs Non-seizure) |
16,930 |
The network was trained for 100 epochs; after each epoch, the validation AUC was calculated. |
Standard |
SGD |
SGD |
LR: 0.003
LR decreased by 10% every 20 iterations
Nesterov momentum: 0.9 |
2048 |
N/M |
N/M |
Sliding Window
(Shifted by 1s, 7/8 overlap) |
Categorical Cross-Entropy |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
Train: 17 subjects
Test: 1 subject |
ROC AUC |
ROC AUC |
N/M |
|
N/M |
AUC: 97.1%
AUC90: 82.9% |
SVM |
Traditional pipeline |
No |
No |
No |
"We have also tried max pooling, which led to slightly inferior results in our experiments."
"Initially, the EEG was converted to time-frequency images (spectrograms) and 2D CNNs were utilized, adopted from the area of image processing [17] – this architecture proved unsuccessful in the seizure detection task." |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
OShea2017 |
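The entry gives enough detail for a rough Keras sketch (Keras is the reported software): conv / batch-norm / pooling blocks on a 256-sample window, closed by a global-average-pooling head instead of dense layers. Filter counts and kernel sizes are assumptions; the SGD settings, batch-related loss choice and average pooling (the authors found max pooling slightly worse) follow the entry, and the stepwise LR decay is only noted in a comment.
```python
# Hedged sketch of the 1D CNN: conv -> batch norm -> pooling blocks over an
# 8 s single-channel window (256 samples at 32 Hz) with a GAP output head.
from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input(shape=(256, 1))
x = inp
for n_filters in (32, 32, 64, 64, 128):            # conv block sizes assumed
    x = layers.Conv1D(n_filters, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.AveragePooling1D(2)(x)              # max pooling was slightly worse

x = layers.Conv1D(2, 1)(x)                         # one feature map per class
x = layers.GlobalAveragePooling1D()(x)             # GAP head instead of dense layers
out = layers.Activation("softmax")(x)

model = keras.Model(inp, out)
# Entry: SGD, LR 0.003 (reduced by 10% every 20 iterations, not implemented
# here), Nesterov momentum 0.9, batch size 2048, categorical cross-entropy.
opt = keras.optimizers.SGD(learning_rate=0.003, momentum=0.9, nesterov=True)
model.compile(optimizer=opt, loss="categorical_crossentropy",
              metrics=[keras.metrics.AUC()])
```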
88 |
|
Improving classification accuracy of feedforward neural networks for spiking neuromorphic chips |
2017 |
Yepes, Tang & Mashford |
International Joint Conference on Artificial Intelligence |
Yes |
Conference |
IBM Research, VIC, Australia |
Australia |
7 |
|
Improvement of processing tools |
Hardware optimization |
Neuromorphic chips |
|
New Approach: Running DL on Neuromorphic Chips |
Compare a constrained network for a neuromorphic chip against an unconstrained NN on 2 datasets |
MNIST & EEG Data from Nurse et al., 2015 (BCI-MI) |
Implement DL/DNNs on a chip. |
|
N/M |
Motor Imagery |
Nurse et al. (2015) |
Public |
From [Nurse et al., 2015]: 1 subject ~ 30 min
480/468 examples for training, 66/95 for testing
(window length N/M) |
1109 |
30 |
1 |
-1 |
1000 |
|
|
N/M |
N/M |
No |
No |
N/M |
N/M |
[0, 1] |
|
Matlab |
[Esser et al., 2015] |
N/M |
[Esser et al., 2015] |
[Esser et al., 2015] |
No |
N/M |
Small Network: 3
Large Network: 4 |
4 |
N/M |
N/M |
N/M |
|
|
2 |
N/M |
N/M |
N/M |
N/M |
N/M |
LR: 0.1 |
25 |
N/M |
N/M |
No |
N/M |
Intra |
No |
No |
Train: 80%
Test: 20% |
Accuracy |
accuracy |
TrueNorth
(IBM Chip) |
|
(see paper) |
EEG Data: 86%
MNIST: 98-99% |
No |
None |
No |
No |
No |
Furthermore, analysis of the learnt parameters provides insights that might complement hardware design, thus providing a more efficient deployment of the trained models. The trained models use a small portion of the TrueNorth chip (30 cores vs. 4096 available in the current version of the chip), thus requiring much less than 70 mW to work, which makes these models suitable for portable autonomous devices with large autonomy. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Yepes2017 |
89 |
|
Automatic Analysis of EEGs Using Big Data and Hybrid Deep Learning Architectures |
2017 |
Golmohammadi, Hossein, Torbati, Lopez De Diego, Obeid & Picone |
Arxiv |
Yes |
Preprint |
Temple University
Jibo, Inc., Redwood City |
USA |
20 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
New Approach: Hybrid HMM & SdA for Epilepsy |
Using a hybrid three-pass model combining HMMs & stacked denoising autoencoders for epilepsy classification |
Ongoing EEG recording, with and without seizures.
(TUH Dataset) |
Deep Learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction |
|
(Natus) |
Seizure |
TUH |
Public |
TUH Corpus
Training from 359 sessions, evaluation from 159 sessions; 113,453 events total.
Split into 10 s windows -> 1 s epochs -> 0.1 s frames
(0.1 window, 0.2s overlap) |
113453 |
18909 |
518 |
128 |
1024 |
|
|
PCA |
Yes |
No |
No |
Cepstral coefficient-based feature extraction approach based on Linear Frequency Cepstral Coefficients (LFCCs) |
Other |
N/M |
|
Theano |
3x Stacked denoising Autoencoders (SDAE) |
AE |
3 Passes.
(1) HMM -> (2) SDAEs -> (3) SLM (statistical language model)
(2) PCA -> Out of Sample -> 3 SDAEs in parallel -> Enhancer (combining 3 SDAEs) |
Not your typical DL-EEG approach... |
Yes |
|
3 [Nodes from 100-800] |
3 |
N/M |
N/M |
N/M |
|
|
6 Classes |
N/M |
Training of these three SDAE networks is done in two steps: pre-training and fine-tuning. Denoising autoencoders are stacked to form a deep network. The unsupervised pre-training of such an architecture is done one layer at a time.
(see sketch after this entry) |
Pre-training |
Minibatch Stochastic Gradient Descent |
SGD |
LR: [0.1-0.5] |
[100-300] |
N/M |
N/M |
Out-of-sample technique
(van der Maaten, 2009) |
Cross-Entropy |
Inter |
No |
No |
Train: 84,032 events
Test: 29,421 events |
Sensitivity
Specificity |
sensitivity, specificity |
N/M |
|
N/M |
Pass: Sensitivity | Specificity
1 (HMM): 86.78 | 17.70
2 (SDAE): 78.93 | 4.40
3 (SLM): 90.10 | 4.88 |
No |
None |
No |
No |
No |
A summary of the results for different stages of processing is shown in Table 12. The overall performance of the multi-pass hybrid HMM/deep learning classification system is promising: more than 90% sensitivity and less than 5% specificity. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Golmohammadi2017 |
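The two-step training procedure (greedy layer-wise pre-training, then supervised fine-tuning) can be sketched as follows. The input dimensionality, corruption level, sigmoid activations and epoch counts are assumptions; the 100-800 node layer sizes, SGD, and the LR and batch-size ranges follow the entry.
```python
# Hedged sketch of greedy layer-wise pre-training of stacked denoising
# autoencoders, followed by supervised fine-tuning with a softmax readout.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.randn(2000, 100).astype("float32")   # LFCC-based features (dim assumed)
y = keras.utils.to_categorical(np.random.randint(0, 6, 2000), 6)   # 6 classes

codes, pretrained = X, []
for size in (800, 400, 100):                       # nodes within the reported 100-800
    noisy = codes + 0.1 * np.random.randn(*codes.shape).astype("float32")
    inp = keras.Input(shape=(codes.shape[1],))
    h = layers.Dense(size, activation="sigmoid")(inp)
    rec = layers.Dense(codes.shape[1])(h)
    dae = keras.Model(inp, rec)
    dae.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1), loss="mse")
    dae.fit(noisy, codes, batch_size=200, epochs=5, verbose=0)   # denoising objective
    pretrained.append(dae.layers[1])               # keep the trained encoder layer
    codes = keras.Model(inp, h).predict(codes, verbose=0)        # feed the next layer

# Fine-tuning: rebuild the stack with pre-trained weights + softmax readout.
model = keras.Sequential([keras.Input(shape=(100,))])
for dense in pretrained:
    new = layers.Dense(dense.units, activation="sigmoid")
    model.add(new)
    new.set_weights(dense.get_weights())
model.add(layers.Dense(6, activation="softmax"))
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.fit(X, y, batch_size=200, epochs=5, verbose=0)
```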
90 |
|
Multimodal deep learning approach for joint EEG-EMG data compression and classification |
2017 |
Ben Said, Mohamed, Elfouly, Harras & Wang |
IEEE Wireless Communications and Networking Conference |
Yes |
Conference |
Qatar University
Carnegie Mellon University
University of British Columbia |
Qatar |
6 |
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
New Approach: Compressing joint EEG-EMG with an autoencoder |
Compression & Classification of joint EMG + EEG on DEAP dataset with SAE |
Watching music videos
(DEAP Dataset) |
The deep learning approach has emerged as one of the possible techniques to exploit the correlation of the data from multiple modalities. Compression for mobile health data. |
|
N/M |
Emotions |
DEAP |
Public |
DEAP
32 subjects x 40 videos x 63s
(6s windows) |
23040 |
1280 |
32 |
-1 |
128 |
|
|
1) 6s Windows
2) Whitened
3) Normalized |
Yes |
N/M |
N/M |
Raw EEG + EMG
(None) |
Raw EEG |
z-score |
|
N/M |
SAE |
AE |
N/M |
The deep learning approach has emerged as one of the possible techniques to exploit the correlation of the data from multiple modalities |
No |
N/M |
2 SAE |
2 |
Sigmoid |
L2
|
Yes |
|
|
N/M
1) Compressed data
2) Classification |
N/M |
Greedy layer-wise |
Pre-training |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Duplicated multimodal data, keeping values from one modality and setting the other modality to 0,
and vice-versa. (see sketch after this entry) |
Squared Euclidean distance |
Inter |
No |
No |
[Compress] Train: 50%
[Compress] Test: 50%
[Classif] Train: 75%
[Classif] Test: 25% |
1) Compression: Distortion
2) Classification: Accuracy |
distortion, accuracy |
N/M |
|
N/M |
1) [Compression] Distortion: EMG = 13.85% | EEG = 12%
2) [Classification] Accuracy: 78.1% |
Discrete Wavelet Transform (DWT) [26]
Compressed Sensing (CS) [27] (distortion: 22% | 17.21%)
2D compression approach which is based on SPIHT and FastICA [28] |
Traditional pipeline |
No |
No |
No |
1) Compression: Distortion
2) Classification |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
BenSaid2017a |
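The augmentation is simple enough to show directly: each joint EEG+EMG example is duplicated with one modality zeroed out, so the autoencoder must recover the shared representation from either modality alone. The dimensions below are illustrative.
```python
# Hedged sketch of the reported modality-zeroing augmentation.
import numpy as np

def modality_dropout_augment(eeg, emg):
    """eeg: (n, d_eeg), emg: (n, d_emg) -> augmented (3n, d_eeg + d_emg)."""
    both = np.hstack([eeg, emg])                       # original joint examples
    eeg_only = np.hstack([eeg, np.zeros_like(emg)])    # EMG half zeroed
    emg_only = np.hstack([np.zeros_like(eeg), emg])    # EEG half zeroed
    return np.vstack([both, eeg_only, emg_only])

X = modality_dropout_augment(np.random.randn(100, 32), np.random.randn(100, 4))
print(X.shape)  # (300, 36)
```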
91 |
|
Deep Learning for Fatigue Estimation on the Basis of Multimodal Human-Machine Interactions |
2017 |
Gordienko, Stirenko, Kochura, Alienin, Novotarskiy & Gordienko |
Arxiv |
Yes |
Preprint |
National Technical University of Ukraine |
Ukraine |
12 |
|
Classification of EEG signals |
Monitoring |
Physical |
Exercise |
New Approach |
Multi-modal fatigue (and activity) estimation |
Different activities (sports) while having different sensors |
Use multimodal models to combine different modalities with a NN. |
|
OpenBCI (OpenBCI) |
Multimodal |
Internal Recordings |
Private |
N/M
Not much information on the EEG data |
N/M |
N/M |
N/M |
-1 |
N/M |
|
|
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
|
N/M |
DNN |
N/M |
N/M |
N/M |
No |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
|
|
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Mean Residual Deviance (MRD)
Mean Absolute Error (MAE) |
N/M |
N/M |
No |
N/M |
Mean Residual Deviance (MRD)
Mean Absolute Error (MAE) |
mean residual deviance, mean absolute error |
N/M |
|
N/M |
See Paper
(not really relevant / meaningful for this paper) |
N/M |
None |
No |
No |
No |
The main achievement is that the multimodal data measured can be used as a training dataset for measuring and recognizing the intensity and physical load on the person by means of machine learning approaches. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
Gordienko2017 |
92 |
|
Towards Deep Modeling of Music Semantics using EEG Regularizers |
2017 |
Raposo, Matos, Ribeiro, Tang & Yu |
Arxiv |
Yes |
Preprint |
Universidade de Lisboa |
Portugal |
5 |
|
Classification of EEG signals |
Music semantics |
|
|
Improve SOTA on music semantics |
Modeling of music audio semantics |
Listening to music |
Previous success of CNNs in music audio modeling |
|
OpenBCI (OpenBCI) |
None |
Internal Recordings |
Private |
60 music segments + 2 noise + 2 songs
x 18 subjects
music duration = average of 15.13s
samples approx: 60 * 15.13 / 1.5 * 18 = 10894
(1.5s windows) |
10894 |
272.3 |
18 |
16 |
250 |
|
|
1) Highpass 0.5Hz
2) Notch at 50Hz |
Yes |
Yes |
Yes |
Raw EEG + Audio embeddings |
Raw EEG |
Rescaled [-1, 1] |
|
N/M |
CNN |
CNN |
N/M |
N/M |
Yes |
N/M |
5 |
5 |
ReLU |
No |
N/M |
|
|
128 |
N/M |
1) Train audio+lyrics embeddings model
2) Train audio embeddings+EEG embeddings model |
Standard |
N/M |
N/M |
N/M |
102 |
N/M |
N/M |
N/M |
CCA between embeddings
(see sketch after this entry) |
Inter |
5-Fold CV |
k-fold |
Train: 80%
Test: 20% |
Mean Reciprocal
Rank (MRR) |
mean reciprocal rank |
GeForce GTX 1080 |
|
20 minutes |
Outperformed Spotify by ~1%, but did not perform better than the SOTA (by a small margin) |
Spotify embeddings and current SOTA (Choi) |
DL & Trad. |
No |
No |
No |
Proposed approach did not outperform SOTA, but SOTA was trained on more than 2,083 hours of music, whereas the proposed method needs less than 3 hours of both music and EEG |
N/M |
No |
N/A |
No |
|
Isabela Albuquerque |
TBR |
TBC |
|
Raposo2017 |
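As an offline illustration of the CCA-based objective, the sketch below measures canonical correlations between matched audio and EEG embeddings with scikit-learn. This is an analysis stand-in, not the authors' trainable loss; the embedding sizes and component count are assumptions.
```python
# Hedged sketch: canonical correlation between two embedding spaces.
import numpy as np
from sklearn.cross_decomposition import CCA

audio_emb = np.random.randn(500, 128)   # 128-d audio embeddings (per the entry)
eeg_emb = np.random.randn(500, 128)     # matched EEG embeddings (size assumed)

cca = CCA(n_components=8)               # number of components assumed
A, B = cca.fit_transform(audio_emb, eeg_emb)
corrs = [np.corrcoef(A[:, i], B[:, i])[0, 1] for i in range(A.shape[1])]
print("mean canonical correlation:", np.mean(corrs))
```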
93 |
|
Deep convolutional neural network for the automated detection and diagnosis of seizure using EEG signals |
2017 |
Acharya, Oh, Hagiwara, Tan & Adeli |
Computers in Biology and Medicine |
No |
Journal |
Ngee Ann Polytechnic, Singapore
SUSS University, Singapore
University of Malaya, Malaysia
The Ohio State University, US |
Singapore |
|
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
New Approach: CNN for Epilepsy
(claiming it's a new approach, but it's not...) |
13-layer CNN for epilepsy |
Ongoing EEG recording, with and without seizures. |
To develop a computer-aided diagnosis (CAD) to classify EEG |
|
N/M |
Seizures |
Bonn University |
Public |
Bonn University: B,D,E
3 x 100 x 23.6s
(23.6s windows) |
300 |
118 |
10 |
1 |
173.6 |
|
|
None |
No |
N/M |
N/M |
Raw EEG |
Raw EEG |
z-score |
|
N/M |
CNN |
CNN |
1D CNN
Conv / Max Pooling |
|
Yes |
4097x1 |
1D CNN: 10
FC: 2 |
12 |
ReLU |
L1 |
Yes |
|
|
3
(Softmax with 3 classes) |
N/M |
A conventional backpropagation (BP) [32] with a batch size of 3 is employed in this work to train the CNN. |
Standard |
Adam |
Adam |
Lambda: 0.7
LR: 1x10^-3
Momentum: 0.3 |
3 |
Trial and Error |
Yes |
No |
N/M |
Inter |
10-Fold CV |
k-fold |
Train: 90%
Valid: 30% of 90%
Test: 10% |
Accuracy
Specificity
Sensitivity |
accuracy, specificity, sensitivity |
Intel Xeon 2.40 GHz (E5620)
24 GB RAM |
|
150 epochs
12.8 s / epoch
≈ 32 min |
Accuracy: 88.7%
Sensitivity: 95%
Specificity: 90% |
Many other SOTA
(check paper)
They performed worse than most previous SOTA |
Traditional pipeline |
No |
No |
No |
The advantage of the model presented in this paper, however, is separate steps of feature extraction and feature selection are not required in this work. Nevertheless, the main drawback of this work is the lack of huge EEG database |
Amount of data |
No |
N/A |
Yes |
|
Yannick Roy |
TBR |
TBC |
|
Acharya2017 |
94 |
|
Electroencephalogram-based decoding cognitive states using convolutional neural network and likelihood ratio based score fusion |
2017 |
Zafar, Dass & Malik |
Plos One |
No |
Journal |
Universiti Teknologi PETRONAS |
Malaysia |
23 |
|
Classification of EEG signals |
BCI |
Reactive |
RSVP |
Improve SOTA |
Decode seen images by extracting features with a CNN |
Watching natural images from 5 classes |
Features learned automatically can be more efficient |
|
(EGI) |
None |
Internal Recordings |
Private |
26 subjects x 21min
(1s windows) |
13520 |
546 |
26 |
128 |
250 |
|
|
[Hardware: Bandpass from 0.1 to 100 Hz]
1) Bandpass from 0.3 to 30 Hz
2) Removal of eye artefacts |
Yes |
Yes |
Yes |
Raw EEG |
Raw EEG |
N/M |
|
N/M |
CNN |
CNN |
Modified LeNet
CNN is *just* for feature extraction (feature selection and classification are done separately)
(see sketch after this entry) |
Temporal 1D conv in first layer |
Yes |
128 x 250 |
2 |
2 |
Sigmoid, tanh |
N/M |
N/M |
|
|
128 x 11 x 100 |
N/M |
??? |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Monte-Carlo 100-Fold CV |
Train-Valid-Test |
Train: 90%
Test: 10% |
Accuracy
Specificity
Sensitivity |
accuracy, specificity, sensitivity |
N/M |
|
N/M |
Accuracy (across participants, 5-class): 40% |
Discrete Wavelet Transform + SVM |
Traditional pipeline |
two-sample t-test, ANOVA |
No |
No |
|
Amount of data |
No |
N/A |
No |
|
Hubert Banville |
TBR |
TBC |
|
Zafar2017 |
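A sketch of the feature-extraction-only use of the CNN: a small two-layer network (temporal convolution first, per the entry) is trained end-to-end, then its penultimate activations feed a separate classifier. Filter counts, pooling, and the LDA stand-in for the paper's likelihood-ratio score fusion are assumptions; the 128 x 250 input, two conv layers and sigmoid/tanh activations follow the entry.
```python
# Hedged sketch: CNN as feature extractor, separate classifier on top.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.randn(200, 128, 250, 1).astype("float32")   # channels x 1 s @ 250 Hz
y = np.random.randint(0, 5, 200)                          # 5 image classes

inp = keras.Input(shape=(128, 250, 1))
h = layers.Conv2D(8, (1, 25), activation="tanh")(inp)     # temporal 1D conv first
h = layers.AveragePooling2D((1, 5))(h)
h = layers.Conv2D(16, (128, 1), activation="sigmoid")(h)  # spatial conv over channels
feat = layers.Flatten()(h)
out = layers.Dense(5, activation="softmax")(feat)

cnn = keras.Model(inp, out)
cnn.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
cnn.fit(X, y, epochs=2, batch_size=32, verbose=0)

# Take penultimate activations as features; classify with a separate model.
extractor = keras.Model(inp, feat)
Z = extractor.predict(X, verbose=0)
clf = LinearDiscriminantAnalysis().fit(Z, y)              # stand-in classifier
print("train accuracy:", clf.score(Z, y))
```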
95 |
|
Deep Convolutional Neural Network for Emotion Recognition Using EEG and Peripheral Physiological Signal |
2017 |
Lin, Li & Sun |
International Conference on Image and Graphics |
No |
Conference |
College of Computer Science of Zhejiang University, Hangzhou, China |
China |
|
|
Classification of EEG signals |
Monitoring |
Affective |
Emotion |
Improve SOTA: AlexNet on DEAP |
AlexNet on Images (Raw EEG + Freq Bands) + other physiological sensors |
Watching videos
(check out DEAP details) |
Using AlexNet on DEAP |
|
N/M |
Emotions |
DEAP |
Public |
DEAP
32 subjects x 1min x 40 videos
(6s windows) |
12800 |
1280 |
32 |
32 |
512 |
|
|
1) Downsampling to 128Hz
2) Band-Pass Filter: 4.0 - 45Hz
3) Averaged to common reference (?) |
Yes |
No |
No |
EEG -> 6 gray images (Raw EEG + Freq Bands)
+ 81 features from other physiological sensors |
Frequency-domain |
min-max |
|
N/M |
CNN |
CNN |
AlexNet |
AlexNet is great for images, frequency bands can be converted to images... |
Yes |
6 Gray Images (2D) |
5 CNN
1 FC (81+500) |
6 |
N/M |
N/M |
N/M |
|
|
2
Softmax |
N/M |
Fine-tuning AlexNet
(see sketch after this entry) |
Pre-training |
SGD |
SGD |
LR: 0.001
(decreases every 500 iterations) |
200 |
Empirically |
Yes |
N/M |
N/M |
Inter |
10-Fold CV |
k-fold |
Train: 90%
Test: 10% |
Accuracy
F1-Score |
accuracy, f1-score |
N/M |
|
N/M |
Arousal - Accuracy: 87.30%
Arousal - F1-Score: 78.24%
Valence - Accuracy: 85.50%
Valence - F1-Score: 80.06% |
Many other SOTA
(check paper)
They outperform all others. |
Traditional pipeline |
No |
No |
No |
To achieve better performances, data preprocessing of the original signal was also adopted. The provided experimental results prove the effectiveness and validate the proposed contributions of our method by achieving superior performance over the existing methods on DEAP Dataset. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Lin2017 |
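A sketch of the fine-tuning setup. Keras ships no AlexNet, so a pretrained MobileNetV2 stands in for it here (an assumption): the six EEG-derived gray images are mapped onto the backbone's three input channels, and the 81 peripheral features are concatenated before the 2-way softmax. SGD with an LR that decays every 500 iterations follows the entry; the image size, decay rate and Dense activation are assumptions.
```python
# Hedged sketch: fine-tune a pretrained vision backbone on EEG-derived images
# plus peripheral physiological features.
from tensorflow import keras
from tensorflow.keras import layers

backbone = keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                          input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = True                       # fine-tune rather than freeze

img_in = keras.Input(shape=(224, 224, 6))       # 6 gray "images" (raw EEG + bands)
x = layers.Conv2D(3, 1)(img_in)                 # map 6 planes onto 3 input channels
x = backbone(x)

phys_in = keras.Input(shape=(81,))              # peripheral physiological features
x = layers.Concatenate()([x, phys_in])
x = layers.Dense(500, activation="relu")(x)     # FC stage (81 + 500 per the entry)
out = layers.Dense(2, activation="softmax")(x)

model = keras.Model([img_in, phys_in], out)
schedule = keras.optimizers.schedules.ExponentialDecay(
    0.001, decay_steps=500, decay_rate=0.9, staircase=True)  # decay rate assumed
model.compile(optimizer=keras.optimizers.SGD(learning_rate=schedule),
              loss="categorical_crossentropy", metrics=["accuracy"])
```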
96 |
|
Cross-subject recognition of operator functional states via EEG and switching deep belief networks with adaptive weights |
2017 |
Yin & Zhang |
Neurocomputing |
No |
Journal |
University of Shanghai |
China |
18 |
|
Classification of EEG signals |
Monitoring |
Cognitive |
Mental workload & fatigue |
Improve SOTA on cross-subject operator functional state recognition |
Exploit "new" improvements in deep learning |
Cabin air management simulation (AutoCAMS) |
Using switching deep belief network with adaptive weights |
|
(Nihon Kohden) |
None |
Internal Recordings |
Private |
8 subjects x 1080 EEG Segments per subject
(2s windows) |
8640 |
288 |
8 |
11 |
500 |
Offline |
|
1) Adaptive exponential smooth (to remove outliers) |
Yes |
Yes |
Yes |
Centroid frequency, log-energy entropy, mean, five power components, Shannon entropy, sum of energy, variance, zero-crossing rate of each channel and power differences between channel pairs |
Frequency-domain |
z-score |
|
Matlab 2011b |
Switching
DBN |
DBN |
One DBN per subject |
The member DBN is switched at different time instants to fit the non-stationarity of the EEG features recorded from a novel testing subject. |
Yes |
152x1 |
4 |
4 |
Sigmoid |
N/M |
N/M |
3 |
Low MW
Medium MW
High MW |
3 |
N/M |
Unsupervised pre-training of DBNs to learn representation of features for each subject (layer by layer). Supervised fine-tuning of the complete model. |
Pre-training |
N/M |
N/M |
Pre-training: 0.1
Fine-tuning: 1 |
10 |
N/M |
N/M |
Gaussian noise added to feature vectors
(see sketch after this entry) |
N/M |
Inter |
Leave-One-Subject-Out |
Leave-One-Subject-Out |
Train: 7 subjects
Test: 1 subject |
Accuracy
True positive
True negative
False positive
False negative |
accuracy, true positives, true negatives, false positives, false negatives |
AMD4CPU 1.9GHz, 8G RAM |
|
N/M |
Mental workload: 77%
Mental fatigue: 68%
MW+MF: 54% |
KNN, Naive Bayes, Logistic Regression, LSSVM, SAE, DBN
(all with and without PCA) |
DL & Trad. |
Two-tailed Wilcoxon sign-rank test |
No |
No |
Results of the proposed method outperform all baselines. When the number of subjects increases, the performance gap between SDBN and baselines increases, suggesting that the number of subjects plays a fundamental role. |
Number of subjects is crucial to obtain a good performance |
No |
N/A |
No |
|
Isabela Albuquerque |
Yannick Roy |
TBC |
|
Yin2017 |
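The reported augmentation (Gaussian noise added to the 152-dimensional feature vectors) is easy to sketch; the noise scale and number of copies are assumptions.
```python
# Hedged sketch of the Gaussian-noise feature augmentation.
import numpy as np

def augment_with_noise(X, y, n_copies=2, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    Xs, ys = [X], [y]
    for _ in range(n_copies):
        Xs.append(X + rng.normal(0.0, sigma, size=X.shape))  # perturbed copies
        ys.append(y)
    return np.vstack(Xs), np.concatenate(ys)

X = np.random.randn(8640, 152)                 # 152-d feature vectors (per the entry)
y = np.random.randint(0, 3, 8640)              # low / medium / high workload
X_aug, y_aug = augment_with_noise(X, y)
print(X_aug.shape, y_aug.shape)                # (25920, 152) (25920,)
```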
97 |
|
Vowel classification from imagined speech using sub-band EEG frequencies and deep belief networks |
2017 |
Sree & Kavita |
IEEE International Conference on Signal Processing, Communications and Networking |
No |
Conference |
SSN College of Engineering |
India |
4 |
|
Classification of EEG signals |
BCI |
Active |
Speech decoding |
Improve SOTA on vowel classification |
Use DBNs to extract EEG features |
Speech imagery |
N/M |
|
Super Spec (RMS) |
None |
Internal Recordings |
Private |
5 subjects x 75s experiment x ?? trials
Between 15-20 min per subject
not clear... |
N/M |
87.5 |
5 |
32 |
128 |
|
|
1) Band-pass 1-60Hz |
Yes |
No |
No |
Energy features of Wavelet transform: Root Mean Square, Mean Absolute Value, Integrated EEG, Simple Square Integral, Variance of EEG, Average Amplitude Change |
Frequency-domain |
z-score |
|
N/M |
DBN |
DBN |
N/M |
N/M |
No |
N/M |
7 |
7 |
N/M |
N/M |
N/M |
|
|
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
LR: 0.002 |
N/M |
N/M |
N/M |
N/M |
Log-likelihood |
Inter |
No |
No |
Train: 80%
Test: 20% |
Accuracy |
accuracy |
N/M |
|
N/M |
~87.5% (I believe this is the average value for all vowels and EEG bands) |
No |
None |
No |
No |
No |
Vowels were more accurately classified in the theta and gamma bands |
N/M |
No |
N/A |
No |
|
Isabela Albuquerque |
TBR |
TBC |
|
Sree2017 |
98 |
|
Bullying incidences identification within an immersive environment using HD EEG-based analysis: A Swarm Decomposition and Deep Learning approach |
2017 |
Baltatzis, Bintsi, Apostolidis & Hadjileontiadis |
Nature Scientific Reports |
No |
Journal |
Aristotle University of Thessaloniki, Khalifa University of Science and Technology |
Greece |
8 |
|
Classification of EEG signals |
Monitoring |
Affective |
Bullying incidents |
New task: classifying bullying stimuli |
Classifying bullying stimuli in 2D or VR presentation |
Watching stimuli (2D or in VR) of bullying situations |
N/M |
|
(EGI) |
None |
Internal Recordings |
Upon request |
T1: 256 × 256 × 14 × 17 (x3 SWD)
(channels × samples × trials × subject)
T2: 256 × 192 × 16 × 17 (x3 SWD)
(channels × samples × trials × subject)
not clear... |
1530 |
N/M |
17 |
256 |
250 |
|
|
1) Bandpass 0.3-30 Hz
2) Artefact detection, bad channel replacement, baseline correction
3) Channel-wise normalization (- mean, / max)
4) Highpass @7Hz
5) Downsample to 128 Hz |
Yes |
Yes |
Yes |
1) Swarm decomposition to get oscillatory modes
2) k-means clustering to re-order channels based on their respective distance to each other
(see sketch after this entry) |
Other |
N/M |
|
N/M |
CNN |
CNN |
N/M |
N/M |
Yes |
256 x 128 |
2 |
2 |
ReLU |
N/M |
N/M |
|
|
2 or 4 |
N/M |
Standard optimization |
Standard |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
"Softmax" |
Inter |
10-Fold CV |
k-fold |
Train: 75%
Valid: 10% of 75%
Test: 25% |
Accuracy
Precision
Recall
ROC AUC |
accuracy, precision, recall, ROC AUC |
N/M |
|
N/M |
2-class:
Accuracy, precision, recall, AUC (test): 0.937, 0.9403, 0.9395, 0.9869
4-class:
Accuracy, precision, recall, AUC (test): 0.8858, 0.8775, 0.87475, 0.975 |
No Swarm decomposition or clustering
Just clustering
Just Swarm decomposition |
Traditional pipeline |
No |
No |
No |
Swarm Decomposition was an important step in getting high accuracy.
Without k-means clustering the network was overfitting. |
Larger nets take more resources |
No |
N/A |
No |
|
Hubert Banville |
TBR |
Yes |
|
Baltatzis2017 |
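A sketch of the channel re-ordering idea: channels are clustered with k-means and rows are regrouped so similar channels sit next to each other before entering the CNN. The value of k and the use of raw time courses as the distance representation are assumptions.
```python
# Hedged sketch: k-means-based channel re-ordering.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(256, 128)                 # 256 channels x 128 time samples
k = 8                                         # number of clusters assumed
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
order = np.argsort(labels)                    # group same-cluster channels together
X_reordered = X[order]
print(X_reordered.shape)                      # (256, 128), rows regrouped
```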
99 |
|
Classification and discrimination of focal and non-focal EEG signals based on deep neural network |
2017 |
Taqi, Al-Azzo, Mariofanna & Al-Saadi |
International Conference on Current Research in Computer Science and Information Technology (ICCIT) |
No |
Conference |
University of Arkansas at Little Rock
Asiacell Company for Telecommunication, Iraq |
USA |
7 |
|
Classification of EEG signals |
Clinical |
Epilepsy |
Detection |
Improve SOTA |
Detecting Focal vs Non-Focal Seizures with existing Deep Nets: AlexNet, LeNet, GoogleNet |
Seizures (Bern-Barcelona Dataset) |
A deep neural network (DNN) is a high-resolution model that extracts sophisticated hierarchical features (e.g. AlexNet, LeNet, GoogleNet) |
|
N/M |
Seizures |
Bern-Barcelona EEG DB |
Public |
Bern-Barcelona EEG DB (600 out of 3750)
600 signal pairs: 300/300 x 40s
(40s windows ??) |
600 |
400 |
5 |
-1 |
256 |
|
|
None |
No |
No |
No |
None
(Raw EEG) |
Raw EEG |
N/M |
|
Caffe |
N/M
(pre-trained models) |
N/M |
N/M
(pre-trained models) |
Using SOTA Networks in Vision/Images for EEG.
(AlexNet, LeNet, GoogleNet) |
No |
256x256 (images) |
N/M
(pre-trained models) |
N/M |
N/M
(pre-trained models) |
N/M
(pre-trained models) |
N/M |
|
|
2 |
N/M |
pre-trained models
(AlexNet, LeNet, GoogleNet) |
Pre-training |
N/M
(pre-trained models) |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Inter |
No |
No |
Train: 75%
Test: 25% |
Accuracy |
accuracy |
NVidia GPUs |
|
N/M |
LeNet, AlexNet, GoogleNet
100% (with different numbers of TEs)
LeNet is the best compromise |
Anindya et al., 2016 : 89.4%
(EMD-DWT domain, K-nearest neighbor classifier)
R. Sharma et al.,2015 : 84%
(DWT domain, KNN, PNN, fuzzy and LS-SVM)
R. Sharma et al.,2014 : 85%
(EMD domain, LS-SVM classifier) |
Traditional pipeline |
No |
No |
No |
As a future task, we are looking forward to investigating approaches for EEG signals classification of other diseases, drunk people, or ECG signals classification |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
Yes |
|
Taqi2017 |
100 |
|
Deep Transfer Learning for Cross-subject and Cross-experiment Prediction of Image Rapid Serial Visual Presentation Events from EEG Data |
2017 |
Hajinoroozi, Mao & Lin |
International Conference on Augmented Cognition |
No |
Conference |
University of Texas at San Antonio
National Sun Yat-sen University, Taiwan |
USA |
11 |
|
Classification of EEG signals |
BCI |
Reactive |
RSVP |
Novel Approach: Transfer Learning |
Transfer learning on RSVP task with CNN on raw EEG:
(1) Cross-Subject
(2) Cross-Experiment |
RSVP (3 datasets from 1990, 1999, 2013) |
Transfer learning has a lot of potential for BCI training. |
|
ActiveTwo (BioSemi) |
RSVP |
USA DoD (1999);
USA Army (1990);
Touryan et al. (2013) |
Private |
DS #1 - CT2WS: 15 subjects x 15min
DS #2 - Static: 16 subjects x 15min
DS #3 - Expertise: 10 subjects x 5 sessions x 60min
(1s windows, no overlap) |
65831;
62553;
21680 |
1097.2;
1042.6;
361.3 |
15;
16;
10 |
64;
64;
256 |
512;
512;
512 |
|
|
1) Bandpass filter: 0.1 - 55 Hz
2) Downsampled to 128 Hz
3) Epoching: 1s window |
Yes |
No |
No |
None
(Raw EEG) |
Raw EEG |
N/M |
|
N/M |
STCNN
(Spatial-Temporal CNN) |
CNN |
Pretty much a CNN with a fancy name.
2 Conv Layers + 3 FC
with dropout |
Trying to capture Spatial and Temporal information from Raw EEG |
Yes |
64x128 |
CNN: 2
FC: 3 |
5 |
ReLU |
Dropout |
Yes |
|
|
2
Target / Non-Target
(softmax) |
N/M |
The paper is about transfer learning:
training on one dataset, then fine-tuning (or not) on the other. (see sketch after this entry) |
Pre-training |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
N/M |
Inter |
10-Fold CV |
k-fold |
Train: 90%
Valid: 10% |
ROC AUC |
ROC AUC |
N/M |
|
N/M |
Ranging from 73-77%, depending on source / target datasets and transfer type (Cross-Subject or Cross-Experiment) |
Bagging, XLDA, LDA |
Traditional pipeline |
No |
All Layers: Subject Specific
CNN Layers: Mostly Subj. Specific
All Layers: General Info |
Analysis of performance with transferred layers |
This study represents the first comprehensive investigation of CNN transferability for EEG based classification and our results provide important information that will guide the design of more sophisticated deep transfer learning algorithms for EEG based classifications in BCI applications. |
N/M |
No |
N/A |
No |
|
Yannick Roy |
TBR |
TBC |
|
Hajinoroozi2017 |
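The cross-dataset transfer pattern can be sketched as follows: pre-train the 2-conv + 3-FC CNN on a source RSVP dataset, freeze the convolutional layers, and fine-tune only the dense layers on the target dataset. Filter counts, kernel sizes, the dropout rate and the Adam optimizer are assumptions; the layer structure, dropout and 64 x 128 input follow the entry.
```python
# Hedged sketch: pre-train on source data, freeze conv layers, fine-tune FCs.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    m = keras.Sequential([
        keras.Input(shape=(64, 128, 1)),          # channels x 1 s @ 128 Hz
        layers.Conv2D(16, (1, 11), activation="relu"),
        layers.Conv2D(32, (64, 1), activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"), layers.Dropout(0.5),
        layers.Dense(64, activation="relu"), layers.Dropout(0.5),
        layers.Dense(2, activation="softmax"),    # target / non-target
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return m

Xs, ys = np.random.randn(500, 64, 128, 1), np.random.randint(0, 2, 500)   # source
Xt, yt = np.random.randn(100, 64, 128, 1), np.random.randint(0, 2, 100)   # target

model = build_model()
model.fit(Xs, ys, epochs=2, verbose=0)            # pre-train on the source set

for layer in model.layers:                        # freeze the conv feature layers
    if isinstance(layer, layers.Conv2D):
        layer.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(Xt, yt, epochs=2, verbose=0)            # fine-tune FCs on the target set
```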