"""
ARL_EEGModels - A collection of Convolutional Neural Network models for EEG
Signal Processing and Classification, using Keras and Tensorflow

Requirements:
    (1) tensorflow == 2.X (as of this writing, 2.0 - 2.3 have been verified
        as working)

To run the EEG/MEG ERP classification sample script, you will also need

    (4) mne >= 0.17.1
    (5) PyRiemann >= 0.2.5
    (6) scikit-learn >= 0.20.1
    (7) matplotlib >= 2.2.3

To use:

    (1) Place this file in a directory on your PYTHONPATH (for example, by
        adding that directory to the PYTHONPATH setting of your IDE, such
        as Spyder).
    (2) Import the model as

        from EEGModels import EEGNet

        model = EEGNet(nb_classes = ..., Chans = ..., Samples = ...)

    (3) Then compile and fit the model

        model.compile(loss = ..., optimizer = ..., metrics = ...)
        fitted    = model.fit(...)
        predicted = model.predict(...)

Portions of this project are works of the United States Government and are not
subject to domestic copyright protection under 17 USC Sec. 105. Those
portions are released world-wide under the terms of the Creative Commons Zero
1.0 (CC0) license.

Other portions of this project are subject to domestic copyright protection
under 17 USC Sec. 105. Those portions are licensed under the Apache 2.0
license. The complete text of the license governing this material is in
the file labeled LICENSE.TXT that is a part of this project's official
distribution.
"""

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Activation, Permute, Dropout
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from tensorflow.keras.layers import SeparableConv2D, DepthwiseConv2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import SpatialDropout2D
from tensorflow.keras.regularizers import l1_l2
from tensorflow.keras.layers import Input, Flatten
from tensorflow.keras.constraints import max_norm
from tensorflow.keras import backend as K


def EEGNet(nb_classes, Chans = 64, Samples = 128,
           dropoutRate = 0.5, kernLength = 64, F1 = 8,
           D = 2, F2 = 16, norm_rate = 0.25, dropoutType = 'Dropout'):
    """ Keras Implementation of EEGNet
    http://iopscience.iop.org/article/10.1088/1741-2552/aace8c/meta

    Note that this implements the newest version of EEGNet and NOT the earlier
    versions (v1 and v2 on arXiv). We strongly recommend using this
    architecture as it performs much better and has nicer properties than
    our earlier version. For example:

        1. Depthwise Convolutions to learn spatial filters within a
        temporal convolution. The use of the depth_multiplier option maps
        exactly to the number of spatial filters learned within a temporal
        filter. This matches the setup of algorithms like FBCSP which learn
        spatial filters within each filter in a filter-bank. This also limits
        the number of free parameters to fit when compared to a fully-connected
        convolution.

        2. Separable Convolutions to learn how to optimally combine spatial
        filters across temporal bands. Separable Convolutions are Depthwise
        Convolutions followed by (1x1) Pointwise Convolutions.

    While the original paper used Dropout, we found that SpatialDropout2D
    sometimes produced slightly better results for classification of ERP
    signals. However, SpatialDropout2D significantly reduced performance
    on the Oscillatory dataset (SMR, BCI-IV Dataset 2A). We recommend using
    the default Dropout in most cases.

    Assumes the input signal is sampled at 128Hz. If you want to use this model
    for any other sampling rate you will need to modify the lengths of temporal
    kernels and average pooling size in blocks 1 and 2 as needed (double the
    kernel lengths for double the sampling rate, etc). Note that we haven't
    tested the model performance with this rule so this may not work well.

    The model with default parameters gives the EEGNet-8,2 model as discussed
    in the paper. This model should do pretty well in general, although it is
    advised to do some model searching to get optimal performance on your
    particular dataset.

    We set F2 = F1 * D (number of input filters = number of output filters) for
    the SeparableConv2D layer. We haven't extensively tested other values of
    this parameter (say, F2 < F1 * D for compressed learning, and F2 > F1 * D
    for overcomplete). We believe the main parameters to focus on are F1 and D.

    Inputs:

      nb_classes      : int, number of classes to classify
      Chans, Samples  : number of channels and time points in the EEG data
      dropoutRate     : dropout fraction
      kernLength      : length of temporal convolution in first layer. We found
                        that setting this to be half the sampling rate worked
                        well in practice. For the SMR dataset in particular,
                        since the data was high-passed at 4Hz, we used a kernel
                        length of 32.
      F1, F2          : number of temporal filters (F1) and number of pointwise
                        filters (F2) to learn. Default: F1 = 8, F2 = F1 * D.
      D               : number of spatial filters to learn within each temporal
                        convolution. Default: D = 2
      dropoutType     : Either SpatialDropout2D or Dropout, passed as a string.

    """

    if dropoutType == 'SpatialDropout2D':
        dropoutType = SpatialDropout2D
    elif dropoutType == 'Dropout':
        dropoutType = Dropout
    else:
        raise ValueError('dropoutType must be one of SpatialDropout2D '
                         'or Dropout, passed as a string.')

    input1 = Input(shape = (Chans, Samples, 1))

    ##################################################################
    # Block 1: temporal convolution, then depthwise spatial filtering
    block1 = Conv2D(F1, (1, kernLength), padding = 'same',
                    input_shape = (Chans, Samples, 1),
                    use_bias = False)(input1)
    block1 = BatchNormalization()(block1)
    block1 = DepthwiseConv2D((Chans, 1), use_bias = False,
                             depth_multiplier = D,
                             depthwise_constraint = max_norm(1.))(block1)
    block1 = BatchNormalization()(block1)
    block1 = Activation('elu')(block1)
    block1 = AveragePooling2D((1, 4))(block1)
    block1 = dropoutType(dropoutRate)(block1)

    # Block 2: separable convolution to combine the spatial filters
    block2 = SeparableConv2D(F2, (1, 16),
                             use_bias = False, padding = 'same')(block1)
    block2 = BatchNormalization()(block2)
    block2 = Activation('elu')(block2)
    block2 = AveragePooling2D((1, 8))(block2)
    block2 = dropoutType(dropoutRate)(block2)

    flatten = Flatten(name = 'flatten')(block2)

    dense = Dense(nb_classes, name = 'dense',
                  kernel_constraint = max_norm(norm_rate))(flatten)
    softmax = Activation('softmax', name = 'softmax')(dense)

    return Model(inputs=input1, outputs=softmax)
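

def _example_train_eegnet():
    """ A minimal, self-contained sketch of compiling and fitting EEGNet on
    synthetic data. Everything here (shapes, labels, optimizer, epochs) is an
    illustrative assumption, not a recommendation from the paper; substitute
    your own preprocessed trials. Per the sampling-rate rule of thumb above,
    at 256Hz you would roughly double kernLength (e.g. kernLength = 128),
    although the authors note that rule is untested. """
    import numpy as np
    from tensorflow.keras.utils import to_categorical

    # synthetic "EEG": 100 trials of 64 channels x 128 samples, with the
    # trailing singleton axis matching the (Chans, Samples, 1) input
    X = np.random.randn(100, 64, 128, 1).astype('float32')
    y = to_categorical(np.random.randint(0, 2, size = 100), num_classes = 2)

    model = EEGNet(nb_classes = 2, Chans = 64, Samples = 128)
    model.compile(loss = 'categorical_crossentropy', optimizer = 'adam',
                  metrics = ['accuracy'])
    model.fit(X, y, batch_size = 16, epochs = 2, verbose = 2)
    return model.predict(X)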


def EEGNet_SSVEP(nb_classes = 12, Chans = 8, Samples = 256,
                 dropoutRate = 0.5, kernLength = 256, F1 = 96,
                 D = 1, F2 = 96, dropoutType = 'Dropout'):
    """ SSVEP Variant of EEGNet, as used in [1].

    Inputs:

      nb_classes      : int, number of classes to classify
      Chans, Samples  : number of channels and time points in the EEG data
      dropoutRate     : dropout fraction
      kernLength      : length of temporal convolution in first layer
      F1, F2          : number of temporal filters (F1) and number of pointwise
                        filters (F2) to learn.
      D               : number of spatial filters to learn within each temporal
                        convolution.
      dropoutType     : Either SpatialDropout2D or Dropout, passed as a string.


    [1]. Waytowich, N. et al. (2018). Compact Convolutional Neural Networks
    for Classification of Asynchronous Steady-State Visual Evoked Potentials.
    Journal of Neural Engineering vol. 15(6).
    http://iopscience.iop.org/article/10.1088/1741-2552/aae5d8

    """

    if dropoutType == 'SpatialDropout2D':
        dropoutType = SpatialDropout2D
    elif dropoutType == 'Dropout':
        dropoutType = Dropout
    else:
        raise ValueError('dropoutType must be one of SpatialDropout2D '
                         'or Dropout, passed as a string.')

    input1 = Input(shape = (Chans, Samples, 1))

    ##################################################################
    block1 = Conv2D(F1, (1, kernLength), padding = 'same',
                    input_shape = (Chans, Samples, 1),
                    use_bias = False)(input1)
    block1 = BatchNormalization()(block1)
    block1 = DepthwiseConv2D((Chans, 1), use_bias = False,
                             depth_multiplier = D,
                             depthwise_constraint = max_norm(1.))(block1)
    block1 = BatchNormalization()(block1)
    block1 = Activation('elu')(block1)
    block1 = AveragePooling2D((1, 4))(block1)
    block1 = dropoutType(dropoutRate)(block1)

    block2 = SeparableConv2D(F2, (1, 16),
                             use_bias = False, padding = 'same')(block1)
    block2 = BatchNormalization()(block2)
    block2 = Activation('elu')(block2)
    block2 = AveragePooling2D((1, 8))(block2)
    block2 = dropoutType(dropoutRate)(block2)

    flatten = Flatten(name = 'flatten')(block2)

    dense = Dense(nb_classes, name = 'dense')(flatten)
    softmax = Activation('softmax', name = 'softmax')(dense)

    return Model(inputs=input1, outputs=softmax)
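

def _example_build_eegnet_ssvep():
    """ A brief sketch (illustrative only) of building the SSVEP variant with
    its default configuration: 12 classes, 8 channels, 256 samples, as used
    in [1]. The compile settings are assumptions, not taken from the paper. """
    model = EEGNet_SSVEP(nb_classes = 12, Chans = 8, Samples = 256)
    model.compile(loss = 'categorical_crossentropy', optimizer = 'adam',
                  metrics = ['accuracy'])
    return model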


def EEGNet_old(nb_classes, Chans = 64, Samples = 128, regRate = 0.0001,
               dropoutRate = 0.25, kernels = [(2, 32), (8, 4)], strides = (2, 4)):
    """ Keras Implementation of EEGNet_v1 (https://arxiv.org/abs/1611.08024v2)

    This model is the original EEGNet model proposed on arXiv
            https://arxiv.org/abs/1611.08024v2

    with a few modifications: we use striding instead of max-pooling as this
    helped slightly in classification performance while also providing a
    computational speed-up.

    Note that we no longer recommend the use of this architecture, as the new
    version of EEGNet performs much better overall and has nicer properties.

    Inputs:

      nb_classes     : total number of final categories
      Chans, Samples : number of EEG channels and samples, respectively
      regRate        : regularization rate for L1 and L2 regularizations
      dropoutRate    : dropout fraction
      kernels        : the 2nd and 3rd layer kernel dimensions (default is
                       the [2, 32] x [8, 4] configuration)
      strides        : the stride size (note that this replaces the max-pool
                       used in the original paper)

    """

    # start the model; Conv2D expects a trailing channel axis, so the input
    # is declared as (Chans, Samples, 1) rather than (Chans, Samples)
    input_main = Input((Chans, Samples, 1))
    layer1 = Conv2D(16, (Chans, 1), input_shape=(Chans, Samples, 1),
                    kernel_regularizer = l1_l2(l1=regRate, l2=regRate))(input_main)
    layer1 = BatchNormalization()(layer1)
    layer1 = Activation('elu')(layer1)
    layer1 = Dropout(dropoutRate)(layer1)

    permute_dims = (2, 1, 3)
    permute1 = Permute(permute_dims)(layer1)

    layer2 = Conv2D(4, kernels[0], padding = 'same',
                    kernel_regularizer = l1_l2(l1=0.0, l2=regRate),
                    strides = strides)(permute1)
    layer2 = BatchNormalization()(layer2)
    layer2 = Activation('elu')(layer2)
    layer2 = Dropout(dropoutRate)(layer2)

    layer3 = Conv2D(4, kernels[1], padding = 'same',
                    kernel_regularizer = l1_l2(l1=0.0, l2=regRate),
                    strides = strides)(layer2)
    layer3 = BatchNormalization()(layer3)
    layer3 = Activation('elu')(layer3)
    layer3 = Dropout(dropoutRate)(layer3)

    flatten = Flatten(name = 'flatten')(layer3)

    dense = Dense(nb_classes, name = 'dense')(flatten)
    softmax = Activation('softmax', name = 'softmax')(dense)

    return Model(inputs=input_main, outputs=softmax)


def DeepConvNet(nb_classes, Chans = 64, Samples = 256,
                dropoutRate = 0.5):
    """ Keras implementation of the Deep Convolutional Network as described in
    Schirrmeister et al. (2017), Human Brain Mapping.

    This implementation assumes the input is a 2-second EEG signal sampled at
    128Hz, as opposed to signals sampled at 250Hz as described in the original
    paper. We also perform temporal convolutions of length (1, 5) as opposed
    to (1, 10) due to this sampling rate difference.

    Note that we use the max_norm constraint on all convolutional layers, as
    well as the classification layer. We also change the defaults for the
    BatchNormalization layer. We used this based on a personal communication
    with the original authors.

                     ours        original paper
    pool_size        1, 2        1, 3
    strides          1, 2        1, 3
    conv filters     1, 5        1, 10

    Note that this implementation has not been verified by the original
    authors.

    """

    # start the model
    input_main = Input((Chans, Samples, 1))
    block1 = Conv2D(25, (1, 5),
                    input_shape=(Chans, Samples, 1),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(input_main)
    block1 = Conv2D(25, (Chans, 1),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(block1)
    block1 = BatchNormalization(epsilon=1e-05, momentum=0.9)(block1)
    block1 = Activation('elu')(block1)
    block1 = MaxPooling2D(pool_size=(1, 2), strides=(1, 2))(block1)
    block1 = Dropout(dropoutRate)(block1)

    block2 = Conv2D(50, (1, 5),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(block1)
    block2 = BatchNormalization(epsilon=1e-05, momentum=0.9)(block2)
    block2 = Activation('elu')(block2)
    block2 = MaxPooling2D(pool_size=(1, 2), strides=(1, 2))(block2)
    block2 = Dropout(dropoutRate)(block2)

    block3 = Conv2D(100, (1, 5),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(block2)
    block3 = BatchNormalization(epsilon=1e-05, momentum=0.9)(block3)
    block3 = Activation('elu')(block3)
    block3 = MaxPooling2D(pool_size=(1, 2), strides=(1, 2))(block3)
    block3 = Dropout(dropoutRate)(block3)

    block4 = Conv2D(200, (1, 5),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(block3)
    block4 = BatchNormalization(epsilon=1e-05, momentum=0.9)(block4)
    block4 = Activation('elu')(block4)
    block4 = MaxPooling2D(pool_size=(1, 2), strides=(1, 2))(block4)
    block4 = Dropout(dropoutRate)(block4)

    flatten = Flatten()(block4)

    dense = Dense(nb_classes, kernel_constraint = max_norm(0.5))(flatten)
    softmax = Activation('softmax')(dense)

    return Model(inputs=input_main, outputs=softmax)
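

def _example_build_deepconvnet():
    """ A short sketch (illustrative only): DeepConvNet as configured here
    expects 2-second windows at 128Hz, i.e. Samples = 256; for a different
    window length, pass the matching number of samples instead. The compile
    settings are assumptions, not taken from the paper. """
    model = DeepConvNet(nb_classes = 4, Chans = 64, Samples = 256)
    model.compile(loss = 'categorical_crossentropy', optimizer = 'adam',
                  metrics = ['accuracy'])
    return model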


# need these for ShallowConvNet
def square(x):
    return K.square(x)


def log(x):
    # clip to keep the input to log strictly positive and bounded
    return K.log(K.clip(x, min_value = 1e-7, max_value = 10000))
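

def _example_load_shallowconvnet(filepath):
    """ A sketch (illustrative only) of re-loading a saved ShallowConvNet:
    because the model uses the custom square/log activations above, they must
    be passed to load_model via custom_objects. The filepath argument is
    whatever you previously passed to model.save(...). """
    from tensorflow.keras.models import load_model
    return load_model(filepath, custom_objects = {'square': square, 'log': log})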


def ShallowConvNet(nb_classes, Chans = 64, Samples = 128, dropoutRate = 0.5):
    """ Keras implementation of the Shallow Convolutional Network as described
    in Schirrmeister et al. (2017), Human Brain Mapping.

    Assumes the input is a 2-second EEG signal sampled at 128Hz. Note that in
    the original paper, they do temporal convolutions of length 25 for EEG
    data sampled at 250Hz. We instead use length 13 since the sampling rate is
    roughly half of the 250Hz which the paper used. The pool_size and stride
    in later layers is also approximately half of what is used in the paper.

    Note that we use the max_norm constraint on all convolutional layers, as
    well as the classification layer. We also change the defaults for the
    BatchNormalization layer. We used this based on a personal communication
    with the original authors.

                     ours        original paper
    pool_size        1, 35       1, 75
    strides          1, 7        1, 15
    conv filters     1, 13       1, 25

    Note that this implementation has not been verified by the original
    authors. We do note that this implementation reproduces the results in the
    original paper with minor deviations.

    """

    # start the model
    input_main = Input((Chans, Samples, 1))
    block1 = Conv2D(40, (1, 13),
                    input_shape=(Chans, Samples, 1),
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(input_main)
    block1 = Conv2D(40, (Chans, 1), use_bias=False,
                    kernel_constraint = max_norm(2., axis=(0,1,2)))(block1)
    block1 = BatchNormalization(epsilon=1e-05, momentum=0.9)(block1)
    block1 = Activation(square)(block1)
    block1 = AveragePooling2D(pool_size=(1, 35), strides=(1, 7))(block1)
    block1 = Activation(log)(block1)
    block1 = Dropout(dropoutRate)(block1)
    flatten = Flatten()(block1)
    dense = Dense(nb_classes, kernel_constraint = max_norm(0.5))(flatten)
    softmax = Activation('softmax')(dense)

    return Model(inputs=input_main, outputs=softmax)
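

if __name__ == '__main__':
    # A quick smoke test (illustrative only): build each model with default
    # arguments and report its parameter count, verifying that every graph
    # in this file assembles correctly. Runs only when the file is executed
    # directly, never on import.
    for name, build in [('EEGNet',         lambda: EEGNet(nb_classes = 4)),
                        ('EEGNet_SSVEP',   lambda: EEGNet_SSVEP()),
                        ('EEGNet_old',     lambda: EEGNet_old(nb_classes = 4)),
                        ('DeepConvNet',    lambda: DeepConvNet(nb_classes = 4)),
                        ('ShallowConvNet', lambda: ShallowConvNet(nb_classes = 4))]:
        print('{:16s}: {:,d} parameters'.format(name, build().count_params()))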