## New!
### Version for TensorFlow 2

[PDL_tf2.ipynb](https://github.com/miguelperezenciso/DLpipeline/blob/master/PDL_tf2.ipynb)

Includes an example with Keras Tuner for hyperparameter optimization.

## DLpipeline
### A Guide on Deep Learning for Complex Trait Genomic Prediction: A Keras Based Pipeline

#### M Pérez-Enciso & LM Zingaretti
#### miguel.perez@uab.es, m.lau.zingaretti@gmail.com

If you find this resource useful, please cite:

[Pérez-Enciso M, Zingaretti LM. 2019. A Guide on Deep Learning for Complex Trait Genomic Prediction. Genes, 10, 553.](https://www.mdpi.com/2073-4425/10/7/553)

and possibly

[Bellot P, De Los Campos G, Pérez-Enciso M. 2018. Can Deep Learning Improve Genomic Prediction of Complex Human Traits? Genetics 210:809-819.](https://www.genetics.org/content/210/3/809)

[Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, Whitaker VM, Pérez-Enciso M. 2020. Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species. Frontiers in Plant Science 11:25](https://doi.org/10.3389/fpls.2020.00025)

* * *

Implementing DL, despite all its theoretical and computational complexities, is rather easy. This is thanks to the Keras API (https://keras.io/) and TensorFlow (https://www.tensorflow.org/), which allow all intricacies to be encapsulated in very simple statements. TensorFlow is a machine-learning library developed by Google. In addition, the machine-learning python library scikit-learn (https://scikit-learn.org) is highly useful. Directly implementing DL in TensorFlow requires some knowledge of DL algorithms and an understanding of the philosophy behind tensor (i.e., n-dimensional object) manipulations. Fortunately, this can be avoided using Keras, a high-level python interface to TensorFlow and other DL libraries. Although alternatives to TensorFlow and Keras exist, we believe these two tools combined are currently the best option: they are simple to use and well documented.

Here we describe some Keras implementation details. The complete code is in the [jupyter notebook](https://github.com/miguelperezenciso/DLpipeline/blob/master/PDL.ipynb), and example data are in the [DATA](https://github.com/miguelperezenciso/DLpipeline/tree/master/DATA) folder. To run the script, you need to have Keras and TensorFlow installed, preferably on a computer with a GPU. Installing TensorFlow, especially for GPU architectures, may not be a smooth experience. If unsolved, an alternative is using a Docker image (i.e., a pre-configured containerized environment) with all functionalities built in, or a cloud-based machine already configured. One option is https://github.com/floydhub/dl-docker.

### Important note
The code provided was written for TensorFlow 1. Generally, only minor changes are required to adapt it to TensorFlow 2, which has Keras built in. Some packages do not work though, such as talos for hyperparameter optimization. You are invited to consider Keras Tuner in the meantime (https://keras.io/keras_tuner/); a minimal sketch is shown below. We plan to provide code fully adapted to TF2 in the near future.
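As a rough illustration of what a Keras Tuner search looks like (the build function, hyperparameter ranges and trial count below are illustrative assumptions, not the settings used in PDL_tf2.ipynb):

```
import keras_tuner as kt
from tensorflow import keras

def build_model(hp):
    # MLP whose width and learning rate are sampled by the tuner
    model = keras.Sequential()
    model.add(keras.layers.Dense(hp.Int('units', 32, 128, step=32),
                                 activation='relu'))
    model.add(keras.layers.Dense(1))
    model.compile(optimizer=keras.optimizers.Adam(
                      hp.Choice('learning_rate', [1e-2, 1e-3])),
                  loss='mean_squared_error')
    return model

tuner = kt.RandomSearch(build_model, objective='val_loss', max_trials=10)
# tuner.search(X_train, y_train, epochs=50, validation_data=(X_val, y_val))
# best_model = tuner.get_best_models(num_models=1)[0]
```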
### Practical recommendations
Before you fully dive into deep learning, here are some generic thoughts that you should consider:

- Before starting, inspect the data, both SNPs and phenotypic distributions. Look for unexpected, weird patterns that may cause biases or other artefacts. Standardize the variables and targets (see the sketch after this list).
- Use Keras with TensorFlow, together with scikit-learn, a collection of well documented, easy-to-use machine learning modules. Reuse, but test, available public software whenever possible.
- Exercise prudence if extremely good or very poor results are obtained. Compare with other, simpler methods such as ridge regression or random forests. Ample literature supports that differences between methods should not be dramatic.
- Do not be too ambitious. Is your data set big enough to fit such complex models?
- Dedicate enough time and thinking to optimizing hyperparameters. Fine-tune early stopping to improve prediction performance. If the number of SNPs is too large, you may preselect different subsets according to the p-value or try other criteria.
- Once an optimum hyperparameter set has been decided, restart the algorithm several times to assess the influence of initial values.

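A minimal sketch of the standardization step using scikit-learn (the variable names X_train, X_test and y_train follow the generic Keras script further below; fit the scaler on the training partition only):

```
from sklearn.preprocessing import StandardScaler

# Scale SNP genotypes to zero mean and unit variance;
# fit on the training set, then apply the same transform to the test set
scaler_x = StandardScaler().fit(X_train)
X_train_std = scaler_x.transform(X_train)
X_test_std = scaler_x.transform(X_test)

# Same idea for the (quantitative) target
scaler_y = StandardScaler().fit(y_train.values.reshape(-1, 1))
y_train_std = scaler_y.transform(y_train.values.reshape(-1, 1))
```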
### Deep Learning Jargon
DL is full of specific terms; a few of the most relevant ones are defined here (just in case).

|Term|Definition|
|:----:|----------|
|**Activation function**|The mathematical function f that produces a neuron's output f(w'x + b), where w is a weights vector, x is an input vector, and b is the bias, a scalar. Both w and b are to be estimated for all neurons.|
|**Backpropagation**|An efficient algorithm to compute the gradient of the **loss**: the error at the output layer is propagated backward, and the gradients of previous layers are then computed easily using the chain rule for derivatives.|
|**Batch**|In **Stochastic Gradient Descent** algorithms, each of the sample partitions within a given **epoch**.|
|**Convolution kernel**|Mathematically, a convolution is a function that can be defined as an 'integral transform' between two functions, where one of the functions must be a **kernel**. The discrete version of the operation is simply a weighted sum of several shifted copies of the original function (f), with weights given by the kernel.|
|**Convolutional Neural Network (CNN)**|CNNs are a special case of Neural Networks which use convolution instead of a full matrix multiplication in the hidden layers. A typical CNN is made up of dense, fully connected layers and 'convolutional layers'.|
|**Dropout**|Dropout means that a given percentage of neuron outputs is set to zero. The percentage is kept constant, but the specific neurons are randomly sampled in every iteration. The goal of dropout is to avoid overfitting.|
|**Early stopping**|An anti-overfitting strategy that consists of stopping the algorithm before it converges.|
|**Epoch**|In **SGD** and related algorithms, an iteration comprising all batches in a given partition. In the next epoch, a different partition is employed.|
|**Feature**|In machine learning terminology, an independent variable, i.e., a SNP here.|
|**Generative Adversarial Network (GAN)**|GANs are based on a simple idea: train two networks simultaneously, the Generator (G), which defines a probability distribution based on the information from the samples, and the Discriminator (D), which distinguishes data produced by G from the real data.|
|**Kernel = Filter = Tensor**|In DL terminology, the kernel is a multidimensional array of weights.|
|**Learning rate**|Specifies the step size of each gradient update.|
|**Loss**|The loss function quantifies the differences between observed and predicted target variables.|
|**Neural layer**|'Neurons' are arranged in layers, i.e., groups of neurons that take the output of the previous group of neurons as input.|
|**Neuron**|The basic unit of a DL algorithm. A 'neuron' takes as input a list of variable values (x) multiplied by 'weights' (w) and, as output, produces a non-linear transformation f(w'x + b), where f is the activation function and b is the bias. Both w and b need to be estimated for each neuron such that the loss is minimized across the whole set of neurons.|
|**Multilayer Perceptron (MLP)**|The Multilayer Perceptron is one of the most popular DL architectures, consisting of a series of fully connected layers, called input, hidden and output layers. Layers are connected by a directed graph.|
|**Optimizer**|Algorithm to find the weights (w and b) that minimize the loss function. Most DL optimizers are based on **Stochastic Gradient Descent** (SGD).|
|**Pooling**|A pooling function substitutes the output of a network at a certain location with a summary statistic of the neighboring outputs. This is one of the crucial steps in the CNN architecture. The most common pooling operations are maximum, mean and median.|
|**Recurrent Neural Network (RNN)**|The RNN architecture considers information from multiple previous layers. In the RNN model, the current hidden layer is a non-linear function of both the previous layer(s) and the current input (x). The model has memory since the bias term is based on the 'past'. These networks can be used with temporal-like data structures.|
|**Stochastic Gradient Descent (SGD)**|An optimization algorithm that consists of randomly partitioning the whole data set into subsets called 'batches' or 'minibatches' and updating the gradient using only one subset at a time. The next batch is used in the next iteration.|
|**Weight regularization**|An excess of parameters (weights, w) may produce the phenomenon called 'overfitting', which means that the model adjusts to the observed data very well but predicts new, unobserved data very poorly. To avoid this, weights are estimated subject to constraints, a strategy called 'penalization' or 'regularization'. The two most frequent regularizations are the L1 and L2 norms, which set restrictions on the sum of absolute values of w (L1) or of its squared values (L2).|

### A Generic Keras Pipeline

After uploading, preprocessing and partitioning the dataset, an analysis pipeline in Keras requires five main steps:
* A model is instantiated: The most usual model is ```Sequential```, which allows adding layers with different properties step by step.
* The architecture is defined: Here, each layer and its properties are defined. For each layer, the number of neurons, activation function, regularization and initialization methods are specified.
* The model is compiled: The optimizer algorithm with associated parameters (e.g., learning rate) and the loss function are specified. This step allows us to symbolically define the operations ('graphs') to be performed later with actual numbers.
* Training: The model is fitted to the data and parameters are estimated. The number of iterations ('epochs') and the batch size are specified, and input and target variables need to be provided. The input data size must match that defined in step 2.
* Model predictions are validated via cross-validation.

**IMPORTANT NOTE:** This assumes that the DL architecture (e.g., number of neurons, layers...) has been specified. Determining the optimum architecture is a serious and time-consuming task that should be done carefully. Check [below](https://github.com/miguelperezenciso/DLpipeline#hyperparameter-optimization) and the [Talos](https://autonomio.github.io/docs_talos/) scripts in the [jupyter notebook](https://github.com/miguelperezenciso/DLpipeline/blob/master/PDL.ipynb).

A generic Keras script would look like:

```
# Load modules needed
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Keras items
from keras.models import Sequential
from keras.layers import Dense, Activation

# Load the dataset as a pandas data frame
# X is an N by nSNP array with SNP genotypes
X = pd.read_csv('DATA/wheat.X', header=None, sep='\s+')
# Y is an N by nTRAIT array with phenotypes
Y = pd.read_csv('DATA/wheat.Y', header=None, sep='\s+')
# The first trait is analyzed
y = Y[0]

# Data partitioning into train and test (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# no. of SNPs in data
nSNP = X_train.shape[1]

# Instantiate model
model = Sequential()

# (We assume the optimum DL configuration has been determined)
# Add first layer containing 64 neurons
model.add(Dense(64, input_dim=nSNP))
model.add(Activation('relu'))
# Add second layer, with 32 neurons
model.add(Dense(32))
model.add(Activation('softplus'))
# Last, output layer contains one neuron (i.e., the target is a real numeric value)
model.add(Dense(1))

# Model compiling
model.compile(loss='mean_squared_error', optimizer='sgd')

# List some properties of the network
model.summary()

# Training
model.fit(X_train, y_train, epochs=100)

# Validation: get predicted target values for the held-out test set
y_hat = model.predict(X_test)

# Computes squared error in prediction
mse_prediction = model.evaluate(X_test, y_test)
```

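The script above uses a single train/test split; a minimal sketch of full k-fold cross-validation around the same model (the 5-fold setting and the build_model helper are illustrative, not part of the original script):

```
from sklearn.model_selection import KFold

def build_model(nSNP):
    # Same architecture as the generic script above
    model = Sequential()
    model.add(Dense(64, input_dim=nSNP, activation='relu'))
    model.add(Dense(32, activation='softplus'))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='sgd')
    return model

mse_folds = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
    model = build_model(X.shape[1])
    model.fit(X.iloc[train_idx], y.iloc[train_idx], epochs=100, verbose=0)
    mse_folds.append(model.evaluate(X.iloc[test_idx], y.iloc[test_idx], verbose=0))

print(np.mean(mse_folds))  # average prediction MSE across folds
```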
### Implementing Multilayer Perceptrons (MLPs)
In Keras, an MLP is implemented by adding 'dense' layers. In the following code, a two-layer MLP with 64 and 32 neurons is defined, where the input dimension is 200 (i.e., the number of SNPs):

```
from keras.models import Sequential
from keras.layers import Dense, Activation

nSNP = 200 # no. of SNPs in data
# Instantiate
model = Sequential()
# Add first layer
model.add(Dense(64, input_dim=nSNP))
model.add(Activation('relu'))
# Add second layer
model.add(Dense(32))
model.add(Activation('softplus'))
# Last, output layer with linear activation (default)
model.add(Dense(1))
```

As is clear from the code, activation functions are 'relu' and 'softplus' in the first and second layers, respectively.

### Implementing Convolutional Neural Networks (CNNs)
The following Keras code illustrates how a convolutional layer with max pooling is applied prior to the MLP described above:

```
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import Flatten, Conv1D, MaxPooling1D

nSNP = 200    # no. of SNPs in data
nStride = 3   # stride between convolutions
nFilter = 32  # no. of convolutions (filters)

model = Sequential()
# Add convolutional layer
model.add(Conv1D(nFilter,
                 kernel_size=3,
                 strides=nStride,
                 input_shape=(nSNP, 1)))
# Add pooling layer: here takes the maximum of two consecutive values
model.add(MaxPooling1D(pool_size=2))
# Outputs above are flattened to accommodate standard dense layers
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(32))
model.add(Activation('softplus'))
model.add(Dense(1))
```

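Note that ```Conv1D``` expects a three-dimensional input of shape (samples, nSNP, 1), so the SNP matrix must be reshaped before fitting; a minimal sketch using the X_train and X_test matrices from the generic script above:

```
import numpy as np

# Add a trailing 'channel' dimension: (N, nSNP) -> (N, nSNP, 1)
X_train_cnn = np.expand_dims(X_train, axis=2)
X_test_cnn = np.expand_dims(X_test, axis=2)
```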
### Implementing Recurrent Neural Networks (RNNs)
The following model is a simple implementation of 3 layers of LSTM with 256 neurons per layer:

```
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, LSTM

nSNP = 200 # no. of SNPs in data (not used directly: input_shape=(None, 1) accepts sequences of any length)

# Instantiate
model = Sequential()
model.add(LSTM(256, return_sequences=True, input_shape=(None, 1), activation='tanh'))
model.add(Dropout(0.1))
model.add(LSTM(256, return_sequences=True, activation='tanh'))
model.add(Dropout(0.1))
model.add(LSTM(256, activation='tanh'))
model.add(Dropout(0.1))
model.add(Dense(units=1))
model.add(Activation('tanh'))
model.compile(loss='mse', optimizer='adam', metrics=['mae'])

# Prints some details
model.summary()
```

### Implementing Generative Networks
A Keras implementation of GANs can be found at https://github.com/eriklindernoren/Keras-GAN.

### Activation Functions
In Keras, the activation is defined for every Dense layer as

```model.add(Activation('activation'))```

where ```'activation'``` can take values 'sigmoid', 'relu', etc. (https://keras.io/activations/).
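Equivalently, the activation can be passed directly as an argument when the layer is defined; a minimal sketch (reusing the nSNP variable from the examples above):

```
# Same two hidden layers as in the MLP example, with activations as Dense() arguments
model.add(Dense(64, input_dim=nSNP, activation='relu'))
model.add(Dense(32, activation='softplus'))
```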
### Loss
The loss quantifies the differences between observed and predicted target variables. Keras offers three simple losses to deal with quantitative, binary or multiclass outcome variables: mean squared error, binary cross-entropy and categorical cross-entropy, respectively. Several other losses are also possible or can be manually specified.

Categorical cross-entropy is defined, for *M* classes, as

&minus;&sum;<sub>i=1..N</sub> &sum;<sub>c=1..M</sub> &gamma;<sub>ic</sub> log(p<sub>ic</sub>)

where *N* is the number of observations, &gamma;<sub>ic</sub> is an indicator variable taking value 1 if the i-th observation pertains to the c-th class and 0 otherwise, and p<sub>ic</sub> is the predicted probability of the i-th observation being of class c.
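To make the formula concrete, a small numpy sketch with hypothetical numbers (N = 2 observations, M = 3 classes):

```
import numpy as np

# One-hot targets (the indicator gamma_ic)
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
# Predicted class probabilities p_ic (each row sums to 1)
p_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])

# Categorical cross-entropy as defined above, averaged over observations
cce = -np.mean(np.sum(y_true * np.log(p_pred), axis=1))
print(cce)  # approx. 0.43
```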
Losses are declared in compiling the model:

```
# Stochastic Gradient Descent ('sgd') as optimization algorithm
# quantitative variable, regression
model.compile(loss='mean_squared_error', optimizer='sgd')

# binary classification
model.compile(loss='binary_crossentropy', optimizer='sgd')

# multi-class classification
model.compile(loss='categorical_crossentropy', optimizer='sgd')
```

When using categorical losses, your targets should be in categorical format. In order to convert integer targets into categorical targets, you can use the Keras utility ```to_categorical```:

```
from keras.utils import to_categorical
categorical_labels = to_categorical(int_labels, num_classes=None)
```

See https://keras.io/utils/#to_categorical.

The next table shows the most common combinations of loss function and last-layer activation for different problems; a sketch of the multiclass case follows the table.

|Problem | Last Layer Activation | Loss |
| :-------: | :------: | :-----: |
| Binary classification | Sigmoid | Binary cross-entropy |
| Multiclass classification | Softmax | Categorical cross-entropy |
| Regression | Linear | MSE |
| 'Logistic' regression | Sigmoid | MSE / Binary cross-entropy |

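A minimal sketch of the multiclass row (the three-class setting is purely illustrative; nSNP is the number of SNPs as above):

```
from keras.models import Sequential
from keras.layers import Dense

nSNP = 200   # no. of SNPs (features)
nClass = 3   # hypothetical number of classes

model = Sequential()
model.add(Dense(64, input_dim=nSNP, activation='relu'))
# Last layer: one neuron per class with softmax activation
model.add(Dense(nClass, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# Targets must be one-hot encoded, e.g. with to_categorical() as shown above
```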
### Optimizers
One of the most popular numerical algorithms to optimize a loss is **Gradient Descent**. Three variants of GD can be mentioned: **batch gradient descent**, which computes the loss function gradient for the whole training data set; **stochastic gradient descent (SGD)**, which consists of randomly partitioning the whole data set into subsets called 'batches' and updating the gradient using only a single subset, with the next batch used for the next iteration; and, finally, **minibatch gradient descent**, which is a combination of the two previous methods and is based on splitting the training data set into small batches. The gradient is averaged over a small number of samples, which reduces noise and speeds up computation. Numerous optimizers exist and no clear rule exists on which one is best.

SGD can be outperformed by SGD variants such as:

- **Momentum** accelerates SGD by moving along the relevant direction. The momentum term grows when successive gradients point in the same direction and shrinks otherwise. The Keras SGD function has a momentum option, which defaults to 0.0.

- **Nesterov** momentum is also implemented in the Keras SGD function, defaulting to False. It is a predictor-corrector algorithm that generally outperforms plain momentum. It works in two steps: in the predictor stage, the trajectory is linearly extrapolated as in momentum; in the corrector stage, it is corrected, resulting in accelerated convergence.

These optimizers can be implemented in Keras as:

```
from keras import optimizers

sgd = optimizers.SGD(momentum=0.9, nesterov=True)
model.compile(loss='mean_squared_error', optimizer=sgd)
```

Keras also implements adaptive optimizers such as:
- **Adagrad** controls the learning rate according to the frequency of parameter updates, i.e., the learning rate drops when the frequency of updates increases. It is recommended when data are sparse.
- **Adadelta** is an extension of Adagrad which adapts learning rates based only on a restricted window (w) of past gradients.
- **RMSprop** (Root Mean Square Propagation) is also an adaptive learning rate algorithm. Basically, it uses an exponentially weighted average of past squared gradients, instead of the individual gradient of w, at the backpropagation step, adjusting the learning rate accordingly. It behaves well in Recurrent Neural Networks.
- **Adam** is an adaptive moment estimation method where a learning rate is maintained for each weight and separately adapted.
- **Adamax** is a variant of Adam based on the infinity norm.
- **Nadam** is a combination of the Nesterov and Adam algorithms.

See https://keras.io/optimizers/. A sketch of how to set the learning rate of one of these optimizers is shown below.
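For instance, to use Adam with an explicit learning rate (a minimal sketch; the value 0.001 is simply Keras' default, written out for clarity):

```
from keras import optimizers

# Adam with an explicit learning rate (lr); other arguments keep their defaults
adam = optimizers.Adam(lr=0.001)
model.compile(loss='mean_squared_error', optimizer=adam)
```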
### Protection against Overfitting
Keras allows implementing **early stopping** via the callback procedure. The user provides a monitored quantity, say validation loss, and training stops when it stops improving (https://keras.io/callbacks/#earlystopping):

```
from keras.callbacks import EarlyStopping

early_stopper = EarlyStopping(monitor='val_loss',
                              min_delta=0.1,
                              patience=2,
                              verbose=0,
                              mode='auto')

model.fit(X_train,
          y_train,
          epochs=100,
          verbose=1,
          validation_data=(X_test, y_test),
          callbacks=[early_stopper])
```

In Keras, the available **regularizers** are the L1 and L2 norm regularizers, which can also be combined in the so-called 'Elastic Net' procedure, i.e., a mixed L1 and L2 regularization. In Keras, regularizers are applied to either kernels (weights), bias or activity (neuron output) and are specified together with the rest of the layer properties, e.g.:

```
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import regularizers

model = Sequential()
model.add(Dense(64,
                input_dim=64,
                kernel_regularizer=regularizers.l2(0.01),
                activity_regularizer=regularizers.l1(0.01)))
```

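For the Elastic Net combination mentioned above, the two penalties can be mixed with ```regularizers.l1_l2```; a minimal sketch (the penalty values are illustrative):

```
from keras.layers import Dense
from keras import regularizers

# Mixed L1 + L2 ('Elastic Net') penalty on the layer weights
model.add(Dense(64,
                input_dim=64,
                kernel_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01)))
```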
In Keras, different **dropout** rates can be specified for each layer, after its definition, e.g.:

```
from keras.layers import Dropout

model.add(Dense(30, activation='relu'))
model.add(Dropout(0.2))
```

### Hyperparameter Optimization
DL is not a single method; it is a heterogeneous class of machine learning algorithms that depend on numerous hyperparameters, e.g., number of layers, neurons per layer, dropout rate, activation function and so on. DL optimization requires a general idea of which hyperparameters to optimize, together with a plausible range of values. Optimizing hyperparameter values is perhaps the most daunting task in using DL, which of course needs to be done without resorting to the validation data sets, and has been the topic of a multitude of specialized papers. While for certain tasks like image analyses there are some specialized pre-trained networks or general architectures, this is not the case for new problems such as genomic prediction.

In any realistic scenario, it is impossible to explore the whole space of hyperparameters, and sensible ranges should be chosen a priori. This step requires some basic understanding of what is going on and a general idea of which hyperparameters to optimize, together with a plausible range of values. For instance, it is probably unnecessary to go beyond 3-5 layers or over, say, 100 neurons per layer. Testing up to four activation functions should probably capture all expected patterns. Similarly, dropout, L1 and L2 regularization each do a similar job, so only one of these hyperparameters needs to be explored. As for the optimization algorithm, we have not found important differences among them for the case of genomic prediction. If you are using CNNs, additional hyperparameters can be tuned, mainly the number of filters and the kernel width. In our experience with human data ([Bellot et al 2018](https://www.genetics.org/content/210/3/809)), the optimum kernel width was very small (~ three SNPs), but this will likely depend on the extent of linkage disequilibrium between markers and on the genetic architecture of the phenotype.

Once an initial hyperparameter space has been specified, a grid search can be performed if the number of hyperparameters is not very large (say ≤ 4), although a random search is much more efficient ([Goodfellow et al. 2016](https://www.deeplearningbook.org/)); a minimal sketch of a random search is shown below. Finally, other sophisticated approaches can be envisaged, such as genetic algorithms. In [Bellot et al 2018](https://www.genetics.org/content/210/3/809), we modified the implementation by Jan Liphardt (https://github.com/jliphard/DeepEvolve). The modified script can be retrieved from https://github.com/paubellot/DeepEvolve and https://github.com/paubellot/DL-Biobank/tree/master/GA. Our recommendation is that the number of generations should be relatively large. If computing time is too long, the data can be split into smaller subsets. In any case, we do recommend some narrow grid / random search around the values suggested by the genetic algorithm.
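A minimal sketch of such a random search, using plain numpy and the Keras calls introduced above (X_train/y_train and a separate tuning split X_val/y_val are assumed to exist; the ranges and the number of sampled combinations are illustrative, not recommendations):

```
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout

def build_model(n_layers, n_neurons, dropout, nSNP):
    # MLP built from the sampled hyperparameters
    model = Sequential()
    model.add(Dense(n_neurons, input_dim=nSNP, activation='relu'))
    model.add(Dropout(dropout))
    for _ in range(n_layers - 1):
        model.add(Dense(n_neurons, activation='relu'))
        model.add(Dropout(dropout))
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

results = []
for _ in range(20):  # 20 random hyperparameter combinations
    n_layers  = np.random.randint(1, 4)            # 1-3 hidden layers
    n_neurons = np.random.choice([16, 32, 64])
    dropout   = np.random.choice([0.0, 0.1, 0.2])
    model = build_model(n_layers, n_neurons, dropout, X_train.shape[1])
    model.fit(X_train, y_train, epochs=50, verbose=0,
              validation_data=(X_val, y_val))
    val_mse = model.evaluate(X_val, y_val, verbose=0)
    results.append((val_mse, n_layers, n_neurons, dropout))

# Best (lowest tuning MSE) combination
print(sorted(results)[0])
```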
There are numerous tools to assist in this task, e.g., hyperas (https://github.com/maxpumperla/hyperas) and keras_auto (https://github.com/Tony607/Keras_auto). In the jupyter notebook we used talos (https://autonomio.github.io/docs_talos/). Talos allows grid, random and probabilistic hyperparameter search. Grid search is useful to systematically visualize the effect of a few predetermined hyperparameters and can be recommended for final tuning. For real world analyses, random or probabilistic searches should be preferred.

Finally, note that optimizing hyperparameters for all desired marker sets and phenotypes will be unfeasible. We recommend choosing a few hyperparameter combinations that are near-optimum across a range of phenotypes / marker sets and that span a diversity of architectures, e.g., with varying numbers of neuron layers.

The next table lists the main DL hyperparameters:

|Hyperparameter|Role | Issues |
| :-------: | :------: | :-----: |
|Optimizer |Algorithm to optimize the loss function. Most are based on SGD.| Optimization algorithms for training deep models include specializations to address different challenges. |
|Learning rate | Specifies the step size of each gradient update.| Convergence can be very slow if too low; the algorithm can overshoot and fail to converge if too high. |
|Batch size|Determines the number of samples in each SGD step. | Can slow convergence if too small.|
|Number of layers | Controls the flexibility to fit the data. |The larger the number, the higher the flexibility, but overfitting may increase.|
|Neurons per layer|Controls the flexibility to fit the data.|The larger the number, the higher the flexibility, but overfitting may increase and training becomes harder.|
|Convolutional kernel width (*)|A larger kernel allows learning more complex patterns.|A larger kernel also means more weights to estimate and may increase overfitting.|
|Activation|Makes it possible to learn non-linear, complex functional mappings between the inputs and the response variable.|Numerous options. No uniformly best function.|
|Weight regularization|Controls overfitting.|Decreasing the weight regularization allows the model to fit the training data better, at the risk of poorer prediction.|
|Dropout|Controls overfitting.|A higher dropout rate helps to reduce overfitting.|

(*) CNNs only.

### Usage
The full examples can be found in the PDL.ipynb file. Please make sure you have installed the right package versions in order to successfully run our examples. More information about the required packages can be found in the following link:

- [Requirements](https://github.com/miguelperezenciso/DLpipeline/blob/master/inst/md/requirements.md)

***
### Citations
[Bellot P, De Los Campos G, Pérez-Enciso M. 2018. Can Deep Learning Improve Genomic Prediction of Complex Human Traits? Genetics 210:809-819.](https://www.genetics.org/content/210/3/809)

[Pérez-Enciso M, Zingaretti LM. 2019. A Guide on Deep Learning for Complex Trait Genomic Prediction. Genes, 10, 553.](https://www.mdpi.com/2073-4425/10/7/553)

[Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, Whitaker VM, Pérez-Enciso M. 2020. Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species. Frontiers in Plant Science 11:25.](https://doi.org/10.3389/fpls.2020.00025)