# AI for Genomics 2020

## Course Description:

Sponsored by the city of Montréal, with support from Mila – Quebec Artificial Intelligence Institute and IVADO (Institut de valorisation des données), the AI in Genomics program is a 12-week training program that gives participants hands-on experience with machine learning. It is designed to help participants with expertise in genomics build a foundational knowledge of advanced machine learning methodologies, so that they can better understand where and how these techniques can be applied to genomics data.

Dates: January 20, 2020 to April 13, 2020

Q&A platform (Piazza): [https://piazza.com/class/k4rmacqhp136ae](https://piazza.com/class/k4rmacqhp136ae)

## Instructors:

* [Joseph Paul Cohen](https://josephpcohen.com/) (Program Scientific Advisor)

* Tariq Daouda

* Paul Bertin

* Julie Hussin

* Ahmad Pesaranghader

* Sydney Swaine-Simon

## Lecture 1: Onboarding and Introduction to neural networks

**(Tariq Daouda, January 24th @ 15 h - 18 h)**

Participants should gain a basic understanding of neural networks and deep learning, as well as enough practical knowledge to start building neural networks.

* Datasets

* Classification

* KNN

* Regression

* Evaluation: Accuracy (train, test, validation)

* Basics of Backprop (momentum?)

* Fully connected layers

* Non-linearities (ReLU, tanh, sigmoid)

* Conv (1D, 2D)

* PyTorch introduction (Colab)

* Practical: PyTorch feed-forward: Fully connected & Conv

Slides (pdf): [link](slides/Week%201%20Slides.pdf)

Slides: [link](https://drive.google.com/file/d/1KCKd55MoAuhIX1_9Om1Ibt1iuXPj5WJB/view)
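
To give a flavour of the classification, KNN, and train/test accuracy topics listed for this lecture, here is a minimal sketch in plain Python. It is not taken from the course materials: the synthetic two-cluster dataset, the helper names (`knn_predict`, `accuracy`, `make_blobs`), and the 80/20 split are all assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Squared Euclidean distance from the query to every training point.
    dists = [
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    ]
    dists.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in dists[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(train_x, train_y, xs, ys, k=3):
    """Fraction of points in (xs, ys) that KNN labels correctly."""
    correct = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return correct / len(ys)


def make_blobs(n_per_class, rng):
    """Synthetic toy data: two well-separated 2D Gaussian clusters."""
    xs, ys = [], []
    for label, center in enumerate([(0.0, 0.0), (8.0, 8.0)]):
        for _ in range(n_per_class):
            xs.append((rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)))
            ys.append(label)
    return xs, ys


rng = random.Random(0)
xs, ys = make_blobs(50, rng)

# Shuffle indices, then hold out 20% of the points as a test set.
idx = list(range(len(xs)))
rng.shuffle(idx)
split = int(0.8 * len(idx))
train_x = [xs[i] for i in idx[:split]]
train_y = [ys[i] for i in idx[:split]]
test_x = [xs[i] for i in idx[split:]]
test_y = [ys[i] for i in idx[split:]]

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(train_x, train_y, test_x, test_y)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Train accuracy here is optimistic by construction, because each training point counts itself among its own neighbours. This is one reason to report accuracy on held-out data.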

## Lecture 2: Representation learning and backprop

**(Joseph Paul Cohen, January 31st @ 15 h - 18 h)**

**Location: John Molson School of Business - S2.445 - Classroom**

An overview of deep learning, representation learning methods in detail (Sammon's map, t-SNE), the backprop algorithm in detail, and regularization and its impact on optimization.

* (30min) Overview: what is deep learning? ([Slides](https://docs.google.com/presentation/d/18iS4cwfkwhnslE2CzW_yOS0pjujIVY17ozzf2YIx5Dk/edit), [Slides (pdf)](slides/Week%202%20slides%20Part%203.pdf))

    * Define supervised and self-supervised learning from a probabilistic perspective

    * How to approach problems (use sklearn)

    * Examples of go-to methods: logistic regression, decision tree, etc. (use sklearn)

* (45min) Backprop in more detail ([Slides](https://docs.google.com/presentation/d/1eWu8TvanOLRQehlzzgqT1Bl--vabX16iXKRLZCEqIV0/edit), [Slides (pdf)](slides/Week%202%20Slides.pdf))

    * Work through an example of manually performing the algorithm

    * Backpropagation (visualizing the chain rule)

    * Intuition for applying gradient updates for arbitrary functions

* (1hr) Representation learning ([Slides](https://docs.google.com/presentation/d/1Z-7FmOmCXgEZdzojNXj8AZwfeb6SstGi0BJX5T9qzjY/edit), [Slides (pdf)](slides/Week%202%20slides%20Part%202.pdf))

    * Non-linear dim reduction

    * word2vec

    * Sammon's map (tutorial code)

    * t-SNE

    * Regularization
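
The "manually performing the algorithm" item above can be sketched as follows. This is an illustrative example, not the one used in the slides: the tiny network (one tanh hidden unit, squared-error loss), the parameter values, and the function names are assumptions. Each line of `backward` applies one link of the chain rule, and the result is checked against finite differences.

```python
import math


def forward(params, x, y):
    """Forward pass of a one-hidden-unit network with squared-error loss."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1            # pre-activation
    h = math.tanh(z)           # hidden activation
    y_hat = w2 * h + b2        # prediction
    loss = 0.5 * (y_hat - y) ** 2
    return loss, (z, h, y_hat)


def backward(params, x, y):
    """Gradients of the loss w.r.t. (w1, b1, w2, b2), derived by hand."""
    w1, b1, w2, b2 = params
    _, (z, h, y_hat) = forward(params, x, y)
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h               # dL/dw2 = dL/dy_hat * dy_hat/dw2
    d_b2 = d_yhat                   # dL/db2
    d_h = d_yhat * w2               # dL/dh
    d_z = d_h * (1.0 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x                  # dL/dw1
    d_b1 = d_z                      # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_grad(params, x, y, eps=1e-6):
    """Central finite differences, used only to check the manual gradients."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((forward(plus, x, y)[0] - forward(minus, x, y)[0]) / (2 * eps))
    return grads


params = [0.5, -0.1, 1.5, 0.2]   # arbitrary starting values for w1, b1, w2, b2
x, y = 1.0, 2.0                  # a single made-up training example

analytic = backward(params, x, y)
numeric = numerical_grad(params, x, y)

# One plain gradient-descent update with an arbitrary learning rate.
lr = 0.1
new_params = [p - lr * g for p, g in zip(params, analytic)]
loss_before = forward(params, x, y)[0]
loss_after = forward(new_params, x, y)[0]
print("analytic:", analytic)
print("numeric: ", numeric)
print("loss before/after one step:", loss_before, loss_after)
```

Comparing hand-derived gradients with a numerical estimate is a common sanity check before trusting a backward pass.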

## Lecture 3: Challenges of Machine Learning for Transcriptomics

**(Paul Bertin, February 7th @ 15 h - 18 h)**

Challenges facing machine learning and deep learning techniques when applied to transcriptomics: biases, high dimensionality, and interpretability of models. We will dive into the limitations of a machine-learning-assisted drug effect prediction pipeline and analyse each step to identify the challenges of ML for transcriptomics. ([Slides](https://drive.google.com/file/d/1ou6MMGbWMzRsYS5eyp1eIBhgumC7_nwJ/view?usp=sharing))

* From real world to input data (45min)

    * Dataset biases

    * Acquisition biases

    * Preprocessing

* The supervised learning pipeline (45min)

    * The curse of dimensionality

    * Making the right assumptions: inspiration from Computer Vision

    * Which assumptions for transcriptomics?

    * Gene interaction graphs?

    * Parameter sharing among genes?

    * Similar response to perturbation in latent space?

* Model interpretability (30min)

    * Feature importance for deep models

    * Simpson’s paradox

* Practical: (30min)

    * PyTorch deep learning pipeline

    * Saliency Maps

    * Deep Dream
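
Simpson's paradox, listed under model interpretability above, is easiest to see with numbers. The sketch below uses hypothetical counts (not real data, and not from the lecture) for a drug tested in two cell lines. The names `counts`, `stratum_rate`, and `pooled_rate` are assumptions for illustration. The treated arm responds better within each cell line, yet looks worse once the cell lines are pooled, because the arms are unevenly distributed across strata.

```python
# Hypothetical (responders, total) counts per cell line and arm; not real data.
counts = {
    "cell_line_A": {"treated": (8, 10), "control": (60, 90)},
    "cell_line_B": {"treated": (20, 90), "control": (1, 10)},
}


def stratum_rate(stratum, arm):
    """Response rate for one arm within one cell line."""
    responders, total = counts[stratum][arm]
    return responders / total


def pooled_rate(arm):
    """Response rate for one arm after pooling all cell lines together."""
    responders = sum(counts[s][arm][0] for s in counts)
    total = sum(counts[s][arm][1] for s in counts)
    return responders / total


for stratum in counts:
    print(
        stratum,
        f"treated={stratum_rate(stratum, 'treated'):.3f}",
        f"control={stratum_rate(stratum, 'control'):.3f}",
    )
print(
    "pooled",
    f"treated={pooled_rate('treated'):.3f}",
    f"control={pooled_rate('control'):.3f}",
)
```

In this toy setup the cell line acts as a confounder: most treated samples come from the cell line with the lower baseline response, which drags the pooled treated rate down.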

## Lecture 4: Deep Learning Models in Genomics

**(Ahmad Pesaranghader and Julie Hussin, February 14th @ 15 h - 18 h)**

In the first part of this lecture, we introduce the different DL architectures used in population and functional genomics. In the second part, we introduce generative models and explore how they can be beneficial in genomics, mainly for augmenting training data.

**1. Deep learning in population genetics and multi-omics (1h)**

* Introduction to population and functional genomics

* Simulations in population genetics

* Convolutional Neural Networks (CNNs) for population genetics inference

* Motif-based approaches in functional genomics

* DeepSEA ([link](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768299/)) and state-of-the-art models in functional genomics
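
As a rough illustration of how the motif-based and convolutional items above relate, the sketch below one-hot encodes a DNA sequence and slides a single filter along it, which is the operation a 1D convolutional layer performs. It is a simplified, assumption-laden example rather than any published model: in a trained CNN the filter weights are learned, whereas here the filter is set by hand to the one-hot encoding of `TATA`, and the sequence and function names are made up.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one row per base."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]


def conv1d_scan(encoded, kernel):
    """Slide the kernel along the sequence (no padding, stride 1).

    Each output value is the sum of elementwise products between the kernel
    and the window of the sequence it currently covers.
    """
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][c] * kernel[i][c]
            for i in range(width)
            for c in range(len(BASES))
        )
        scores.append(score)
    return scores


# Hand-set filter (an assumption): responds most strongly to the motif "TATA".
motif = "TATA"
kernel = one_hot(motif)

seq = "GGCTATAAGC"  # made-up sequence containing the motif once
scores = conv1d_scan(one_hot(seq), kernel)
best = max(range(len(scores)), key=scores.__getitem__)
print("scores:", scores)
print("best match starts at position", best, "->", seq[best:best + len(motif)])
```

With a one-hot filter, each score is simply the number of positions where the window matches the motif, so a perfect match scores the motif length.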

**2. Advanced deep learning models for genomics (1h30)**

* Variational AutoEncoders (VAEs)

* Generative Adversarial Networks (GANs)

* Limitations of vanilla GANs and vanilla VAEs

* GANs and VAEs in Genomics

* Discussion of interesting applications in the field, mainly with respect to different omics data types (current state of the art and guidelines for future work)

**3. Tutorial (30min)**

* Quick implementation of a vanilla VAE/GAN in PyTorch (Google Colab)

* GANs from the paper: Generating and designing DNA with deep generative models ([https://arxiv.org/abs/1712.06148](https://arxiv.org/abs/1712.06148))

## Lecture 5: Ethics

**(Sydney Swaine-Simon, February 21st @ 15 h - 18 h)**

In this lecture, we will discuss the ethics of working with genomics data and developing machine learning algorithms.

## Some papers and additional resources:

* [https://github.com/gokceneraslan/awesome-deepbio](https://github.com/gokceneraslan/awesome-deepbio)

* [https://github.com/hussius/deeplearning-biology](https://github.com/hussius/deeplearning-biology)

* Libbrecht MW et al. Machine learning applications in genetics and genomics, Nat. Rev. Genet. 2015

* Jiang P et al. Big data mining yields novel insights on cancer, Nat. Genet. 2015

* Deep learning: new computational modelling techniques for genomics (2019): [https://sci-hub.tw/https://www.nature.com/articles/s41576-019-0122-6](https://sci-hub.tw/https://www.nature.com/articles/s41576-019-0122-6)

* [https://www.nature.com/articles/s41576-019-0122-6](https://www.nature.com/articles/s41576-019-0122-6)

* A primer on deep learning in genomics: [https://sci-hub.tw/https://www.nature.com/articles/s41588-018-0295-5](https://sci-hub.tw/https://www.nature.com/articles/s41588-018-0295-5)

* [https://www.nature.com/articles/s41588-018-0295-5](https://www.nature.com/articles/s41588-018-0295-5)

Functional genomics papers:

* DeepSEA: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768299/](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4768299/)

* DeFine: [https://www.ncbi.nlm.nih.gov/pubmed/29617928](https://www.ncbi.nlm.nih.gov/pubmed/29617928)

* DanQ: [https://www.ncbi.nlm.nih.gov/pubmed/27084946](https://www.ncbi.nlm.nih.gov/pubmed/27084946)

* DeeperBind: [https://arxiv.org/abs/1611.05777](https://arxiv.org/abs/1611.05777)

* SPEID: [https://link.springer.com/article/10.1007/s40484-019-0154-0](https://link.springer.com/article/10.1007/s40484-019-0154-0)

Population genetics papers:

* [https://www.ncbi.nlm.nih.gov/pubmed/29331490](https://www.ncbi.nlm.nih.gov/pubmed/29331490)

* [https://www.ncbi.nlm.nih.gov/pubmed/30517664](https://www.ncbi.nlm.nih.gov/pubmed/30517664)

* [https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004845](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004845)

* [https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2927-x](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2927-x)

Torrente, Aurora, et al. "Identification of Cancer Related Genes Using a Comprehensive Map of Human Gene Expression." *PLOS ONE*, edited by Paolo Provero, vol. 11, no. 6, Public Library of Science, June 2016, p. e0157484, doi:10.1371/journal.pone.0157484.

Ching, Travers, et al. "Opportunities And Obstacles For Deep Learning In Biology And Medicine." *Journal of The Royal Society Interface*, Cold Spring Harbor Laboratory, Jan. 2018, doi:10.1101/142760.

[https://canvas.stanford.edu/courses/51037](https://canvas.stanford.edu/courses/51037)