
a/README.md b/README.md
# Decoding gene regulation in the mouse embryo using single-cell multi-omics

This repository contains the scripts to reproduce the results of the manuscript [Decoding gene regulation in the mouse embryo using single-cell multi-omics](https://www.biorxiv.org/content/10.1101/2022.06.15.496239v1).

Abstract
--------
Following gastrulation, the three primary germ layers develop into the major organs in a process known as organogenesis. Single-cell RNA sequencing has enabled the profiling of the gene expression dynamics of these cell fate decisions, yet a comprehensive map of the interplay between transcription factors and cis-regulatory elements is lacking, as are the underlying gene regulatory networks. Here we generate a multi-omics atlas of mouse early organogenesis by simultaneously profiling gene expression and chromatin accessibility from tens of thousands of single cells. We develop a computational method to leverage the multi-modal readouts to predict transcription factor binding events in cis-regulatory elements, which we then use to infer gene regulatory networks that underpin lineage commitment events. Finally we show that these models can be used to generate in silico predictions of the effect of transcription factor perturbations. We validate this experimentally by showing that Brachyury is essential for the differentiation of neuromesodermal progenitors to somitic mesoderm fate by priming cis-regulatory elements.

<p align="center"> 
<img src="https://easymed.ai/models/AlyssaS/mouse_organogenesis-m-om/git/ci/main/tree/images/overview_github.png?raw=true" width="900" height="400"/>
</p>


Content
-------
* `/acc/`: analysis of chromatin accessibility data
* `/rna/`: analysis of RNA expression data
* `/accrna/`: simultaneous analysis of RNA expression and chromatin accessibility data

Snakemake pipeline
-------
We provide snakemake pipelines that can be used to reproduce many of the results.
* `/rna/snakemake`: snakemake pipeline for RNA expression
* `/atac/ArchR/snakemake`: snakemake pipeline for chromatin accessibility using ArchR
* `/rna_atac/snakemake`: snakemake pipeline to integrate RNA expression and chromatin accessibility results (MOFA, in silico ChIP-seq, etc.)

Please note that the snakemake pipeline is rather complex and still needs to be simplified and polished. It is currently useful for getting an idea of the workflow, but bear in mind that it won't work straight away.
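
Given that caveat, a dry run is a low-risk way to inspect what one of the pipelines would do before anything is executed. The sketch below is an illustration under assumptions, not the authors' documented workflow: it assumes that each pipeline directory contains a file named `Snakefile` and that `snakemake` is available on your `PATH`. The helper name `dry_run_pipeline` is hypothetical.

```shell
# Hypothetical helper: preview a pipeline without running any jobs.
# Assumption: the pipeline directory holds a file called "Snakefile".
dry_run_pipeline() {
    local dir="$1"
    if [ ! -f "$dir/Snakefile" ]; then
        echo "No Snakefile found in $dir" >&2
        return 1
    fi
    # -n (--dry-run) lists the jobs that would run; -p prints their shell commands.
    (cd "$dir" && snakemake -n -p)
}

# Example (run from the repository root):
# dry_run_pipeline rna/snakemake
```

Because the pipelines are not expected to run out of the box, the dry-run output is mainly useful for spotting missing input files and hard-coded paths that need adapting.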

IGV Genome browser session
-------
We provide a precomputed IGV Genome Browser session that can be used to interactively explore the ATAC-seq profiles, as shown in the screenshot below:

<p align="center"> 
<img src="images/igv_screenshot_github.png" width="650" height="350"/>
</p>

It can be downloaded by running the following command:
```
wget ftp://ftpusr92:5FqIACU9@ftp1.babraham.ac.uk/igv_session_celltype.tar.gz
```

Then extract the archive and load the file `igv_session.xml` in IGV using `File -> Open Session`.
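
The download is a gzipped tarball, so it has to be unpacked before IGV can open the session. A minimal sketch follows, assuming the archive contains `igv_session.xml` somewhere inside it (the archive layout is not documented here). The helper name `extract_igv_session` and the default output directory are hypothetical.

```shell
# Hypothetical helper: unpack a session archive and print the path of any
# igv_session.xml found inside it (prints nothing if the file is absent).
# Assumption: the archive contains a file named igv_session.xml.
extract_igv_session() {
    local archive="$1"
    local outdir="${2:-igv_session}"
    mkdir -p "$outdir" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$outdir" || return 1
    find "$outdir" -name 'igv_session.xml' -print
}

# Example:
# extract_igv_session igv_session_celltype.tar.gz
```

The printed path is the file to select in the `File -> Open Session` dialog.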

<!-- The following [videotutorial](XXX) shows how to download and load the IGV session -->

R Shiny app
-------
The R Shiny app for interactive data analysis is available [here](https://www.bioinformatics.babraham.ac.uk/shiny/shiny_multiome_organogenesis/).

<!-- Pre-recorded talk
-------
This pre-recorded talk by Ricard Argelaguet presents an overview of the study. -->

Directories
-------
* Mapping to the reference atlas: `/rna/mapping`
* MOFA dimensionality reduction: `/rna_atac/mofa`
* Analysis of gene markers: `/rna_atac/rna_vs_acc/pseudobulk/gene_markers_rna_vs_acc`
* in silico ChIP-seq: `/rna_atac/virtual_chipseq_library`
* Metacell inference: `/rna/metacells/run`
* Catalogue of TF activities per cell type (Figure 3): `/rna_atac/rna_vs_chromvar_chip/pseudobulk/per_celltype`
* Gene regulatory network of NMP differentiation (Figure 4): `/rna_atac/gene_regulatory_networks/metacells/trajectories`

Data
----
<!-- The raw data is accessible at GEO ([XXXX](XXXX)).  -->
The data can be downloaded from the following FTP server:
```
Hostname    ftp1.babraham.ac.uk
Username    ftpusr92
Password    5FqIACU9
FTP URL     ftp://ftpusr92:5FqIACU9@ftp1.babraham.ac.uk
```

Directory structure:

- `sample_metadata.txt.gz`: cell metadata file
- `results`: results folder
    - `rna`: results based on RNA expression alone
    - `atac`: results based on chromatin accessibility alone
    - `rna_atac`: results based on both RNA expression and chromatin accessibility
- `data`: data folder
    - `original`: CellRanger output files
    - `processed`: processed data objects
        - `rna`: Seurat, anndata and SingleCellExperiment objects.
        - `atac/archR`: ArchR objects
- `igv_session_celltype.tar.gz`: IGV session of celltype-specific ATAC profiles
- `igv_session_brachyury_ko.tar.gz`: IGV session of celltype-specific ATAC profiles for the Brachyury KO study

To download a specific file:
```
wget ftp://ftpusr92:5FqIACU9@ftp1.babraham.ac.uk/data/processed/rna/SingleCellExperiment.rds
```

To download everything (~60 GB):
```
wget -r ftp://ftpusr92:5FqIACU9@ftp1.babraham.ac.uk/
```
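
For large or interrupted transfers, it may help to resume partial downloads and to fetch only the subdirectory you need. The sketch below wraps the commands above with standard `wget` options: `-c` continues a partial file, `-r` recurses, `-np` stays below the starting directory, `-nH` omits the host-name directory, and `-P` sets the local target directory. The function names and the `FTP_BASE` variable are hypothetical and not part of the repository.

```shell
# Connection string copied from the FTP details above.
FTP_BASE="ftp://ftpusr92:5FqIACU9@ftp1.babraham.ac.uk"

# Build the full URL for a path relative to the server root.
ftp_url() {
    printf '%s/%s\n' "$FTP_BASE" "${1#/}"
}

# Download a single file, resuming if a partial copy already exists.
fetch_file() {
    wget -c "$(ftp_url "$1")"
}

# Mirror one subdirectory into a local target directory (default: current).
fetch_dir() {
    wget -c -r -np -nH -P "${2:-.}" "$(ftp_url "${1%/}/")"
}

# Examples:
# fetch_file data/processed/rna/SingleCellExperiment.rds
# fetch_dir results/rna downloads
```

Fetching a single subdirectory such as `results/rna` avoids pulling the full ~60 GB when only one modality is needed.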

If your download from the FTP server is slow, we also provide a temporary download via Dropbox [here](https://www.dropbox.com/sh/y4drtqi82vwl8vf/AAAsLcrye8jUTm1XPv7VNYhFa?dl=0).


Twitter thread
--------
We all know this is the most important day on the road towards a scientific publication:

[![Twitter URL](https://img.shields.io/twitter/url/https/twitter.com/bukotsunikki.svg?style=social)](https://twitter.com/RArgelaguet/status/1537146799772815366)

Contact
-------
We have created a Slack group to discuss results, questions, collaborations, etc. related to the study. Feel free to drop by [using this link](https://join.slack.com/t/mouseembryo10-waq1273/shared_invite/zt-1dxn064kk-garRxOLAhLOUFNZBwqzfqQ). Alternatively, you can reach me by email at rargelaguet@altoslabs.com.