Diff of /README.md [000000] .. [a43cea]

Switch to unified view

a b/README.md
1
# MODAS
2
## Introduction
3
MODAS (Multi-Omics Data Association Study toolkit) is an efficient software for high-dimensional omics data association analysis, featuring five main characteristics.
4
5
+ MODAS employs a novel whole-genome association study (GWAS) strategy to handle high-dimensional omics data. Specifically, MODAS generates pseudo-genotype files to reduce the dimensionality of the original SNP-based genotype data. It then performs block-based GWAS to identify significantly associated genomic regions (SAGR), and finally conducts SNP-based association analysis on the SAGR to obtain QTLs for high-dimensional omics data.
6
7
+ MODAS introduces Mendelian randomization algorithms to infer genetic regulatory relationships between molecular QTLs and phenotypic traits. This method helps to establish biological hypotheses and uncover potential genetic mechanisms.
8
9
+ MODAS uses contrastive PCA algorithm to uncover differential components in population omics data between different conditions. This functionality is particularly useful for identifying agronomically important genes involved in condition comparisons, such as genes related to crop stress tolerance.
10
11
+ MODAS employs an image matching algorithm to integrate multi-omics molecular QTLs, aiding in the elucidation of the genetic mechanisms underlying complex traits.
12
13
+ MODAS provides an HTML-based web interface for visualizing and querying GWAS results from omics data, facilitating further gene function exploration.
14
## Installation
15
### Install via pip
16
```
17
#Depends: R(>=3.6), python(>=3.8)
18
pip install MODAS
19
```
20
### Installation using conda
21
```
22
conda create -n modas python=3.8 -y
23
conda install -y -c conda-forge r-rcppeigen r=3.6 rpy2
24
$CONDA_PREFIX/bin/pip install MODAS
25
```
26
## New features in MODAS2
27
### Identifying stress-responsive Molecular QTL through contrastive PCA and Two-Way ANOVA
28
MODAS2 innovatively employs the contrastive PCA algorithm to separate the stress-response effects of natural variation on molecular traits from the genetic effects of natural variation on molecular traits. The separated stress response effects are then used as stress response indices for identifying stress-responsive molecular QTL. After identifying QTL using the stress response indices, the significance of the stress response for these QTL is further assessed through two-way ANOVA, resulting in the final stress-responsive molecular QTL. MODAS2 uses the subcommand `contrast` to identify stress-responsive molecular QTL. The command line is as follows:
29
```
30
MODAS contrast -stress_phe example_data/test_salt.phe.csv -control_phe example_data/test_control.phe.csv -gwas -g example_data/example_geno.contrast -genome_cluster example_data/example_contrast.genome_cluster.csv -p 10 -o example
31
```
32
`contrast` subcommand generates five files including `example.scpca_pc.phe.csv`, `example.scpca_pc.beta_test.csv`, `example.scpca_pc.normalized_phe.csv`, `example.region_gwas_qtl_res.anova.csv`, `example.region_gwas_qtl_res.anova.sig.csv` and `example.region_gwas_bad_qtl_res.csv`, which contain the stress response indices of molecular traits, statistical test results of the stress response indices, normalized stress response indices of molecular traits, stress response indices QTL results including two-way ANOVA P values, stress-responsive molecular QTL results and unreliable molecular QTL results, respectively.
33
### Multi-Trait QTL Colocalization Analysis Based on Image Matching Algorithms
34
QTL colocalization analysis is an effective method for integrating functional information across different traits. However, existing methods rarely perform multi-trait QTL colocalization analysis. MODAS2 employs an image matching algorithm to score the colocalization degree between pairs of QTLs, and then uses clustering algorithms to quickly achieve multi-trait QTL colocalization analysis. Multi-trait QTL colocalization analysis can be performed using the `coloc` subcommand of MODAS2. The command line is as follows:
35
```
36
MODAS coloc -qtl example_data/test_coloc.qtl.csv -g example_data/example_geno.contrast -gwas_dir example_data/gwas_coloc_test/ -p 6 -o example
37
```
38
`coloc` subcommand generates three files including `example.coloc_res.csv`, `example.coloc_pairwise.csv` and `example.dis_res.csv`, which contain the results of the co-localized QTL clusters, the pairwise QTL co-localization results, and the similarity results between pairwise QTLs, respectively.
39
40
__Note__: Sample data for MODAS2 can be downloaded via DOI [https://zenodo.org/doi/10.5281/zenodo.11951520](https://zenodo.org/doi/10.5281/zenodo.11951520).
41
## MODAS analytical pipeline
42
### Downloading example data
43
`MODAS_data` containing sample data for MODAS and omics data used in the article uploaded by Git extension Git Large File Storage (LFS), first download Git LFS from [https://git-lfs.github.com/](https://git-lfs.github.com/),  and place the `git-lfs` binary on your system’s executable `$PATH` or equivalent, then set up Git LFS for your user account by running:
44
```
45
git lfs install
46
```
47
next download `MODAS_data` by running:
48
```
49
git clone https://github.com/liusy-jz/MODAS_data.git
50
```
51
When the download is complete, first check the integrity of the downloaded data, `MODAS_data` contains five folders, namely `agronomic_traits`, `genotype`, `metabolome`, `transcriptome` and `example_data`, also contains a gene annotaion file for maize. The example folder contains sample data for MODAS, while other folders contain the omics data used in the article.<br/><br/>
52
Then, enter the `MODAS_data` directory,
53
```
54
cd MODAS_data
55
```
56
### Generate pseudo-genotype files
57
```
58
MODAS genoidx -g example_data/example_geno -genome_cluster -o example_geno
59
```
60
Pseudo-genotype files generated by `genoidx` subcommand will be saved as `example_geno.genome_cluster.csv`.
61
### Prescreen candidate genomic regions for omics data
62
The `prescreen` subcommand uses genome-wide genotype files to calculate the kinship matrix, first extract genotype files by:
63
```
64
tar -xvf genotype/chr_HAMP_genotype.tar.gz
65
```
66
Then, the pseudo-genotype file `example_geno.genome_cluster.csv` generated by `genoidx` and the `example_phe.csv` file under the `example_data` folder are used for prescreen analysis,
67
```
68
MODAS prescreen -g ./chr_HAMP -genome_cluster example_geno.genome_cluster.csv -phe example_data/example.phe.csv -o example
69
```
70
`prescreen` subcommand generates two files including `example.sig_omics_phe.csv` containing phenotype data and `example.phe_sig_qtl.csv` containing candidate genomic regions of phenotype.
71
### Perform regional association analysis to identify QTLs
72
The `prescreen` subcommand outputs are used for regional association analysis,
73
```
74
MODAS regiongwas -g ./chr_HAMP -phe example.sig_omics_phe.csv -phe_sig_qtl example.phe_sig_qtl.csv -o example
75
```
76
`regiongwas` subcommand generates two QTL files including `example.region_gwas_qtl_res.csv` containing reliable QTL results and `example.region_gwas_bad_qtl_res.csv` containing unreliable QTL results.
77
### Perform Mendelian randomization analysis
78
```
79
MODAS mr -g ./chr_HAMP -exposure ./example_data/example.exp.csv -outcome agronomic_traits/blup_traits_final.new.csv -qtl example_data/example_qtl_res.csv -mlm -o example
80
```
81
The results of Mendelian randomization analysis are saved as `example.MR.csv`.
82
### MR-based network analysis
83
MR-based network analysis is carried out by the parameter `-net` of `mr` subcommand. It uses transcriptome data for subnetwork modules analysis,
84
```
85
MODAS mr -g ./chr_HAMP -exposure ./example_data/network_example.exp.csv -outcome ./example_data/network_example.exp.csv -qtl example_data/network_example_qtl.csv -mlm -net -o network_example
86
```
87
Network analysis generated four files, including `network_example.MR.csv` containing gene pairs with MR effect, `network_example.edgelist` containing gene pairs with weight, `network_example.cluster_one.result.csv` containing all identified subnetwork modules, `network_example.sig.cluster_one.result.csv` containing significant subnetwork modules.
88
### co-associated gene analysis
89
Co-associated genes analysis is not a modas function. It is implemented by script `co-associated.py`. The analysis command line is as follows:
90
```
91
python3 example_data/co-associated.py example_data/co_associated.test.pvalue.csv co-associated_test
92
```
93
Then, a file containing co-associated gene labels and a heatmap showing relationship between co-associated genes are saved as `co-associated_test.cluster.csv` and `co-associated_test.cluster.heatmap.pdf`.
94
## Document
95
detail in [https://modas-bio.github.io/](https://modas-bio.github.io/)