<img src="docs/source/inmoose.png" width="600">

[](https://pypi.org/project/inmoose)
[](https://pepy.tech/project/inmoose)
[](https://coveralls.io/github/epigenelabs/inmoose)
[](https://inmoose.readthedocs.io/en/latest/?badge=latest)
[](LICENSE)

# InMoose

InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment.
It is a collection of tools for the analysis of omic data.

InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/).

# Installation

You can install InMoose directly with:

```
pip install inmoose
```

# Documentation

Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/).

# Citing

Depending on the features you use, you may cite one of the following papers:
- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5.
- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024)
  Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX

# Batch Effect Correction

InMoose provides features to correct technical biases, also called batch
effects, in transcriptomic data:
- for microarray data, InMoose supersedes
  [pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation
  of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most
  widely used tools for batch effect correction on microarray data.
- for RNASeq data, InMoose features a port to Python3 of
  [ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely
  used tools for batch effect correction on RNASeq data.

To use these functions, simply import them and call them with default
parameters:
```python
from inmoose.pycombat import pycombat_norm, pycombat_seq

microarray_corrected = pycombat_norm(microarray_data, microarray_batches)
rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches)
```

* `microarray_data`, `rnaseq_data`: the expression matrices, containing the
  gene expression values, with genes as rows and samples as columns.
* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the
  batch for each sample. The list of batches should contain as many elements as
  the number of samples in the expression matrix.

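As a minimal sketch of how such a batch list can be assembled when several datasets are merged column-wise, the helpers below derive one batch index per sample and check that the list length matches the number of samples. The names `make_batch_list` and `check_batch_list` are hypothetical and not part of InMoose; the sample counts are made up for illustration.

```python
from collections import Counter


def make_batch_list(samples_per_dataset):
    """Return one batch index per sample, with datasets numbered from 0.

    `samples_per_dataset` is a hypothetical input: the number of samples
    (columns) contributed by each dataset, in concatenation order.
    """
    return [
        batch_index
        for batch_index, n_samples in enumerate(samples_per_dataset)
        for _ in range(n_samples)
    ]


def check_batch_list(batches, n_samples):
    """Raise if the batch list does not hold exactly one label per sample."""
    if len(batches) != n_samples:
        raise ValueError(
            f"expected {n_samples} batch labels, got {len(batches)}"
        )
    return Counter(batches)  # number of samples in each batch


# Three datasets with 3, 2 and 4 samples, merged into a 9-column matrix.
batches = make_batch_list([3, 2, 4])
batch_sizes = check_batch_list(batches, n_samples=9)
print(batches)  # [0, 0, 0, 1, 1, 2, 2, 2, 2]
```

A list built this way can then be passed as the second argument of `pycombat_norm` or `pycombat_seq`.
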
# Cohort QC

InMoose provides classes `CohortMetric` and `QCReport` to help perform quality control (QC) on cohort datasets after batch effect correction.

`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets.

**Description**

The `CohortMetric` class performs a range of quality control analyses, including:
- Principal Component Analysis (PCA) to assess data variation.
- Comparison of sample distributions across different datasets or batches.
- Quantification of the effect of batch correction.
- Silhouette Score calculation to assess how well batches are separated.
- Entropy calculation to evaluate the mixing of samples from different batches.

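To make the entropy metric more concrete, here is a minimal sketch of the Shannon entropy of the batch labels found in a group of samples, for instance the nearest neighbours of one sample. This is an illustration of the general idea only, and not necessarily how `CohortMetric` computes it; the function name `batch_mixing_entropy` and the labels are hypothetical.

```python
import math
from collections import Counter


def batch_mixing_entropy(batch_labels):
    """Shannon entropy (in bits) of the batch labels of a group of samples."""
    counts = Counter(batch_labels)
    total = len(batch_labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in counts.values()
    )


# A group drawn evenly from two batches is perfectly mixed.
mixed = batch_mixing_entropy(["A", "B", "A", "B"])
# A group made of a single batch is not mixed at all.
unmixed = batch_mixing_entropy(["A", "A", "A", "A"])
print(mixed, unmixed)
```

Under this definition, higher values mean that samples from different batches are better interleaved, with a maximum of log2 of the number of batches.
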
**Usage Example**
```python
from inmoose.cohort_qc.cohort_metric import CohortMetric

cohort_quality_control = CohortMetric(
    clinical_df=clinical_data,
    batch_column="batch",
    # expression data after batch effect correction
    data_expression_df=gene_expression_after_correction,
    # expression data before batch effect correction
    data_expression_df_before=gene_expression_before_correction,
    covariates=["biopsy_site", "sample_type"]
)
```

`QCReport`: This class takes a `CohortMetric` argument, and generates an HTML report summarizing the QC results.

**Description**

The `QCReport` class builds on `CohortMetric` and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more.

**Usage Example**
```python
from inmoose.cohort_qc.qc_report import QCReport

# Generate and save the QC report
qc_report = QCReport(cohort_quality_control)
qc_report.save_html_report_local(output_path='reports')
```

# Differential Expression Analysis

InMoose provides features to analyse differentially expressed genes in bulk
transcriptomic data:
- for microarray data, InMoose features a port of
  [limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool
  for differential expression analysis on microarray data.
- for RNASeq data, InMoose features ports to Python3 of
  [edgeR](https://doi.org/10.12688/f1000research.8987.2) and
  [DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely
  used tools for differential expression analysis on RNASeq data.

See the dedicated sections of the
[documentation](https://inmoose.readthedocs.io/en/latest/) for usage details.

# Consensus clustering

InMoose provides features to compute consensus clustering, a resampling-based algorithm compatible with any clustering algorithm whose class is instantiated with an `n_clusters` parameter and has a `fit_predict` method, which is invoked on the data.
Consensus clustering helps determine the best number of clusters to use, and outputs confidence metrics and plots.

To use these functions, import the `consensusClustering` class and a clustering algorithm class:
```python
from inmoose.consensus_clustering.consensus_clustering import consensusClustering
from sklearn.cluster import AgglomerativeClustering

# Pass the clustering class itself, not an instance
CC = consensusClustering(AgglomerativeClustering)
# numpy_ndarray: the data to cluster, as a NumPy array
CC.compute_consensus_clustering(numpy_ndarray)
```

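The resampling idea behind consensus clustering can be sketched in plain Python: repeatedly subsample the items, cluster each subsample, and record how often each pair of items lands in the same cluster among the runs where both were drawn. The code below is an illustration under simplifying assumptions, not InMoose's implementation. `LargestGapClustering` and `consensus_matrix` are hypothetical names, and the toy clusterer only handles 1-D values; it exists to show the `n_clusters` and `fit_predict` interface described above.

```python
import random
from itertools import combinations


class LargestGapClustering:
    """Toy 1-D clusterer: cuts the sorted values at the largest gaps."""

    def __init__(self, n_clusters):
        self.n_clusters = n_clusters

    def fit_predict(self, values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        # Rank positions sorted by the gap to the previous sorted value.
        gaps = sorted(
            range(1, len(order)),
            key=lambda r: values[order[r]] - values[order[r - 1]],
            reverse=True,
        )
        cuts = set(gaps[: self.n_clusters - 1])
        labels = [0] * len(values)
        current = 0
        for rank, index in enumerate(order):
            if rank in cuts:
                current += 1  # start a new cluster after a large gap
            labels[index] = current
        return labels


def consensus_matrix(values, cluster_class, n_clusters,
                     n_resamples=50, fraction=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    subsample_size = max(n_clusters, int(fraction * n))
    for _ in range(n_resamples):
        subset = rng.sample(range(n), subsample_size)
        labels = cluster_class(n_clusters=n_clusters).fit_predict(
            [values[i] for i in subset]
        )
        for (i, label_i), (j, label_j) in combinations(zip(subset, labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if label_i == label_j:
                together[i][j] += 1
                together[j][i] += 1
    # Pairs that were never drawn together get a consensus of 0.0.
    return [
        [
            1.0 if i == j
            else (together[i][j] / sampled[i][j] if sampled[i][j] else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Two well-separated groups of three values each.
values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
matrix = consensus_matrix(values, LargestGapClustering, n_clusters=2)
print(matrix[0][1], matrix[0][3])
```

Consensus values close to 1 or 0 indicate stable assignments, while intermediate values suggest that the chosen number of clusters does not fit the data well.
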
# How to contribute

Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines.