|
a/README.md |
|
b/README.md |
1 |
<img src="docs/source/inmoose.png" width="600"> |
1 |
|
2 |
|
|
|
3 |
[](https://pypi.org/project/inmoose) |
2 |
[](https://pypi.org/project/inmoose)
|
4 |
[](https://pepy.tech/project/inmoose) |
3 |
[](https://pepy.tech/project/inmoose)
|
5 |
[](https://coveralls.io/github/epigenelabs/inmoose) |
4 |
[](https://coveralls.io/github/epigenelabs/inmoose)
|
6 |
[](https://inmoose.readthedocs.io/en/latest/?badge=latest) |
5 |
[](https://inmoose.readthedocs.io/en/latest/?badge=latest)
|
7 |
[](LICENSE) |
6 |
[](LICENSE) |
8 |
|
7 |
|
9 |
# InMoose |
8 |
# InMoose |
10 |
|
9 |
|
11 |
InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment. |
10 |
InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment.
|
12 |
It is a collection of tools for the analysis of omic data. |
11 |
It is a collection of tools for the analysis of omic data. |
13 |
|
12 |
|
14 |
InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/). |
13 |
InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/). |
15 |
|
14 |
|
16 |
# Installation |
15 |
# Installation |
17 |
|
16 |
|
18 |
You can install InMoose directly with: |
17 |
You can install InMoose directly with: |
19 |
|
18 |
|
20 |
``` |
19 |
```
|
21 |
pip install inmoose |
20 |
pip install inmoose
|
22 |
``` |
21 |
``` |
23 |
|
22 |
|
24 |
# Documentation |
23 |
# Documentation |
25 |
|
24 |
|
26 |
Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/). |
25 |
Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/). |
27 |
|
26 |
|
28 |
# Citing |
27 |
# Citing |
29 |
|
28 |
|
30 |
Depending on the features you use, you may cite one of the following papers: |
29 |
Depending on the features you use, you may cite one of the following papers:
|
31 |
- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5. |
30 |
- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5.
|
32 |
- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024) |
31 |
- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024)
|
33 |
Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX |
32 |
Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX |
34 |
|
33 |
|
35 |
# Batch Effect Correction |
34 |
# Batch Effect Correction |
36 |
|
35 |
|
37 |
InMoose provides features to correct technical biases, also called batch |
36 |
InMoose provides features to correct technical biases, also called batch
|
38 |
effects, in transcriptomic data: |
37 |
effects, in transcriptomic data:
|
39 |
- for microarray data, InMoose supersedes |
38 |
- for microarray data, InMoose supersedes
|
40 |
[pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation |
39 |
[pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation
|
41 |
of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most |
40 |
of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most
|
42 |
widely used tools for batch effect correction on microarray data. |
41 |
widely used tools for batch effect correction on microarray data.
|
43 |
- for RNASeq data, InMoose features a port to Python3 of |
42 |
- for RNASeq data, InMoose features a port to Python3 of
|
44 |
[ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely |
43 |
[ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely
|
45 |
used tools for batch effect correction on RNASeq data. |
44 |
used tools for batch effect correction on RNASeq data. |
46 |
|
45 |
|
47 |
To use these functions, simply import them and call them with default |
46 |
To use these functions, simply import them and call them with default
|
48 |
parameters: |
47 |
parameters:
|
49 |
```python |
48 |
```python
|
50 |
from inmoose.pycombat import pycombat_norm, pycombat_seq |
49 |
from inmoose.pycombat import pycombat_norm, pycombat_seq |
51 |
|
50 |
|
52 |
microarray_corrected = pycombat_norm(microarray_data, microarray_batches) |
51 |
microarray_corrected = pycombat_norm(microarray_data, microarray_batches)
|
53 |
rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches) |
52 |
rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches)
|
54 |
``` |
53 |
``` |
55 |
|
54 |
|
56 |
* `microarray_data`, `rnaseq_data`: the expression matrices, containing the |
55 |
* `microarray_data`, `rnaseq_data`: the expression matrices, containing the
|
57 |
information about the gene expression (rows) for each sample (columns). |
56 |
information about the gene expression (rows) for each sample (columns).
|
58 |
* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the |
57 |
* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the
|
59 |
batch for each sample. The list of batches should contain as many elements as |
58 |
batch for each sample. The list of batches should contain as many elements as
|
60 |
the number of samples in the expression matrix. |
59 |
the number of samples in the expression matrix. |
61 |
|
60 |
|
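The shape requirement above (one batch label per sample column) can be checked before calling the correction functions. The following is a minimal sketch in plain Python; `check_batches` and the toy data are hypothetical names introduced for illustration and are not part of InMoose.

```python
def check_batches(expression_rows, batches):
    """Check that there is exactly one batch label per sample (column).

    Hypothetical helper, not part of InMoose. `expression_rows` is a list of
    gene rows, each row holding one value per sample.
    """
    n_samples = len(expression_rows[0]) if expression_rows else 0
    if any(len(row) != n_samples for row in expression_rows):
        raise ValueError("all gene rows must have the same number of samples")
    if len(batches) != n_samples:
        raise ValueError(
            f"expected {n_samples} batch labels, got {len(batches)}"
        )
    return n_samples


# Toy data, made up for illustration: 3 genes (rows) x 4 samples (columns).
toy_expression = [
    [5.1, 4.8, 7.2, 7.0],
    [2.3, 2.1, 3.9, 4.1],
    [9.0, 8.7, 9.4, 9.9],
]
toy_batches = [0, 0, 1, 1]  # samples 1-2 in batch 0, samples 3-4 in batch 1

n_checked = check_batches(toy_expression, toy_batches)
```

A mismatch between the two lengths raises a `ValueError` early, rather than surfacing later inside the correction step.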
62 |
|
61 |
|
63 |
# Cohort QC |
62 |
# Cohort QC
|
64 |
InMoose provides the classes `CohortMetric` and `QCReport` to help perform quality control (QC) on cohort datasets after batch effect correction. |
63 |
InMoose provides the classes `CohortMetric` and `QCReport` to help perform quality control (QC) on cohort datasets after batch effect correction. |
65 |
|
64 |
|
66 |
`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets. |
65 |
`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets. |
67 |
|
66 |
|
68 |
**Description** |
67 |
**Description**
|
69 |
The `CohortMetric` class performs a range of quality control analyses, including: |
68 |
The `CohortMetric` class performs a range of quality control analyses, including:
|
70 |
- Principal Component Analysis (PCA) to assess data variation. |
69 |
- Principal Component Analysis (PCA) to assess data variation.
|
71 |
- Comparison of sample distributions across different datasets or batches. |
70 |
- Comparison of sample distributions across different datasets or batches.
|
72 |
- Quantification of the effect of batch correction. |
71 |
- Quantification of the effect of batch correction.
|
73 |
- Silhouette Score calculation to assess how well batches are separated. |
72 |
- Silhouette Score calculation to assess how well batches are separated.
|
74 |
- Entropy calculation to evaluate the mixing of samples from different batches. |
73 |
- Entropy calculation to evaluate the mixing of samples from different batches. |
75 |
|
74 |
|
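To give an intuition for the entropy metric listed above, here is a minimal sketch of Shannon entropy computed over the batch labels of a group of samples. It is an illustration under stated assumptions, not InMoose's implementation: the function name `batch_entropy` and the choice of which samples form a group are hypothetical.

```python
import math
from collections import Counter


def batch_entropy(batch_labels):
    """Shannon entropy (in bits) of the batch labels in a group of samples.

    Hypothetical helper for illustration only. Higher values mean the group
    mixes samples from several batches more evenly.
    """
    counts = Counter(batch_labels)
    total = sum(counts.values())
    probabilities = [count / total for count in counts.values()]
    return sum(-p * math.log2(p) for p in probabilities)


# Two batches represented equally: the labels are evenly mixed.
well_mixed = batch_entropy(["A", "B", "A", "B"])
# A single batch only: no mixing at all.
unmixed = batch_entropy(["A", "A", "A", "A"])
```

With two equally represented batches the entropy is 1 bit, and with a single batch it is 0, so a group of neighbouring samples that scores near the maximum suggests the batches are well mixed.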
76 |
**Usage Example** |
75 |
**Usage Example**
|
77 |
```python |
76 |
```python
|
78 |
from inmoose.cohort_qc.cohort_metric import CohortMetric |
77 |
from inmoose.cohort_qc.cohort_metric import CohortMetric |
79 |
|
78 |
|
80 |
cohort_quality_control = CohortMetric( |
79 |
cohort_quality_control = CohortMetric(
|
81 |
clinical_df=clinical_data, |
80 |
clinical_df=clinical_data,
|
82 |
batch_column="batch", |
81 |
batch_column="batch",
|
83 |
data_expression_df=gene_expression_after_correction, |
82 |
data_expression_df=gene_expression_after_correction,
|
84 |
data_expression_df_before=gene_expression_before_correction, |
83 |
data_expression_df_before=gene_expression_before_correction,
|
85 |
covariates=["biopsy_site", "sample_type"] |
84 |
covariates=["biopsy_site", "sample_type"]
|
86 |
) |
85 |
)
|
87 |
``` |
86 |
``` |
88 |
|
87 |
|
89 |
`QCReport`: This class takes a CohortMetric argument, and generates an HTML report summarizing the QC results. |
88 |
`QCReport`: This class takes a CohortMetric argument, and generates an HTML report summarizing the QC results. |
90 |
|
89 |
|
91 |
**Description** |
90 |
**Description**
|
92 |
The `QCReport` class extends `CohortMetric` and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more. |
91 |
The `QCReport` class extends `CohortMetric` and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more. |
93 |
|
92 |
|
94 |
**Usage Example** |
93 |
**Usage Example**
|
95 |
```python |
94 |
```python
|
96 |
from inmoose.cohort_qc.qc_report import QCReport |
95 |
from inmoose.cohort_qc.qc_report import QCReport |
97 |
|
96 |
|
98 |
# Generate and save the QC report |
97 |
# Generate and save the QC report
|
99 |
qc_report = QCReport(cohort_quality_control) |
98 |
qc_report = QCReport(cohort_quality_control)
|
100 |
qc_report.save_html_report_local(output_path='reports') |
99 |
qc_report.save_html_report_local(output_path='reports')
|
101 |
``` |
100 |
``` |
102 |
|
101 |
|
103 |
# Differential Expression Analysis |
102 |
# Differential Expression Analysis |
104 |
|
103 |
|
105 |
InMoose provides features to analyse differentially expressed genes in bulk |
104 |
InMoose provides features to analyse differentially expressed genes in bulk
|
106 |
transcriptomic data: |
105 |
transcriptomic data:
|
107 |
- for microarray data, InMoose features a port of |
106 |
- for microarray data, InMoose features a port of
|
108 |
[limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool |
107 |
[limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool
|
109 |
for differential expression analysis on microarray data. |
108 |
for differential expression analysis on microarray data.
|
110 |
- for RNASeq data, InMoose features ports to Python3 of |
109 |
- for RNASeq data, InMoose features ports to Python3 of
|
111 |
[edgeR](https://doi.org/10.12688/f1000research.8987.2) and |
110 |
[edgeR](https://doi.org/10.12688/f1000research.8987.2) and
|
112 |
[DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely |
111 |
[DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely
|
113 |
used tools for differential expression analysis on RNASeq data. |
112 |
used tools for differential expression analysis on RNASeq data. |
114 |
|
113 |
|
115 |
See the dedicated sections of the |
114 |
See the dedicated sections of the
|
116 |
[documentation](https://inmoose.readthedocs.io/en/latest/). |
115 |
[documentation](https://inmoose.readthedocs.io/en/latest/). |
117 |
|
116 |
|
118 |
# Consensus clustering |
117 |
# Consensus clustering
|
119 |
InMoose provides features to compute consensus clustering, a resampling-based algorithm compatible with any clustering algorithm whose class is instantiated with an `n_clusters` parameter and provides a `fit_predict` method, which is invoked on the data. |
118 |
InMoose provides features to compute consensus clustering, a resampling-based algorithm compatible with any clustering algorithm whose class is instantiated with an `n_clusters` parameter and provides a `fit_predict` method, which is invoked on the data.
|
120 |
Consensus clustering helps determine the best number of clusters to use, and outputs confidence metrics and plots. |
119 |
Consensus clustering helps determine the best number of clusters to use, and outputs confidence metrics and plots. |
121 |
|
120 |
|
122 |
|
121 |
|
123 |
To use these functions, import the consensusClustering class and a clustering algorithm class: |
122 |
To use these functions, import the consensusClustering class and a clustering algorithm class:
|
124 |
```python |
123 |
```python
|
125 |
from inmoose.consensus_clustering.consensus_clustering import consensusClustering |
124 |
from inmoose.consensus_clustering.consensus_clustering import consensusClustering
|
126 |
from sklearn.cluster import AgglomerativeClustering |
125 |
from sklearn.cluster import AgglomerativeClustering |
127 |
|
126 |
|
128 |
CC = consensusClustering(AgglomerativeClustering) |
127 |
CC = consensusClustering(AgglomerativeClustering)
|
129 |
CC.compute_consensus_clustering(numpy_ndarray) |
128 |
CC.compute_consensus_clustering(numpy_ndarray)
|
130 |
``` |
129 |
``` |
131 |
|
130 |
|
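The interface requirement described above (a class instantiated with `n_clusters` and exposing `fit_predict`) can be illustrated with a toy clusterer. This is a hedged sketch: `ThresholdClustering` is a hypothetical class written for illustration, it is called directly here, and it has not been tested with `consensusClustering`.

```python
class ThresholdClustering:
    """Hypothetical toy clusterer exposing `n_clusters` and `fit_predict`."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters

    def fit_predict(self, X):
        # Rank samples by their row mean, then cut the ranking into
        # n_clusters groups of (nearly) equal size.
        means = [sum(row) / len(row) for row in X]
        order = sorted(range(len(means)), key=means.__getitem__)
        labels = [0] * len(means)
        for rank, sample_index in enumerate(order):
            labels[sample_index] = rank * self.n_clusters // len(means)
        return labels


# Toy data, made up for illustration: 4 samples with 2 features each.
toy_samples = [[0.1, 0.2], [5.0, 5.1], [0.0, 0.3], [4.9, 5.2]]
toy_labels = ThresholdClustering(n_clusters=2).fit_predict(toy_samples)
```

Here the two low-valued samples end up in one cluster and the two high-valued samples in the other. scikit-learn estimators such as `AgglomerativeClustering` follow the same constructor and method convention.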
132 |
# How to contribute |
131 |
# How to contribute |
133 |
|
132 |
|
134 |
Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines. |
133 |
Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines. |
135 |
|
134 |
|