a/README.md b/README.md
<img src="docs/source/inmoose.png" width="600">

[![pypi version](https://img.shields.io/pypi/v/inmoose)](https://pypi.org/project/inmoose)
[![pypiDownloads](https://static.pepy.tech/badge/inmoose)](https://pepy.tech/project/inmoose)
[![coverage](https://img.shields.io/coverallsCoverage/github/epigenelabs/inmoose.svg)](https://coveralls.io/github/epigenelabs/inmoose)
[![Documentation Status](https://readthedocs.org/projects/inmoose/badge/?version=latest)](https://inmoose.readthedocs.io/en/latest/?badge=latest)
[![license](https://img.shields.io/pypi/l/inmoose)](LICENSE)

# InMoose

InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment.
It is a collection of tools for the analysis of omic data.

InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/).

# Installation

You can install InMoose directly with:

```
pip install inmoose
```

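After installing, one way to confirm which version is available is to query the package metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative assumption, not part of InMoose.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if InMoose is installed, None otherwise
print(installed_version("inmoose"))
```
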
# Documentation

Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/).

# Citing

Depending on the features you use, you may cite one of the following papers:
- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5.
- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024)
  Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX

# Batch Effect Correction

InMoose provides features to correct technical biases, also called batch
effects, in transcriptomic data:
- for microarray data, InMoose supersedes
  [pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation
  of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most
  widely used tools for batch effect correction on microarray data.
- for RNASeq data, InMoose features a port to Python3 of
  [ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely
  used tools for batch effect correction on RNASeq data.

To use these functions, simply import them and call them with default
parameters:
```python
from inmoose.pycombat import pycombat_norm, pycombat_seq

microarray_corrected = pycombat_norm(microarray_data, microarray_batches)
rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches)
```

* `microarray_data`, `rnaseq_data`: the expression matrices, containing the
  information about the gene expression (rows) for each sample (columns).
* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the
  batch for each sample. The list of batches should contain as many elements as
  the number of samples in the expression matrix.

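
The batch list is often derived from the dataset each sample comes from. The sketch below, in plain Python, turns hypothetical per-sample dataset labels into integer batch indices and checks that there is one index per sample. The sample names, the dataset labels and the helper `batch_indices` are illustrative assumptions, not part of InMoose.

```python
def batch_indices(dataset_labels):
    """Map per-sample dataset labels to integer batch indices (0, 1, 2, ...).

    Indices are assigned in order of first appearance of each label.
    """
    index_of = {}
    batches = []
    for label in dataset_labels:
        if label not in index_of:
            index_of[label] = len(index_of)
        batches.append(index_of[label])
    return batches


# Hypothetical samples (columns of the expression matrix) and their datasets
sample_names = ["s1", "s2", "s3", "s4", "s5"]
dataset_labels = ["GSE_A", "GSE_A", "GSE_B", "GSE_B", "GSE_C"]

batches = batch_indices(dataset_labels)
print(batches)  # [0, 0, 1, 1, 2]

# One batch index per sample, as required for the batch list
assert len(batches) == len(sample_names)
```

The resulting list can then be passed as the second argument of `pycombat_norm` or `pycombat_seq`, provided the columns of the expression matrix are in the same order as the labels.
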
# Cohort QC
InMoose provides classes `CohortMetric` and `QCReport` to help perform quality control (QC) on cohort datasets after batch effect correction.

`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets.

**Description**
The `CohortMetric` class performs a range of quality control analyses, including:
- Principal Component Analysis (PCA) to assess data variation.
- Comparison of sample distributions across different datasets or batches.
- Quantification of the effect of batch correction.
- Silhouette Score calculation to assess how well batches are separated.
- Entropy calculation to evaluate the mixing of samples from different batches.

**Usage Example**
```python
from inmoose.cohort_qc.cohort_metric import CohortMetric

cohort_quality_control = CohortMetric(
    clinical_df=clinical_data,
    batch_column="batch",
    data_expression_df=gene_expression_after_correction,
    data_expression_df_before=gene_expression_before_correction,
    covariates=["biopsy_site", "sample_type"]
)
```

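To give some intuition for the entropy metric listed above: when samples from different batches are well mixed, the batch labels found in a group of neighbouring samples have high Shannon entropy, whereas a group drawn from a single batch has entropy zero. The following is a minimal standalone sketch of that idea, not InMoose's implementation; the function name `batch_entropy` and the toy labels are assumptions made for illustration.

```python
import math
from collections import Counter


def batch_entropy(batch_labels):
    """Shannon entropy (in bits) of the batch labels in a group of samples."""
    counts = Counter(batch_labels)
    total = len(batch_labels)
    # Sum of p * log2(1 / p) over the observed batches, with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A group of samples coming from a single batch is not mixed at all
print(batch_entropy(["A", "A", "A", "A"]))  # 0.0

# Two batches in equal proportion: maximal mixing for two batches
print(batch_entropy(["A", "A", "B", "B"]))  # 1.0
```
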
`QCReport`: This class takes a `CohortMetric` argument and generates an HTML report summarizing the QC results.

**Description**
The `QCReport` class builds on the results of a `CohortMetric` instance and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more.

**Usage Example**
```python
from inmoose.cohort_qc.qc_report import QCReport

# Generate and save the QC report
qc_report = QCReport(cohort_quality_control)
qc_report.save_html_report_local(output_path='reports')
```

# Differential Expression Analysis

InMoose provides features to analyse differentially expressed genes in bulk
transcriptomic data:
- for microarray data, InMoose features a port of
  [limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool
  for differential expression analysis on microarray data.
- for RNASeq data, InMoose features ports to Python3 of
  [edgeR](https://doi.org/10.12688/f1000research.8987.2) and
  [DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely
  used tools for differential expression analysis on RNASeq data.

See the dedicated sections of the
[documentation](https://inmoose.readthedocs.io/en/latest/).

# Consensus clustering
InMoose provides features to compute consensus clustering, a resampling-based algorithm compatible with any clustering algorithm whose class is instantiated with an `n_clusters` parameter and has a `fit_predict` method, which is invoked on the data.
Consensus clustering helps determine the best number of clusters to use, and outputs confidence metrics and plots.

To use these functions, import the `consensusClustering` class and a clustering algorithm class:
```python
from inmoose.consensus_clustering.consensus_clustering import consensusClustering
from sklearn.cluster import AgglomerativeClustering

CC = consensusClustering(AgglomerativeClustering)
CC.compute_consensus_clustering(numpy_ndarray)
```

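The interface described above (a class instantiated with `n_clusters` that exposes a `fit_predict` method) is the one followed by scikit-learn estimators such as `AgglomerativeClustering`. As an illustration of that contract only, here is a hypothetical toy class written in plain Python. The class name `MeanRankClustering` and its grouping rule are made up for this sketch, and whether `consensusClustering` accepts it unchanged (for instance, labels returned as a list rather than an array) is an assumption to check against the documentation.

```python
class MeanRankClustering:
    """Toy clustering: rank samples by their mean value and cut the ranking
    into `n_clusters` groups of (almost) equal size."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters

    def fit_predict(self, data):
        """Return one integer cluster label per row (sample) of `data`."""
        means = [sum(row) / len(row) for row in data]
        # Sample positions sorted from lowest to highest mean
        order = sorted(range(len(means)), key=means.__getitem__)
        labels = [0] * len(means)
        for rank, sample in enumerate(order):
            labels[sample] = rank * self.n_clusters // len(means)
        return labels


# Four hypothetical samples (rows) with two features each
toy_data = [[1.0, 2.0], [9.0, 10.0], [0.0, 1.0], [8.0, 11.0]]
print(MeanRankClustering(n_clusters=2).fit_predict(toy_data))  # [0, 1, 0, 1]
```
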
# How to contribute

Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines.