--- a/README.md +++ b/README.md @@ -1,135 +1,134 @@ -<img src="docs/source/inmoose.png" width="600"> - -[](https://pypi.org/project/inmoose) -[](https://pepy.tech/project/inmoose) -[](https://coveralls.io/github/epigenelabs/inmoose) -[](https://inmoose.readthedocs.io/en/latest/?badge=latest) -[](LICENSE) - -# InMoose - -InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment. -It is a collection of tools for the analysis of omic data. - -InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/). - -# Installation - -You can install InMoose directly with: - -``` -pip install inmoose -``` - -# Documentation - -Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/). - -# Citing - -Depending on the features you use, you may cite one of the following papers: -- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5. -- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024) - Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX - -# Batch Effect Correction - -InMoose provides features to correct technical biases, also called batch -effects, in transcriptomic data: -- for microarray data, InMoose supersedes - [pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation - of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most - widely used tool for batch effect correction on microarray data. -- for RNASeq data, InMoose features a port to Python3 of - [ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely - used tool for batch effect correction on RNASeq data. - -To use these functions, simply import them and call them with default -parameters: -```python -from inmoose.pycombat import pycombat_norm, pycombat_seq - -microarray_corrected = pycombat_norm(microarray_data, microarray_batches) -rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches) -``` - -* `microarray_data`, `rnaseq_data`: the expression matrices, containing the - information about the gene expression (rows) for each sample (columns). -* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the - batch for each sample. The list of batches should contain as many elements as - the number of samples in the expression matrix. - - -# Cohort QC -InMoose provides classes `CohortMetric` and `QCReport` to help to perform quality control (QC) on cohort datasets after batch effect correction. - -`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets. - -**Description** -The `CohortMetric` class performs a range of quality control analyses, including: -- Principal Component Analysis (PCA) to assess data variation. -- Comparison of sample distributions across different datasets or batches. -- Quantification of the effect of batch correction. -- Silhouette Score calculation to assess how well batches are separated. -- Entropy calculation to evaluate the mixing of samples from different batches. - -**Usage Example** -```python -from inmoose.cohort_qc.cohort_metric import CohortMetric - -cohort_quality_control = CohortMetric( - clinical_df=clinical_data, - batch_column="batch", - data_expression_df=gene_expression_after_correction, - data_expression_df_before=gene_expression_before_correction, - covariates=["biopsy_site", "sample_type"] -) -``` - -`QCReport`: This class takes a CohortMetric argument, and generates an HTML report summarizing the QC results. - -**Description** -The `QCReport` class extends `CohortMetric` and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more. - -**Usage Example** -```python -from inmoose.cohort_qc.qc_report import QCReport - -# Generate and save the QC report -qc_report = QCReport(cohort_quality_control) -qc_report.save_html_report_local(output_path='reports') -``` - -# Differential Expression Analysis - -InMoose provides features to analyse diffentially expressed genes in bulk -transcriptomic data: -- for microarray data, InMoose features a port of - [limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool - for differential expression analysis on microarray data. -- for RNASeq data, InMoose features a ports to Python3 of - [edgeR](https://doi.org/10.12688/f1000research.8987.2) and - [DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely - used tools for differential expression analysis on RNASeq data. - -See the dedicated sections of the -[documentation](https://inmoose.readthedocs.io/en/latest/). - -# Consensus clustering -InMoose provides features to compute consensus clustering, a resampling based algorithm compatible with any clustering algorithms which class implementation is instantiated with parameter `n_clusters`, and possess a `fit_predict` method, which is invoked on data. -Consensus clustering helps determining the best number of clusters to use and output confidence metrics and plots. - - -To use these functions, import the consensusClustering class and a clustering algorithm class: -```python -from inmoose.consensus_clustering.consensus_clustering import consensusClustering -from sklearn.cluster import AgglomerativeClustering - -CC = consensusClustering(AgglomerativeClustering) -CC.compute_consensus_clustering(numpy_ndarray) -``` - -# How to contribute - -Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines. - + +[](https://pypi.org/project/inmoose) +[](https://pepy.tech/project/inmoose) +[](https://coveralls.io/github/epigenelabs/inmoose) +[](https://inmoose.readthedocs.io/en/latest/?badge=latest) +[](LICENSE) + +# InMoose + +InMoose is the **In**tegrated **M**ulti **O**mic **O**pen **S**ource **E**nvironment. +It is a collection of tools for the analysis of omic data. + +InMoose is developed and maintained by <img src="docs/source/epigenelogo.png" width="20"> [Epigene Labs](https://www.epigenelabs.com/). + +# Installation + +You can install InMoose directly with: + +``` +pip install inmoose +``` + +# Documentation + +Documentation is hosted on [readthedocs.org](https://inmoose.readthedocs.io/en/latest/). + +# Citing + +Depending on the features you use, you may cite one of the following papers: +- Behdenna A, Colange M, Haziza J, Gema A, Appé G, Azencot CA and Nordor A. (2023) pyComBat, a Python tool for batch effects correction in high-throughput molecular data using empirical Bayes methods. BMC Bioinformatics 7;24(1):459. https://doi.org/10.1186/s12859-023-05578-5. +- Colange M, Appé G, Meunier L, Weill S, Nordor A, Behdenna A. (2024) + Differential Expression Analysis with InMoose, the Integrated Multi-Omic Open-Source Environment in Python. BioRxiv. https://doi.org/XXX + +# Batch Effect Correction + +InMoose provides features to correct technical biases, also called batch +effects, in transcriptomic data: +- for microarray data, InMoose supersedes + [pyCombat](https://github.com/epigenelabs/pycombat/), a Python3 implementation + of [ComBat](https://doi.org/10.1093/biostatistics/kxj037), one of the most + widely used tool for batch effect correction on microarray data. +- for RNASeq data, InMoose features a port to Python3 of + [ComBat-Seq](https://doi.org/10.1093/nargab/lqaa078), one of the most widely + used tool for batch effect correction on RNASeq data. + +To use these functions, simply import them and call them with default +parameters: +```python +from inmoose.pycombat import pycombat_norm, pycombat_seq + +microarray_corrected = pycombat_norm(microarray_data, microarray_batches) +rnaseq_corrected = pycombat_seq(rnaseq_data, rnaseq_batches) +``` + +* `microarray_data`, `rnaseq_data`: the expression matrices, containing the + information about the gene expression (rows) for each sample (columns). +* `microarray_batches`, `rnaseq_batches`: list of batch indices, describing the + batch for each sample. The list of batches should contain as many elements as + the number of samples in the expression matrix. + + +# Cohort QC +InMoose provides classes `CohortMetric` and `QCReport` to help to perform quality control (QC) on cohort datasets after batch effect correction. + +`CohortMetric`: This class handles the analysis and provides methods for performing quality control on cohort datasets. + +**Description** +The `CohortMetric` class performs a range of quality control analyses, including: +- Principal Component Analysis (PCA) to assess data variation. +- Comparison of sample distributions across different datasets or batches. +- Quantification of the effect of batch correction. +- Silhouette Score calculation to assess how well batches are separated. +- Entropy calculation to evaluate the mixing of samples from different batches. + +**Usage Example** +```python +from inmoose.cohort_qc.cohort_metric import CohortMetric + +cohort_quality_control = CohortMetric( + clinical_df=clinical_data, + batch_column="batch", + data_expression_df=gene_expression_after_correction, + data_expression_df_before=gene_expression_before_correction, + covariates=["biopsy_site", "sample_type"] +) +``` + +`QCReport`: This class takes a CohortMetric argument, and generates an HTML report summarizing the QC results. + +**Description** +The `QCReport` class extends `CohortMetric` and generates a comprehensive HTML report based on the quality control analysis. It includes visualizations and summaries of PCA, batch correction, Silhouette Scores, entropy, and more. + +**Usage Example** +```python +from inmoose.cohort_qc.qc_report import QCReport + +# Generate and save the QC report +qc_report = QCReport(cohort_quality_control) +qc_report.save_html_report_local(output_path='reports') +``` + +# Differential Expression Analysis + +InMoose provides features to analyse diffentially expressed genes in bulk +transcriptomic data: +- for microarray data, InMoose features a port of + [limma](https://doi.org/10.1093/nar/gkv007), the *de facto* standard tool + for differential expression analysis on microarray data. +- for RNASeq data, InMoose features a ports to Python3 of + [edgeR](https://doi.org/10.12688/f1000research.8987.2) and + [DESeq2](https://doi.org/10.1186/s13059-014-0550-8), two of the most widely + used tools for differential expression analysis on RNASeq data. + +See the dedicated sections of the +[documentation](https://inmoose.readthedocs.io/en/latest/). + +# Consensus clustering +InMoose provides features to compute consensus clustering, a resampling based algorithm compatible with any clustering algorithms which class implementation is instantiated with parameter `n_clusters`, and possess a `fit_predict` method, which is invoked on data. +Consensus clustering helps determining the best number of clusters to use and output confidence metrics and plots. + + +To use these functions, import the consensusClustering class and a clustering algorithm class: +```python +from inmoose.consensus_clustering.consensus_clustering import consensusClustering +from sklearn.cluster import AgglomerativeClustering + +CC = consensusClustering(AgglomerativeClustering) +CC.compute_consensus_clustering(numpy_ndarray) +``` + +# How to contribute + +Please refer to [CONTRIBUTING.md](https://github.com/epigenelabs/inmoose/blob/master/CONTRIBUTING.md) to learn more about the contribution guidelines. +