# Mowgli: Multi Omics Wasserstein inteGrative anaLysIs
[](https://github.com/gjhuizing/Mowgli/actions/workflows/main.yml)
[](https://codecov.io/gh/cantinilab/Mowgli)
[](https://mowgli.readthedocs.io/en/latest/?badge=latest)
[](https://img.shields.io/pypi/v/mowgli?color=blue)
[](https://github.com/psf/black)
[](https://zenodo.org/badge/latestdoi/391909874)

Mowgli is a novel method for the integration of paired multi-omics data with any type and number of omics, combining integrative Nonnegative Matrix Factorization and Optimal Transport. [Read the paper!](https://www.nature.com/articles/s41467-023-43019-2)

## Install the package

Mowgli is implemented as a Python package seamlessly integrated within the scverse ecosystem, in particular Muon and Scanpy.

### via PyPI (recommended)

On all operating systems, the easiest way to install Mowgli is via PyPI. Installation typically takes about a minute and is continuously tested with Python 3.10 on an Ubuntu virtual machine.

```bash
pip install mowgli
```

### via GitHub (development version)

```bash
git clone git@github.com:cantinilab/Mowgli.git
pip install ./Mowgli/
```

### Test your installation (optional)

```bash
pytest .
```

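As a quick sanity check before running the test suite, the sketch below reports whether a package is visible to the current Python environment. It uses only the standard library. The helper name `installed_version` is hypothetical and is not part of Mowgli.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "mowgli" is the distribution name used in `pip install mowgli`.
    version = installed_version("mowgli")
    print(f"mowgli {version}" if version else "mowgli is not installed")
```

If this reports that the package is missing, the `pip install` step most likely ran in a different environment than the one you are testing from.
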
## Getting started

Mowgli takes a Muon object as input and populates its `obsm` and `uns` fields with the embeddings and dictionaries, respectively. Visit [mowgli.rtfd.io](https://mowgli.rtfd.io/) for more documentation and tutorials.

You may download a preprocessed 10X Multiome demo dataset [here](https://figshare.com/s/4c8e72cbb188d8e1cce8).

A GPU is not required for small datasets, but is strongly recommended above 1,000 cells. On CPU, the [cell lines demo](https://mowgli.readthedocs.io/en/latest/vignettes/Liu%20cell%20lines.html) (206 cells) should run in under 5 minutes and the [PBMC demo](https://mowgli.readthedocs.io/en/latest/vignettes/PBMC.html) (500 cells) should run in under 10 minutes (tested on an Ubuntu 20.04 machine with an 11th gen i7 processor).

```python
import mowgli
import mudata as md
import scanpy as sc

# Load data into a Muon object.
mdata = md.read_h5mu("my_data.h5mu")

# Initialize and train the model.
model = mowgli.models.MowgliModel(latent_dim=15)
model.train(mdata)

# Visualize the embedding with UMAP.
sc.pp.neighbors(mdata, use_rep="W_OT")
sc.tl.umap(mdata)
sc.pl.umap(mdata)
```

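Following the GPU recommendation above, the choice of device can be made from the number of cells. The sketch below is only an illustration: `pick_device` and its `gpu_threshold` parameter are hypothetical names, the 1,000-cell cutoff is taken from the guidance in this README, and probing for a GPU through PyTorch is an assumption about the backend.

```python
import importlib.util


def pick_device(n_cells: int, gpu_threshold: int = 1000) -> str:
    """Suggest "cuda" for large datasets when a GPU is usable, else "cpu"."""
    # Small datasets run in minutes on CPU, so no GPU probe is needed.
    if n_cells <= gpu_threshold:
        return "cpu"
    # Assumption: PyTorch is the backend; without it we cannot detect a GPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    # 206 and 500 are the cell counts of the two demos mentioned above.
    for n_cells in (206, 500, 20000):
        print(n_cells, pick_device(n_cells))
```

The returned string could then be passed on to training, for example as a `device` argument to `model.train`, assuming your installed version accepts one. Check the documentation for the exact signature.
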
## Publication

```bibtex
@article{huizing2023paired,
title={Paired single-cell multi-omics data integration with Mowgli},
author={Huizing, Geert-Jan and Deutschmann, Ina Maria and Peyr{\'e}, Gabriel and Cantini, Laura},
journal={Nature Communications},
volume={14},
number={1},
pages={7711},
year={2023},
publisher={Nature Publishing Group UK London}
}
```

If you're looking for the repository with code to reproduce the experiments in our preprint, [here it is!](https://github.com/cantinilab/mowgli_reproducibility)