<h1>
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core-deepmodeloptim_logo_dark.png">
    <img alt="nf-core/deepmodeloptim" src="docs/images/nf-core-deepmodeloptim_logo_light.png">
  </picture>
</h1>

[](https://github.com/nf-core/deepmodeloptim/actions/workflows/ci.yml)
[](https://github.com/nf-core/deepmodeloptim/actions/workflows/linting.yml)[](https://nf-co.re/deepmodeloptim/results)[](https://doi.org/10.5281/zenodo.XXXXXXX)
[](https://www.nf-test.com)

[](https://www.nextflow.io/)
[](https://docs.conda.io/en/latest/)
[](https://www.docker.com/)
[](https://sylabs.io/docs/)
[](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-core/deepmodeloptim)

[](https://nfcore.slack.com/channels/deepmodeloptim)[](https://twitter.com/nf_core)[](https://mstdn.science/@nf_core)[](https://www.youtube.com/c/nf-core)

## 📌 **Quick intro**: check out this 👉🏻 [video](https://www.youtube.com/watch?v=dC5p_tXQpEs&list=PLPZ8WHdZGxmVKQga4KE15YVt95i-QXVvE&index=25)!

## Introduction

**nf-core/deepmodeloptim** is an end-to-end bioinformatics pipeline designed to facilitate the testing and development of deep learning models for genomics.

Deep learning model development in the natural sciences is an empirical and costly process. Despite the existence of generic tools for hyperparameter tuning and model training, the connection between these procedures and the impact of the data is often overlooked, or at least not easily automated. In practice, researchers must define a pre-processing pipeline, design an architecture, find the best parameters for that architecture, and iterate over this process, often manually.

Leveraging the power of Nextflow (polyglotism, container integration, cloud scalability), this pipeline helps users to 1) automate model testing, 2) gain useful insights into the learning behaviour of the model, and hence 3) accelerate development.

## Pipeline summary

It takes as input:

- A dataset
- A configuration file describing the data pre-processing steps to be performed
- A user-defined PyTorch model
- A configuration file describing the range of parameters for the PyTorch model

It then transforms the data according to all possible pre-processing steps, finds the best architecture parameters for each of the transformed datasets, performs sanity checks on the models, and trains a minimal deep learning version for each dataset/architecture combination.

These experiments are then compiled into an intuitive report, making it easier for scientists to pick the best design choice to be sent to large-scale training.

<picture>
  <source media="(prefers-color-scheme: dark)" srcset="assets/metromap.png">
  <img alt="nf-core/deepmodeloptim metro map" src="assets/metromap_light.png">
</picture>

## Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.

<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
Explain what rows and columns represent. For instance (please edit as appropriate):

First, prepare a samplesheet with your input data that looks as follows:

`samplesheet.csv`:

```csv
sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
```

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-->

Now, you can run the pipeline using:

<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

```bash
nextflow run nf-core/deepmodeloptim \
   -profile <docker/singularity/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR>
```

> [!WARNING]
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).

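For example, parameters can be supplied in a YAML file and passed with `-params-file` (a minimal sketch, using the `input` and `outdir` parameters from the command above):

```yaml
# params.yaml: minimal example parameters (sketch)
input: 'samplesheet.csv'
outdir: './results'
```

```bash
nextflow run nf-core/deepmodeloptim -profile docker -params-file params.yaml
```
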
For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/deepmodeloptim/usage) and the [parameter documentation](https://nf-co.re/deepmodeloptim/parameters).

## Pipeline output

To see the results of an example test run with a full-size dataset, refer to the [results](https://nf-co.re/deepmodeloptim/results) tab on the nf-core website pipeline page.
For more details about the output files and reports, please refer to the [output documentation](https://nf-co.re/deepmodeloptim/output).

<!-- TODO
Reconcile the previous README with the nf-core format one.
-->

## Code requirements

### Data

The data is provided as a CSV file whose header columns follow the format `name:type:class`.

_name_ is user-defined (note that it has an impact on the experiment definition).

_type_ is either "input", "meta", or "label". "input" columns are fed into the model, "meta" columns are registered but neither transformed nor fed into the model, and "label" columns are used as training labels.

_class_ is a supported class of data for which encoding methods have been implemented; please raise an issue on GitHub or contribute a PR if a class of interest to you is not yet implemented.

#### csv general example

| input1:input:input_type | input2:input:input_type | meta1:meta:meta_type | label1:label:label_type | label2:label:label_type |
| ----------------------- | ----------------------- | -------------------- | ----------------------- | ----------------------- |
| sample1 input1          | sample1 input2          | sample1 meta1        | sample1 label1          | sample1 label2          |
| sample2 input1          | sample2 input2          | sample2 meta1        | sample2 label1          | sample2 label2          |
| sample3 input1          | sample3 input2          | sample3 meta1        | sample3 label1          | sample3 label2          |

#### csv specific example

| mouse_dna:input:dna | mouse_rnaseq:label:float |
| ------------------- | ------------------------ |
| ACTAGGCATGCTAGTCG   | 0.53                     |
| ACTGGGGCTAGTCGAA    | 0.23                     |
| GATGTTCTGATGCT      | 0.98                     |

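On disk, the specific example above is just a plain csv file; assuming comma-separated values, it would look like:

```csv
mouse_dna:input:dna,mouse_rnaseq:label:float
ACTAGGCATGCTAGTCG,0.53
ACTGGGGCTAGTCGAA,0.23
GATGTTCTGATGCT,0.98
```
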
### Model

In STIMULUS, users provide a `.py` file containing a model written in PyTorch (see examples in `bin/tests/models`).

These models should follow a few minor standards:

1. The name of the model class you want to train should start with "Model", and there should be exactly one class starting with "Model" in the file.

```python
import torch
import torch.nn as nn


class SubClass(nn.Module):
    """
    a subclass, this will be invisible to Stimulus
    """


class ModelClass(nn.Module):
    """
    the PyTorch model to be trained by Stimulus, can use SubClass if needed
    """


class ModelAnotherClass(nn.Module):
    """
    uh oh, this will return an error as there are two classes starting with Model
    """
```
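
For intuition, this naming rule amounts to a check along these lines (a hypothetical sketch, not the pipeline's actual discovery code; `find_model_class` is an illustrative assumption):

```python
import importlib.util
import inspect


def find_model_class(py_path: str) -> type:
    """Load a user .py file and return the single class whose name starts with 'Model'."""
    spec = importlib.util.spec_from_file_location("user_model", py_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # keep only classes defined in this file whose name starts with "Model"
    candidates = [
        cls
        for name, cls in inspect.getmembers(module, inspect.isclass)
        if name.startswith("Model") and cls.__module__ == module.__name__
    ]
    if len(candidates) != 1:
        raise ValueError(f"Expected exactly one class starting with 'Model', found {len(candidates)}")
    return candidates[0]
```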

2. The model's `forward` function should have input variables with the **same names** as the input names defined in the csv input file.

```python
import torch
import torch.nn as nn


class ModelClass(nn.Module):
    """
    the PyTorch model to be trained by Stimulus
    """

    def __init__(self):
        super().__init__()
        # your model definition here
        pass

    def forward(self, mouse_dna):
        # `mouse_dna` must match the input column name defined in the csv
        output = model_layers(mouse_dna)
        return output
```
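
This matters because Stimulus builds a dictionary keyed by the csv input column names and unpacks it into `forward` as keyword arguments (see `self.forward(**x)` in the next example). A quick sketch, assuming `ModelClass` above has real layers and a hypothetical tensor shape:

```python
import torch

model = ModelClass()
# keys come from the csv input column names ("mouse_dna" above);
# the tensor is a hypothetical encoded batch
x = {"mouse_dna": torch.randn(8, 4, 17)}
output = model(**x)  # resolves to model.forward(mouse_dna=...), so the names must match
```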

3. The model should include a function named **batch** that takes as input a dictionary of inputs "x", a dictionary of labels "y", a callable loss function, and a callable optimizer.

To allow **batch** to take a callable loss as input, we define an extra compute_loss function that passes the correct output to the correct loss class.

```python
import torch
import torch.nn as nn
from typing import Callable, Optional, Tuple


class ModelClass(nn.Module):
    """
    the PyTorch model to be trained by Stimulus
    """

    def __init__(self):
        super().__init__()
        # your model definition here
        pass

    def forward(self, mouse_dna):
        output = model_layers(mouse_dna)
        return output

    def compute_loss_mouse_rnaseq(self, output: torch.Tensor, mouse_rnaseq: torch.Tensor, loss_fn: Callable) -> torch.Tensor:
        """
        Compute the loss.
        `output` is the output tensor of the forward pass.
        `mouse_rnaseq` is the target tensor -> label column name.
        `loss_fn` is the loss function to be used.

        IMPORTANT: the input variable "mouse_rnaseq" has the same name as the label defined in the csv above.
        """
        return loss_fn(output, mouse_rnaseq)

    def batch(self, x: dict, y: dict, loss_fn: Callable, optimizer: Optional[Callable] = None) -> Tuple[torch.Tensor, dict]:
        """
        Perform one batch step.
        `x` is a dictionary with the input tensors.
        `y` is a dictionary with the target tensors.
        `loss_fn` is the loss function to be used.

        If `optimizer` is passed, it will perform the optimization step -> training step.
        Otherwise, only return the forward pass output and loss -> evaluation step.
        """
        output = self.forward(**x)
        loss = self.compute_loss_mouse_rnaseq(output, **y, loss_fn=loss_fn)
        if optimizer is not None:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return loss, output
```
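
As a usage sketch (assuming `ModelClass` has real layers; `dna_batch` and `rnaseq_batch` are hypothetical tensors), a training step and an evaluation step would look like:

```python
model = ModelClass()
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# training step: an optimizer is passed, so the weights are updated
loss, output = model.batch(x={"mouse_dna": dna_batch}, y={"mouse_rnaseq": rnaseq_batch}, loss_fn=loss_fn, optimizer=optimizer)

# evaluation step: no optimizer, only the forward pass and the loss
loss, output = model.batch(x={"mouse_dna": dna_batch}, y={"mouse_rnaseq": rnaseq_batch}, loss_fn=loss_fn)
```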

If you don't need the loss function to be configurable (i.e. passed in as a callable), the code above can be written in a simplified manner:

```python
import torch
import torch.nn as nn
from typing import Callable, Optional, Tuple


class ModelClass(nn.Module):
    """
    the PyTorch model to be trained by Stimulus
    """

    def __init__(self):
        super().__init__()
        # your model definition here
        pass

    def forward(self, mouse_dna):
        output = model_layers(mouse_dna)
        return output

    def batch(self, x: dict, y: dict, optimizer: Optional[Callable] = None) -> Tuple[torch.Tensor, dict]:
        """
        Perform one batch step.
        `x` is a dictionary with the input tensors.
        `y` is a dictionary with the target tensors.

        If `optimizer` is passed, it will perform the optimization step -> training step.
        Otherwise, only return the forward pass output and loss -> evaluation step.
        """
        output = self.forward(**x)
        # the loss is hardcoded here; note that nn.MSELoss must be instantiated before it is called
        loss = nn.MSELoss()(output, y["mouse_rnaseq"])
        if optimizer is not None:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        return loss, output
```

### Model parameter search design

### Experiment design

The file containing all information about how to handle the data before tuning is called an `experiment_config`. This file is in `.json` format for now, but it will soon be moved to `.yaml`, so this section may change in the future.

The `experiment_config` is a mandatory input for the pipeline and can be passed with the flag `--exp_conf` followed by the `PATH` of the file you want to use. Two examples of `experiment_config` can be found in the `examples` directory.

### Experiment config content description

## Credits

<!-- TODO
Update the author list
-->

nf-core/deepmodeloptim was originally written by Mathys Grapotte ([@mathysgrapotte](https://github.com/mathysgrapotte)).

We would like to thank all the contributors for their extensive assistance in the development of this pipeline, including (but not limited to):

- Alessio Vignoli ([@alessiovignoli](https://github.com/alessiovignoli))
- Suzanne Jin ([@suzannejin](https://github.com/suzannejin))
- Luisa Santus ([@luisas](https://github.com/luisas))
- Jose Espinosa ([@JoseEspinosa](https://github.com/JoseEspinosa))
- Evan Floden ([@evanfloden](https://github.com/evanfloden))
- Igor Trujnara ([@itrujnara](https://github.com/itrujnara))

Special thanks for the artistic work on the logo to Maxime ([@maxulysse](https://github.com/maxulysse)), Suzanne ([@suzannejin](https://github.com/suzannejin)), Mathys ([@mathysgrapotte](https://github.com/mathysgrapotte)) and, not surprisingly, ChatGPT.

<!-- TODO nf-core: If applicable, make list of people who have also contributed -->

## Contributions and Support

If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).

For further information or help, don't hesitate to get in touch on the [Slack `#deepmodeloptim` channel](https://nfcore.slack.com/channels/deepmodeloptim) (you can join with [this invite](https://nf-co.re/join/slack)).

## Citations

<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
<!-- If you use nf-core/deepmodeloptim for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->

<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->

An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.

You can cite the `nf-core` publication as follows:

> **The nf-core framework for community-curated bioinformatics pipelines.**
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).