# Contributing to HippUnfold

HippUnfold python package dependencies are managed with Poetry, which you'll need
installed on your machine. You can find instructions on the [poetry
website](https://python-poetry.org/docs/master/#installation).

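For reference, at the time of writing the poetry docs describe a one-line installer along these lines (treat this as a sketch and check the link above for the current method):

    curl -sSL https://install.python-poetry.org | python3 -
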
HippUnfold also has a number of dependencies outside of python, including popular neuroimaging tools like `wb_command`, `ANTs`, `c3d`, and others listed in https://github.com/khanlab/hippunfold_deps. We therefore strongly recommend running HippUnfold with the `--use-singularity` flag, which will pull this container automatically and use it when required, unless you are comfortable installing and using all of these tools yourself.

Note: these instructions are only recommended if you are making changes to the HippUnfold code to commit back to this repository, or if you are using Snakemake's cluster execution profiles. If not, it is easier to run HippUnfold packaged into a single singularity container (e.g. `docker://khanlab/hippunfold:latest`).

## Set-up your development environment:

Clone the repository and install the dependencies and dev dependencies with
poetry:

    git clone http://github.com/khanlab/hippunfold
    cd hippunfold
    poetry install

Poetry will automatically create a virtual environment. To customize where
these virtual environments are stored, see the poetry docs
[here](https://python-poetry.org/docs/configuration/).

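For example, one common (purely optional) choice is to keep the virtualenv inside the project folder:

    poetry config virtualenvs.in-project true
    poetry install   # the virtualenv is now created in ./.venv
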
Then, you can run hippunfold with:

    poetry run hippunfold

or you can activate a virtualenv shell and then run hippunfold directly:

    poetry shell
    hippunfold

You can exit the poetry shell with `exit`.

Note: you can alternatively use `pip install` to install dependencies.

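As a minimal sketch of the pip route (assuming a recent pip that can build from `pyproject.toml`):

    cd hippunfold
    pip install .
    hippunfold --help
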
## Running code format quality checking and fixing:

HippUnfold uses [poethepoet](https://github.com/nat-n/poethepoet) as a
task runner. You can see what commands are available by running:

    poetry run poe

We use `black` and `snakefmt` to ensure that the
formatting and style of python files and Snakefiles is consistent. There are
two task runners you can use to check and fix your code, which can be
invoked with:

    poetry run poe quality_check
    poetry run poe quality_fix

Note that if you are in a poetry shell, you do not need to prepend
`poetry run` to the command.

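For example, inside a poetry shell the same tasks can be invoked directly:

    poetry shell
    poe quality_check
    poe quality_fix
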
## Dry-run testing your workflow:

Using Snakemake's dry-run option (`--dry-run`/`-n`) is an easy way to verify that any
changes to the workflow are working correctly. The `test_data` folder contains a
number of *fake* bids datasets (i.e. datasets with zero-sized files) that are useful
for verifying different aspects of the workflow. These dry-run tests are
part of the automated github actions that run for every commit.

You can use the hippunfold CLI to perform a dry-run of the workflow,
e.g. here also printing out every command:

    hippunfold test_data/bids_singleT2w test_out participant --modality T2w --use-singularity -np

As a shortcut, you can also use `snakemake` instead of the
hippunfold CLI, as the `snakebids.yml` config file is set up
by default to use this same test dataset, as long as you run snakemake
from the `hippunfold` folder that contains the
`workflow` folder:

    cd hippunfold
    snakemake -np

## Wet-run testing your workflow:

Open-source data for wet-run testing changes to HippUnfold is available on OSF [here](https://osf.io/k2nme/). These are real open-source datasets meant to span a wide array of HippUnfold use cases with various modalities, resolutions, developmental stages, and species. Please note that some changes to HippUnfold will impact all workflows and should therefore be run and visually inspected for all of these cases. Other changes may impact only one workflow (e.g. adding a new template), in which case the recommended run parameters should be adjusted accordingly.

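As a sketch, assuming you have downloaded one of the OSF datasets to a local folder (the path and modality below are hypothetical; choose parameters that match the dataset you are testing):

    hippunfold osf_data/ds_T2w wetrun_out participant --modality T2w --use-singularity -j 8
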
## Instructions for Compute Canada

This section provides an example of how to set up a `pip installed` copy
of HippUnfold on Compute Canada's `graham` cluster.

### Setting up a dev environment on graham:

Here are some instructions to get your python environment set up on
graham to run HippUnfold:

1.  Create a virtualenv and activate it:

        mkdir $SCRATCH/hippdev
        cd $SCRATCH/hippdev
        module load python/3.8
        virtualenv venv
        source venv/bin/activate

2.  Install HippUnfold:

        git clone https://github.com/khanlab/hippunfold.git
        pip install hippunfold/

3. To run HippUnfold on graham as a member of the Khan lab, please configure the
[neuroglia-helpers](https://github.com/khanlab/neuroglia-helpers) with the khanlab profile.

4. To avoid having to download the trained models (see the section [below](#deep-learning-nnu-net-model-files)), you can set an environment variable in your bash profile (`~/.bash_profile`) with the location of the
trained models. For Khan lab members, the following line must be added to the bash profile file:

        export HIPPUNFOLD_CACHE_DIR="/project/6050199/akhanf/opt/hippunfold_trained_models"

Note: make sure to reload your bash profile if needed (`source ~/.bash_profile`).

5. For easier execution on graham, it is also recommended to install the
[cc-slurm](https://github.com/khanlab/cc-slurm) snakemake profile for cluster execution with slurm.

Note: if you want to run hippunfold with modifications to your cloned
repository, you either need to `pip install` again, or run hippunfold as follows, since
an `editable` pip install is not allowed with pyproject:

        python <YOUR_HIPPUNFOLD_DIR>/hippunfold/run.py

### Running hippunfold jobs on graham:

Note that this requires
[neuroglia-helpers](https://github.com/khanlab/neuroglia-helpers) for the
regularSubmit or regularInteractive wrappers, and the
[cc-slurm](https://github.com/khanlab/cc-slurm) snakemake profile for cluster execution with slurm.

In an interactive job (for testing):

    regularInteractive -n 8
    hippunfold <PATH_TO_BIDS_DIR> <PATH_TO_OUTPUT_DIR> participant \
    --participant_label 001 -j 8 --modality T1w --use-singularity \
    --singularity-prefix $SNAKEMAKE_SINGULARITY_DIR

Where:
 - `--participant_label 001` is used to specify only one subject from a BIDS
directory presumably containing many subjects.
 - `-j 8` specifies the number of cores used.
 - `--modality T1w` is used to specify that a T1w dataset is being processed.
 - `--singularity-prefix $SNAKEMAKE_SINGULARITY_DIR` specifies the directory in
which singularity images will be stored. This environment variable is created
when installing neuroglia-helpers.

Submitting a job (for more cores or more subjects), still as a single job,
but snakemake will parallelize over the 32 cores:

    regularSubmit -j Fat \
    hippunfold PATH_TO_BIDS_DIR PATH_TO_OUTPUT_DIR participant -j 32 \
    --modality T1w --use-singularity --singularity-prefix $SNAKEMAKE_SINGULARITY_DIR

Scaling up to ~100 subjects (needs the cc-slurm snakemake profile
installed); this submits one 16-core job per subject:

    hippunfold PATH_TO_BIDS_DIR PATH_TO_OUTPUT_DIR participant \
    --modality T1w --use-singularity --singularity-prefix $SNAKEMAKE_SINGULARITY_DIR \
    --profile cc-slurm

Scaling up to even more subjects (uses group-components to bundle
multiple subjects in each job), one 32-core job for N subjects (e.g. 10):

    hippunfold PATH_TO_BIDS_DIR PATH_TO_OUTPUT_DIR participant \
    --modality T1w --use-singularity --singularity-prefix $SNAKEMAKE_SINGULARITY_DIR \
    --profile cc-slurm --group-components subj=10

### Running hippunfold jobs on the CBS server

1. Clone the repository and install dependencies and dev dependencies with poetry:

       git clone http://github.com/khanlab/hippunfold
       cd hippunfold
       poetry install

   If poetry is not installed, please refer to the [installation documentation](https://python-poetry.org/docs/). If the `poetry` command is not found, add the following line to the bashrc file located in your home directory (assuming the poetry binary is located under `$HOME/.local/bin`):

       export PATH=$PATH:$HOME/.local/bin

2. To avoid having to download containers and trained models (see the section [below](#deep-learning-nnu-net-model-files)), add the `$SNAKEMAKE_SINGULARITY_DIR` and `$HIPPUNFOLD_CACHE_DIR` environment variables to your bashrc file. For Khan lab members, add the following lines:

        export SNAKEMAKE_SINGULARITY_DIR="/cifs/khan/shared/containers/snakemake_containers"
        export HIPPUNFOLD_CACHE_DIR="/cifs/khan/shared/data/hippunfold_models"

3. HippUnfold can be executed using `poetry run hippunfold <arguments>` or through the `poetry shell` method. Refer to the previous section for more information regarding execution options.

4. On the CBS server you should always set your output folder to a path inside `/localscratch`, and not your home folder or a `/srv` or `/cifs` path, and copy the final results out after they have finished computing (see the sketch after this list). Please be aware that the CBS server may not be the most efficient option for running a large number of subjects (since you are limited in processing cores compared to an HPC cluster).

5. If you are using input files in your home directory (or in your `graham` mount in your home directory), you may also need to add the following to your bashrc file (note: this will become a default system-enabled option soon):

        export SINGULARITY_BINDPATH="/home/ROBARTS:/home/ROBARTS"

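As a rough sketch of the pattern in step 4 (all paths here are hypothetical; substitute your own input data, output location, and username):

    hippunfold /path/to/your/bids_dir /localscratch/$USER/hippunfold_out participant \
        --modality T1w --use-singularity -j 8

    # after the run finishes, copy the results back to persistent storage
    cp -r /localscratch/$USER/hippunfold_out /path/to/your/persistent/storage/
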
## Deep learning nnU-net model files

The trained model files we use for hippunfold are large and thus are not
included directly in this github repository, and instead are downloaded
from Zenodo releases.

### For HippUnfold versions earlier than 1.3.0 (< 1.3.0):

If you are using the docker/singularity container, `docker://khanlab/hippunfold`, they are pre-downloaded there, in `/opt/hippunfold_cache`.

If you are not using this container, you will need to download the models before running hippunfold, by running:

    hippunfold_download_models

This console script (installed when you install hippunfold) downloads all the models to a cache dir on your system,
which on Linux is typically `~/.cache/hippunfold`. To override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment
variable before running `hippunfold_download_models` and `hippunfold`.

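For example, to place the downloaded models somewhere other than the default cache (the path below is just an example):

    export HIPPUNFOLD_CACHE_DIR=/path/to/hippunfold_cache
    hippunfold_download_models
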
### NEW: For HippUnfold versions 1.3.0 and later (>= 1.3.0):

With the addition of new models, including all models in the container was no longer feasible, so a change was made to
**not include** any models in the docker/singularity containers. In these versions, the `hippunfold_download_models` command
is removed, and any required models are simply downloaded as part of the workflow. As before, all models are stored in the system cache dir,
which is typically `~/.cache/hippunfold`; to override this, you can set the `HIPPUNFOLD_CACHE_DIR` environment variable before running `hippunfold`.

If you want to pre-download a model (e.g. if your compute nodes do not have internet access), you can simply run the `download_model` rule in HippUnfold, e.g.:

```
hippunfold BIDS_DIR OUTPUT_DIR PARTICIPANT_LEVEL --modality T1w --until download_model -c 1
```

## Overriding Singularity cache directories

By default, singularity stores image caches in your home directory when you run `singularity pull` or `singularity run`. As described above, hippunfold also stores deep learning models in your home directory. If your home directory is full or otherwise inaccessible, you may want to change these locations with the following commands:

    export SINGULARITY_CACHEDIR=/YOURDIR/.cache/singularity
    export SINGULARITY_BINDPATH=/YOURDIR:/YOURDIR
    export HIPPUNFOLD_CACHE_DIR=/YOURDIR/.cache/hippunfold/

If you are running `hippunfold` with the `--use-singularity` option, hippunfold will download the required singularity containers for the rules that need them. These containers are placed in the `.snakemake` folder in your hippunfold output directory, but this can be overridden with the Snakemake option `--singularity-prefix DIRECTORY`.

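For example (a sketch only; substitute your own directories):

    hippunfold PATH_TO_BIDS_DIR PATH_TO_OUTPUT_DIR participant --modality T1w \
    --use-singularity --singularity-prefix /YOURDIR/singularity_images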