# Hyperparameter Tuning

In this tutorial, we'll see how we can quickly tune hyperparameters of a deep learning model with EDS-NLP using the `edsnlp.tune` function.

Tuning refers to the process of optimizing the hyperparameters of a machine learning model to achieve the best performance. These hyperparameters include factors like learning rate, batch size, dropout rates, and model architecture parameters. Tuning is crucial because the right combination of hyperparameters can significantly improve model accuracy and efficiency, while poor choices can lead to overfitting, underfitting, or unnecessary computational costs. By systematically searching for the best hyperparameters, we ensure the model is both effective and efficient before the final training phase.

We strongly suggest you read the previous ["Training API tutorial"](./training.md) to understand how to train a deep learning model using a config file with EDS-NLP.

## 1. Creating a project

If you have already installed `edsnlp[ml]` and do not want to set up a project, you can skip to the [next section](#tuning-the-model).

Create a new project:

```{ .bash data-md-color-scheme="slate" }
mkdir my_ner_project
cd my_ner_project

touch README.md pyproject.toml
mkdir -p configs data/dataset
```

Add a standard `pyproject.toml` file with the following content. This file will be used to manage the dependencies of the project and its versioning.

```{ .toml title="pyproject.toml"}
[project]
name = "my_ner_project"
version = "0.1.0"
description = ""
authors = [
    { name="Firstname Lastname", email="firstname.lastname@domain.com" }
]
readme = "README.md"
requires-python = ">3.7.1,<4.0"

dependencies = [
    "edsnlp[ml]>=0.16.0",
    "sentencepiece>=0.1.96",
    "optuna>=4.0.0",
    "plotly>=5.18.0",
    "ruamel.yaml>=0.18.0",
    "configobj>=5.0.9",
]

[project.optional-dependencies]
dev = [
    "dvc>=2.37.0; python_version >= '3.8'",
    "pandas>=1.1.0,<2.0.0; python_version < '3.8'",
    "pandas>=1.4.0,<2.0.0; python_version >= '3.8'",
    "pre-commit>=2.18.1",
    "accelerate>=0.21.0; python_version >= '3.8'",
    "rich-logger>=0.3.0"
]
```

We recommend using a virtual environment ("venv") to isolate the dependencies of your project and using [uv](https://docs.astral.sh/uv/) to install the dependencies:

```{ .bash data-md-color-scheme="slate" }
pip install uv
# skip the next two lines if you do not want a venv
uv venv .venv
source .venv/bin/activate
uv pip install -e ".[dev]" -p $(uv python find)
```

## 2. Tuning a model

### 2.1. Tuning Section in `config.yml` file

If you followed the ["Training API tutorial"](./training.md), you should already have a `configs/config.yml` file for training parameters.

To enable hyperparameter tuning, add the following `tuning` section to your `config.yml` file:

```{ .yaml title="configs/config.yml" }
tuning:
  # Output directory for tuning results.
  output_dir: 'results'
  # Checkpoint directory.
  checkpoint_dir: 'checkpoint'
  # Number of GPU hours allowed for tuning.
  gpu_hours: 1.0
  # Fixed number of trials to tune hyperparameters (overrides gpu_hours).
  n_trials: 4
  # Enable two-phase tuning. In the first phase, the script will tune all hyperparameters.
  # In the second phase, it will focus only on the top 50% most important hyperparameters.
  two_phase_tuning: True
  # Metric used to evaluate trials.
  metric: "ner.micro.f"
  # Hyperparameters to tune.
  hyperparameters:
```

Let's detail the new parameters:

- `output_dir`: Directory where tuning results, visualizations, and the best parameters will be saved.
- `checkpoint_dir`: Directory where the tuning checkpoint `study.pkl` is saved after each trial. This enables automatic resumption of tuning in case of a crash. To disable resumption, simply delete the `study.pkl` file.
- `gpu_hours`: Estimated total GPU time available for tuning, in hours. Given this budget, the script automatically computes how many training trials it can run (a budget-only configuration is sketched after this list). By default, `gpu_hours` is set to 1.
- `n_trials`: Number of training trials for tuning. If provided, it overrides `gpu_hours` and tunes the model for exactly `n_trials` trials.
- `two_phase_tuning`: If True, performs a two-phase tuning. In the first phase, all hyperparameters are tuned; in the second phase, the top half (based on importance) are fine-tuned while the others are frozen. By default, `two_phase_tuning` is False.
- `metric`: Metric used to evaluate trials. It corresponds to a path in the scorer results (depending on the scorer used in the config). By default, `metric` is set to "ner.micro.f".
- `hyperparameters`: The list of hyperparameters to tune and how to sample them. We will discuss how this works in the following section.
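
To make the difference with `n_trials` concrete, here is a minimal `tuning` section that relies only on the GPU-hour budget and lets `edsnlp.tune` derive the number of trials itself. The keys are the ones documented above; the values (10 GPU hours, single-phase tuning) are just an illustrative choice, not a recommendation.

```{ .yaml title="configs/config.yml" }
tuning:
  output_dir: 'results'
  checkpoint_dir: 'checkpoint'
  # No n_trials here: the number of trials is derived from the time budget.
  # 10.0 is an arbitrary example value.
  gpu_hours: 10.0
  metric: "ner.micro.f"
  hyperparameters:
    # ... hyperparameters to tune, see the next section
```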

### 2.2. Add hyperparameters to tune

In the `config.yml` file, the `tuning.hyperparameters` section defines the hyperparameters to optimize. Each hyperparameter is specified with its type, range, and additional sampling properties. To add a hyperparameter, follow this syntax:

```{ .yaml title="configs/config.yml" }
tuning:
  hyperparameters:
    # Hyperparameter path in `config.yml`.
    "nlp.components.ner.embedding.embedding.classifier_dropout":
      # Alias name. If not specified, the full path will be used as the name.
      alias: "classifier_dropout"
      # Type of the hyperparameter: 'int', 'float', or 'categorical'.
      type: "float"
      # Lower bound for tuning.
      low: 0.
      # Upper bound for tuning.
      high: 0.3
      # Step for discretization (optional).
      step: 0.05
```

Since `edsnlp.tune` leverages the [Optuna](https://optuna.org/) framework, we recommend reviewing the following Optuna functions to understand the properties you can specify for hyperparameter sampling:

- [suggest_float](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_float) – For sampling floating-point hyperparameters.
- [suggest_int](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_int) – For sampling integer hyperparameters.
- [suggest_categorical](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial.suggest_categorical) – For sampling categorical hyperparameters.

These resources provide detailed guidance on defining the sampling ranges, distributions, and additional properties for each type of hyperparameter.
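
The example above only shows a `float` hyperparameter. As a rough sketch of the other two types, the snippet below tunes the transformer window size as an `int` and the NER decoding mode as a `categorical` choice. The paths and bounds are purely illustrative, and we assume the categorical type accepts a `choices` list mirroring Optuna's `suggest_categorical`; check that the listed values are valid for your components before using them.

```{ .yaml title="configs/config.yml" }
tuning:
  hyperparameters:
    # Integer hyperparameter: properties assumed to mirror Optuna's suggest_int.
    "nlp.components.ner.embedding.embedding.window":
      alias: "window"
      type: "int"
      # Keep the lower bound above the fixed stride (96 in this tutorial).
      low: 96
      high: 256
      step: 32
    # Categorical hyperparameter: assumed to take a `choices` list,
    # as in Optuna's suggest_categorical. Values are illustrative.
    "nlp.components.ner.mode":
      alias: "ner_mode"
      type: "categorical"
      choices: [ "joint", "independent", "marginal" ]
```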

### 2.3. Complete Example

Now, let's look at a complete example. Assume we want to perform a two-phase tuning with a budget of 40 GPU hours, on the following hyperparameters:

- `hidden_dropout_prob`: Dropout probability for hidden layers.
- `attention_dropout_prob`: Dropout probability for attention layers.
- `classifier_dropout`: Dropout probability for the classifier layer.
- `transformer_start_value`: Learning rate start value for the transformer.
- `transformer_max_value`: Maximum learning rate for the transformer.
- `transformer_warmup_rate`: Warmup rate for the transformer learning rate scheduler.
- `transformer_weight_decay`: Weight decay for the transformer optimizer.
- `other_start_value`: Learning rate start value for other components.
- `other_max_value`: Maximum learning rate for other components.
- `other_warmup_rate`: Warmup rate for the learning rate scheduler of other components.
- `other_weight_decay`: Weight decay for the optimizer of other components.

Then the full `config.yml` will be:

```{ .yaml title="configs/config.yml" }
vars:
  train: './data/dataset/train'
  dev: './data/dataset/test'

# 🤖 PIPELINE DEFINITION
nlp:
  '@core': pipeline
  lang: eds  # Word-level tokenization: use the "eds" tokenizer
  components:
    ner:
      '@factory': eds.ner_crf
      mode: 'joint'
      target_span_getter: 'gold_spans'
      span_setter: [ "ents", "*" ]
      infer_span_setter: true
      embedding:
        '@factory': eds.text_cnn
        kernel_sizes: [ 3 ]
        embedding:
          '@factory': eds.transformer
          model: prajjwal1/bert-tiny
          ignore_mismatched_sizes: True
          window: 128
          stride: 96
          # Dropout parameters passed to the underlying transformer object.
          hidden_dropout_prob: 0.1
          attention_probs_dropout_prob: 0.1
          classifier_dropout: 0.1

# 📈 SCORERS
scorer:
  ner:
    '@metrics': eds.ner_token
    span_getter: ${ nlp.components.ner.target_span_getter }

# 🎛️ OPTIMIZER
optimizer:
  "@core": optimizer
  optim: adamw
  groups:
    "^transformer":
      weight_decay: 1e-3
      lr:
        '@schedules': linear
        "warmup_rate": 0.1
        "start_value": 1e-5
        "max_value": 8e-5
    ".*":
      weight_decay: 1e-3
      lr:
        '@schedules': linear
        "warmup_rate": 0.1
        "start_value": 1e-5
        "max_value": 8e-5
  module: ${ nlp }
  total_steps: ${ train.max_steps }

# 📚 DATA
train_data:
  - data:
      '@readers': standoff
      path: ${ vars.train }
      converter:
        - '@factory': eds.standoff_dict2doc
          span_setter: 'gold_spans'
        - '@factory': eds.split
          nlp: null
          max_length: 256
          regex: '\n\n+'
    shuffle: dataset
    batch_size: 32 * 128 tokens
    pipe_names: [ "ner" ]

val_data:
  '@readers': standoff
  path: ${ vars.dev }
  converter:
    - '@factory': eds.standoff_dict2doc
      span_setter: 'gold_spans'

# 🚀 TRAIN SCRIPT OPTIONS
# -> python -m edsnlp.train --config configs/config.yml
train:
  nlp: ${ nlp }
  logger: True
  output_dir: 'artifacts'
  train_data: ${ train_data }
  val_data: ${ val_data }
  max_steps: 400
  validation_interval: ${ train.max_steps//2 }
  grad_max_norm: 1.0
  scorer: ${ scorer }
  optimizer: ${ optimizer }
  num_workers: 2

# 📦 PACKAGE SCRIPT OPTIONS
# -> python -m edsnlp.package --config configs/config.yml
package:
  pipeline: ${ train.output_dir }
  name: 'my_ner_model'

# ⚙️ TUNE SCRIPT OPTIONS
# -> python -m edsnlp.tune --config configs/config.yml
tuning:
  output_dir: 'results'
  checkpoint_dir: 'checkpoint'
  gpu_hours: 40.0
  two_phase_tuning: True
  metric: "ner.micro.f"
  hyperparameters:
    "nlp.components.ner.embedding.embedding.hidden_dropout_prob":
      alias: "hidden_dropout"
      type: "float"
      low: 0.
      high: 0.3
      step: 0.05
    "nlp.components.ner.embedding.embedding.attention_probs_dropout_prob":
      alias: "attention_dropout"
      type: "float"
      low: 0.
      high: 0.3
      step: 0.05
    "nlp.components.ner.embedding.embedding.classifier_dropout":
      alias: "classifier_dropout"
      type: "float"
      low: 0.
      high: 0.3
      step: 0.05
    "optimizer.groups.^transformer.lr.start_value":
      alias: "transformer_start_value"
      type: "float"
      low: 1e-6
      high: 1e-3
      log: True
    "optimizer.groups.^transformer.lr.max_value":
      alias: "transformer_max_value"
      type: "float"
      low: 1e-6
      high: 1e-3
      log: True
    "optimizer.groups.^transformer.lr.warmup_rate":
      alias: "transformer_warmup_rate"
      type: "float"
      low: 0.
      high: 0.3
      step: 0.05
    "optimizer.groups.^transformer.weight_decay":
      alias: "transformer_weight_decay"
      type: "float"
      low: 1e-4
      high: 1e-2
      log: True
    "optimizer.groups.'.*'.lr.warmup_rate":
      alias: "other_warmup_rate"
      type: "float"
      low: 0.
      high: 0.3
      step: 0.05
    "optimizer.groups.'.*'.lr.start_value":
      alias: "other_start_value"
      type: "float"
      low: 1e-6
      high: 1e-3
      log: True
    "optimizer.groups.'.*'.lr.max_value":
      alias: "other_max_value"
      type: "float"
      low: 1e-6
      high: 1e-3
      log: True
    "optimizer.groups.'.*'.weight_decay":
      alias: "other_weight_decay"
      type: "float"
      low: 1e-4
      high: 1e-2
      log: True
```

Finally, to launch the tuning process, use the following command:

```{ .bash data-md-color-scheme="slate" }
python -m edsnlp.tune --config configs/config.yml --seed 42
```

## 3. Results

At the end of the tuning process, `edsnlp.tune` generates various results and saves them in the `output_dir` specified in the `config.yml` file:

- **Tuning Summary**: `result_summary.txt`, a summary file containing details about the best training trial, the best overall metric, the optimal hyperparameter values, and the average importance of each hyperparameter across all trials.
- **Optimal Configuration**: `config.yml`, containing the best hyperparameter values.
- **Graphs and Visualizations**: Various graphics illustrating the tuning process, such as:
  - [**Optimization History plot**](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_optimization_history.html#sphx-glr-reference-visualization-generated-optuna-visualization-plot-optimization-history-py): A line graph showing the performance of each trial over time, illustrating the optimization process and how the model's performance improves with each iteration.
  - [**Empirical Distribution Function (EDF) plot**](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_edf.html#sphx-glr-reference-visualization-generated-optuna-visualization-plot-edf-py): A graph showing the cumulative distribution of the results, helping you understand the distribution of performance scores and providing insights into the variability and robustness of the tuning process.
  - [**Contour plot**](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_contour.html#sphx-glr-reference-visualization-generated-optuna-visualization-plot-contour-py): A 2D plot that shows the relationship between two hyperparameters and their combined effect on the objective metric, providing a clear view of the optimal parameter regions.
  - [**Parallel Coordinate plot**](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_parallel_coordinate.html#sphx-glr-reference-visualization-generated-optuna-visualization-plot-parallel-coordinate-py): A multi-dimensional plot where each hyperparameter is represented as a vertical axis, and each trial is displayed as a line connecting the hyperparameter values, helping you analyze correlations and patterns across hyperparameters and their impact on performance.
  - [**Timeline plot**](https://optuna.readthedocs.io/en/stable/reference/visualization/generated/optuna.visualization.plot_timeline.html#sphx-glr-reference-visualization-generated-optuna-visualization-plot-timeline-py): A 2D plot that displays all trials and their statuses ("completed," "pruned," or "failed") over time, providing a clear overview of the progress and outcomes of the tuning process.

These outputs offer a comprehensive view of the tuning results, enabling you to better understand the optimization process and easily deploy the best configuration.

**Note**: If you enabled two-phase tuning, the `output_dir` will contain two subdirectories, `phase_1` and `phase_2`, each with their own result files as described earlier. This separation allows you to analyze the results from each phase individually.
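
For orientation, the layout of `output_dir` after a two-phase run should look roughly like the sketch below. Only `result_summary.txt` and `config.yml` are named in this tutorial; the plot files are shown generically since their exact file names may differ.

```
results/
├── phase_1/
│   ├── result_summary.txt   # best trial and hyperparameter importances for phase 1
│   ├── config.yml           # best configuration found in phase 1
│   └── ...                  # optimization history, EDF, contour, parallel coordinate, timeline plots
└── phase_2/
    ├── result_summary.txt
    ├── config.yml           # final best configuration
    └── ...
```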

## 4. Final Training

Now that the hyperparameters have been tuned, you can update your final `config.yml` with the best-performing hyperparameters and proceed to launch the final training using the ["Training API"](./training.md).
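
For example, assuming the `config.yml` written to `output_dir` (here `results/`, or presumably `results/phase_2/` after two-phase tuning) is a complete training configuration, the final training could be launched directly from it; otherwise, copy the best values into your own `configs/config.yml` first. The path below is therefore an assumption, not a guaranteed output location:

```{ .bash data-md-color-scheme="slate" }
# Assumes results/config.yml is a full training config produced by edsnlp.tune.
python -m edsnlp.train --config results/config.yml
```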