📚 This guide explains how to use **Weights & Biases** (W&B) with YOLOv5 🚀. UPDATED 29 September 2021.
* [About Weights & Biases](#about-weights--biases)
* [First-Time Setup](#first-time-setup)
* [Viewing runs](#viewing-runs)
* [Disabling wandb](#disabling-wandb)
* [Advanced Usage: Dataset Versioning and Evaluation](#advanced-usage)
* [Reports: Share your work with the world!](#reports)

## About Weights & Biases
Think of [W&B](https://wandb.ai/site?utm_campaign=repo_yolo_wandbtutorial) like GitHub for machine learning models. With a few lines of code, you can save everything you need to debug, compare and reproduce your models: architecture, hyperparameters, git commits, model weights, GPU usage, and even datasets and predictions.

Used by top researchers including teams at OpenAI, Lyft, GitHub, and MILA, W&B is part of the new standard of best practices for machine learning. Here is how W&B can help you optimize your machine learning workflows:

 * [Debug](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#Free-2) model performance in real time
 * [GPU usage](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#System-4) visualized automatically
 * [Custom charts](https://wandb.ai/wandb/customizable-charts/reports/Powerful-Custom-Charts-To-Debug-Model-Peformance--VmlldzoyNzY4ODI) for powerful, extensible visualization
 * [Share insights](https://wandb.ai/wandb/getting-started/reports/Visualize-Debug-Machine-Learning-Models--VmlldzoyNzY5MDk#Share-8) interactively with collaborators
 * [Optimize hyperparameters](https://docs.wandb.com/sweeps) efficiently
 * [Track](https://docs.wandb.com/artifacts) datasets, pipelines, and production models

## First-Time Setup
<details open>
 <summary> Toggle Details </summary>

When you first train, W&B will prompt you to create a new account and will generate an **API key** for you. If you are an existing user, you can retrieve your key from https://wandb.ai/authorize. This key tells W&B where to log your data. You only need to supply your key once; after that it is remembered on the same device.

W&B will create a cloud **project** (default is 'YOLOv5') for your training runs, and each new training run is given a unique run **name** within that project, as project/name. You can also set your project and run name manually:

 ```shell
 $ python train.py --project ... --name ...
 ```

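On a headless machine such as a remote server or CI job, there may be no one to answer the first-run prompt. The wandb client can also read the key from the `WANDB_API_KEY` environment variable, so a small pre-flight check shows which login path a run will take. `has_wandb_key` and `login_hint` are hypothetical helpers, not part of YOLOv5 or the wandb CLI.

```shell
# Hypothetical pre-flight helpers (not part of YOLOv5 or the wandb CLI).
# Assumption: the key from https://wandb.ai/authorize is exported as WANDB_API_KEY,
# which the wandb client reads instead of prompting for it.
has_wandb_key() {
  [ -n "${WANDB_API_KEY:-}" ]
}

# Prints which login path the next training run is expected to take.
login_hint() {
  if has_wandb_key; then
    echo "key found in WANDB_API_KEY"
  else
    echo "no key found: run 'wandb login' or answer the prompt from train.py"
  fi
}
```

For example, `export WANDB_API_KEY=<your-key>` followed by `login_hint` prints `key found in WANDB_API_KEY`; `wandb login` remains the interactive alternative.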
YOLOv5 notebook example: <a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> <a href="https://www.kaggle.com/ultralytics/yolov5"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open In Kaggle"></a>
<img width="960" alt="Screen Shot 2021-09-29 at 10 23 13 PM" src="https://user-images.githubusercontent.com/26833433/135392431-1ab7920a-c49d-450a-b0b0-0c86ec86100e.png">

 </details>

## Viewing Runs
<details open>
  <summary> Toggle Details </summary>

Run information streams from your environment to the W&B cloud console as you train. This allows you to monitor and even cancel runs in <b>real time</b>. All important information is logged:

 * Training & Validation losses
 * Metrics: Precision, Recall, mAP@0.5, mAP@0.5:0.95
 * Learning Rate over time
 * A bounding box debugging panel, showing the training progress over time
 * GPU: Type, **GPU Utilization**, power, temperature, **CUDA memory usage**
 * System: Disk I/O, CPU utilization, RAM usage
 * Your trained model as a W&B Artifact
 * Environment: OS and Python version, Git repository and state, **training command**

<p align="center"><img width="900" alt="Weights & Biases dashboard" src="https://user-images.githubusercontent.com/26833433/135390767-c28b050f-8455-4004-adb0-3b730386e2b2.png"></p>
</details>

 ## Disabling wandb
* Running `wandb disabled` inside your YOLOv5 directory turns W&B off there: training runs started from that directory will not create a W&B run.
![Screenshot (84)](https://user-images.githubusercontent.com/15766192/143441777-c780bdd7-7cb4-4404-9559-b4316030a985.png)

* To enable wandb again, run `wandb online`.
![Screenshot (85)](https://user-images.githubusercontent.com/15766192/143441866-7191b2cb-22f0-4e0f-ae64-2dc47dc13078.png)

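`wandb disabled` persists the setting for the directory it is run in. To skip W&B for a single command instead, the wandb client also honors the `WANDB_MODE` environment variable (`online`, `offline` or `disabled`). `without_wandb` below is a hypothetical wrapper illustrating this, not part of YOLOv5 or the wandb CLI.

```shell
# Hypothetical wrapper (not part of YOLOv5): run one command with W&B disabled.
# WANDB_MODE=disabled is placed only in that command's environment; the
# directory setting written by `wandb disabled` / `wandb online` is not modified.
without_wandb() {
  WANDB_MODE=disabled "$@"
}
```

Usage: `without_wandb python train.py --data coco128.yaml` trains once without creating a W&B run, while later plain `python train.py` calls behave as configured for the directory.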
## Advanced Usage
You can leverage the W&B Artifacts and Tables integrations to easily visualize and manage your datasets, models and training evaluations. Here are some quick examples to get you started.
<details open>
 <h3> 1: Train and Log Evaluation simultaneously </h3>
   This combines dataset uploading (see section 2 below) with training: the dataset is uploaded first, then training starts. <b>This also logs an evaluation Table.</b>
   The evaluation Table compares your predictions with the ground truths across the validation set for each epoch. It uses references to the already uploaded dataset,
   so no images will be uploaded from your system more than once.
 <details open>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python train.py --upload_dataset val</code>

![Screenshot from 2021-11-21 17-40-06](https://user-images.githubusercontent.com/15766192/142761183-c1696d8c-3f38-45ab-991a-bb0dfd98ae7d.png)
 </details>

 <h3>2: Visualize and Version Datasets</h3>
 Log, visualize, dynamically query, and understand your data with <a href='https://docs.wandb.ai/guides/data-vis/tables'>W&B Tables</a>. You can use the following command to log your dataset as a W&B Table. This will generate a <code>{dataset}_wandb.yaml</code> file that can be used to train from the dataset artifact.
 <details>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python utils/loggers/wandb/log_dataset.py --project ... --name ... --data ... </code>

 ![Screenshot (64)](https://user-images.githubusercontent.com/15766192/128486078-d8433890-98a3-4d12-8986-b6c0e3fc64b9.png)
 </details>

 <h3> 3: Train using dataset artifact </h3>
   When you upload a dataset as described in the sections above, you get a new config file with <code>_wandb</code> added to its name. This file contains the information
   needed to train a model directly from the dataset artifact. <b>This also logs evaluation.</b>
 <details>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python train.py --data {data}_wandb.yaml </code>

![Screenshot (72)](https://user-images.githubusercontent.com/15766192/128979739-4cf63aeb-a76f-483f-8861-1c0100b938a5.png)
 </details>

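The `{data}_wandb.yaml` name in the command above is derived from the original data config. Assuming it is the config's file stem plus `_wandb.yaml`, written to the directory you ran the upload from (check the file actually produced on your setup), a hypothetical helper can compute the `--data` value:

```shell
# Hypothetical helper (not part of YOLOv5). Assumption: the config written by
# log_dataset.py / --upload_dataset is "<data stem>_wandb.yaml" in the current directory.
wandb_yaml_name() {
  local base
  base=$(basename "$1")                  # data/coco128.yaml -> coco128.yaml
  printf '%s_wandb.yaml\n' "${base%.*}"  # drop the extension, add the suffix
}
```

For example, `python train.py --data "$(wandb_yaml_name data/coco128.yaml)"` expands to `python train.py --data coco128_wandb.yaml`.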
   <h3> 4: Save model checkpoints as artifacts </h3>
  To enable saving and versioning checkpoints of your experiment, pass <code>--save_period n</code> with the base command, where <code>n</code> is the checkpoint interval in epochs.
  You can also log both the dataset and model checkpoints simultaneously. If <code>--save_period</code> is not passed, only the final model will be logged.

 <details>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python train.py --save_period 1 </code>

![Screenshot (68)](https://user-images.githubusercontent.com/15766192/128726138-ec6c1f60-639d-437d-b4ee-3acd9de47ef3.png)
 </details>

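To estimate how many checkpoint artifacts a run will produce, here is a sketch that assumes a checkpoint is logged after every epoch whose 1-based number is a multiple of `n`, and that values below 1 disable checkpoint logging. YOLOv5's exact rule may differ, for example around the final epoch, whose model is logged regardless. `checkpoint_epochs` is a hypothetical helper.

```shell
# Hypothetical helper (not part of YOLOv5). Assumption: with --save_period n,
# a checkpoint artifact is logged after each epoch whose 1-based number is a multiple of n.
# Prints the 0-based indices of those epochs for a run of the given length.
checkpoint_epochs() {
  local epochs="$1" period="$2" e
  local -a hits=()
  if (( period < 1 )); then
    echo          # assumed: n < 1 means no periodic checkpoints
    return 0
  fi
  for ((e = 0; e < epochs; e++)); do
    if (( (e + 1) % period == 0 )); then
      hits+=("$e")
    fi
  done
  echo "${hits[*]}"
}
```

For example, `checkpoint_epochs 10 3` prints `2 5 8`.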
</details>

 <h3> 5: Resume runs from checkpoint artifacts </h3>
Any run can be resumed using artifacts if the <code>--resume</code> argument starts with the <code>wandb-artifact://</code> prefix followed by the run path, i.e., <code>wandb-artifact://username/project/runid</code>. This doesn't require the model checkpoint to be present on the local system.

 <details>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python train.py --resume wandb-artifact://{run_path} </code>

![Screenshot (70)](https://user-images.githubusercontent.com/15766192/128728988-4e84b355-6c87-41ae-a591-14aecf45343e.png)
 </details>

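The run path is the `username/project/runid` triple of the original run. A hypothetical helper can assemble the `--resume` value and reject malformed paths before a long job is launched; it checks only the shape of the path, not whether the run or its checkpoint artifacts exist.

```shell
# Hypothetical helper (not part of YOLOv5): build the --resume value from a
# W&B run path of the form username/project/runid.
artifact_resume_arg() {
  local run_path="$1"
  # Require exactly three non-empty, slash-separated parts.
  if [[ ! "$run_path" =~ ^[^/]+/[^/]+/[^/]+$ ]]; then
    echo "expected username/project/runid, got '$run_path'" >&2
    return 1
  fi
  echo "wandb-artifact://${run_path}"
}
```

Usage, with a made-up run path: `arg=$(artifact_resume_arg alice/YOLOv5/1a2b3c4d) && python train.py --resume "$arg"`. The `&&` keeps `train.py` from starting when the path is malformed.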
  <h3> 6: Resume runs from dataset artifact & checkpoint artifacts </h3>
 <b>Local dataset or model checkpoints are not required. This can be used to resume runs directly on a different device.</b>
 The syntax is the same as in the previous section, but you'll need to log both the dataset and model checkpoints as artifacts, i.e., set both <code>--upload_dataset</code> (or train from a <code>_wandb.yaml</code> file) and <code>--save_period</code>.

 <details>
  <summary> <b>Usage</b> </summary>
   <b>Code</b> <code> $ python train.py --resume wandb-artifact://{run_path} </code>

![Screenshot (70)](https://user-images.githubusercontent.com/15766192/128728988-4e84b355-6c87-41ae-a591-14aecf45343e.png)
 </details>

## Reports
W&B Reports can be created from your saved runs for sharing online. Once a report is created, you will receive a link you can use to publicly share your results. Here is an example report created from the COCO128 tutorial training runs of all four YOLOv5 models ([link](https://wandb.ai/glenn-jocher/yolov5_tutorial/reports/YOLOv5-COCO128-Tutorial-Results--VmlldzozMDI5OTY)).

<img width="900" alt="Weights & Biases Reports" src="https://user-images.githubusercontent.com/26833433/135394029-a17eaf86-c6c1-4b1d-bb80-b90e83aaffa7.png">

## Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including [CUDA](https://developer.nvidia.com/cuda)/[cuDNN](https://developer.nvidia.com/cudnn), [Python](https://www.python.org/) and [PyTorch](https://pytorch.org/) preinstalled):

- **Google Colab and Kaggle** notebooks with free GPU: <a href="https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a> <a href="https://www.kaggle.com/ultralytics/yolov5"><img src="https://kaggle.com/static/images/open-in-kaggle.svg" alt="Open In Kaggle"></a>
- **Google Cloud** Deep Learning VM. See [GCP Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/GCP-Quickstart)
- **Amazon** Deep Learning AMI. See [AWS Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/AWS-Quickstart)
- **Docker Image**. See [Docker Quickstart Guide](https://github.com/ultralytics/yolov5/wiki/Docker-Quickstart) <a href="https://hub.docker.com/r/ultralytics/yolov5"><img src="https://img.shields.io/docker/pulls/ultralytics/yolov5?logo=docker" alt="Docker Pulls"></a>

## Status

![CI CPU testing](https://github.com/ultralytics/yolov5/workflows/CI%20CPU%20testing/badge.svg)

If this badge is green, all [YOLOv5 GitHub Actions](https://github.com/ultralytics/yolov5/actions) Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training ([train.py](https://github.com/ultralytics/yolov5/blob/master/train.py)), validation ([val.py](https://github.com/ultralytics/yolov5/blob/master/val.py)), inference ([detect.py](https://github.com/ultralytics/yolov5/blob/master/detect.py)) and export ([export.py](https://github.com/ultralytics/yolov5/blob/master/export.py)) on macOS, Windows, and Ubuntu every 24 hours and on every commit.