---
comments: true
description: Learn how to train your dataset on single or multiple machines using YOLOv5 on multiple GPUs. Use simple commands with DDP mode for faster performance.
---

📚 This guide explains how to properly use **multiple** GPUs to train a dataset with YOLOv5 🚀 on single or multiple machine(s).
## Training

Select a pretrained model to start training from. Here we select [YOLOv5s](https://github.com/ultralytics/yolov5/blob/master/models/yolov5s.yaml), one of the smallest and fastest models available. See our README [table](https://github.com/ultralytics/yolov5#pretrained-checkpoints) for a full comparison of all models. We will train this model with Multi-GPU on the [COCO](https://github.com/ultralytics/yolov5/blob/master/data/scripts/get_coco.sh) dataset.

<p align="center"><img width="700" alt="YOLOv5 Models" src="https://github.com/ultralytics/yolov5/releases/download/v1.0/model_comparison.png"></p>

### Single GPU

```bash
python train.py --batch 64 --data coco.yaml --weights yolov5s.pt --device 0
```

### Multi-GPU [DataParallel](https://pytorch.org/docs/stable/nn.html#torch.nn.DataParallel) Mode (⚠️ not recommended)

You can pass multiple GPU indices to `--device` to train on multiple GPUs in DataParallel mode.

```bash
python train.py --batch 64 --data coco.yaml --weights yolov5s.pt --device 0,1
```
<details markdown>

<summary>Use SyncBatchNorm (click to expand)</summary>

[SyncBatchNorm](https://pytorch.org/docs/master/generated/torch.nn.SyncBatchNorm.html) can increase accuracy in multi-GPU training, but it slows training down significantly. It is **only** available for multi-GPU DistributedDataParallel training.

It is best used when the batch size on **each** GPU is small (<= 8).

To use SyncBatchNorm, simply pass `--sync-bn` to the command, as shown below:

```bash
python -m torch.distributed.run --nproc_per_node 2 train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights '' --sync-bn
```
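
As a rough check of the <= 8 guideline above, the sketch below computes the per-GPU batch size and appends `--sync-bn` only when that size is small. It assumes that `--batch` is the *total* batch size and that it is split evenly across the `--nproc_per_node` DDP processes (one per GPU). The helper names `per_gpu_batch` and `sync_bn_flag` are hypothetical, and the script only prints the launch command rather than running it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of YOLOv5). Assumes --batch is the total batch
# size, divided evenly across the DDP processes (one process per GPU).

per_gpu_batch() {
  local total_batch=$1 nproc=$2
  if (( total_batch % nproc != 0 )); then
    echo "error: --batch ${total_batch} is not a multiple of ${nproc} GPUs" >&2
    return 1
  fi
  echo $(( total_batch / nproc ))
}

# Print --sync-bn only when each GPU would see a small batch (<= 8 images).
sync_bn_flag() {
  local per_gpu
  per_gpu=$(per_gpu_batch "$1" "$2") || return 1
  if (( per_gpu <= 8 )); then
    echo "--sync-bn"
  fi
}

TOTAL_BATCH=64   # value passed to --batch
NPROC=8          # value passed to --nproc_per_node

flag=$(sync_bn_flag "$TOTAL_BATCH" "$NPROC")
echo "per-GPU batch: $(per_gpu_batch "$TOTAL_BATCH" "$NPROC")"
# Print (not run) the launch command, adding --sync-bn only if the helper returned it.
echo "python -m torch.distributed.run --nproc_per_node ${NPROC} train.py --batch ${TOTAL_BATCH} --data coco.yaml --cfg yolov5s.yaml --weights ''${flag:+ $flag}"
```

With `--nproc_per_node 2` and `--batch 64`, as in the command above, each GPU sees 32 images, so by the <= 8 guideline SyncBatchNorm is less likely to pay off there. The threshold is a heuristic, not a hard limit.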

</details>

<details markdown>

<summary>Use Multiple machines (click to expand)</summary>

This is **only** available for multi-GPU DistributedDataParallel training.

Before continuing, make sure that all machines have identical files (dataset, codebase, etc.) and that the machines can communicate with each other.

To use it, run the following commands:

```bash
# On master machine 0
python -m torch.distributed.run --nproc_per_node G --nnodes N --node_rank 0 --master_addr "192.168.1.1" --master_port 1234 train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights ''
```

```bash
# On machine R
python -m torch.distributed.run --nproc_per_node G --nnodes N --node_rank R --master_addr "192.168.1.1" --master_port 1234 train.py --batch 64 --data coco.yaml --cfg yolov5s.yaml --weights ''
```

where `G` is the number of GPUs per machine, `N` is the number of machines, and `R` is the machine rank, from `0` to `N-1`.

For example, with two machines that have two GPUs each, `G = 2` and `N = 2`; the master machine uses rank `0` and the second machine uses `R = 1`.
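
To make the substitution concrete, the sketch below prints the command each machine would run for the two-machine, two-GPU example (`G = 2`, `N = 2`), reusing the address `192.168.1.1` and port `1234` from the commands above. The `node_command` helper is hypothetical and only echoes the commands; each printed line is meant to be run on the matching machine. `torch.distributed.run` starts `G` processes per machine, so the total number of processes (the world size) is `G * N`; the per-GPU batch shown at the end assumes `--batch` is split evenly across them.

```bash
#!/usr/bin/env bash
# Sketch: print the launch command for every machine (values from the example above).
G=2                          # GPUs per machine (--nproc_per_node)
N=2                          # number of machines (--nnodes)
MASTER_ADDR="192.168.1.1"    # address of master machine 0
MASTER_PORT=1234
BATCH=64                     # total batch size passed to --batch

# Hypothetical helper: echo the command for machine rank $1 (0 ... N-1).
node_command() {
  local rank=$1
  echo "python -m torch.distributed.run --nproc_per_node ${G} --nnodes ${N} --node_rank ${rank} --master_addr \"${MASTER_ADDR}\" --master_port ${MASTER_PORT} train.py --batch ${BATCH} --data coco.yaml --cfg yolov5s.yaml --weights ''"
}

for (( R = 0; R < N; R++ )); do
  echo "# On machine ${R}"
  node_command "$R"
done

WORLD_SIZE=$(( G * N ))      # total DDP processes across all machines
echo "# world size: ${WORLD_SIZE}, per-GPU batch (if split evenly): $(( BATCH / WORLD_SIZE ))"
```
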

Training will not start until **all** `N` machines are connected. Output is shown only on the master machine.
</details>

### Notes

- Windows support is untested; Linux is recommended.
</details>

## Environments
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including [CUDA](https://developer.nvidia.com/cuda)/[CUDNN](https://developer.nvidia.com/cudnn), [Python](https://www.python.org/) and [PyTorch](https://pytorch.org/) preinstalled):
- **Amazon** Deep Learning AMI. See [AWS Quickstart Guide](https://docs.ultralytics.com/yolov5/environments/aws_quickstart_tutorial/)
- **Docker Image**. See [Docker Quickstart Guide](https://docs.ultralytics.com/yolov5/environments/docker_image_quickstart_tutorial/) <a href="https://hub.docker.com/r/ultralytics/yolov5"><img src="https://img.shields.io/docker/pulls/ultralytics/yolov5?logo=docker" alt="Docker Pulls"></a>
## Status
<a href="https://github.com/ultralytics/yolov5/actions/workflows/ci-testing.yml"><img src="https://github.com/ultralytics/yolov5/actions/workflows/ci-testing.yml/badge.svg" alt="YOLOv5 CI"></a>

If this badge is green, all [YOLOv5 GitHub Actions](https://github.com/ultralytics/yolov5/actions) Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 [training](https://github.com/ultralytics/yolov5/blob/master/train.py), [validation](https://github.com/ultralytics/yolov5/blob/master/val.py), [inference](https://github.com/ultralytics/yolov5/blob/master/detect.py), [export](https://github.com/ultralytics/yolov5/blob/master/export.py) and [benchmarks](https://github.com/ultralytics/yolov5/blob/master/benchmarks.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
## Credits
I would like to thank @MagicFrogSJTU, who did all the heavy lifting, and @glenn-jocher for guiding us along the way.