ultralytics 8.0.97 confusion matrix, windows, docs updates (#2511)

Co-authored-by: Yonghye Kwon <developer.0hye@gmail.com>
Co-authored-by: Dowon <ks2515@naver.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Glenn Jocher 2023-05-09 21:20:34 +02:00 committed by GitHub
parent 6ee3a9a74b
commit d1107ca4cb
138 changed files with 744 additions and 351 deletions

@ -1,10 +1,12 @@
---
comments: true
description: 'Ultralytics YOLOv5 Docs: Learn model structure, data augmentation & training strategies. Build targets and the losses of object detection.'
---
## 1. Model Structure
YOLOv5 (v6.0/6.1) consists of:
- **Backbone**: `New CSP-Darknet53`
- **Neck**: `SPPF`, `New CSP-PAN`
- **Head**: `YOLOv3 Head`
@ -13,10 +15,9 @@ Model structure (`yolov5l.yaml`):
![yolov5](https://user-images.githubusercontent.com/31005897/172404576-c260dcf9-76bb-4bc8-b6a9-f2d987792583.png)
Some minor changes compared to previous versions:
1. Replace the `Focus` structure with `6x6 Conv2d` (more efficient, refer to #4825)
2. Replace the `SPP` structure with `SPPF` (more than twice as fast)
<details markdown>
@ -79,6 +80,7 @@ if __name__ == '__main__':
```
result:
```
True
spp time: 0.5373051166534424
@ -87,30 +89,26 @@ sppf time: 0.20780706405639648
</details>
## 2. Data Augmentation
- Mosaic
<img src="https://user-images.githubusercontent.com/31005897/159109235-c7aad8f2-1d4f-41f9-8d5f-b2fde6f2885e.png#pic_center" width=80%>
- Copy paste
<img src="https://user-images.githubusercontent.com/31005897/159116277-91b45033-6bec-4f82-afc4-41138866628e.png#pic_center" width=80%>
- Random affine (Rotation, Scale, Translation and Shear)
<img src="https://user-images.githubusercontent.com/31005897/159109326-45cd5acb-14fa-43e7-9235-0f21b0021c7d.png#pic_center" width=80%>
- MixUp
<img src="https://user-images.githubusercontent.com/31005897/159109361-3b24333b-f481-478b-ae00-df7838f0b5cd.png#pic_center" width=80%>
- Albumentations
- Augment HSV (Hue, Saturation, Value)
<img src="https://user-images.githubusercontent.com/31005897/159109407-83d100ba-1aba-4f4b-aa03-4f048f815981.png#pic_center" width=80%>
- Random horizontal flip
<img src="https://user-images.githubusercontent.com/31005897/159109429-0d44619a-a76a-49eb-bfc0-6709860c043e.png#pic_center" width=80%>
## 3. Training Strategies
@ -121,13 +119,11 @@ sppf time: 0.20780706405639648
- Mixed precision
- Evolve hyper-parameters
## 4. Others
### 4.1 Compute Losses
The YOLOv5 loss consists of three parts:
- Classes loss (BCE loss)
- Objectness loss (BCE loss)
@ -136,12 +132,14 @@ The YOLOv5 loss consists of three parts:
![loss](https://latex.codecogs.com/svg.image?Loss=\lambda_1L_{cls}+\lambda_2L_{obj}+\lambda_3L_{loc})
### 4.2 Balance Losses
The objectness losses of the three prediction layers (`P3`, `P4`, `P5`) are weighted differently. The balance weights are `[4.0, 1.0, 0.4]`, respectively.
![obj_loss](https://latex.codecogs.com/svg.image?L_{obj}=4.0\cdot&space;L_{obj}^{small}+1.0\cdot&space;L_{obj}^{medium}+0.4\cdot&space;L_{obj}^{large})
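As a rough illustration of the weighting above, the sketch below combines three per-layer objectness losses with the balance weights `[4.0, 1.0, 0.4]`. The helper name `balanced_objectness_loss` and the sample loss values are hypothetical and are not taken from the YOLOv5 code base.

```python
# Balance weights from the text, ordered P3 (small), P4 (medium), P5 (large).
BALANCE = (4.0, 1.0, 0.4)


def balanced_objectness_loss(per_layer_losses, balance=BALANCE):
    """Weighted sum of per-layer objectness losses (hypothetical helper)."""
    if len(per_layer_losses) != len(balance):
        raise ValueError("expected one loss value per prediction layer")
    return sum(w * loss for w, loss in zip(balance, per_layer_losses))


# Made-up per-layer losses, only to show how the weights are applied.
total = balanced_objectness_loss([0.10, 0.20, 0.30])
print(total)
```

With these weights, an error on the `P3` layer contributes ten times as much as the same error on `P5`.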
### 4.3 Eliminate Grid Sensitivity
In YOLOv2 and YOLOv3, the formula for calculating the predicted target information is:
![b_x](https://latex.codecogs.com/svg.image?b_x=\sigma(t_x)+c_x)
![b_y](https://latex.codecogs.com/svg.image?b_y=\sigma(t_y)+c_y)
@ -152,12 +150,12 @@ In YOLOv2 and YOLOv3, the formula for calculating the predicted target informati
In YOLOv5, the formula is:
![bx](https://latex.codecogs.com/svg.image?b_x=(2\cdot\sigma(t_x)-0.5)+c_x)
![by](https://latex.codecogs.com/svg.image?b_y=(2\cdot\sigma(t_y)-0.5)+c_y)
![bw](https://latex.codecogs.com/svg.image?b_w=p_w\cdot(2\cdot\sigma(t_w))^2)
![bh](https://latex.codecogs.com/svg.image?b_h=p_h\cdot(2\cdot\sigma(t_h))^2)
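Compare the center point offset before and after scaling: the offset range is widened from (0, 1) to (-0.5, 1.5).
With the plain sigmoid used in YOLOv2 and YOLOv3, an offset of exactly 0 or 1 is only approached as the raw output tends to infinity. With the scaled form, the offset can easily reach 0 or 1.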
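The YOLOv5 formulas above can be sketched in plain Python as follows. This is a minimal, scalar illustration; the function names `sigmoid` and `decode_box` are chosen here for readability and are not taken from the YOLOv5 source, which works on tensors.

```python
import math


def sigmoid(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw outputs (tx, ty, tw, th) for grid cell (cx, cy) and anchor (pw, ph)."""
    bx = (2.0 * sigmoid(tx) - 0.5) + cx   # center offset lies in (-0.5, 1.5)
    by = (2.0 * sigmoid(ty) - 0.5) + cy
    bw = pw * (2.0 * sigmoid(tw)) ** 2    # scale factor lies in (0, 4)
    bh = ph * (2.0 * sigmoid(th)) ** 2
    return bx, by, bw, bh


# Raw outputs of 0 place the box at the cell center with the anchor's size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=10.0, ph=20.0))
# → (3.5, 4.5, 10.0, 20.0)
```

An offset of exactly 0 corresponds to `sigmoid(t) = 0.25` and an offset of exactly 1 to `sigmoid(t) = 0.75`, both of which are reached at finite `t`.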
@ -168,8 +166,8 @@ Compare the height and width scaling ratio(relative to anchor) before and after
<img src="https://user-images.githubusercontent.com/31005897/158508089-5ac0c7a3-6358-44b7-863e-a6e45babb842.png#pic_center" width=40%>
### 4.4 Build Targets
Match positive samples:
- Calculate the aspect ratio of GT and Anchor Templates
@ -194,4 +192,4 @@ Match positive samples:
- Because the center point offset range is adjusted from (0, 1) to (-0.5, 1.5), a GT box can be assigned to more anchors.
<img src="https://user-images.githubusercontent.com/31005897/158508139-9db4e8c2-cf96-47e0-bc80-35d11512f296.png#pic_center" width=70%>
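The matching idea above can be sketched as two small helpers. This is a simplified illustration under stated assumptions, not the YOLOv5 implementation: the function names are hypothetical, the ratio threshold of `4.0` is assumed here to match the upper bound of the width and height scale factor, and the choice of the two nearest neighbouring cells is one reading of how the wider offset range lets extra cells predict the same GT box.

```python
def anchor_matches(gt_wh, anchor_wh, thresh=4.0):
    """True when GT and anchor differ by less than `thresh` in both width and height.

    `thresh=4.0` is an assumption made for this sketch.
    """
    gw, gh = gt_wh
    aw, ah = anchor_wh
    rw, rh = gw / aw, gh / ah
    worst = max(rw, 1.0 / rw, rh, 1.0 / rh)  # largest ratio in either direction
    return worst < thresh


def candidate_cells(gx, gy, grid_w, grid_h):
    """Cells that may predict a GT centered at (gx, gy), given in grid units."""
    cx, cy = int(gx), int(gy)
    fx, fy = gx - cx, gy - cy
    cells = [
        (cx, cy),                                          # cell containing the center
        (cx - 1, cy) if fx < 0.5 else (cx + 1, cy),        # nearer horizontal neighbour
        (cx, cy - 1) if fy < 0.5 else (cx, cy + 1),        # nearer vertical neighbour
    ]
    # Drop neighbours that fall outside the grid.
    return [(x, y) for x, y in cells if 0 <= x < grid_w and 0 <= y < grid_h]


print(anchor_matches((10, 10), (30, 30)))   # ratio 3 in both directions
print(candidate_cells(3.2, 4.7, 8, 8))
```

In this sketch a GT box is matched to up to three cells per anchor instead of one, which increases the number of positive samples.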