Add FAQ sections to Modes and Tasks (#14181)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Abirami Vina <abirami.vina@gmail.com>
Co-authored-by: RizwanMunawar <chr043416@gmail.com>
Co-authored-by: Muhammad Rizwan Munawar <muhammadrizwanmunawar123@gmail.com>

parent e285d3d1b2 · commit 6c13bea7b8
39 changed files with 2247 additions and 481 deletions
@@ -241,57 +241,68 @@ The original FastSAM paper can be found on [arXiv](https://arxiv.org/abs/2306.12

## FAQ

### What is FastSAM and how does it differ from SAM?

FastSAM, short for Fast Segment Anything Model, is a real-time convolutional neural network (CNN)-based solution designed to reduce computational demands while maintaining high performance in object segmentation tasks. Unlike the Segment Anything Model (SAM), which uses a heavier Transformer-based architecture, FastSAM leverages [Ultralytics YOLOv8-seg](../tasks/segment.md) for efficient instance segmentation in two stages: all-instance segmentation followed by prompt-guided selection.

### How does FastSAM achieve real-time segmentation performance?

FastSAM achieves real-time segmentation by decoupling the segmentation task into all-instance segmentation with YOLOv8-seg and prompt-guided selection stages. By utilizing the computational efficiency of CNNs, FastSAM offers significant reductions in computational and resource demands while maintaining competitive performance. This dual-stage approach enables FastSAM to deliver fast and efficient segmentation suitable for applications requiring quick results.

### What are the practical applications of FastSAM?

FastSAM is practical for a variety of computer vision tasks that require real-time segmentation performance. Applications include:

- Industrial automation for quality control and assurance
- Real-time video analysis for security and surveillance
- Autonomous vehicles for object detection and segmentation
- Medical imaging for precise and quick segmentation tasks

Its ability to handle various user interaction prompts makes FastSAM adaptable and flexible for diverse scenarios.

### How do I use the FastSAM model for inference in Python?

To use FastSAM for inference in Python, you can follow the example below:

```python
from ultralytics import FastSAM
from ultralytics.models.fastsam import FastSAMPrompt

# Define an inference source
source = "path/to/bus.jpg"

# Create a FastSAM model
model = FastSAM("FastSAM-s.pt")  # or FastSAM-x.pt

# Run inference on an image
everything_results = model(source, device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)

# Prepare a Prompt Process object
prompt_process = FastSAMPrompt(source, everything_results, device="cpu")

# Everything prompt
ann = prompt_process.everything_prompt()

# Bounding box prompt
ann = prompt_process.box_prompt(bbox=[200, 200, 300, 300])

# Text prompt
ann = prompt_process.text_prompt(text="a photo of a dog")

# Point prompt
ann = prompt_process.point_prompt(points=[[200, 200]], pointlabel=[1])
prompt_process.plot(annotations=ann, output="./")
```

For more details on inference methods, check the [Predict Usage](#predict-usage) section of the documentation.

### What types of prompts does FastSAM support for segmentation tasks?

FastSAM supports multiple prompt types for guiding the segmentation tasks:

- **Everything Prompt**: Generates segmentation for all visible objects.
- **Bounding Box (BBox) Prompt**: Segments objects within a specified bounding box.
- **Text Prompt**: Uses a descriptive text to segment objects matching the description.
- **Point Prompt**: Segments objects near specific user-defined points.

This flexibility allows FastSAM to adapt to a wide range of user interaction scenarios, enhancing its utility across different applications. For more information on using these prompts, refer to the [Key Features](#key-features) section.
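The prompt types above map one-to-one onto the `FastSAMPrompt` methods already shown in the inference example; as a compact recap (the image path, bounding box, text, and point coordinates are illustrative values, not requirements):

```python
from ultralytics import FastSAM
from ultralytics.models.fastsam import FastSAMPrompt

# Run all-instance segmentation once, then reuse the results for different prompts
model = FastSAM("FastSAM-s.pt")
everything_results = model("path/to/bus.jpg", device="cpu", retina_masks=True, imgsz=1024, conf=0.4, iou=0.9)
prompt_process = FastSAMPrompt("path/to/bus.jpg", everything_results, device="cpu")

# Each prompt type has a corresponding method
ann = prompt_process.everything_prompt()
ann = prompt_process.box_prompt(bbox=[200, 200, 300, 300])
ann = prompt_process.text_prompt(text="a photo of a dog")
ann = prompt_process.point_prompt(points=[[200, 200]], pointlabel=[1])
```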
@@ -98,52 +98,44 @@ For detailed steps, consult our [Contributing Guide](../help/contributing.md).

## FAQ

### What are the key advantages of using Ultralytics YOLOv8 for object detection?

Ultralytics YOLOv8 offers enhanced capabilities such as real-time object detection, instance segmentation, pose estimation, and classification. Its optimized architecture ensures high-speed performance without sacrificing accuracy, making it ideal for a variety of applications. YOLOv8 also includes built-in compatibility with popular datasets and models, as detailed on the [YOLOv8 documentation page](../models/yolov8.md).

### How can I train a YOLOv8 model on custom data?

Training a YOLOv8 model on custom data can be easily accomplished using Ultralytics' libraries. Here's a quick example:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load a YOLOv8n model
        model = YOLO("yolov8n.pt")

        # Train the model on custom dataset
        results = model.train(data="custom_data.yaml", epochs=100, imgsz=640)
        ```

    === "CLI"

        ```bash
        yolo train model=yolov8n.pt data='custom_data.yaml' epochs=100 imgsz=640
        ```

For more detailed instructions, visit the [Train](../modes/train.md) documentation page.

### Can I contribute my own model to Ultralytics?

Yes, you can contribute your own model to Ultralytics. To do so, follow these steps:

1. **Fork the Repository**: Fork the [Ultralytics GitHub repository](https://github.com/ultralytics/ultralytics).
2. **Clone Your Fork**: Clone your fork to your local machine and create a new branch.
3. **Implement Your Model**: Add your model while following the coding standards in the [Contributing Guide](../help/contributing.md).
4. **Test Thoroughly**: Ensure your model passes all tests.
5. **Create a Pull Request**: Submit your work for review.

Visit the [Contributing Guide](../help/contributing.md) for detailed steps.

### Which YOLO versions are supported by Ultralytics?

Ultralytics supports a comprehensive range of YOLO (You Only Look Once) versions from YOLOv3 to YOLOv10, along with models like NAS, SAM, and RT-DETR. Each version is optimized for various tasks such as detection, segmentation, and classification. For detailed information on each model, refer to the [Models Supported by Ultralytics](../models/index.md) documentation.
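Each supported model family is exposed through its own loader class in the `ultralytics` package. A minimal sketch, assuming the corresponding pre-trained weights are available (they normally download automatically on first use):

```python
from ultralytics import NAS, RTDETR, SAM, YOLO

# Each model family has a dedicated class with the same high-level interface
detector = YOLO("yolov8n.pt")  # YOLO family checkpoint (e.g. YOLOv8)
transformer = RTDETR("rtdetr-l.pt")  # Baidu's RT-DETR
segmenter = SAM("sam_b.pt")  # Segment Anything Model
nas_model = NAS("yolo_nas_s.pt")  # Deci AI's YOLO-NAS
```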
### Why should I use Ultralytics HUB for machine learning projects?

Ultralytics HUB provides a no-code, end-to-end platform for training, deploying, and managing YOLO models. It simplifies complex workflows, enabling users to focus on model performance and application. The HUB also offers cloud training capabilities, comprehensive dataset management, and user-friendly interfaces. Learn more about it on the [Ultralytics HUB](../hub/index.md) documentation page.

### What types of tasks can YOLOv8 perform, and how does it compare to other YOLO versions?

YOLOv8 is a versatile model capable of performing tasks including object detection, instance segmentation, classification, and pose estimation. Compared to earlier versions like YOLOv3 and YOLOv4, YOLOv8 offers significant improvements in speed and accuracy due to its optimized architecture. For a deeper comparison, refer to the [YOLOv8 documentation](../models/yolov8.md) and the [Task pages](../tasks/index.md) for more details on specific tasks.
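Each task has its own YOLOv8 checkpoint but shares the same API; a short sketch using the standard task-specific weights (the image path is illustrative):

```python
from ultralytics import YOLO

# Task-specific YOLOv8n checkpoints: detection, segmentation, pose, classification
detect_model = YOLO("yolov8n.pt")
segment_model = YOLO("yolov8n-seg.pt")
pose_model = YOLO("yolov8n-pose.pt")
classify_model = YOLO("yolov8n-cls.pt")

# The same predict interface works for every task
results = detect_model("path/to/bus.jpg")
results[0].show()
```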
@@ -120,9 +120,13 @@ If you find MobileSAM useful in your research or development work, please consid

## FAQ

### What is MobileSAM and how does it differ from the original SAM model?

MobileSAM is a lightweight, fast image segmentation model designed for mobile applications. It retains the same pipeline as the original SAM but replaces the heavyweight ViT-H encoder (632M parameters) with a smaller Tiny-ViT encoder (5M parameters). This change results in MobileSAM being approximately 5 times smaller and 7 times faster than the original SAM. For instance, MobileSAM operates at about 12ms per image, compared to the original SAM's 456ms. You can learn more about the MobileSAM implementation in various projects [here](https://github.com/ChaoningZhang/MobileSAM).

### How can I test MobileSAM using Ultralytics?

Testing MobileSAM in Ultralytics can be accomplished through straightforward methods. You can use Point and Box prompts to predict segments. Here's an example using a Point prompt:

```python
from ultralytics import SAM

model = SAM("mobile_sam.pt")
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```

You can also refer to the [Testing MobileSAM](#testing-mobilesam-in-ultralytics) section for more details.
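A Box prompt works the same way through the same `SAM` interface; a minimal sketch with an illustrative bounding box in `[x1, y1, x2, y2]` format:

```python
from ultralytics import SAM

# Load the MobileSAM weights
model = SAM("mobile_sam.pt")

# Predict a segment from a bounding-box prompt (coordinates are illustrative)
model.predict("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])
```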
### Why should I use MobileSAM for my mobile application?

MobileSAM is ideal for mobile applications due to its lightweight architecture and fast inference speed. Compared to the original SAM, MobileSAM is approximately 5 times smaller and 7 times faster, making it suitable for environments where computational resources are limited. This efficiency ensures that mobile devices can perform real-time image segmentation without significant latency. Additionally, MobileSAM is optimized for [Inference](../modes/predict.md) mode, further supporting fast performance on mobile devices.

### How was MobileSAM trained, and is the training code available?

MobileSAM was trained on a single GPU with a 100k dataset, which is 1% of the original images, in less than a day. While the training code will be made available in the future, you can currently explore other aspects of MobileSAM in the [MobileSAM GitHub repository](https://github.com/ChaoningZhang/MobileSAM). This repository includes pre-trained weights and implementation details for various applications.

### What are the primary use cases for MobileSAM?

MobileSAM is designed for fast and efficient image segmentation in mobile environments. Primary use cases include:

- **Real-time object detection and segmentation** for mobile applications.
- **Low-latency image processing** in devices with limited computational resources.
- **Integration in AI-driven mobile apps** for tasks such as augmented reality (AR) and real-time analytics.

### How easy is it to transition from the original SAM to MobileSAM?

Transitioning from the original SAM to MobileSAM is straightforward as MobileSAM retains the same pipeline, including pre-processing, post-processing, and interfaces. Only the image encoder has been changed to the more efficient Tiny-ViT. Users currently using SAM can switch to MobileSAM with minimal code modifications, benefiting from improved performance without the need for significant reconfiguration.
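In practice the switch amounts to pointing the same `SAM` class at different weights; a minimal sketch (the image path and point prompt are the illustrative values used elsewhere on this page):

```python
from ultralytics import SAM

# Original SAM
model = SAM("sam_b.pt")
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])

# MobileSAM: only the weights file changes; prompts and post-processing stay the same
model = SAM("mobile_sam.pt")
model.predict("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```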
### What tasks are supported by the MobileSAM model?

The MobileSAM model supports instance segmentation tasks. Currently, it is optimized for [Inference](../modes/predict.md) mode. Additional tasks like validation, training, and export are not supported at this time, as indicated in the mode compatibility table:

| Model Type | Tasks Supported                              | Inference | Validation | Training | Export |
| ---------- | -------------------------------------------- | --------- | ---------- | -------- | ------ |
| MobileSAM  | [Instance Segmentation](../tasks/segment.md) | ✅        | ❌         | ❌       | ❌     |

For more information about supported tasks and operational modes, check the [tasks page](../tasks/segment.md) and the mode details like [Inference](../modes/predict.md), [Validation](../modes/val.md), and [Export](../modes/export.md).

For more detailed use cases and performance comparisons, see the section on [Adapting from SAM to MobileSAM](#adapting-from-sam-to-mobilesam).
@@ -102,62 +102,52 @@ We would like to acknowledge Baidu and the [PaddlePaddle](https://github.com/Pad

## FAQ

### What is Baidu's RT-DETR model and how does it work?

Baidu's RT-DETR (Real-Time Detection Transformer) is an advanced real-time object detector built upon the Vision Transformer architecture. It efficiently processes multiscale features by decoupling intra-scale interaction and cross-scale fusion through its efficient hybrid encoder. By employing IoU-aware query selection, the model focuses on the most relevant objects, enhancing detection accuracy. Its adaptable inference speed, achieved by adjusting decoder layers without retraining, makes RT-DETR suitable for various real-time object detection scenarios. Learn more about RT-DETR features [here](https://arxiv.org/pdf/2304.08069.pdf).

### How can I use the pre-trained RT-DETR models provided by Ultralytics?

You can leverage the Ultralytics Python API to use pre-trained PaddlePaddle RT-DETR models. For instance, to load an RT-DETR-l model pre-trained on COCO val2017 and achieve high FPS on a T4 GPU, you can utilize the following example:

!!! Example

    === "Python"

        ```python
        from ultralytics import RTDETR

        # Load a COCO-pretrained RT-DETR-l model
        model = RTDETR("rtdetr-l.pt")

        # Display model information (optional)
        model.info()

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data="coco8.yaml", epochs=100, imgsz=640)

        # Run inference with the RT-DETR-l model on the 'bus.jpg' image
        results = model("path/to/bus.jpg")
        ```

    === "CLI"

        ```bash
        # Load a COCO-pretrained RT-DETR-l model and train it on the COCO8 example dataset for 100 epochs
        yolo train model=rtdetr-l.pt data=coco8.yaml epochs=100 imgsz=640

        # Load a COCO-pretrained RT-DETR-l model and run inference on the 'bus.jpg' image
        yolo predict model=rtdetr-l.pt source=path/to/bus.jpg
        ```

### Why should I choose Baidu's RT-DETR over other real-time object detectors?

Baidu's RT-DETR stands out due to its efficient hybrid encoder and IoU-aware query selection, which drastically reduce computational costs while maintaining high accuracy. Its unique ability to adjust inference speed by using different decoder layers without retraining adds significant flexibility. This makes it particularly advantageous for applications requiring real-time performance on accelerated backends like CUDA with TensorRT, outclassing many other real-time object detectors.

### How does RT-DETR support adaptable inference speed for different real-time applications?

Baidu's RT-DETR allows flexible adjustments of inference speed by using different decoder layers without requiring retraining. This adaptability is crucial for scaling performance across various real-time object detection tasks. Whether you need faster processing for lower precision needs or slower, more accurate detections, RT-DETR can be tailored to meet your specific requirements.

### Can I use RT-DETR models with other Ultralytics modes, such as training, validation, and export?

Yes, RT-DETR models are compatible with various Ultralytics modes including training, validation, prediction, and export. You can refer to the respective documentation for detailed instructions on how to utilize these modes: [Train](../modes/train.md), [Val](../modes/val.md), [Predict](../modes/predict.md), and [Export](../modes/export.md). This ensures a comprehensive workflow for developing and deploying your object detection solutions.
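As a brief sketch of the validation and export modes (the dataset and export format here are illustrative choices, not requirements):

```python
from ultralytics import RTDETR

# Load a COCO-pretrained RT-DETR-l model
model = RTDETR("rtdetr-l.pt")

# Validate on the COCO8 example dataset
metrics = model.val(data="coco8.yaml", imgsz=640)

# Export the model (ONNX is shown here as an example format)
onnx_path = model.export(format="onnx")
```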
@@ -226,25 +226,42 @@ We would like to express our gratitude to Meta AI for creating and maintaining t

## FAQ

### What is the Segment Anything Model (SAM) by Ultralytics?

The Segment Anything Model (SAM) by Ultralytics is a revolutionary image segmentation model designed for promptable segmentation tasks. It leverages advanced architecture, including image and prompt encoders combined with a lightweight mask decoder, to generate high-quality segmentation masks from various prompts such as spatial or text cues. Trained on the expansive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/), SAM excels in zero-shot performance, adapting to new image distributions and tasks without prior knowledge. Learn more [here](#introduction-to-sam-the-segment-anything-model).

### How can I use the Segment Anything Model (SAM) for image segmentation?

You can use the Segment Anything Model (SAM) for image segmentation by running inference with various prompts such as bounding boxes or points. Here's an example using Python:

```python
from ultralytics import SAM

# Load a model
model = SAM("sam_b.pt")

# Segment with bounding box prompt
model("ultralytics/assets/zidane.jpg", bboxes=[439, 437, 524, 709])

# Segment with points prompt
model("ultralytics/assets/zidane.jpg", points=[900, 370], labels=[1])
```

Alternatively, you can run inference with SAM in the command line interface (CLI):

```bash
yolo predict model=sam_b.pt source=path/to/image.jpg
```

For more detailed usage instructions, visit the [Segmentation section](#sam-prediction-example).

### How do SAM and YOLOv8 compare in terms of performance?

Compared to YOLOv8, SAM models like SAM-b and FastSAM-s are larger and slower but offer unique capabilities for automatic segmentation. For instance, Ultralytics [YOLOv8n-seg](../tasks/segment.md) is 53.4 times smaller and 866 times faster than SAM-b. However, SAM's zero-shot performance makes it highly flexible and efficient in diverse, untrained tasks. Learn more about performance comparisons between SAM and YOLOv8 [here](#sam-comparison-vs-yolov8).
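If you want a rough local sense of this size difference, a minimal sketch (assuming both checkpoints are available or download automatically) is to print each model's summary:

```python
from ultralytics import SAM, YOLO

# Print layer and parameter summaries for a quick side-by-side look
SAM("sam_b.pt").info()
YOLO("yolov8n-seg.pt").info()
```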
### How can I auto-annotate my dataset using SAM?

Ultralytics' SAM offers an auto-annotation feature that allows generating segmentation datasets using a pre-trained detection model. Here's an example in Python:

```python
from ultralytics.data.annotator import auto_annotate

auto_annotate(data="path/to/images", det_model="yolov8x.pt", sam_model="sam_b.pt")
```

This function takes the path to your images and optional arguments for pre-trained detection and SAM segmentation models, along with device and output directory specifications. For a complete guide, see [Auto-Annotation](#auto-annotation-a-quick-path-to-segmentation-datasets).

### What datasets are used to train the Segment Anything Model (SAM)?

SAM is trained on the extensive [SA-1B dataset](https://ai.facebook.com/datasets/segment-anything/) which comprises over 1 billion masks across 11 million images. SA-1B is the largest segmentation dataset to date, providing high-quality and diverse training data, ensuring impressive zero-shot performance in varied segmentation tasks. For more details, visit the [Dataset section](#key-features-of-the-segment-anything-model-sam).

---

This FAQ aims to address common questions related to the Segment Anything Model (SAM) from Ultralytics, enhancing user understanding and facilitating effective use of Ultralytics products. For additional information, explore the relevant sections linked throughout.
@@ -119,60 +119,46 @@ We express our gratitude to Deci AI's [SuperGradients](https://github.com/Deci-A

## FAQ

### What is YOLO-NAS and how does it improve over previous YOLO models?

YOLO-NAS, developed by Deci AI, is a state-of-the-art object detection model leveraging advanced Neural Architecture Search (NAS) technology. It addresses the limitations of previous YOLO models by introducing features like quantization-friendly basic blocks and sophisticated training schemes. This results in significant improvements in performance, particularly in environments with limited computational resources. YOLO-NAS also supports quantization, maintaining high accuracy even when converted to its INT8 version, enhancing its suitability for production environments. For more details, see the [Overview](#overview) section.

### How can I integrate YOLO-NAS models into my Python application?

You can easily integrate YOLO-NAS models into your Python application using the `ultralytics` package. Here's a simple example of how to load a pre-trained YOLO-NAS model and perform inference:

```python
from ultralytics import NAS

# Load a COCO-pretrained YOLO-NAS-s model
model = NAS("yolo_nas_s.pt")

# Display model information
model.info()

# Validate the model on the COCO8 example dataset
results = model.val(data="coco8.yaml")

# Run inference with the YOLO-NAS-s model on the 'bus.jpg' image
results = model("path/to/bus.jpg")
```

For more information, refer to the [Inference and Validation Examples](#inference-and-validation-examples).

### What are the key features of YOLO-NAS and why should I consider using it?

YOLO-NAS introduces several key features that make it a superior choice for object detection tasks:

- **Quantization-Friendly Basic Block:** Enhanced architecture that improves model performance with minimal precision drop post quantization.
- **Sophisticated Training and Quantization:** Employs advanced training schemes and post-training quantization techniques.
- **AutoNAC Optimization and Pre-training:** Utilizes AutoNAC optimization and is pre-trained on prominent datasets like COCO, Objects365, and Roboflow 100.

These features contribute to its high accuracy, efficient performance, and suitability for deployment in production environments. Learn more in the [Key Features](#key-features) section.

### Which tasks and modes are supported by YOLO-NAS models?

YOLO-NAS models support various object detection tasks and modes such as inference, validation, and export. They do not support training. The supported models include YOLO-NAS-s, YOLO-NAS-m, and YOLO-NAS-l, each tailored to different computational capacities and performance needs. For a detailed overview, refer to the [Supported Tasks and Modes](#supported-tasks-and-modes) section.
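A minimal sketch of these modes in the Python API, assuming the `yolo_nas_s.pt` weights are available (the export format shown is just an example of the export mode the table lists as supported):

```python
from ultralytics import NAS

# Load a COCO-pretrained YOLO-NAS-s model
model = NAS("yolo_nas_s.pt")

# Inference and validation modes
results = model("path/to/bus.jpg")
metrics = model.val(data="coco8.yaml")

# Export mode (ONNX used here as an illustrative target format)
onnx_path = model.export(format="onnx")
```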
### Are there pre-trained YOLO-NAS models available and how do I access them?

Yes, Ultralytics provides pre-trained YOLO-NAS models that you can access directly. These models are pre-trained on datasets like COCO, ensuring high performance in terms of both speed and accuracy. You can download these models using the links provided in the [Pre-trained Models](#pre-trained-models) section. Here are some examples:

- [YOLO-NAS-s](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_s.pt)
- [YOLO-NAS-m](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_m.pt)
- [YOLO-NAS-l](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolo_nas_l.pt)
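Once downloaded (or referenced by filename so they can be fetched automatically), each of these checkpoints loads through the same `NAS` class; a short sketch:

```python
from ultralytics import NAS

# Any of the listed checkpoints can be loaded by filename
model_s = NAS("yolo_nas_s.pt")
model_m = NAS("yolo_nas_m.pt")
model_l = NAS("yolo_nas_l.pt")

# Display architecture and parameter summary for one of them
model_s.info()
```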
@@ -338,41 +338,13 @@ For further reading, the original YOLO-World paper is available on [arXiv](https

## FAQ

### What is the YOLO-World model and how does it work?

The YOLO-World model is an advanced, real-time object detection approach based on the [Ultralytics YOLOv8](yolov8.md) framework. It excels in Open-Vocabulary Detection tasks by identifying objects within an image based on descriptive texts. Using vision-language modeling and pre-training on large datasets, YOLO-World achieves high efficiency and performance with significantly reduced computational demands, making it ideal for real-time applications across various industries.

### How does YOLO-World handle inference with custom prompts?

YOLO-World supports a "prompt-then-detect" strategy, which utilizes an offline vocabulary to enhance efficiency. Custom prompts like captions or specific object categories are pre-encoded and stored as offline vocabulary embeddings. This approach streamlines the detection process without the need for retraining. You can dynamically set these prompts within the model to tailor it to specific detection tasks, as shown below:

```python
from ultralytics import YOLOWorld

# Initialize a YOLO-World model
model = YOLOWorld("yolov8s-world.pt")

# Define custom classes
model.set_classes(["person", "bus"])

# Execute prediction on an image
results = model.predict("path/to/image.jpg")

# Show results
results[0].show()
```

You can learn more about this feature in the [Predict Usage](#predict-usage) section.

### Why should I choose YOLO-World over traditional Open-Vocabulary detection models?

YOLO-World provides several advantages over traditional Open-Vocabulary detection models:

- **Real-Time Performance:** It leverages the computational speed of CNNs to offer quick, efficient detection.
- **Efficiency and Low Resource Requirement:** YOLO-World maintains high performance while significantly reducing computational and resource demands.
- **Customizable Prompts:** The model supports dynamic prompt setting, allowing users to specify custom detection classes without retraining.
- **Benchmark Excellence:** It outperforms other open-vocabulary detectors like MDETR and GLIP in both speed and efficiency on standard benchmarks.
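Building on the customizable-prompts point above, a small sketch: after setting classes you can also save the model so the custom vocabulary travels with the checkpoint (the output filename here is illustrative):

```python
from ultralytics import YOLOWorld

# Load a pretrained YOLO-World model and set a custom vocabulary
model = YOLOWorld("yolov8s-world.pt")
model.set_classes(["person", "bus"])

# Optionally save the model with the embedded custom vocabulary for later reuse
model.save("custom_yolov8s-world.pt")
```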
### How do I train a YOLO-World model on my dataset?

Training a YOLO-World model on your dataset is straightforward through the provided Python API or CLI commands. Here's how to start training using Python:

```python
from ultralytics import YOLOWorld

# Load a pretrained YOLOv8s-worldv2 model
model = YOLOWorld("yolov8s-worldv2.pt")

# Train the model on the COCO8 dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```

Or using CLI:

```bash
yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
```

### What are the available pre-trained YOLO-World models and their supported tasks?

Ultralytics offers multiple pre-trained YOLO-World models supporting various tasks and operating modes:

| Model Type      | Pre-trained Weights                                                                                     | Tasks Supported                        | Inference | Validation | Training | Export |
| --------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLOv8s-world   | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8m-world   | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8l-world   | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8x-world   | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |

### How do I reproduce the official results of YOLO-World from scratch?

To reproduce the official results from scratch, you need to prepare the datasets and launch the training using the provided code. The training procedure involves creating a data dictionary and running the `train` method with a custom trainer:

```python
from ultralytics import YOLOWorld
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch

data = {
    "train": {
        "yolo_data": ["Objects365.yaml"],
        "grounding_data": [
            {
                "img_path": "../datasets/flickr30k/images",
                "json_file": "../datasets/flickr30k/final_flickr_separateGT_train.json",
            },
            {
                "img_path": "../datasets/GQA/images",
                "json_file": "../datasets/GQA/final_mixed_train_no_coco.json",
            },
        ],
    },
    "val": {"yolo_data": ["lvis.yaml"]},
}

model = YOLOWorld("yolov8s-worldv2.yaml")
model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)
```

@@ -19,7 +19,7 @@ Real-time object detection aims to accurately predict object categories and posi

The architecture of YOLOv10 builds upon the strengths of previous YOLO models while introducing several key innovations. The model architecture consists of the following components:

1. **Backbone**: Responsible for feature extraction, the backbone in YOLOv10 uses an enhanced version of CSPNet (Cross Stage Partial Network) to improve gradient flow and reduce computational redundancy.
2. **Neck**: The neck is designed to aggregate features from different scales and passes them to the head. It includes PAN (Path Aggregation Network) layers for effective multiscale feature fusion.
3. **One-to-Many Head**: Generates multiple predictions per object during training to provide rich supervisory signals and improve learning accuracy.
4. **One-to-One Head**: Generates a single best prediction per object during inference to eliminate the need for NMS, thereby reducing latency and improving efficiency.

@@ -90,33 +90,33 @@ Compared to other state-of-the-art detectors:

Here is a detailed comparison of YOLOv10 variants with other state-of-the-art models:

| Model              | Params<br><sup>(M) | FLOPs<br><sup>(G) | mAP<sup>val<br>50-95 | Latency<br><sup>(ms) | Latency-forward<br><sup>(ms) |
| ------------------ | ------------------ | ----------------- | -------------------- | -------------------- | ---------------------------- |
| YOLOv6-3.0-N       | 4.7                | 11.4              | 37.0                 | 2.69                 | **1.76**                     |
| Gold-YOLO-N        | 5.6                | 12.1              | **39.6**             | 2.92                 | 1.82                         |
| YOLOv8-N           | 3.2                | 8.7               | 37.3                 | 6.16                 | 1.77                         |
| **[YOLOv10-N][1]** | **2.3**            | **6.7**           | 39.5                 | **1.84**             | 1.79                         |
|                    |                    |                   |                      |                      |                              |
| YOLOv6-3.0-S       | 18.5               | 45.3              | 44.3                 | 3.42                 | 2.35                         |
| Gold-YOLO-S        | 21.5               | 46.0              | 45.4                 | 3.82                 | 2.73                         |
| YOLOv8-S           | 11.2               | 28.6              | 44.9                 | 7.07                 | **2.33**                     |
| **[YOLOv10-S][2]** | **7.2**            | **21.6**          | **46.8**             | **2.49**             | 2.39                         |
|                    |                    |                   |                      |                      |                              |
| RT-DETR-R18        | 20.0               | 60.0              | 46.5                 | **4.58**             | **4.49**                     |
| YOLOv6-3.0-M       | 34.9               | 85.8              | 49.1                 | 5.63                 | 4.56                         |
| Gold-YOLO-M        | 41.3               | 87.5              | 49.8                 | 6.38                 | 5.45                         |
| YOLOv8-M           | 25.9               | 78.9              | 50.6                 | 9.50                 | 5.09                         |
| **[YOLOv10-M][3]** | **15.4**           | **59.1**          | **51.3**             | 4.74                 | 4.63                         |
|                    |                    |                   |                      |                      |                              |
| YOLOv6-3.0-L       | 59.6               | 150.7             | 51.8                 | 9.02                 | 7.90                         |
| Gold-YOLO-L        | 75.1               | 151.7             | 51.8                 | 10.65                | 9.78                         |
| YOLOv8-L           | 43.7               | 165.2             | 52.9                 | 12.39                | 8.06                         |
| RT-DETR-R50        | 42.0               | 136.0             | 53.1                 | 9.20                 | 9.07                         |
| **[YOLOv10-L][5]** | **24.4**           | **120.3**         | **53.4**             | **7.28**             | **7.21**                     |
|                    |                    |                   |                      |                      |                              |
| YOLOv8-X           | 68.2               | 257.8             | 53.9                 | 16.86                | 12.83                        |
| RT-DETR-R101       | 76.0               | 259.0             | 54.3                 | 13.71                | 13.58                        |
| **[YOLOv10-X][6]** | **29.5**           | **160.4**         | **54.4**             | **10.70**            | **10.60**                    |

[1]: https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10n.pt
[2]: https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov10s.pt

@@ -233,63 +233,56 @@ For detailed implementation, architectural innovations, and experimental results

## FAQ

### What is YOLOv10 and how does it differ from previous YOLO versions?

YOLOv10, developed by researchers at [Tsinghua University](https://www.tsinghua.edu.cn/en/), introduces several key innovations to real-time object detection. It eliminates the need for non-maximum suppression (NMS) by employing consistent dual assignments during training and optimized model components for superior performance with reduced computational overhead. For more details on its architecture and key features, check out the [YOLOv10 overview](#overview) section.

### How can I get started with running inference using YOLOv10?

For easy inference, you can use the Ultralytics YOLO Python library or the command line interface (CLI). Below are examples of predicting new images using YOLOv10:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load the pre-trained YOLOv10-N model
        model = YOLO("yolov10n.pt")
        results = model("image.jpg")
        results[0].show()
        ```

    === "CLI"

        ```bash
        # Load a COCO-pretrained YOLOv10n model and run inference on the 'bus.jpg' image
        yolo detect predict model=yolov10n.pt source=path/to/bus.jpg
        ```

For more usage examples, visit our [Usage Examples](#usage-examples) section.

### Which model variants does YOLOv10 offer and what are their use cases?

YOLOv10 offers several model variants to cater to different use cases:

- **YOLOv10-N**: Suitable for extremely resource-constrained environments
- **YOLOv10-S**: Balances speed and accuracy
- **YOLOv10-M**: General-purpose use
- **YOLOv10-B**: Higher accuracy with increased width
- **YOLOv10-L**: High accuracy at the cost of computational resources
- **YOLOv10-X**: Maximum accuracy and performance

Each variant is designed for different computational needs and accuracy requirements, making them versatile for a variety of applications. Explore the [Model Variants](#model-variants) section for more information.
|
||||
|
||||
Dive into more details on these features in the [Key Features](#key-features) section.
|
||||
### How does the NMS-free approach in YOLOv10 improve performance?
|
||||
|
||||
### Which model variants are available in YOLOv10 and how do they differ?
|
||||
YOLOv10 eliminates the need for non-maximum suppression (NMS) during inference by employing consistent dual assignments for training. This approach reduces inference latency and enhances prediction efficiency. The architecture also includes a one-to-one head for inference, ensuring that each object gets a single best prediction. For a detailed explanation, see the [Consistent Dual Assignments for NMS-Free Training](#consistent-dual-assignments-for-nms-free-training) section.
|
||||
|
||||
YOLOv10 offers several variants tailored for different application needs:
|
||||
### Where can I find the export options for YOLOv10 models?
|
||||
|
||||
- **YOLOv10-N**: Nano version for extremely resource-constrained environments.
|
||||
- **YOLOv10-S**: Small version balancing speed and accuracy.
|
||||
- **YOLOv10-M**: Medium version for general-purpose use.
|
||||
- **YOLOv10-B**: Balanced version with increased width for higher accuracy.
|
||||
- **YOLOv10-L**: Large version for higher accuracy at the cost of increased computational resources.
|
||||
- **YOLOv10-X**: Extra-large version for maximum accuracy and performance.
|
||||
YOLOv10 supports several export formats, including TorchScript, ONNX, OpenVINO, and TensorRT. However, not all export formats provided by Ultralytics are currently supported for YOLOv10 due to its new operations. For details on the supported formats and instructions on exporting, visit the [Exporting YOLOv10](#exporting-yolov10) section.
|
||||
|
||||
Each variant offers a trade-off between computational efficiency and detection accuracy, suitable for various real-time applications. See the complete [Model Variants](#model-variants) for more information.
|
||||
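As a rough illustration of the export workflow described above, the sketch below calls the standard Ultralytics `export()` method on a YOLOv10 model; treat the `yolov10n.pt` weights and the `format="onnx"` choice as assumptions to adapt to the formats actually supported in your version.

```python
from ultralytics import YOLO

# Load a pre-trained YOLOv10-N model (assumed weights file)
model = YOLO("yolov10n.pt")

# Export to ONNX; swap the format string for another supported target if needed
onnx_path = model.export(format="onnx")
print(f"Exported model saved to: {onnx_path}")
```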
### What are the performance benchmarks for YOLOv10 models?

YOLOv10 outperforms previous YOLO versions and other state-of-the-art models in both accuracy and efficiency metrics. For example, YOLOv10-S is 1.8x faster than RT-DETR-R18 with a similar Average Precision (AP) on the COCO dataset. Additionally, YOLOv10-B has 46% less latency and 25% fewer parameters than YOLOv9-C while maintaining equivalent performance. Detailed benchmark results can be found in the [Comparisons](#comparisons) section.
@@ -99,52 +99,87 @@ Thank you to Joseph Redmon and Ali Farhadi for developing the original YOLOv3.

## FAQ

### What is YOLOv3, and how does it improve object detection?

YOLOv3 is the third iteration of the _You Only Look Once (YOLO)_ object detection algorithm. It enhances object detection accuracy by utilizing three different sizes of detection kernels: 13x13, 26x26, and 52x52. This allows the model to detect objects at multiple scales, improving accuracy for objects of varying sizes. YOLOv3 also supports multi-label predictions for bounding boxes and includes a superior feature extractor network.

### What are the differences between YOLOv3, YOLOv3-Ultralytics, and YOLOv3u?

YOLOv3 is the original model developed by Joseph Redmon, known for its balance of accuracy and speed. YOLOv3-Ultralytics is Ultralytics' adaptation of YOLOv3 that adds support for more pre-trained models and facilitates easier model customization. YOLOv3u is an upgraded variant of YOLOv3-Ultralytics, integrating the anchor-free, objectness-free split head from YOLOv8, improving detection robustness and accuracy for various object sizes. For more details on the variants, refer to the [YOLOv3 series](https://github.com/ultralytics/yolov3).

### How can I train a YOLOv3 model using Ultralytics?

Training a YOLOv3 model with Ultralytics is straightforward. You can train the model using either Python or CLI:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load a COCO-pretrained YOLOv3n model
        model = YOLO("yolov3n.pt")

        # Display model information (optional)
        model.info()

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
        ```

    === "CLI"

        ```bash
        # Load a COCO-pretrained YOLOv3n model and train it on the COCO8 example dataset for 100 epochs
        yolo train model=yolov3n.pt data=coco8.yaml epochs=100 imgsz=640
        ```

For more comprehensive training options and guidelines, visit our [Train mode documentation](../modes/train.md).

### What makes YOLOv3u more accurate for object detection tasks?

YOLOv3u improves upon YOLOv3 and YOLOv3-Ultralytics by incorporating the anchor-free, objectness-free split head used in YOLOv8 models. This upgrade eliminates the need for pre-defined anchor boxes and objectness scores, enhancing its capability to detect objects of varying sizes and shapes more precisely, without altering the backbone and neck architecture of YOLOv3. This makes YOLOv3u a better choice for complex and diverse object detection tasks.

### How can I use YOLOv3 models for inference?

You can perform inference using YOLOv3 models with either Python scripts or CLI commands:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load a COCO-pretrained YOLOv3n model
        model = YOLO("yolov3n.pt")

        # Run inference with the YOLOv3n model on the 'bus.jpg' image
        results = model("path/to/bus.jpg")
        ```

    === "CLI"

        ```bash
        # Load a COCO-pretrained YOLOv3n model and run inference on the 'bus.jpg' image
        yolo predict model=yolov3n.pt source=path/to/bus.jpg
        ```

Refer to the [Inference mode documentation](../modes/predict.md) for more details on running YOLO models.

### What tasks are supported by YOLOv3 and its variants?

YOLOv3, YOLOv3-Ultralytics, and YOLOv3u primarily support object detection tasks. These models can be used for various stages of model deployment and development, such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). For a comprehensive set of supported tasks and more in-depth details, visit our [Object Detection tasks documentation](../tasks/detect.md).
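A minimal sketch of the Validation and Export modes named above is shown below; it reuses the `yolov3n.pt` weights and `coco8.yaml` dataset from the earlier examples, and the ONNX target is an assumption you can swap for another supported format.

```python
from ultralytics import YOLO

# Reuse the COCO-pretrained YOLOv3n weights from the examples above
model = YOLO("yolov3n.pt")

# Validation mode: evaluate accuracy on the small COCO8 example dataset
metrics = model.val(data="coco8.yaml", imgsz=640)
print(metrics.box.map)  # mAP50-95

# Export mode: write an ONNX copy of the model for deployment
onnx_path = model.export(format="onnx")
print(onnx_path)
```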
### Where can I find resources to cite YOLOv3 in my research?

If you use YOLOv3 in your research, please cite the original YOLO papers and the Ultralytics YOLOv3 repository. Example BibTeX citation:

!!! Quote ""

    === "BibTeX"

        ```bibtex
        @article{redmon2018yolov3,
          title={YOLOv3: An Incremental Improvement},
          author={Redmon, Joseph and Farhadi, Ali},
          journal={arXiv preprint arXiv:1804.02767},
          year={2018}
        }
        ```

For more citation details, refer to the [Citations and Acknowledgements](#citations-and-acknowledgements) section.
@@ -71,41 +71,22 @@ The original YOLOv4 paper can be found on [arXiv](https://arxiv.org/abs/2004.109

## FAQ

### What is YOLOv4 and why should I use it for object detection?

YOLOv4, which stands for "You Only Look Once version 4," is a state-of-the-art real-time object detection model developed by Alexey Bochkovskiy in 2020. It achieves an optimal balance between speed and accuracy, making it highly suitable for real-time applications. YOLOv4's architecture incorporates several innovative features like Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), and Self-adversarial-training (SAT), among others, to achieve state-of-the-art results. If you're looking for a high-performance model that operates efficiently on conventional GPUs, YOLOv4 is an excellent choice.

### What are the key features of the YOLOv4 model?

YOLOv4 is designed with several innovative features that optimize its performance. Key features include:

- **Weighted-Residual-Connections (WRC)**
- **Cross-Stage-Partial-connections (CSP)**
- **Cross mini-Batch Normalization (CmBN)**
- **Self-adversarial training (SAT)**
- **Mish-activation**
- **Mosaic data augmentation**
- **DropBlock regularization**
- **CIoU loss**

These features collectively enhance YOLOv4's speed and accuracy, making it ideal for real-time object detection tasks.

### How does the architecture of YOLOv4 enhance its performance?

The architecture of YOLOv4 includes several key components: the backbone, the neck, and the head. The backbone, which can be a model like VGG, ResNet, or CSPDarknet53, is pre-trained to predict classes and bounding boxes. The neck, utilizing PANet, connects feature maps from different stages for comprehensive data extraction. Finally, the head, which uses configurations from YOLOv3, makes the final object detections. YOLOv4 also employs "bag of freebies" techniques like mosaic data augmentation and DropBlock regularization, further optimizing its speed and accuracy.

### How does YOLOv4 compare to its predecessor, YOLOv3?

YOLOv4 introduces several improvements over YOLOv3, including advanced features such as Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), and Cross mini-Batch Normalization (CmBN). These enhancements contribute to better speed and accuracy in object detection:

- **Higher Accuracy:** YOLOv4 achieves state-of-the-art results in object detection benchmarks.
- **Improved Speed:** Despite its complex architecture, YOLOv4 maintains real-time performance.
- **Better Backbone and Neck:** YOLOv4 utilizes CSPDarknet53 as the backbone and PANet as the neck, which are more advanced than YOLOv3's components.

For more information, compare the features in the [YOLOv3](yolov3.md) and YOLOv4 documentation.

### What are "bag of freebies" in the context of YOLOv4?

"Bag of freebies" refers to methods that improve the training accuracy of YOLOv4 without increasing the cost of inference. These techniques include various forms of data augmentation like photometric distortions (adjusting brightness, contrast, etc.) and geometric distortions (scaling, cropping, flipping, rotating). By increasing the variability of the input images, these augmentations help YOLOv4 generalize better to different types of images, thereby improving its robustness and accuracy without compromising its real-time performance.

### Why is YOLOv4 considered suitable for real-time object detection on conventional GPUs?

YOLOv4 is designed to optimize both speed and accuracy, making it ideal for real-time object detection tasks that require quick and reliable performance. It operates efficiently on conventional GPUs, needing only one for both training and inference. This makes it accessible and practical for various applications ranging from recommendation systems to standalone process management, reducing the need for extensive hardware setups and making it a cost-effective solution for real-time object detection.

### How can I get started with YOLOv4 if Ultralytics does not currently support it?

To get started with YOLOv4, visit the official [YOLOv4 GitHub repository](https://github.com/AlexeyAB/darknet). Follow the installation instructions provided in the README file, which typically include cloning the repository, installing dependencies, and setting up environment variables. Once installed, you can train the model by preparing your dataset, configuring the model parameters, and following the usage instructions provided. Since Ultralytics does not currently support YOLOv4, it is recommended to refer directly to the YOLOv4 GitHub for the most up-to-date and detailed guidance. For alternative models supported by Ultralytics, you can explore [Ultralytics YOLO models](https://docs.ultralytics.com/models/).
@@ -115,69 +115,48 @@ Please note that YOLOv5 models are provided under [AGPL-3.0](https://github.com/

## FAQ

### What is Ultralytics YOLOv5u and how does it differ from YOLOv5?

Ultralytics YOLOv5u is an advanced version of YOLOv5, integrating the anchor-free, objectness-free split head that enhances the accuracy-speed tradeoff for real-time object detection tasks. Unlike the traditional YOLOv5, YOLOv5u adopts an anchor-free detection mechanism, making it more flexible and adaptive in diverse scenarios. For more detailed information on its features, you can refer to the [YOLOv5 Overview](#overview).

### How does the anchor-free Ultralytics head improve object detection performance in YOLOv5u?

The anchor-free Ultralytics head in YOLOv5u improves object detection performance by eliminating the dependency on predefined anchor boxes. This results in a more flexible and adaptive detection mechanism that can handle various object sizes and shapes with greater efficiency. This enhancement directly contributes to a balanced tradeoff between accuracy and speed, making YOLOv5u suitable for real-time applications. Learn more about its architecture in the [Key Features](#key-features) section.

### Can I use pre-trained YOLOv5u models for different tasks and modes?

Yes, you can use pre-trained YOLOv5u models for various tasks such as [Object Detection](../tasks/detect.md). These models support multiple modes, including [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). This flexibility allows users to leverage the capabilities of YOLOv5u models across different operational requirements. For a detailed overview, check the [Supported Tasks and Modes](#supported-tasks-and-modes) section.
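A minimal sketch of those modes in sequence is shown below; the `yolov5nu.pt` weights, the `coco8.yaml` dataset, and the ONNX export target are illustrative assumptions, not requirements.

```python
from ultralytics import YOLO

# Load a pre-trained YOLOv5u model (nano variant assumed here)
model = YOLO("yolov5nu.pt")

# Inference mode: predict on a single image
results = model("path/to/image.jpg")

# Validation mode: measure accuracy on the COCO8 example dataset
metrics = model.val(data="coco8.yaml", imgsz=640)

# Export mode: produce an ONNX file for deployment
model.export(format="onnx")
```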
### How do the performance metrics of YOLOv5u models compare on different platforms?

The performance metrics of YOLOv5u models vary depending on the platform and hardware used. For example, the YOLOv5nu model achieves a 34.3 mAP on the COCO dataset with a speed of 73.6 ms on CPU (ONNX) and 1.06 ms on A100 TensorRT. Detailed performance metrics for different YOLOv5u models can be found in the [Performance Metrics](#performance-metrics) section, which provides a comprehensive comparison across various devices.

### How can I train a YOLOv5u model using the Ultralytics Python API?

You can train a YOLOv5u model by loading a pre-trained model and running the training command with your dataset. Here's a quick example:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load a COCO-pretrained YOLOv5n model
        model = YOLO("yolov5n.pt")

        # Display model information (optional)
        model.info()

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
        ```

    === "CLI"

        ```bash
        # Load a COCO-pretrained YOLOv5n model and train it on the COCO8 example dataset for 100 epochs
        yolo train model=yolov5n.pt data=coco8.yaml epochs=100 imgsz=640
        ```

For more detailed instructions, visit the [Usage Examples](#usage-examples) section.

### How can I deploy the YOLOv5u model for real-time object detection?

Deploying YOLOv5u for real-time object detection involves several steps:

1. **Load the Model:**

    ```python
    from ultralytics import YOLO

    # Load a pre-trained YOLOv5u variant (nano shown here)
    model = YOLO("yolov5nu.pt")
    ```

2. **Run Inference:**

    ```python
    results = model("path/to/image.jpg")
    ```

For a comprehensive guide, refer to the [Usage Examples](#usage-examples) section.
### What are the pre-trained model variants available for YOLOv5u?

YOLOv5u offers a variety of pre-trained models to cater to different needs:

- **YOLOv5nu**
- **YOLOv5su**
- **YOLOv5mu**
- **YOLOv5lu**
- **YOLOv5xu**
- **YOLOv5n6u**
- **YOLOv5s6u**
- **YOLOv5m6u**
- **YOLOv5l6u**
- **YOLOv5x6u**

These models support tasks like detection and offer various modes such as [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). For detailed metrics, see the [Performance Metrics](#performance-metrics) section.

### How do YOLOv5u models perform on different hardware setups?

YOLOv5u models have been evaluated on both CPU and GPU hardware, demonstrating competitive performance metrics across various setups. For example:

- **YOLOv5nu.pt:**
    - **Speed (CPU ONNX):** 73.6 ms
    - **Speed (A100 TensorRT):** 1.06 ms
    - **mAP (50-95):** 34.3

- **YOLOv5lu.pt:**
    - **Speed (CPU ONNX):** 408.4 ms
    - **Speed (A100 TensorRT):** 2.50 ms
    - **mAP (50-95):** 52.2

For more detailed performance metrics, visit the [Performance Metrics](#performance-metrics) section.
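To reproduce hardware-specific numbers like those above on your own machine, one option is the Ultralytics benchmarking utility; the sketch below assumes the `yolov5nu.pt` weights and the `coco8.yaml` example dataset and mirrors the benchmark call shown later on the YOLOv8 page.

```python
from ultralytics.utils.benchmarks import benchmark

# Benchmark a YOLOv5u variant across export formats on GPU 0
# (set device="cpu" to measure CPU latency instead)
benchmark(model="yolov5nu.pt", data="coco8.yaml", imgsz=640, half=False, device=0)
```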
@@ -107,53 +107,56 @@ The original YOLOv6 paper can be found on [arXiv](https://arxiv.org/abs/2301.055

## FAQ

### What is Meituan YOLOv6 and what makes it unique?

Meituan YOLOv6 is a state-of-the-art object detector that balances speed and accuracy, ideal for real-time applications. It features notable architectural enhancements like the Bi-directional Concatenation (BiC) module and an Anchor-Aided Training (AAT) strategy. These innovations provide substantial performance gains with minimal speed degradation, making YOLOv6 a competitive choice for object detection tasks.

### How does the Bi-directional Concatenation (BiC) Module in YOLOv6 improve performance?

The Bi-directional Concatenation (BiC) module in YOLOv6 enhances localization signals in the detector's neck, delivering performance improvements with negligible speed impact. This module effectively combines different feature maps, increasing the model's ability to detect objects accurately. For more details on YOLOv6's features, refer to the [Key Features](#key-features) section.

### How can I train a YOLOv6 model using Ultralytics?

You can train a YOLOv6 model using Ultralytics with simple Python or CLI commands. For instance:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Build a YOLOv6n model from scratch
        model = YOLO("yolov6n.yaml")

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
        ```

    === "CLI"

        ```bash
        yolo train model=yolov6n.yaml data=coco8.yaml epochs=100 imgsz=640
        ```

For more information, visit the [Train](../modes/train.md) page.

### What are the unique features of YOLOv6 that improve its performance?

YOLOv6 introduces several key features that enhance its performance:

- **Bidirectional Concatenation (BiC) Module**: Improves localization signals and offers performance gains with minimal speed degradation.
- **Anchor-Aided Training (AAT) Strategy**: Combines the benefits of anchor-based and anchor-free methods for better efficiency without sacrificing inference speed.
- **Enhanced Backbone and Neck Design**: Adds additional stages to the backbone and neck, achieving state-of-the-art results on high-resolution inputs.
- **Self-Distillation Strategy**: Boosts smaller model performance by refining the auxiliary regression branch during training and removing it during inference to maintain speed.

### What are the different versions of YOLOv6 and their performance metrics?

YOLOv6 offers multiple versions, each optimized for different performance requirements:

- YOLOv6-N: 37.5% AP at 1187 FPS
- YOLOv6-S: 45.0% AP at 484 FPS
- YOLOv6-M: 50.0% AP at 226 FPS
- YOLOv6-L: 52.8% AP at 116 FPS
- YOLOv6-L6: State-of-the-art accuracy in real-time scenarios

These models are evaluated on the COCO val2017 dataset using an NVIDIA Tesla T4 GPU. These metrics make YOLOv6 a versatile choice for both high-accuracy and high-speed applications. For more on performance metrics, see the [Performance Metrics](#performance-metrics) section.

### How can YOLOv6 be used for mobile and embedded applications?

YOLOv6 supports quantized models for different precisions and models optimized for mobile platforms, making it suitable for applications requiring low-latency and energy-efficient computations. For deployment on mobile and edge devices, you can explore conversion to formats like TFLite and ONNX, as detailed in the [Export](../modes/export.md) documentation. Quantized models ensure high performance even on resource-constrained devices, enabling real-time object detection in mobile and IoT applications.
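As a rough sketch of that deployment path, the snippet below uses the standard Ultralytics export call; the `yolov6n.yaml` starting point and the ONNX and TFLite format choices are assumptions to adjust for your target device, and in practice you would export your own trained weights rather than a freshly built model.

```python
from ultralytics import YOLO

# Build a YOLOv6n model from its configuration; normally you would
# load your own trained weights instead, e.g. YOLO("path/to/best.pt")
model = YOLO("yolov6n.yaml")

# Export to ONNX for general edge runtimes
model.export(format="onnx")

# Export to TensorFlow Lite for mobile deployment
model.export(format="tflite")
```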
### How does the Anchor-Aided Training (AAT) strategy benefit YOLOv6?

Anchor-Aided Training (AAT) in YOLOv6 combines elements of anchor-based and anchor-free approaches, enhancing the model's detection capabilities without compromising inference efficiency. This strategy leverages anchors during training to improve bounding box predictions, making YOLOv6 effective in diverse object detection tasks.

### Which operational modes are supported by YOLOv6 models in Ultralytics?

YOLOv6 supports various operational modes including Inference, Validation, Training, and Export. This flexibility allows users to fully exploit the model's capabilities in different scenarios. Check out the [Supported Tasks and Modes](#supported-tasks-and-modes) section for a detailed overview of each mode.
@@ -115,22 +115,40 @@ The original YOLOv7 paper can be found on [arXiv](https://arxiv.org/pdf/2207.026

## FAQ

### What is YOLOv7 and why is it considered a breakthrough in real-time object detection?

YOLOv7 is a cutting-edge real-time object detection model that achieves unparalleled speed and accuracy. It surpasses other models, such as YOLOX, YOLOv5, and PPYOLOE, in both parameter usage and inference speed. YOLOv7's distinguishing features include its model re-parameterization and dynamic label assignment, which optimize its performance without increasing inference costs. For more technical details about its architecture and comparison metrics with other state-of-the-art object detectors, refer to the [YOLOv7 paper](https://arxiv.org/pdf/2207.02696.pdf).

### How does YOLOv7 improve on previous YOLO models like YOLOv4 and YOLOv5?

YOLOv7 introduces several innovations, including model re-parameterization and dynamic label assignment, which enhance the training process and improve inference accuracy. Compared to YOLOv5, YOLOv7 significantly boosts speed and accuracy. For instance, YOLOv7-X improves accuracy by 2.2% and reduces parameters by 22% compared to YOLOv5-X. Detailed comparisons can be found in the performance table [YOLOv7 comparison with SOTA object detectors](#comparison-of-sota-object-detectors).

### How does model re-parameterization work in YOLOv7?

Model re-parameterization in YOLOv7 involves optimizing the gradient propagation path across various network layers. This strategy effectively recalibrates the training process, improving detection accuracy without increasing the inference cost. For more details, refer to the [Model Re-parameterization section](#key-features) in the documentation.

### What datasets is YOLOv7 trained on, and how do they impact its performance?

YOLOv7 is trained exclusively on the MS COCO dataset without using additional datasets or pre-trained weights. This robust dataset provides a wide variety of images and annotations that contribute to YOLOv7's high accuracy and generalization capabilities. Explore more about dataset formats and usage in our [datasets section](https://docs.ultralytics.com/datasets/detect/coco/).

### Can I use YOLOv7 with Ultralytics tools and platforms?

As of now, Ultralytics does not directly support YOLOv7 in its tools and platforms. Users interested in using YOLOv7 need to follow the installation and usage instructions provided in the [YOLOv7 GitHub repository](https://github.com/WongKinYiu/yolov7). For other state-of-the-art models, you can explore and train using Ultralytics tools like [Ultralytics HUB](../hub/quickstart.md).

### How do I install and run YOLOv7 for a custom object detection project?

To install and run YOLOv7, follow these steps:

1. Clone the YOLOv7 repository:

    ```bash
    git clone https://github.com/WongKinYiu/yolov7
    ```

2. Navigate to the cloned directory and install dependencies:

    ```bash
    cd yolov7
    pip install -r requirements.txt
    ```

3. Prepare your dataset and configure the model parameters according to the [usage instructions](https://github.com/WongKinYiu/yolov7) provided in the repository.

For further guidance, visit the YOLOv7 GitHub repository for the latest information and updates.

### What are the key features and optimizations introduced in YOLOv7?

YOLOv7 offers several key features that revolutionize real-time object detection:

- **Model Re-parameterization**: Enhances the model's performance by optimizing gradient propagation paths.
- **Dynamic Label Assignment**: Uses a coarse-to-fine lead guided method to assign dynamic targets for outputs across different branches, improving accuracy.
- **Extended and Compound Scaling**: Efficiently utilizes parameters and computation to scale the model for various real-time applications.
- **Efficiency**: Reduces parameter count by 40% and computation by 50% compared to other state-of-the-art models while achieving faster inference speeds.

For further details on these features, see the [YOLOv7 Overview](#overview) section.
@@ -187,31 +187,63 @@ Please note that the DOI is pending and will be added to the citation once it is

## FAQ

### What is YOLOv8 and how does it differ from previous YOLO versions?

YOLOv8 is the latest iteration in the Ultralytics YOLO series, designed to improve real-time object detection performance with advanced features. Unlike earlier versions, YOLOv8 incorporates an **anchor-free split Ultralytics head**, state-of-the-art backbone and neck architectures, and offers an optimized accuracy-speed tradeoff, making it ideal for diverse applications. For more details, check the [Overview](#overview) and [Key Features](#key-features) sections.

### How can I use YOLOv8 for different computer vision tasks?

YOLOv8 supports a wide range of computer vision tasks, including object detection, instance segmentation, pose/keypoints detection, oriented object detection, and classification. Each model variant is optimized for its specific task and compatible with various operational modes like [Inference](../modes/predict.md), [Validation](../modes/val.md), [Training](../modes/train.md), and [Export](../modes/export.md). Refer to the [Supported Tasks and Modes](#supported-tasks-and-modes) section for more information.
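To make the task coverage above concrete, here is a minimal sketch that loads task-specific YOLOv8 weights; the weight filenames follow the usual Ultralytics naming and are assumptions you can swap for other sizes or tasks.

```python
from ultralytics import YOLO

# Each task uses its own set of pre-trained weights
detector = YOLO("yolov8n.pt")  # object detection
segmenter = YOLO("yolov8n-seg.pt")  # instance segmentation
pose_model = YOLO("yolov8n-pose.pt")  # pose/keypoints detection

# The same predict call works across tasks; results expose task-specific fields
results = segmenter("path/to/image.jpg")
print(results[0].masks)  # segmentation masks (None for plain detection models)
```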
### What are the performance metrics for YOLOv8 models?

YOLOv8 models achieve state-of-the-art performance across various benchmarking datasets. For instance, the YOLOv8n model achieves a mAP (mean Average Precision) of 37.3 on the COCO dataset and a speed of 0.99 ms on A100 TensorRT. Detailed performance metrics for each model variant across different tasks and datasets can be found in the [Performance Metrics](#performance-metrics) section.

### How do I run inference using a YOLOv8 model in Python?

To run inference with a YOLOv8 model in Python, you can use the `YOLO` class from the Ultralytics package. Here's a basic example:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("path/to/image.jpg")
```

For detailed examples, see the [Usage Examples](#usage-examples) section.

### How do I train a YOLOv8 model?

Training a YOLOv8 model can be done using either Python or CLI. Below are examples for training a model using a COCO-pretrained YOLOv8 model on the COCO8 dataset for 100 epochs:

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLO

        # Load a COCO-pretrained YOLOv8n model
        model = YOLO("yolov8n.pt")

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
        ```

    === "CLI"

        ```bash
        yolo train model=yolov8n.pt data=coco8.yaml epochs=100 imgsz=640
        ```

For further details, visit the [Training](../modes/train.md) documentation.

### How do I export a YOLOv8 model for deployment?

You can export YOLOv8 models to various formats like ONNX, TensorRT, and CoreML for seamless deployment across different platforms. The export process ensures maximum compatibility and performance optimization. Learn more about exporting models in the [Export](../modes/export.md) section.
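A minimal sketch of that export step is shown below; the ONNX and CoreML targets are just two of the formats mentioned above, chosen here as assumptions.

```python
from ultralytics import YOLO

# Load the trained or pre-trained weights you want to deploy
model = YOLO("yolov8n.pt")

# Export to ONNX (widely supported by inference runtimes)
onnx_path = model.export(format="onnx")

# Export to CoreML for Apple devices
coreml_path = model.export(format="coreml")
print(onnx_path, coreml_path)
```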
### Can I benchmark YOLOv8 models for performance?

Yes, YOLOv8 models can be benchmarked for performance in terms of speed and accuracy across various export formats. You can use PyTorch, ONNX, TensorRT, and more for benchmarking. Below are example commands for benchmarking using Python and CLI:

!!! Example

    === "Python"

        ```python
        from ultralytics.utils.benchmarks import benchmark

        # Benchmark on GPU
        benchmark(model="yolov8n.pt", data="coco8.yaml", imgsz=640, half=False, device=0)
        ```

    === "CLI"

        ```bash
        yolo benchmark model=yolov8n.pt data='coco8.yaml' imgsz=640 half=False device=0
        ```

For additional information, check the [Performance Metrics](#performance-metrics) section.
@@ -95,7 +95,7 @@ Comparatively, YOLOv9 exhibits remarkable gains:

- **Lightweight Models**: YOLOv9s surpasses the YOLO MS-S in parameter efficiency and computational load while achieving an improvement of 0.4∼0.6% in AP.
- **Medium to Large Models**: YOLOv9m and YOLOv9e show notable advancements in balancing the trade-off between model complexity and detection performance, offering significant reductions in parameters and computations against the backdrop of improved accuracy.

The YOLOv9c model, in particular, highlights the effectiveness of the architecture's optimizations. It operates with 42% fewer parameters and 21% less computational demand than YOLOv7 AF, yet it achieves comparable accuracy, demonstrating YOLOv9's significant efficiency improvements. Furthermore, the YOLOv9e model sets a new standard for large models, with 15% fewer parameters and 25% less computational need than [YOLOv8x](yolov8.md), alongside an incremental 1.7% improvement in AP.

These results showcase YOLOv9's strategic advancements in model design, emphasizing its enhanced efficiency without compromising on the precision essential for real-time object detection tasks. The model not only pushes the boundaries of performance metrics but also emphasizes the importance of computational efficiency, making it a pivotal development in the field of computer vision.
@@ -180,32 +180,38 @@ The original YOLOv9 paper can be found on [arXiv](https://arxiv.org/pdf/2402.136

## FAQ

### What innovations does YOLOv9 introduce for real-time object detection?

YOLOv9 introduces groundbreaking techniques such as Programmable Gradient Information (PGI) and the Generalized Efficient Layer Aggregation Network (GELAN). These innovations address information loss challenges in deep neural networks, ensuring high efficiency, accuracy, and adaptability. PGI preserves essential data across network layers, while GELAN optimizes parameter utilization and computational efficiency. Learn more about [YOLOv9's core innovations](#core-innovations-of-yolov9) that set new benchmarks on the MS COCO dataset.

### How do Programmable Gradient Information (PGI) and GELAN improve YOLOv9's performance?

Programmable Gradient Information (PGI) helps counteract information loss in deep neural networks by ensuring the preservation of essential data across network layers. This leads to more reliable gradient generation and better model convergence. The Generalized Efficient Layer Aggregation Network (GELAN) enhances parameter utilization and computational efficiency by allowing flexible integration of various computational blocks. These innovations collectively improve the accuracy and efficiency of YOLOv9. For more details, see the [Innovations section](#programmable-gradient-information-pgi).

### How does YOLOv9 perform on the MS COCO dataset compared to other models?

YOLOv9 outperforms state-of-the-art real-time object detectors by achieving higher accuracy and efficiency. On the [COCO dataset](../datasets/detect/coco.md), YOLOv9 models exhibit superior mAP scores across various sizes while maintaining or reducing computational overhead. For instance, YOLOv9c achieves comparable accuracy with 42% fewer parameters and 21% less computational demand than YOLOv7 AF. Explore [performance comparisons](#performance-on-ms-coco-dataset) for detailed metrics.

### How can I train a YOLOv9 model using Python and CLI?

You can train a YOLOv9 model using both Python and CLI commands. For Python, instantiate a model using the `YOLO` class and call the `train` method:

```python
from ultralytics import YOLO

# Build a YOLOv9c model from pretrained weights and train
model = YOLO("yolov9c.pt")
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```

For CLI training, execute:

```bash
yolo train model=yolov9c.yaml data=coco8.yaml epochs=100 imgsz=640
```

Learn more about [usage examples](#usage-examples) for training and inference.

### Can I train YOLOv9 on my custom dataset using Ultralytics?

Yes, you can easily train YOLOv9 on your custom dataset using Ultralytics. The Ultralytics YOLO framework supports various modes including [Train](../modes/train.md), [Predict](../modes/predict.md), and [Val](../modes/val.md). For example, you can start training a YOLOv9c model with the following Python code snippet:

```python
from ultralytics import YOLO

# Load the YOLOv9c model configuration
model = YOLO("yolov9c.yaml")

# Train the model on your custom dataset
results = model.train(data="custom_dataset.yaml", epochs=100, imgsz=640)
```

### What are the advantages of using Ultralytics YOLOv9 for lightweight models?

YOLOv9 is designed to mitigate information loss, which is particularly important for lightweight models often prone to losing significant information. By integrating Programmable Gradient Information (PGI) and reversible functions, YOLOv9 ensures essential data retention, enhancing the model's accuracy and efficiency. This makes it highly suitable for applications requiring compact models with high performance. For more details, explore the section on [YOLOv9's impact on lightweight models](#impact-on-lightweight-models).

### What tasks and modes does YOLOv9 support?

YOLOv9 supports various tasks including object detection and instance segmentation. It is compatible with multiple operational modes such as inference, validation, training, and export. This versatility makes YOLOv9 adaptable to diverse real-time computer vision applications. Refer to the [supported tasks and modes](#supported-tasks-and-modes) section for more information.
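To make those modes concrete, a minimal sketch follows; the `yolov9c.pt` weights and the `coco8.yaml` dataset reuse names from the training examples above and are assumptions you can substitute for your own weights and data.

```python
from ultralytics import YOLO

# Reuse the pre-trained YOLOv9c weights from the training example above
model = YOLO("yolov9c.pt")

# Validation mode: evaluate detection accuracy on the COCO8 example dataset
metrics = model.val(data="coco8.yaml", imgsz=640)
print(metrics.box.map)  # mAP50-95

# Export mode: save an ONNX copy for deployment
model.export(format="onnx")
```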