Update YOLO11 Actions and Docs (#16596)

Signed-off-by: UltralyticsAssistant <web@ultralytics.com>
This commit is contained in:
Ultralytics Assistant 2024-10-01 16:58:12 +02:00 committed by GitHub
parent 51e93d6111
commit 97f38409fb
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
124 changed files with 1948 additions and 1948 deletions

View file

@ -1,12 +1,12 @@
---
comments: true
description: Learn how to integrate Ultralytics YOLOv8 with NVIDIA Triton Inference Server for scalable, high-performance AI model deployment.
keywords: Triton Inference Server, YOLOv8, Ultralytics, NVIDIA, deep learning, AI model deployment, ONNX, scalable inference
description: Learn how to integrate Ultralytics YOLO11 with NVIDIA Triton Inference Server for scalable, high-performance AI model deployment.
keywords: Triton Inference Server, YOLO11, Ultralytics, NVIDIA, deep learning, AI model deployment, ONNX, scalable inference
---
# Triton Inference Server with Ultralytics YOLOv8
# Triton Inference Server with Ultralytics YOLO11
The [Triton Inference Server](https://developer.nvidia.com/triton-inference-server) (formerly known as TensorRT Inference Server) is an open-source software solution developed by NVIDIA. It provides a cloud inference solution optimized for NVIDIA GPUs. Triton simplifies the deployment of AI models at scale in production. Integrating Ultralytics YOLOv8 with Triton Inference Server allows you to deploy scalable, high-performance [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) inference workloads. This guide provides steps to set up and test the integration.
The [Triton Inference Server](https://developer.nvidia.com/triton-inference-server) (formerly known as TensorRT Inference Server) is an open-source software solution developed by NVIDIA. It provides a cloud inference solution optimized for NVIDIA GPUs. Triton simplifies the deployment of AI models at scale in production. Integrating Ultralytics YOLO11 with Triton Inference Server allows you to deploy scalable, high-performance [deep learning](https://www.ultralytics.com/glossary/deep-learning-dl) inference workloads. This guide provides steps to set up and test the integration.
<p align="center">
<br>
@ -38,7 +38,7 @@ Ensure you have the following prerequisites before proceeding:
pip install tritonclient[all]
```
## Exporting YOLOv8 to ONNX Format
## Exporting YOLO11 to ONNX Format
Before deploying the model on Triton, it must be exported to the ONNX format. ONNX (Open Neural Network Exchange) is a format that allows models to be transferred between different deep learning frameworks. Use the `export` function from the `YOLO` class:
@ -46,7 +46,7 @@ Before deploying the model on Triton, it must be exported to the ONNX format. ON
from ultralytics import YOLO
# Load a model
model = YOLO("yolov8n.pt") # load an official model
model = YOLO("yolo11n.pt") # load an official model
# Export the model
onnx_file = model.export(format="onnx", dynamic=True)
@ -141,21 +141,21 @@ subprocess.call(f"docker kill {container_id}", shell=True)
---
By following the above steps, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.
By following the above steps, you can deploy and run Ultralytics YOLO11 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.
## FAQ
### How do I set up Ultralytics YOLOv8 with NVIDIA Triton Inference Server?
### How do I set up Ultralytics YOLO11 with NVIDIA Triton Inference Server?
Setting up [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8/) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) involves a few key steps:
Setting up [Ultralytics YOLO11](https://docs.ultralytics.com/models/yolov8/) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) involves a few key steps:
1. **Export YOLOv8 to ONNX format**:
1. **Export YOLO11 to ONNX format**:
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolov8n.pt") # load an official model
model = YOLO("yolo11n.pt") # load an official model
# Export the model to ONNX format
onnx_file = model.export(format="onnx", dynamic=True)
@ -209,21 +209,21 @@ Setting up [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8/) wit
time.sleep(1)
```
This setup can help you efficiently deploy YOLOv8 models at scale on Triton Inference Server for high-performance AI model inference.
This setup can help you efficiently deploy YOLO11 models at scale on Triton Inference Server for high-performance AI model inference.
### What benefits does using Ultralytics YOLOv8 with NVIDIA Triton Inference Server offer?
### What benefits does using Ultralytics YOLO11 with NVIDIA Triton Inference Server offer?
Integrating [Ultralytics YOLOv8](../models/yolov8.md) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) provides several advantages:
Integrating [Ultralytics YOLO11](../models/yolov8.md) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) provides several advantages:
- **Scalable AI Inference**: Triton allows serving multiple models from a single server instance, supporting dynamic model loading and unloading, making it highly scalable for diverse AI workloads.
- **High Performance**: Optimized for NVIDIA GPUs, Triton Inference Server ensures high-speed inference operations, perfect for real-time applications such as [object detection](https://www.ultralytics.com/glossary/object-detection).
- **Ensemble and Model Versioning**: Triton's ensemble mode enables combining multiple models to improve results, and its model versioning supports A/B testing and rolling updates.
For detailed instructions on setting up and running YOLOv8 with Triton, you can refer to the [setup guide](#setting-up-triton-model-repository).
For detailed instructions on setting up and running YOLO11 with Triton, you can refer to the [setup guide](#setting-up-triton-model-repository).
### Why should I export my YOLOv8 model to ONNX format before using Triton Inference Server?
### Why should I export my YOLO11 model to ONNX format before using Triton Inference Server?
Using ONNX (Open Neural Network Exchange) format for your [Ultralytics YOLOv8](../models/yolov8.md) model before deploying it on [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) offers several key benefits:
Using ONNX (Open Neural Network Exchange) format for your [Ultralytics YOLO11](../models/yolov8.md) model before deploying it on [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server) offers several key benefits:
- **Interoperability**: ONNX format supports transfer between different deep learning frameworks (such as PyTorch, TensorFlow), ensuring broader compatibility.
- **Optimization**: Many deployment environments, including Triton, optimize for ONNX, enabling faster inference and better performance.
@ -234,15 +234,15 @@ To export your model, use:
```python
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
model = YOLO("yolo11n.pt")
onnx_file = model.export(format="onnx", dynamic=True)
```
You can follow the steps in the [exporting guide](../modes/export.md) to complete the process.
### Can I run inference using the Ultralytics YOLOv8 model on Triton Inference Server?
### Can I run inference using the Ultralytics YOLO11 model on Triton Inference Server?
Yes, you can run inference using the [Ultralytics YOLOv8](../models/yolov8.md) model on [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server). Once your model is set up in the Triton Model Repository and the server is running, you can load and run inference on your model as follows:
Yes, you can run inference using the [Ultralytics YOLO11](../models/yolov8.md) model on [NVIDIA Triton Inference Server](https://developer.nvidia.com/triton-inference-server). Once your model is set up in the Triton Model Repository and the server is running, you can load and run inference on your model as follows:
```python
from ultralytics import YOLO
@ -254,14 +254,14 @@ model = YOLO("http://localhost:8000/yolo", task="detect")
results = model("path/to/image.jpg")
```
For an in-depth guide on setting up and running Triton Server with YOLOv8, refer to the [running triton inference server](#running-triton-inference-server) section.
For an in-depth guide on setting up and running Triton Server with YOLO11, refer to the [running triton inference server](#running-triton-inference-server) section.
### How does Ultralytics YOLOv8 compare to [TensorFlow](https://www.ultralytics.com/glossary/tensorflow) and PyTorch models for deployment?
### How does Ultralytics YOLO11 compare to [TensorFlow](https://www.ultralytics.com/glossary/tensorflow) and PyTorch models for deployment?
[Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8/) offers several unique advantages compared to TensorFlow and PyTorch models for deployment:
[Ultralytics YOLO11](https://docs.ultralytics.com/models/yolov8/) offers several unique advantages compared to TensorFlow and PyTorch models for deployment:
- **Real-time Performance**: Optimized for real-time object detection tasks, YOLOv8 provides state-of-the-art [accuracy](https://www.ultralytics.com/glossary/accuracy) and speed, making it ideal for applications requiring live video analytics.
- **Ease of Use**: YOLOv8 integrates seamlessly with Triton Inference Server and supports diverse export formats (ONNX, TensorRT, CoreML), making it flexible for various deployment scenarios.
- **Advanced Features**: YOLOv8 includes features like dynamic model loading, model versioning, and ensemble inference, which are crucial for scalable and reliable AI deployments.
- **Real-time Performance**: Optimized for real-time object detection tasks, YOLO11 provides state-of-the-art [accuracy](https://www.ultralytics.com/glossary/accuracy) and speed, making it ideal for applications requiring live video analytics.
- **Ease of Use**: YOLO11 integrates seamlessly with Triton Inference Server and supports diverse export formats (ONNX, TensorRT, CoreML), making it flexible for various deployment scenarios.
- **Advanced Features**: YOLO11 includes features like dynamic model loading, model versioning, and ensemble inference, which are crucial for scalable and reliable AI deployments.
For more details, compare the deployment options in the [model deployment guide](../modes/export.md).