Fix mkdocs.yml raw image URLs (#14213)

Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Burhan <62214284+Burhan-Q@users.noreply.github.com>
Glenn Jocher 2024-07-05 02:25:02 +02:00 committed by GitHub
parent d5db9c916f
commit 5d479c73c2
69 changed files with 4767 additions and 223 deletions

@@ -142,3 +142,126 @@ subprocess.call(f"docker kill {container_id}", shell=True)
---
By following the above steps, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.
## FAQ
### How do I set up Ultralytics YOLOv8 with NVIDIA Triton Inference Server?
Setting up [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) involves a few key steps:
1. **Export YOLOv8 to ONNX format**:
```python
from ultralytics import YOLO
# Load a model
model = YOLO("yolov8n.pt") # load an official model
# Export the model to ONNX format
onnx_file = model.export(format="onnx", dynamic=True)
```
2. **Set up Triton Model Repository**:
```python
from pathlib import Path
# Define paths
model_name = "yolo"
triton_repo_path = Path("tmp") / "triton_repo"
triton_model_path = triton_repo_path / model_name
# Create directories
(triton_model_path / "1").mkdir(parents=True, exist_ok=True)
Path(onnx_file).rename(triton_model_path / "1" / "model.onnx")
(triton_model_path / "config.pbtxt").touch()
```
3. **Run the Triton Server**:
```python
import contextlib
import subprocess
import time
from tritonclient.http import InferenceServerClient
# Define image https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
tag = "nvcr.io/nvidia/tritonserver:23.09-py3"
subprocess.call(f"docker pull {tag}", shell=True)
# Start the server with the model repository mounted at /models (Docker bind mounts need an absolute host path)
container_id = (
    subprocess.check_output(
        f"docker run -d --rm -v {triton_repo_path.resolve()}:/models -p 8000:8000 {tag} tritonserver --model-repository=/models",
        shell=True,
    )
    .decode("utf-8")
    .strip()
)
triton_client = InferenceServerClient(url="localhost:8000", verbose=False, ssl=False)
# Poll once per second, up to 10 times, until the model reports ready
for _ in range(10):
    with contextlib.suppress(Exception):
        assert triton_client.is_model_ready(model_name)
        break
    time.sleep(1)
```
This setup can help you efficiently deploy YOLOv8 models at scale on Triton Inference Server for high-performance AI model inference.
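Before starting the server, it can help to confirm that the repository has the layout the steps above create: `<repo>/<model_name>/1/model.onnx` plus a `config.pbtxt` next to the version directory. The following is a minimal sketch using only the standard library; `check_triton_layout` is a hypothetical helper written for illustration, not part of Ultralytics or Triton, and the demo uses empty placeholder files rather than a real exported model.

```python
import tempfile
from pathlib import Path


def check_triton_layout(repo: Path, model_name: str, version: str = "1") -> list[str]:
    """Return a list of layout problems (empty if the expected files exist)."""
    problems = []
    model_dir = repo / model_name
    if not (model_dir / version / "model.onnx").is_file():
        problems.append(f"missing {model_name}/{version}/model.onnx")
    if not (model_dir / "config.pbtxt").is_file():
        problems.append(f"missing {model_name}/config.pbtxt")
    return problems


# Demonstrate on a throwaway directory with placeholder files (not a real model)
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "triton_repo"
    (repo / "yolo" / "1").mkdir(parents=True)
    before = check_triton_layout(repo, "yolo")  # files not created yet
    (repo / "yolo" / "1" / "model.onnx").touch()
    (repo / "yolo" / "config.pbtxt").touch()
    after = check_triton_layout(repo, "yolo")  # layout now complete

print(before)
print(after)
```

In a real setup you would call the helper with `triton_repo_path` and `model_name` from step 2 and only launch the container when the returned list is empty.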
### What benefits does using Ultralytics YOLOv8 with NVIDIA Triton Inference Server offer?
Integrating [Ultralytics YOLOv8](../models/yolov8.md) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) provides several advantages:
- **Scalable AI Inference**: Triton allows serving multiple models from a single server instance, supporting dynamic model loading and unloading, making it highly scalable for diverse AI workloads.
- **High Performance**: Optimized for NVIDIA GPUs, Triton Inference Server delivers high-speed inference, which suits real-time applications such as object detection.
- **Ensemble and Model Versioning**: Triton's ensemble mode enables combining multiple models to improve results, and its model versioning supports A/B testing and rolling updates.
For detailed instructions on setting up and running YOLOv8 with Triton, you can refer to the [setup guide](#setting-up-triton-model-repository).
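Model versioning in Triton is driven by the repository layout: each version of a model lives in a numerically named subdirectory (`1`, `2`, ...) under the model directory, and the version policy in `config.pbtxt` decides which of them are served. As a rough sketch of how a new version slot could be prepared, the hypothetical helper `add_model_version` below (an illustration, not a Triton or Ultralytics API) creates the next numbered directory; copying the new `model.onnx` into it is left out.

```python
import tempfile
from pathlib import Path


def add_model_version(model_dir: Path) -> Path:
    """Create the next numeric version directory under model_dir and return it."""
    existing = [int(p.name) for p in model_dir.iterdir() if p.is_dir() and p.name.isdigit()]
    new_dir = model_dir / str(max(existing, default=0) + 1)
    new_dir.mkdir(parents=True)
    return new_dir


# Demonstrate on a throwaway repository that already holds version 1
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "triton_repo" / "yolo"
    (model_dir / "1").mkdir(parents=True)
    v2 = add_model_version(model_dir)
    v3 = add_model_version(model_dir)
    versions = sorted(p.name for p in model_dir.iterdir() if p.is_dir())

print(versions)
```

Keeping older version directories in place is what makes side-by-side comparison or a rollback possible, provided the version policy is configured to serve more than the latest version.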
### Why should I export my YOLOv8 model to ONNX format before using Triton Inference Server?
Using ONNX (Open Neural Network Exchange) format for your [Ultralytics YOLOv8](../models/yolov8.md) model before deploying it on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) offers several key benefits:
- **Interoperability**: ONNX format supports transfer between different deep learning frameworks (such as PyTorch and TensorFlow), ensuring broader compatibility.
- **Optimization**: Many deployment environments optimize for ONNX. Triton, for example, serves ONNX models through its ONNX Runtime backend, which can enable faster inference.
- **Ease of Deployment**: ONNX is widely supported across frameworks and platforms, simplifying deployment across operating systems and hardware configurations.
To export your model, use:
```python
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
onnx_file = model.export(format="onnx", dynamic=True)
```
You can follow the steps in the [exporting guide](../modes/export.md) to complete the process.
### Can I run inference using the Ultralytics YOLOv8 model on Triton Inference Server?
Yes, you can run inference using the [Ultralytics YOLOv8](../models/yolov8.md) model on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server). Once your model is set up in the Triton Model Repository and the server is running, you can load and run inference on your model as follows:
```python
from ultralytics import YOLO
# Load the Triton Server model
model = YOLO("http://localhost:8000/yolo", task="detect")
# Run inference on the server
results = model("path/to/image.jpg")
```
For an in-depth guide on setting up and running Triton Server with YOLOv8, refer to the [Running Triton Inference Server](#running-triton-inference-server) section.
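Inference requests fail if they are sent before the server has finished loading the model, so it is worth waiting for readiness first. The sketch below shows one way to wrap that wait in a reusable function. `wait_until_ready` and `fake_check` are hypothetical names introduced for illustration, and the stand-in check simulates a server that takes a few attempts to come up.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call check() up to `attempts` times; return True as soon as it succeeds."""
    for _ in range(attempts):
        try:
            if check():
                return True
        except Exception:
            pass  # server may still be starting; treat errors as "not ready yet"
        time.sleep(delay)
    return False


# Stand-in check that fails twice and then succeeds (simulates a server warming up)
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("server not up yet")
    return True


ready = wait_until_ready(fake_check, attempts=5, delay=0.01)
never = wait_until_ready(lambda: False, attempts=2, delay=0.01)
print(ready, never)
```

With a real client, the check could be `lambda: triton_client.is_model_ready("yolo")`, using the `InferenceServerClient` from `tritonclient.http` shown in the setup steps.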
### How does Ultralytics YOLOv8 compare to TensorFlow and PyTorch models for deployment?
[Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) offers several unique advantages compared to TensorFlow and PyTorch models for deployment:
- **Real-time Performance**: Optimized for real-time object detection tasks, YOLOv8 provides state-of-the-art accuracy and speed, making it ideal for applications requiring live video analytics.
- **Ease of Use**: YOLOv8 integrates seamlessly with Triton Inference Server and supports diverse export formats (ONNX, TensorRT, CoreML), making it flexible for various deployment scenarios.
- **Advanced Features**: When served through Triton, YOLOv8 deployments can use dynamic model loading, model versioning, and ensemble inference. These are capabilities of the server rather than of the model itself, and they are crucial for scalable and reliable AI deployments.
For more details, compare the deployment options in the [model deployment guide](../modes/export.md).