Fix mkdocs.yml raw image URLs (#14213)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Burhan <62214284+Burhan-Q@users.noreply.github.com>

Parent: d5db9c916f
Commit: 5d479c73c2

69 changed files with 4767 additions and 223 deletions
@@ -142,3 +142,126 @@ subprocess.call(f"docker kill {container_id}", shell=True)
By following the above steps, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.

## FAQ

### How do I set up Ultralytics YOLOv8 with NVIDIA Triton Inference Server?

Setting up [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) involves a few key steps:

1. **Export YOLOv8 to ONNX format**:

    ```python
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load an official model

    # Export the model to ONNX format
    onnx_file = model.export(format="onnx", dynamic=True)
    ```

2. **Set up the Triton Model Repository**:

    ```python
    from pathlib import Path

    # Define paths
    model_name = "yolo"
    triton_repo_path = Path("tmp") / "triton_repo"
    triton_model_path = triton_repo_path / model_name

    # Create directories
    (triton_model_path / "1").mkdir(parents=True, exist_ok=True)

    # Move the exported ONNX model into version folder "1" and create an empty config file
    Path(onnx_file).rename(triton_model_path / "1" / "model.onnx")
    (triton_model_path / "config.pbtxt").touch()
    ```

3. **Run the Triton Server**:

    ```python
    import contextlib
    import subprocess
    import time

    from tritonclient.http import InferenceServerClient

    # Define image https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
    tag = "nvcr.io/nvidia/tritonserver:23.09-py3"

    subprocess.call(f"docker pull {tag}", shell=True)

    # Start the Triton Server container with the model repository mounted at /models
    container_id = (
        subprocess.check_output(
            f"docker run -d --rm -v {triton_repo_path}:/models -p 8000:8000 {tag} tritonserver --model-repository=/models",
            shell=True,
        )
        .decode("utf-8")
        .strip()
    )

    triton_client = InferenceServerClient(url="localhost:8000", verbose=False, ssl=False)

    # Wait for the model to become ready
    for _ in range(10):
        with contextlib.suppress(Exception):
            assert triton_client.is_model_ready(model_name)
            break
        time.sleep(1)
    ```

This setup can help you efficiently deploy YOLOv8 models at scale on Triton Inference Server for high-performance AI model inference.
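
When you are done testing, you can stop the server by killing the Docker container, just as the cleanup snippet earlier in this guide does. This minimal sketch assumes the `container_id` variable from step 3 is still in scope:

```python
import subprocess

# Kill the Triton Server container started in step 3; the --rm flag used at
# launch removes the container automatically once it stops
subprocess.call(f"docker kill {container_id}", shell=True)
```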

### What benefits does using Ultralytics YOLOv8 with NVIDIA Triton Inference Server offer?

Integrating [Ultralytics YOLOv8](../models/yolov8.md) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) provides several advantages:

- **Scalable AI Inference**: Triton allows serving multiple models from a single server instance and supports dynamic model loading and unloading (see the load/unload sketch below), making it highly scalable for diverse AI workloads.
- **High Performance**: Optimized for NVIDIA GPUs, Triton Inference Server ensures high-speed inference operations, perfect for real-time applications such as object detection.
- **Ensemble and Model Versioning**: Triton's ensemble mode enables combining multiple models to improve results, and its model versioning supports A/B testing and rolling updates.

For detailed instructions on setting up and running YOLOv8 with Triton, you can refer to the [setup guide](#setting-up-triton-model-repository).
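
The dynamic loading and unloading mentioned above can be driven from the same `tritonclient` package used elsewhere in this guide. Below is a minimal sketch; it assumes the server was started with `--model-control-mode=explicit` (otherwise Triton manages model loading itself) and that a model named `"yolo"` exists in the repository:

```python
from tritonclient.http import InferenceServerClient

# Connect to a locally running Triton Server (HTTP endpoint on port 8000)
client = InferenceServerClient(url="localhost:8000", verbose=False, ssl=False)

# Explicitly load the model, check readiness, then unload it to free resources
client.load_model("yolo")
print("ready:", client.is_model_ready("yolo"))
client.unload_model("yolo")
```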

### Why should I export my YOLOv8 model to ONNX format before using Triton Inference Server?

Using the ONNX (Open Neural Network Exchange) format for your [Ultralytics YOLOv8](../models/yolov8.md) model before deploying it on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) offers several key benefits:

- **Interoperability**: The ONNX format supports transfer between different deep learning frameworks (such as PyTorch and TensorFlow), ensuring broader compatibility.
- **Optimization**: Many deployment environments, including Triton, optimize for ONNX, enabling faster inference and better performance.
- **Ease of Deployment**: ONNX is widely supported across frameworks and platforms, simplifying deployment on various operating systems and hardware configurations.

To export your model, use:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
onnx_file = model.export(format="onnx", dynamic=True)
```
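
If you want to confirm that the exported file is a standard ONNX graph that other frameworks and runtimes can consume, a quick check with the `onnx` Python package (an optional extra dependency, not required by Triton) might look like this minimal sketch:

```python
import onnx

# Load the exported file and run ONNX's structural checker to confirm
# it is a valid, framework-neutral ONNX model
onnx_model = onnx.load(onnx_file)
onnx.checker.check_model(onnx_model)
print(f"Checked {onnx_file}: opset {onnx_model.opset_import[0].version}")
```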

You can follow the steps in the [exporting guide](../modes/export.md) to complete the process.

### Can I run inference using the Ultralytics YOLOv8 model on Triton Inference Server?

Yes, you can run inference using the [Ultralytics YOLOv8](../models/yolov8.md) model on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server). Once your model is set up in the Triton Model Repository and the server is running, you can load and run inference on your model as follows:

```python
from ultralytics import YOLO

# Load the Triton Server model
model = YOLO("http://localhost:8000/yolo", task="detect")

# Run inference on the server
results = model("path/to/image.jpg")
```
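
The call above returns standard Ultralytics `Results` objects, so you can work with the server's predictions exactly as you would with a local model. A brief sketch (the output filename below is a placeholder):

```python
# Inspect and save the detections returned by the Triton-served model
for result in results:
    print(result.boxes.xyxy)  # bounding boxes in xyxy pixel coordinates
    print(result.boxes.conf)  # confidence scores
    print(result.boxes.cls)  # class indices
    result.save(filename="triton_result.jpg")  # write the annotated image to disk
```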

For an in-depth guide on setting up and running Triton Server with YOLOv8, refer to the [running triton inference server](#running-triton-inference-server) section.

### How does Ultralytics YOLOv8 compare to TensorFlow and PyTorch models for deployment?

[Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) offers several unique advantages compared to TensorFlow and PyTorch models for deployment:

- **Real-time Performance**: Optimized for real-time object detection tasks, YOLOv8 provides state-of-the-art accuracy and speed, making it ideal for applications requiring live video analytics.
- **Ease of Use**: YOLOv8 integrates seamlessly with Triton Inference Server and supports diverse export formats (ONNX, TensorRT, CoreML), making it flexible for various deployment scenarios (see the export sketch at the end of this answer).
- **Advanced Features**: When served through Triton, YOLOv8 deployments benefit from capabilities such as dynamic model loading, model versioning, and ensemble inference, which are crucial for scalable and reliable AI deployments.

For more details, compare the deployment options in the [model deployment guide](../modes/export.md).
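
As a quick illustration of that export flexibility, the same checkpoint can be converted to several of these formats with one-line calls. A minimal sketch (each format accepts optional arguments covered in the export guide, and TensorRT export requires an NVIDIA GPU):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Export the same model to several deployment formats
model.export(format="onnx")  # ONNX, used in this guide for Triton
model.export(format="engine")  # TensorRT engine (requires an NVIDIA GPU)
model.export(format="coreml")  # CoreML, for Apple devices
```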