Fix mkdocs.yml raw image URLs (#14213)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Burhan <62214284+Burhan-Q@users.noreply.github.com>

Parent: d5db9c916f
Commit: 5d479c73c2

69 changed files with 4767 additions and 223 deletions
@@ -142,3 +142,126 @@ subprocess.call(f"docker kill {container_id}", shell=True)
By following the above steps, you can deploy and run Ultralytics YOLOv8 models efficiently on Triton Inference Server, providing a scalable and high-performance solution for deep learning inference tasks. If you face any issues or have further queries, refer to the [official Triton documentation](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html) or reach out to the Ultralytics community for support.

## FAQ

### How do I set up Ultralytics YOLOv8 with NVIDIA Triton Inference Server?

Setting up [Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) involves a few key steps:

1. **Export YOLOv8 to ONNX format**:

    ```python
    from ultralytics import YOLO

    # Load a model
    model = YOLO("yolov8n.pt")  # load an official model

    # Export the model to ONNX format
    onnx_file = model.export(format="onnx", dynamic=True)
    ```

2. **Set up the Triton Model Repository**:

    ```python
    from pathlib import Path

    # Define paths
    model_name = "yolo"
    triton_repo_path = Path("tmp") / "triton_repo"
    triton_model_path = triton_repo_path / model_name

    # Create directories
    (triton_model_path / "1").mkdir(parents=True, exist_ok=True)

    # Move the exported ONNX model into version folder "1" and create an empty config file
    Path(onnx_file).rename(triton_model_path / "1" / "model.onnx")
    (triton_model_path / "config.pbtxt").touch()
    ```

3. **Run the Triton Server**:

    ```python
    import contextlib
    import subprocess
    import time

    from tritonclient.http import InferenceServerClient

    # Define image https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver
    tag = "nvcr.io/nvidia/tritonserver:23.09-py3"

    subprocess.call(f"docker pull {tag}", shell=True)

    # Start the Triton Server container with the model repository mounted at /models
    container_id = (
        subprocess.check_output(
            f"docker run -d --rm -v {triton_repo_path}:/models -p 8000:8000 {tag} tritonserver --model-repository=/models",
            shell=True,
        )
        .decode("utf-8")
        .strip()
    )

    triton_client = InferenceServerClient(url="localhost:8000", verbose=False, ssl=False)

    # Wait for the model to become ready
    for _ in range(10):
        with contextlib.suppress(Exception):
            assert triton_client.is_model_ready(model_name)
            break
        time.sleep(1)
    ```

This setup can help you efficiently deploy YOLOv8 models at scale on Triton Inference Server for high-performance AI model inference.
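
When you are done testing, you can stop the server by killing the Docker container, just as the cleanup snippet earlier in this guide does. This minimal sketch assumes the `container_id` variable from step 3 is still in scope:

```python
import subprocess

# Kill the Triton Server container started in step 3; the --rm flag used at
# launch removes the container automatically once it stops
subprocess.call(f"docker kill {container_id}", shell=True)
```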

### What benefits does using Ultralytics YOLOv8 with NVIDIA Triton Inference Server offer?

Integrating [Ultralytics YOLOv8](../models/yolov8.md) with [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) provides several advantages:

- **Scalable AI Inference**: Triton allows serving multiple models from a single server instance and supports dynamic model loading and unloading (see the load/unload sketch below), making it highly scalable for diverse AI workloads.
- **High Performance**: Optimized for NVIDIA GPUs, Triton Inference Server ensures high-speed inference operations, perfect for real-time applications such as object detection.
- **Ensemble and Model Versioning**: Triton's ensemble mode enables combining multiple models to improve results, and its model versioning supports A/B testing and rolling updates.

For detailed instructions on setting up and running YOLOv8 with Triton, you can refer to the [setup guide](#setting-up-triton-model-repository).
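
The dynamic loading and unloading mentioned above can be driven from the same `tritonclient` package used elsewhere in this guide. Below is a minimal sketch; it assumes the server was started with `--model-control-mode=explicit` (otherwise Triton manages model loading itself) and that a model named `"yolo"` exists in the repository:

```python
from tritonclient.http import InferenceServerClient

# Connect to a locally running Triton Server (HTTP endpoint on port 8000)
client = InferenceServerClient(url="localhost:8000", verbose=False, ssl=False)

# Explicitly load the model, check readiness, then unload it to free resources
client.load_model("yolo")
print("ready:", client.is_model_ready("yolo"))
client.unload_model("yolo")
```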

### Why should I export my YOLOv8 model to ONNX format before using Triton Inference Server?

Using the ONNX (Open Neural Network Exchange) format for your [Ultralytics YOLOv8](../models/yolov8.md) model before deploying it on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) offers several key benefits:

- **Interoperability**: The ONNX format supports transfer between different deep learning frameworks (such as PyTorch and TensorFlow), ensuring broader compatibility.
- **Optimization**: Many deployment environments, including Triton, optimize for ONNX, enabling faster inference and better performance.
- **Ease of Deployment**: ONNX is widely supported across frameworks and platforms, simplifying deployment on various operating systems and hardware configurations.

To export your model, use:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
onnx_file = model.export(format="onnx", dynamic=True)
```
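
If you want to confirm that the exported file is a standard ONNX graph that other frameworks and runtimes can consume, a quick check with the `onnx` Python package (an optional extra dependency, not required by Triton) might look like this minimal sketch:

```python
import onnx

# Load the exported file and run ONNX's structural checker to confirm
# it is a valid, framework-neutral ONNX model
onnx_model = onnx.load(onnx_file)
onnx.checker.check_model(onnx_model)
print(f"Checked {onnx_file}: opset {onnx_model.opset_import[0].version}")
```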

You can follow the steps in the [exporting guide](../modes/export.md) to complete the process.

### Can I run inference using the Ultralytics YOLOv8 model on Triton Inference Server?

Yes, you can run inference using the [Ultralytics YOLOv8](../models/yolov8.md) model on [NVIDIA Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server). Once your model is set up in the Triton Model Repository and the server is running, you can load and run inference on your model as follows:

```python
from ultralytics import YOLO

# Load the Triton Server model
model = YOLO("http://localhost:8000/yolo", task="detect")

# Run inference on the server
results = model("path/to/image.jpg")
```
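
The call above returns standard Ultralytics `Results` objects, so you can work with the server's predictions exactly as you would with a local model. A brief sketch (the output filename below is a placeholder):

```python
# Inspect and save the detections returned by the Triton-served model
for result in results:
    print(result.boxes.xyxy)  # bounding boxes in xyxy pixel coordinates
    print(result.boxes.conf)  # confidence scores
    print(result.boxes.cls)  # class indices
    result.save(filename="triton_result.jpg")  # write the annotated image to disk
```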

For an in-depth guide on setting up and running Triton Server with YOLOv8, refer to the [running triton inference server](#running-triton-inference-server) section.

### How does Ultralytics YOLOv8 compare to TensorFlow and PyTorch models for deployment?

[Ultralytics YOLOv8](https://docs.ultralytics.com/models/yolov8) offers several unique advantages compared to TensorFlow and PyTorch models for deployment:

- **Real-time Performance**: Optimized for real-time object detection tasks, YOLOv8 provides state-of-the-art accuracy and speed, making it ideal for applications requiring live video analytics.
- **Ease of Use**: YOLOv8 integrates seamlessly with Triton Inference Server and supports diverse export formats (ONNX, TensorRT, CoreML), making it flexible for various deployment scenarios (see the export sketch at the end of this answer).
- **Advanced Features**: When served through Triton, YOLOv8 deployments benefit from capabilities such as dynamic model loading, model versioning, and ensemble inference, which are crucial for scalable and reliable AI deployments.

For more details, compare the deployment options in the [model deployment guide](../modes/export.md).
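
As a quick illustration of that export flexibility, the same checkpoint can be converted to several of these formats with one-line calls. A minimal sketch (each format accepts optional arguments covered in the export guide, and TensorRT export requires an NVIDIA GPU):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Export the same model to several deployment formats
model.export(format="onnx")  # ONNX, used in this guide for Triton
model.export(format="engine")  # TensorRT engine (requires an NVIDIA GPU)
model.export(format="coreml")  # CoreML, for Apple devices
```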