diff --git a/docs/en/integrations/tensorrt.md b/docs/en/integrations/tensorrt.md
index c568759b..4ffb9cb2 100644
--- a/docs/en/integrations/tensorrt.md
+++ b/docs/en/integrations/tensorrt.md
@@ -145,27 +145,43 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
 !!! example
 
-    ```{ .py .annotate }
-    from ultralytics import YOLO
+    === "Python"
 
-    model = YOLO("yolov8n.pt")
-    model.export(
-        format="engine",
-        dynamic=True, #(1)!
-        batch=8, #(2)!
-        workspace=4, #(3)!
-        int8=True,
-        data="coco.yaml", #(4)!
-    )
+        ```{ .py .annotate }
+        from ultralytics import YOLO
+
+        model = YOLO("yolov8n.pt")
+        model.export(
+            format="engine",
+            dynamic=True, #(1)!
+            batch=8, #(2)!
+            workspace=4, #(3)!
+            int8=True,
+            data="coco.yaml", #(4)!
+        )
+
+        # Load the exported TensorRT INT8 model
+        model = YOLO("yolov8n.engine", task="detect")
+        # Run inference
+        result = model.predict("https://ultralytics.com/images/bus.jpg")
+        ```
+
+        1. Exports with dynamic axes; this will be enabled by default when exporting with `int8=True` even when not explicitly set. See [export arguments](../modes/export.md#arguments) for additional information.
+        2. Sets a max batch size of 8 for the exported model, which calibrates with `batch = 2 * 8` to avoid scaling errors during calibration.
+        3. Allocates 4 GiB of memory instead of allocating the entire device for the conversion process.
+        4. Uses the [COCO dataset](../datasets/detect/coco.md) for calibration, specifically the images used for [validation](../modes/val.md) (5,000 total).
 
-    model = YOLO("yolov8n.engine", task="detect") # load the model
-    ```
-
-    1. Exports with dynamic axes, this will be enabled by default when exporting with `int8=True` even when not explicitly set. See [export arguments](../modes/export.md#arguments) for additional information.
-    2. Sets max batch size of 8 for exported model, which calibrates with `batch = 2 *×* 8` to avoid scaling errors during calibration.
-    3. Allocates 4 GiB of memory instead of allocating the entire device for conversion process.
-    4. Uses [COCO dataset](../datasets/detect/coco.md) for calibration, specifically the images used for [validation](../modes/val.md) (5,000 total).
+    === "CLI"
+
+        ```bash
+        # Export a YOLOv8n PyTorch model to TensorRT format with INT8 quantization
+        yolo export model=yolov8n.pt format=engine batch=8 workspace=4 int8=True data=coco.yaml # creates 'yolov8n.engine'
+
+        # Run inference with the exported TensorRT quantized model
+        yolo predict model=yolov8n.engine source='https://ultralytics.com/images/bus.jpg'
+        ```
+
 
 ???+ warning "Calibration Cache"
@@ -240,12 +256,12 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
 
     | Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | top-1 | top-5 | `batch` | size<br>(pixels) |
     |-----------|------------------|--------------|--------------------|-------|-------|---------|-----------------------|
-    | FP32 | Predict | 0.26 | 0.25 \| 0.28 | 0.35 | 0.61 | 8 | 640 |
-    | FP32 | ImageNetval | 0.26 | | | | 1 | 640 |
-    | FP16 | Predict | 0.18 | 0.17 \| 0.19 | 0.35 | 0.61 | 8 | 640 |
-    | FP16 | ImageNetval | 0.18 | | | | 1 | 640 |
-    | INT8 | Predict | 0.16 | 0.15 \| 0.57 | 0.32 | 0.59 | 8 | 640 |
-    | INT8 | ImageNetval | 0.15 | | | | 1 | 640 |
+    | FP32 | Predict | 0.26 | 0.25 \| 0.28 | | | 8 | 640 |
+    | FP32 | ImageNetval | 0.26 | | 0.35 | 0.61 | 1 | 640 |
+    | FP16 | Predict | 0.18 | 0.17 \| 0.19 | | | 8 | 640 |
+    | FP16 | ImageNetval | 0.18 | | 0.35 | 0.61 | 1 | 640 |
+    | INT8 | Predict | 0.16 | 0.15 \| 0.57 | | | 8 | 640 |
+    | INT8 | ImageNetval | 0.15 | | 0.32 | 0.59 | 1 | 640 |
 
 === "Pose (COCO)"
@@ -338,19 +354,19 @@ Experimentation by NVIDIA led them to recommend using at least 500 calibration i
 === "Jetson Orin NX 16GB"
 
-    Tested with JetPack 5.1.3 (L4T 35.5.0) Ubuntu 20.04.6, `python 3.8.10`, `ultralytics==8.2.4`, `tensorrt==8.5.2.2`
+    Tested with JetPack 6.0 (L4T 36.3) Ubuntu 22.04.4 LTS, `python 3.10.12`, `ultralytics==8.2.16`, `tensorrt==10.0.1`
 
     !!! note
 
         Inference times shown for `mean`, `min` (fastest), and `max` (slowest) for each test using pre-trained weights `yolov8n.engine`
 
     | Precision | Eval test | mean<br>(ms) | min \| max<br>(ms) | mAPval<br>50(B) | mAPval<br>50-95(B) | `batch` | size<br>(pixels) |
     |-----------|--------------|--------------|--------------------|----------------------|-------------------------|---------|-----------------------|
-    | FP32 | Predict | 6.90 | 6.89 \| 6.93 | | | 8 | 640 |
-    | FP32 | COCOval | 6.97 | | 0.52 | 0.37 | 1 | 640 |
-    | FP16 | Predict | 3.36 | 3.35 \| 3.39 | | | 8 | 640 |
-    | FP16 | COCOval | 3.39 | | 0.52 | 0.37 | 1 | 640 |
-    | INT8 | Predict | 2.32 | 2.32 \| 2.34 | | | 8 | 640 |
-    | INT8 | COCOval | 2.33 | | 0.47 | 0.33 | 1 | 640 |
+    | FP32 | Predict | 6.11 | 6.10 \| 6.29 | | | 8 | 640 |
+    | FP32 | COCOval | 6.17 | | 0.52 | 0.37 | 1 | 640 |
+    | FP16 | Predict | 3.18 | 3.18 \| 3.20 | | | 8 | 640 |
+    | FP16 | COCOval | 3.19 | | 0.52 | 0.37 | 1 | 640 |
+    | INT8 | Predict | 2.30 | 2.29 \| 2.35 | | | 8 | 640 |
+    | INT8 | COCOval | 2.32 | | 0.46 | 0.32 | 1 | 640 |
 
 !!! info
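
The `int8=True` export documented in this patch relies on post-training calibration: representative images are run through the network so a per-tensor scale can map the observed activation range onto INT8. As a rough, self-contained sketch of symmetric max calibration (plain Python, for intuition only — TensorRT's entropy calibrator is more sophisticated, and all names below are illustrative, not TensorRT API):

```python
def calibration_scale(observed):
    """Choose a symmetric INT8 scale from activation magnitudes seen during calibration."""
    amax = max(abs(v) for v in observed)
    return amax / 127.0  # map [-amax, amax] onto the INT8 range [-127, 127]


def quantize(x, scale):
    """Round a float to INT8, clamping to the representable range."""
    return max(-127, min(127, round(x / scale)))


def dequantize(q, scale):
    """Recover the approximate float value from its INT8 representation."""
    return q * scale


# "Calibrate" on a handful of observed activation values
observed = [0.03, -1.9, 0.75, 1.2, -0.4]
scale = calibration_scale(observed)

# Values inside the calibrated range round-trip with error <= scale / 2;
# values outside it clip hard, which is the kind of scaling error that
# feeding the calibrator more (and more representative) images avoids.
error = abs(dequantize(quantize(0.75, scale), scale) - 0.75)
```

This illustrates why NVIDIA recommends at least 500 calibration images: a wider sample of activations makes the observed maximum, and hence the chosen scale, representative of what the model will see at inference time.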