ultralytics 8.3.67 NMS Export for Detect, Segment, Pose and OBB YOLO models (#18484)

Signed-off-by: Mohammed Yasin <32206511+Y-T-G@users.noreply.github.com>
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
Co-authored-by: Ultralytics Assistant <135830346+UltralyticsAssistant@users.noreply.github.com>
Authored by Mohammed Yasin on 2025-01-24 18:00:36 +08:00, committed by GitHub
parent 0e48a00303
commit 9181ff62f5
17 changed files with 320 additions and 208 deletions

View file

@@ -1,18 +1,18 @@
 | Format | `format` Argument | Model | Metadata | Arguments |
 | --- | --- | --- | --- | --- |
 | [PyTorch](https://pytorch.org/) | - | `{{ model_name or "yolo11n" }}.pt` | ✅ | - |
-| [TorchScript](../integrations/torchscript.md) | `torchscript` | `{{ model_name or "yolo11n" }}.torchscript` | ✅ | `imgsz`, `optimize`, `batch` |
+| [TorchScript](../integrations/torchscript.md) | `torchscript` | `{{ model_name or "yolo11n" }}.torchscript` | ✅ | `imgsz`, `optimize`, `nms`, `batch` |
-| [ONNX](../integrations/onnx.md) | `onnx` | `{{ model_name or "yolo11n" }}.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset`, `batch` |
+| [ONNX](../integrations/onnx.md) | `onnx` | `{{ model_name or "yolo11n" }}.onnx` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `opset`, `nms`, `batch` |
-| [OpenVINO](../integrations/openvino.md) | `openvino` | `{{ model_name or "yolo11n" }}_openvino_model/` | ✅ | `imgsz`, `half`, `dynamic`, `int8`, `batch` |
+| [OpenVINO](../integrations/openvino.md) | `openvino` | `{{ model_name or "yolo11n" }}_openvino_model/` | ✅ | `imgsz`, `half`, `dynamic`, `int8`, `nms`, `batch` |
-| [TensorRT](../integrations/tensorrt.md) | `engine` | `{{ model_name or "yolo11n" }}.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace`, `int8`, `batch` |
+| [TensorRT](../integrations/tensorrt.md) | `engine` | `{{ model_name or "yolo11n" }}.engine` | ✅ | `imgsz`, `half`, `dynamic`, `simplify`, `workspace`, `int8`, `nms`, `batch` |
 | [CoreML](../integrations/coreml.md) | `coreml` | `{{ model_name or "yolo11n" }}.mlpackage` | ✅ | `imgsz`, `half`, `int8`, `nms`, `batch` |
-| [TF SavedModel](../integrations/tf-savedmodel.md) | `saved_model` | `{{ model_name or "yolo11n" }}_saved_model/` | ✅ | `imgsz`, `keras`, `int8`, `batch` |
+| [TF SavedModel](../integrations/tf-savedmodel.md) | `saved_model` | `{{ model_name or "yolo11n" }}_saved_model/` | ✅ | `imgsz`, `keras`, `int8`, `nms`, `batch` |
 | [TF GraphDef](../integrations/tf-graphdef.md) | `pb` | `{{ model_name or "yolo11n" }}.pb` | ❌ | `imgsz`, `batch` |
-| [TF Lite](../integrations/tflite.md) | `tflite` | `{{ model_name or "yolo11n" }}.tflite` | ✅ | `imgsz`, `half`, `int8`, `batch` |
+| [TF Lite](../integrations/tflite.md) | `tflite` | `{{ model_name or "yolo11n" }}.tflite` | ✅ | `imgsz`, `half`, `int8`, `nms`, `batch` |
 | [TF Edge TPU](../integrations/edge-tpu.md) | `edgetpu` | `{{ model_name or "yolo11n" }}_edgetpu.tflite` | ✅ | `imgsz` |
-| [TF.js](../integrations/tfjs.md) | `tfjs` | `{{ model_name or "yolo11n" }}_web_model/` | ✅ | `imgsz`, `half`, `int8`, `batch` |
+| [TF.js](../integrations/tfjs.md) | `tfjs` | `{{ model_name or "yolo11n" }}_web_model/` | ✅ | `imgsz`, `half`, `int8`, `nms`, `batch` |
 | [PaddlePaddle](../integrations/paddlepaddle.md) | `paddle` | `{{ model_name or "yolo11n" }}_paddle_model/` | ✅ | `imgsz`, `batch` |
 | [MNN](../integrations/mnn.md) | `mnn` | `{{ model_name or "yolo11n" }}.mnn` | ✅ | `imgsz`, `batch`, `int8`, `half` |
 | [NCNN](../integrations/ncnn.md) | `ncnn` | `{{ model_name or "yolo11n" }}_ncnn_model/` | ✅ | `imgsz`, `half`, `batch` |
 | [IMX500](../integrations/sony-imx500.md) | `imx` | `{{ model_name or "yolov8n" }}_imx_model/` | ✅ | `imgsz`, `int8` |
 | [RKNN](../integrations/rockchip-rknn.md) | `rknn` | `{{ model_name or "yolo11n" }}_rknn_model/` | ✅ | `imgsz`, `batch`, `name` |
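For context, the new `nms` flag threads through the Python export API; a minimal sketch (model file and image URL are illustrative):

```python
from ultralytics import YOLO

# Export with NMS embedded in the graph; conf/iou set the baked-in thresholds
model = YOLO("yolo11n.pt")
path = model.export(format="onnx", nms=True, conf=0.25, iou=0.45)

# Reload and run: outputs are already suppressed, so no host-side NMS is needed
results = YOLO(path)("https://ultralytics.com/images/bus.jpg")
```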

View file

@@ -19,6 +19,10 @@ keywords: YOLOv8, export formats, ONNX, TensorRT, CoreML, machine learning model
 <br><br><hr><br>

+## ::: ultralytics.engine.exporter.NMSModel
+
+<br><br><hr><br>
+
 ## ::: ultralytics.engine.exporter.export_formats

 <br><br><hr><br>

View file

@@ -43,23 +43,19 @@ def test_export_openvino():
 @pytest.mark.slow
 @pytest.mark.skipif(not TORCH_1_13, reason="OpenVINO requires torch>=1.13")
 @pytest.mark.parametrize(
-    "task, dynamic, int8, half, batch",
+    "task, dynamic, int8, half, batch, nms",
     [  # generate all combinations but exclude those where both int8 and half are True
-        (task, dynamic, int8, half, batch)
-        for task, dynamic, int8, half, batch in product(TASKS, [True, False], [True, False], [True, False], [1, 2])
+        (task, dynamic, int8, half, batch, nms)
+        for task, dynamic, int8, half, batch, nms in product(
+            TASKS, [True, False], [True, False], [True, False], [1, 2], [True, False]
+        )
         if not (int8 and half)  # exclude cases where both int8 and half are True
     ],
 )
-def test_export_openvino_matrix(task, dynamic, int8, half, batch):
+def test_export_openvino_matrix(task, dynamic, int8, half, batch, nms):
     """Test YOLO model exports to OpenVINO under various configuration matrix conditions."""
     file = YOLO(TASK2MODEL[task]).export(
-        format="openvino",
-        imgsz=32,
-        dynamic=dynamic,
-        int8=int8,
-        half=half,
-        batch=batch,
-        data=TASK2DATA[task],
+        format="openvino", imgsz=32, dynamic=dynamic, int8=int8, half=half, batch=batch, data=TASK2DATA[task], nms=nms
     )
     if WINDOWS:
         # Use unique filenames due to Windows file permissions bug possibly due to latent threaded use
@@ -72,34 +68,26 @@ def test_export_openvino_matrix(task, dynamic, int8, half, batch):
 @pytest.mark.slow
 @pytest.mark.parametrize(
-    "task, dynamic, int8, half, batch, simplify", product(TASKS, [True, False], [False], [False], [1, 2], [True, False])
+    "task, dynamic, int8, half, batch, simplify, nms",
+    product(TASKS, [True, False], [False], [False], [1, 2], [True, False], [True, False]),
 )
-def test_export_onnx_matrix(task, dynamic, int8, half, batch, simplify):
+def test_export_onnx_matrix(task, dynamic, int8, half, batch, simplify, nms):
     """Test YOLO exports to ONNX format with various configurations and parameters."""
     file = YOLO(TASK2MODEL[task]).export(
-        format="onnx",
-        imgsz=32,
-        dynamic=dynamic,
-        int8=int8,
-        half=half,
-        batch=batch,
-        simplify=simplify,
+        format="onnx", imgsz=32, dynamic=dynamic, int8=int8, half=half, batch=batch, simplify=simplify, nms=nms
     )
     YOLO(file)([SOURCE] * batch, imgsz=64 if dynamic else 32)  # exported model inference
     Path(file).unlink()  # cleanup


 @pytest.mark.slow
-@pytest.mark.parametrize("task, dynamic, int8, half, batch", product(TASKS, [False], [False], [False], [1, 2]))
-def test_export_torchscript_matrix(task, dynamic, int8, half, batch):
+@pytest.mark.parametrize(
+    "task, dynamic, int8, half, batch, nms", product(TASKS, [False], [False], [False], [1, 2], [True, False])
+)
+def test_export_torchscript_matrix(task, dynamic, int8, half, batch, nms):
     """Tests YOLO model exports to TorchScript format under varied configurations."""
     file = YOLO(TASK2MODEL[task]).export(
-        format="torchscript",
-        imgsz=32,
-        dynamic=dynamic,
-        int8=int8,
-        half=half,
-        batch=batch,
+        format="torchscript", imgsz=32, dynamic=dynamic, int8=int8, half=half, batch=batch, nms=nms
     )
     YOLO(file)([SOURCE] * 3, imgsz=64 if dynamic else 32)  # exported model inference at batch=3
     Path(file).unlink()  # cleanup
@@ -135,22 +123,19 @@ def test_export_coreml_matrix(task, dynamic, int8, half, batch):
 @pytest.mark.skipif(not checks.IS_PYTHON_MINIMUM_3_10, reason="TFLite export requires Python>=3.10")
 @pytest.mark.skipif(not LINUX, reason="Test disabled as TF suffers from install conflicts on Windows and macOS")
 @pytest.mark.parametrize(
-    "task, dynamic, int8, half, batch",
+    "task, dynamic, int8, half, batch, nms",
     [  # generate all combinations but exclude those where both int8 and half are True
-        (task, dynamic, int8, half, batch)
-        for task, dynamic, int8, half, batch in product(TASKS, [False], [True, False], [True, False], [1])
+        (task, dynamic, int8, half, batch, nms)
+        for task, dynamic, int8, half, batch, nms in product(
+            TASKS, [False], [True, False], [True, False], [1], [True, False]
+        )
         if not (int8 and half)  # exclude cases where both int8 and half are True
     ],
 )
-def test_export_tflite_matrix(task, dynamic, int8, half, batch):
+def test_export_tflite_matrix(task, dynamic, int8, half, batch, nms):
     """Test YOLO exports to TFLite format considering various export configurations."""
     file = YOLO(TASK2MODEL[task]).export(
-        format="tflite",
-        imgsz=32,
-        dynamic=dynamic,
-        int8=int8,
-        half=half,
-        batch=batch,
+        format="tflite", imgsz=32, dynamic=dynamic, int8=int8, half=half, batch=batch, nms=nms
     )
     YOLO(file)([SOURCE] * batch, imgsz=32)  # exported model inference at batch=3
     Path(file).unlink()  # cleanup

View file

@@ -1,6 +1,6 @@
 # Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

-__version__ = "8.3.66"
+__version__ = "8.3.67"

 import os

View file

@@ -103,7 +103,7 @@ from ultralytics.utils.checks import (
 )
 from ultralytics.utils.downloads import attempt_download_asset, get_github_assets, safe_download
 from ultralytics.utils.files import file_size, spaces_in_path
-from ultralytics.utils.ops import Profile
+from ultralytics.utils.ops import Profile, nms_rotated, xywh2xyxy
 from ultralytics.utils.torch_utils import TORCH_1_13, get_latest_opset, select_device
@@ -111,16 +111,16 @@ def export_formats():
     """Ultralytics YOLO export formats."""
     x = [
         ["PyTorch", "-", ".pt", True, True, []],
-        ["TorchScript", "torchscript", ".torchscript", True, True, ["batch", "optimize"]],
+        ["TorchScript", "torchscript", ".torchscript", True, True, ["batch", "optimize", "nms"]],
-        ["ONNX", "onnx", ".onnx", True, True, ["batch", "dynamic", "half", "opset", "simplify"]],
+        ["ONNX", "onnx", ".onnx", True, True, ["batch", "dynamic", "half", "opset", "simplify", "nms"]],
-        ["OpenVINO", "openvino", "_openvino_model", True, False, ["batch", "dynamic", "half", "int8"]],
+        ["OpenVINO", "openvino", "_openvino_model", True, False, ["batch", "dynamic", "half", "int8", "nms"]],
-        ["TensorRT", "engine", ".engine", False, True, ["batch", "dynamic", "half", "int8", "simplify"]],
+        ["TensorRT", "engine", ".engine", False, True, ["batch", "dynamic", "half", "int8", "simplify", "nms"]],
         ["CoreML", "coreml", ".mlpackage", True, False, ["batch", "half", "int8", "nms"]],
-        ["TensorFlow SavedModel", "saved_model", "_saved_model", True, True, ["batch", "int8", "keras"]],
+        ["TensorFlow SavedModel", "saved_model", "_saved_model", True, True, ["batch", "int8", "keras", "nms"]],
         ["TensorFlow GraphDef", "pb", ".pb", True, True, ["batch"]],
-        ["TensorFlow Lite", "tflite", ".tflite", True, False, ["batch", "half", "int8"]],
+        ["TensorFlow Lite", "tflite", ".tflite", True, False, ["batch", "half", "int8", "nms"]],
         ["TensorFlow Edge TPU", "edgetpu", "_edgetpu.tflite", True, False, []],
-        ["TensorFlow.js", "tfjs", "_web_model", True, False, ["batch", "half", "int8"]],
+        ["TensorFlow.js", "tfjs", "_web_model", True, False, ["batch", "half", "int8", "nms"]],
         ["PaddlePaddle", "paddle", "_paddle_model", True, True, ["batch"]],
         ["MNN", "mnn", ".mnn", True, True, ["batch", "half", "int8"]],
         ["NCNN", "ncnn", "_ncnn_model", True, True, ["batch", "half"]],
@@ -281,6 +281,11 @@ class Exporter:
             )
         if self.args.int8 and tflite:
             assert not getattr(model, "end2end", False), "TFLite INT8 export not supported for end2end models."
+        if self.args.nms:
+            if getattr(model, "end2end", False):
+                LOGGER.warning("WARNING ⚠️ 'nms=True' is not available for end2end models. Forcing 'nms=False'.")
+                self.args.nms = False
+            self.args.conf = self.args.conf or 0.25  # set conf default value for nms export
         if edgetpu:
             if not LINUX:
                 raise SystemError("Edge TPU export only supported on Linux. See https://coral.ai/docs/edgetpu/compiler")
@@ -344,8 +349,8 @@ class Exporter:
         )

         y = None
-        for _ in range(2):
-            y = model(im)  # dry runs
+        for _ in range(2):  # dry runs
+            y = NMSModel(model, self.args)(im) if self.args.nms and not coreml else model(im)
         if self.args.half and onnx and self.device.type != "cpu":
             im, model = im.half(), model.half()  # to FP16
@@ -476,7 +481,7 @@ class Exporter:
         LOGGER.info(f"\n{prefix} starting export with torch {torch.__version__}...")
         f = self.file.with_suffix(".torchscript")

-        ts = torch.jit.trace(self.model, self.im, strict=False)
+        ts = torch.jit.trace(NMSModel(self.model, self.args) if self.args.nms else self.model, self.im, strict=False)
         extra_files = {"config.txt": json.dumps(self.metadata)}  # torch._C.ExtraFilesMap()
         if self.args.optimize:  # https://pytorch.org/tutorials/recipes/mobile_interpreter.html
             LOGGER.info(f"{prefix} optimizing for mobile...")
@@ -499,7 +504,6 @@ class Exporter:
         opset_version = self.args.opset or get_latest_opset()
         LOGGER.info(f"\n{prefix} starting export with onnx {onnx.__version__} opset {opset_version}...")
         f = str(self.file.with_suffix(".onnx"))
-
         output_names = ["output0", "output1"] if isinstance(self.model, SegmentationModel) else ["output0"]
         dynamic = self.args.dynamic
         if dynamic:
@@ -509,9 +513,18 @@ class Exporter:
                 dynamic["output1"] = {0: "batch", 2: "mask_height", 3: "mask_width"}  # shape(1,32,160,160)
             elif isinstance(self.model, DetectionModel):
                 dynamic["output0"] = {0: "batch", 2: "anchors"}  # shape(1, 84, 8400)
+            if self.args.nms:  # only batch size is dynamic with NMS
+                dynamic["output0"].pop(2)
+        if self.args.nms and self.model.task == "obb":
+            self.args.opset = opset_version  # for NMSModel
+            # OBB error https://github.com/pytorch/pytorch/issues/110859#issuecomment-1757841865
+            torch.onnx.register_custom_op_symbolic("aten::lift_fresh", lambda g, x: x, opset_version)
+            check_requirements("onnxslim>=0.1.46")  # older versions have a bug with OBB

         torch.onnx.export(
-            self.model.cpu() if dynamic else self.model,  # dynamic=True only compatible with cpu
+            NMSModel(self.model.cpu() if dynamic else self.model, self.args)
+            if self.args.nms
+            else self.model,  # dynamic=True only compatible with cpu
             self.im.cpu() if dynamic else self.im,
             f,
             verbose=False,
@@ -553,7 +566,7 @@ class Exporter:
         LOGGER.info(f"\n{prefix} starting export with openvino {ov.__version__}...")
         assert TORCH_1_13, f"OpenVINO export requires torch>=1.13.0 but torch=={torch.__version__} is installed"
         ov_model = ov.convert_model(
-            self.model,
+            NMSModel(self.model, self.args) if self.args.nms else self.model,
             input=None if self.args.dynamic else [self.im.shape],
             example_input=self.im,
         )
@@ -736,9 +749,6 @@ class Exporter:
         f = self.file.with_suffix(".mlmodel" if mlmodel else ".mlpackage")
         if f.is_dir():
             shutil.rmtree(f)
-        if self.args.nms and getattr(self.model, "end2end", False):
-            LOGGER.warning(f"{prefix} WARNING ⚠️ 'nms=True' is not available for end2end models. Forcing 'nms=False'.")
-            self.args.nms = False

         bias = [0.0, 0.0, 0.0]
         scale = 1 / 255
@@ -1438,8 +1448,8 @@ class Exporter:
         nms.coordinatesOutputFeatureName = "coordinates"
         nms.iouThresholdInputFeatureName = "iouThreshold"
         nms.confidenceThresholdInputFeatureName = "confidenceThreshold"
-        nms.iouThreshold = 0.45
-        nms.confidenceThreshold = 0.25
+        nms.iouThreshold = self.args.iou
+        nms.confidenceThreshold = self.args.conf
         nms.pickTop.perClass = True
         nms.stringClassLabels.vector.extend(names.values())
         nms_model = ct.models.MLModel(nms_spec)
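Since the CoreML NMS layer now reads `self.args.iou` and `self.args.conf` instead of the hardcoded 0.45/0.25, custom thresholds can be baked in at export time; a hedged sketch:

```python
from ultralytics import YOLO

# The exported .mlpackage carries these thresholds in its built-in NMS layer
YOLO("yolo11n.pt").export(format="coreml", nms=True, iou=0.5, conf=0.3)
```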
@@ -1507,3 +1517,91 @@ class IOSDetectModel(torch.nn.Module):
         """Normalize predictions of object detection model with input size-dependent factors."""
         xywh, cls = self.model(x)[0].transpose(0, 1).split((4, self.nc), 1)
         return cls, xywh * self.normalize  # confidence (3780, 80), coordinates (3780, 4)
+
+
+class NMSModel(torch.nn.Module):
+    """Model wrapper with embedded NMS for Detect, Segment, Pose and OBB."""
+
+    def __init__(self, model, args):
+        """
+        Initialize the NMSModel.
+
+        Args:
+            model (torch.nn.Module): The model to wrap with NMS postprocessing.
+            args (Namespace): The export arguments.
+        """
+        super().__init__()
+        self.model = model
+        self.args = args
+        self.obb = model.task == "obb"
+        self.is_tf = self.args.format in frozenset({"saved_model", "tflite", "tfjs"})
+
+    def forward(self, x):
+        """
+        Performs inference with NMS post-processing. Supports Detect, Segment, OBB and Pose.
+
+        Args:
+            x (torch.Tensor): The preprocessed tensor with shape (N, 3, H, W).
+
+        Returns:
+            out (torch.Tensor): The post-processed results with shape (N, max_det, 4 + 2 + extra_shape).
+        """
+        from functools import partial
+
+        from torchvision.ops import nms
+
+        preds = self.model(x)
+        pred = preds[0] if isinstance(preds, tuple) else preds
+        pred = pred.transpose(-1, -2)  # shape(1,84,6300) to shape(1,6300,84)
+        extra_shape = pred.shape[-1] - (4 + self.model.nc)  # extras from Segment, OBB, Pose
+        boxes, scores, extras = pred.split([4, self.model.nc, extra_shape], dim=2)
+        scores, classes = scores.max(dim=-1)
+        # (N, max_det, 4 coords + 1 class score + 1 class label + extra_shape)
+        out = torch.zeros(
+            boxes.shape[0],
+            self.args.max_det,
+            boxes.shape[-1] + 2 + extra_shape,
+            device=boxes.device,
+            dtype=boxes.dtype,
+        )
+        for i, (box, cls, score, extra) in enumerate(zip(boxes, classes, scores, extras)):
+            mask = score > self.args.conf
+            if self.is_tf:
+                # TFLite GatherND error if mask is empty
+                score *= mask
+                # Explicit length otherwise reshape error, hardcoded to `self.args.max_det * 5`
+                mask = score.topk(self.args.max_det * 5).indices
+            box, score, cls, extra = box[mask], score[mask], cls[mask], extra[mask]
+            if not self.obb:
+                box = xywh2xyxy(box)
+            if self.is_tf:
+                # TFLite bug returns fewer boxes
+                box = torch.nn.functional.pad(box, (0, 0, 0, mask.shape[0] - box.shape[0]))
+            nmsbox = box.clone()
+            # `8` is the minimum value experimented to get correct NMS results for OBB
+            multiplier = 8 if self.obb else 1
+            # Normalize boxes for NMS since large values for class offset cause issues with int8 quantization
+            if self.args.format == "tflite":  # TFLite is already normalized
+                nmsbox *= multiplier
+            else:
+                nmsbox = multiplier * nmsbox / torch.tensor(x.shape[2:], device=box.device, dtype=box.dtype).max()
+            if not self.args.agnostic_nms:  # class-specific NMS
+                end = 2 if self.obb else 4
+                # fully explicit expansion otherwise reshape error
+                # large max_wh causes issues when quantizing
+                cls_offset = cls.reshape(-1, 1).expand(nmsbox.shape[0], end)
+                offbox = nmsbox[:, :end] + cls_offset * multiplier
+                nmsbox = torch.cat((offbox, nmsbox[:, end:]), dim=-1)
+            nms_fn = (
+                partial(nms_rotated, use_triu=not (self.is_tf or (self.args.opset or 14) < 14)) if self.obb else nms
+            )
+            keep = nms_fn(
+                torch.cat([nmsbox, extra], dim=-1) if self.obb else nmsbox,
+                score,
+                self.args.iou,
+            )[: self.args.max_det]
+            dets = torch.cat([box[keep], score[keep].view(-1, 1), cls[keep].view(-1, 1), extra[keep]], dim=-1)
+            # Zero-pad to max_det size to avoid reshape error
+            pad = (0, 0, 0, self.args.max_det - dets.shape[0])
+            out[i] = torch.nn.functional.pad(dets, pad)
+        return (out, preds[1]) if self.model.task == "segment" else out
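A hedged usage sketch of the wrapper outside the exporter. It assumes the inner nn.Module exposes `.task` and `.nc` (as the exporter relies on), and uses `SimpleNamespace` as a stand-in for the exporter's resolved arguments:

```python
import torch
from types import SimpleNamespace

from ultralytics import YOLO
from ultralytics.engine.exporter import NMSModel

# Stand-in for the exporter's resolved args; field names taken from the diff above
args = SimpleNamespace(conf=0.25, iou=0.45, max_det=300, agnostic_nms=False, format="torchscript", opset=None)
model = YOLO("yolo11n.pt").model.eval()  # assumption: the loaded nn.Module carries .task and .nc
wrapped = NMSModel(model, args)
out = wrapped(torch.zeros(1, 3, 640, 640))  # (1, max_det, 6): x1, y1, x2, y2, score, class
```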

View file

@@ -305,7 +305,7 @@ class Results(SimpleClass):
             if v is not None:
                 return len(v)

-    def update(self, boxes=None, masks=None, probs=None, obb=None):
+    def update(self, boxes=None, masks=None, probs=None, obb=None, keypoints=None):
         """
         Updates the Results object with new detection data.
@@ -318,6 +318,7 @@ class Results(SimpleClass):
             masks (torch.Tensor | None): A tensor of shape (N, H, W) containing segmentation masks.
             probs (torch.Tensor | None): A tensor of shape (num_classes,) containing class probabilities.
             obb (torch.Tensor | None): A tensor of shape (N, 5) containing oriented bounding box coordinates.
+            keypoints (torch.Tensor | None): A tensor of shape (N, 17, 3) containing keypoints.

         Examples:
             >>> results = model("image.jpg")
@@ -332,6 +333,8 @@ class Results(SimpleClass):
         self.probs = probs
         if obb is not None:
             self.obb = OBB(obb, self.orig_shape)
+        if keypoints is not None:
+            self.keypoints = Keypoints(keypoints, self.orig_shape)

     def _apply(self, fn, *args, **kwargs):
         """

View file

@@ -38,13 +38,7 @@ class NASValidator(DetectionValidator):
         """Apply Non-maximum suppression to prediction outputs."""
         boxes = ops.xyxy2xywh(preds_in[0][0])
         preds = torch.cat((boxes, preds_in[0][1]), -1).permute(0, 2, 1)
-        return ops.non_max_suppression(
+        return super().postprocess(
             preds,
-            self.args.conf,
-            self.args.iou,
-            labels=self.lb,
-            multi_label=False,
-            agnostic=self.args.single_cls or self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            max_time_img=0.5,
         )

View file

@@ -20,22 +20,54 @@ class DetectionPredictor(BasePredictor):
         ```
     """

-    def postprocess(self, preds, img, orig_imgs):
+    def postprocess(self, preds, img, orig_imgs, **kwargs):
         """Post-processes predictions and returns a list of Results objects."""
         preds = ops.non_max_suppression(
             preds,
             self.args.conf,
             self.args.iou,
-            agnostic=self.args.agnostic_nms,
+            self.args.classes,
+            self.args.agnostic_nms,
             max_det=self.args.max_det,
-            classes=self.args.classes,
+            nc=len(self.model.names),
+            end2end=getattr(self.model, "end2end", False),
+            rotated=self.args.task == "obb",
         )

         if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
             orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)

-        results = []
-        for pred, orig_img, img_path in zip(preds, orig_imgs, self.batch[0]):
-            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
-            results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred))
-        return results
+        return self.construct_results(preds, img, orig_imgs, **kwargs)
+
+    def construct_results(self, preds, img, orig_imgs):
+        """
+        Constructs a list of result objects from the predictions.
+
+        Args:
+            preds (List[torch.Tensor]): List of predicted bounding boxes and scores.
+            img (torch.Tensor): The image after preprocessing.
+            orig_imgs (List[np.ndarray]): List of original images before preprocessing.
+
+        Returns:
+            (list): List of result objects containing the original images, image paths, class names, and bounding boxes.
+        """
+        return [
+            self.construct_result(pred, img, orig_img, img_path)
+            for pred, orig_img, img_path in zip(preds, orig_imgs, self.batch[0])
+        ]
+
+    def construct_result(self, pred, img, orig_img, img_path):
+        """
+        Constructs the result object from the prediction.
+
+        Args:
+            pred (torch.Tensor): The predicted bounding boxes and scores.
+            img (torch.Tensor): The image after preprocessing.
+            orig_img (np.ndarray): The original image before preprocessing.
+            img_path (str): The path to the original image.
+
+        Returns:
+            (Results): The result object containing the original image, image path, class names, and bounding boxes.
+        """
+        pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
+        return Results(orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6])

View file

@@ -78,6 +78,7 @@ class DetectionValidator(BaseValidator):
         self.args.save_json |= self.args.val and (self.is_coco or self.is_lvis) and not self.training  # run final val
         self.names = model.names
         self.nc = len(model.names)
+        self.end2end = getattr(model, "end2end", False)
         self.metrics.names = self.names
         self.metrics.plot = self.args.plots
         self.confusion_matrix = ConfusionMatrix(nc=self.nc, conf=self.args.conf)
@@ -96,9 +97,12 @@ class DetectionValidator(BaseValidator):
             self.args.conf,
             self.args.iou,
             labels=self.lb,
+            nc=self.nc,
             multi_label=True,
             agnostic=self.args.single_cls or self.args.agnostic_nms,
             max_det=self.args.max_det,
+            end2end=self.end2end,
+            rotated=self.args.task == "obb",
         )

     def _prepare_batch(self, si, batch):

View file

@@ -27,27 +27,20 @@ class OBBPredictor(DetectionPredictor):
         super().__init__(cfg, overrides, _callbacks)
         self.args.task = "obb"

-    def postprocess(self, preds, img, orig_imgs):
-        """Post-processes predictions and returns a list of Results objects."""
-        preds = ops.non_max_suppression(
-            preds,
-            self.args.conf,
-            self.args.iou,
-            agnostic=self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            nc=len(self.model.names),
-            classes=self.args.classes,
-            rotated=True,
-        )
-
-        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
-            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)
-
-        results = []
-        for pred, orig_img, img_path in zip(preds, orig_imgs, self.batch[0]):
-            rboxes = ops.regularize_rboxes(torch.cat([pred[:, :4], pred[:, -1:]], dim=-1))
-            rboxes[:, :4] = ops.scale_boxes(img.shape[2:], rboxes[:, :4], orig_img.shape, xywh=True)
-            # xywh, r, conf, cls
-            obb = torch.cat([rboxes, pred[:, 4:6]], dim=-1)
-            results.append(Results(orig_img, path=img_path, names=self.model.names, obb=obb))
-        return results
+    def construct_result(self, pred, img, orig_img, img_path):
+        """
+        Constructs the result object from the prediction.
+
+        Args:
+            pred (torch.Tensor): The predicted bounding boxes, scores, and rotation angles.
+            img (torch.Tensor): The image after preprocessing.
+            orig_img (np.ndarray): The original image before preprocessing.
+            img_path (str): The path to the original image.
+
+        Returns:
+            (Results): The result object containing the original image, image path, class names, and oriented bounding boxes.
+        """
+        rboxes = ops.regularize_rboxes(torch.cat([pred[:, :4], pred[:, -1:]], dim=-1))
+        rboxes[:, :4] = ops.scale_boxes(img.shape[2:], rboxes[:, :4], orig_img.shape, xywh=True)
+        obb = torch.cat([rboxes, pred[:, 4:6]], dim=-1)
+        return Results(orig_img, path=img_path, names=self.model.names, obb=obb)

View file

@@ -36,20 +36,6 @@ class OBBValidator(DetectionValidator):
         val = self.data.get(self.args.split, "")  # validation path
         self.is_dota = isinstance(val, str) and "DOTA" in val  # is COCO

-    def postprocess(self, preds):
-        """Apply Non-maximum suppression to prediction outputs."""
-        return ops.non_max_suppression(
-            preds,
-            self.args.conf,
-            self.args.iou,
-            labels=self.lb,
-            nc=self.nc,
-            multi_label=True,
-            agnostic=self.args.single_cls or self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            rotated=True,
-        )
-
     def _process_batch(self, detections, gt_bboxes, gt_cls):
         """
         Perform computation of the correct prediction matrix for a batch of detections and ground truth bounding boxes.

View file

@@ -1,6 +1,5 @@
 # Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license

-from ultralytics.engine.results import Results
 from ultralytics.models.yolo.detect.predict import DetectionPredictor
 from ultralytics.utils import DEFAULT_CFG, LOGGER, ops
@@ -30,27 +29,21 @@ class PosePredictor(DetectionPredictor):
                 "See https://github.com/ultralytics/ultralytics/issues/4031."
             )

-    def postprocess(self, preds, img, orig_imgs):
-        """Return detection results for a given input image or list of images."""
-        preds = ops.non_max_suppression(
-            preds,
-            self.args.conf,
-            self.args.iou,
-            agnostic=self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            classes=self.args.classes,
-            nc=len(self.model.names),
-        )
-
-        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
-            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)
-
-        results = []
-        for pred, orig_img, img_path in zip(preds, orig_imgs, self.batch[0]):
-            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape).round()
-            pred_kpts = pred[:, 6:].view(len(pred), *self.model.kpt_shape) if len(pred) else pred[:, 6:]
-            pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, orig_img.shape)
-            results.append(
-                Results(orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6], keypoints=pred_kpts)
-            )
-        return results
+    def construct_result(self, pred, img, orig_img, img_path):
+        """
+        Constructs the result object from the prediction.
+
+        Args:
+            pred (torch.Tensor): The predicted bounding boxes, scores, and keypoints.
+            img (torch.Tensor): The image after preprocessing.
+            orig_img (np.ndarray): The original image before preprocessing.
+            img_path (str): The path to the original image.
+
+        Returns:
+            (Results): The result object containing the original image, image path, class names, bounding boxes, and keypoints.
+        """
+        result = super().construct_result(pred, img, orig_img, img_path)
+        pred_kpts = pred[:, 6:].view(len(pred), *self.model.kpt_shape) if len(pred) else pred[:, 6:]
+        pred_kpts = ops.scale_coords(img.shape[2:], pred_kpts, orig_img.shape)
+        result.update(keypoints=pred_kpts)
+        return result

View file

@@ -61,19 +61,6 @@ class PoseValidator(DetectionValidator):
             "mAP50-95)",
         )

-    def postprocess(self, preds):
-        """Apply non-maximum suppression and return detections with high confidence scores."""
-        return ops.non_max_suppression(
-            preds,
-            self.args.conf,
-            self.args.iou,
-            labels=self.lb,
-            multi_label=True,
-            agnostic=self.args.single_cls or self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            nc=self.nc,
-        )
-
     def init_metrics(self, model):
         """Initiate pose estimation metrics for YOLO model."""
         super().init_metrics(model)

View file

@@ -27,29 +27,48 @@ class SegmentationPredictor(DetectionPredictor):
     def postprocess(self, preds, img, orig_imgs):
         """Applies non-max suppression and processes detections for each image in an input batch."""
-        p = ops.non_max_suppression(
-            preds[0],
-            self.args.conf,
-            self.args.iou,
-            agnostic=self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            nc=len(self.model.names),
-            classes=self.args.classes,
-        )
-
-        if not isinstance(orig_imgs, list):  # input images are a torch.Tensor, not a list
-            orig_imgs = ops.convert_torch2numpy_batch(orig_imgs)
-
-        results = []
-        proto = preds[1][-1] if isinstance(preds[1], tuple) else preds[1]  # tuple if PyTorch model or array if exported
-        for i, (pred, orig_img, img_path) in enumerate(zip(p, orig_imgs, self.batch[0])):
-            if not len(pred):  # save empty boxes
-                masks = None
-            elif self.args.retina_masks:
-                pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
-                masks = ops.process_mask_native(proto[i], pred[:, 6:], pred[:, :4], orig_img.shape[:2])  # HWC
-            else:
-                masks = ops.process_mask(proto[i], pred[:, 6:], pred[:, :4], img.shape[2:], upsample=True)  # HWC
-                pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
-            results.append(Results(orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6], masks=masks))
-        return results
+        # tuple if PyTorch model or array if exported
+        protos = preds[1][-1] if isinstance(preds[1], tuple) else preds[1]
+        return super().postprocess(preds[0], img, orig_imgs, protos=protos)
+
+    def construct_results(self, preds, img, orig_imgs, protos):
+        """
+        Constructs a list of result objects from the predictions.
+
+        Args:
+            preds (List[torch.Tensor]): List of predicted bounding boxes, scores, and masks.
+            img (torch.Tensor): The image after preprocessing.
+            orig_imgs (List[np.ndarray]): List of original images before preprocessing.
+            protos (List[torch.Tensor]): List of prototype masks.
+
+        Returns:
+            (list): List of result objects containing the original images, image paths, class names, bounding boxes, and masks.
+        """
+        return [
+            self.construct_result(pred, img, orig_img, img_path, proto)
+            for pred, orig_img, img_path, proto in zip(preds, orig_imgs, self.batch[0], protos)
+        ]
+
+    def construct_result(self, pred, img, orig_img, img_path, proto):
+        """
+        Constructs the result object from the prediction.
+
+        Args:
+            pred (np.ndarray): The predicted bounding boxes, scores, and masks.
+            img (torch.Tensor): The image after preprocessing.
+            orig_img (np.ndarray): The original image before preprocessing.
+            img_path (str): The path to the original image.
+            proto (torch.Tensor): The prototype masks.
+
+        Returns:
+            (Results): The result object containing the original image, image path, class names, bounding boxes, and masks.
+        """
+        if not len(pred):  # save empty boxes
+            masks = None
+        elif self.args.retina_masks:
+            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
+            masks = ops.process_mask_native(proto, pred[:, 6:], pred[:, :4], orig_img.shape[:2])  # HWC
+        else:
+            masks = ops.process_mask(proto, pred[:, 6:], pred[:, :4], img.shape[2:], upsample=True)  # HWC
+            pred[:, :4] = ops.scale_boxes(img.shape[2:], pred[:, :4], orig_img.shape)
+        return Results(orig_img, path=img_path, names=self.model.names, boxes=pred[:, :6], masks=masks)

View file

@@ -70,16 +70,7 @@ class SegmentationValidator(DetectionValidator):
     def postprocess(self, preds):
         """Post-processes YOLO predictions and returns output detections with proto."""
-        p = ops.non_max_suppression(
-            preds[0],
-            self.args.conf,
-            self.args.iou,
-            labels=self.lb,
-            multi_label=True,
-            agnostic=self.args.single_cls or self.args.agnostic_nms,
-            max_det=self.args.max_det,
-            nc=self.nc,
-        )
+        p = super().postprocess(preds[0])
         proto = preds[1][-1] if len(preds[1]) == 3 else preds[1]  # second output is len 3 if pt, but only 1 if exported
         return p, proto

View file

@@ -132,6 +132,7 @@ class AutoBackend(nn.Module):
         fp16 &= pt or jit or onnx or xml or engine or nn_module or triton  # FP16
         nhwc = coreml or saved_model or pb or tflite or edgetpu or rknn  # BHWC formats (vs torch BCWH)
         stride = 32  # default stride
+        end2end = False  # default end2end
         model, metadata, task = None, None, None

         # Set device
@@ -222,16 +223,18 @@ class AutoBackend(nn.Module):
             output_names = [x.name for x in session.get_outputs()]
             metadata = session.get_modelmeta().custom_metadata_map
             dynamic = isinstance(session.get_outputs()[0].shape[0], str)
+            fp16 = True if "float16" in session.get_inputs()[0].type else False
             if not dynamic:
                 io = session.io_binding()
                 bindings = []
                 for output in session.get_outputs():
-                    y_tensor = torch.empty(output.shape, dtype=torch.float16 if fp16 else torch.float32).to(device)
+                    out_fp16 = "float16" in output.type
+                    y_tensor = torch.empty(output.shape, dtype=torch.float16 if out_fp16 else torch.float32).to(device)
                     io.bind_output(
                         name=output.name,
                         device_type=device.type,
                         device_id=device.index if cuda else 0,
-                        element_type=np.float16 if fp16 else np.float32,
+                        element_type=np.float16 if out_fp16 else np.float32,
                         shape=tuple(y_tensor.shape),
                         buffer_ptr=y_tensor.data_ptr(),
                     )
@@ -501,7 +504,7 @@ class AutoBackend(nn.Module):
            for k, v in metadata.items():
                 if k in {"stride", "batch"}:
                     metadata[k] = int(v)
-                elif k in {"imgsz", "names", "kpt_shape"} and isinstance(v, str):
+                elif k in {"imgsz", "names", "kpt_shape", "args"} and isinstance(v, str):
                     metadata[k] = eval(v)
             stride = metadata["stride"]
             task = metadata["task"]
@@ -509,6 +512,7 @@ class AutoBackend(nn.Module):
             imgsz = metadata["imgsz"]
             names = metadata["names"]
             kpt_shape = metadata.get("kpt_shape")
+            end2end = metadata.get("args", {}).get("nms", False)
         elif not (pt or triton or nn_module):
             LOGGER.warning(f"WARNING ⚠️ Metadata not found for 'model={weights}'")
@@ -703,9 +707,12 @@ class AutoBackend(nn.Module):
                 if x.ndim == 3:  # if task is not classification, excluding masks (ndim=4) as well
                     # Denormalize xywh by image size. See https://github.com/ultralytics/ultralytics/pull/1695
                     # xywh are normalized in TFLite/EdgeTPU to mitigate quantization error of integer models
-                    if x.shape[-1] == 6:  # end-to-end model
+                    if x.shape[-1] == 6 or self.end2end:  # end-to-end model
                         x[:, :, [0, 2]] *= w
                         x[:, :, [1, 3]] *= h
+                        if self.task == "pose":
+                            x[:, :, 6::3] *= w
+                            x[:, :, 7::3] *= h
                     else:
                         x[:, [0, 2]] *= w
                         x[:, [1, 3]] *= h
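The new pose branch denormalizes keypoints laid out as `(x, y, visibility)` triplets after the six box/score/class columns; the strided indexing can be checked standalone (shapes are illustrative):

```python
import torch

w, h = 640, 640
x = torch.rand(1, 300, 6 + 17 * 3)  # (batch, max_det, 6 box/score/class cols + 17 kpts * (x, y, vis))
x[:, :, 6::3] *= w  # keypoint x values live at columns 6, 9, 12, ...
x[:, :, 7::3] *= h  # keypoint y values live at columns 7, 10, 13, ...
```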

View file

@@ -143,7 +143,7 @@ def make_divisible(x, divisor):
     return math.ceil(x / divisor) * divisor

-def nms_rotated(boxes, scores, threshold=0.45):
+def nms_rotated(boxes, scores, threshold=0.45, use_triu=True):
     """
     NMS for oriented bounding boxes using probiou and fast-nms.
@@ -151,16 +151,30 @@ def nms_rotated(boxes, scores, threshold=0.45):
         boxes (torch.Tensor): Rotated bounding boxes, shape (N, 5), format xywhr.
         scores (torch.Tensor): Confidence scores, shape (N,).
         threshold (float, optional): IoU threshold. Defaults to 0.45.
+        use_triu (bool, optional): Whether to use the `torch.triu` operator. Disabling it is useful when exporting
+            OBB models to formats that do not support `torch.triu`.

     Returns:
         (torch.Tensor): Indices of boxes to keep after NMS.
     """
-    if len(boxes) == 0:
-        return np.empty((0,), dtype=np.int8)
     sorted_idx = torch.argsort(scores, descending=True)
     boxes = boxes[sorted_idx]
-    ious = batch_probiou(boxes, boxes).triu_(diagonal=1)
-    pick = torch.nonzero(ious.max(dim=0)[0] < threshold).squeeze_(-1)
+    ious = batch_probiou(boxes, boxes)
+    if use_triu:
+        ious = ious.triu_(diagonal=1)
+        # pick = torch.nonzero(ious.max(dim=0)[0] < threshold).squeeze_(-1)
+        # NOTE: handles the len(boxes) == 0 case and stays exportable by eliminating the if-else condition
+        pick = torch.nonzero((ious >= threshold).sum(0) <= 0).squeeze_(-1)
+    else:
+        n = boxes.shape[0]
+        row_idx = torch.arange(n, device=boxes.device).view(-1, 1).expand(-1, n)
+        col_idx = torch.arange(n, device=boxes.device).view(1, -1).expand(n, -1)
+        upper_mask = row_idx < col_idx
+        ious = ious * upper_mask
+        # Zeroing these scores ensures the additional indices would not affect the final results
+        scores[~((ious >= threshold).sum(0) <= 0)] = 0
+        # NOTE: return indices with fixed length to avoid TFLite reshape error
+        pick = torch.topk(scores, scores.shape[0]).indices
     return sorted_idx[pick]
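The `use_triu=False` branch rebuilds the strictly-upper-triangular mask from broadcast index grids so formats without `torch.triu` can run the same suppression; the equivalence is easy to verify standalone:

```python
import torch

n = 5
ious = torch.rand(n, n)
row_idx = torch.arange(n).view(-1, 1).expand(-1, n)
col_idx = torch.arange(n).view(1, -1).expand(n, -1)
upper_mask = row_idx < col_idx  # True strictly above the diagonal
assert torch.equal(ious * upper_mask, ious.triu(diagonal=1))
```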
@@ -179,6 +193,7 @@ def non_max_suppression(
     max_wh=7680,
     in_place=True,
     rotated=False,
+    end2end=False,
 ):
     """
     Perform non-maximum suppression (NMS) on a set of boxes, with support for masks and multiple labels per box.
@@ -205,6 +220,7 @@ def non_max_suppression(
         max_wh (int): The maximum box width and height in pixels.
         in_place (bool): If True, the input prediction tensor will be modified in place.
         rotated (bool): If Oriented Bounding Boxes (OBB) are being passed for NMS.
+        end2end (bool): If True, the model is end-to-end and does not require NMS.

     Returns:
         (List[torch.Tensor]): A list of length batch_size, where each element is a tensor of
@@ -221,7 +237,7 @@ def non_max_suppression(
     if classes is not None:
         classes = torch.tensor(classes, device=prediction.device)

-    if prediction.shape[-1] == 6:  # end-to-end model (BNC, i.e. 1,300,6)
+    if prediction.shape[-1] == 6 or end2end:  # end-to-end model (BNC, i.e. 1,300,6)
         output = [pred[pred[:, 4] > conf_thres][:max_det] for pred in prediction]
         if classes is not None:
             output = [pred[(pred[:, 5:6] == classes).any(1)] for pred in output]
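For end-to-end outputs the function now reduces to a confidence filter and `max_det` truncation per image; sketched standalone with a dummy prediction tensor:

```python
import torch

conf_thres, max_det = 0.25, 300
prediction = torch.rand(2, 500, 6)  # (batch, boxes, [x1, y1, x2, y2, score, class])
output = [pred[pred[:, 4] > conf_thres][:max_det] for pred in prediction]
print([o.shape for o in output])
```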