Add FAQ sections to Modes and Tasks (#14181)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Abirami Vina <abirami.vina@gmail.com>
Co-authored-by: RizwanMunawar <chr043416@gmail.com>
Co-authored-by: Muhammad Rizwan Munawar <muhammadrizwanmunawar123@gmail.com>
parent e285d3d1b2
commit 6c13bea7b8
39 changed files with 2247 additions and 481 deletions
## FAQ
### What is the YOLO-World model and how does it work?
The YOLO-World model is an advanced, real-time object detection approach based on the [Ultralytics YOLOv8](yolov8.md) framework. It excels in open-vocabulary detection tasks by identifying objects within an image based on descriptive texts. Using vision-language modeling and pre-training on large datasets, YOLO-World achieves high efficiency and performance with significantly reduced computational demands, making it ideal for real-time applications across various industries.
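
For orientation, here is a minimal usage sketch; the weights file and image path are illustrative placeholders:

```python
from ultralytics import YOLOWorld

# Load a pretrained YOLO-World model
model = YOLOWorld("yolov8s-world.pt")

# Detect objects in an image using the model's default vocabulary
results = model.predict("path/to/image.jpg")

# Visualize the detections
results[0].show()
```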
### How does YOLO-World handle inference with custom prompts?
YOLO-World supports a "prompt-then-detect" strategy, which utilizes an offline vocabulary to enhance efficiency. Custom prompts like captions or specific object categories are pre-encoded and stored as offline vocabulary embeddings. This approach streamlines the detection process without the need for retraining. You can dynamically set these prompts within the model to tailor it to specific detection tasks, as shown below:

```python
from ultralytics import YOLOWorld

# Initialize a YOLO-World model (any pretrained YOLO-World weights work here)
model = YOLOWorld("yolov8s-world.pt")

# Define the custom classes to detect
model.set_classes(["person", "bus"])

# Execute prediction on an image
results = model.predict("path/to/image.jpg")

# Show results
results[0].show()
```

You can learn more about this feature in the [Predict Usage](#predict-usage) section.

### Why should I choose YOLO-World over traditional open-vocabulary detection models?

YOLO-World provides several advantages over traditional open-vocabulary detection models:

- **Real-Time Performance:** It leverages the computational speed of CNNs to offer quick, efficient detection.
- **Efficiency and Low Resource Requirements:** YOLO-World maintains high performance while significantly reducing computational and resource demands.
- **Customizable Prompts:** The model supports dynamic prompt setting, allowing users to specify custom detection classes without retraining (see the sketch after this list).
- **Benchmark Excellence:** It outperforms other open-vocabulary detectors like MDETR and GLIP in both speed and efficiency on standard benchmarks.
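
To make the customizable-prompts point concrete, the sketch below restricts the vocabulary and saves the result as a standalone model for reuse; the output filename is an arbitrary placeholder:

```python
from ultralytics import YOLOWorld

# Initialize a YOLO-World model
model = YOLOWorld("yolov8s-world.pt")

# Restrict the vocabulary to the classes of interest
model.set_classes(["person", "bus"])

# Save the model with the embedded custom vocabulary for later reuse
model.save("custom_yolov8s.pt")
```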

### What datasets are supported for training YOLO-World from scratch?

YOLO-World supports various datasets for training, including Objects365, GQA, and Flickr30k for detection and grounding tasks. For validation, it supports datasets like LVIS minival. Detailed information about preparing and using these datasets can be found in the [Zero-shot Transfer on COCO Dataset](#zero-shot-transfer-on-coco-dataset) section.

### How do I train a YOLO-World model on my dataset?
Training a YOLO-World model on your dataset is straightforward through the provided Python API or CLI commands. Here's how to start training using Python:

```python
from ultralytics import YOLOWorld

# Load a pretrained YOLOv8s-worldv2 model
model = YOLOWorld("yolov8s-worldv2.pt")

# Train the model on the COCO8 dataset for 100 epochs
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)
```

Or using the CLI:

```bash
yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
```
### What are the available pre-trained YOLO-World models and their supported tasks?
Ultralytics offers multiple pre-trained YOLO-World models supporting various tasks and operating modes:
| Model Type      | Pre-trained Weights                                                                                     | Tasks Supported                        | Inference | Validation | Training | Export |
| --------------- | ------------------------------------------------------------------------------------------------------- | -------------------------------------- | --------- | ---------- | -------- | ------ |
| YOLOv8s-world   | [yolov8s-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8s-worldv2 | [yolov8s-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8s-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8m-world   | [yolov8m-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8m-worldv2 | [yolov8m-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8m-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8l-world   | [yolov8l-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8l-worldv2 | [yolov8l-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8l-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
| YOLOv8x-world   | [yolov8x-world.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-world.pt)     | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ❌     |
| YOLOv8x-worldv2 | [yolov8x-worldv2.pt](https://github.com/ultralytics/assets/releases/download/v8.2.0/yolov8x-worldv2.pt) | [Object Detection](../tasks/detect.md) | ✅        | ✅         | ✅       | ✅     |
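
Note that only the `worldv2` weights in the table support export. As a brief sketch (the ONNX format choice here is illustrative), exporting one of them might look like:

```python
from ultralytics import YOLOWorld

# Load an export-capable v2 model from the table above
model = YOLOWorld("yolov8s-worldv2.pt")

# Export to ONNX, one of the formats supported by the Ultralytics export mode
model.export(format="onnx")
```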
### How do I reproduce the official results of YOLO-World from scratch?
To reproduce the official results from scratch, you need to prepare the datasets and launch the training using the provided code. The training procedure involves creating a data dictionary and running the `train` method with a custom trainer:

```python
from ultralytics import YOLOWorld
from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch

# Mix detection data (Objects365) with grounding data (Flickr30k, GQA) for training,
# and validate zero-shot performance on LVIS
data = {
    "train": {
        "yolo_data": ["Objects365.yaml"],
        "grounding_data": [
            {
                "img_path": "../datasets/flickr30k/images",
                "json_file": "../datasets/flickr30k/final_flickr_separateGT_train.json",
            },
            {
                "img_path": "../datasets/GQA/images",
                "json_file": "../datasets/GQA/final_mixed_train_no_coco.json",
            },
        ],
    },
    "val": {"yolo_data": ["lvis.yaml"]},
}

# Build the model from a config file and train with the from-scratch world trainer
model = YOLOWorld("yolov8s-worldv2.yaml")
model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)
```