ultralytics 8.1.39 add YOLO-World training (#9268)
Signed-off-by: Glenn Jocher <glenn.jocher@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
parent 18036908d4
commit e9187c1296
34 changed files with 2166 additions and 100 deletions

@@ -147,7 +147,7 @@ FastSAM is also available directly from the [https://github.com/CASIA-IVA-Lab/Fa

4. Install the CLIP model:

```shell
-pip install git+https://github.com/openai/CLIP.git
+pip install git+https://github.com/ultralytics/CLIP.git
```

### Example Usage

@@ -64,6 +64,39 @@ This section details the models available with their specific pre-trained weight

The YOLO-World models are easy to integrate into your Python applications. Ultralytics provides a user-friendly Python API and CLI commands to streamline development.
### Train Usage

!!! Tip "Tip"

    We strongly recommend using the `yolov8-worldv2` model for custom training, because it supports deterministic training and is easy to export to other formats, e.g. ONNX/TensorRT.
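
    For illustration, that export path might look like the following minimal sketch, assuming the standard Ultralytics `export` API (exact arguments may vary by version):

    ```python
    from ultralytics import YOLOWorld

    # Load a pretrained YOLOv8s-worldv2 model
    model = YOLOWorld('yolov8s-worldv2.pt')

    # Export to ONNX; TensorRT is reachable via format='engine'
    model.export(format='onnx')
    ```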

Training is straightforward with the `train` method, as illustrated below:

!!! Example

    === "Python"

        PyTorch pretrained `*.pt` models as well as configuration `*.yaml` files can be passed to the `YOLOWorld()` class to create a model instance in Python:

        ```python
        from ultralytics import YOLOWorld

        # Load a pretrained YOLOv8s-worldv2 model
        model = YOLOWorld('yolov8s-worldv2.pt')

        # Train the model on the COCO8 example dataset for 100 epochs
        results = model.train(data='coco8.yaml', epochs=100, imgsz=640)

        # Run inference with the trained YOLOv8s-worldv2 model on the 'bus.jpg' image
        results = model('path/to/bus.jpg')
        ```

    === "CLI"

        ```bash
        # Train a YOLOv8s-worldv2 model from a YAML configuration on the COCO8 example dataset for 100 epochs
        yolo train model=yolov8s-worldv2.yaml data=coco8.yaml epochs=100 imgsz=640
        ```
### Predict Usage
Object detection is straightforward with the `predict` method, as illustrated below:
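
A minimal sketch, assuming the same `YOLOWorld` API used in the training example above:

```python
from ultralytics import YOLOWorld

# Initialize a pretrained YOLO-World model
model = YOLOWorld('yolov8s-worldv2.pt')

# Run inference on an image
results = model.predict('path/to/bus.jpg')

# Display the results
results[0].show()
```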

@@ -196,6 +229,59 @@ You can also save a model after setting custom classes. By doing this you create

This approach provides a powerful means of customizing state-of-the-art object detection models for specific tasks, making advanced AI more accessible and applicable to a broader range of practical applications.
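
As a rough sketch of that customization flow, assuming the `set_classes` and `save` methods described above (model filenames here are placeholders):

```python
from ultralytics import YOLOWorld

# Load a pretrained model and define a custom vocabulary
model = YOLOWorld('yolov8s-world.pt')
model.set_classes(["person", "bus"])

# Save the model with the custom vocabulary embedded
model.save("custom_yolov8s.pt")
```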

## Reproduce Official Results from Scratch (Experimental)

### Prepare Datasets
- Train data

| Dataset                                                           | Type      | Samples | Boxes | Annotation Files                                                                                                                           |
|-------------------------------------------------------------------|-----------|---------|-------|--------------------------------------------------------------------------------------------------------------------------------------------|
| [Objects365v1](https://opendatalab.com/OpenDataLab/Objects365_v1) | Detection | 609k    | 9621k | [objects365_train.json](https://opendatalab.com/OpenDataLab/Objects365_v1)                                                                 |
| [GQA](https://nlp.stanford.edu/data/gqa/images.zip)               | Grounding | 621k    | 3681k | [final_mixed_train_no_coco.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_mixed_train_no_coco.json)         |
| [Flickr30k](https://shannon.cs.illinois.edu/DenotationGraph/)     | Grounding | 149k    | 641k  | [final_flickr_separateGT_train.json](https://huggingface.co/GLIPModel/GLIP/blob/main/mdetr_annotations/final_flickr_separateGT_train.json) |
- Val data

| Dataset                                                                                                  | Type      | Annotation Files                                                                                        |
|----------------------------------------------------------------------------------------------------------|-----------|---------------------------------------------------------------------------------------------------------|
| [LVIS minival](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) | Detection | [minival.txt](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/cfg/datasets/lvis.yaml) |

### Launch Training from Scratch
!!! Note

    `WorldTrainerFromScratch` is highly customized to allow training YOLO-World models on both detection datasets and grounding datasets simultaneously. For more details, please check out [ultralytics.models.yolo.world.train_world.py](https://github.com/ultralytics/ultralytics/blob/main/ultralytics/models/yolo/world/train_world.py).

!!! Example

    === "Python"

        ```python
        from ultralytics import YOLOWorld
        from ultralytics.models.yolo.world.train_world import WorldTrainerFromScratch

        # Mixed training sources: one detection dataset plus two grounding datasets
        data = dict(
            train=dict(
                yolo_data=["Objects365.yaml"],
                grounding_data=[
                    dict(
                        img_path="../datasets/flickr30k/images",
                        json_file="../datasets/flickr30k/final_flickr_separateGT_train.json",
                    ),
                    dict(
                        img_path="../datasets/GQA/images",
                        json_file="../datasets/GQA/final_mixed_train_no_coco.json",
                    ),
                ],
            ),
            val=dict(yolo_data=["lvis.yaml"]),
        )

        # Build the model from a YAML configuration and train with the custom trainer
        model = YOLOWorld("yolov8s-worldv2.yaml")
        model.train(data=data, batch=128, epochs=100, trainer=WorldTrainerFromScratch)
        ```
## Citations and Acknowledgements
We extend our gratitude to the [Tencent AILab Computer Vision Center](https://ai.tencent.com/) for their pioneering work in real-time open-vocabulary object detection with YOLO-World:
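
```bibtex
@article{cheng2024yoloworld,
  title={YOLO-World: Real-Time Open-Vocabulary Object Detection},
  author={Cheng, Tianheng and Song, Lin and Ge, Yixiao and Liu, Wenyu and Wang, Xinggang and Shan, Ying},
  journal={arXiv preprint arXiv:2401.17270},
  year={2024}
}
```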