diff --git a/docs/en/modes/train.md b/docs/en/modes/train.md index 5c97f6cf..1175d2e9 100644 --- a/docs/en/modes/train.md +++ b/docs/en/modes/train.md @@ -230,24 +230,25 @@ The training settings for YOLO models encompass various hyperparameters and conf Augmentation techniques are essential for improving the robustness and performance of YOLO models by introducing variability into the training data, helping the model generalize better to unseen data. The following table outlines the purpose and effect of each augmentation argument: -| Argument | Type | Default | Range | Description | -|----------------|---------|---------------|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `hsv_h` | `float` | `0.015` | `0.0 - 1.0` | Adjusts the hue of the image by a fraction of the color wheel, introducing color variability. Helps the model generalize across different lighting conditions. | -| `hsv_s` | `float` | `0.7` | `0.0 - 1.0` | Alters the saturation of the image by a fraction, affecting the intensity of colors. Useful for simulating different environmental conditions. | -| `hsv_v` | `float` | `0.4` | `0.0 - 1.0` | Modifies the value (brightness) of the image by a fraction, helping the model to perform well under various lighting conditions. | -| `degrees` | `float` | `0.0` | `-180 - +180` | Rotates the image randomly within the specified degree range, improving the model's ability to recognize objects at various orientations. | -| `translate` | `float` | `0.1` | `0.0 - 1.0` | Translates the image horizontally and vertically by a fraction of the image size, aiding in learning to detect partially visible objects. | -| `scale` | `float` | `0.5` | `>=0.0` | Scales the image by a gain factor, simulating objects at different distances from the camera. | -| `shear` | `float` | `0.0` | `-180 - +180` | Shears the image by a specified degree, mimicking the effect of objects being viewed from different angles. | -| `perspective` | `float` | `0.0` | `0.0 - 0.001` | Applies a random perspective transformation to the image, enhancing the model's ability to understand objects in 3D space. | -| `flipud` | `float` | `0.0` | `0.0 - 1.0` | Flips the image upside down with the specified probability, increasing the data variability without affecting the object's characteristics. | -| `fliplr` | `float` | `0.5` | `0.0 - 1.0` | Flips the image left to right with the specified probability, useful for learning symmetrical objects and increasing dataset diversity. | -| `bgr` | `float` | `0.0` | `0.0 - 1.0` | Flips the image channels from RGB to BGR with the specified probability, useful for increasing robustness to incorrect channel ordering. | -| `mosaic` | `float` | `1.0` | `0.0 - 1.0` | Combines four training images into one, simulating different scene compositions and object interactions. Highly effective for complex scene understanding. | -| `mixup` | `float` | `0.0` | `0.0 - 1.0` | Blends two images and their labels, creating a composite image. Enhances the model's ability to generalize by introducing label noise and visual variability. | -| `copy_paste` | `float` | `0.0` | `0.0 - 1.0` | Copies objects from one image and pastes them onto another, useful for increasing object instances and learning object occlusion. | -| `auto_augment` | `str` | `randaugment` | - | Automatically applies a predefined augmentation policy (`randaugment`, `autoaugment`, `augmix`), optimizing for classification tasks by diversifying the visual features. | -| `erasing` | `float` | `0.4` | `0.0 - 1.0` | Randomly erases a portion of the image during classification training, encouraging the model to focus on less obvious features for recognition. | +| Argument | Type | Default | Range | Description | +|-----------------|---------|---------------|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `hsv_h` | `float` | `0.015` | `0.0 - 1.0` | Adjusts the hue of the image by a fraction of the color wheel, introducing color variability. Helps the model generalize across different lighting conditions. | +| `hsv_s` | `float` | `0.7` | `0.0 - 1.0` | Alters the saturation of the image by a fraction, affecting the intensity of colors. Useful for simulating different environmental conditions. | +| `hsv_v` | `float` | `0.4` | `0.0 - 1.0` | Modifies the value (brightness) of the image by a fraction, helping the model to perform well under various lighting conditions. | +| `degrees` | `float` | `0.0` | `-180 - +180` | Rotates the image randomly within the specified degree range, improving the model's ability to recognize objects at various orientations. | +| `translate` | `float` | `0.1` | `0.0 - 1.0` | Translates the image horizontally and vertically by a fraction of the image size, aiding in learning to detect partially visible objects. | +| `scale` | `float` | `0.5` | `>=0.0` | Scales the image by a gain factor, simulating objects at different distances from the camera. | +| `shear` | `float` | `0.0` | `-180 - +180` | Shears the image by a specified degree, mimicking the effect of objects being viewed from different angles. | +| `perspective` | `float` | `0.0` | `0.0 - 0.001` | Applies a random perspective transformation to the image, enhancing the model's ability to understand objects in 3D space. | +| `flipud` | `float` | `0.0` | `0.0 - 1.0` | Flips the image upside down with the specified probability, increasing the data variability without affecting the object's characteristics. | +| `fliplr` | `float` | `0.5` | `0.0 - 1.0` | Flips the image left to right with the specified probability, useful for learning symmetrical objects and increasing dataset diversity. | +| `bgr` | `float` | `0.0` | `0.0 - 1.0` | Flips the image channels from RGB to BGR with the specified probability, useful for increasing robustness to incorrect channel ordering. | +| `mosaic` | `float` | `1.0` | `0.0 - 1.0` | Combines four training images into one, simulating different scene compositions and object interactions. Highly effective for complex scene understanding. | +| `mixup` | `float` | `0.0` | `0.0 - 1.0` | Blends two images and their labels, creating a composite image. Enhances the model's ability to generalize by introducing label noise and visual variability. | +| `copy_paste` | `float` | `0.0` | `0.0 - 1.0` | Copies objects from one image and pastes them onto another, useful for increasing object instances and learning object occlusion. | +| `auto_augment` | `str` | `randaugment` | - | Automatically applies a predefined augmentation policy (`randaugment`, `autoaugment`, `augmix`), optimizing for classification tasks by diversifying the visual features. | +| `erasing` | `float` | `0.4` | `0.0 - 0.9` | Randomly erases a portion of the image during classification training, encouraging the model to focus on less obvious features for recognition. | +| `crop_fraction` | `float` | `1.0` | `0.1 - 1.0` | Crops the classification image to a fraction of its size to emphasize central features and adapt to object scales, reducing background distractions. | These settings can be adjusted to meet the specific requirements of the dataset and task at hand. Experimenting with different values can help find the optimal augmentation strategy that leads to the best model performance. diff --git a/docs/en/usage/cfg.md b/docs/en/usage/cfg.md index 07d6db54..833932e8 100644 --- a/docs/en/usage/cfg.md +++ b/docs/en/usage/cfg.md @@ -229,24 +229,25 @@ It is crucial to thoughtfully configure these settings to ensure the exported mo Augmentation techniques are essential for improving the robustness and performance of YOLO models by introducing variability into the training data, helping the model generalize better to unseen data. The following table outlines the purpose and effect of each augmentation argument: -| Argument | Type | Default | Range | Description | -|----------------|---------|---------------|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `hsv_h` | `float` | `0.015` | `0.0 - 1.0` | Adjusts the hue of the image by a fraction of the color wheel, introducing color variability. Helps the model generalize across different lighting conditions. | -| `hsv_s` | `float` | `0.7` | `0.0 - 1.0` | Alters the saturation of the image by a fraction, affecting the intensity of colors. Useful for simulating different environmental conditions. | -| `hsv_v` | `float` | `0.4` | `0.0 - 1.0` | Modifies the value (brightness) of the image by a fraction, helping the model to perform well under various lighting conditions. | -| `degrees` | `float` | `0.0` | `-180 - +180` | Rotates the image randomly within the specified degree range, improving the model's ability to recognize objects at various orientations. | -| `translate` | `float` | `0.1` | `0.0 - 1.0` | Translates the image horizontally and vertically by a fraction of the image size, aiding in learning to detect partially visible objects. | -| `scale` | `float` | `0.5` | `>=0.0` | Scales the image by a gain factor, simulating objects at different distances from the camera. | -| `shear` | `float` | `0.0` | `-180 - +180` | Shears the image by a specified degree, mimicking the effect of objects being viewed from different angles. | -| `perspective` | `float` | `0.0` | `0.0 - 0.001` | Applies a random perspective transformation to the image, enhancing the model's ability to understand objects in 3D space. | -| `flipud` | `float` | `0.0` | `0.0 - 1.0` | Flips the image upside down with the specified probability, increasing the data variability without affecting the object's characteristics. | -| `fliplr` | `float` | `0.5` | `0.0 - 1.0` | Flips the image left to right with the specified probability, useful for learning symmetrical objects and increasing dataset diversity. | -| `bgr` | `float` | `0.0` | `0.0 - 1.0` | Flips the image channels from RGB to BGR with the specified probability, useful for increasing robustness to incorrect channel ordering. | -| `mosaic` | `float` | `1.0` | `0.0 - 1.0` | Combines four training images into one, simulating different scene compositions and object interactions. Highly effective for complex scene understanding. | -| `mixup` | `float` | `0.0` | `0.0 - 1.0` | Blends two images and their labels, creating a composite image. Enhances the model's ability to generalize by introducing label noise and visual variability. | -| `copy_paste` | `float` | `0.0` | `0.0 - 1.0` | Copies objects from one image and pastes them onto another, useful for increasing object instances and learning object occlusion. | -| `auto_augment` | `str` | `randaugment` | - | Automatically applies a predefined augmentation policy (`randaugment`, `autoaugment`, `augmix`), optimizing for classification tasks by diversifying the visual features. | -| `erasing` | `float` | `0.4` | `0.0 - 1.0` | Randomly erases a portion of the image during classification training, encouraging the model to focus on less obvious features for recognition. | +| Argument | Type | Default | Range | Description | +|-----------------|---------|---------------|---------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `hsv_h` | `float` | `0.015` | `0.0 - 1.0` | Adjusts the hue of the image by a fraction of the color wheel, introducing color variability. Helps the model generalize across different lighting conditions. | +| `hsv_s` | `float` | `0.7` | `0.0 - 1.0` | Alters the saturation of the image by a fraction, affecting the intensity of colors. Useful for simulating different environmental conditions. | +| `hsv_v` | `float` | `0.4` | `0.0 - 1.0` | Modifies the value (brightness) of the image by a fraction, helping the model to perform well under various lighting conditions. | +| `degrees` | `float` | `0.0` | `-180 - +180` | Rotates the image randomly within the specified degree range, improving the model's ability to recognize objects at various orientations. | +| `translate` | `float` | `0.1` | `0.0 - 1.0` | Translates the image horizontally and vertically by a fraction of the image size, aiding in learning to detect partially visible objects. | +| `scale` | `float` | `0.5` | `>=0.0` | Scales the image by a gain factor, simulating objects at different distances from the camera. | +| `shear` | `float` | `0.0` | `-180 - +180` | Shears the image by a specified degree, mimicking the effect of objects being viewed from different angles. | +| `perspective` | `float` | `0.0` | `0.0 - 0.001` | Applies a random perspective transformation to the image, enhancing the model's ability to understand objects in 3D space. | +| `flipud` | `float` | `0.0` | `0.0 - 1.0` | Flips the image upside down with the specified probability, increasing the data variability without affecting the object's characteristics. | +| `fliplr` | `float` | `0.5` | `0.0 - 1.0` | Flips the image left to right with the specified probability, useful for learning symmetrical objects and increasing dataset diversity. | +| `bgr` | `float` | `0.0` | `0.0 - 1.0` | Flips the image channels from RGB to BGR with the specified probability, useful for increasing robustness to incorrect channel ordering. | +| `mosaic` | `float` | `1.0` | `0.0 - 1.0` | Combines four training images into one, simulating different scene compositions and object interactions. Highly effective for complex scene understanding. | +| `mixup` | `float` | `0.0` | `0.0 - 1.0` | Blends two images and their labels, creating a composite image. Enhances the model's ability to generalize by introducing label noise and visual variability. | +| `copy_paste` | `float` | `0.0` | `0.0 - 1.0` | Copies objects from one image and pastes them onto another, useful for increasing object instances and learning object occlusion. | +| `auto_augment` | `str` | `randaugment` | - | Automatically applies a predefined augmentation policy (`randaugment`, `autoaugment`, `augmix`), optimizing for classification tasks by diversifying the visual features. | +| `erasing` | `float` | `0.4` | `0.0 - 0.9` | Randomly erases a portion of the image during classification training, encouraging the model to focus on less obvious features for recognition. | +| `crop_fraction` | `float` | `1.0` | `0.1 - 1.0` | Crops the classification image to a fraction of its size to emphasize central features and adapt to object scales, reducing background distractions. | These settings can be adjusted to meet the specific requirements of the dataset and task at hand. Experimenting with different values can help find the optimal augmentation strategy that leads to the best model performance. diff --git a/ultralytics/cfg/default.yaml b/ultralytics/cfg/default.yaml index 2e165845..78b5f43e 100644 --- a/ultralytics/cfg/default.yaml +++ b/ultralytics/cfg/default.yaml @@ -116,8 +116,8 @@ mosaic: 1.0 # (float) image mosaic (probability) mixup: 0.0 # (float) image mixup (probability) copy_paste: 0.0 # (float) segment copy-paste (probability) auto_augment: randaugment # (str) auto augmentation policy for classification (randaugment, autoaugment, augmix) -erasing: 0.4 # (float) probability of random erasing during classification training (0-1) -crop_fraction: 1.0 # (float) image crop fraction for classification evaluation/inference (0-1) +erasing: 0.4 # (float) probability of random erasing during classification training (0-0.9), 0 means no erasing, must be less than 1.0. +crop_fraction: 1.0 # (float) image crop fraction for classification (0.1-1), 1.0 means no crop, must be greater than 0. # Custom config.yaml --------------------------------------------------------------------------------------------------- cfg: # (str, optional) for overriding defaults.yaml diff --git a/ultralytics/data/augment.py b/ultralytics/data/augment.py index aab3e626..9d714165 100644 --- a/ultralytics/data/augment.py +++ b/ultralytics/data/augment.py @@ -1056,7 +1056,7 @@ def classify_transforms( return T.Compose(tfl) -# Classification augmentations train --------------------------------------------------------------------------------------- +# Classification training augmentations -------------------------------------------------------------------------------- def classify_augmentations( size=224, mean=DEFAULT_MEAN,