Add Docs glossary links (#16448)

Signed-off-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
Glenn Jocher 2024-09-23 23:48:46 +02:00 committed by GitHub
parent 8b8c25f216
commit 443fbce194
193 changed files with 1124 additions and 1124 deletions


@@ -4,7 +4,7 @@ description: Learn essential data preprocessing techniques for annotated compute
keywords: data preprocessing, computer vision, image resizing, normalization, data augmentation, training dataset, validation dataset, test dataset, YOLOv8
---
-# Data Preprocessing Techniques for Annotated Computer Vision Data
+# Data Preprocessing Techniques for Annotated [Computer Vision](https://www.ultralytics.com/glossary/computer-vision-cv) Data
## Introduction
@@ -33,7 +33,7 @@ You can resize your images using the following methods:
To make resizing a simpler task, you can use the following tools:
-- **OpenCV**: A popular computer vision library with extensive functions for image processing.
+- **[OpenCV](https://www.ultralytics.com/glossary/opencv)**: A popular computer vision library with extensive functions for image processing.
- **PIL (Pillow)**: A Python Imaging Library for opening, manipulating, and saving image files.
With respect to YOLOv8, the 'imgsz' parameter during [model training](../modes/train.md) allows for flexible input sizes. When set to a specific size, such as 640, the model will resize input images so their largest dimension is 640 pixels while maintaining the original aspect ratio.
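For illustration, a minimal sketch of the behavior described above; the `coco8.yaml` dataset and `yolov8n.pt` weights are placeholder choices, not part of the original text:

```python
from ultralytics import YOLO

# Load a small pretrained model (placeholder choice for illustration)
model = YOLO("yolov8n.pt")

# With imgsz=640 the model resizes inputs itself during training,
# so no manual resizing of the dataset is required beforehand
model.train(data="coco8.yaml", epochs=3, imgsz=640)
```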
@@ -65,7 +65,7 @@ The most commonly discussed data preprocessing step is data augmentation. Data a
Here are some other benefits of data augmentation:
- **Creates a More Robust Dataset**: Data augmentation can make the model more robust to variations and distortions in the input data. This includes changes in lighting, orientation, and scale.
-- **Cost-Effective**: Data augmentation is a cost-effective way to increase the amount of training data without collecting and labeling new data.
+- **Cost-Effective**: Data augmentation is a cost-effective way to increase the amount of [training data](https://www.ultralytics.com/glossary/training-data) without collecting and labeling new data.
- **Better Use of Data**: Every available data point is used to its maximum potential by creating new variations
#### Data Augmentation Methods
@@ -96,7 +96,7 @@ Here's what each step of preprocessing would look like for this project:
- Resizing Images: Since YOLOv8 handles flexible input sizes and performs resizing automatically, manual resizing is not required. The model will adjust the image size according to the specified 'imgsz' parameter during training.
- Normalizing Pixel Values: YOLOv8 automatically normalizes pixel values to a range of 0 to 1 during preprocessing, so it's not required.
- Splitting the Dataset: Divide the dataset into training (70%), validation (20%), and test (10%) sets using tools like scikit-learn.
-- Data Augmentation: Modify the dataset configuration file (.yaml) to include data augmentation techniques such as random crops, horizontal flips, and brightness adjustments.
+- [Data Augmentation](https://www.ultralytics.com/glossary/data-augmentation): Modify the dataset configuration file (.yaml) to include data augmentation techniques such as random crops, horizontal flips, and brightness adjustments.
These steps make sure the dataset is prepared without any potential issues and is ready for Exploratory Data Analysis (EDA).
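As a sketch of the splitting step listed above, a 70/20/10 split can be produced with two passes of scikit-learn's `train_test_split`; `image_paths` is a hypothetical list of annotated image files:

```python
from sklearn.model_selection import train_test_split

# Hypothetical list of annotated image paths
image_paths = [f"images/img_{i:04d}.jpg" for i in range(1000)]

# 70% train, then split the remaining 30% into 20% val / 10% test
train_files, rest = train_test_split(image_paths, test_size=0.30, random_state=42)
val_files, test_files = train_test_split(rest, test_size=1 / 3, random_state=42)
print(len(train_files), len(val_files), len(test_files))  # 700 200 100
```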
@@ -120,7 +120,7 @@ Common tools for visualizations include:
### Using Ultralytics Explorer for EDA
-For a more advanced approach to EDA, you can use the Ultralytics Explorer tool. It offers robust capabilities for exploring computer vision datasets. By supporting semantic search, SQL queries, and vector similarity search, the tool makes it easy to analyze and understand your data. With Ultralytics Explorer, you can create embeddings for your dataset to find similar images, run SQL queries for detailed analysis, and perform semantic searches, all through a user-friendly graphical interface.
+For a more advanced approach to EDA, you can use the Ultralytics Explorer tool. It offers robust capabilities for exploring computer vision datasets. By supporting semantic search, SQL queries, and vector similarity search, the tool makes it easy to analyze and understand your data. With Ultralytics Explorer, you can create [embeddings](https://www.ultralytics.com/glossary/embeddings) for your dataset to find similar images, run SQL queries for detailed analysis, and perform semantic searches, all through a user-friendly graphical interface.
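A rough sketch of that workflow, based on the Explorer API as documented around this release; the method names (`create_embeddings_table`, `get_similar`, `sql_query`) and the `coco8.yaml`/`yolov8n.pt` pair are assumptions that should be checked against the installed version:

```python
from ultralytics import Explorer

# Build an Explorer over a dataset/model pair (placeholder names)
explorer = Explorer(data="coco8.yaml", model="yolov8n.pt")

# Create embeddings for the dataset, then query it
explorer.create_embeddings_table()
similar = explorer.get_similar(idx=0, limit=10)  # vector similarity search
rows = explorer.sql_query("WHERE labels LIKE '%person%' LIMIT 10")  # SQL-style filter
```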
<p align="center">
<img width="100%" src="https://github.com/ultralytics/docs/releases/download/0/ultralytics-explorer-openai-integration.avif" alt="Overview of Ultralytics Explorer">
@@ -151,7 +151,7 @@ Data preprocessing is essential in computer vision projects because it ensures t
### How can I use Ultralytics YOLO for data augmentation?
-For data augmentation with Ultralytics YOLOv8, you need to modify the dataset configuration file (.yaml). In this file, you can specify various augmentation techniques such as random crops, horizontal flips, and brightness adjustments. This can be effectively done using the training configurations [explained here](../modes/train.md). Data augmentation helps create a more robust dataset, reduce overfitting, and improve model generalization.
+For data augmentation with Ultralytics YOLOv8, you need to modify the dataset configuration file (.yaml). In this file, you can specify various augmentation techniques such as random crops, horizontal flips, and brightness adjustments. This can be effectively done using the training configurations [explained here](../modes/train.md). Data augmentation helps create a more robust dataset, reduce [overfitting](https://www.ultralytics.com/glossary/overfitting), and improve model generalization.
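As an alternative to editing the configuration file, the same augmentation settings can be passed directly as training arguments; a minimal sketch with illustrative, untuned values, to be checked against the train settings reference:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Augmentation hyperparameters passed through the training configuration
# (values are illustrative, not tuned recommendations)
model.train(
    data="coco8.yaml",
    epochs=10,
    fliplr=0.5,    # probability of a horizontal flip
    degrees=10.0,  # random rotation range in degrees
    hsv_v=0.4,     # brightness (value) jitter fraction
    scale=0.5,     # random scale jitter, a crop-like effect
)
```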
### What are the best data normalization techniques for computer vision data?
@@ -164,7 +164,7 @@ For YOLOv8, normalization is handled automatically, including conversion to RGB
### How should I split my annotated dataset for training?
-To split your dataset, a common practice is to divide it into 70% for training, 20% for validation, and 10% for testing. It is important to maintain the data distribution of classes across these splits and avoid data leakage by performing augmentation only on the training set. Use tools like scikit-learn or TensorFlow for efficient dataset splitting. See the detailed guide on [dataset preparation](../guides/data-collection-and-annotation.md).
+To split your dataset, a common practice is to divide it into 70% for training, 20% for validation, and 10% for testing. It is important to maintain the data distribution of classes across these splits and avoid data leakage by performing augmentation only on the training set. Use tools like scikit-learn or [TensorFlow](https://www.ultralytics.com/glossary/tensorflow) for efficient dataset splitting. See the detailed guide on [dataset preparation](../guides/data-collection-and-annotation.md).
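To keep the class distribution consistent across splits as described above, `train_test_split` accepts a `stratify` argument; `labels` below is a hypothetical per-image dominant class, used only for illustration:

```python
from sklearn.model_selection import train_test_split

# Hypothetical image list with one dominant class label per image
image_paths = [f"images/img_{i:04d}.jpg" for i in range(1000)]
labels = [i % 3 for i in range(1000)]  # three classes, for illustration

# Stratified 70/30 split, then 20/10 from the remainder, preserving class ratios
train_x, rest_x, train_y, rest_y = train_test_split(
    image_paths, labels, test_size=0.30, stratify=labels, random_state=42
)
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=1 / 3, stratify=rest_y, random_state=42
)
```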
### Can I handle varying image sizes in YOLOv8 without manual resizing?