Add Docs glossary links (#16448)
Signed-off-by: UltralyticsAssistant <web@ultralytics.com>
Co-authored-by: UltralyticsAssistant <web@ultralytics.com>
parent 8b8c25f216
commit 443fbce194
193 changed files with 1124 additions and 1124 deletions
@@ -4,7 +4,7 @@ description: Learn essential tips, insights, and best practices for deploying co
keywords: Model Deployment, Machine Learning Model Deployment, ML Model Deployment, AI Model Deployment, How to Deploy a Machine Learning Model, How to Deploy ML Models
---
# Best Practices for Model Deployment
# Best Practices for [Model Deployment](https://www.ultralytics.com/glossary/model-deployment)
## Introduction
@@ -31,13 +31,13 @@ With respect to YOLOv8, you can [export your model](../modes/export.md) to diffe
### Choosing a Deployment Environment
Choosing where to deploy your computer vision model depends on multiple factors. Different environments have unique benefits and challenges, so it's essential to pick the one that best fits your needs.
Choosing where to deploy your [computer vision](https://www.ultralytics.com/glossary/computer-vision-cv) model depends on multiple factors. Different environments have unique benefits and challenges, so it's essential to pick the one that best fits your needs.
#### Cloud Deployment
Cloud deployment is great for applications that need to scale up quickly and handle large amounts of data. Platforms like AWS, [Google Cloud](../yolov5/environments/google_cloud_quickstart_tutorial.md), and Azure make it easy to manage your models from training to deployment. They offer services like [AWS SageMaker](../integrations/amazon-sagemaker.md), Google AI Platform, and [Azure Machine Learning](./azureml-quickstart.md) to help you throughout the process.
However, using the cloud can be expensive, especially with high data usage, and you might face latency issues if your users are far from the data centers. To manage costs and performance, it's important to optimize resource use and ensure compliance with data privacy rules.
However, using the cloud can be expensive, especially with high data usage, and you might face latency issues if your users are far from the data centers. To manage costs and performance, it's important to optimize resource use and ensure compliance with [data privacy](https://www.ultralytics.com/glossary/data-privacy) rules.
#### Edge Deployment
@@ -65,7 +65,7 @@ Pruning reduces the size of the model by removing weights that contribute little
### Model Quantization
Quantization converts the model's weights and activations from high precision (like 32-bit floats) to lower precision (like 8-bit integers). By reducing the model size, it speeds up inference. Quantization-aware training (QAT) is a method where the model is trained with quantization in mind, preserving accuracy better than post-training quantization. By handling quantization during the training phase, the model learns to adjust to lower precision, maintaining performance while reducing computational demands.
Quantization converts the model's weights and activations from high [precision](https://www.ultralytics.com/glossary/precision) (like 32-bit floats) to lower precision (like 8-bit integers). By reducing the model size, it speeds up inference. Quantization-aware training (QAT) is a method where the model is trained with quantization in mind, preserving accuracy better than post-training quantization. By handling quantization during the training phase, the model learns to adjust to lower precision, maintaining performance while reducing computational demands.
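As a rough illustration, the sketch below runs post-training INT8 quantization through the Ultralytics export API. Treat it as a starting point rather than a definitive recipe: the exact arguments and supported formats can vary between releases and target runtimes.

```python
from ultralytics import YOLO

# Load a trained checkpoint (any YOLOv8 .pt model works here)
model = YOLO("yolov8n.pt")

# Post-training INT8 quantization during export, e.g. to TFLite.
# int8=True asks the exporter to quantize weights/activations to 8-bit,
# trading a small amount of accuracy for a smaller, faster model.
model.export(format="tflite", int8=True)
```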
<p align="center">
<img width="100%" src="https://miro.medium.com/v2/resize:fit:1032/format:webp/1*Jlq_cyLvRdmp_K5jCd3LkA.png" alt="Model Quantization Overview">
@@ -73,7 +73,7 @@ Quantization converts the model's weights and activations from high precision (l
### Knowledge Distillation
Knowledge distillation involves training a smaller, simpler model (the student) to mimic the outputs of a larger, more complex model (the teacher). The student model learns to approximate the teacher's predictions, resulting in a compact model that retains much of the teacher's accuracy. This technique is beneficial for creating efficient models suitable for deployment on edge devices with constrained resources.
Knowledge distillation involves training a smaller, simpler model (the student) to mimic the outputs of a larger, more complex model (the teacher). The student model learns to approximate the teacher's predictions, resulting in a compact model that retains much of the teacher's [accuracy](https://www.ultralytics.com/glossary/accuracy). This technique is beneficial for creating efficient models suitable for deployment on edge devices with constrained resources.
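Ultralytics does not ship a built-in distillation trainer, but the core idea can be sketched with a generic PyTorch loss for a classification head: the student is trained against the teacher's temperature-softened outputs as well as the ground-truth labels. The temperature `T` and mixing weight `alpha` below are illustrative placeholders, not recommended values.

```python
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine a soft-target loss from the teacher with hard-label cross-entropy."""
    # Soften both distributions with temperature T and match them via KL divergence.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Standard supervised loss on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```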
<p align="center">
<img width="100%" src="https://github.com/ultralytics/docs/releases/download/0/knowledge-distillation-overview.avif" alt="Knowledge Distillation Overview">
@@ -98,7 +98,7 @@ When deploying YOLOv8, several factors can affect model accuracy. Converting mod
### Inferences Are Taking Longer Than You Expected
When deploying machine learning models, it's important that they run efficiently. If inferences are taking longer than expected, it can affect the user experience and the effectiveness of your application. Here are some steps to help you identify and resolve the problem:
When deploying [machine learning](https://www.ultralytics.com/glossary/machine-learning-ml) models, it's important that they run efficiently. If inferences are taking longer than expected, it can affect the user experience and the effectiveness of your application. Here are some steps to help you identify and resolve the problem:
- **Implement Warm-Up Runs**: Initial runs often include setup overhead, which can skew latency measurements. Perform a few warm-up inferences before measuring latency. Excluding these initial runs provides a more accurate measurement of the model's performance.
- **Optimize the Inference Engine:** Double-check that the inference engine is fully optimized for your specific GPU architecture. Use the latest drivers and software versions tailored to your hardware to ensure maximum performance and compatibility.
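To make the warm-up step concrete, here is a minimal timing sketch using the Ultralytics Python API. The dummy input and run counts are arbitrary placeholders; a real benchmark would use representative images and your actual deployment runtime.

```python
import time

import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
image = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for a real input frame

# Warm-up runs absorb one-time costs (weight loading, CUDA context, kernel tuning).
for _ in range(5):
    model(image, verbose=False)

# Timed runs after warm-up give a steadier latency estimate.
runs = 20
start = time.perf_counter()
for _ in range(runs):
    model(image, verbose=False)
print(f"Average latency: {(time.perf_counter() - start) / runs * 1000:.1f} ms")
```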
@@ -124,7 +124,7 @@ It's essential to control who can access your model and its data to prevent unau
### Model Obfuscation
Protecting your model from being reverse-engineered or misused can be done through model obfuscation. It involves encrypting model parameters, such as weights and biases in neural networks, to make it difficult for unauthorized individuals to understand or alter the model. You can also obfuscate the model's architecture by renaming layers and parameters or adding dummy layers, making it harder for attackers to reverse-engineer it. Serving the model in a secure environment, such as a secure enclave or a trusted execution environment (TEE), can also provide an extra layer of protection during inference.
Protecting your model from being reverse-engineered or misused can be done through model obfuscation. It involves encrypting model parameters, such as weights and biases in [neural networks](https://www.ultralytics.com/glossary/neural-network-nn), to make it difficult for unauthorized individuals to understand or alter the model. You can also obfuscate the model's architecture by renaming layers and parameters or adding dummy layers, making it harder for attackers to reverse-engineer it. Serving the model in a secure environment, such as a secure enclave or a trusted execution environment (TEE), can also provide an extra layer of protection during inference.
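As one simple example of the encryption-at-rest part of this idea, the sketch below uses the `cryptography` package to keep a weights file encrypted on disk and decrypt it only in memory at load time. The file names are placeholders, and a real deployment would pair this with proper key management plus the architecture-level obfuscation and TEE options described above.

```python
from cryptography.fernet import Fernet

# Keep the key in a secrets manager or TEE, never alongside the encrypted model.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt the serialized weights so the raw parameters are unreadable at rest.
with open("yolov8n.pt", "rb") as f:
    encrypted_weights = cipher.encrypt(f.read())
with open("yolov8n.pt.enc", "wb") as f:
    f.write(encrypted_weights)

# At startup, decrypt in memory just before handing the bytes to the model loader.
with open("yolov8n.pt.enc", "rb") as f:
    decrypted_weights = cipher.decrypt(f.read())
```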
## Share Ideas With Your Peers