ultralytics 8.0.134 add MobileSAM support (#3474)
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ayush Chaurasia <ayush.chaurarsia@gmail.com>
Co-authored-by: Laughing <61612323+Laughing-q@users.noreply.github.com>
Co-authored-by: Laughing-q <1185102784@qq.com>
Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
This commit is contained in:
parent c55a98ab8e
commit 201e69e4e4
32 changed files with 1472 additions and 841 deletions
99
docs/models/mobile-sam.md
Normal file
@@ -0,0 +1,99 @@
---
comments: true
description: MobileSAM is a lightweight adaptation of the Segment Anything Model (SAM) designed for mobile applications. It maintains the full functionality of the original SAM while significantly improving speed, making it suitable for CPU-only edge devices, such as mobile phones.
keywords: MobileSAM, Faster Segment Anything, Segment Anything, Segment Anything Model, SAM, Meta SAM, image segmentation, promptable segmentation, zero-shot performance, SA-1B dataset, advanced architecture, auto-annotation, Ultralytics, pre-trained models, SAM base, SAM large, instance segmentation, computer vision, AI, artificial intelligence, machine learning, data annotation, segmentation masks, detection model, YOLO detection model, bibtex, Meta AI
---

![MobileSAM Logo](https://github.com/ChaoningZhang/MobileSAM/blob/master/assets/logo2.png?raw=true)

# Faster Segment Anything (MobileSAM)

The MobileSAM paper is now available on [ResearchGate](https://www.researchgate.net/publication/371851844_Faster_Segment_Anything_Towards_Lightweight_SAM_for_Mobile_Applications) and [arXiv](https://arxiv.org/pdf/2306.14289.pdf). The most recent version appears on ResearchGate first, since content updates on arXiv are delayed.

A demonstration of MobileSAM running on a CPU is available at this [demo link](https://huggingface.co/spaces/dhkim2810/MobileSAM). Inference on a Mac i5 CPU takes approximately 3 seconds. The Hugging Face demo runs on lower-performance CPUs and adds interface overhead, so responses there are slower, but the model continues to function effectively.

MobileSAM is implemented in various projects including [Grounding-SAM](https://github.com/IDEA-Research/Grounded-Segment-Anything), [AnyLabeling](https://github.com/vietanhdev/anylabeling), and [SegmentAnythingin3D](https://github.com/Jumpat/SegmentAnythingin3D).

MobileSAM was trained on a single GPU with a 100k-image dataset (1% of the original images) in less than a day. The training code will be made available in the future.

## Adapting from SAM to MobileSAM

Since MobileSAM retains the same pipeline as the original SAM, we have incorporated the original's pre-processing, post-processing, and all other interfaces. Consequently, those currently using the original SAM can transition to MobileSAM with minimal effort.

MobileSAM performs comparably to the original SAM and retains the same pipeline except for a change in the image encoder: the original heavyweight ViT-H encoder (632M parameters) is replaced with a smaller Tiny-ViT encoder (5M parameters). On a single GPU, MobileSAM runs at about 12ms per image: 8ms for the image encoder and 4ms for the mask decoder.
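
As a minimal sketch of this drop-in transition (assuming the `sam_b.pt` weights described on the [SAM page](./sam.md)), only the weights file passed to `SAM` changes:

```python
from ultralytics import SAM

# Original SAM would be: model = SAM('sam_b.pt')
# MobileSAM uses the same class and prompt interface; only the weights change
model = SAM('mobile_sam.pt')

# The same point prompt works with either set of weights
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```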

The following table provides a comparison of ViT-based image encoders:

| Image Encoder | Original SAM | MobileSAM |
|---------------|--------------|-----------|
| Parameters    | 611M         | 5M        |
| Speed         | 452ms        | 8ms       |

Both the original SAM and MobileSAM utilize the same prompt-guided mask decoder:

| Mask Decoder | Original SAM | MobileSAM |
|--------------|--------------|-----------|
| Parameters   | 3.876M       | 3.876M    |
| Speed        | 4ms          | 4ms       |

Here is a comparison of the whole pipeline:

| Whole Pipeline (Enc+Dec) | Original SAM | MobileSAM |
|--------------------------|--------------|-----------|
| Parameters               | 615M         | 9.66M     |
| Speed                    | 456ms        | 12ms      |

The performance of MobileSAM and the original SAM is demonstrated below using both a point and a box as prompts.

![Image with Point as Prompt](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/mask_box.jpg?raw=true)

![Image with Box as Prompt](https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/assets/mask_box.jpg?raw=true)

MobileSAM is approximately 5 times smaller and 7 times faster than the current FastSAM, while delivering superior performance. More details are available at the [MobileSAM project page](https://github.com/ChaoningZhang/MobileSAM).

## Testing MobileSAM in Ultralytics

Just like the original SAM, we offer a straightforward testing method in Ultralytics, including modes for both Point and Box prompts.

### Model Download

You can download the model [here](https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/mobile_sam.pt).

### Point Prompt

```python
from ultralytics import SAM

# Load the model
model = SAM('mobile_sam.pt')

# Predict a segment based on a point prompt
model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])
```
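
To inspect or save the predicted mask, a minimal sketch (assuming the standard Ultralytics `Results` API, where `plot()` returns an annotated BGR array) is:

```python
import cv2

from ultralytics import SAM

# Load the model
model = SAM('mobile_sam.pt')

# predict() returns a list of Results objects, one per input image
results = model.predict('ultralytics/assets/zidane.jpg', points=[900, 370], labels=[1])

# Draw the predicted masks onto the image and save it
annotated = results[0].plot()
cv2.imwrite('mobile_sam_point.jpg', annotated)
```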

### Box Prompt

```python
from ultralytics import SAM

# Load the model
model = SAM('mobile_sam.pt')

# Predict a segment based on a box prompt
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709])
```
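
To get a rough sense of the CPU speed quoted earlier, a simple timing sketch (results will vary with hardware; `device='cpu'` is a standard Ultralytics predict argument) might look like:

```python
import time

from ultralytics import SAM

model = SAM('mobile_sam.pt')

# Warm-up run so model loading and initialization are not counted
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709], device='cpu')

# Time a single prompted prediction on the CPU
start = time.perf_counter()
model.predict('ultralytics/assets/zidane.jpg', bboxes=[439, 437, 524, 709], device='cpu')
print(f'CPU inference time: {time.perf_counter() - start:.2f}s')
```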

We have implemented `MobileSAM` and `SAM` using the same API. For more usage information, please see the [SAM page](./sam.md).
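
For example, mirroring the usage shown on the SAM page, calling the model without prompts is assumed here to segment the whole image:

```python
from ultralytics import SAM

# Load the model and print a parameter summary
model = SAM('mobile_sam.pt')
model.info()

# With no point or box prompts, the whole image is segmented
model('ultralytics/assets/zidane.jpg')
```
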
### Citing MobileSAM

If you find MobileSAM useful in your research or development work, please consider citing our paper:

```bibtex
@article{mobile_sam,
  title={Faster Segment Anything: Towards Lightweight SAM for Mobile Applications},
  author={Zhang, Chaoning and Han, Dongshen and Qiao, Yu and Kim, Jung Uk and Bae, Sung Ho and Lee, Seungkyu and Hong, Choong Seon},
  journal={arXiv preprint arXiv:2306.14289},
  year={2023}
}
```