Efficient SAM2 replaces the naive grid-based prompting strategy in Automatic SAM with a training-free, content-aware alternative. By clustering attention scores from the vision encoder with HDBSCAN, it generates semantically meaningful point prompts that improve segmentation quality while using up to 90% fewer decoder passes.
Tested across instance, salient object, and camouflaged object segmentation benchmarks (COCO, LVIS, SA-1B, and others), it consistently outperforms standard grid prompting at a substantially lower computational cost. Read the full write-up in the blog post.
