Side Adapter Network for Open-Vocabulary Semantic Segmentation
Published in CVPR, 2023
The paper introduces the Side Adapter Network (SAN), a framework for open-vocabulary semantic segmentation built on the pre-trained vision-language model CLIP. SAN attaches a side network to CLIP that predicts mask proposals together with attention biases; the biases are applied within CLIP to guide its recognition of each mask's class. The approach adds few trainable parameters, runs quickly at inference, and outperforms other methods on several benchmarks.
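To make the attention-bias idea concrete, here is a minimal toy sketch in plain Python. It is not the SAN implementation. It only assumes that an attention bias is an additive term on the attention logits, which steers a query toward the image patches belonging to one mask proposal. The function names (`softmax`, `biased_attention`) and the toy values are hypothetical and chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention(query, keys, values, bias):
    """Single-query scaled dot-product attention with an additive bias.

    bias[i] is added to the logit of key i before the softmax, so a large
    negative value effectively hides patch i from the query.
    """
    scale = 1.0 / math.sqrt(len(query))
    logits = [
        scale * sum(q * k for q, k in zip(query, key)) + b
        for key, b in zip(keys, bias)
    ]
    weights = softmax(logits)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return output, weights


# Toy setup: three image patches with identical keys but different values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# Without a bias the query attends to all patches equally.
_, uniform_weights = biased_attention(query, keys, values, [0.0, 0.0, 0.0])

# A bias derived from a (hypothetical) mask proposal covering only patch 0.
masked_output, masked_weights = biased_attention(
    query, keys, values, [0.0, -1e9, -1e9]
)
```

With the bias in place, the pooled feature `masked_output` comes only from the patch inside the proposal, which is the kind of per-mask feature that could then be compared against text embeddings for classification.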