Publications

Side Adapter Network for Open-Vocabulary Semantic Segmentation

Published in CVPR, 2023

This paper introduces the Side Adapter Network (SAN), a framework for open-vocabulary semantic segmentation built on a pre-trained vision-language model, CLIP. SAN attaches a lightweight side network to a frozen CLIP model; the side network predicts mask proposals and attention biases, and the biases are applied inside CLIP to recognize the class of each mask. The approach is accurate and fast at inference while adding few trainable parameters, and it outperforms other methods on multiple benchmarks.
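The role of an attention bias can be illustrated with a toy example. The sketch below is not the SAN implementation; it only assumes that a bias is added to attention logits before the softmax, so that a strongly negative bias removes a position from the attention. The function names and the numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(scores, bias):
    """Add a per-position bias to attention logits, then normalize.

    Assumption for illustration: the bias is additive and has one entry
    per attended position.
    """
    if len(scores) != len(bias):
        raise ValueError("scores and bias must have the same length")
    return softmax([s + b for s, b in zip(scores, bias)])


# Four positions with equal raw scores; the bias suppresses the last two,
# as if they fell outside a mask proposal.
scores = [1.0, 1.0, 1.0, 1.0]
bias = [0.0, 0.0, -1e4, -1e4]
weights = biased_attention_weights(scores, bias)
print(weights)
```

With these inputs the attention mass is split between the first two positions, and the suppressed positions receive essentially no weight.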

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model

Published in ECCV, 2022

This paper introduces a two-stage framework for open-vocabulary semantic segmentation: the first stage generates mask proposals, and the second classifies them with an off-the-shelf vision-language model, CLIP. The framework outperforms previous methods on zero-shot semantic segmentation and serves as a strong baseline for future research.
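The classification step of such a pipeline can be sketched as a nearest-text-embedding lookup. This is a minimal illustration, not the paper's code: it assumes each mask proposal and each class name has already been encoded into a vector (for example by CLIP's image and text encoders), and the toy two-dimensional embeddings and the function names are made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_proposals(proposal_embeddings, text_embeddings, class_names):
    """Assign each proposal the class whose text embedding is most similar."""
    labels = []
    for embedding in proposal_embeddings:
        sims = [cosine_similarity(embedding, t) for t in text_embeddings]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(class_names[best])
    return labels


# Hypothetical embeddings: one text vector per class, one vector per proposal.
class_names = ["cat", "road"]
text_embeddings = [[1.0, 0.0], [0.0, 1.0]]
proposal_embeddings = [[0.9, 0.1], [0.2, 0.8]]
labels = classify_proposals(proposal_embeddings, text_embeddings, class_names)
print(labels)
```

Because the class list is only a set of text embeddings, new categories can be added at inference time by encoding new names, which is what makes the setting open-vocabulary.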

Bootstrap Your Object Detector via Mixed Training

Published in NeurIPS, 2021

MixTraining is a training approach for object detection that enhances data augmentation by applying augmentations of different strengths while excluding strong augmentations that may be harmful for certain training samples. It also addresses localization noise and missing labels in the annotations by using pseudo boxes. The method consistently improves various detectors on the COCO dataset, with significant accuracy gains for models such as Faster R-CNN and Cascade R-CNN.

End-to-End Semi-Supervised Object Detection with Soft Teacher

Published in ICCV, 2021

This paper introduces an end-to-end semi-supervised object detection approach that surpasses previous methods by a significant margin on the COCO benchmark at labeling ratios of 1%, 5%, and 10%. When unlabeled data is used together with the full labeled set, the approach improves a strong Faster R-CNN by +3.6 mAP, reaching 44.5 mAP, and improves a state-of-the-art Swin Transformer-based object detector by +1.5 mAP, reaching 60.4 mAP. Combined with an Objects365 pre-trained model, it achieves a new state-of-the-art detection accuracy of 61.3 mAP and instance segmentation accuracy of 53.0 mAP.

Asymmetric Non-local Neural Networks for Semantic Segmentation

Published in ICCV, 2019

This paper introduces the Asymmetric Non-local Neural Network (ANN) for semantic segmentation, which addresses the computational and memory cost of non-local modules. It has two key components: the Asymmetric Pyramid Non-local Block (APNB) and the Asymmetric Fusion Non-local Block (AFNB). Experiments demonstrate the effectiveness and efficiency of ANN, which achieves state-of-the-art performance with an mIoU of 81.3% on the Cityscapes test set while being significantly faster and using less GPU memory than standard non-local blocks.
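The cost saving of an asymmetric non-local block can be illustrated with a simplified sketch. It assumes the asymmetry means that every position still issues a query, while keys and values are reduced to a small set of sampled anchors; single-scale average pooling over a 1-D sequence stands in here for the pyramid sampling, and all names are hypothetical. This is an illustration of the idea, not the paper's implementation.

```python
import math


def average_pool(vectors, num_bins):
    """Reduce a list of vectors to num_bins vectors by averaging contiguous chunks."""
    n = len(vectors)
    if not 0 < num_bins <= n:
        raise ValueError("num_bins must be between 1 and len(vectors)")
    dim = len(vectors[0])
    pooled = []
    for b in range(num_bins):
        chunk = vectors[b * n // num_bins:(b + 1) * n // num_bins]
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def asymmetric_attention(queries, keys, values, num_bins):
    """Attend from N queries to num_bins pooled keys/values instead of N of them."""
    pooled_keys = average_pool(keys, num_bins)
    pooled_values = average_pool(values, num_bins)
    dim = len(pooled_values[0])
    outputs = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, k)) for k in pooled_keys]
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, pooled_values)) for d in range(dim)]
        )
    return outputs


# Eight positions, pooled to two anchors: 8 x 2 similarity scores instead of 8 x 8.
features = [[float(i), 1.0] for i in range(8)]
out = asymmetric_attention(features, features, features, num_bins=2)
print(len(out), len(out[0]))
```

The output keeps one vector per query position, so the block can replace a standard non-local block while the similarity matrix shrinks from N x N to N x S, where S is the number of anchors.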