arXiv Preprint
Active learning promises to improve annotation efficiency by iteratively
selecting the most important data to be annotated first. However, we uncover a
striking contradiction to this promise: in the first few choices, active
learning fails to select data as efficiently as random selection. We identify
this as the cold start problem in vision active learning, caused by an initial
query that is biased and riddled with outliers. This paper seeks to address
the cold start problem by
exploiting three advantages of contrastive learning: (1) no annotation is
required; (2) pseudo-labels ensure label diversity to mitigate bias; (3)
contrastive features identify typical data to reduce outliers.
Experiments are conducted on CIFAR-10-LT and three medical imaging datasets
(i.e., Colon Pathology, Abdominal CT, and Blood Cell Microscope). Our initial
query not only significantly outperforms existing active querying strategies
but also surpasses random selection by a large margin. We foresee our solution
to the cold start problem as a simple yet strong baseline for choosing the
initial query in vision active learning. Code is available at:
https://github.com/c-liangyu/CSVAL
Liangyu Chen, Yutong Bai, Siyu Huang, Yongyi Lu, Bihan Wen, Alan L. Yuille, Zongwei Zhou
2022-10-05
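
To make the querying idea concrete, below is a minimal illustrative sketch of one way an initial query could be built from precomputed contrastive features, not the authors' released implementation (see the repository above for that). The function name select_initial_query, the use of K-means for pseudo-labels, the nearest-to-center notion of "typical", and the equal per-cluster budget split are all assumptions made here for illustration.

```python
# Illustrative sketch only: NOT the exact CSVAL algorithm from the paper.
# Assumptions: contrastive features are precomputed (e.g., by a SimCLR/MoCo
# encoder), pseudo-labels come from K-means, and "typical" means closest to
# a cluster center in feature space.
import numpy as np
from sklearn.cluster import KMeans


def select_initial_query(features: np.ndarray, num_classes: int,
                         budget: int) -> np.ndarray:
    """Pick a label-diverse, outlier-free initial query with no annotations.

    features:    (N, D) L2-normalized contrastive features.
    num_classes: number of pseudo-label clusters (e.g., 10 for CIFAR-10-LT).
    budget:      total number of samples to query.
    Returns indices of the selected samples.
    """
    kmeans = KMeans(n_clusters=num_classes, n_init=10, random_state=0)
    # Pseudo-labels from clustering stand in for true labels: drawing from
    # every cluster keeps the query label-diverse, mitigating bias.
    pseudo_labels = kmeans.fit_predict(features)

    per_cluster = budget // num_classes
    selected = []
    for c in range(num_classes):
        idx = np.where(pseudo_labels == c)[0]
        # Distance to the cluster center: small = typical, large = outlier.
        dist = np.linalg.norm(features[idx] - kmeans.cluster_centers_[c],
                              axis=1)
        # Keep the most typical samples of each cluster to reduce outliers.
        selected.extend(idx[np.argsort(dist)[:per_cluster]])
    return np.array(selected)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 128)).astype(np.float32)
    # L2-normalize, as contrastive features typically are.
    feats /= np.linalg.norm(feats, axis=1, keepdims=True)
    query = select_initial_query(feats, num_classes=10, budget=100)
    print(query.shape)  # (100,)
```

Under these assumptions, the query inherits the three advantages named in the abstract: clustering needs no labels, drawing equally from every pseudo-class promotes label diversity, and ranking by distance to the cluster center filters out atypical (outlier) samples.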