Publications

SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos

Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

What’s a good way to select positives and negatives for self-supervised contrastive learning of video representations? In our paper, we analyze the idea of alternating between clustering and contrastive learning, using pseudolabels from clustering to improve the selection of positive and negative video examples. This leads to significantly better retrieval of human action videos than existing self-supervised video representation learning methods, with similar accuracy on downstream classification tasks.
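
As a rough illustration of how clustering pseudolabels can guide the choice of positives and negatives, here is a minimal sketch. It is not the SLIC implementation: the plain k-means step, the triplet-style sampling, the function names `kmeans_pseudolabels` and `sample_triplet`, and the random vectors standing in for video embeddings are all assumptions made for the example.

```python
import numpy as np

def kmeans_pseudolabels(embeddings, k, n_iters=10, seed=0):
    """Assign each embedding a cluster id using plain k-means (Lloyd's algorithm)."""
    embeddings = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = embeddings[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def sample_triplet(labels, anchor, rng):
    """Pick a positive from the anchor's cluster and a negative from another cluster."""
    same = np.flatnonzero(labels == labels[anchor])
    same = same[same != anchor]  # the anchor is not its own positive
    other = np.flatnonzero(labels != labels[anchor])
    if len(same) == 0 or len(other) == 0:
        return None  # no valid positive or negative for this anchor
    return anchor, int(rng.choice(same)), int(rng.choice(other))

# Toy stand-in for video embeddings: two well-separated groups of 20 vectors.
rng = np.random.default_rng(0)
embeddings = np.concatenate([
    rng.normal(0.0, 0.1, size=(20, 8)),
    rng.normal(5.0, 0.1, size=(20, 8)),
])

# One round of the alternation: cluster, then sample using the pseudolabels.
# A full loop would next update the encoder with a contrastive loss on such
# samples, recompute the embeddings, and cluster again.
labels = kmeans_pseudolabels(embeddings, k=2)
triplet = sample_triplet(labels, anchor=0, rng=rng)
print(triplet)
```

In this sketch the pseudolabels only decide which examples are eligible as positives or negatives; the encoder update that would follow each clustering round is deliberately left out.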

[Project Page] [arXiv] [Video] [Poster] [Code]