March 1, 2024
May 1, 2023
Silva, Thalles; Pedrini, Helio; Ramírez Rivera, Adín.
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolution-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from the output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE’s pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in four dense prediction downstream tasks: object detection, instance segmentation, keypoint detection, and dense pose estimation.
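As a rough illustration of how local features might be combined based on similarity, the sketch below mixes each local embedding with the others using cosine-similarity attention computed per head. It is not the authors' implementation: the abstract does not specify the layer, so the function name `contextualize`, the `temperature` parameter, the use of L2 normalization as the "normalized" step, and the omission of learned query/key/value projections are all assumptions made here for illustration.

```python
import math


def l2_normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(features, num_heads=2, temperature=0.1):
    """Hypothetical sketch of similarity-based mixing of local features.

    features: list of N local embeddings, each a list of C floats
              (for example, an H*W feature map flattened to N = H*W rows).
    Returns N contextualized embeddings of the same size C.

    Assumptions (not taken from the paper): identity projections,
    L2-normalized queries/keys, and a fixed softmax temperature.
    """
    n, c = len(features), len(features[0])
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    out = [[] for _ in range(n)]
    for h in range(num_heads):
        # Each head sees its own slice of the channel dimension.
        chunk = [f[h * d:(h + 1) * d] for f in features]
        unit = [l2_normalize(v) for v in chunk]
        for i in range(n):
            # Cosine similarity between location i and every location j.
            logits = [
                sum(a * b for a, b in zip(unit[i], unit[j])) / temperature
                for j in range(n)
            ]
            weights = softmax(logits)
            # Similarity-weighted average of the (unnormalized) features.
            mixed = [
                sum(weights[j] * chunk[j][k] for j in range(n))
                for k in range(d)
            ]
            out[i].extend(mixed)
    return out
```

With a low temperature, each output embedding is dominated by the locations most similar to it, so features from related parts of the image are pooled together while dissimilar ones contribute little.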
Self-supervised Learning of Contextualized Local Visual Embeddings.
Silva, Thalles; Pedrini, Helio; Ramírez Rivera, Adín.
2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE (Institute of Electrical and Electronics Engineers) 2023
May 1, 2023