Image: Inger Solheim / Torger Grytå / Jonatan Ottesen

VI Seminar #65 - Modular Superpixel Tokenization in Vision Transformers

The program will be available shortly. Please check back later.

Modular Superpixel Tokenization in Vision Transformers

Marius Aasan, PhD Candidate, University of Oslo

Abstract:

The tokenization process in Transformers serves as a central preprocessing step: it forms the set of discrete minimal units of granularity for the modelling process, which dictates how a model is able to process data. In this talk, we discuss the impact of modular tokenization in Vision Transformer models and how a model can be retrofitted with new tokenizers without significant loss of performance. Additionally, we present hierarchical model selection processes and new positional embedding mechanisms, and show how these can help improve scaling in existing models.

In compliance with GDPR consent requirements, presentations given in a Visual Intelligence context may be recorded with the consent of the speaker. All recordings are edited to remove the faces, names and voices of other participants. Questions and comments from the audience will therefore be removed and will not appear in the recording. With the freely given consent of the speaker, the recorded presentation may be posted on the Visual Intelligence YouTube channel.

This seminar is open to members of the consortium. If you want to participate as a guest, please sign up.

Sign up here