Learning from limited training data

Background

A key strength of deep learning methods is that their performance improves steadily with more training data. In real-world settings, however, the availability of suitable training data is often limited, and annotating complex image data requires domain experts and is both costly and time-consuming. To succeed in our innovation areas, we therefore need to research new methodology for learning from limited and complex training data.

Challenges

For real-life applications involving complex images, training data is often limited in the sense that annotations (labels) are sparse, even when the amount of acquired data is vast. Annotations may also be incomplete or inconsistent (noisy), and because they are generally made for purposes other than training machine learning algorithms, they may be poorly suited to that task. Moreover, the characteristics of complex image data often differ markedly from those of standard natural images. This makes the current go-to transfer learning solution, fine-tuning models pre-trained on ImageNet, infeasible: the image data of interest is statistically out-of-distribution with respect to the base model.
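The transfer learning setup referred to above, training only a small task head on top of a frozen pre-trained feature extractor, can be sketched in a few lines. The following is a minimal illustrative sketch in NumPy, not an implementation of any method from our publications: a fixed random projection stands in for a pre-trained backbone, and a logistic-regression head is trained by gradient descent on a small labelled target set. All names and parameters here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained backbone (a fixed random projection).
# In practice this would be, e.g., a CNN trained on a large source dataset.
W_backbone = rng.normal(size=(32, 8))

def extract_features(x):
    """Frozen feature extractor: these parameters are never updated."""
    return np.tanh(x @ W_backbone)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(feats, y, w, b):
    """Mean binary cross-entropy of the head's predictions."""
    p = sigmoid(feats @ w + b)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

# A small labelled target dataset -- the "limited training data" setting.
X = rng.normal(size=(40, 32))
y = (X[:, 0] > 0).astype(float)  # toy binary labels

feats = extract_features(X)      # computed once; the backbone is frozen

# Only the lightweight task head is trained.
w_head = np.zeros(8)
b_head = 0.0

loss_before = loss(feats, y, w_head, b_head)
for _ in range(500):             # plain gradient descent on the head only
    p = sigmoid(feats @ w_head + b_head)
    w_head -= 0.1 * feats.T @ (p - y) / len(y)
    b_head -= 0.1 * np.mean(p - y)
loss_after = loss(feats, y, w_head, b_head)
```

The point of the sketch is the split it makes explicit: when the target data is out-of-distribution with respect to the source data, the frozen features themselves stop being useful, and no amount of head training can recover, which is why the paragraph above calls this approach infeasible for complex image data.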

Main objective

To develop new deep learning methods to solve complex problems from limited training data.

Highlighted publications

Understanding Deep Learning via Generalization and Optimization Analysis for Accelerated SGD
November 15, 2024
We provide a theoretical understanding of the generalization error of momentum-based accelerated variants of stochastic gradient descent.
Visual Data Diagnosis and Debiasing with Concept Graphs
October 17, 2024
We propose ConBias, a bias diagnosis and debiasing pipeline for visual datasets.
Reinventing Self-Supervised Learning: The Magic of Memory in AI Training
October 17, 2024
MaSSL is a novel approach to self-supervised learning that enhances training stability and efficiency.