The performance of transferability metrics does not translate to medical tasks

Levy Chaves, Alceu Bissoto, Eduardo Valle, Sandra Avila

Abstract

Transfer learning boosts the performance of medical image analysis by enabling deep learning (DL) on small datasets through the knowledge acquired from large ones. As the number of DL architectures explodes, exhaustively trying all candidates becomes infeasible, motivating cheaper alternatives for choosing among them. Transferability scoring methods emerge as an enticing solution, allowing one to efficiently compute a score that correlates with an architecture's accuracy on any target dataset. However, since transferability scores have not been evaluated on medical datasets, their use in this context remains uncertain, preventing them from benefiting practitioners. We fill that gap in this work, thoroughly evaluating seven transferability scores in three medical applications, including out-of-distribution scenarios. Despite promising results on general-purpose datasets, our results show that no transferability score can reliably and consistently estimate target performance in medical contexts, inviting further work in that direction.
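A transferability score is useful only if it ranks candidate architectures in the same order as their accuracy after fine-tuning on the target dataset. One common way to check that agreement is a rank correlation between the two lists. The sketch below is a minimal illustration that assumes plain Kendall's tau as the correlation measure. The abstract does not name the measure the authors used, and weighted variants also exist. The function name `kendall_tau` and all the numbers are hypothetical and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores, accuracies):
    """Kendall's tau-a between transferability scores and target accuracies.

    Hypothetical helper for illustration. Returns +1 when the score orders
    every pair of architectures the same way as their fine-tuned accuracy,
    and -1 when every pair is reversed. Tied pairs count as neither
    concordant nor discordant.
    """
    if len(scores) != len(accuracies) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(scores)), 2):
        # The pair agrees if both lists move in the same direction.
        product = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(scores) * (len(scores) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for four candidate architectures (not from the paper).
scores = [0.82, 0.75, 0.91, 0.60]      # transferability score per architecture
accuracies = [0.88, 0.90, 0.93, 0.71]  # accuracy after fine-tuning on the target
print(round(kendall_tau(scores, accuracies), 3))  # → 0.667
```

Here the score misorders only the first two architectures (5 of 6 pairs agree), so tau is 4/6. A score that "reliably and consistently" estimates target performance would keep tau close to 1 across datasets. A value near 0, or one that swings between datasets, is the kind of failure the abstract reports for medical tasks.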

Type

Conference

Publication

In: Domain Adaptation and Representation Transfer (DART) at MICCAI'23

Date

July, 2023
