
MASS: MoErging through Adaptive Subspace Selection

Donato Crisostomi, Alessandro Zirilli, Antonio Andrea Gargiulo, Maria Sofia Bucarelli, Simone Scardapane, Fabrizio Silvestri, Iacopo Masi, Emanuele Rodolà


MASS is our novel, training-free model MoErging method. It recovers up to ~98% of the accuracy of the individual fine-tuned models with only a twofold increase in storage and computational cost.
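The routing idea named in the title ("adaptive subspace selection") can be illustrated with a toy example. This is our own simplified sketch under stated assumptions, not the MASS implementation: it assumes each task contributes a weight update `delta_w`, keeps the top-`k` right singular vectors of that update as the task's subspace, and routes an input to the task whose subspace leaves the smallest projection residual. The function names (`task_subspace`, `route`, `moerged_forward`) and the synthetic data are hypothetical; the official codebase is the reference for how MASS actually selects subspaces and routes inputs.

```python
import numpy as np


def task_subspace(delta_w: np.ndarray, k: int) -> np.ndarray:
    """Return a (d_in, k) orthonormal basis of the top-k right singular vectors.

    Assumption (illustrative only): delta_w has shape (d_out, d_in), and its
    right singular vectors span the input directions the task update responds to.
    """
    _, _, vt = np.linalg.svd(delta_w, full_matrices=False)
    return vt[:k].T


def route(x: np.ndarray, bases: list) -> int:
    """Pick the task whose subspace best explains x (smallest projection residual)."""
    residuals = [np.linalg.norm(x - b @ (b.T @ x)) for b in bases]
    return int(np.argmin(residuals))


def moerged_forward(x, w_shared, deltas, bases):
    """Hypothetical routed linear layer: shared weights plus the selected task's update."""
    t = route(x, bases)
    return (w_shared + deltas[t]) @ x, t


# Toy setup: three tasks, each update reads from a different pair of input coordinates.
rng = np.random.default_rng(0)
d_out, d_in, k = 4, 6, 2
deltas = []
for t in range(3):
    delta = np.zeros((d_out, d_in))
    delta[:, 2 * t : 2 * t + 2] = rng.standard_normal((d_out, 2))
    deltas.append(delta)
bases = [task_subspace(delta, k) for delta in deltas]
w_shared = rng.standard_normal((d_out, d_in))

x = np.array([0.0, 0.0, 1.0, 0.5, 0.0, 0.0])  # lies in task 1's input subspace
y, chosen = moerged_forward(x, w_shared, deltas, bases)
print(chosen)  # 1
```

Using right singular vectors places each task subspace in the layer's input space, so in this toy version the routing decision can be made from `x` before the layer is applied.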

We tested its versatility across domains (vision and NLP), architectures (ViT-{B,L}-{32,16,14} and Flan-T5), and numbers of tasks (benchmarks of 8, 14, and 20 datasets), showing that it scales to more tasks at a fixed overhead cost.

All the checkpoints you need are available on this 🤗 page, and the codebase in the card below contains detailed instructions to reproduce our experiments.

Author: Alessandro Zirilli