@inproceedings{vaswani2017attention,
title={Attention is all you need},
author={Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N and Kaiser, {\L}ukasz and Polosukhin, Illia},
booktitle={Advances in neural information processing systems},
pages={5998--6008},
year={2017}
}
MASS is our novel, training free model MoErging method, that allows to recover up to ~98% of accuracy of finetuned models with only a two times increase in storage and computational cost.
We tested its versatitility across different domains (Vision and NLP), architetures (ViT-{B,L}-{32,16,14}, Flan-t5), and number of tasks (8-14-20 dataset) proving its increadible scalability at fixed overhead cost.
At 🤗 this page you can find all the checkpoints you need, while in the card below there is our codebase that contains detailed instructions to reproduce our experiments.