[Text Machine Lab] BERT Busters: Outlier Dimensions that Disrupt Transformers
This is a post I wrote during my time at the Text Machine Lab: https://text-machine-lab.github.io/blog/2021/busters/. It reports on one of the first studies of the phenomenon of outlier dimensions in Transformer-based language models:
Olga Kovaleva, Saurabh Kulshreshtha, Anna Rogers, and Anna Rumshisky. 2021. BERT Busters: Outlier Dimensions that Disrupt Transformers. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3392–3405, Online. Association for Computational Linguistics. https://aclanthology.org/2021.findings-acl.300/
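To give a concrete sense of what the paper's detection criterion looks like, here is a minimal sketch assuming PyTorch and the HuggingFace transformers library. It flags dimensions whose LayerNorm scaling factor and bias both deviate from the layer mean by more than 3σ, in the spirit of the paper's criterion; the helper name and the choice to scan every LayerNorm of bert-base-uncased are my own illustration, not the authors' code.

```python
import torch
from transformers import BertModel

def outlier_dims(vec: torch.Tensor, k: float = 3.0) -> set:
    """Indices whose value lies more than k standard deviations from the mean."""
    mask = (vec - vec.mean()).abs() > k * vec.std()
    return set(torch.nonzero(mask, as_tuple=True)[0].tolist())

model = BertModel.from_pretrained("bert-base-uncased")

for name, module in model.named_modules():
    if isinstance(module, torch.nn.LayerNorm):
        # A dimension counts as an outlier when both its LayerNorm scaling
        # factor (weight) and its bias are >= 3 sigma away from the mean.
        dims = outlier_dims(module.weight.detach()) & outlier_dims(module.bias.detach())
        if dims:
            print(f"{name}: outlier dimensions {sorted(dims)}")
```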
See also our 2022 follow-up study, which found that these dimensions are causally related to high-frequency tokens in the pre-training data:
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, and Felice Dell’Orletta. 2022. Outlier Dimensions that Disrupt Transformers are Driven by Frequency. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 1286–1304, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. https://aclanthology.org/2022.findings-emnlp.93/