Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure
Hanseul Cho*, Jaeyoung Cha*, Pranjal Awasthi, Srinadh Bhojanapalli, Anupam Gupta, and Chulhee Yun
NeurIPS 2024 (Short version at ICML 2024 Workshop on Long-Context Foundation Models (LCFM)) [paper] [arxiv] [code]