TY - JOUR
T1 - How Do Position Encodings Affect Length Generalization? Case Studies On In-Context Function Learning
AU - Lin, Di Nan
AU - Yao, Jui-Feng
AU - Wu, Kun Da
AU - Xu, Hao
AU - Huang, Chen Hsi
AU - Kao, Hung Yu
N1 - Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2025/4/11
Y1 - 2025/4/11
N2 - The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without necessitating retraining. However, this generalization ability does not extend to the length of the inputs; the effectiveness of ICL likely diminishes with excessively long inputs, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of In-Context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings (PEs) can significantly mitigate length generalization issues, with the most effective encoding extending the maximum input length to over eight times that of the original training length. However, further analysis revealed that while PE enhances length generalization, it compromises the model's inherent capabilities, such as its ability to generalize across different data types. Overall, our research illustrates that PEs have a pronounced positive effect on length generalization, though it necessitates a careful trade-off with data generalization performance.
AB - The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without necessitating retraining. However, this generalization ability does not extend to the length of the inputs; the effectiveness of ICL likely diminishes with excessively long inputs, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of In-Context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings (PEs) can significantly mitigate length generalization issues, with the most effective encoding extending the maximum input length to over eight times that of the original training length. However, further analysis revealed that while PE enhances length generalization, it compromises the model's inherent capabilities, such as its ability to generalize across different data types. Overall, our research illustrates that PEs have a pronounced positive effect on length generalization, though it necessitates a careful trade-off with data generalization performance.
UR - https://www.scopus.com/pages/publications/105004169362
U2 - 10.1609/aaai.v39i23.34637
DO - 10.1609/aaai.v39i23.34637
M3 - Conference article
AN - SCOPUS:105004169362
SN - 2159-5399
VL - 39
SP - 24576
EP - 24584
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 23
T2 - 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
Y2 - 25 February 2025 through 4 March 2025
ER -