How Do Position Encodings Affect Length Generalization? Case Studies On In-Context Function Learning

  • Di Nan Lin
  • Yao Jui-Feng
  • Kun Da Wu
  • Hao Xu
  • Chen Hsi Huang
  • Hung Yu Kao

Research output: Contribution to journal › Conference article › peer-review

Abstract

The capability of In-Context Learning (ICL) is crucial for large language models to generalize across a wide range of tasks. By utilizing prompts, these models can accurately predict outcomes for previously unseen tasks without retraining. However, this generalization ability does not extend to input length: the effectiveness of ICL tends to diminish as inputs grow excessively long, resulting in errors in the generated text. To investigate this issue, we propose a study using a dataset of in-context functions to understand the operational mechanisms of Transformer models in ICL and length generalization. We generated data using regression and Boolean functions and employed meta-learning techniques to endow the model with ICL capabilities. Our experimental results indicate that position encodings (PEs) can significantly mitigate length generalization issues, with the most effective encoding extending the maximum input length to over eight times the original training length. However, further analysis revealed that while PE enhances length generalization, it compromises the model's inherent capabilities, such as its ability to generalize across different data types. Overall, our research illustrates that PEs have a pronounced positive effect on length generalization, though this necessitates a careful trade-off with data generalization performance.
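To make the experimental setup concrete, the in-context function data described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function names, dimensions, and sampling distributions are assumptions (a standard linear-regression task and a random Boolean conjunction), chosen only to show the shape of a prompt of (x, y) exemplar pairs from a freshly drawn hidden function.

```python
import numpy as np

def sample_regression_prompt(n_points=8, dim=4, rng=None):
    """One in-context regression prompt: n_points (x, y) pairs
    generated by a freshly sampled hidden function f(x) = w . x."""
    rng = rng or np.random.default_rng()
    w = rng.standard_normal(dim)              # task-specific hidden weights
    xs = rng.standard_normal((n_points, dim)) # exemplar inputs
    ys = xs @ w                               # labels from the hidden function
    return xs, ys

def sample_boolean_prompt(n_points=8, dim=4, rng=None):
    """Boolean analogue: inputs in {0,1}^dim, labels from a random
    conjunction over a randomly chosen subset of the bits."""
    rng = rng or np.random.default_rng()
    mask = rng.integers(0, 2, dim).astype(bool)   # bits the conjunction reads
    xs = rng.integers(0, 2, (n_points, dim))
    if mask.any():
        ys = np.all(xs[:, mask] == 1, axis=1).astype(int)
    else:
        ys = np.ones(n_points, dtype=int)         # empty conjunction is always true
    return xs, ys
```

Under meta-learning, each training sequence would interleave such pairs so the model must infer the hidden function from context alone; length generalization is then probed by evaluating on prompts with more pairs than seen during training.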

Original language: English
Pages (from-to): 24576-24584
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 23
DOIs
Publication status: Published - 2025 Apr 11
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 2025 Feb 25 - 2025 Mar 4

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

