TY - GEN
T1 - Breaking Boundaries in Retrieval Systems
T2 - 2023 Findings of the Association for Computational Linguistics: EMNLP 2023
AU - Chen, Che Wei
AU - Yang, Ching Wen
AU - Lin, Chun Yi
AU - Kao, Hung Yu
N1 - Publisher Copyright:
© 2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
AB - Dense retrieval models have exhibited remarkable effectiveness, but they rely on abundant labeled data and face challenges when applied to different domains. Previous domain adaptation methods have employed generative models to generate pseudo queries, creating pseudo datasets to enhance the performance of dense retrieval models. However, these approaches typically use unadapted rerank models, leading to potentially imprecise labels. In this paper, we demonstrate the significance of adapting the rerank model to the target domain prior to utilizing it for label generation. This adaptation process enables us to obtain more accurate labels, thereby improving the overall performance of the dense retrieval model. Additionally, by combining the adapted retrieval model with the adapted rerank model, we achieve significantly better domain adaptation results across three retrieval datasets. We release our code for future research.
UR - https://www.scopus.com/pages/publications/85183308089
U2 - 10.18653/v1/2023.findings-emnlp.110
DO - 10.18653/v1/2023.findings-emnlp.110
M3 - Conference contribution
AN - SCOPUS:85183308089
T3 - Findings of the Association for Computational Linguistics: EMNLP 2023
SP - 1630
EP - 1642
BT - Findings of the Association for Computational Linguistics
PB - Association for Computational Linguistics (ACL)
Y2 - 6 December 2023 through 10 December 2023
ER -