
Retrieval-Augmented In-Context Learning for Domain Adaptation

Author: Adrian | September 26, 2025

Overview

Unsupervised domain adaptation (UDA) remains a key challenge in natural language processing. UDA aims to transfer knowledge from a labeled source domain to an unlabeled target domain so that a model generalizes to the new domain. With large pretrained language models, in-context learning (ICL) has emerged as an effective technique across tasks. This article summarizes a retrieval-augmented ICL approach that uses retrieved target-domain examples as context to enable knowledge transfer without target labels.

Motivation

In practical cross-domain scenarios, source-domain demonstrations are not always aligned with target-domain data. Large language models (LLMs) can generate unpredictable outputs and struggle with long-tail knowledge in unfamiliar domains. UDA seeks to adapt models using labeled source samples and unlabeled target samples to learn domain-invariant features. The retrieval-augmented ICL framework described here addresses the lack of in-domain demonstrations by retrieving semantically similar target examples to build contextual prompts for source queries.

Method

The proposed framework, Domain Adaptive In-Context Learning (DAICL), retrieves similar examples from the target domain as context for source inputs so the model learns both the target distribution and task-specific signals. Given source and target corpora, a retrieval model such as SimCSE finds target examples similar to each source query, and the retrieved examples are concatenated with the source input to form the ICL prompt. This enriches the semantics of the input and reduces superficial domain differences, allowing the model to learn task discrimination conditioned on target-domain context.
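
As a rough illustration of the retrieval and prompt-construction steps, the sketch below embeds sentences with a public SimCSE checkpoint, scores target examples by cosine similarity, and concatenates the nearest ones with the source input. The checkpoint name, scoring, and prompt format are assumptions for illustration, not the authors' exact implementation.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # Illustrative sketch: checkpoint, similarity, and prompt format are assumptions.
    MODEL_NAME = "princeton-nlp/sup-simcse-roberta-base"  # a public SimCSE checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    encoder = AutoModel.from_pretrained(MODEL_NAME)

    def embed(sentences):
        """Encode sentences with SimCSE and return L2-normalized [CLS] embeddings."""
        batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            cls = encoder(**batch).last_hidden_state[:, 0]
        return torch.nn.functional.normalize(cls, dim=-1)

    def retrieve_target_context(source_text, target_corpus, target_embs, k=3):
        """Return the k target-domain sentences most similar to the source input."""
        query = embed([source_text])                 # (1, d)
        scores = query @ target_embs.T               # cosine similarity on normalized vectors
        top = scores.squeeze(0).topk(k).indices.tolist()
        return [target_corpus[i] for i in top]

    def build_prompt(source_text, retrieved):
        """Concatenate retrieved target examples with the source input."""
        return " ".join(retrieved) + " " + source_text

In practice the target embeddings would be precomputed once (target_embs = embed(target_corpus)) and reused for every source query.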

DAICL consists of the following components:

  • Retrieval of similar target examples: Find representative target examples for each source input using dense retrieval models like SimCSE so the model sees target-domain characteristics in context.
  • Context construction: Concatenate retrieved target examples with the source input to form the prompt used for in-context learning.
  • In-context learning objective: Optimize two loss terms: (1) a task loss on the contextualized examples to learn task-specific features and predict the label y; and (2) a contextual language modeling loss to learn the target-domain distribution. Joint optimization enables knowledge transfer to the target domain (see the sketch after this list).
  • Model-specific prompting and training: Design prompts and training strategies according to architecture. For encoder-only models, use prompt concatenation of source and retrieved target examples. For decoder-only models, use the retrieved target examples as input for autoregressive learning. For token-level tasks such as NER, add a conditional random field (CRF) layer on top of language modeling features; for classification tasks like sentiment analysis, use average pooling over encoder outputs for prediction.
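
The following is a minimal sketch of how the two loss terms might be combined for a sentence-level task on a decoder-only backbone. The backbone name, the average-pooled classification head, and the weighting factor lambda_lm are illustrative assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    from transformers import AutoModelForCausalLM

    # Sketch only: backbone, head, and loss weighting are assumptions for illustration.
    class DaiclJointModel(nn.Module):
        def __init__(self, model_name="gpt2", num_labels=2, lambda_lm=1.0):
            super().__init__()
            self.lm = AutoModelForCausalLM.from_pretrained(model_name)
            self.classifier = nn.Linear(self.lm.config.hidden_size, num_labels)  # task head
            self.lambda_lm = lambda_lm

        def forward(self, input_ids, attention_mask, labels):
            # Contextual LM loss: model the retrieved target context autoregressively.
            lm_labels = input_ids.masked_fill(attention_mask == 0, -100)  # ignore padding
            out = self.lm(input_ids=input_ids,
                          attention_mask=attention_mask,
                          labels=lm_labels,
                          output_hidden_states=True)
            lm_loss = out.loss

            # Task loss: average-pool the final hidden states and predict the label y.
            hidden = out.hidden_states[-1]                       # (batch, seq, dim)
            mask = attention_mask.unsqueeze(-1).to(hidden.dtype)
            pooled = (hidden * mask).sum(1) / mask.sum(1).clamp(min=1)
            logits = self.classifier(pooled)
            task_loss = nn.functional.cross_entropy(logits, labels)

            # Jointly optimizing both terms transfers task knowledge into the target distribution.
            return task_loss + self.lambda_lm * lm_loss

For token-level tasks such as NER, the pooled classification head would be replaced by per-token prediction (with a CRF layer, as described above); the joint structure of the objective stays the same.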

Training Paradigms for Decoder Models

For decoder-only architectures, the study considers both a pure-inference (prompting) paradigm and a fine-tuning paradigm. Prompts place demonstrations retrieved from the labeled source dataset around the source input. For fine-tuning, LoRA is used to adapt larger language models with few trainable parameters, and the fine-tuning examples are formatted in the same prompt style.
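
For concreteness, below is a hedged sketch of LoRA adaptation with the peft library together with a hypothetical demonstration-style prompt template. The rank, target modules, base checkpoint, and template are assumptions, not the paper's reported settings.

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    # Assumed settings for illustration; not the paper's exact hyperparameters.
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    lora_cfg = LoraConfig(
        r=8,                                   # low-rank update dimension
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],   # adapt only the attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()          # only the small LoRA adapters are trained

    def format_example(demonstrations, source_text):
        """Hypothetical prompt template: labeled demonstrations placed around the source input."""
        demo_block = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in demonstrations)
        return f"{demo_block}\nReview: {source_text}\nSentiment:"

Restricting updates to low-rank adapters keeps the number of trainable parameters in the low millions, which is what makes fine-tuning billion-parameter decoders practical in this setting.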

Experimental Setup

The authors evaluate DAICL on named entity recognition (NER) and sentiment analysis (SA) using multiple source-target pairs across news, social media, finance, and biomedical domains. CoNLL-03 (English news) serves as the common source dataset for NER. Target NER datasets include FIN (finance), WNUT-16 and WNUT-17 (social media), and BC2GM, BioNLP09, and BC5CDR (biomedical). For sentiment analysis, the authors use an Amazon reviews dataset covering four domains: books (BK), electronics (E), beauty (BT), and music (M).

The study compares DAICL with baseline UDA methods such as pseudo-labeling and adversarial training, retrieval-based LM methods such as REALM and RAG, and ICL baselines, and evaluates the proposed ICL approach across several LLM architectures. NER performance is measured by F1 score; SA by accuracy.

Results

DAICL jointly optimizes task and language-modeling objectives and outperforms baselines in most adaptation scenarios. Comparing DAICL to variants that only use task supervision with target context (ICL-sup) or source-only ICL (ICL-source) shows that combining task signals and language modeling improves domain adaptation.

Fine-tuning benefits UDA. In NER experiments, ChatGPT showed low performance in many adaptation settings, while fine-tuning a smaller RoBERTa model achieved state-of-the-art scores in most cases. In SA experiments, fine-tuning LLaMA with LoRA (1.7M trainable parameters) outperformed other methods. These findings suggest that although large pretrained models have strong generalization, adaptation strategies remain useful for UDA.

The authors also compared adaptive ICL to adaptive pretraining. Adaptive ICL mixes source inputs with target context during task prediction, whereas adaptive pretraining uses only source inputs when performing downstream task supervision after pretraining on unlabeled target text. Experiments on LLaMA with LoRA indicate that adaptive ICL is superior to adaptive pretraining, possibly because decoder-only models in adaptive ICL can learn both task and demonstration-conditioned objectives.

Conclusion

Domain Adaptive In-Context Learning (DAICL) is a retrieval-augmented framework for unsupervised domain adaptation. By retrieving similar target-domain examples and jointly optimizing task and domain-adaptation losses during in-context learning, DAICL enables knowledge transfer without target labels. Experiments across multiple source-target combinations and tasks (NER and SA) show consistent improvements over several baselines.

Future work includes exploring alternative context-construction strategies to further improve domain adaptation, joint training across tasks and domains to enhance generalization, and combining in-context learning with other adaptation techniques such as adversarial training. Extending DAICL to multi-task unsupervised domain adaptation may also improve model robustness across diverse applications.

References

Paper: Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning