Controlling out-of-domain gaps in LLMs for genre classification and generated text detection
Roussinov, Dmitri and Sharoff, Serge and Puchnina, Nadezhda; Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and Al-Khalifa, Hend and Di Eugenio, Barbara and Schockaert, Steven, eds. (2025) Controlling out-of-domain gaps in LLMs for genre classification and generated text detection. In: Proceedings of the 31st International Conference on Computational Linguistics. Association for Computational Linguistics, Kerrville, TX, pp. 3329-3344. ISBN 9798891761964
Text. Filename: Roussinov-etal-COLING-2025-Controlling-out-of-domain-gaps-in-LLMs.pdf (Final Published Version, 466kB)
Abstract
This study demonstrates that the modern generation of Large Language Models (LLMs, such as GPT-4) suffers from the same out-of-domain (OOD) performance gap observed in prior research on pre-trained Language Models (PLMs, such as BERT). We demonstrate this across two non-topical classification tasks: (1) genre classification and (2) generated text detection. Our results show that when demonstration examples for In-Context Learning (ICL) come from one domain (e.g., travel) and the system is tested on another domain (e.g., history), classification performance declines significantly. To address this, we introduce a method that controls which predictive indicators are used and which are excluded during classification. For the two tasks studied here, this ensures that topical features are omitted, while the model is guided to focus on stylistic rather than content-based attributes. This approach reduces the OOD gap by up to 20 percentage points in a few-shot setup. Straightforward Chain-of-Thought (CoT) methods, used as the baseline, prove insufficient, while our approach consistently enhances domain transfer performance.
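As a rough illustration of the idea in the abstract, the sketch below builds a few-shot prompt that names which indicators to use and which to ignore, and computes an OOD gap in percentage points. This record does not reproduce the paper's actual prompts or metric definitions, so the prompt wording, the function names (`build_prompt`, `accuracy`, `ood_gap_pp`), and the definition of the gap as in-domain accuracy minus out-of-domain accuracy are all assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple


def build_prompt(
    demos: List[Tuple[str, str]],
    query: str,
    use: List[str],
    exclude: List[str],
) -> str:
    """Assemble a few-shot prompt (hypothetical wording, not the paper's)."""
    lines = [
        "Classify the text by genre.",
        "Base the decision on: " + ", ".join(use) + ".",
        "Do not rely on: " + ", ".join(exclude) + ".",
        "",
    ]
    # Demonstration examples, possibly drawn from a different domain
    # than the query text.
    for text, label in demos:
        lines += [f"Text: {text}", f"Label: {label}", ""]
    lines += [f"Text: {query}", "Label:"]
    return "\n".join(lines)


def accuracy(preds: Sequence[str], golds: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def ood_gap_pp(in_domain_acc: float, out_of_domain_acc: float) -> float:
    """Assumed definition: in-domain minus out-of-domain accuracy, in points."""
    return 100.0 * (in_domain_acc - out_of_domain_acc)
```

For example, `build_prompt(demos, query, use=["sentence structure", "register"], exclude=["topic"])` would pair travel-domain demonstrations with a history-domain query while telling the model to disregard topical cues; the resulting string can then be sent to whichever LLM is under evaluation.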
ORCID iDs
Roussinov, Dmitri
Item type: Book Section
ID code: 92546
Dates: 24 January 2025 (Published)
Subjects: Science > Mathematics > Electronic computers. Computer science; Language and Literature > Philology. Linguistics
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 07 Apr 2025 13:54
Last modified: 14 Apr 2025 08:57
URI: https://strathprints.strath.ac.uk/id/eprint/92546