Generating textual explanations for scheduling systems leveraging the reasoning capabilities of large language models

Powell, Cheyenne and Riccardi, Annalisa (2025) Generating textual explanations for scheduling systems leveraging the reasoning capabilities of large language models. Journal of Intelligent Information Systems. ISSN 0925-9902 (https://doi.org/10.1007/s10844-025-00940-w)

Text. Filename: Powell-Riccardi-JIIS-2025-Generating-textual-explanations-for-scheduling-systems.pdf
Final Published Version
License: Creative Commons Attribution 4.0

Abstract

Scheduling systems are critical for planning projects, resources, and activities across many industries to achieve goals efficiently. As scheduling requirements grow in complexity, Artificial Intelligence (AI) solutions have received more attention. However, providing comprehensible explanations of these decision-making processes remains a challenge and a blocker to adoption. The emergent field of eXplainable Artificial Intelligence (XAI) aims to address this by establishing human-centric interpretation of the factors that influence machine decisions. Large Language Models (LLMs) are the leading approach to autonomous interpretation in Natural Language Processing (NLP), owing to their generalist knowledge and reasoning capabilities. To explore the potential of LLMs to generate explanations for scheduling queries, we selected a benchmark set of Job Shop scheduling problems. We introduce a novel framework that integrates the selected language models, GPT-4 and Large Language Model Meta AI (LLaMA), into scheduling systems and uses few-shot learning to produce human-like explanations for queries from different categories. The explanations were analysed for accuracy, consistency, completeness, conciseness, and language across different scheduling problem sizes and complexities. The approach achieved an overall accuracy of 59% with GPT-4 and 35% with LLaMA, with minimal impact observed from the varied schedule sizes, indicating that the approach can handle different datasets and scales across schedule sizes. Several responses demonstrated high comprehension of complex queries; however, response quality fluctuated due to the few-shot learning approach. This study establishes a baseline for measuring generalist LLM capabilities in handling explanations for autonomous scheduling systems, with promising results for an LLM providing XAI interactions to explain scheduling decisions.

ORCID iDs

Powell, Cheyenne and Riccardi, Annalisa ORCID: https://orcid.org/0000-0001-5305-9450