SpaceLDA : topic distributions aggregation from a heterogeneous corpus for space systems

Berquand, Audrey and Moshfeghi, Yashar and Riccardi, Annalisa (2021) SpaceLDA : topic distributions aggregation from a heterogeneous corpus for space systems. Engineering Applications of Artificial Intelligence, 102. 104273. ISSN 0952-1976

Accepted Author Manuscript
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0



The design of highly complex systems such as spacecraft entails large amounts of documentation. Tracking relevant information, including hundreds of requirements, throughout several design stages is a challenge. In this study, we propose a novel strategy based on Topic Modelling to facilitate the management of spacecraft design requirements. We introduce spaceLDA, a novel domain-specific semi-supervised Latent Dirichlet Allocation (LDA) model enriched with lexical priors and an optimised Weighted Sum (WS). We collect and curate the first large collection of unstructured data related to space systems, combining several sources: Wikipedia pages, books, and feasibility reports provided by the European Space Agency (ESA). We train the spaceLDA model on three subsets of our heterogeneous training corpus. To combine the resulting per-document topic distributions, we enrich our model with an aggregation method based on an optimised WS. We evaluate our model through a case study: the categorisation of spacecraft design requirements. Finally, we compare our model’s performance with that of an unsupervised LDA model and of an aggregation method from the literature. The results demonstrate that the spaceLDA model successfully identifies the topics of requirements and that our proposed approach outperforms both a classic LDA model and the state-of-the-art aggregation method.
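
The weighted-sum aggregation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the three models share the same ordered set of topics, that each model returns a probability distribution over those topics for a given requirement, and that the weights have already been chosen (the optimisation step is not shown). The function names and all numeric values are hypothetical.

```python
from typing import List, Sequence


def aggregate_topic_distributions(
    distributions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-document topic distributions with a weighted sum.

    distributions: one topic distribution per model, all over the same topics.
    weights: one non-negative weight per model (normalised here to sum to 1).
    """
    if not distributions or len(distributions) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    n_topics = len(distributions[0])
    if any(len(d) != n_topics for d in distributions):
        raise ValueError("all distributions must cover the same topics")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    normalised = [w / total for w in weights]
    # Weighted sum, topic by topic, across the models.
    return [
        sum(w * d[k] for w, d in zip(normalised, distributions))
        for k in range(n_topics)
    ]


def predict_topic(
    distributions: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Return the index of the most probable topic after aggregation."""
    combined = aggregate_topic_distributions(distributions, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical outputs of three models (e.g. trained on Wikipedia pages,
# books, and ESA reports) for one requirement, over three shared topics.
wiki_model = [0.6, 0.3, 0.1]
books_model = [0.2, 0.5, 0.3]
reports_model = [0.1, 0.7, 0.2]
example_weights = [0.2, 0.3, 0.5]  # illustrative values, not from the paper

combined = aggregate_topic_distributions(
    [wiki_model, books_model, reports_model], example_weights
)
best_topic = predict_topic(
    [wiki_model, books_model, reports_model], example_weights
)
```

Because the weights are normalised and each input sums to one, the aggregated vector is again a probability distribution, so the most probable topic can be read off directly.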