Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis

Berquand, Audrey and Moshfeghi, Yashar and Riccardi, Annalisa; (2020) Space mission design ontology: extraction of domain-specific entities and concepts similarity analysis. In: AIAA Scitech 2020 Forum. American Institute of Aeronautics and Astronautics Inc, AIAA, USA, pp. 1-13. ISBN 9781624105951 (https://doi.org/10.2514/6.2020-2253)

Accepted Author Manuscript
Abstract

Expert Systems, computer programs able to capture human expertise and mimic experts’ reasoning, can support the design of future space missions by assimilating and facilitating access to accumulated knowledge. To organise this knowledge, such a system needs to understand the concepts characterising space systems engineering. In other words, it needs an ontology of space systems. Unfortunately, there is currently no official European space systems ontology. Developing an ontology is a lengthy and tedious process, involving several human domain experts, and is therefore prone to human error and subjectivity. Could the foundations of an ontology instead be semi-automatically extracted from unstructured data related to space systems engineering? This paper presents an implementation of the first layers of the Ontology Learning Layer Cake, an approach to semi-automatically generate an ontology. Candidate entities and synonyms are extracted from three corpora: a set of 56 feasibility reports provided by the European Space Agency, 40 publicly available books on space mission design, and a collection of 273 Wikipedia pages. Lexica of relevant space systems entities are semi-automatically generated based on three different methods: a frequency analysis, a term frequency-inverse document frequency analysis, and a Weirdness Index filtering. The frequency-based lexicon of the combined corpora is then fed to a word embedding method, word2vec, to learn the context of each entity. With a cosine similarity analysis, concepts with similar contexts are matched.
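The three scoring steps named in the abstract — TF-IDF ranking, Weirdness Index filtering against a general-language corpus, and cosine similarity between embedding vectors — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, tokenisation, and add-one smoothing of general-corpus counts are assumptions.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus_token_sets):
    """Score a term's relevance to one document within a corpus."""
    # term frequency: share of the document's tokens that are this term
    tf = doc_tokens.count(term) / len(doc_tokens)
    # inverse document frequency: rare-across-corpus terms score higher
    df = sum(1 for doc in corpus_token_sets if term in doc)
    idf = math.log(len(corpus_token_sets) / (1 + df))
    return tf * idf

def weirdness_index(term, domain_counts, domain_total,
                    general_counts, general_total):
    """Ratio of a term's normalised frequency in the domain corpus
    to its normalised frequency in a general-language corpus."""
    domain_freq = domain_counts[term] / domain_total
    # add-one smoothing (an assumption) so unseen general terms
    # do not cause division by zero
    general_freq = (general_counts[term] + 1) / general_total
    return domain_freq / general_freq

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors,
    e.g. word2vec vectors for two candidate entities."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

In this framing, a domain-specific term such as "thruster" would receive a high Weirdness Index (frequent in the space corpus, rare in general English), and two entities whose word2vec vectors point in similar directions would be matched as candidate synonyms via a high cosine similarity.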