Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning
Wilson, Callum and Riccardi, Annalisa (2021) Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning. Optimization and Engineering, 24 (1). pp. 223-255. ISSN 1389-4420 (https://doi.org/10.1007/s11081-021-09687-z)
Text. Filename: Wilson_Riccardi_OE_2021_Improving_the_efficiency_of_reinforcement_learning_for_a_spacecraft_powered_descent.pdf
Final Published Version (2MB)
Abstract
Reinforcement learning entails many intuitive and useful approaches to solving various problems. Its main premise is to learn how to complete tasks by interacting with the environment and observing which actions are more optimal with respect to a reward signal. Methods from reinforcement learning have long been applied in aerospace and have more recently seen renewed interest in space applications. Problems in spacecraft control can benefit from the use of intelligent techniques when faced with significant uncertainties - as is common for space environments. Solving these control problems using reinforcement learning remains a challenge partly due to long training times and sensitivity in performance to hyperparameters which require careful tuning. In this work we seek to address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection which optimises for a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two different problem formulations - using a 'shaped' state representation to guide the agent and also a 'raw' state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently by appropriately defining the action-space and state-space. Using the raw state representation led to 'reward-hacking' and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly based on the choice of loss function. 
Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent can find near-optimal policies with comparable performance to previously applied methods.
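The abstract's idea of discretising the action space and training with Q-learning can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a tabular value store, whereas the record does not state how the paper represents the value function. The thrust levels in `ACTIONS`, the function names, and the values of `alpha` and `gamma` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical discretised action set: thrust as a fraction of maximum thrust.
# The actual discretisation used in the paper is not given in this record.
ACTIONS = [0.0, 0.5, 1.0]


def select_action(q_table, state, epsilon, rng):
    """Epsilon-greedy choice over the discrete action indices."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = q_table.get(state, [0.0] * len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """One Q-learning step: Q(s, a) += alpha * (target - Q(s, a))."""
    values = q_table.setdefault(state, [0.0] * len(ACTIONS))
    next_values = q_table.get(next_state, [0.0] * len(ACTIONS))
    # Terminal transitions do not bootstrap from the next state.
    target = reward if done else reward + gamma * max(next_values)
    values[action] += alpha * (target - values[action])
    return values[action]
```

With a small discrete action set, the greedy step reduces to a maximum over a handful of values, which is one reason discretisation can shorten training. The learning rate and discount factor shown here are the kind of hyperparameters the abstract says were selected automatically.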
ORCID iDs
Wilson, Callum ORCID: https://orcid.org/0000-0003-3736-1355 and Riccardi, Annalisa ORCID: https://orcid.org/0000-0001-5305-9450
Item type: Article
ID code: 77959
Dates:
Date | Event
4 October 2021 | Published
4 October 2021 | Published Online
1 September 2021 | Accepted
31 July 2020 | Submitted
Subjects: Technology > Mechanical engineering and machinery
Technology > Motor vehicles. Aeronautics. Astronautics
Technology > Engineering (General). Civil engineering (General)
Department: Faculty of Engineering > Mechanical and Aerospace Engineering
Strategic Research Themes > Ocean, Air and Space
Depositing user: Pure Administrator
Date deposited: 30 Sep 2021 13:57
Last modified: 17 Dec 2024 01:21
URI: https://strathprints.strath.ac.uk/id/eprint/77959