Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning

Wilson, Callum and Riccardi, Annalisa (2021) Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning. Optimization and Engineering, 24 (1). pp. 223-255. ISSN 1389-4420 (https://doi.org/10.1007/s11081-021-09687-z)

Final Published Version. License: Creative Commons Attribution 4.0.
Abstract

Reinforcement learning offers many intuitive and useful approaches to solving various problems. Its main premise is to learn how to complete tasks by interacting with an environment and observing which actions yield greater reward. Methods from reinforcement learning have long been applied in aerospace and have recently seen renewed interest in space applications. Spacecraft control problems can benefit from intelligent techniques when faced with significant uncertainties, as is common in space environments. Solving these control problems with reinforcement learning remains challenging, partly due to long training times and the sensitivity of performance to hyperparameters, which require careful tuning. In this work we address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection that optimises a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two problem formulations: a 'shaped' state representation that guides the agent, and a 'raw' state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently when the action space and state space are appropriately defined. Using the raw state representation led to 'reward hacking' and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly with the choice of loss function. Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent finds near-optimal policies with performance comparable to previously applied methods.
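To illustrate the core idea of Q-learning with a discretised action space and a shaped reward, the sketch below applies tabular Q-learning to a toy one-dimensional powered-descent problem (altitude and vertical velocity only). This is an illustrative stand-in, not the paper's method: the 3-DOF dynamics, state grid sizes, throttle levels, reward shaping, and all hyperparameter values here are assumptions made for the example.

```python
import numpy as np

N_ALT, N_VEL = 20, 20          # assumed discretised state grid
ACTIONS = [0.0, 0.5, 1.0]      # assumed discrete throttle levels
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # learning rate, discount, exploration

rng = np.random.default_rng(0)
Q = np.zeros((N_ALT, N_VEL, len(ACTIONS)))

def step(alt, vel, throttle, dt=0.1, g=1.62, thrust=3.0):
    """Toy lander dynamics: gravity pulls down, throttle pushes up."""
    vel = vel + (throttle * thrust - g) * dt
    alt = alt + vel * dt
    done = alt <= 0.0
    # Assumed shaped reward: penalise touchdown speed, small per-step cost
    reward = -abs(vel) if done else -0.01
    return alt, vel, reward, done

def discretise(alt, vel):
    """Map continuous (altitude, velocity) to grid indices."""
    i = int(np.clip(alt / 100.0 * N_ALT, 0, N_ALT - 1))
    j = int(np.clip((vel + 20.0) / 40.0 * N_VEL, 0, N_VEL - 1))
    return i, j

for episode in range(500):
    # Uncertain initial conditions: randomised starting altitude and velocity
    alt, vel = 100.0 * rng.uniform(0.8, 1.0), -rng.uniform(0.0, 5.0)
    s = discretise(alt, vel)
    for _ in range(500):
        # Epsilon-greedy action selection over the discrete throttle set
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        alt, vel, r, done = step(alt, vel, ACTIONS[a])
        s2 = discretise(alt, vel)
        # Standard Q-learning update toward the bootstrapped target
        target = r + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break
```

Discretising the action space keeps the value function tabular, which is what makes the update data-efficient here; the shaped touchdown-speed penalty plays the role of the 'shaped' state/reward formulation the abstract contrasts with the raw representation.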