Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning

Wilson, Callum and Riccardi, Annalisa (2021) Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning. Optimization and Engineering, 24 (1). pp. 223-255. ISSN 1389-4420 (https://doi.org/10.1007/s11081-021-09687-z)

Final Published Version. License: Creative Commons Attribution 4.0.
Abstract

Reinforcement learning offers many intuitive and useful approaches to solving various problems. Its main premise is to learn how to complete tasks by interacting with an environment and observing which actions yield greater reward. Methods from reinforcement learning have long been applied in aerospace and have recently seen renewed interest in space applications. Spacecraft control problems can benefit from intelligent techniques when faced with significant uncertainties, as is common in space environments. Solving these control problems with reinforcement learning remains challenging, partly due to long training times and the sensitivity of performance to hyperparameters, which require careful tuning. In this work we address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection that optimises a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two problem formulations: a 'shaped' state representation that guides the agent, and a 'raw' state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently when the action space and state space are appropriately defined. Using the raw state representation led to 'reward hacking' and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly with the choice of loss function. Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent finds near-optimal policies with performance comparable to previously applied methods.
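To illustrate the core idea of Q-learning with a discretised action space and a shaped reward, the sketch below applies tabular Q-learning to a toy one-dimensional powered-descent problem (altitude and vertical velocity only). This is an illustrative stand-in, not the paper's method: the 3-DOF dynamics, state grid sizes, throttle levels, reward shaping, and all hyperparameter values here are assumptions made for the example.

```python
import numpy as np

N_ALT, N_VEL = 20, 20          # assumed discretised state grid
ACTIONS = [0.0, 0.5, 1.0]      # assumed discrete throttle levels
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # learning rate, discount, exploration

rng = np.random.default_rng(0)
Q = np.zeros((N_ALT, N_VEL, len(ACTIONS)))

def step(alt, vel, throttle, dt=0.1, g=1.62, thrust=3.0):
    """Toy lander dynamics: gravity pulls down, throttle pushes up."""
    vel = vel + (throttle * thrust - g) * dt
    alt = alt + vel * dt
    done = alt <= 0.0
    # Assumed shaped reward: penalise touchdown speed, small per-step cost
    reward = -abs(vel) if done else -0.01
    return alt, vel, reward, done

def discretise(alt, vel):
    """Map continuous (altitude, velocity) to grid indices."""
    i = int(np.clip(alt / 100.0 * N_ALT, 0, N_ALT - 1))
    j = int(np.clip((vel + 20.0) / 40.0 * N_VEL, 0, N_VEL - 1))
    return i, j

for episode in range(500):
    # Uncertain initial conditions: randomised starting altitude and velocity
    alt, vel = 100.0 * rng.uniform(0.8, 1.0), -rng.uniform(0.0, 5.0)
    s = discretise(alt, vel)
    for _ in range(500):
        # Epsilon-greedy action selection over the discrete throttle set
        a = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        alt, vel, r, done = step(alt, vel, ACTIONS[a])
        s2 = discretise(alt, vel)
        # Standard Q-learning update toward the bootstrapped target
        target = r + (0.0 if done else GAMMA * np.max(Q[s2]))
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2
        if done:
            break
```

Discretising the action space keeps the value function tabular, which is what makes the update data-efficient here; the shaped touchdown-speed penalty plays the role of the 'shaped' state/reward formulation the abstract contrasts with the raw representation.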