Key information:
Student | Kryspin Varys
Academic Supervisors | Tim Norman, Adam Sobey, Federico Cerutti
Cohort | 3
Pure Link | Active Project
Reinforcement learning has enabled computers to beat human experts in many difficult domains
such as Atari, StarCraft and Dota 2. Moreover, these successes have been achieved without
requiring human-labelled training data. Other common machine learning techniques, such as
supervised learning, often require datasets that are expensive to produce. Instead, reinforcement
learning relies on experience gathered through interaction with the environment when searching
for an optimal behaviour policy. This shows that reinforcement learning has the potential to solve
complex, long-lasting tasks while saving human effort. Yet challenges remain that prevent these
algorithms from being integrated into the broader industry.
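To make the contrast with supervised learning concrete, the sketch below shows tabular Q-learning on a toy chain environment: the agent improves its policy purely from experienced (state, action, reward) transitions, with no labelled dataset. The environment and hyper-parameters are illustrative assumptions, not part of this project.

import random
from collections import defaultdict

N_STATES = 5          # states 0..4; reaching state 4 ends an episode
ACTIONS = (-1, +1)    # move left or right along the chain
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(state, action):
    """Toy dynamics: reward 1 only when the goal state is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

q = defaultdict(float)  # Q-values keyed by (state, action)

def greedy(state):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        action = random.choice(ACTIONS) if random.random() < EPS else greedy(state)
        next_state, reward, done = step(state, action)
        # Temporal-difference update: learn from the experienced transition.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# Learned policy: move right towards the goal from every state.
print({s: greedy(s) for s in range(N_STATES)})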
The most profound challenge is that the behaviour of reinforcement learning agents is hard to
explain. This is a common issue with methods that rely on neural networks, and reinforcement
learning is no exception. The actions an agent takes might be sub-optimal and can lead to
dangerous situations for the agent or its environment. This reduces the trustworthiness of the
technology, especially in control applications where we require the algorithms to be
provably safe.
In this context, our research has three parts. First, to address the issue of safety, we will
investigate how to make reinforcement learning agents verifiably safe. Second, to make the agent
resilient to changes in its environment, we will enable it to learn continually throughout
its lifetime. Finally, to demonstrate the agent's versatility, we will test it on a variety of
complex, long-horizon tasks.
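One common pattern for enforcing safety at execution time is runtime "shielding": every action proposed by the learned policy is checked against a formally specified safety predicate, and replaced by a safe fallback if the check fails. The sketch below is a generic illustration of that idea; the environment interface, the is_safe predicate and the fallback rule are hypothetical placeholders, not this project's actual method.

from typing import Callable, Sequence

def shielded_action(
    state: float,
    proposed: float,
    actions: Sequence[float],
    is_safe: Callable[[float, float], bool],
) -> float:
    """Return the proposed action if it is safe, otherwise a safe alternative."""
    if is_safe(state, proposed):
        return proposed
    # Fall back to any action the safety predicate accepts; here we take the
    # first one, but a real shield might pick the closest safe action instead.
    safe = [a for a in actions if is_safe(state, a)]
    if not safe:
        raise RuntimeError("no safe action available in this state")
    return safe[0]

# Toy example: keep a 1-D position within [0, 10] under bounded moves.
def within_bounds(position: float, move: float) -> bool:
    return 0.0 <= position + move <= 10.0

print(shielded_action(9.5, proposed=2.0, actions=(-1.0, 0.0, 1.0, 2.0),
                      is_safe=within_bounds))   # the shield overrides 2.0 with -1.0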
By making the agent verifiably safe, we increase its trustworthiness and enable new applications.
Furthermore, we expect these applications to be long-lasting and to contain many different tasks.
Therefore, we investigate ways to enable the agent to learn continually while handling a large
number of varying tasks.