The University of Southampton

Project: Interactive Lifelong Reinforcement Learning

Key information:

Student: Kryspin Varys

Academic Supervisors: Tim Norman, Adam Sobey, Federico Cerutti

Cohort: 3

Pure Link: Active Project

Abstract: 

Reinforcement learning has enabled computers to beat human experts in many difficult domains such as Atari, StarCraft and Dota 2. Moreover, these successes have been achieved without requiring any human input. Other common machine learning techniques, such as supervised learning, often require datasets that are expensive to produce. Reinforcement learning instead relies on experience gathered through interaction when searching for an optimal behaviour policy. This shows that reinforcement learning has the potential to solve complex, long-lasting tasks while saving human effort. Yet challenges remain that prevent the integration of these algorithms into the broader industry.

The most profound challenge is that the behaviour of reinforcement learning agents is hard to explain. This is a common issue with methods that rely on neural networks, and reinforcement learning is no exception. The actions the agents take might be sub-optimal and can lead to situations that are dangerous for the agent or its environment. This decreases the trustworthiness of the technology, especially in control applications where we require the algorithms to be provably safe.

In this context, our research has three parts. Firstly, to address the issue of safety, we will investigate how to make reinforcement learning agents verifiably safe. Secondly, to make the agent resilient to changes in its environment, we will enable it to learn continually throughout its lifetime. Finally, to demonstrate the agent's versatility, we will test it on a variety of complex, long-horizon tasks.

By making the agent verifiably safe, we increase its trustworthiness and enable new applications. Furthermore, we expect these applications to be long-lasting and to contain many different tasks. We therefore research ways to enable the agent to learn continually while handling a large number of varying tasks.