The University of Southampton

Project: Interactive Lifelong Reinforcement Learning

Key information:

Student: John Birkbeck

Academic Supervisors: Tim Norman, Adam Sobey, Federico Cerutti

Cohort: 3

Pure Link: Active Project

Abstract: 

Reinforcement Learning has produced AI agents with superhuman abilities; AlphaGo Zero's mastery of Go is just one of many examples. These agents are 'narrow', however: they learn only one task, forget that task when learning another, and do not reuse their abilities across tasks. The popular media image of intelligent robots embedded across our society is, for now, just science fiction.
Such systems are also costly to train: researchers at the University of Copenhagen estimated that training the GPT-3 language model consumed about 188 MWh of energy and emitted 85 tonnes of CO2. To integrate AI into our lives sustainably, reducing training costs and emissions will be essential. Research toward interactive and lifelong agents can help reduce these costs by producing broader AI that reuses previous experience to solve new tasks, avoiding the need for intensive training on every task.
One approach toward broader AI agents is to have them learn 'skills': action policies that are useful across many tasks. An agent can learn how to apply its skills to each new task without forgetting the skills themselves. If the skills are model-based, the agent can also plan with them, so that new tasks can be solved with little or no training.
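As a rough illustration of planning with model-based skills, the toy sketch below reuses one fixed skill library across two different goals without any retraining. Everything in it is an assumption made for illustration and not the project's method: the grid world, the four hypothetical skill models in `SKILLS`, the Manhattan `distance` heuristic, and the greedy one-step `plan` function.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]  # (x, y) position on a toy grid

# Hypothetical skill models: each predicts the state a skill ends in.
# The library is shared by every task and is never modified.
SKILLS: Dict[str, Callable[[State], State]] = {
    "right": lambda s: (s[0] + 1, s[1]),
    "up": lambda s: (s[0], s[1] + 1),
    "left": lambda s: (s[0] - 1, s[1]),
    "down": lambda s: (s[0], s[1] - 1),
}


def distance(a: State, b: State) -> int:
    """Manhattan distance between two grid positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def plan(start: State, goal: State, max_steps: int = 20) -> List[str]:
    """Greedy one-step lookahead: pick the skill whose predicted
    outcome is closest to the goal, then repeat from that outcome."""
    state, chosen = start, []
    while state != goal and len(chosen) < max_steps:
        name = min(SKILLS, key=lambda n: distance(SKILLS[n](state), goal))
        state = SKILLS[name](state)
        chosen.append(name)
    return chosen


# Two different tasks solved with the same skills and no training.
print(plan((0, 0), (2, 1)))  # ['right', 'right', 'up']
print(plan((0, 0), (0, 2)))  # ['up', 'up']
```

Only the goal changes between the two calls; the skills and their models stay fixed, which is the sense in which a new task needs little or no training. A realistic agent would learn both the skills and their models from experience and would plan over longer horizons.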
Skills can also be useful for interaction between AI and humans. Interactive agents that use language-based instructions have outperformed non-interactive agents on the famously difficult game 'Montezuma's Revenge'. An agent that can translate a sentence into skill-based behaviours could use this ability to perform many tasks efficiently, rather than just one.
Interactive lifelong reinforcement learning agents have a wide range of potential uses and could transform our economy. In the future, these agents might be found performing maintenance tasks in hazardous environments like nuclear reactors, building cars without humans in fully automated factories, or just making you a cup of tea in your kitchen.