High-performance compute cluster enhances Southampton AI teaching and research
The Alpha cluster, based in the University's state-of-the-art data centre near Southampton, features 24 NVIDIA RTX 8000 GPUs installed across six cluster nodes.
The Alpha cluster is the result of a joint initiative involving funding from the Faculty of Engineering and Physical Sciences, ECS and the UKRI Centre for Doctoral Training in Machine Intelligence for Nano-Electronic Devices and Systems (MINDS).
The new facility complements the existing GPU computing provision of the fifth-generation IRIDIS Computer Cluster and the ECS teaching laboratories, and is specifically aimed at workloads requiring large amounts of GPU memory or long run times.
The cluster enables researchers to train highly parallel machine learning and AI neural networks, while also providing an invaluable resource for Southampton's research-led teaching programmes.
Each GPU card has 48GB of RAM to tackle large training data sets that can be stored on the 20TB of fast storage attached to each node. In addition, each node features 100GB/s NVLink interfaces giving outstanding performance when using multiple GPUs.
Lance Draper, Research Systems Manager in Engineering and Physical Sciences, says: "This equipment has been purchased for the exclusive use of ECS staff and students and allows the School to meet ever-increasing teaching and research demands for artificial intelligence and machine learning systems."
Southampton degree programmes that will particularly benefit from the resource include its MEng Electronic Engineering with AI, MEng Computer Science with AI, and MSc Artificial Intelligence degrees, both for individual projects and for coursework in optional modules such as Advanced Machine Learning and Deep Learning.
The Alpha cluster, whose name is inspired by DeepMind's AlphaZero AI system, is already being used by researchers to simulate the complex processes of the optic nerve.
In recent work, PhD students Daniela Mihai and Ethan Harris, together with Associate Professor Dr Jonathon Hare, used Alpha to train variants of the ResNet50 image classification model on the ImageNet dataset to understand how retinal bottlenecks affect these models.
The latest research will be presented at the Shared Visual Representations in Human & Machine Intelligence workshop at the 2020 Conference on Neural Information Processing Systems (NeurIPS) on Saturday 12th December.
Dr Hare, of the Vision, Learning and Control Research Group, says: "The new ECS Alpha cluster allows us to investigate how changes to neural network architectures, hyperparameters and training regimes affect what those networks learn in ways that were not possible for us to do before.
"The large-memory GPUs in the cluster allow us to be much more efficient in our training procedures as a result of increased data throughput, and remove limits on model size. Scientifically this is very important because it allows us to understand errors and draw more concrete conclusions from the models we create."
Mr Draper says: "The Alpha project has been an excellent example of collaboration across the University and particularly between ECS and iSolutions."
Most of the systems development has been done by David Baker and David Hempston in the iSolutions High Performance Computing team, along with excellent support from Data Infrastructure team members Dom Malson, Tony Gregan and Jon Raney.