The University of Southampton

Project: Efficient DNNs for Inference at the Edge

Key information:

Student Sulaiman Sadiq
Academic Supervisors Geoff Merrett, Jonathan Hare
Cohort  1
Pure Link  Active Project


With approximately 13.8 billion connected devices in the world, we are surrounded by Deep Learning AI models. These models power a range of applications on low-power edge devices such as smart wearables and appliances. Typically, these devices have very limited processing capabilities, making it difficult to execute AI models locally; collected data is therefore often sent to the cloud for processing. This restricts the use cases of AI owing to concerns around security, privacy and latency. At the same time, manual design of efficient Deep Neural Networks (DNNs) for on-device execution is an involved, time-consuming process that requires expert human knowledge to improve efficiency across metrics such as latency, energy consumption and network size. In our work, carried out in collaboration with ARM Research and the International Centre for Spatial Computational Learning, we have been developing algorithms to automate the design of DNNs that deliver optimal performance on constrained edge devices.

We developed DEff-ARTS, a differentiable efficient architecture search algorithm that automatically derives network architectures for image classification on resource-constrained edge devices. By framing the search as a multi-objective optimisation problem, we simultaneously minimised the classification loss and the computational complexity of performing inference on the target device. Our formulation allowed the sub-objectives to be traded off easily according to user requirements. Experimental results on image classification showed that DEff-ARTS derived highly competitive network architectures with up to 7x fewer required compute cycles and 2x smaller model sizes compared to other contemporary approaches.

Currently, we are working on Dynamic DNNs, in which a run-time configurable super-network contains multiple efficient sub-networks of varying complexity that can be switched between as required. With multiple sub-networks available at run time, we can further reduce memory usage, enabling concurrent running of larger models or multiple workloads.
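The scalarised search objective described above can be sketched as follows. This is an illustrative simplification, not the published DEff-ARTS formulation: `expected_compute` shows the differentiable-relaxation idea (a softmax over architecture parameters weights the cost of each candidate operation), and `lam` stands in for the user-controlled trade-off knob; the function names and costs are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def expected_compute(alphas, op_costs):
    """Expected compute cost of a mixed operation: each candidate op's
    cost is weighted by the softmax of its architecture parameter, so
    the cost term stays differentiable with respect to `alphas`
    (illustrative; op costs would come from a device model)."""
    weights = softmax(alphas)
    return sum(w * c for w, c in zip(weights, op_costs))

def search_objective(cls_loss, alphas, op_costs, lam=0.1):
    """Scalarised multi-objective loss: classification loss plus a
    trade-off-weighted compute penalty (`lam` is the user knob)."""
    return cls_loss + lam * expected_compute(alphas, op_costs)
```

Raising `lam` pushes the search toward cheaper operations; as the architecture parameter of a cheap op grows, the expected cost falls toward that op's cost, which is how the relaxation steers the derived network.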
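The run-time switching idea behind Dynamic DNNs can be sketched as a simple selector: given a compute budget, pick the most accurate sub-network of the super-network that fits. The sub-network names, MAC counts and accuracies below are invented for illustration only.

```python
# Hypothetical sub-networks of a shared-weight super-network:
# (name, approximate cost in millions of MACs, top-1 accuracy).
# All numbers are illustrative, not measured results.
SUB_NETWORKS = [
    ("small", 20, 0.65),
    ("medium", 60, 0.72),
    ("large", 150, 0.76),
]

def select_sub_network(mac_budget):
    """Return the name of the most accurate sub-network whose compute
    cost fits within `mac_budget`; fall back to the cheapest one if
    nothing fits."""
    feasible = [s for s in SUB_NETWORKS if s[1] <= mac_budget]
    if not feasible:
        return SUB_NETWORKS[0][0]
    return max(feasible, key=lambda s: s[2])[0]
```

Because every sub-network reuses the super-network's weights, switching between them at run time requires no additional model storage, which is what frees memory for larger models or concurrent workloads.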