Enabling Embedded Neural Network Processing
Online Class
February 3-7, 2025
|
While neural networks are already omnipresent in cloud scenarios, there has recently been a steep rise in the deployment of inference tasks on edge and extreme-edge devices, such as cars, drones, phones, glasses and wearable medical devices. Such decentralized deployment brings advantages in terms of privacy, response time and reliability, but it comes with significant technical challenges. The stringent latency requirements, scarce memory budgets and limited energy availability of edge systems demand a thorough optimization of hardware and software across the full deployment stack. This intensive course will dive deeply into the different optimization strategies across the stack, ranging from algorithmic techniques and custom hardware architectures to compiler implications and application-specific system optimizations. Each topic will be covered by a different expert in the field, building on recent state-of-the-art research.
|
Neural Network Introduction and Model Techniques
Tijmen Blankevoort, Meta
Abstract.
|
Custom Hardware Accelerators and Scheduling Techniques
Marian Verhelst, KU Leuven & IMEC
Neural networks cannot be executed efficiently on general-purpose CPUs or microprocessors. Over the last decade, a myriad of optimized hardware architectures have therefore been proposed to execute these workloads with high throughput and energy efficiency, in the form of customized accelerators or GPU extensions. While the field is very diverse, we will see that all implementations rely on a few common architectural concepts and scheduling techniques, including spatial/temporal unrolling and fusion. We will discuss these techniques in depth and illustrate them with many state-of-the-art (SotA) examples from recent literature. Finally, we will discuss how to model these concepts at a high level to enable rapid design space exploration across architectures.
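To make spatial/temporal unrolling and high-level modeling more concrete, here is a minimal Python sketch of an analytical latency model. It assumes a hypothetical 2D array of MAC units that spatially unrolls the output-channel (K) and input-channel (C) loops of a layer, while all remaining iterations, including the P output pixels, run temporally. A toy off-chip traffic function illustrates why fusion helps. The function names, layer dimensions and traffic model are illustrative assumptions, not the lecture's modeling framework, and the model ignores memory stalls and bandwidth limits.

```python
import math


def cycles_and_utilization(K, C, P, array_k, array_c):
    """Latency (cycles) and MAC utilization of a spatially unrolled layer.

    The K and C loops are unrolled spatially over an array_k x array_c grid
    of MAC units; the remaining K and C tiles and all P output pixels are
    iterated temporally, one full-array step per cycle. Stalls are ignored.
    """
    temporal_k = math.ceil(K / array_k)  # K tiles processed over time
    temporal_c = math.ceil(C / array_c)  # C tiles processed over time
    cycles = temporal_k * temporal_c * P
    useful_macs = K * C * P
    peak_macs = cycles * array_k * array_c
    return cycles, useful_macs / peak_macs


def explore(K, C, P, num_pes=256):
    """Try every power-of-two array shape with num_pes MACs; return the fastest.

    num_pes is assumed to be a power of two. Results are
    (cycles, array_k, array_c, utilization) tuples; ties on cycles are
    broken by the smaller array_k.
    """
    results = []
    array_k = 1
    while array_k <= num_pes:
        array_c = num_pes // array_k
        cycles, util = cycles_and_utilization(K, C, P, array_k, array_c)
        results.append((cycles, array_k, array_c, util))
        array_k *= 2
    return min(results)


def dram_traffic(act_in, act_mid, act_out, fused):
    """Off-chip activation traffic (in elements) for two back-to-back layers.

    Layer-by-layer execution writes the intermediate tensor to DRAM and reads
    it back; fused execution keeps it on chip (assumes fused tiles fit on chip).
    """
    traffic = act_in + act_out
    if not fused:
        traffic += 2 * act_mid
    return traffic


if __name__ == "__main__":
    K, C, P = 64, 48, 196  # hypothetical layer dimensions
    for shape in [(16, 16), (32, 32)]:
        print(shape, cycles_and_utilization(K, C, P, *shape))
    print("best 256-MAC shape:", explore(K, C, P))
    print("traffic unfused/fused:",
          dram_traffic(1000, 4000, 1000, False),
          dram_traffic(1000, 4000, 1000, True))
```

Under this toy model, a 16x16 array runs the hypothetical layer in 2352 cycles at full utilization, while a 32x32 array needs only 784 cycles but leaves a quarter of its MACs idle, because C=48 does not tile evenly by 32. Sweeping shapes with a closed-form model like this, rather than simulating each design in detail, is what makes high-level modeling attractive for early design space exploration.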
|
RISC-V and Multi-Core Architectures
Luca Benini, ETHZ/Uni Bologna
This lecture will cover low-power instruction processors for NN workloads, with a focus on energy efficiency. The open RISC-V instruction set architecture (ISA) will be used as the baseline for processor design and extensions. Several key ideas for extending the ISA to improve NN execution efficiency will be covered in detail, moving from general techniques, such as hardware loops and complex addressing modes, to increasingly domain-specific improvements, such as mixed-precision SIMD and ternary operations. Vector and tensor instruction extensions will also be discussed. The implications of ISA extensions on micro-architecture and hardware implementation will be discussed in depth, with examples from several silicon prototypes and products. Techniques to boost performance at high energy efficiency through parallel execution in tightly coupled processor clusters will also be covered, stressing the importance of efficient shared-memory access and synchronization, and describing advanced hardware and software design techniques to minimize efficiency losses in parallel architectures.
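The mixed-precision SIMD idea can be illustrated by emulating, in Python, a packed 4-way int8 dot-product instruction with 32-bit accumulation. The instruction name `sdotp4` and its lane layout are hypothetical, loosely modeled on the packed-SIMD dot-product instructions found in some custom RISC-V extensions; real encodings and semantics differ between cores. This is a behavioral sketch, not an ISA definition.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    assert len(values) == 4 and all(-128 <= v <= 127 for v in values)
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack_lane(word, lane):
    """Extract one lane of a packed word as a sign-extended int8."""
    byte = (word >> (8 * lane)) & 0xFF
    return byte - 256 if byte >= 128 else byte


def sdotp4(acc, a_word, b_word):
    """Hypothetical packed-SIMD instruction: acc += sum of 4 int8 lane products.

    The accumulator wraps like a 32-bit two's-complement register.
    """
    for lane in range(4):
        acc += unpack_lane(a_word, lane) * unpack_lane(b_word, lane)
    acc &= 0xFFFFFFFF
    return acc - (1 << 32) if acc >= (1 << 31) else acc


def dot_int8(a, b):
    """int8 dot product issuing one sdotp4 per 4 elements.

    Returns (result, number of sdotp4 issues); len(a) must be a multiple of 4.
    """
    assert len(a) == len(b) and len(a) % 4 == 0
    acc, issues = 0, 0
    for i in range(0, len(a), 4):
        acc = sdotp4(acc, pack_int8x4(a[i:i + 4]), pack_int8x4(b[i:i + 4]))
        issues += 1
    return acc, issues


if __name__ == "__main__":
    a = [1, -2, 3, -4, 5, -6, 7, -8]
    b = [1, 1, 1, 1, 2, 2, 2, 2]
    print(dot_int8(a, b))
```

Each such instruction replaces four scalar multiply-accumulates, so the 8-element example needs two issues instead of eight. On real hardware, hardware loops would additionally remove the loop-counter and branch overhead around these issues.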
|
Compiler Implications
Tobias Grosser, University of Cambridge
Abstract.
|
System Integration and Applications
David Atienza, EPFL
There are major challenges in designing energy-efficient edge AI architectures due to the complexity of today's AI/CNN methods. As a result, a new generation of design flows is emerging that aims to reduce the complexity of the traditional approaches used to conceive smaller edge AI systems (pruning, quantization, etc.) while benefiting from AI hardware operating at sub-nominal conditions; Ensemble CNNs (E2CNNs) are one example. This module will present E2CNN as a way to design ultra-low-power (ULP), resource-efficient edge AI systems targeting real-life applications. These optimized edge AI systems will have the same memory requirements as the original AI/ML designs, but improved robustness to errors (in different types of memories) under sub-threshold operation. Finally, this module will discuss how such E2CNN-based edge AI systems can be enhanced by including different neural network accelerators for energy-scalable software execution, according to the requirements of the target domain. In particular, this module will present several real-life industrial edge AI systems in the areas of smart wearables and home automation.
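A minimal Python sketch of the ensemble intuition follows. It assumes that member class scores are combined by averaging and that each member receives an equal share of the original model's parameter budget. The helper names, the equal split and the single-bit-flip error model are illustrative assumptions; the actual E2CNN construction, training and combination scheme may differ.

```python
def member_budget(original_params, num_members):
    """Per-member parameter budget so the ensemble matches the original footprint."""
    return original_params // num_members


def ensemble_predict(member_scores):
    """Average class scores over ensemble members and return the winning class index."""
    num_members = len(member_scores)
    num_classes = len(member_scores[0])
    avg = [sum(scores[c] for scores in member_scores) / num_members
           for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: avg[c])


def flip_bit_int8(value, bit):
    """Emulate a single memory bit error in a stored int8 value (e.g. a weight)."""
    byte = (value & 0xFF) ^ (1 << bit)
    return byte - 256 if byte >= 128 else byte


if __name__ == "__main__":
    # Hypothetical 1M-parameter model split into 4 equal members.
    print(member_budget(1_000_000, 4))
    # Flipping the sign bit (bit 7) turns a small weight into a large negative one.
    print(flip_bit_int8(3, 7))
    scores = [
        [0.1, 0.7, 0.2],    # member 0
        [0.2, 0.6, 0.2],    # member 1
        [0.9, 0.05, 0.05],  # member 2, assumed corrupted by memory errors
    ]
    print(ensemble_predict(scores))
```

In the toy scores, one member's preferred class is wrong, yet the averaged scores still select class 1. These numbers are made up to illustrate the intuition behind ensemble robustness; they are not measured results.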