Enabling Embedded Neural Network Processing

On-Line Class
CET – Central European Time Zone

February 2-6, 2026

Registration deadline: January 26, 2026
Payment deadline: January 29, 2026

TEACHING HOURS
DAILY	Central European Time CET	Eastern Standard Time EST	Pacific Standard Time PST	India Standard Time IST
Module 1	3:00-4:30 pm	9:00-10:30 am	6:00-7:30 am	7:30-9:00 pm
Module 2	5:00-6:30 pm	11:00 am-12:30 pm	8:00-9:30 am	9:30-11:00 pm

Monday, February 2
3:00-6:30 pm	Neural Network Introduction and Model Techniques	Tijmen Blankevoort, Meta
Tuesday, February 3
3:00-6:30 pm	Custom Hardware Accelerators and Scheduling Techniques	Marian Verhelst, KU Leuven & IMEC
Wednesday, February 4
3:00-6:30 pm	RISC-V and Multi-Core Architectures	Luca Benini, ETHZ/Uni Bologna
Thursday, February 5
3:00-6:30 pm	Compiler Implications	Tobias Grosser, UC Cambridge
Friday, February 6
3:00-6:30 pm	System Integration and Applications	David Atienza, EPFL

Abstracts

While neural networks are already omnipresent in cloud scenarios, there has recently been a steep rise in the deployment of inference tasks on edge and extreme-edge devices, such as cars, drones, phones, glasses and wearable medical devices. While such decentralized deployment brings advantages in terms of privacy, response time and reliability, it comes with significant technical challenges. The stringent latency requirements, scarce memory budgets and limited energy availability of edge systems demand thorough optimization of hardware and software across the full deployment stack. This intensive course will dive deeply into the different optimization strategies across the stack, ranging from algorithmic techniques and custom hardware architectures to compiler implications and application-specific system optimizations. Each topic will be covered by a different expert in the field, building on recent state-of-the-art research.

Neural Network Introduction and Model Techniques
Tijmen Blankevoort, Meta

Abstract.

Custom Hardware Accelerators and Scheduling Techniques
Marian Verhelst, KU Leuven & IMEC

Neural networks cannot be executed efficiently on general-purpose CPUs or microprocessors. Over the last decade, a myriad of optimized hardware architectures has therefore been proposed to execute these workloads with high throughput and energy efficiency, either in customized accelerators or as GPU extensions. While the field is very diverse, we will see that all implementations rely on a few common architectural concepts and scheduling techniques, including spatial/temporal unrolling and fusion. We will discuss these techniques in depth and illustrate them with many state-of-the-art examples from recent literature. Finally, we will discuss how to model these concepts at a high level to enable rapid design space exploration across architectures.
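To make spatial/temporal unrolling and high-level modeling concrete, here is a minimal sketch of an analytical cost model, not a tool from the lecture. It assumes a hypothetical matrix-multiply layer (`LAYER`), a 16x16 processing-element (PE) array that performs one MAC per PE per cycle, and ignores memory and data movement entirely. Two loop dimensions are unrolled spatially over the array, the remaining one runs in time, and a tiny design space exploration enumerates every choice. All names (`mapping_cost`, `explore`) are our own.

```python
from itertools import permutations
from math import ceil, prod

# Hypothetical layer: Y[M][N] = sum over K of A[M][K] * B[K][N]
# (a fully connected layer, or a convolution lowered to a matmul).
LAYER = {"M": 64, "N": 100, "K": 256}


def mapping_cost(layer, spatial, pe_rows=16, pe_cols=16):
    """Estimate cycles and PE utilization when two loop dimensions are
    spatially unrolled over a pe_rows x pe_cols array.

    `spatial` is an ordered pair of dimension names, e.g. ("K", "N"):
    the first is unrolled over the rows, the second over the columns.
    Assumption: one MAC per PE per cycle, no memory stalls.
    """
    row_dim, col_dim = spatial
    # Temporal steps needed to cover each spatial dimension; a partially
    # filled last tile leaves some PEs idle.
    row_steps = ceil(layer[row_dim] / pe_rows)
    col_steps = ceil(layer[col_dim] / pe_cols)
    # The remaining dimension is iterated fully in time.
    temporal = prod(v for d, v in layer.items() if d not in spatial)
    cycles = row_steps * col_steps * temporal
    macs = prod(layer.values())
    utilization = macs / (cycles * pe_rows * pe_cols)
    return cycles, utilization


def explore(layer, pe_rows=16, pe_cols=16):
    """Brute-force design space exploration over all spatial unrollings,
    returned best-first (fewest cycles)."""
    results = []
    for spatial in permutations(layer, 2):
        cycles, util = mapping_cost(layer, spatial, pe_rows, pe_cols)
        results.append((cycles, util, spatial))
    return sorted(results)


if __name__ == "__main__":
    for cycles, util, spatial in explore(LAYER):
        print(f"unroll {spatial}: {cycles} cycles, {util:.0%} PE utilization")
```

With these assumed sizes, unrolling M and K fills the array perfectly, whereas any mapping that puts N = 100 on the array wastes PEs on the last partial tile. Real modeling frameworks add memory hierarchies, data reuse and fusion on top of this kind of loop-level reasoning.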

RISC-V and Multi-Core Architectures
Luca Benini, ETHZ/Uni Bologna

This lecture will cover low-power instruction processors for NN workloads, with a focus on energy efficiency. The open RISC-V instruction set architecture (ISA) will be used as the baseline for processor design and extensions. Several key ideas for extending the ISA to improve NN execution efficiency will be covered in detail, moving from general techniques, such as hardware loops and complex addressing modes, to increasingly domain-specific improvements, such as mixed-precision SIMD and ternary operations. Vector and tensor instruction extensions will also be discussed. The implications of ISA extensions for the micro-architecture and hardware implementation will be discussed in depth, with examples from several silicon prototypes and products. Techniques to boost performance at high energy efficiency through parallel execution in tightly coupled processor clusters will also be covered, stressing the importance of efficient shared-memory access and synchronization, and describing advanced hardware and software design techniques that minimize efficiency losses in parallel architectures.
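The payoff of packed (SIMD) and ternary arithmetic can be illustrated with a small Python emulation. This is a sketch under our own assumptions: the `simd_dotp_acc` operation models a hypothetical "4x int8 dot product and accumulate into a 32-bit register" instruction, loosely in the spirit of packed-SIMD extensions found in some RISC-V cores, but its exact semantics are not taken from any specification. Hardware loops and addressing modes are not modeled. All function names are illustrative.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 = LSB)."""
    assert len(values) == 4 and all(-128 <= v <= 127 for v in values)
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)
    return word


def unpack_int8x4(word):
    """Split a 32-bit word back into four signed 8-bit lanes."""
    lanes = []
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def simd_dotp_acc(acc, word_a, word_b):
    """Emulate one hypothetical packed-SIMD instruction: four 8-bit
    multiplies summed into a 32-bit accumulator in a single operation."""
    a, b = unpack_int8x4(word_a), unpack_int8x4(word_b)
    result = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to 32-bit two's complement, as a hardware register would.
    result &= 0xFFFFFFFF
    return result - (1 << 32) if result >= (1 << 31) else result


def dot_int8(x, w):
    """Dot product of two int8 vectors (length a multiple of 4) using the
    emulated SIMD op. Returns (result, number of SIMD ops issued)."""
    acc, ops = 0, 0
    for i in range(0, len(x), 4):
        acc = simd_dotp_acc(acc, pack_int8x4(x[i:i + 4]), pack_int8x4(w[i:i + 4]))
        ops += 1
    return acc, ops


def dot_ternary(x, w):
    """With ternary weights in {-1, 0, +1}, every multiplication reduces
    to a conditional add, a conditional subtract, or nothing at all."""
    acc = 0
    for xi, wi in zip(x, w):
        if wi == 1:
            acc += xi
        elif wi == -1:
            acc -= xi
    return acc


if __name__ == "__main__":
    x = [1, -2, 3, 4, 5, -6, 7, -128]
    w = [2, 3, -1, 0, 1, 1, -1, 1]
    print(dot_int8(x, w))  # (-143, 2): 2 SIMD ops instead of 8 scalar MACs
    t = [1, -1, 0, 1, 1, 0, -1, 1]
    print(dot_ternary(x, t))  # -123
```

The instruction count drops by the packing factor (4x for 8-bit lanes, more for narrower precisions), which is the main lever for energy efficiency on small cores; ternary weights additionally remove the multiplier from the datapath.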

Compiler Implications
Tobias Grosser, UC Cambridge

Today, there is a plethora of neural network frameworks, many of which use state-of-the-art compiler technology as their foundation. We will give an overview of the foundational technology behind these compilers: the MLIR compiler toolchain. In this interactive course, participants will learn the foundations of SSA-based (static single assignment) compilers, including how to inspect, modify, and define domain-specific abstractions. We will then use these abstractions to define the IR of a RISC-V-style AI accelerator and show how a compiler can generate high-performance code for such an accelerator. By the end of this course, participants will have a comprehensive understanding of the design of modern AI compilers.
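The core ideas of inspecting, verifying and rewriting an SSA IR can be shown with a toy model in plain Python. This is not MLIR or its Python bindings: the `Op` class, `verify_ssa` and `fuse_mul_add` are our own illustrative names, the textual format is simplified (MLIR's real syntax differs), the op names `arith.muli`/`arith.addi` are merely borrowed from MLIR's arith dialect, and `acc.mac` is a hypothetical accelerator operation standing in for a domain-specific abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """One SSA operation: `result = name(operands)`, in a simplified syntax."""
    result: str
    name: str
    operands: list
    attrs: dict = field(default_factory=dict)

    def __str__(self):
        args = ", ".join(self.operands)
        attrs = f" {self.attrs}" if self.attrs else ""
        return f"{self.result} = {self.name}({args}){attrs}"


def verify_ssa(ops, args):
    """Check the SSA property: every value is defined exactly once and
    before any use. `args` are the block arguments (function inputs)."""
    defined = set(args)
    for op in ops:
        for v in op.operands:
            if v not in defined:
                raise ValueError(f"{v} used before definition in: {op}")
        if op.result in defined:
            raise ValueError(f"{op.result} defined twice")
        defined.add(op.result)


def fuse_mul_add(ops):
    """Rewrite pattern: %t = muli(a, b); %r = addi(%t, c) becomes
    %r = acc.mac(a, b, c), but only if %t has no other users.
    'acc.mac' is a hypothetical accelerator op. For brevity the pattern
    only matches the multiply in the first operand of the add."""
    uses = {}
    for op in ops:
        for v in op.operands:
            uses[v] = uses.get(v, 0) + 1
    producers = {op.result: op for op in ops}
    rewritten, dead = [], set()
    for op in ops:
        if op.name == "arith.addi":
            lhs = producers.get(op.operands[0])
            if lhs is not None and lhs.name == "arith.muli" and uses[lhs.result] == 1:
                rewritten.append(Op(op.result, "acc.mac", lhs.operands + [op.operands[1]]))
                dead.add(lhs.result)
                continue
        rewritten.append(op)
    # Drop multiplies whose only user was folded into acc.mac.
    return [op for op in rewritten if op.result not in dead]


if __name__ == "__main__":
    args = ["%a", "%b", "%c"]
    ops = [
        Op("%t0", "arith.muli", ["%a", "%b"]),
        Op("%t1", "arith.addi", ["%t0", "%c"]),
        Op("%t2", "arith.muli", ["%t1", "%a"]),
    ]
    verify_ssa(ops, args)
    lowered = fuse_mul_add(ops)
    verify_ssa(lowered, args)  # the rewrite must preserve SSA form
    print("\n".join(str(op) for op in lowered))
```

Because every value has a single definition, the pass can find producers and count users with simple dictionaries; this is the same property that makes pattern-based rewrites in SSA compilers cheap and safe.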

System Integration and Applications
David Atienza, EPFL

There are major challenges in designing energy-efficient edge AI architectures due to the complexity of today's AI/CNN methods. As a result, a new generation of design flows aims to reduce complexity, building on traditional approaches for conceiving smaller edge AI systems (pruning, quantization, etc.), while also benefiting from AI hardware operating at sub-nominal conditions. Ensemble CNNs (E2CNNs) are one such approach. This module will present E2CNN as a way to design ultra-low-power (ULP) and resource-efficient edge AI systems targeting real-life applications. These optimized edge AI systems have the same memory requirements as the original AI/ML designs but improved robustness to errors (in different types of memories) under sub-threshold operation. Finally, this module will discuss how such E2CNN-based edge AI systems can be enhanced by including different neural network accelerators for energy-scalable software execution according to the requirements of the target domain. In particular, this module will present several real-life industrial edge AI systems in the areas of smart wearables and home automation.
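The ingredients behind an ensemble approach such as E2CNN can be sketched in a few lines, under assumptions that are ours rather than the module's: the original model's parameter budget is split evenly across ensemble members (for example through pruning), memory errors at sub-threshold voltage are modeled as independent bit flips in int8 weights, and members are combined by simple majority voting. The actual E2CNN design flow may size members and combine outputs differently; the function names are illustrative.

```python
import random
from collections import Counter


def ensemble_member_size(original_params, n_members):
    """Parameter budget per member so that the whole ensemble fits in the
    memory of the original single model (assumption: an even split)."""
    return original_params // n_members


def inject_bit_flips(weights, flip_prob, rng):
    """Emulate unreliable memory (e.g. SRAM operated below nominal voltage):
    each bit of each signed 8-bit weight flips independently with flip_prob."""
    corrupted = []
    for w in weights:
        byte = w & 0xFF
        for bit in range(8):
            if rng.random() < flip_prob:
                byte ^= 1 << bit
        corrupted.append(byte - 256 if byte >= 128 else byte)
    return corrupted


def majority_vote(predictions):
    """Combine the class labels predicted by the ensemble members.
    Ties go to the label encountered first (Counter.most_common order)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    print(ensemble_member_size(1_000_000, 4))  # 250000 parameters per member
    print(inject_bit_flips([12, -7, 100, -128], 0.05, rng))
    print(majority_vote([3, 3, 1, 3]))  # 3: one faulty member is outvoted
```

The robustness argument is that independent memory errors rarely corrupt a majority of members in the same way, so the vote masks faults that would flip the output of a single model of the same total size.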