Hardware-Efficient Edge AI
Online Class
March 12-22, 2024
|
Course Abstract
Artificial intelligence workloads are becoming increasingly important for intelligent edge and extreme-edge devices, a trend also known as “tinyML”. Yet these workloads come with significant computational complexity, which until recently made their execution feasible only on power-hungry server or GPU platforms. A recent surge of techniques in the algorithmic, hardware-architecture and circuit domains is now creating breakthroughs that enable ML workloads on real-time embedded devices at the edge and extreme edge under tight energy and latency budgets. Such system optimization, however, requires thorough knowledge of and cross-layer optimization across all these fields, all the way from application and algorithm, through compilers and schedulers, to system design, macro-architecture design and circuit design. In this course, we will cover all these aspects of machine learning at the edge, with a deeper dive into hardware optimization opportunities. Ample time is also reserved to discuss practical case studies and end-to-end optimizations.
|
ML Applications, Scenarios and Constraints for the Edge
Marian Verhelst, KU Leuven, Belgium
– Overview of applications
– Cloud vs Edge vs tinyML (extreme edge)
– Inference vs learning vs federated learning
– Application constraints and scenarios
– Flavors of ML and AI (types of models)
|
ML Algorithms and Resulting Challenges
Marian Verhelst, KU Leuven, Belgium
Computational consequences and HW requirements at the edge of:
– Probabilistic models
– Decision trees
– SVMs
– NNs (deep and non-deep; layer types, …)
– NN training
– Hyperdimensional computing
Challenges and requirements for efficient AI at the edge.
|
Neural Network Compression for the Edge
Tijmen Blankevoort, Meta, The Netherlands
– Take any network, how can we make it smaller structurally?
– Neural Network Pruning
– Structured Compression
– Neural Architecture Search as a compression method.
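To give a flavor of the pruning techniques this lecture covers, here is a minimal magnitude-pruning sketch in NumPy (an illustrative example, not course material; the function name is our own):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_pruned = magnitude_prune(w, 0.9)  # ~90% of entries set to zero
```

In practice, pruning is interleaved with fine-tuning to recover accuracy, and structured variants remove entire channels or blocks so the savings translate into real speedups on hardware.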
|
Neural Network Quantization for the Edge
Tijmen Blankevoort, Meta, The Netherlands
– Quantization Introduction and Simulation
– Quantization-aware training
– Post-training quantization techniques
– Mixed Precision.
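As a back-of-the-envelope sketch of the quantization simulation ("fake quantization") idea underlying quantization-aware training, the helper below maps floats to a low-bit integer grid and back so the rounding error becomes visible while the math stays in floating point (illustrative only):

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate uniform affine quantization: round to the integer grid,
    clip to range, then dequantize back to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

x = np.linspace(-1.0, 1.0, 101)
x_q4 = fake_quantize(x, num_bits=4)  # at most 16 distinct values
```

Quantization-aware training runs such an operation in the forward pass and typically uses a straight-through estimator for the gradient.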
|
Specializing Processors for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Classical instruction set architectures (ISAs) limitations for ML
– ISA Extensions for ML
– Micro-architecture of ML-specialized cores
– PPA (power, performance, area) optimization and implementation techniques.
|
From Single to Multi-Core Low-Power SoCs for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Single-core ML SoCs – architecture, implementation, PPA analysis
– Multi-core ML SoCs – architecture, implementation, PPA analysis
– Integration of cores and Hardwired ML accelerators
– Memory hierarchy: challenges and solutions.
|
Concepts Towards ML Acceleration
Marian Verhelst, KU Leuven, Belgium
– Recap and formalization of ML models / CNNs / DNNs; GeMM
– GeMM on traditional CPU / GPU
– Energy/latency losses and opportunities
– Concepts towards more efficient ML acceleration on single core/single layer
– Parallelization (spatial unrolling optimization)
– Stationarity (temporal unrolling optimization)
– Extending spatial and temporal unrolling to higher levels.
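To make the stationarity idea concrete, here is a toy output-stationary GeMM loop nest (purely illustrative, not course material):

```python
import numpy as np

def gemm_output_stationary(A, B):
    """Toy GeMM with an output-stationary loop order: the partial sum for
    C[m, n] lives in a local accumulator for the entire K loop, so each
    output is written back to (costly) memory exactly once."""
    M, K = A.shape
    _, N = B.shape
    C = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            acc = 0.0                    # output stays "stationary" locally
            for k in range(K):
                acc += A[m, k] * B[k, n]
            C[m, n] = acc                # single writeback per output element
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = gemm_output_stationary(A, B)
```

Weight- or input-stationary dataflows reorder and tile the same loops so a different operand is the one reused from local storage; dataflow exploration is the systematic search over such loop orderings and unrollings.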
|
Exploiting Quantization and Sparsity at the HW Level
Marian Verhelst, KU Leuven, Belgium
– Concepts towards more efficient AI acceleration on single core/single layer (continued)
– Sparse workloads
– Quantization – analog domain
[Optional: – Concepts towards more efficient AI acceleration on multi-core/multi-layer].
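As a tiny illustration of exploiting sparsity, the sketch below stores only the nonzero entries of a vector as (index, value) pairs and multiply-accumulates over those alone, so zero operands cost nothing (function names are hypothetical):

```python
def compress(vec):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def sparse_dot(compressed, dense):
    """Multiply-accumulate only over the nonzeros: zeros are skipped entirely."""
    return sum(v * dense[i] for i, v in compressed)

activations = [0, 2, 0, 0, 3, 0, 0, 1]   # mostly zeros, e.g. after ReLU
weights = [1, 1, 2, 1, 1, 3, 1, 4]
result = sparse_dot(compress(activations), weights)  # 2*1 + 3*1 + 1*4 = 9
```

Hardware sparse accelerators apply the same principle with compressed formats such as CSR or bitmap encodings, gating off the MAC unit whenever an operand is zero.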
|
Analog/Mixed-Signal Acceleration
Naveen Verma, Princeton University, USA
– Review of key ops to accelerate (MACs) and potential energy savings through analog
– Overview of Approaches for MACs
• Electronic (current, voltage, charge summing)
• Optical
– Overheads and limitations
• Memory accessing -> motivates in-memory computing
• Data conversion
• Technology integration with digital engines/memory
– Fundamental tradeoffs
• Energy/throughput vs SNR
– In-memory computing
• Different memory techs and approaches.
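The energy-vs-SNR tradeoff can be caricatured with an entirely toy model: treat the analog MAC as an ideal dot product plus additive Gaussian noise, where a smaller noise standard deviation stands in for spending more energy per operation (an assumption for illustration, not a circuit model):

```python
import numpy as np

def analog_mac(x, w, noise_std, rng):
    """Toy analog MAC: exact dot product plus additive Gaussian noise.
    Lower noise_std stands in for higher energy per operation."""
    return float(x @ w) + rng.normal(0.0, noise_std)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
w = rng.standard_normal(256)
exact = float(x @ w)
cheap = analog_mac(x, w, noise_std=1.0, rng=rng)    # low energy, low SNR
costly = analog_mac(x, w, noise_std=0.01, rng=rng)  # high energy, high SNR
```

Because many ML workloads tolerate modest compute noise, the operating point on this curve can often be pushed toward lower energy than digital arithmetic would allow.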
|
Architectural Integration of Emerging Compute Models and Technologies
Naveen Verma, Princeton University, USA
– Dataflow bottlenecks (weight/state loading)
– Structured memory accessing and benefits of emerging memory
– Co-design with trainers.
|
Tools: Accelerator Code Generation
Tobias Grosser, University of Cambridge, UK
Abstract.
|
Tools: Accelerator Code Generation
Tobias Grosser, University of Cambridge, UK
Abstract.
|
Emerging ML Paradigms: Neuro-Inspired Computing
Jan Rabaey, UC Berkeley, USA
Lessons from the brain and what they mean for:
– ML hardware,
– Neuromorphic computing,
– Hyperdimensional computing, etc.
|
Emerging ML Paradigms: Towards Cognitive Systems
Jan Rabaey, UC Berkeley, USA
– Autonomous sensor-control-actuation
– Model versus data driven
– Reinforcement learning
– Symbolic reasoning
– Probabilistic learning, graphs
|
Efficient Execution of Approximated AI Algorithms on Heterogeneous Edge AI Systems
David Atienza, EPFL, Switzerland
i. Major challenges in designing energy-efficient edge AI architectures due to the complexity of AI/CNN.
ii. Design options to reduce complexity (pruning, quantization, etc.) and benefits of operating edge AI architectures at sub-nominal conditions.
iii. New architectural design methodologies for edge AI systems, called Embedded Ensemble CNNs (E2CNNs), to conceive pruned CNN and AI implementations with improved robustness against memory errors compared to pruned/quantized single-instance ML/CNNs.
iv. Experimental evaluation of compression methods and design space exploration to produce an ensemble of CNNs for edge AI devices with the same memory requirements as the original architectures but improved error robustness (in different types of memories) for sub-threshold operation.
|
Application-Driven System Design and Optimization Flow of Edge AI Use Cases in Industrial and Medical Domains
David Atienza, EPFL, Switzerland
i. Overview of major key challenges in different industrial case studies for AI/ML systems (computation vs communication and other trade-offs to consider particularly for medical applications in the context of Big Data healthcare).
ii. Different design options for AI/ML hardware systems using centralized vs. federated approaches on edge AI systems.
iii. Mapping options for ULP multi-core embedded systems with neural network accelerators for energy-scalable software layers based on target applications.
iv. Examples of next-generation smart wearable devices in the healthcare context
v. Examples of industrial edge AI systems for home automation.
|
Practical Case Studies: “Energy Efficient ML Applications for Metaverse”
Huichu Liu, Meta, USA
– Overview of AR/VR system features and energy constraints
– Breakdown of the different applications running on AR/VR HW and their related ML algorithms
– HW techniques to enable energy efficient NN execution
– HW-SW techniques to enable efficient NN mapping
– Algorithm techniques for practical applications
– Future applications/challenges/research directions
|
Cross-Layer Optimization
Marian Verhelst, KU Leuven, Belgium
– Need for optimization across the stack: cross-layer design space exploration
– Tool flows for cross-layer optimization
– Final Q&A and adjourn
|