Hardware-Efficient Edge AI
Online Class
March 12-22, 2024
|
Course Abstract
Artificial intelligence workloads are becoming increasingly important for intelligent edge and extreme-edge devices, a trend also known as “tinyML”. Yet these workloads come with significant computational complexity, which until recently made their execution feasible only on power-hungry server or GPU platforms. A recent surge of techniques in the algorithmic, hardware-architecture and circuit domains is now creating breakthroughs that enable ML workloads on real-time embedded devices at the edge and extreme edge under tight energy and latency budgets. Such system optimization, however, requires thorough knowledge of and cross-layer optimization across all these fields, all the way from application and algorithm, through compilers and schedulers, to system design, macro-architecture design and circuit design. In this course, we will cover all these aspects of machine learning at the edge, with a deeper dive into hardware optimization opportunities. Ample time is also reserved to discuss practical case studies and end-to-end optimizations.
|
ML Applications, Scenarios and Constraints for the Edge
Marian Verhelst, KU Leuven, Belgium
– Overview of applications
– Cloud vs Edge vs tinyML (extreme edge)
– Inference vs learning vs federated learning
– Application constraints and scenarios
– Flavors of ML and AI (types of models)
|
ML Algorithms and Resulting Challenges
Marian Verhelst, KU Leuven, Belgium
Computational consequences and HW requirements at the edge of:
– Probabilistic models
– Decision trees
– SVMs
– NNs (deep and non-deep; layer types, …)
– NN training
– Hyperdimensional computing
Challenges and requirements for efficient AI at the edge.
|
Neural Network Compression for the Edge
Tijmen Blankevoort, Meta, The Netherlands
– Take any network, how can we make it smaller structurally?
– Neural Network Pruning
– Structured Compression
– Neural Architecture Search as a compression method.
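To give a flavor of the pruning techniques this lecture covers, here is a minimal magnitude-pruning sketch in NumPy (an illustrative example, not course material; the function name is our own):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
w_pruned = magnitude_prune(w, 0.9)  # ~90% of entries set to zero
```

In practice, pruning is interleaved with fine-tuning to recover accuracy, and structured variants remove entire channels or blocks so the savings translate into real speedups on hardware.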
|
Neural Network Quantization for the Edge
Tijmen Blankevoort, Meta, The Netherlands
– Quantization Introduction and Simulation
– Quantization-aware training
– Post-training quantization techniques
– Mixed Precision.
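As a back-of-the-envelope sketch of the quantization simulation ("fake quantization") idea underlying quantization-aware training, the helper below maps floats to a low-bit integer grid and back so the rounding error becomes visible while the math stays in floating point (illustrative only):

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate uniform affine quantization: round to the integer grid,
    clip to range, then dequantize back to floats."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

x = np.linspace(-1.0, 1.0, 101)
x_q4 = fake_quantize(x, num_bits=4)  # at most 16 distinct values
```

Quantization-aware training runs such an operation in the forward pass and typically uses a straight-through estimator for the gradient.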
|
Specializing Processors for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Classical instruction set architectures (ISAs) limitations for ML
– ISA Extensions for ML
– Micro-architecture of ML-specialized cores
– PPA (power, performance, area) optimization and implementation techniques.
|
From Single to Multi-Core Low-Power SoCs for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Single-core ML SoCs – architecture, implementation, PPA analysis
– Multi-core ML SoCs – architecture, implementation, PPA analysis
– Integration of cores and Hardwired ML accelerators
– Memory hierarchy: challenges and solutions.
|
Concepts Towards ML Acceleration
Marian Verhelst, KU Leuven, Belgium
– Recap and formalization of ML models / CNNs / DNNs; GeMM
– GeMM on traditional CPU / GPU
– Energy/latency losses and opportunities
– Concepts towards more efficient ML acceleration on single core/single layer
– Parallelization (spatial unrolling optimization)
– Stationarity (temporal unrolling optimization)
– Extending spatial and temporal unrolling to higher levels.
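To make the stationarity idea concrete, here is a toy output-stationary GeMM loop nest (purely illustrative, not course material):

```python
import numpy as np

def gemm_output_stationary(A, B):
    """Toy GeMM with an output-stationary loop order: the partial sum for
    C[m, n] lives in a local accumulator for the entire K loop, so each
    output is written back to (costly) memory exactly once."""
    M, K = A.shape
    _, N = B.shape
    C = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            acc = 0.0                    # output stays "stationary" locally
            for k in range(K):
                acc += A[m, k] * B[k, n]
            C[m, n] = acc                # single writeback per output element
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = gemm_output_stationary(A, B)
```

Weight- or input-stationary dataflows reorder and tile the same loops so a different operand is the one reused from local storage; dataflow exploration is the systematic search over such loop orderings and unrollings.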
|
Exploiting Quantization and Sparsity at the HW Level
Marian Verhelst, KU Leuven, Belgium
– Concepts towards more efficient AI acceleration on single core/single layer (continued)
– Sparse workloads
– Quantization – analog domain
[Optional: – Concepts towards more efficient AI acceleration on multi-core/multi-layer].
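As a tiny illustration of exploiting sparsity, the sketch below stores only the nonzero entries of a vector as (index, value) pairs and multiply-accumulates over those alone, so zero operands cost nothing (function names are hypothetical):

```python
def compress(vec):
    """Keep only the nonzero entries as (index, value) pairs."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def sparse_dot(compressed, dense):
    """Multiply-accumulate only over the nonzeros: zeros are skipped entirely."""
    return sum(v * dense[i] for i, v in compressed)

activations = [0, 2, 0, 0, 3, 0, 0, 1]   # mostly zeros, e.g. after ReLU
weights = [1, 1, 2, 1, 1, 3, 1, 4]
result = sparse_dot(compress(activations), weights)  # 2*1 + 3*1 + 1*4 = 9
```

Hardware sparse accelerators apply the same principle with compressed formats such as CSR or bitmap encodings, gating off the MAC unit whenever an operand is zero.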
|
Analog/Mixed-Signal Acceleration
Naveen Verma, Princeton University, USA
– Review of key ops to accelerate (MACs) and potential energy savings through analog
– Overview of Approaches for MACs
• Electronic (current, voltage, charge summing)
• Optical
– Overheads and limitations
• Memory accessing -> motivates in-memory computing
• Data conversion
• Technology integration with digital engines/memory
– Fundamental tradeoffs
• Energy/throughput vs SNR
– In-memory computing
• Different memory techs and approaches.
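The energy-vs-SNR tradeoff can be caricatured with an entirely toy model: treat the analog MAC as an ideal dot product plus additive Gaussian noise, where a smaller noise standard deviation stands in for spending more energy per operation (an assumption for illustration, not a circuit model):

```python
import numpy as np

def analog_mac(x, w, noise_std, rng):
    """Toy analog MAC: exact dot product plus additive Gaussian noise.
    Lower noise_std stands in for higher energy per operation."""
    return float(x @ w) + rng.normal(0.0, noise_std)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
w = rng.standard_normal(256)
exact = float(x @ w)
cheap = analog_mac(x, w, noise_std=1.0, rng=rng)    # low energy, low SNR
costly = analog_mac(x, w, noise_std=0.01, rng=rng)  # high energy, high SNR
```

Because many ML workloads tolerate modest compute noise, the operating point on this curve can often be pushed toward lower energy than digital arithmetic would allow.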
|
Architectural Integration of Emerging Compute Models and Technologies
Naveen Verma, Princeton University, USA
– Dataflow bottlenecks (weight/state loading)
– Structured memory accessing and benefits of emerging memory
– Co-design with trainers.
|
Tools: Accelerator Code Generation
Tobias Grosser, University of Cambridge, UK
Abstract.
|
Tools: Accelerator Code Generation
Tobias Grosser, University of Cambridge, UK
Abstract.
|
Emerging ML Paradigms: Neuro-Inspired Computing
Jan Rabaey, UC Berkeley, USA
Lessons from the brain and what they mean for:
– ML hardware,
– Neuromorphic computing,
– Hyperdimensional computing, etc.
|
Emerging ML Paradigms: Towards Cognitive Systems
Jan Rabaey, UC Berkeley, USA
– Autonomous sensor-control-actuation
– Model versus data driven
– Reinforcement learning
– Symbolic reasoning
– Probabilistic learning, graphs
|
Efficient Execution of Approximated AI Algorithms on Heterogeneous Edge AI Systems
David Atienza, EPFL, Switzerland
i. Major challenges in designing energy-efficient edge AI architectures due to the complexity of AI/CNN.
ii. Design options to reduce complexity (pruning, quantization, etc.) and benefits of operating edge AI architectures at sub-nominal conditions.
iii. New architectural design methodologies for edge AI systems, called Embedded Ensemble CNNs (E2CNNs), to conceive pruned CNN and AI implementations with improved robustness against memory errors compared to pruned/quantized single-instance ML/CNNs.
iv. Experimental evaluation of compression methods and design space exploration to produce an ensemble of CNNs for edge AI devices with the same memory requirements as the original architectures but improved error robustness (in different types of memories) for sub-threshold operation.
|
Application-Driven System Design and Optimization Flow of Edge AI Use Cases in Industrial and Medical Domains
David Atienza, EPFL, Switzerland
i. Overview of major key challenges in different industrial case studies for AI/ML systems (computation vs communication and other trade-offs to consider particularly for medical applications in the context of Big Data healthcare).
ii. Different design options for AI/ML hardware systems using centralized vs. federated approaches on edge AI systems.
iii. Mapping options for ULP multi-core embedded systems with neural network accelerators for energy-scalable software layers based on target applications.
iv. Examples of next-generation smart wearable devices in the healthcare context
v. Examples of industrial edge AI systems for home automation.
|
Practical Case Studies: “Energy Efficient ML Applications for Metaverse”
Huichu Liu, Meta, USA
– Overview of AR/VR system features and energy constraints
– Breakdown of the different applications running on AR/VR HW and their related ML algorithms
– HW techniques to enable energy efficient NN execution
– HW-SW techniques to enable efficient NN mapping
– Algorithm techniques for practical applications
– Future applications/challenges/research directions
|
Cross-Layer Optimization
Marian Verhelst, KU Leuven, Belgium
– Need for optimization across the stack: cross-layer design space exploration
– Tool flows for cross-layer optimization
– Final Q&A and adjourn
|