HW Accelerated Machine Learning at the Edge
On-Line Class
April 25 – May 6, 2022
|
Course Abstract
Machine learning workloads are becoming increasingly important for IoT devices and intelligent extreme-edge devices, driven by the desire to move more and more intelligence into the edge. Yet these workloads come with significant computational complexity, which until recently made their execution feasible only on power-hungry server or GPU platforms.
In this course, we will cover these aspects of machine learning at the edge, with a deeper dive into hardware optimization opportunities. Enough time is also foreseen to discuss practical case studies and end-to-end optimizations, along with an introduction to the vibrant tinyML® cross-functional ecosystem.
|
ML Applications, Scenarios and Constraints for the Edge
Marian Verhelst, KU Leuven, Belgium
– Overview of applications
– Cloud vs Edge vs tinyML (extreme edge)
– Inference vs learning vs federated learning
– Application constraints and scenarios
– Flavors of ML and AI (types of models)
|
ML Algorithms and Resulting Challenges
Marian Verhelst, KU Leuven, Belgium
Computational consequences and HW requirements at the edge for:
– Probabilistic models
– Decision trees
– SVMs
– NNs (deep and non-deep; layer types, …)
– NN training
– Hyperdimensional computing
Challenges and requirements for efficient AI at the edge.
|
Neural Network Compression for the Edge
Tijmen Blankevoort, Qualcomm, The Netherlands
– Given any network, how can we make it structurally smaller?
– Neural Network Pruning
– Structured Compression
– Neural Architecture Search as a compression method.
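As a concrete illustration of the pruning topic listed above, here is a minimal magnitude-based weight-pruning sketch in NumPy. The percentile-threshold criterion and the tensor shapes are illustrative assumptions, not the specific methods taught in this lecture:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    # Threshold chosen so that roughly `sparsity` of entries fall below it.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

# Toy usage: prune 90% of a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w.size:.2f}")
```

Structured compression would instead remove whole rows, channels, or filters so that the resulting dense tensors shrink, which maps more directly onto edge hardware.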
|
Neural Network Quantization for the Edge
Tijmen Blankevoort, Qualcomm, The Netherlands
– Quantization Introduction and Simulation
– Quantization-aware training
– Post-training quantization techniques
– Mixed Precision.
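To make the quantization-simulation topic concrete, below is a minimal sketch of "fake" uniform affine quantization, the forward-pass simulation used in quantization-aware training. The min/max-based derivation of scale and zero-point is an illustrative assumption (post-training techniques typically choose these ranges more carefully):

```python
import numpy as np

def fake_quantize(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulate uniform affine quantization: map to integers, then map back.

    The returned float tensor carries the quantization error that the
    real integer hardware would introduce.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)      # step size
    zero_point = np.round(qmin - x.min() / scale)    # integer offset for 0.0
    q = np.clip(np.round(x / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

# Toy usage: 8-bit simulation keeps values within one quantization step.
x = np.linspace(-1.0, 1.0, 11)
xq = fake_quantize(x, num_bits=8)
```

Mixed precision then amounts to choosing `num_bits` per layer (or per tensor) based on each layer's sensitivity to this error.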
|
Specializing Processors for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Limitations of classical instruction set architectures (ISAs) for ML
– ISA Extensions for ML
– Micro-architecture of ML-specialized cores
– PPA (power, performance, area) optimization and implementation techniques.
|
From Single to Multi-Core Low-Power SoCs for ML
Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland
– Single-core ML SoCs – architecture, implementation, PPA analysis
– Multi-core ML SoCs – architecture, implementation, PPA analysis
– Integration of cores and Hardwired ML accelerators
– Memory hierarchy: challenges and solutions.
|
Concepts Towards ML Acceleration
Marian Verhelst, KU Leuven, Belgium
– Recap and formalization of ML models / CNNs / DNNs; GeMM
– GeMM on traditional CPU / GPU
– Energy/latency losses and opportunities
– Concepts towards more efficient ML acceleration on single core/single layer
– Parallelization (spatial unrolling optimization)
– Stationarity (temporal unrolling optimization)
– Extending spatial and temporal unrolling to higher levels.
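The parallelization and stationarity concepts above can be sketched, in software, as an output-stationary tiled GeMM loop nest. The tile size and the local accumulator array stand in for spatial unrolling and on-chip buffering on real accelerators; they are illustrative assumptions, not the lecture's specific dataflow:

```python
import numpy as np

def tiled_gemm(A: np.ndarray, B: np.ndarray, tile: int = 8) -> np.ndarray:
    """Output-stationary tiled GeMM: each C tile stays in a local accumulator
    while A and B tiles stream through, maximizing reuse of partial sums."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            # Stationary accumulator: written back to C only once per tile.
            acc = np.zeros((min(tile, M - i0), min(tile, N - j0)))
            for k0 in range(0, K, tile):
                a = A[i0:i0 + tile, k0:k0 + tile]
                b = B[k0:k0 + tile, j0:j0 + tile]
                acc += a @ b  # partial sums accumulate across K tiles
            C[i0:i0 + acc.shape[0], j0:j0 + acc.shape[1]] = acc
    return C
```

Swapping the loop order (and which operand is held in `acc`) yields weight- or input-stationary variants; the arithmetic is identical, but the memory traffic per level of the hierarchy changes.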
|
Exploiting Quantization and Sparsity at the HW Level
Marian Verhelst, KU Leuven, Belgium
– Concepts towards more efficient AI acceleration on single core/single layer (continued)
– Sparse workloads
– Quantization – analog domain
[Optional: – Concepts towards more efficient AI acceleration on multi-core/multi-layer].
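A minimal software model of how hardware can exploit sparse workloads: a CSR-style compressed row stores only nonzero weights, so only those are fetched and multiplied, mirroring zero-skipping MAC datapaths in sparse accelerators (the CSR format choice here is illustrative):

```python
import numpy as np

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product over a CSR-encoded matrix.

    Only nonzero entries are visited: zero operands cost neither a
    memory fetch nor a multiply, which is the energy win sparse
    hardware targets.
    """
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for p in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[p] * x[col_idx[p]]
    return y

# Toy usage: the dense matrix [[0, 2, 0], [1, 0, 3]] in CSR form.
values = np.array([2.0, 1.0, 3.0])
col_idx = np.array([1, 0, 2])
row_ptr = np.array([0, 1, 3])
x = np.array([1.0, 2.0, 3.0])
y = csr_matvec(values, col_idx, row_ptr, x)  # → [4., 10.]
```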
|
Analog/Mixed-Signal Acceleration
Naveen Verma, Princeton University, USA
– Review of key ops to accelerate (MACs) and potential energy savings through analog
– Overview of Approaches for MACs
• Electronic (current, voltage, charge summing)
• Optical
– Overheads and limitations
• Memory accessing -> motivates in-memory computing
• Data conversion
• Technology integration with digital engines/memory
– Fundamental tradeoffs
• Energy/throughput vs SNR
– In-memory computing
• Different memory techs and approaches.
|
Architectural Integration of Emerging Compute Models and Technologies
Naveen Verma, Princeton University, USA
– Dataflow bottlenecks (weight/state loading)
– Structured memory accessing and benefits of emerging memory
– Co-design with trainers.
|
Training Frameworks for ML at the Edge
Vijay Janapa Reddi, Harvard University, USA
– Automatic Dataset Generation
– Few-shot Keyword Spotting
– Multilingual Spoken Words Corpus.
|
Deploying ML Models at the Edge: from CPU to Accelerator
Vijay Janapa Reddi, Harvard University, USA
– TensorFlow Lite Micro
– CFU Playground
– Benchmarking Data- and Model-centric ML
– DataPerf
– MLPerf
|
Tools: Landscape of DL Compilers and Challenges for Inference
Tushar Krishna, Georgia Tech, USA
Prasanth Chatarasi, IBM, USA
– TVM
– Compilers for custom accelerators
– Runtimes and mappers.
|
Mapping and HW Co-optimization
Tushar Krishna, Georgia Tech, USA
Prasanth Chatarasi, IBM, USA
– Mapping/HW optimization using rapid cost models
– HW-aware NAS.
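The "rapid cost model" idea can be illustrated with a toy analytical energy model for comparing candidate mappings. The energy-per-access constants below are made-up placeholders, chosen only to reflect the usual ordering MAC << SRAM access << DRAM access:

```python
def mapping_cost(macs: float, sram_accesses: float, dram_accesses: float,
                 e_mac: float = 1.0, e_sram: float = 6.0,
                 e_dram: float = 200.0) -> float:
    """Energy estimate in arbitrary units: memory accesses, above all
    DRAM, dominate the MAC operations themselves."""
    return macs * e_mac + sram_accesses * e_sram + dram_accesses * e_dram

# Compare two hypothetical mappings of the same layer (same MAC count):
# tiling trades a few extra SRAM accesses for a 10x cut in DRAM traffic.
baseline = mapping_cost(macs=1e6, sram_accesses=1.0e6, dram_accesses=2e5)
tiled    = mapping_cost(macs=1e6, sram_accesses=1.2e6, dram_accesses=2e4)
```

Because such models evaluate in microseconds, a mapper (or an HW-aware NAS loop) can sweep thousands of tiling and loop-order candidates before committing to slow RTL simulation.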
|
Efficient Execution of Approximated AI Algorithms on Heterogeneous Edge AI Systems
David Atienza, EPFL, Switzerland
i. Major challenges in designing energy-efficient edge AI architectures due to the complexity of AI/CNN.
ii. Design options to reduce complexity (pruning, quantization, etc.) and benefits of operating edge AI architectures at sub-nominal conditions.
iii. New architectural design methodologies for edge AI systems, called Embedded Ensemble CNNs (E2CNNs), which produce pruned CNN and AI implementations with improved robustness against memory errors compared to pruned/quantized single-instance ML/CNNs.
iv. Experimental evaluation of compression methods and design space exploration to produce an ensemble of CNNs for edge AI devices with the same memory requirements as the original architectures but improved error robustness (in different types of memories) for sub-threshold operation.
|
Application-Driven System Design and Optimization Flow of Edge AI Use Cases in Industrial and Medical Domains
David Atienza, EPFL, Switzerland
i. Overview of major key challenges in different industrial case studies for AI/ML systems (computation vs communication and other trade-offs to consider particularly for medical applications in the context of Big Data healthcare).
ii. Different design options for AI/ML hardware systems using centralized vs. federated approaches on edge AI systems.
iii. Mapping options for ULP multi-core embedded systems with neural network accelerators for energy-scalable software layers based on target applications.
iv. Examples of the next generation of smart wearable devices in the healthcare context.
v. Examples of industrial edge AI systems for home automation.
|
Emerging ML Paradigms: Neuro-Inspired Computing
Jan Rabaey, UC Berkeley, USA
Lessons from the brain and what they mean for:
– ML hardware,
– Neuromorphic computing,
– Hyper-dimensional computing, etc.
|
Emerging ML Paradigms: Towards Cognitive Systems
Jan Rabaey, UC Berkeley, USA
– Autonomous sensor-control-actuation
– Model versus data driven
– Reinforcement learning
– Symbolic reasoning
– Probabilistic learning, graphs
|
Practical Case Studies: “Energy Efficient ML Applications for Metaverse”
Huichu Liu, Facebook, USA
– Overview of AR/VR system features and energy constraints
– Breakdown of the different applications running on AR/VR HW and their related ML algorithms
– HW techniques to enable energy efficient NN execution
– HW-SW techniques to enable efficient NN mapping
– Algorithm techniques for practical applications
– Future applications/challenges/research directions
|