Hardware-Efficient Edge AI

    On-Line Class
    CET – Central European Time Zone

    Download One-Page Schedule Here

    Week 1: March 12-15, 2024

    Week 2: March 18-22, 2024

    Registration deadline: February 23, 2024
    Payment deadline: March 1, 2024


    TEACHING HOURS

    DAILY      CET (Central European)    EST (Eastern)        PST (Pacific)    IST (India)
    Module 1   3:00-4:30 pm              9:00-10:30 am        6:00-7:30 am     7:30-9:00 pm
    Module 2   5:00-6:30 pm              11:00 am-12:30 pm    8:00-9:30 am     9:30-11:00 pm

    WEEK 1: March 12-15

    Tuesday, March 12

    3:00-4:30 pm Context: ML Applications, Scenarios and Constraints for the Edge Marian Verhelst, KU Leuven
    5:00-6:30 pm Context: ML Algorithms and Resulting Challenges Marian Verhelst, KU Leuven

    Wednesday, March 13

    3:00-4:30 pm Algorithms: Neural Network Compression for the Edge Tijmen Blankevoort, Meta
    5:00-6:30 pm Algorithms: Neural Network Quantization for the Edge Tijmen Blankevoort, Meta

    Thursday, March 14

    3:00-4:30 pm HW, CPU: Specializing Processors for ML Luca Benini, Uni Bologna/ETHZ
    5:00-6:30 pm HW, CPU: From Single to Multi-Core Low-Power SoCs for ML Luca Benini, Uni Bologna/ETHZ

    Friday, March 15

    3:00-4:30 pm HW, Digital: Concepts Towards ML Acceleration Marian Verhelst, KU Leuven
    5:00-6:30 pm HW, Digital: Exploiting Quantization and Sparsity at the HW Level Marian Verhelst, KU Leuven

    WEEK 2: March 18-22

    Monday, March 18

    3:00-4:30 pm HW, Analog: Analog/Mixed-Signal Acceleration Naveen Verma, Princeton
    5:00-6:30 pm HW, Tech: Architectural Integration of Emerging Compute Models and Technologies Naveen Verma, Princeton

    Tuesday, March 19

    3:00-4:30 pm Tools: Accelerator Code Generation Tobias Grosser, University of Cambridge
    5:00-6:30 pm Tools: Landscape of DL Compilers Tobias Grosser, University of Cambridge

    Wednesday, March 20

    3:00-4:30 pm Emerging ML Paradigms: Neuro-Inspired Computing Jan Rabaey, UC Berkeley
    5:00-6:30 pm Emerging ML Paradigms: Towards Cognitive Systems Jan Rabaey, UC Berkeley

    Thursday, March 21

    3:00-4:30 pm System: Efficient Execution of Approximated AI Algorithms on Heterogeneous Edge AI Systems David Atienza, EPFL
    5:00-6:30 pm Use Cases: Application-Driven System Design and Optimization Flow of Edge AI Use Cases in Industrial and Medical Domains David Atienza, EPFL

    Friday, March 22

    3:00-4:30 pm Practical Use Cases: Energy Efficient ML Applications for Metaverse Huichu Liu, Meta
    5:00-6:30 pm Cross-Layer Optimization Marian Verhelst, KU Leuven



    Abstracts


    Hardware-Efficient Edge AI
    On-Line Class
    March 12-22, 2024

    Course Abstract

    Artificial intelligence workloads are becoming increasingly important for intelligent edge and extreme-edge devices, a trend also known as “tinyML”. Yet these workloads come with significant computational complexity, which until recently made their execution feasible only on power-hungry server or GPU platforms. A recent surge of techniques in the algorithmic, hardware-architecture, and circuit domains is now enabling ML workloads on real-time embedded devices at the edge and extreme edge under tight energy and latency budgets. Such system optimization, however, requires thorough knowledge and cross-layer optimization across all of these fields, from application and algorithm, over compilers and schedulers, to system design, macro-architecture design, and circuit design. In this course, we cover all these aspects of machine learning at the edge, with a deeper dive into hardware optimization opportunities. Ample time is also reserved to discuss practical case studies and end-to-end optimizations.

    ML Applications, Scenarios and Constraints for the Edge
    Marian Verhelst, KU Leuven, Belgium

    – Overview of applications
    – Cloud vs Edge vs tinyML (extreme edge)
    – Inference vs learning vs federated learning
    – Application constraints and scenarios
    – Flavors of ML and AI (types of models)

    ML Algorithms and Resulting Challenges
    Marian Verhelst, KU Leuven, Belgium

    Computational consequences and HW requirements at the edge for:
    – Probabilistic models
    – Decision trees
    – SVMs
    – NNs (deep and non-deep; layer types, …)
    – NN training
    – Hyperdimensional computing
    Challenges and requirements for efficient AI at the edge.

    Neural Network Compression for the Edge
    Tijmen Blankevoort, Meta, The Netherlands

    – Take any network, how can we make it smaller structurally?
    – Neural Network Pruning
    – Structured Compression
    – Neural Architecture Search as a compression method.
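
    To make the pruning bullet concrete, a minimal magnitude-pruning sketch in NumPy; the layer shape, sparsity target, and random weights are illustrative assumptions, not course material:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until (at least) a
    fraction `sparsity` of the entries is exactly zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold = magnitude of the k-th smallest |w|
    flat = np.abs(weights).ravel()
    threshold = np.partition(flat, k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Toy example: prune 75% of a small random weight matrix
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8))
w_pruned = magnitude_prune(w, 0.75)
print(float(np.mean(w_pruned == 0.0)))  # fraction of zeros, >= 0.75
```

    Structured variants prune whole channels or blocks rather than individual weights, which maps much better onto real hardware than this unstructured form.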

    Neural Network Quantization for the Edge
    Tijmen Blankevoort, Meta, The Netherlands

    – Quantization Introduction and Simulation
    – Quantization-aware training
    – Post-training quantization techniques
    – Mixed Precision.
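
    As a flavor of the quantization-simulation topic above, a sketch of symmetric uniform “fake quantization” with a single per-tensor scale; the 8-bit setting and random tensor are illustrative assumptions:

```python
import numpy as np

def fake_quant(x: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Simulate symmetric uniform quantization: snap each value to a
    signed integer grid, then map back to floats ('fake quantization',
    as used when simulating quantized inference during training)."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    scale = float(np.abs(x).max()) / qmax   # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(1)
x = rng.standard_normal(1000).astype(np.float32)
x_hat = fake_quant(x, num_bits=8)
print(float(np.abs(x - x_hat).max()))  # rounding error bounded by scale / 2
```

    Quantization-aware training back-propagates through this rounding with a straight-through estimator; post-training techniques instead pick scales (and corrections) from a small calibration set.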

    Specializing Processors for ML
    Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland

    – Classical instruction set architectures (ISAs) limitations for ML
    – ISA Extensions for ML
    – Micro-architecture of ML-specialized cores
    – PPA (power performance area) optimization and implementation techniques.

    From Single to Multi-Core Low-Power SoCs for ML
    Luca Benini, Università di Bologna, Italy/ETHZ, Switzerland

    – Single-core ML SoCs – architecture, implementation, PPA analysis
    – Multi-core ML SoCs – architecture, implementation, PPA analysis
    – Integration of cores and Hardwired ML accelerators
    – Memory hierarchy: challenges and solutions.

    Concepts Towards ML Acceleration
    Marian Verhelst, KU Leuven, Belgium

    – ML models / CNN / DNN recap formalization; GeMM
    – GeMM on traditional CPU / GPU
    – Energy/latency losses and opportunities
    – Concepts towards more efficient ML acceleration on single core/single layer
    – Parallelization (spatial unrolling optimization)
    – Stationarity (temporal unrolling optimization)
    – Extending spatial and temporal unrolling to higher levels.
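
    The stationarity idea above can be illustrated with a toy output-stationary GeMM loop nest: each output accumulator stays local while the operands stream past. A pure-Python sketch with illustrative shapes:

```python
import numpy as np

def gemm_output_stationary(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Toy output-stationary GeMM: each C[m][n] accumulator is kept
    'stationary' in a local register while A-rows and B-columns stream by."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            acc = 0.0                 # local accumulator (stationary operand)
            for k in range(K):        # inputs and weights stream past
                acc += A[m, k] * B[k, n]
            C[m, n] = acc             # single writeback per output
    return C

A = np.arange(6, dtype=float).reshape(2, 3)
B = np.arange(12, dtype=float).reshape(3, 4)
print(gemm_output_stationary(A, B))   # matches A @ B
```

    A weight- or input-stationary dataflow reorders the same three loops to keep a different operand local, trading output-writeback traffic against operand-fetch traffic.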

    Exploiting Quantization and Sparsity at the HW Level
    Marian Verhelst, KU Leuven, Belgium

    – Concepts towards more efficient AI acceleration on single core/single layer (continued)
    – Sparse workloads
    – Quantization – analog domain
    [Optional: – Concepts towards more efficient AI acceleration on multi-core/multi-layer]. 
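
    One way to see the hardware opportunity in sparse workloads is a compressed (CSR-style) matrix-vector product that spends storage and MACs only on non-zeros; the matrix below is an illustrative toy, not a specific accelerator's format:

```python
import numpy as np

def to_csr(dense):
    """Compress a dense matrix into CSR arrays (values, column indices,
    row pointers) so that zeros cost neither storage nor compute."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product: MACs happen only on stored non-zeros."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        for i in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[i] * x[col_idx[i]]
    return y

W = np.array([[0., 2., 0., 0.],
              [1., 0., 0., 3.],
              [0., 0., 0., 0.]])
vals, cols, ptrs = to_csr(W)
x = np.array([1., 1., 1., 1.])
print(csr_matvec(vals, cols, ptrs, x))  # equals W @ x -> [2. 4. 0.]
```

    Hardware support replaces the index arithmetic above with dedicated decoders or zero-gating, which is where the bullet on sparse workloads picks up.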

    Analog/Mixed-Signal Acceleration
    Naveen Verma, Princeton University, USA

    – Review of key ops to accelerate (MACs) and potential energy savings through analog
    – Overview of Approaches for MACs
    • Electronic (current, voltage, charge summing)
    • Optical
    – Overheads and limitations
    • Memory accessing -> motivates in-memory computing
    • Data conversion
    • Technology integration with digital engines/memory
    – Fundamental tradeoffs
    • Energy/throughput vs SNR
    – In-memory computing
    • Different memory techs and approaches.

    Architectural Integration of Emerging Compute Models and Technologies
    Naveen Verma, Princeton University, USA

    – Dataflow bottlenecks (weight/state loading)
    – Structured memory accessing and benefits of emerging memory
    – Co-design with trainers.

    Tools: Accelerator Code Generation
    Tobias Grosser, University of Cambridge, UK

    Abstract.

    Tools: Landscape of DL Compilers
    Tobias Grosser, University of Cambridge, UK

    Abstract.

    Emerging ML Paradigms: Neuro-Inspired Computing
    Jan Rabaey, UC Berkeley, USA

    Lessons from the brain and what it means for:
    – ML hardware,
    – Neuromorphic,
    – Hyper-dimensional, etc.

    Emerging ML Paradigms: Towards Cognitive Systems
    Jan Rabaey, UC Berkeley, USA

    – Autonomous sensor-control-actuation
    – Model versus data driven
    – Reinforcement learning
    – Symbolic reasoning
    – Probabilistic learning, graphs

    Efficient Execution of Approximated AI Algorithms on Heterogeneous Edge AI Systems
    David Atienza, EPFL, Switzerland

    i. Major challenges in designing energy-efficient edge AI architectures due to the complexity of AI/CNN.
    ii. Design options to reduce complexity (pruning, quantization, etc.) and benefits of operating edge AI architectures at sub-nominal conditions.
    iii. New architectural design methodologies for edge AI systems, called Embedded Ensemble CNNs (E2CNNs), to conceive ensembles of pruned CNNs and AI implementations with improved robustness against memory errors relative to pruned/quantized single-instance ML/CNNs.
    iv. Experimental evaluation of compression methods and design space exploration to produce an ensemble of CNNs for edge AI devices with the same memory requirements as the original architectures but improved error robustness (in different types of memories) for sub-threshold operation.

    Application-Driven System Design and Optimization Flow of Edge AI Use Cases in Industrial and Medical Domains
    David Atienza, EPFL, Switzerland

    i. Overview of major key challenges in different industrial case studies for AI/ML systems (computation vs communication and other trade-offs to consider particularly for medical applications in the context of Big Data healthcare).
    ii. Different design options for AI/ML hardware systems using centralized vs. federated approaches on edge AI systems.
    iii. Mapping options for ULP multi-core embedded systems with neural network accelerators for energy-scalable software layers based on target applications.
    iv. Examples of next-generation of smart wearable devices in the healthcare context
    v. Examples of industrial edge AI systems for home automation.

    Practical Case Studies: “Energy Efficient ML Applications for Metaverse”
    Huichu Liu, Meta, USA

    – Overview of AR/VR system features and energy constraints
    – Breakdown of the different applications running on AR/VR HW and their related ML algorithms
    – HW techniques to enable energy efficient NN execution
    – HW-SW techniques to enable efficient NN mapping
    – Algorithm techniques for practical applications
    – Future applications/challenges/research directions

    Cross-Layer Optimization
    Marian Verhelst, KU Leuven, Belgium

    – Need for optimization across the stack: cross-layer design space exploration
    – Tool flows for cross-layer optimization
    – Final Q&A and adjourn



