Enabling Embedded Neural Network Processing

    On-Line Class
    CET – Central European Time Zone


    February 3-7, 2025

    Registration deadline: Extended to January 27, 2025
    Payment deadline: Extended to January 30, 2025

    registration

    TEACHING HOURS

    DAILY      Central European Time (CET)   Eastern Standard Time (EST)   Pacific Standard Time (PST)   India Standard Time (IST)
    Module 1   3:00-4:30 pm                  9:00-10:30 am                 6:00-7:30 am                  7:30-9:00 pm
    Module 2   5:00-6:30 pm                  11:00 am-12:30 pm             8:00-9:30 am                  9:30-11:00 pm

     

    Monday, February 3

    3:00-6:30 pm Neural Network Introduction and Model Techniques Tijmen Blankevoort, Meta

    Tuesday, February 4

    3:00-6:30 pm Custom Hardware Accelerators and Scheduling Techniques Marian Verhelst, KU Leuven & IMEC

    Wednesday, February 5

    3:00-6:30 pm RISC-V and Multi-Core Architectures Luca Benini, ETHZ/Uni Bologna

    Thursday, February 6

    3:00-6:30 pm Compiler Implications Tobias Grosser, University of Cambridge

    Friday, February 7

    3:00-6:30 pm System Integration and Applications David Atienza, EPFL


    Abstracts

    While neural networks are already omnipresent in cloud scenarios, there has recently been a steep rise in the deployment of inference tasks on edge and extreme-edge devices, such as cars, drones, phones, glasses and wearable medical devices. Such decentralized deployment brings advantages in terms of privacy, response time and reliability, but it comes with significant technical challenges. The stringent latency requirements, scarce memory budget and limited energy availability of edge systems demand a thorough optimization of hardware and software across the full deployment stack. This intensive course will dive deeply into the different optimization strategies across the stack, ranging from algorithmic techniques, through custom hardware architectures, to compiler implications and application-specific system optimizations. Each topic will be covered by a different expert in the field, building on recent state-of-the-art research.

    Neural Network Introduction and Model Techniques
    Tijmen Blankevoort, Meta

    Abstract.

    Custom Hardware Accelerators and Scheduling Techniques
    Marian Verhelst, KU Leuven & IMEC

    Neural networks cannot be executed efficiently on a general-purpose CPU or microprocessor. Over the last decade, a myriad of optimized hardware architectures has therefore been proposed to execute these workloads at high throughput and energy efficiency in customized accelerators or GPU extensions. While the field is very diverse, we will see that all implementations rely on a few common architectural concepts and scheduling techniques, including spatial/temporal unrolling and fusion. We will discuss these techniques in depth and illustrate them with many state-of-the-art (SotA) examples from recent literature. Finally, we will discuss how to model these concepts at a high level to enable rapid design space exploration across architectures.
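
    As a rough illustration of spatial versus temporal unrolling, the sketch below splits the loop nest of a matrix multiplication into outer loops that run one step after another (temporal) and inner loops that a hardware accelerator would map onto parallel processing elements (spatial). The function name, the 2x2 array size and the output-stationary loop order are assumptions chosen for this example; they are not taken from the lecture material.

```python
def matmul_unrolled(a, b, pe_rows=2, pe_cols=2):
    """Multiply a (M x K) by b (K x N), both given as lists of lists.

    pe_rows x pe_cols is the size of a hypothetical processing-element
    (PE) array; each PE owns one output element of the current tile.
    """
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]

    # Temporal loops: executed sequentially, one step after another.
    for i0 in range(0, m, pe_rows):          # tile of output rows
        for j0 in range(0, n, pe_cols):      # tile of output columns
            for kk in range(k):              # reduction dimension
                # Spatial loops: in hardware, every (pi, pj) pair would be
                # a separate PE working in the same cycle. Here they are
                # simply simulated one after another.
                for pi in range(pe_rows):
                    for pj in range(pe_cols):
                        i, j = i0 + pi, j0 + pj
                        if i < m and j < n:  # skip PEs outside the matrix
                            out[i][j] += a[i][kk] * b[kk][j]
    return out


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0], [0, 1], [1, 1]]
    print(matmul_unrolled(a, b))
```

    Changing which loops are treated as spatial and which as temporal, and in which order the temporal loops run, yields different schedules for the same computation.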

    RISC-V and Multi-Core Architectures
    Luca Benini, ETHZ/Uni Bologna

    This lecture will cover low-power instruction processors for NN workloads, with a focus on energy efficiency. The open RISC-V instruction set architecture (ISA) will be used as the baseline for processor design and extensions. Several key ideas in extending the ISA to improve NN execution efficiency will be covered in detail, moving from general techniques, such as hardware loops and complex addressing modes, to increasingly domain-specific improvements, such as mixed-precision SIMD and ternary operations. Vector and tensor instruction extensions will also be discussed. The implications of ISA extensions for micro-architecture and hardware implementation will be discussed in depth, with examples from several silicon prototypes and products. Techniques to boost performance at high energy efficiency through parallel execution in tightly coupled processor clusters will also be covered, stressing the importance of efficient access to shared memory and of synchronization, and describing advanced hardware and software design techniques to minimize efficiency losses in parallel architectures.
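
    To give a feel for what a packed SIMD extension does, the sketch below emulates an 8-bit dot-product-accumulate on four lanes packed into one 32-bit word. The helper names (pack_int8x4, unpack_int8x4, sdot4) are hypothetical and do not correspond to the mnemonics or exact semantics of any particular RISC-V extension; a mixed-precision variant would use different lane widths for the two operands.

```python
def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    assert len(values) == 4 and all(-128 <= v <= 127 for v in values)
    word = 0
    for lane, v in enumerate(values):
        word |= (v & 0xFF) << (8 * lane)   # two's-complement byte per lane
    return word


def unpack_int8x4(word):
    """Recover the four signed 8-bit lanes from a 32-bit word."""
    lanes = []
    for lane in range(4):
        byte = (word >> (8 * lane)) & 0xFF
        lanes.append(byte - 256 if byte >= 128 else byte)  # sign-extend
    return lanes


def sdot4(acc, word_a, word_b):
    """Hypothetical dot-product-accumulate: acc + sum of the four lane products.

    A single instruction of this kind replaces four multiplies and four
    adds on scalar registers.
    """
    lanes_a = unpack_int8x4(word_a)
    lanes_b = unpack_int8x4(word_b)
    return acc + sum(x * y for x, y in zip(lanes_a, lanes_b))


if __name__ == "__main__":
    wa = pack_int8x4([1, -2, 3, -4])
    wb = pack_int8x4([5, 6, -7, 8])
    print(hex(wa), sdot4(10, wa, wb))
```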

    Compiler Implications
    Tobias Grosser, University of Cambridge

    Abstract.

    System Integration and Applications
    David Atienza, EPFL

    Designing energy-efficient edge AI architectures poses major challenges because of the complexity of today's AI/CNN methods. As a result, a new generation of design flows, such as Ensemble CNNs (E2CNNs), aims to reduce the complexity of traditional approaches to building smaller edge AI systems (pruning, quantization, etc.) while benefiting from AI hardware operating at sub-nominal conditions. This module will present E2CNNs as a way to design ultra-low-power (ULP) and resource-efficient edge AI systems targeting real-life applications. These optimized edge AI systems will have the same memory requirements as the original AI/ML designs but improved error robustness (in different types of memories) for sub-threshold operation. Finally, this module will discuss how such E2CNN-based edge AI systems can be enhanced by including different neural network accelerators for energy-scalable software execution according to the requirements of the target domain. In particular, this module will present different real-life industrial edge AI systems in the areas of smart wearables and home automation.
