[DynDNNs] Outline: Optimizing Runtime & System Level for Dynamic DNN
Abstract

This outline presents the high-level research plan for accelerating dynamic DNN workloads at the runtime and system software levels.

1. Introduction

Dynamic DNNs—neural networks whose input shapes or structures change at runtime—pose unique challenges for both hardware and system software.
To address these challenges, we will use the Gemmini accelerator as the hardware acceleration target and llama.cpp as the execution platform, integrating and optimizing them for variable-shape inference.
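To make the variable-shape problem concrete, the sketch below models one common symptom: a fixed-size systolic array (such as Gemmini's) processes matrices in fixed tiles, so when an operand dimension grows token by token during autoregressive decoding, the operand must be padded to the next tile multiple and hardware utilization fluctuates. The tile size and the utilization model here are illustrative assumptions, not Gemmini's actual configuration.

```python
# Hedged sketch: why variable-shape inference stresses a fixed-size
# systolic array. TILE is a hypothetical array dimension chosen for
# illustration, not a real Gemmini parameter.

TILE = 16

def padded(dim: int, tile: int = TILE) -> int:
    """Round a matrix dimension up to the next tile multiple."""
    return ((dim + tile - 1) // tile) * tile

def utilization(rows: int, cols: int, tile: int = TILE) -> float:
    """Fraction of useful work when a (rows x cols) matmul operand
    is zero-padded out to tile boundaries."""
    return (rows * cols) / (padded(rows, tile) * padded(cols, tile))

# In autoregressive decoding the KV length grows by one each step,
# so the attention matmul shape changes on every token:
for seq_len in (1, 17, 32):
    print(f"seq_len={seq_len:3d}  utilization={utilization(seq_len, 64):.3f}")
```

Under these assumptions, a sequence length of 1 wastes most of a 16-wide tile, while a length that lands exactly on a tile boundary reaches full utilization; smoothing out this variation is one of the runtime-level optimization targets.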

2. Organization

  1. Gemmini Hardware Analysis (Acceleration Tool) — Analysis of the Gemmini accelerator’s microarchitecture and its key components.

  2. llama.cpp Framework Analysis (Execution Platform) — Review of the llama.cpp (GGML) inference engine’s architecture and inference pipeline.

  3. Research Progress Updates — Summary of ongoing work: porting, profiling, and optimization experiments.

This post is licensed under CC BY 4.0 by the author.