
Draft: [issue 24] Refactoring kernels

3 unresolved threads

Context

To optimize the ONNX operators for the ARM embedded target, their kernels need to be refactored. As tracked in the related issue, the following kernels have been reworked:

  • Add
  • Atan
  • BatchNorm
  • Concat
  • Div
  • MatMul
  • Mul
  • ReLU
  • Reshape
  • Sigmoid
  • Slice
  • Softmax
  • Sub

Modifications

  1. For each kernel listed above, the associated aidge_kernel.h file has been created. These files use C++ template features and, where needed, factorize conditional loops to lighten the MCU's workload.

  2. The results of all new implementations have been compared and validated against Python/NumPy and the legacy kernels as references.

  3. A standalone benchmark comparing the execution times of the legacy and refactored kernels has been run on an STM32H7 Cortex-M7 DISCO board.
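As an illustration of the template approach described in point 1, a minimal sketch of an elementwise Add kernel could look like the following. This is a hedged example with hypothetical names; the actual signatures in the MR's aidge_kernel.h files may differ.

```cpp
#include <cstddef>

// Sketch of a refactored elementwise Add kernel: the element count is a
// template parameter, so the compiler can fully unroll or vectorize the
// loop for the MCU target instead of evaluating a runtime bound.
template <typename T, std::size_t NB_ELTS>
__attribute__((always_inline)) inline static
void aidge_add(const T* __restrict input_a,
               const T* __restrict input_b,
               T* __restrict output)
{
    for (std::size_t i = 0; i < NB_ELTS; ++i) {
        output[i] = input_a[i] + input_b[i];
    }
}
```

With `NB_ELTS` known at compile time, no size argument is passed at runtime and the loop bound is a constant, which is what makes the "factorised conditional loop" style cheap on a Cortex-M7.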

TODO

It could be pertinent to add a Python script, integrated into this Add Maxence benchmark ticket url, to make these benchmarks more scalable and faster to run if more kernels need to be reworked in the future.

Merge request reports

Merge request pipeline #67334 failed for bb81ff6e

Closed by Cyril Moineau 3 months ago (Jun 3, 2025 12:03pm UTC)


Activity

template <typename T, typename Dim_T, typename Size_T>
__attribute__((always_inline)) inline static
void aidge_matmul(T* __restrict input_a,
                  T* __restrict input_b,
                  T* __restrict output,
                  Dim_T* __restrict dim_a,
  • All dimension parameters should be template parameters, so that all dimension-computation logic is optimized away at compile time. That is the spirit of the C++ export. Same for the other operators.
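The reviewer's suggestion could look like the following sketch (hypothetical signature, not the MR's actual code): the matrix dimensions M, K, and N become non-type template parameters instead of runtime pointer arguments, so all index arithmetic is resolved at compile time.

```cpp
#include <cstddef>

// Sketch of aidge_matmul with dimensions as compile-time template
// parameters, as suggested in the review. input_a is M x K, input_b is
// K x N, output is M x N; every loop bound and stride is a constant.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
__attribute__((always_inline)) inline static
void aidge_matmul(const T* __restrict input_a,
                  const T* __restrict input_b,
                  T* __restrict output)
{
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            T acc = T(0);
            for (std::size_t k = 0; k < K; ++k) {
                acc += input_a[m * K + k] * input_b[k * N + n];
            }
            output[m * N + n] = acc;
        }
    }
}
```

Since the export generates one instantiation per node, the `dim_a`/`dim_b` arrays of the current signature would no longer need to exist at runtime.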

template <typename T, typename MeanVar_T, typename ScaleBias_T,
          typename SpatialDims_T,
          unsigned int NB_Channels, unsigned int NB_SpatialDims>
__attribute__((always_inline)) inline static
void aidge_batchnorm(T* __restrict inputs,
                     T* __restrict outputs,
                     MeanVar_T* __restrict input_mean,
                     MeanVar_T* __restrict input_var,
                     ScaleBias_T* __restrict scale,
                     ScaleBias_T* __restrict bias,
                     SpatialDims_T* __restrict spatial_dims,
#include <math.h>

/* TODO: is 128 enough or too big? Another possibility is to pass a shared
   buffer as a parameter, but this could have a side effect on Aidge's
   overall mechanics. */
#define MAX_DIMS_AXIS_SIZE 128
float exps[MAX_DIMS_AXIS_SIZE];

template <typename T, typename Dim_T, typename Size_T>
__attribute__((always_inline)) inline static
void aidge_softmax(T* __restrict input,
                   T* __restrict output,
                   Dim_T* __restrict dims,
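For reference, a complete softmax over one axis in this style could be sketched as follows. This is a hedged illustration, not the MR code: it sidesteps the `MAX_DIMS_AXIS_SIZE` global by making the axis size a template parameter, so the scratch buffer can live on the stack with exactly the needed size.

```cpp
#include <cmath>
#include <cstddef>

// Sketch of a 1-D softmax along an axis of compile-time length
// AXIS_SIZE (hypothetical signature; the MR's aidge_softmax handles
// arbitrary dims via runtime arguments instead).
template <typename T, std::size_t AXIS_SIZE>
__attribute__((always_inline)) inline static
void aidge_softmax(const T* __restrict input, T* __restrict output)
{
    // Subtract the max before exponentiating, for numerical stability.
    T max_val = input[0];
    for (std::size_t i = 1; i < AXIS_SIZE; ++i) {
        if (input[i] > max_val) max_val = input[i];
    }
    T exps[AXIS_SIZE];  // stack scratch buffer, no global needed
    T sum = T(0);
    for (std::size_t i = 0; i < AXIS_SIZE; ++i) {
        exps[i] = std::exp(input[i] - max_val);
        sum += exps[i];
    }
    for (std::size_t i = 0; i < AXIS_SIZE; ++i) {
        output[i] = exps[i] / sum;
    }
}
```

Trading the shared global buffer for a per-instantiation stack array avoids the side-effect concern raised in the TODO, at the cost of stack usage proportional to the axis size.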