
Add flash support and updated benchmark.py and pyproject.toml

Context

- Edit the 'flash' directory to handle firmware flashing and UART output capture.

- Edit benchmark.py: add compute_output.

- Update pyproject.toml to include pyocd and pyserial as dependencies for flashing support, and add the JSON files to the pip package.

Modified files

  • benchmark.py
  • flash (new folder)
  • pyproject.toml

Detailed major modifications

With these modifications, we now have a functional benchmark.py with a working compute_output function: running benchmark.py → retrieving inputs → compilation → creation of the benchmark_export_arm folder → retrieving the .elf file → flashing the board → collecting data → processing data → comparison.

I ran the test for the Relu operator.

TODO

  • Add measure_inference_time to benchmark.py

  • Test with another operator

  • NOT DONE
  • DONE
  • TO DO

Merge request reports

Merge request pipeline #72420 passed

Merge request pipeline passed for 9f2873a2

Test coverage 0.00% (0.00%) from 1 job

Merged by Cyril Moineau 5 months ago (May 8, 2025 7:47pm UTC)

Merge details

  • Changes merged into dev with 8d2cd9a5 (commits were squashed).
  • Deleted the source branch.

Pipeline #72421 passed

Pipeline passed for 8d2cd9a5 on dev

Test coverage 0.00% (0.00%) from 1 job

Activity

  • Hello @cmoineau, @wboussella this is a first advanced version of the benchmark. So far, I’ve successfully run an end-to-end test and obtained results with the ReLU operator. However, I’m encountering compilation issues when working with larger tensor sizes — for example, the tests pass for dims size 1 and 4, but starting from dims size 16, problems occur (as shown in the attached files). I plan to address these issues later, as my current focus is to finalize the work properly by integrating the measure_inference_time function.

    image.png

  • Cyril Moineau requested review from @cmoineau


  • It seems that the DTCMRAM has overflowed. I think there are three reasons:

    • First, the input generated by benchmark.py from the JSON configuration file, with dims [16,16,16,16], is far too big for the STM32H7.
    • Second, even if the inputs fit on the STM32, the .nn_data section is not mapped to read-only FLASH memory in flash.ld.
    • Third, the mem array in forward.cpp also needs to be placed in a larger memory region such as RAM_D1 (512 KB) instead of the default DTCMRAM (128 KB).
    • The DTCMRAM overflow was caused by the size of the static array generated in forward.cpp "static float mem[65536];". This array was automatically placed in the .bss section.

      I explored two approaches to resolve this issue:

      • Automatic placement in FLASH using const :

        I modified the .jinja template to add the const qualifier to the array. This way, the linker automatically places the variable in FLASH. However, this solution is not ideal, since the array, although declared as const, is later cast and modified during execution. This goes against best practices and may lead to undefined behavior.

      • Using a dedicated memory section:

        The second, cleaner solution uses the mechanism already provided by the framework: I added the mem_section attribute to the ExportLibAidgeARM class to explicitly define the memory section during export (scheduler_export).

      For my tests, I used the custom section .nn_buffer_d1, which redirects buffers to RAM_D1 (512 KB available).

      The second approach is preferable as it aligns with the intended architecture of the code and provides greater flexibility for memory allocation management. However, we should clearly define which memory section should be used for this type of variable in future developments.

      Edited by Racim Boumbar
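The sizes quoted in this thread can be checked with a few lines of Python. This is a minimal sketch and not part of the MR: the region table only restates the capacities mentioned above (DTCMRAM 128 KB, RAM_D1 512 KB) rather than reading them from flash.ld, it assumes float32 data, and `buffer_bytes` and `fits` are hypothetical helper names. It also ignores anything else the linker places in the same region.

```python
from math import prod

# Region capacities as quoted in the discussion (assumed, not read from flash.ld).
REGIONS_BYTES = {
    "DTCMRAM": 128 * 1024,
    "RAM_D1": 512 * 1024,
}


def buffer_bytes(dims, itemsize=4):
    """Size in bytes of a dense buffer with the given dims (float32 by default)."""
    return prod(dims) * itemsize


def fits(dims, region, itemsize=4):
    """True if one buffer of this shape fits in the named region on its own."""
    return buffer_bytes(dims, itemsize) <= REGIONS_BYTES[region]


if __name__ == "__main__":
    dims = [16, 16, 16, 16]        # 65536 elements, same count as mem[65536]
    print(buffer_bytes(dims))      # 262144 bytes, i.e. 256 KB
    print(fits(dims, "DTCMRAM"))   # False
    print(fits(dims, "RAM_D1"))    # True
```

A check like this could run before compilation so that an oversized configuration is reported up front instead of surfacing as a linker overflow.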
  • FYI @gallasko @cguillon : operator benchmarking on device

  • Racim Boumbar added 1 commit


    • 07a03a59 - Improve Cortexm Benchmark ( inference_time, flash an capture uart )

    Compare with previous version

  • Cyril Moineau
  • Overall the MR looks good!

    Changes are required:

    • Adapt the documentation to Sphinx style.
    • Use type hints for functions.
    • Docstrings are often not informative. It is good that you added them, but a docstring needs to describe what the function does beyond what can be guessed from its name and parameters: how the different cases are handled, with more depth on each point. Example: "do x using a config file". Reading the function, we already know that one parameter is config_file, but what is its extension, and what should the file contain? All of this information is currently missing, so the current docstrings are as good as nonexistent.
    • Remove print calls used for logging and use the aidge logging functions instead. Prints are fine, however, when the goal of the function is to print (for example in the benchmark scripts).
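As an illustration of the docstring and type hint requests above, here is a minimal sketch of a Sphinx-style docstring. The function `load_benchmark_config` is hypothetical and does not come from the MR; the point is only the level of detail (file format, error cases, return type).

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_benchmark_config(config_file: Path) -> Dict[str, Any]:
    """Load a benchmark configuration from a JSON file.

    The file must hold a single JSON object at top level. This sketch does
    not validate individual keys.

    :param config_file: Path to a UTF-8 encoded ``.json`` file.
    :type config_file: pathlib.Path
    :raises FileNotFoundError: If ``config_file`` does not exist.
    :raises ValueError: If the content is not valid JSON, or if the
        top-level value is not an object.
    :return: The parsed configuration.
    :rtype: Dict[str, Any]
    """
    with open(config_file, "r", encoding="utf-8") as f:
        try:
            data = json.load(f)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{config_file} is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"{config_file} must contain a JSON object at top level")
    return data
```

The docstring states what the file must look like and which errors the caller can expect, which cannot be inferred from the signature alone.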
  • Cyril Moineau requested changes


  • Racim Boumbar added 1 commit


    Compare with previous version

  • I have implemented all the changes mentioned in the MR.

  • Racim Boumbar requested review from @cmoineau


  • Cyril Moineau
  • Racim Boumbar added 1 commit


    Compare with previous version

  • Racim Boumbar added 1 commit


    Compare with previous version

  • Racim Boumbar requested review from @cmoineau


  • @rboumbar I will test your work on an STM32 tomorrow before accepting this MR!

  • Racim Boumbar added 1 commit


    Compare with previous version

  • When trying to run pyocd on the server, I ran into this error:

    $ pyocd list
    No available debug probes are connected

    However, when checking the connected USB devices with lsusb, I could see the STM32.

    I found the solution here: https://www.tombroughton.me/2020/02/using-the-stm32-st-link-programmer-updating-the-firmware

    To fix this, I created a udev rule in `/etc/udev/rules.d/50-st-link.rules`:

    SUBSYSTEM=="usb", ATTR{idVendor}=="0483", ATTR{idProduct}=="374b", MODE="0666"

    Then I reloaded udev using the commands:

    sudo udevadm control --reload-rules
    sudo udevadm trigger

    And now pyocd list works fine:

    $ pyocd list
      #   Probe/Board     Unique ID                  Target           
    ------------------------------------------------------------------
      0   STM32 STLink    066FFF484851897767012341   ✔︎ stm32h743zitx  
          NUCLEO-H743ZI                                    ****

    We should update the README with this possible error and the fix :)
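The diagnosis above (device visible in lsusb but not in pyocd list) could be automated along these lines. This is a sketch under assumptions: it only parses the standard `ID vvvv:pppp` field of lsusb output, the vendor/product pair is the one from the udev rule above, and `usb_ids` and `stlink_visible` are hypothetical names that are not part of the MR.

```python
import re
import shutil
import subprocess
from typing import List, Tuple

# Vendor/product IDs copied from the udev rule above.
STLINK_ID = ("0483", "374b")

_ID_RE = re.compile(r"\bID ([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\b")


def usb_ids(lsusb_output: str) -> List[Tuple[str, str]]:
    """Extract (vendor, product) ID pairs from lsusb text output."""
    return [(v.lower(), p.lower()) for v, p in _ID_RE.findall(lsusb_output)]


def stlink_visible(lsusb_output: str) -> bool:
    """True if the ST-Link from the udev rule appears in the lsusb output."""
    return STLINK_ID in usb_ids(lsusb_output)


if __name__ == "__main__":
    if shutil.which("lsusb") is None:
        print("lsusb not found on this machine")
    else:
        out = subprocess.run(["lsusb"], capture_output=True, text=True).stdout
        if stlink_visible(out):
            print("ST-Link is on the USB bus; if pyocd list shows no probe, "
                  "check the udev rule and permissions")
        else:
            print("ST-Link not found on the USB bus")
```

If the probe is on the bus but pyocd cannot see it, a missing udev rule is a likely cause, which matches the fix described above.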

  • {{ outputs_dtype[o] }}* {{ outputs_name[o] }} = NULL;
    {% endfor %}
    uint32_t start;
    uint32_t end;
    size_t NB_WARMUP = {{ nb_warmup }};
    size_t NB_ITERATIONS = {{ nb_iterations }};
    double times[{{ nb_iterations }}] = {0};

    // Warm-up phase
    for (int i = 0; i < NB_WARMUP; ++i) {
        {{ func_name }}({{ inputs_name|join(", ") }}{% if inputs_name %}, {% endif %}&{{ outputs_name|join(", &") }});
    }

    // Timed measurements

    for (size_t i = 0; i < NB_ITERATIONS; ++i) {
    • Resolved by Cyril Moineau

      I encountered an error because the ARM compiler was not installed on my machine.

      1. The error message did not make it clear that I needed to look at the log file generated inside the export.
      2. The Makefile offers to compile using Docker, an option that is not available through the current script.

      I will create an MR to fix these two points once this MR is merged!

      Edited by Cyril Moineau
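A clearer failure for the missing-compiler case could look like the sketch below. The tool names are assumptions: the thread only says "the arm compiler", so `arm-none-eabi-gcc` and `make` are placeholders to adjust to the export's Makefile, and `missing_tools` and `check_toolchain` are hypothetical helpers, not functions from the MR.

```python
import shutil
from typing import Iterable, List

# Assumed tool names; the thread does not name the exact compiler binary.
REQUIRED_TOOLS = ("arm-none-eabi-gcc", "make")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH, in the given order."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_toolchain(tools: Iterable[str] = REQUIRED_TOOLS) -> None:
    """Raise a RuntimeError naming every build tool missing from PATH."""
    missing = missing_tools(tools)
    if missing:
        raise RuntimeError(
            "Missing build tools on PATH: " + ", ".join(missing)
            + ". Install them before running the benchmark."
        )
```

Calling `check_toolchain()` before the compilation step would report the missing compiler directly, instead of leaving the cause in the log file inside the export.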
  • When trying to flash the board, I got this error:

      ├─Flashing and capturing attempt 1/5
    [ERROR] - Error connecting to serial port: [Errno 13] could not open port /dev/ttyACM0: [Errno 13]
    [ERROR]   Permission denied: '/dev/ttyACM0'

    I solved this by following the solution provided in https://forum.arduino.cc/t/permission-denied-on-dev-ttyacm0/475568/15, which is to add the current user to the dialout group:

    sudo usermod -a -G dialout $USER

    This should also be documented in the README :smile:

    Edited by Cyril Moineau
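The permission problem above can be detected before opening the port, so that the error message points to the fix. This is a sketch using only the standard library; `port_access_problem` is a hypothetical helper, and the hint about the dialout group assumes a Debian-like system where that group owns the serial devices.

```python
import os


def port_access_problem(port: str) -> str:
    """Return a reason why `port` cannot be opened, or "" if access looks fine."""
    if not os.path.exists(port):
        return f"{port} does not exist (is the board plugged in?)"
    if not os.access(port, os.R_OK | os.W_OK):
        return (
            f"no read/write permission on {port}; adding the user to the "
            "'dialout' group usually fixes this (log out and back in afterwards)"
        )
    return ""


if __name__ == "__main__":
    problem = port_access_problem("/dev/ttyACM0")
    print(problem or "serial port looks accessible")
```

Group membership changes only apply to new login sessions, so the usermod command takes effect after logging out and back in.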
  • Racim Boumbar added 1 commit


    Compare with previous version

  • Cyril Moineau