
Add and modify operators to run the ConvNeXt onnx model

Closed: Matthew Newson requested to merge mnewson/aidge_export_cpp:main into dev
8 unresolved threads

Context

While trying to export the ConvNeXt ONNX model to cpp, we encountered several issues. First, several operators were missing, including transpose and erf. Additionally, we needed to add broadcasting support for operators such as add, sub, mul, div, and matmul. For the convolution layers, we added the groups parameter. Lastly, batch normalization and global average pooling required debugging: the batch normalization kernel did not perform a proper batch normalization, and global average pooling had a type error in its sum that resulted in incorrect values.

ConvNeXt_export.onnx

Modified files

  • added erf and transpose_diff kernels and the matching jinja files (a sketch of an erf kernel is given after this list)
  • added convolution_groups to expose the groups parameter (kept separate so as not to interfere with the original convolution implementation)
  • operator.py: added the erf, transpose, and convolution_groups operators
  • global pooling: modified the type of the sum for Average
  • batchnorm: updated to perform an actual batch normalization and changed the parameter order
  • matmul: modified for broadcasting support
  • added transpose_diff as a separate kernel (so as not to interfere with the original transpose, which I wasn't able to make work)
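For reference, an element-wise erf kernel in this export style might look roughly like the sketch below. This is only an illustration: the function name, template parameter, and overall signature are assumptions, not the actual code added in this MR.

    #include <cmath>

    // Illustrative only: apply erf element-wise over a flat buffer.
    template<int NB_ELTS, typename Input_T, typename Output_T>
    inline void erf_forward(const Input_T* __restrict inputs,
                            Output_T* __restrict outputs)
    {
        for (int i = 0; i < NB_ELTS; ++i) {
            outputs[i] = static_cast<Output_T>(std::erf(inputs[i]));
        }
    }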

Merge request reports


Activity

  • Beware of includes you used to debug the export!

    Some typos need to be fixed.

    I did not review the content of the code, as the MR is quite big.

    I would be in favor of adding unit tests for it.

    Note: we need to update &2

  • Cyril Moineau requested changes


  • Thread on this diff hunk:

            const Input_T* __restrict inputs,
            Output_T* __restrict outputs)
        {
            float inputs_ordered[NB_CHANNELS * CHANNELS_HEIGHT * CHANNELS_WIDTH];
            float outputs_unordered[NB_OUTPUTS * OUTPUTS_HEIGHT * OUTPUTS_WIDTH];
            reorder_NCHW_NHWC_pool(inputs, inputs_ordered, 1, NB_CHANNELS, CHANNELS_HEIGHT, CHANNELS_WIDTH, true);
  • Cyril Moineau mentioned in merge request !24 (closed)


  • Matthew Newson added 4 commits


  • added 1 commit


  • added 1 commit


  • Be careful: you should think about putting an ifdef around the OpenMP pragmas. Indeed, if someone builds your code without the OpenMP option, the compilation will fail (#24 (closed)).

    Edited by Mickael GUIBERT
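
    A minimal sketch of that guard, applied to an illustrative loop (the function and variable names here are made up for the example, not taken from the export):

        // Illustrative kernel: the #ifdef drops the pragma when OpenMP is disabled,
        // so the file still compiles and the loop simply runs serially.
        template<typename T>
        void copy_kernel(const T* __restrict inputs, T* __restrict outputs, int size)
        {
        #ifdef _OPENMP
        #pragma omp parallel for
        #endif
            for (int i = 0; i < size; ++i) {
                outputs[i] = inputs[i];
            }
        }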
  • added 1 commit


  • added 1 commit

    • f74ec2c3 - Add ifdef pragma or delete unneeded pragma


  • Matthew Newson requested review from @cmoineau


  • Thread on this diff hunk:

        if (std::is_floating_point<Input_T>::value)
        {
            Input_T val = 0;

            switch (ELEM_OP) {
            case Add: {
    • You have updated elemwise.hpp and each elemwise kernel individually (add.hpp, etc.).

      But looking at operator.py, you are only using elemwise; can you remove the other files?
  • 71 }
    72 }
    73 int offsetIn0 = 0;
    74 int offsetIn1 = 0;
    75 int offsetOut = 0;
    76 int nbMatrices = 1;
    77 for(int i = 0; i<contiguousidx ; ++i){
    78 nbMatrices *= OUTPUT_DIMS[i];
    79
    80 }
    81 int dim = contiguousidx - 1;
    82 for(int stack = 0; stack < nbMatrices;){
    83 for(int i = 0; i < output_contiguous_size; ++i){
    84 int in0_id = (input0_contiguous_size != 1) ? i : 0;
    85 int in1_id = (input1_contiguous_size != 1) ? i : 0;
    86 outputs[i + offsetOut*output_contiguous_size] = inputs1[in0_id + offsetIn0*input0_contiguous_size] + inputs2[in1_id + offsetIn1*input1_contiguous_size];
    • You could do your switch case here to avoid redundancy in the code (see the sketch after this thread):

          switch (ELEM_OP) {
              case Add: {
                  outputs[i + offsetOut*output_contiguous_size] = inputs1[in0_id + offsetIn0*input0_contiguous_size] + inputs2[in1_id + offsetIn1*input1_contiguous_size];
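    To illustrate that suggestion: a single broadcasting loop with the operation chosen inside it could look roughly like the sketch below. The index arithmetic and the ELEM_OP / Add / Sub / Mul / Div names mirror the quoted diff; the function name and signature are assumptions, not the MR's actual elemwise kernel.

        // Sketch only: the broadcast indexing is written once, and the
        // element-wise operation is selected inside the loop.
        enum ElemWise_T { Add, Sub, Mul, Div };

        template<ElemWise_T ELEM_OP, typename Input_T, typename Output_T>
        void elemwise_contiguous(const Input_T* __restrict inputs1,
                                 const Input_T* __restrict inputs2,
                                 Output_T* __restrict outputs,
                                 int output_contiguous_size,
                                 int input0_contiguous_size, int input1_contiguous_size,
                                 int offsetIn0, int offsetIn1, int offsetOut)
        {
            for (int i = 0; i < output_contiguous_size; ++i) {
                const int in0_id = (input0_contiguous_size != 1) ? i : 0;
                const int in1_id = (input1_contiguous_size != 1) ? i : 0;
                const Input_T a = inputs1[in0_id + offsetIn0 * input0_contiguous_size];
                const Input_T b = inputs2[in1_id + offsetIn1 * input1_contiguous_size];
                Output_T& out = outputs[i + offsetOut * output_contiguous_size];
                switch (ELEM_OP) {
                    case Add: out = a + b; break;
                    case Sub: out = a - b; break;
                    case Mul: out = a * b; break;
                    case Div: out = a / b; break;
                }
            }
        }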
  • Thread on this diff hunk:

            ({{in_name[0]}}, {{in_name[1]}}, {{out_name[0]}}, {{name|upper}}_RESCALING);
        {% include "./_save_outputs.jinja" %}
        {% endfilter %}
        >>>>>>> origin/dev
  • 1 1 {% filter indent(width=4, first=False) %}
    2 2 {% include "./_mem_offset.jinja" %}
    3 <<<<<<< HEAD
    4 matmul_forward<{{name|upper}}_INPUT_A_DIMS,
    5 {{name|upper}}_INPUT_B_DIMS,
    6 {{name|upper}}_OUTPUT_DIMS,
    7 {{name|upper}}_SIZE_DIM_IN_A,
    8 {{name|upper}}_SIZE_DIM_IN_B,
    9 {{name|upper}}_SIZE_DIM_OUT,
    10 {{name|upper}}_ACTIVATION>
    11 ({{in_name[0]}}, {{in_name[1]}}, {{out_name[0]}});
    12 {% include "./_save_outputs.jinja" %}
    13 {% endfilter %}
    14 =======
  • Thread on this diff hunk:

        {#- For name header -#}
  • Thread on this diff hunk:

        // Generic function for matmul and activation

    -   template<int M,
    -            int K,
    -            int N,
    +   template<int INPUT_A_DIMS[], int INPUT_B_DIMS[], int OUTPUT_DIMS[],
  • The MR is quite big, with a lot of changes, so it is hard to review everything.

    You added a lot of kernels that are not registered:

    • Add
    • Mul
    • Div
    • Sub
    • Transpose
    • Matmul

    elemwise.hpp could benefit from moving the switch case inside the element-wise loop, to avoid duplicating the broadcasting logic.

    Side note: a nice addition would be to check, when registering the ElemWise operator, whether broadcasting is required and, if not, to use a simpler kernel. @olivierbichler, let us know if you have ideas on how to do this with the current ImplSpec.

    The Erf implementation is valid for me.

    I haven't looked at the conv grouping implementation yet.

    The FullyConnected and pooling changes seem like an odd way to "fix" a data-format issue; I cannot validate them without unit tests.

    I didn't look at the softmax implementation, but it is already done in !36 (merged).

    Overall

    I wonder if it would be possible to break this MR into multiple ones:

    • Adding broadcasting for elem-wise operators
    • Adding the Erf operator
    • Adding the Transpose operator
    • Adding the MatMul operator
    • Updating Conv with grouping

    And then, for each one, adding unit tests that prove the implementations are correct.

    As proposed by @pierregaillard, adding the ConvNeXt ONNX model to our Hugging Face and creating an integration test on it, to make sure we don't introduce regressions on it, would be nice.

  • Cyril Moineau requested changes


  • changed milestone to %aidge v0.7.0

  • This MR has been split into !46 (merged), !47 (merged) and !48 (merged). Closing.

  • closed
