Add and modify operators to run the ConvNeXt ONNX model
Context
While trying to export the ConvNeXt ONNX model to CPP, we encountered several issues. First, several operators were missing, including Transpose and Erf. Additionally, we needed to add broadcasting support for operators like Add, Sub, Mul, Div, and MatMul. For the convolution layers, we added the groups parameter. Lastly, batch normalization and global average pooling required debugging: batch normalization produced incorrect results, and global average pooling had a type error in its sum that resulted in incorrect values. ConvNeXt_export.onnx
Modified files
- added Erf and transpose_diff kernels and matching Jinja files
- added convolution_groups to support the groups parameter (kept separate so as not to interfere with the original convolution kernel)
- operator.py: added the Erf, Transpose, and convolution_groups operators
- global pooling: changed the type of the sum accumulator for Average (see the sketch after this list)
- batchnorm: updated to perform an actual batch normalization and changed the parameter order
- matmul: modified for broadcasting support
- added transpose_diff (so as not to interfere with the original model, because I wasn't able to make it work)
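For context on the global pooling fix above, here is a minimal sketch (hypothetical, simplified from the actual export kernel) of the type issue: accumulating the sum in the integer output type truncates intermediate values, so the accumulator must be floating point before dividing by the pooling area.

```cpp
template <typename Input_T, typename Output_T>
void globalavgpool_forward(const Input_T* inputs, Output_T* outputs,
                           int nbChannels, int height, int width)
{
    const int area = height * width;
    for (int ch = 0; ch < nbChannels; ++ch) {
        // Accumulating in Output_T (e.g. an integer type) was the bug:
        // use a floating-point accumulator instead.
        float sum = 0.0f;
        for (int i = 0; i < area; ++i) {
            sum += static_cast<float>(inputs[ch * area + i]);
        }
        outputs[ch] = static_cast<Output_T>(sum / static_cast<float>(area));
    }
}
```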
Merge request reports
Activity
assigned to @mnewson
Good first MR!
For your information @mnewson, I am working on adding unit tests for the CPP export. !24 (closed)
Will you be able to create some unit tests for the operators you created?
If so, we can create issues for the operators to be added and assign some to you.
Cheers,
Cyril
Thank you! I can try to create unit tests for the operators I added. What should I use as a reference to compare the operator being tested against? I used ONNX Runtime to compare the results. I believe this is necessary in certain cases, like with convolution_groups, since the groups parameter isn't currently available in the Aidge core.
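For reference, a minimal sketch (hypothetical, not the actual convolution_groups kernel; a 1x1 convolution to keep it short) of what the groups parameter means: input and output channels are split into `groups` independent partitions, and each output partition only reads its matching input partition. The depthwise convolutions in ConvNeXt are the special case groups == nbChannels.

```cpp
#include <vector>

void grouped_conv1x1(const std::vector<float>& in,  // nbIn input channels
                     const std::vector<float>& w,   // nbOut x (nbIn / groups) weights
                     std::vector<float>& out,       // nbOut output channels
                     int nbIn, int nbOut, int groups)
{
    const int inPerGroup = nbIn / groups;
    const int outPerGroup = nbOut / groups;
    for (int o = 0; o < nbOut; ++o) {
        const int g = o / outPerGroup;  // group this output channel belongs to
        float sum = 0.0f;
        for (int i = 0; i < inPerGroup; ++i) {
            // Only the input channels of group g are visible to this output.
            sum += w[o * inPerGroup + i] * in[g * inPerGroup + i];
        }
        out[o] = sum;
    }
}
```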
changed milestone to %aidge v0.6.0
added Feature 🚀 and Status::Review Ready labels
- Resolved by Olivier BICHLER
Note: this is not documented yet, but @olivierbichler worked on a feature to skip transposes thanks to the memory manager, so this kernel should not be necessary.
Also, what is transpose_diff? Especially:
(so as not to interfere with the original model, because I wasn't able to make it work)
Did you find an issue with the current Transpose?
- Resolved by Matthew Newson
Beware of the includes you used to debug the export!
Some typos need to be fixed.
I did not review the content of the code as the MR is quite big.
I would be in favor of adding unit tests for it.
Note: we need to update &2
```cpp
const Input_T* __restrict inputs,
Output_T* __restrict outputs)
{
    float inputs_ordered[NB_CHANNELS * CHANNELS_HEIGHT * CHANNELS_WIDTH];
    float outputs_unordered[NB_OUTPUTS * OUTPUTS_HEIGHT * OUTPUTS_WIDTH];
    reorder_NCHW_NHWC_pool(inputs, inputs_ordered, 1, NB_CHANNELS, CHANNELS_HEIGHT, CHANNELS_WIDTH, true);
```
This was necessary to obtain the correct output in certain cases, because the data needs to be reordered from NCHW to NHWC. There was a similar case with the convolution, which Clément Fisher corrected, if I remember correctly. Just so you know, I used ONNX Runtime to compare with the Aidge export, not the Aidge core.
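For reference, a minimal sketch (assumed semantics, not the actual reorder_NCHW_NHWC_pool helper, which also takes a batch size and a direction flag) of what such a reorder does:

```cpp
// Move element (c, h, w) from NCHW index c*H*W + h*W + w
// to NHWC index (h*W + w)*C + c.
void reorder_NCHW_to_NHWC(const float* in, float* out, int C, int H, int W)
{
    for (int c = 0; c < C; ++c)
        for (int h = 0; h < H; ++h)
            for (int w = 0; w < W; ++w)
                out[(h * W + w) * C + c] = in[(c * H + h) * W + w];
}
```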
@cmoineau Are changes needed, or is this OK for you?
We need to talk about it with @axelfarr @mick94 and @olivierbichler. A lot of MRs have been proposed for the CPP export and we are too overwhelmed to respond right now.
Hi @mnewson. We should be able to give you some guidelines by May 7. The goal is to have this feature in the next milestone...
mentioned in merge request !24 (closed)
added 4 commits
- f06fd92b...45d12a67 - 2 commits from branch eclipse/aidge:dev
- 5389bb9e - Merge remote-tracking branch 'origin/dev'
- 9fc76615 - Delete debug includes
Be careful, you should think about putting an #ifdef around the OpenMP pragmas. Indeed, if someone builds your code without the OpenMP option, the compilation will fail (#24 (closed)).
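A minimal sketch of the guard, assuming a hypothetical elementwise kernel: `_OPENMP` is defined by the compiler only when OpenMP is enabled (e.g. with `-fopenmp`), so the pragma is compiled out otherwise and the plain serial loop remains.

```cpp
void add_forward(const float* a, const float* b, float* out, int size)
{
#ifdef _OPENMP
    // Only emitted when the compiler is invoked with OpenMP support.
    #pragma omp parallel for
#endif
    for (int i = 0; i < size; ++i) {
        out[i] = a[i] + b[i];
    }
}
```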
Edited by Mickael GUIBERT
- Resolved by Matthew Newson
added Status::Changes required label and removed Status::Review Ready label
requested review from @cmoineau
From elemwise.hpp:

```cpp
if (std::is_floating_point<Input_T>::value)
{
    Input_T val = 0;

    switch (ELEM_OP) {
        case Add: {
...
    }
}
int offsetIn0 = 0;
int offsetIn1 = 0;
int offsetOut = 0;
int nbMatrices = 1;
for (int i = 0; i < contiguousidx; ++i) {
    nbMatrices *= OUTPUT_DIMS[i];
}
int dim = contiguousidx - 1;
for (int stack = 0; stack < nbMatrices;) {
    for (int i = 0; i < output_contiguous_size; ++i) {
        int in0_id = (input0_contiguous_size != 1) ? i : 0;
        int in1_id = (input1_contiguous_size != 1) ? i : 0;
        outputs[i + offsetOut * output_contiguous_size] =
            inputs1[in0_id + offsetIn0 * input0_contiguous_size]
            + inputs2[in1_id + offsetIn1 * input1_contiguous_size];
```

And the MatMul forward Jinja template, which still contains unresolved merge conflict markers (`<<<<<<< HEAD` / `=======`):

```jinja
{% filter indent(width=4, first=False) %}
{% include "./_mem_offset.jinja" %}
<<<<<<< HEAD
matmul_forward<{{name|upper}}_INPUT_A_DIMS,
               {{name|upper}}_INPUT_B_DIMS,
               {{name|upper}}_OUTPUT_DIMS,
               {{name|upper}}_SIZE_DIM_IN_A,
               {{name|upper}}_SIZE_DIM_IN_B,
               {{name|upper}}_SIZE_DIM_OUT,
               {{name|upper}}_ACTIVATION>
    ({{in_name[0]}}, {{in_name[1]}}, {{out_name[0]}});
{% include "./_save_outputs.jinja" %}
{% endfilter %}
=======
```

The MR is quite big with a lot of changes, so it is hard to review everything.
You added a lot of kernels that are not registered:
- Add
- Mul
- Div
- Sub
- Transpose
- Matmul
elemwise.hpp could benefit from changing the switch case around the elemwise operation to avoid duplicating the broadcasting logic (see the sketch below).
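A minimal sketch (not the actual elemwise.hpp kernel, and simplified to the innermost contiguous dimension) of the idea: write the broadcast loop once and pass the scalar operation in, so Add, Sub, Mul and Div no longer each duplicate the same indexing logic.

```cpp
#include <cstddef>
#include <functional>

template <typename T, typename Op>
void elemwise_broadcast(const T* in0, std::size_t size0,
                        const T* in1, std::size_t size1,
                        T* out, std::size_t outSize, Op op)
{
    for (std::size_t i = 0; i < outSize; ++i) {
        // A size-1 input broadcasts: always read its single element.
        const T a = in0[(size0 != 1) ? i : 0];
        const T b = in1[(size1 != 1) ? i : 0];
        out[i] = op(a, b);
    }
}

// Usage: one call per operator instead of one duplicated loop per case, e.g.
// elemwise_broadcast(in0, n0, in1, n1, out, n, std::plus<float>{});
```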
Side note: a nice improvement would be, when registering the ElemWise operator, to check whether broadcasting is required and, if not, use a simpler kernel. @olivierbichler if you have ideas on how to do it with the current ImplSpec.
The Erf implementation looks valid to me.
I haven't looked at the conv grouping implementation yet.
The FullyConnected and pooling implementations seem like a weird way to "fix" an issue with data format. I cannot validate them without unit tests.
I didn't look at the softmax implementation, but it is already done in !36 (merged).
Overall
I wonder if it would be possible to break this MR into multiple ones?
- Adding broadcasting for elemwise operators
- Adding Erf operator
- Adding Transpose operator
- Adding MatMul operator
- Update of Conv with grouping
And then, for each one, adding unit tests that prove the implementations are correct.
As proposed by @pierregaillard, adding the ConvNeXt ONNX model to our Hugging Face and creating an integration test on it would be nice, in order to make sure we don't have regressions on it.
changed milestone to %aidge v0.7.0
@gallasko FYI
This MR has been split into !46 (merged), !47 (merged) and !48 (merged). Closing.