Add MobileNetV3 kernels to aidge_backend_cpu
This MR is linked to aidge#299 (closed).
This MR adds CPU backend kernel implementations for the operators that MobileNet-v3 Small requires and that Aidge is missing. The current ONNX model (https://huggingface.co/qualcomm/MobileNet-v3-Small/resolve/main/MobileNet-v3-Small.onnx) uses two operators not yet implemented in Aidge:
- HardSigmoid
- HardSwish
HardSwish - note
HardSwish is implemented as a meta-operator in the core framework (aidge_core/src/operator/MetaOperatorDefs/HardSwish.cpp), defined as x * HardSigmoid(x). No additional backend implementation is required, as it leverages the existing Mul, Identity & HardSigmoid implementations through operator composition.
HardSigmoid - note
Modified files
- HardSigmoidImpl.hpp (33 lines) - Main operator implementation header with CPU backend registration
- HardSigmoidImpl_kernels.hpp (80 lines) - Forward and backward kernel implementations with Float32/Float64 type support
- HardSigmoidImpl.cpp (56 lines) - Operator implementation source with forward/backward dispatch logic
- Test_HardSigmoidImpl.cpp (198 lines) - Comprehensive unit tests covering multiple scenarios
- cpu.hpp (line 42) - Main CPU backend header including HardSigmoid implementation
Detailed major modifications
Kernel Implementation (HardSigmoidImpl_kernels.hpp)
Data Type Support
- Float32 Support: Full implementation with HardSigmoidImpl_cpu_forward_kernel<float, float> and HardSigmoidImpl_cpu_backward_kernel<float, float, float>
- Float64 Support: Complete implementation with HardSigmoidImpl_cpu_forward_kernel<double, double> and HardSigmoidImpl_cpu_backward_kernel<double, double, double>
- Type Registration: Automatic registration for both data types using REGISTRAR macro
- Template Specialization: Efficient type casting and computation for both precision levels
Performance Considerations
Current Performance Status
- Inference Time: Currently ~10-20x slower than ONNX Runtime
- Optimization Required: Further kernel optimization needed for production deployment
- SIMD Ready: Kernel structure supports potential OpenMP parallelization (commented pragma available)
Unit Testing Coverage (Test_HardSigmoidImpl.cpp)
Data Type Validation (5 test sections)
- Float32 Testing: 1D, 2D, 4D tensor operations with default and custom parameters
- Float64 Testing: 3D tensor precision validation with complex shapes
- Precision Validation: Direct pointer comparison with tolerance (0.00001f/0.00001)
- Multi-dimensional Support: Tests 1D, 2D, 3D, and 4D tensor operations
- Edge Case Coverage: Tests saturation behavior (clamping to 0 and 1)
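The kind of check described above can be sketched as follows. This is not the content of Test_HardSigmoidImpl.cpp: the helper names are hypothetical, no test framework is assumed, and the tolerance is treated as an absolute tolerance, which the description does not state explicitly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical helper: element-wise comparison through raw pointers,
// using an absolute tolerance (assumption).
template <class T>
bool allClose(const T* actual, const T* expected, std::size_t n, T tol) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(actual[i] - expected[i]) > tol) {
            return false;
        }
    }
    return true;
}

// Hypothetical scalar reference used to build expected values:
// max(0, min(1, alpha * x + beta)), as in the ONNX definition.
template <class T>
T referenceHardSigmoid(T x, T alpha, T beta) {
    return std::max(T(0), std::min(T(1), alpha * x + beta));
}
```

With the ONNX default parameters (alpha = 0.2, beta = 0.5), inputs at or below -2.5 are expected to clamp to 0 and inputs at or above 2.5 to clamp to 1, which is the saturation behavior the edge-case tests target.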
Edited by matthieu marchal