Add MobileNetV3 kernels to aidge_backend_cpu

This MR is linked to aidge#299 (closed).

This MR aims to add CPU backend kernel implementations for the operators required by MobileNet-v3 Small that are still missing. The current ONNX model (https://huggingface.co/qualcomm/MobileNet-v3-Small/resolve/main/MobileNet-v3-Small.onnx) uses two operators not yet implemented in Aidge:

  • HardSigmoid
  • HardSwish

HardSwish - note

HardSwish is implemented as a meta-operator in the core framework (aidge_core/src/operator/MetaOperatorDefs/HardSwish.cpp), defined as x * HardSigmoid(x). No additional backend implementation is required, as it leverages the existing Mul, Identity, and HardSigmoid implementations through operator composition.
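The composition above can be sketched as plain scalar code. This is an illustrative stand-alone version, not the meta-operator itself; it assumes the inner HardSigmoid is configured with the ONNX HardSwish convention (alpha = 1/6, beta = 0.5).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch of the composition used by the meta-operator.
// Assumption: the inner HardSigmoid uses alpha = 1/6, beta = 0.5, matching
// the ONNX HardSwish definition.
float hardSigmoid(float x, float alpha = 1.0f / 6.0f, float beta = 0.5f) {
    return std::min(1.0f, std::max(0.0f, alpha * x + beta));
}

// HardSwish(x) = x * HardSigmoid(x), i.e. a Mul node fed by the input
// (through Identity) and by the HardSigmoid output.
float hardSwish(float x) {
    return x * hardSigmoid(x);
}
```

In the graph, the same structure appears as an Identity branch and a HardSigmoid branch joined by a Mul node, which is why no dedicated backend kernel is needed.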

HardSigmoid - note

HardSigmoid is defined as y = max(0, min(1, alpha * x + beta)); the ONNX defaults are alpha = 0.2 and beta = 0.5.

Modified files

  • HardSigmoidImpl.hpp (33 lines) - Main operator implementation header with CPU backend registration
  • HardSigmoidImpl_kernels.hpp (80 lines) - Forward and backward kernel implementations with Float32/Float64 type support
  • HardSigmoidImpl.cpp (56 lines) - Operator implementation source with forward/backward dispatch logic
  • Test_HardSigmoidImpl.cpp (198 lines) - Comprehensive unit tests covering multiple scenarios
  • cpu.hpp (line 42) - Main CPU backend header including HardSigmoid implementation

Detailed major modifications

Data Type Support

Kernel Implementation (HardSigmoidImpl_kernels.hpp)
  • Float32 Support: Full implementation with HardSigmoidImpl_cpu_forward_kernel<float, float> and HardSigmoidImpl_cpu_backward_kernel<float, float, float>
  • Float64 Support: Complete implementation with HardSigmoidImpl_cpu_forward_kernel<double, double> and HardSigmoidImpl_cpu_backward_kernel<double, double, double>
  • Type Registration: Automatic registration for both data types using REGISTRAR macro
  • Template Specialization: Efficient type casting and computation for both precision levels
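The template names listed above suggest kernels along the following lines. This is a sketch only: the real signatures live in HardSigmoidImpl_kernels.hpp, and the parameter order here is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumed shape of the forward kernel: y = clamp(alpha * x + beta, 0, 1),
// templated on input (I) and output (O) types for Float32/Float64 support.
template <class I, class O>
void HardSigmoidImpl_cpu_forward_kernel(float alpha, float beta,
                                        std::size_t nbElems,
                                        const I* input, O* output) {
    for (std::size_t i = 0; i < nbElems; ++i) {
        const I preClamp = static_cast<I>(alpha) * input[i] + static_cast<I>(beta);
        output[i] = static_cast<O>(std::min(I(1), std::max(I(0), preClamp)));
    }
}

// Assumed shape of the backward kernel: the gradient is alpha inside the
// linear region (0 < alpha*x + beta < 1) and 0 in both saturated regions.
template <class I, class GI, class GO>
void HardSigmoidImpl_cpu_backward_kernel(float alpha, float beta,
                                         std::size_t nbElems,
                                         const I* input, const GO* gradOutput,
                                         GI* gradInput) {
    for (std::size_t i = 0; i < nbElems; ++i) {
        const I preClamp = static_cast<I>(alpha) * input[i] + static_cast<I>(beta);
        gradInput[i] = (preClamp > I(0) && preClamp < I(1))
                           ? static_cast<GI>(alpha) * static_cast<GI>(gradOutput[i])
                           : GI(0);
    }
}
```

Instantiating these templates for `<float, float>` and `<double, double>` matches the Float32/Float64 registrations described above.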

Performance Considerations

Current Performance Status
  • Inference Time: Currently ~10-20x slower than ONNX Runtime
  • Optimization Required: Further kernel optimization needed for production deployment
  • Parallelization Ready: The element-wise kernel loop supports OpenMP parallelization (a commented-out pragma is already in place)
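Since the loop is element-wise with no cross-iteration dependencies, enabling the commented pragma would look roughly like the following. The function name and signature here are hypothetical, for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical parallelized forward pass: each output element depends only
// on the corresponding input element, so the loop is embarrassingly parallel.
void hardSigmoidForwardParallel(float alpha, float beta, std::size_t n,
                                const float* in, float* out) {
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::min(1.0f, std::max(0.0f, alpha * in[i] + beta));
    }
}
```

The `#ifdef _OPENMP` guard keeps the code valid when OpenMP is not enabled at compile time, which matches the current state of the kernel (pragma present but commented out).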

Unit Testing Coverage (Test_HardSigmoidImpl.cpp)

Data Type Validation (5 test sections)
  • Float32 Testing: 1D, 2D, 4D tensor operations with default and custom parameters
  • Float64 Testing: 3D tensor precision validation with complex shapes
  • Precision Validation: Element-wise comparison through raw output pointers with a tolerance of 0.00001f (Float32) / 0.00001 (Float64)
  • Multi-dimensional Support: Tests 1D, 2D, 3D, and 4D tensor operations
  • Edge Case Coverage: Tests saturation behavior (clamping to 0 and 1)
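The saturation checks can be sketched as a stand-alone reference test. The real tests in Test_HardSigmoidImpl.cpp go through Aidge Tensors and the registered backend; this simplified version uses the ONNX default parameters (alpha = 0.2, beta = 0.5) and the same 0.00001f tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical scalar reference for the saturation behavior under test.
float hardSigmoidRef(float x, float alpha = 0.2f, float beta = 0.5f) {
    return std::min(1.0f, std::max(0.0f, alpha * x + beta));
}

// Saturation checks: with alpha = 0.2 and beta = 0.5, the output clamps
// to 0 for x <= -2.5 and to 1 for x >= 2.5.
void testSaturation() {
    const float tol = 0.00001f;  // same tolerance as the unit tests
    assert(std::fabs(hardSigmoidRef(-10.0f) - 0.0f) < tol);  // clamped to 0
    assert(std::fabs(hardSigmoidRef(10.0f) - 1.0f) < tol);   // clamped to 1
    assert(std::fabs(hardSigmoidRef(0.0f) - 0.5f) < tol);    // linear region
}
```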

Edited by matthieu marchal
