Add MobileNetV3 kernels to aidge_backend_cpu (!197) · Merge requests · Eclipse Projects / aidge / aidge_backend_cpu

matthieu marchal requested to merge mmarchal/aidge_backend_cpu:mm/feature/mobilenetv3-cpu-kernels into dev Jul 31, 2025

This MR is linked to aidge#299 (closed).

This MR adds CPU backend kernel implementations for the operators that MobileNet-v3 Small requires and that Aidge does not yet provide. The current ONNX model (https://huggingface.co/qualcomm/MobileNet-v3-Small/resolve/main/MobileNet-v3-Small.onnx) uses two operators not yet implemented in Aidge:

HardSigmoid
HardSwish

HardSwish - note

HardSwish is implemented as a meta-operator in the core framework (aidge_core/src/operator/MetaOperatorDefs/HardSwish.cpp), defined as x * HardSigmoid(x). It needs no dedicated backend kernel: it is composed of the Mul, Identity and HardSigmoid operators, so it runs on the CPU backend as soon as the HardSigmoid kernel from this MR is available.

HardSigmoid - note

Modified files

HardSigmoidImpl.hpp (33 lines) - Main operator implementation header with CPU backend registration
HardSigmoidImpl_kernels.hpp (80 lines) - Forward and backward kernel implementations with Float32/Float64 type support
HardSigmoidImpl.cpp (56 lines) - Operator implementation source with forward/backward dispatch logic
Test_HardSigmoidImpl.cpp (198 lines) - Comprehensive unit tests covering multiple scenarios
cpu.hpp (line 42) - Main CPU backend header, which now includes the HardSigmoid implementation header

Detailed major modifications

Data Type Support

Kernel Implementation (`HardSigmoidImpl_kernels.hpp`)

Float32 Support: Full implementation with HardSigmoidImpl_cpu_forward_kernel<float, float> and HardSigmoidImpl_cpu_backward_kernel<float, float, float>
Float64 Support: Complete implementation with HardSigmoidImpl_cpu_forward_kernel<double, double> and HardSigmoidImpl_cpu_backward_kernel<double, double, double>
Type Registration: Automatic registration for both data types using REGISTRAR macro
Template Specialization: Efficient type casting and computation for both precision levels
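As a rough illustration of what templated forward and backward kernels of this kind can look like, here is a self-contained sketch. The function names, parameter order and the choice to overwrite (rather than accumulate into) the input gradient are assumptions; the actual HardSigmoidImpl_cpu_forward_kernel and HardSigmoidImpl_cpu_backward_kernel signatures in this MR may differ.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical names and signatures; not the kernels from this MR.
// Forward: y = max(0, min(1, alpha * x + beta)), computed in the output type O.
template <class I, class O>
void hardSigmoidForward(std::size_t n, float alpha, float beta,
                        const I* input, O* output) {
    for (std::size_t i = 0; i < n; ++i) {
        const O v = static_cast<O>(alpha) * static_cast<O>(input[i])
                  + static_cast<O>(beta);
        output[i] = std::max(O(0), std::min(O(1), v));
    }
}

// Backward: dy/dx = alpha inside the linear region (0 < alpha * x + beta < 1)
// and 0 where the output is saturated. The gradient is overwritten here;
// an accumulating variant would use += instead (assumption).
template <class I, class GI, class GO>
void hardSigmoidBackward(std::size_t n, float alpha, float beta,
                         const I* input, const GO* gradOutput, GI* gradInput) {
    for (std::size_t i = 0; i < n; ++i) {
        const I v = static_cast<I>(alpha) * input[i] + static_cast<I>(beta);
        const bool linear = (v > I(0)) && (v < I(1));
        gradInput[i] = linear
            ? static_cast<GI>(alpha) * static_cast<GI>(gradOutput[i])
            : GI(0);
    }
}
```

Instantiating the templates with `float` or `double` covers the two precisions listed above. Keeping alpha and beta as `float` attributes and casting them inside the kernel is one possible design, chosen here only for brevity.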

Performance Considerations

Current Performance Status

Inference Time: Currently ~10-20x slower than ONNX Runtime
Optimization Required: Further kernel optimization needed for production deployment
Parallelization Ready: The kernel's element-wise loop can be parallelized with OpenMP (a commented-out pragma is already in place)
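For reference, enabling OpenMP on an element-wise loop like this one could look roughly as follows. This is a sketch, not the kernel from this MR: the function name `hardSigmoidParallel` is hypothetical and the 4096-element threshold is an arbitrary illustrative value, not a measured one.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical element-wise HardSigmoid loop with an OpenMP pragma.
// Without OpenMP enabled at compile time (e.g. -fopenmp on GCC/Clang),
// the pragma is ignored and the loop simply runs serially.
template <class T>
void hardSigmoidParallel(std::size_t n, T alpha, T beta,
                         const T* input, T* output) {
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(n);
    // Only spawn threads for large tensors; the threshold is illustrative.
    #pragma omp parallel for if (count > 4096)
    for (std::ptrdiff_t i = 0; i < count; ++i) {
        output[i] = std::max(T(0), std::min(T(1), alpha * input[i] + beta));
    }
}
```

Each iteration reads and writes only its own element, so the loop has no cross-iteration dependency and needs no synchronization. Whether threading actually helps for an operation this cheap would have to be checked by measurement.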

Unit Testing Coverage (`Test_HardSigmoidImpl.cpp`)

Data Type Validation (5 test sections)

Float32 Testing: 1D, 2D, 4D tensor operations with default and custom parameters
Float64 Testing: 3D tensor precision validation with complex shapes
Precision Validation: Element-wise comparison through raw data pointers with an absolute tolerance of 1e-5 (0.00001f for Float32, 0.00001 for Float64)
Multi-dimensional Support: Tests 1D, 2D, 3D, and 4D tensor operations
Edge Case Coverage: Tests saturation behavior (clamping to 0 and 1)
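The tolerance-based comparison described above can be sketched as a small helper. The name `approxEqual` is hypothetical and is not taken from Test_HardSigmoidImpl.cpp; the actual tests may structure the check differently.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: true if every element of `result` is within an
// absolute tolerance `tol` of the matching element of `expected`.
template <class T>
bool approxEqual(const T* result, const T* expected, std::size_t n, T tol) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::abs(result[i] - expected[i]) > tol) {
            return false;
        }
    }
    return true;
}
```

A saturation check then amounts to feeding inputs far outside the linear region and comparing against expected values of exactly 0 and 1, using 0.00001f as `tol` for Float32 and 0.00001 for Float64.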


Edited Aug 06, 2025 by matthieu marchal
