Add registry system to operators
Context
This MR refactors all operator implementations to use the Registrar system (similar to backend_cpu) instead of manual instantiation and type dispatch.
Previously, each operator implementation contained manual if/else dispatch on DataType to call templated forward_ and backward_ functions.
With this refactor:
- Operators are now defined as type aliases of a generic `OperatorImpl_cuda<Op, ...>` template.
- dtype-specific kernels are moved into dedicated `*_CUDA_kernels.hpp` files.
- Each kernel is registered once via the `REGISTRAR` macro for the supported data types (a registration can also be specific to a data format, specific attribute values, ...).
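
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of registry-based dispatch replacing an if/else chain on the data type. It is an illustration only: `DataType`, `KernelRegistry`, `reluForward` and `forward` are hypothetical names, not the actual Aidge types or the real `REGISTRAR` macro, and the kernels run on the host to keep the example short, whereas a real kernel would launch CUDA code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the framework's data type enum.
enum class DataType { Float32, Float64 };

// Type-erased kernel signature: (input, output, element count).
using ForwardKernel = std::function<void(const void*, void*, std::size_t)>;

// Hypothetical registry mapping a data type to the kernel registered for it.
class KernelRegistry {
public:
    static KernelRegistry& instance() {
        static KernelRegistry registry;  // created on first use
        return registry;
    }

    // Returns false if a kernel was already registered for this data type.
    bool add(DataType dtype, ForwardKernel kernel) {
        return kernels_.emplace(dtype, std::move(kernel)).second;
    }

    const ForwardKernel& get(DataType dtype) const {
        auto it = kernels_.find(dtype);
        if (it == kernels_.end()) {
            throw std::runtime_error("no kernel registered for this data type");
        }
        return it->second;
    }

private:
    std::map<DataType, ForwardKernel> kernels_;
};

// The typed kernel is written once as a template (host-side ReLU here).
template <typename T>
void reluForward(const void* in, void* out, std::size_t n) {
    const T* src = static_cast<const T*>(in);
    T* dst = static_cast<T*>(out);
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] > T(0) ? src[i] : T(0);
    }
}

// One registration per supported data type, executed during static initialization.
static const bool kReluF32 =
    KernelRegistry::instance().add(DataType::Float32, reluForward<float>);
static const bool kReluF64 =
    KernelRegistry::instance().add(DataType::Float64, reluForward<double>);

// Generic entry point: looks the kernel up instead of branching on the data type.
void forward(DataType dtype, const void* in, void* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    KernelRegistry::instance().get(dtype)(in, out, n);
}
```

With this layout, supporting a new data type means adding one registration line rather than editing a dispatch chain in every operator. Calling `forward(DataType::Float32, in, out, n)` on float buffers runs `reluForward<float>`, and an unregistered type fails with an explicit exception rather than falling through silently.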
It resolves #35 (moved).
Edited by Houssem ROUIS