Learning backend cuda
Context
This MR adds the forward implementations of the operators needed for accuracy computation:
- And
- ArgMax
- ReduceSum
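
For reviewers, here is a minimal CPU reference sketch of what the three forward passes compute and one way they could be combined into a correct-prediction count. It is not the CUDA code from this MR: the function names (`andForward`, `argMaxLastAxis`, `reduceSum`, `countCorrect`), the flat row-major layout, the last-axis reduction and the one-hot composition are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Element-wise logical AND; any non-zero value counts as true.
// Assumes a and b hold the same number of elements (no broadcasting).
std::vector<int> andForward(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = (a[i] != 0 && b[i] != 0) ? 1 : 0;
    }
    return out;
}

// ArgMax along the last axis of a row-major [rows x cols] tensor.
// Assumes cols >= 1. On ties, the lowest index wins.
std::vector<std::size_t> argMaxLastAxis(const std::vector<float>& x,
                                        std::size_t rows, std::size_t cols) {
    std::vector<std::size_t> out(rows, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 1; c < cols; ++c) {
            if (x[r * cols + c] > x[r * cols + out[r]]) {
                out[r] = c;
            }
        }
    }
    return out;
}

// ReduceSum over every element of the tensor.
int reduceSum(const std::vector<int>& x) {
    int total = 0;
    for (int v : x) {
        total += v;
    }
    return total;
}

// Hypothetical composition: turn the ArgMax result into a one-hot mask,
// AND it with the one-hot targets, then sum the surviving ones.
int countCorrect(const std::vector<float>& logits, std::size_t rows,
                 std::size_t cols, const std::vector<int>& targetOneHot) {
    const std::vector<std::size_t> best = argMaxLastAxis(logits, rows, cols);
    std::vector<int> predOneHot(rows * cols, 0);
    for (std::size_t r = 0; r < rows; ++r) {
        predOneHot[r * cols + best[r]] = 1;
    }
    return reduceSum(andForward(predOneHot, targetOneHot));
}
```

Dividing the result of `countCorrect` by the number of rows gives the accuracy for the batch. A reference like this can also serve as an expected-value oracle when comparing against GPU kernel output in unit tests.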
 
Modified files
- AndImpl.hpp, AndImpl_forward_kernels.hpp, AndImpl.cpp and Tests_AndImpl.cpp: add And forward impl;
- ArgMaxImpl.hpp, ArgMaxImpl_forward_kernels.hpp, ArgMaxImpl.cpp and Tests_ArgMaxImpl.cpp: add ArgMax forward impl;
- ReduceSumImpl.hpp, ReduceSumImpl_forward_kernels.hpp, ReduceSumImpl.cpp and Tests_ReduceSumImpl.cpp: add ReduceSum forward impl;
TODO
- And
- ArgMax
- ReduceSum