Update 2D Conv[DepthWise] kernels
Context
Part of #31
Update the 2D convolution kernels of the `Conv` and `ConvDepthWise` operators to perform faster computation.
Protocol
Tests were made on an x86-64 CPU architecture. The code was compiled with the GCC 9.4.0 and Clang 10.0.0 compilers. The number of trials was 50, with no warm-up, and the measured time was the CPU time given by the `clock()` function from the `ctime` library.
Results are reported as the ratio `new_kernel_time / old_kernel_time`.
I observed that the time could vary by up to 20% between two sets of trials spaced out in time.
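The measurement loop can be sketched as follows. This is a Python analogue, not the actual benchmark code: `time.process_time()` plays the role of `clock()` from `ctime`, and the two callables are placeholders for the old and new kernels.

```python
import time

def cpu_time(fn, trials=50):
    """Total CPU time for `trials` runs, no warm-up (mirrors the protocol)."""
    start = time.process_time()  # CPU time, analogous to clock() from ctime
    for _ in range(trials):
        fn()
    return time.process_time() - start

# Placeholder workloads standing in for the new and old kernels.
new_time = cpu_time(lambda: sum(range(50_000)))
old_time = cpu_time(lambda: sum(range(100_000)))
ratio = new_time / old_time  # reported value: new_kernel_time / old_kernel_time
```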
Convolution
Each trial is based on a convolution with the following parameters:
```python
Conv2D(in_channels = 16,
       out_channels = 16,
       kernels = [3, 3],
       stride = [1, 1],
       dilation = [1, 1])

input = Tensor([8, 16, 64, 64])  # shape
```
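No padding is specified, so assuming zero padding, the standard Conv2D output-size formula gives the spatial size of the result for this base configuration:

```python
def conv2d_out_size(in_size, kernel, stride=1, dilation=1, padding=0):
    """Standard Conv2D output size along one spatial dimension."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Base configuration: 64x64 feature map, 3x3 kernel, stride 1, dilation 1.
out_h = conv2d_out_size(64, 3)
# The output tensor shape is therefore [8, 16, out_h, out_h].
```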
Then, a single parameter is tweaked from this base convolution:
- kernel size
- dilation size
- stride size
- number of input channels
- number of output channels
- number of batches
- size of each dimension of the feature map
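The sweep above can be sketched as follows. The exact values tried per axis are not listed in this description, so the ones below are purely illustrative:

```python
base = {
    "in_channels": 16, "out_channels": 16, "kernels": (3, 3),
    "stride": (1, 1), "dilation": (1, 1),
    "batch": 8, "feature_map": (64, 64),
}

def sweep(axis, values):
    """One configuration per value, tweaking a single axis of the base conv."""
    return [{**base, axis: v} for v in values]

configs = sweep("kernels", [(1, 1), (5, 5), (7, 7)])  # illustrative values
```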
Additionally, a comparison was also performed with the parameters that appeared the least favorable to the new convolution implementation. Even though this case is rare, I found it relevant to show. It is labelled "special" in the results.
```python
Conv2D(in_channels = 16,
       out_channels = 16,
       kernels = [5, 5],
       stride = [3, 3],
       dilation = [2, 2])
```
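Assuming zero padding, a quick check with the standard output-size formula shows how much this configuration shrinks the output grid, which is one way to see why it exercises the implementation differently:

```python
# (in + 2*pad - dilation*(kernel - 1) - 1) // stride + 1, with no padding
out_h = (64 + 2 * 0 - 2 * (5 - 1) - 1) // 3 + 1
# Only a 19x19 output grid is produced from the 64x64 input.
```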
Convolution depth-wise
The parameters and protocol are the same as for the convolution. The only exception is that `in_channels` and `out_channels` are merged into a single parameter, `channels`.
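For clarity, a depth-wise convolution applies one filter per channel and produces one output channel per input channel, which is why a single `channels` parameter suffices. A naive sketch (stride 1, dilation 1, no padding; plain nested lists, not the actual kernel):

```python
def depthwise_conv2d(x, w):
    """Naive depth-wise 2D conv: x[c][i][j], w[c][ki][kj], one filter per channel."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    K = len(w[0])
    out = [[[0.0] * (W - K + 1) for _ in range(H - K + 1)] for _ in range(C)]
    for c in range(C):            # each channel uses only its own filter
        for i in range(H - K + 1):
            for j in range(W - K + 1):
                out[c][i][j] = sum(
                    x[c][i + di][j + dj] * w[c][di][dj]
                    for di in range(K) for dj in range(K)
                )
    return out

x = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]  # 2 channels, 3x3, all ones
w = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]  # one 2x2 filter per channel
out = depthwise_conv2d(x, w)  # each output value sums a 2x2 window of ones
```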