Upd 2D Conv[DepthWise] kernels

Maxence Naud requested to merge upd_improve-conv-kernel into dev

Context

Part of #31

Update the 2D convolution kernels of the Conv and ConvDepthWise operators for faster computation.

Protocol

Tests were run on an x86-64 CPU. The code was compiled with GCC 9.4.0 and Clang 10.0.0. Each measurement consists of 50 trials with no warm-up, timing the CPU time with the clock() function from the ctime library.

Results are reported as the ratio new_kernel_time / old_kernel_time.

I observed that timings could vary by up to 20% between two sets of trials run at different times.

Convolution

Each trial is based on a convolution with the following parameters:

Conv2D(in_channels = 16,
       out_channels = 16,
       kernels = [3,3],
       stride = [1,1],
       dilation = [1,1])

input = Tensor([8,16,64,64]) # shape

Then, a single parameter at a time is varied from this base configuration:

  • kernel size
  • dilation size
  • stride size
  • number of input channels
  • number of output channels
  • number of batches
  • size of each dimension of the feature map

Additionally, a comparison was performed with the parameters that appeared least favorable to the new convolution implementation. Even though this case is rare, I found it relevant to show. It is labeled "special" in the results.

Conv2D(in_channels = 16,
       out_channels = 16,
       kernels = [5,5],
       stride = [3,3],
       dilation = [2,2])

Convolution depth-wise

The parameters and protocol are the same as for the convolution above, except that in_channels and out_channels are merged into a single channels parameter.

Results

Conv

Figures: Conv_imp_GCC_9-4-0, Conv_imp_Clang_10-0-0

ConvDepthWise

Figures: ConvDepthWise_imp_GCC_9-4-0, ConvDepthWise_imp_Clang_10-0-0

Edited by Maxence Naud
