Refactoring Tensor

changed milestone to %v0.1.0

added 1 deleted label

requested review from @pineapple and @olivierbichler

changed the description

added 1 commit

963e131d - [WIP][NF][BUILD] ongoing debugging memory corruptions

Compare with previous version

added 29 commits

963e131d...51af88a0 - 27 commits from branch main
0b58b387 - [MERGE][master] updating from main
fc8dff77 - [WIP][NF][BUILD] adding strict aliasing compiler option (the warning option...

Compare with previous version

One general comment: I think you are trying to change maybe too deeply the initial concept of Tensor and TensorImpl. TensorImpl was supposed to be a very thin object that would provide device data storage and access, according to the specified dimensions of Tensor. In this vision, Tensor would not just be any view on storage, but would specify how the data is stored, with TensorImpl handling only the actual memory allocation.

That's the reason why we discussed the idea of having an additionnal TensorView object, that would be a particular view on an existing Tensor, independently on the Tensor's implementation (TensorImpl). Tensor is garanteed to be contiguous in memory, while TensorView is not.

This is what I suggested:

classDiagram
  Data<|--Tensor
  Data<|--TensorView
  TensorView-->Tensor
  Tensor-->TensorImpl_CPU
  Tensor-->TensorImpl_CUDA
  Tensor-->TensorImpl_Eigen
  class Tensor{
     +vector<int> actualDims
     +TensorView getView()
  }
  class TensorView{
     +vector<int> dims
     +vector<int> strides
  }
  class TensorImpl_CPU{
     +T* data
  }
  class TensorImpl_CUDA{
     +T* deviceData
  }
  class TensorImpl_Eigen{
     +T* eigenData
  }

Your vision would be like this (correct it if I am wrong):

classDiagram
  Data<|--Tensor
  Tensor-->TensorImpl_CPU
  Tensor-->TensorImpl_CUDA
  Tensor-->TensorImpl_Eigen
  class Tensor{
     +vector<int> dims
     +vector<int> strides
     +Tensor getView()
  }
  class TensorImpl_CPU{
     +T* data
     +vector<int> actualDims
     +vector<int> actualStrides
  }
  class TensorImpl_CUDA{
     +T* deviceData
     +vector<int> actualDims
     +vector<int> actualStrides
  }
  class TensorImpl_Eigen{
     +T* eigenData
     +vector<int> actualDims
     +vector<int> actualStrides
  }

In this case, every operator must be able to handle any type of memory layout, which is problematic. In most operators implemenation, we make the basic assumption that data is contiguous in memory, with also the same layout according to the dimensions. While some implementation might accept other type of layout (e.g. CuDNN), there is usually a lot of restrictions. By having a different TensorView object, it is possible for the operators to specify if they accept a non-contiguous layout with strides. It also makes easy to insert operators in the computation graph to copy TensorView to actual Tensor if needed by the implementation.

And just to be clear: contiguous data means no padding!

added 19 commits

fc8dff77...29ba2cbf - 17 commits from branch main
3fae09e7 - [MERGE][main] updating branch with main
847f63bd - [WIP][NF] distinguishing different Tensor implementation state

Compare with previous version

added 2 commits

3268136d - [WIP][NF] distinguishing between having an impl and having an impl with data
3519cd7d - [BUILD] Adding stricter compilation options to unit tests

Compare with previous version

added 1 commit

695040aa - [FIX][TBV] fixing ComputeLayout

Compare with previous version

added 1 commit

4f573026 - [BUILD] temporarily remove strict compiler option to build unit tests

Compare with previous version

added 1 commit

05d95054 - [FIX] typo in CMakeLists.txt

Compare with previous version

added 1 commit

9815673b - [FIX][BUILD] sanitizing wrongly activated by default: fixed

Compare with previous version

added 34 commits

9815673b...8228ba63 - 30 commits from branch main
2cc13004 - [MERGE][master][WIP][NF] merging refactoring from master
365b8d59 - [MERGE][master]
676209c6 - [FIX] fixing after merge with main branch
60c2401b - [MERGE][main] updating branch with main

Compare with previous version

added 62 commits

60c2401b...d4f09c00 - 57 commits from branch main
dcc65f21 - [fix/build][FIX][BUILD][WIP] updating build related files
8f103239 - [FIX] fixing msvc compilation
260e3401 - [MERGE][main][WIP][NF] merging from main, still need to fix inconsistency between APIs
22c6276a - [MAJ] getIdx returns an index in number of elements
0c3c5027 - [MAJ] adjusting python binding to new interface

Compare with previous version

added 1 commit

dfbb1f27 - [BUILD] adjusting strict aliasing compilation options, fixing an include file error

Compare with previous version

added 1 commit

6fbf32e2 - [FIX] fix for python binding only

Compare with previous version

added 1 commit

fe12beaf - [FIX][MIN] removing deprecated and now wrong assert

Compare with previous version

added 1 commit

c91faa99 - [FIX] fixing getcoord

Compare with previous version

added 1 commit

75175e97 - [FUN][FIX] Adding resize to TensorImpl and fixing python binding

Compare with previous version

added 1 commit

6907a813 - [UT] moving a tensor test to backend has the implemented test actually...

Compare with previous version

The merge is hard to read due to a lot of formating changes, it is hard to see what is just formating what has changed. I may have missed some changes. I think other reviewers are recquired, @pineapple, @vtemplier.

The TensorView feature is not implemented as discussed between us. Instead it was added in the Tensor object which increase its complexity. I fear that this merge request decrease the maintanability of the Tensor object.

Some added methods are not very clear on what they do (getFirstDataCoord).

I do not understand exactly what MakeInteger.hpp does, is it really necessary ?

Naming convention is not respected with multiple method begining with uppercase.

clang-format related change: no better move than sharing the .clang-format file...

TensorView: !13 (comment 1223010)
Discussing more on this point will not bring tiling faster.

naming conventions and documentation clarification request noted and wip.

MakeInteger is used to deduce, in aidge context, types from std::size_t. There is no function id STL to say "I want a type whos size is half the size of another one", or "one byte less than...". MakeInteger can do that.

added 2 commits

fab26c43 - [MIN][DOX] renaming wrt to implicit naming convention
4f6757a1 - [IMPR] Removing a python-binding only function from C++ API and make it a...

Compare with previous version

added 2 commits

bd427aa7 - [FIX] fixed typo in runHooks
ac44f114 - [UT] Use all unit-tests by default

Compare with previous version

Comments answered. Corresponding update pushed and passed the CI.

You have, in TensorImpl, the following member variables:

std::vector<DimSize_t> mvDimensions;
std::vector<std::size_t> mvLayout;
Byte_t *mStorage = nullptr;
std::vector<Coord_t> mvFirstDataCoordinates;

In Tensor, you have:

std::vector<DimSize_t> mDims;

Why keep mDims in Tensor? Itsn't it redondant with mvDimensions? More generally, I do not understand the separation between Tensor and TensorImpl. It feels like you could have put what is in TensorImpl in Tensor and keep TensorImpl a pure abstract class.

No, please read again above discussion.
I may have underdocument the notion of active area in the code which I may fix. mDims are the dimension of the active area, mvDimensions, those of the storage.
I've got a slideware somewhere that illustrate that, I will try to retrieve it and upload it here.tensor_english.pdf

Then why isn't mvFirstDataCoordinates in Tensor? It seems like you would need that in order to generate different views to the same TensorImpl?

You're right, same thing there should be also a First Data Coordinates in Tensor in order to define the position of the active area. If it fell under the radar, as it is not in unit-tests yet, I propose to add it in the next MR.

added 22 commits

ac44f114...0921c777 - 21 commits from branch main
715ec7a9 - [MERGE][main] update merge from main

Compare with previous version

added 1 commit

3debb70b - [FIX][TBV] trying a fix for an issue with clang 9 to 14

Compare with previous version

added 32 commits

3debb70b...94139951 - 28 commits from branch main
ae5703c0 - [FIX][MIN] fixing some type (std::size_t => Aidge type)
f1f1f0ec - [IMPR] adding partial std::uint16_t support
c45c85c1 - [MERGE][main] updating from main
8b6e97b9 - [DOX][MIN]

Compare with previous version

mentioned in issue #42 (closed)

added 2 commits

e2eda56e - [FUN][IMPR] Adding TensorImpl cloning for cleaner Tensor copy
f3ab1b2e - [IMPR][MAJ] removing reference to Tensor object inside TensorImpl and improving creation interface

Compare with previous version