Tensor prerequisite
We should settle the requirements for Tensor/TensorImpl.
So far, we have assumed that a TensorImpl wraps an in-memory CPU array of contiguous bytes that stores typed data, organised as a multidimensional array.
Support for externally stored tensors (GPU, other hardware accelerators, ...) is obtained through transmitters that synchronise data across hardware boundaries.
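To make these two assumptions concrete, here is a minimal sketch in Python (standard library only). It is not the project's actual API: `CpuTensorImpl`, `FakeDeviceBuffer`, `Transmitter` and all their methods are hypothetical names chosen for illustration, the row-major layout is an assumption, and the "device" is just a second byte buffer standing in for memory on the other side of a hardware boundary.

```python
import struct
from math import prod


class CpuTensorImpl:
    """Hypothetical stand-in for a TensorImpl: typed data in one contiguous CPU buffer."""

    def __init__(self, dims, fmt="f"):
        self.dims = tuple(dims)
        self.fmt = fmt                              # struct format code of one element
        self.item_size = struct.calcsize(fmt)
        # One contiguous, zero-initialised block of bytes for the whole tensor.
        self.raw = bytearray(prod(self.dims) * self.item_size)

    def _offset(self, index):
        # Byte offset of a multidimensional index, assuming row-major order.
        if len(index) != len(self.dims):
            raise IndexError("wrong number of indices")
        flat = 0
        for i, d in zip(index, self.dims):
            if not 0 <= i < d:
                raise IndexError(index)
            flat = flat * d + i
        return flat * self.item_size

    def get(self, index):
        return struct.unpack_from(self.fmt, self.raw, self._offset(index))[0]

    def set(self, index, value):
        struct.pack_into(self.fmt, self.raw, self._offset(index), value)


class FakeDeviceBuffer:
    """Simulates externally stored memory (no real GPU is involved)."""

    def __init__(self, nbytes):
        self._mem = bytearray(nbytes)

    def upload(self, data):
        if len(data) != len(self._mem):
            raise ValueError("size mismatch")
        self._mem[:] = data

    def download(self):
        return bytes(self._mem)


class Transmitter:
    """Hypothetical transmitter: explicit copies across the (simulated) boundary."""

    def __init__(self, host, device):
        self.host, self.device = host, device

    def to_device(self):
        self.device.upload(self.host.raw)

    def to_host(self):
        self.host.raw[:] = self.device.download()
```

For example, one can create `CpuTensorImpl((2, 3))`, write a value with `set((1, 2), 1.5)`, call `to_device()`, and later restore the host buffer with `to_host()`. The point of the sketch is that every access goes through the host-side contiguous buffer, and the external copy is only reachable through an explicit synchronisation step.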
I don't know whether hardware-optimised interconnects exist that would give direct access across these boundaries. (For more than 10 years I have been hearing about an NVIDIA project to implement an interconnect that would allow memory to be shared efficiently between GPU and host, but I don't think it has been achieved.)
These assumptions have consequences for issue #13 (closed).