Tensor/TensorImpl efficiency
The data describing the in-memory layout of the stored values are semantically properties of the TensorImpl. However, they seldom change once initialized, and the Tensor may need them frequently (typically in the per-coordinate getter). Retrieving them from the TensorImpl costs an extra indirection because of the PIMPL pattern. I'm wondering whether the non-virtual part of TensorImpl should be aggregated inside Tensor, so that only the virtual part is managed through a pointer to the implementation. The drawback is that it makes Tensor less "abstract", since it will store low-level information instead of delegating it to the implementation. Besides, we'll have to be careful to keep this information up to date whenever it changes.
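To make the idea concrete, here is a minimal sketch of what the split could look like. It is not the current code: `Layout`, `DenseImpl`, `resize()` and `at()` are hypothetical names, the tensor is assumed to be 2-D and to hold `float`, and caching the raw data pointer next to the layout is an extra assumption on my side (without it the getter would still have to go through the impl pointer).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical layout record: the non-virtual data that would move
// out of TensorImpl and be stored by value inside Tensor.
struct Layout {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::size_t row_stride = 0;  // elements between two consecutive rows
    std::size_t col_stride = 0;  // elements between two consecutive columns
};

// Only the virtual part stays behind the pointer.
class TensorImpl {
public:
    virtual ~TensorImpl() = default;
    virtual float* data() = 0;
    // Reallocates the storage and returns the resulting layout so that
    // the owning Tensor can refresh its cached copy.
    virtual Layout resize(std::size_t rows, std::size_t cols) = 0;
};

// Assumed example backend: dense, row-major storage.
class DenseImpl final : public TensorImpl {
public:
    float* data() override { return buffer_.data(); }
    Layout resize(std::size_t rows, std::size_t cols) override {
        buffer_.assign(rows * cols, 0.0f);
        return Layout{rows, cols, cols, 1};
    }
private:
    std::vector<float> buffer_;
};

class Tensor {
public:
    Tensor(std::size_t rows, std::size_t cols)
        : impl_(std::make_unique<DenseImpl>()) {
        resize(rows, cols);
    }

    // Every operation that can change the layout must refresh the cache;
    // this is the "careful update" cost mentioned above.
    void resize(std::size_t rows, std::size_t cols) {
        layout_ = impl_->resize(rows, cols);
        data_ = impl_->data();
    }

    // Hot path: no virtual call and no access through impl_.
    float& at(std::size_t r, std::size_t c) {
        assert(r < layout_.rows && c < layout_.cols);
        return data_[r * layout_.row_stride + c * layout_.col_stride];
    }

    std::size_t rows() const { return layout_.rows; }
    std::size_t cols() const { return layout_.cols; }

private:
    std::unique_ptr<TensorImpl> impl_;
    Layout layout_;          // cached copy of the layout, stored by value
    float* data_ = nullptr;  // cached data pointer (assumption, see above)
};
```

With this arrangement `at()` only reads members of Tensor itself, while anything that reallocates or reshapes still goes through the virtual interface and has to return the new layout. Whether the saved indirection is measurable would need to be benchmarked.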