Fuse bn
3- Cyril Moineau authored
Actually, a TensorImpl is required to store a contiguous 1D array of bytes. Thus retrieving the address of this array is not virtual, nor is accessing a given byte. Besides, getRaw is thus not virtual and is totally equivalent to the default ::operator[](std::size_t). Unless I missed something, I propose to remove it entirely.
Making lazyInit explicit can indeed solve the issue of useless calls (even though we still have to solve how to manage Tensors that users generate manually). Couldn't getRaw() return a (void*) pointer and ::operator[](std::size_t) return a typed value?
Also, as we discussed, getRaw() could return only a device-usable "address". What should ::operator[](std::size_t) return? I am not sure yet.
When I created getRaw the goal was to NOT return a standard CPU addr.
The goal is to return a pointer to the real data (RAW), to mimic the PyTorch behavior of https://pytorch.org/docs/stable/generated/torch.Tensor.data_ptr.html.
This is a useful feature for interoperability, especially with PyTorch CUDA.
Maybe we should add another method (getHostPtr) to access a CPU memory addr.
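To make the getRaw / getHostPtr split concrete, here is a minimal sketch. Everything in it is hypothetical: the class names (TensorImplSketch, CpuImplSketch, FakeDeviceImplSketch), the helper readFirstByte, and the choice to return nullptr from getHostPtr when the data is not host-resident are assumptions for illustration, not the project's actual API. The "device" backend is simulated with ordinary host memory so the example compiles without CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface, NOT the real TensorImpl.
class TensorImplSketch {
public:
    virtual ~TensorImplSketch() = default;
    // Address of the data where it really lives (in the spirit of
    // torch.Tensor.data_ptr). It may not be dereferenceable from host code.
    virtual void* getRaw() = 0;
    // Address that host code may dereference, or nullptr when the data is
    // not host-resident (assumption made for this sketch).
    virtual void* getHostPtr() = 0;
    virtual std::size_t sizeInBytes() const = 0;
};

// CPU backend: both accessors return the same host address.
class CpuImplSketch : public TensorImplSketch {
public:
    explicit CpuImplSketch(std::size_t nbBytes) : mData(nbBytes) {
        assert(nbBytes > 0);
    }
    void* getRaw() override { return mData.data(); }
    void* getHostPtr() override { return mData.data(); }
    std::size_t sizeInBytes() const override { return mData.size(); }

private:
    std::vector<unsigned char> mData;
};

// Stand-in for a device backend. The buffer is host memory here, but callers
// must treat getRaw() as an opaque device address and never dereference it.
class FakeDeviceImplSketch : public TensorImplSketch {
public:
    explicit FakeDeviceImplSketch(std::size_t nbBytes) : mFakeDeviceMemory(nbBytes) {
        assert(nbBytes > 0);
    }
    void* getRaw() override { return mFakeDeviceMemory.data(); }
    void* getHostPtr() override { return nullptr; }
    std::size_t sizeInBytes() const override { return mFakeDeviceMemory.size(); }

private:
    std::vector<unsigned char> mFakeDeviceMemory;
};

// Host-side read that only goes through getHostPtr(), so it fails cleanly
// instead of dereferencing a device address.
bool readFirstByte(TensorImplSketch& impl, unsigned char& out) {
    void* host = impl.getHostPtr();
    if (host == nullptr || impl.sizeInBytes() == 0) {
        return false;
    }
    out = static_cast<unsigned char*>(host)[0];
    return true;
}
```

With this split, interoperability code would pass getRaw() straight to the device library, while any host-side getter would go through getHostPtr() and handle the nullptr case.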
AFAIK, there is no such raw pointer for CUDA. A CUDA pointer is only usable by CUDA: if you dereference it from the CPU, it is undefined behavior, and even if it does not segfault you can be certain that you will not get the expected value: https://stackoverflow.com/questions/20607546/dereferencing-pointer-in-cuda-c#:~:text=Fundamental%20CUDA%20rules%20(ignoring%20cuda,device%20pointer%20in%20host%20code
CUDA has made the choice to represent its memory location identifiers as pointers, but this is confusing for users.
Besides, other targets may return any kind of type to represent a memory location (see, for instance, the handle types in Windows).
Yet this pointer cannot be used for reading values. It seems that the current implementation cannot prevent a client from trying to dereference it.
Typically the getter cannot work, as it is implemented, on a CUDA backend.
Reading single values from the host would probably be excessively expensive, and acceptable only if this access is rare.
We could (should) indeed have separate interfaces for host and device memory locations. Yet there is no common type for the device side (it is not necessarily a value stored in a pointer).
See, for instance what is done with the standard thread API: https://en.cppreference.com/w/cpp/thread/thread/native_handle
Yet it is simpler in this case because there is only one underlying library. Thus the native_handle type can be anything, but it is known at compile time.
Our use case is more complicated: we can return a void * to a memory location handle and, in a translation unit that is aware of the device, cast it to a "handle" * that can be manipulated with device routines.
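A rough sketch of that last idea, under stated assumptions: OpaqueLocation, FakeDeviceHandle, toOpaque, fromOpaque and deviceBufferSize are all hypothetical names, and the "device" handle is a plain struct so the example builds without any device library. In a real layout, the first part would sit in a device-agnostic header and the second part in a translation unit that includes the device headers.

```cpp
#include <cassert>
#include <cstddef>

// --- Device-agnostic side: only an opaque location is visible. ---
// Generic code may store and pass this value but must never dereference it.
using OpaqueLocation = void*;

// --- Device-aware side: the concrete handle type is known here only. ---
// Stand-in for whatever a real backend uses to identify a memory location.
struct FakeDeviceHandle {
    int bufferId;
    std::size_t sizeInBytes;
};

// Erase the concrete type before handing the location to generic code.
OpaqueLocation toOpaque(FakeDeviceHandle* handle) {
    return static_cast<void*>(handle);
}

// Recover the concrete type. This is only valid if the location really was
// created from a FakeDeviceHandle*, which the backend has to guarantee.
FakeDeviceHandle* fromOpaque(OpaqueLocation location) {
    assert(location != nullptr);
    return static_cast<FakeDeviceHandle*>(location);
}

// Example of a "device routine" operating on the recovered handle.
std::size_t deviceBufferSize(OpaqueLocation location) {
    return fromOpaque(location)->sizeInBytes;
}
```

The cost of this design is that the void * carries no type information, so nothing stops a caller from passing a location that belongs to a different backend; the cast is only safe by convention.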