
Fuse bn

Merged Cyril Moineau requested to merge fuseBN into main
4 unresolved threads
3 files  +61  −28
    • Actually, a TensorImpl is required to store a contiguous 1D array of bytes. Thus, retrieving the address of this array does not need to be virtual, nor does accessing a given byte. getRaw therefore does not need to be virtual either and is totally equivalent to the default ::operator[](std::size_t). Unless I missed something, I propose to remove it entirely.

      • getRaw() also runs the lazyInit() functionality and does not return a typed pointer (it returns void*).

      • According to our design meeting on Tensor, I was of the opinion that lazyInit() should be an explicit function and should not be hidden in low-level calls, where it might often be called for nothing, leading to overhead.
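
        A minimal sketch of what the explicit call could look like (Tensor, lazyInit(), size() and the element access shown are hypothetical names here, not the current API):

            Tensor t = /* user-created tensor, storage not yet allocated */;
            t.lazyInit();                      // allocate backing storage once, explicitly
            float acc = 0.f;
            for (std::size_t i = 0; i < t.size(); ++i) {
                acc += t[i];                   // low-level access stays free of hidden allocation checks
            }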

      • Making lazyInit explicit can indeed solve the issue of useless calls (even though we still have to decide how to manage Tensors created manually by users). Couldn't getRaw() return a (void*) pointer and ::operator[](std::size_t) return a typed value?

        Also, as we discussed, getRaw() could return only a device-usable "address".

        What should ::operator[](std::size_t) return? I am not sure yet.

      • ::operator[](std::size_t) is the default array access operator. It can be used only on a properly typed array, not on raw storage nor on a void *.
        Thus it can be used only if getRaw returns a correctly typed address (and, as you say, a standard CPU memory address).
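
        One way to reconcile the two, sketched under the assumption of a hypothetical TypedAccessor wrapper (not part of the current code), with getRaw() staying untyped and operator[] living in a typed layer on top:

            #include <cstddef>

            class TensorImpl {
            public:
                // untyped access: backend-specific storage address for element idx
                virtual void* getRaw(std::size_t idx) = 0;
                virtual ~TensorImpl() = default;
            };

            template <typename T>
            class TypedAccessor {
            public:
                explicit TypedAccessor(TensorImpl& impl) : mImpl(impl) {}
                // only valid when the storage actually holds T in host memory
                T& operator[](std::size_t idx) {
                    return *static_cast<T*>(mImpl.getRaw(idx));
                }
            private:
                TensorImpl& mImpl;
            };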

      • Author Maintainer

        When I created getRaw, the goal was NOT to return a standard CPU address.

        The goal is to return a pointer to the real (raw) data, mimicking the behavior of PyTorch's https://pytorch.org/docs/stable/generated/torch.Tensor.data_ptr.html.

        This is a useful feature for interoperability, especially with PyTorch CUDA.

        Maybe we should add another method (getHostPtr) to access a CPU memory address.

      • AFAIK, there is no such raw pointer for CUDA. A CUDA pointer is only usable by CUDA; if you dereference it from the CPU, it is undefined behavior, and you can be certain that even if it does not segfault, you will not get the expected value: https://stackoverflow.com/questions/20607546/dereferencing-pointer-in-cuda-c#:~:text=Fundamental%20CUDA%20rules%20(ignoring%20cuda,device%20pointer%20in%20host%20code
        CUDA has made the choice to represent its memory location identifier as a pointer, but it is confusing for users.
        Besides, other targets may return any kind of type to represent a memory location (see, for instance, the handle types in Windows).

      • Author Maintainer

        That is what I tried to say: the raw pointer getter should return the memory pointer on the CUDA device.

      • Yet this pointer cannot be used for reading values. It seems that the current implementation cannot prevent a client from trying to dereference it.
        Typically, the getter, as it is implemented, cannot work on a CUDA backend.
        Reading single values from the host would probably be excessively expensive and acceptable only if such accesses are seldom.

      • Author Maintainer

        The device (GPU) pointer is used to read values through a synchronization method with the host (CPU).

        The goal of getRaw is to be able to get the DEVICE pointer; the (normal) getter should return a HOST pointer.
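
        A minimal sketch of that usage on a CUDA backend (the helper below, and the assumption that getRaw() yields a device address for element idx, are hypothetical):

            #include <cstddef>
            #include <cuda_runtime.h>

            // Read one float element of a CUDA-backed TensorImpl from the host.
            float readScalarFromDevice(TensorImpl& impl, std::size_t idx) {
                // On a CUDA backend, getRaw() would hand back a device address
                // that must never be dereferenced on the host.
                void* devicePtr = impl.getRaw(idx);
                float hostValue = 0.f;
                // Copy one scalar back to host memory; cudaMemcpy synchronizes with the device.
                cudaMemcpy(&hostValue, devicePtr, sizeof(float), cudaMemcpyDeviceToHost);
                return hostValue;
            }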

      • We could (should) indeed have separate interfaces for host and device memory locations. Yet there is no common type for the device side (it is not necessarily a value stored in a pointer).
        See, for instance, what is done with the standard thread API: https://en.cppreference.com/w/cpp/thread/thread/native_handle
        Yet it is simpler in that case because there is only one underlying library, so the native_handle type can be anything but is known at compile time.
        Our use case is more complicated (we can return a void * to a memory location handle and, in a translation unit that is aware of the device, cast it to a "handle" * that can be manipulated with device routines).
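
        A minimal sketch of that pattern (the names nativeStorageHandle and CudaStorageHandle are hypothetical):

            // Backend-agnostic header: callers only ever see an opaque handle.
            class TensorImpl {
            public:
                virtual void* nativeStorageHandle() = 0;
                virtual ~TensorImpl() = default;
            };

            // Device-aware translation unit: only here is the handle's real type known.
            struct CudaStorageHandle {
                void* devicePtr;   // plus stream, allocation size, ...
            };

            void useOnDevice(TensorImpl& impl) {
                auto* handle = static_cast<CudaStorageHandle*>(impl.nativeStorageHandle());
                // handle->devicePtr can now be passed to CUDA routines (kernels, cudaMemcpy, ...).
            }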

      • BTW, wouldn't GetNativeStorageHandle() be less error-prone?
        template<???> auto GetNativeStorageHandle()?
        Possibly not possible like that.

      • Author Maintainer

        This is why getRawPtr returns a void*.

@@ -27,6 +27,9 @@ public:
{
printf("Cannot set raw pointer for backend %s\n", mBackend);
};
virtual void* getRaw(std::size_t /*idx*/)=0;
virtual std::size_t scalarSize() const = 0; // Size of one scalar (in bytes)
constexpr const char *backend() const { return mBackend; }
virtual ~TensorImpl() = default;