[upd] prototype to fluent interface
Context
The current prototype-style interface requires manual cloning and sequential mutation:
Tensor tmp = A.clone();
tmp.toDtype(Float32);
tmp.toBackend("cuda");
tmp.abs();
tmp.clip();
This is verbose, error-prone, and hard to chain.
Proposed Solution
Adopt a fluent interface:
Tensor B = A.toDtype(Float32).toBackend("cuda").abs().clip();
For in-place versions, use a trailing underscore (_): (see #294)
A.toDtype_().abs_().clip_(0, 1); // Avoids allocating temporaries
This:
- Enables expressive, composable transformations
- Reduces boilerplate and memory overhead
- Follows familiar patterns from libraries like PyTorch (e.g. add_())