About changing the Node / Operator class hierarchy
Currently, a Node contains a pointer to its Operator:
classDiagram
direction LR
Node o-- Operator
The same Operator can be pointed to by several Nodes, enabling multiple topologies without costly Operator copy.
Problem
User API
Even though this configuration is quite useful for building a GraphView with fast access to topology functions, experience shows that most of the code is dedicated to accessing and setting Operator attributes (like Tensor, Parameters or Backend). This requires an indirection to reach the stored Operator, and also a cast to the right Operator type each time a specific attribute is accessed (as the Operator stored in Node is of abstract type), leading to lines like:
std::size_t nbChannel = std::static_pointer_cast<Conv_Op<2>>(myNode->getOperator())->get<ConvParam::NbChannel>();
//or
Tensor myTensor = std::static_pointer_cast<FC_Op>(myNode->getOperator())->output(0);
which quickly become cumbersome.
Moreover, many functions in Node are simply shortcuts to the associated Operator functions, which wouldn't be necessary if users interacted directly with Operator. Examples are:
- type()
- nbInputs() / nbOutputs()
- inputCategory()
Accessing some node attributes from the Operator
An Operator may need some of its Node's attributes, such as mAttr.
Solution
A solution that requires few code changes while solving most of these problems could be the following: reverse the Node and Operator hierarchy.
- Node would hold a shared_ptr to its Operator
- Operator would have a vector of weak_ptr to the Nodes containing it, plus an index pointing to the current Node (for the multiple-topology feature)
- When a Node is deleted, it signals its Operator to erase it from its list
std::size_t nbChannel = myOperator->get<ConvParam::NbChannel>();
//or
Tensor myTensor = myOperator->output(0);
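The ownership scheme listed above can be sketched as follows. This is a minimal illustration under assumptions, not the actual implementation: `attach`, `eraseExpiredNodes`, `setCurrent`, `currentNode`, `nbNodes` and `makeNode` are hypothetical names, and both classes are stripped of everything except the links between them. Since a `weak_ptr` to a Node is already expired while that Node's destructor runs, the "signal" here simply asks the Operator to purge expired entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

class Node;  // forward declaration

// Hypothetical, simplified Operator: only the back-references are modelled.
class Operator {
public:
    // Register a Node that wraps this Operator (one per topology).
    void attach(const std::shared_ptr<Node>& node) { mNodes.push_back(node); }

    // Drop entries whose Node has been destroyed and keep the index valid.
    void eraseExpiredNodes() {
        mNodes.erase(std::remove_if(mNodes.begin(), mNodes.end(),
                         [](const std::weak_ptr<Node>& n) { return n.expired(); }),
                     mNodes.end());
        if (mCurrent >= mNodes.size()) { mCurrent = 0; }
    }

    // Select which Node (topology) is currently active.
    void setCurrent(std::size_t idx) { mCurrent = idx; }

    // Node of the active topology, or nullptr if none is left.
    std::shared_ptr<Node> currentNode() const {
        return mCurrent < mNodes.size() ? mNodes[mCurrent].lock() : nullptr;
    }

    std::size_t nbNodes() const { return mNodes.size(); }

private:
    std::vector<std::weak_ptr<Node>> mNodes;  // non-owning back-references
    std::size_t mCurrent = 0;                 // index of the active Node
};

class Node {
public:
    explicit Node(std::shared_ptr<Operator> op) : mOperator(std::move(op)) {}
    // On deletion, the Node signals its Operator to forget it.
    ~Node() { mOperator->eraseExpiredNodes(); }
    const std::shared_ptr<Operator>& getOperator() const { return mOperator; }

private:
    std::shared_ptr<Operator> mOperator;  // owning reference
};

// Hypothetical helper: creates a Node and registers it with its Operator.
std::shared_ptr<Node> makeNode(const std::shared_ptr<Operator>& op) {
    auto node = std::make_shared<Node>(op);
    op->attach(node);
    return node;
}
```

Resetting the index to 0 when it falls out of range is an arbitrary choice made for this sketch; a real implementation would need a policy for which Node becomes current after a deletion.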
Why shared and weak?
To avoid a circular reference between shared_ptr instances, which would leak memory.
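To make the circular-reference issue concrete, here is a small sketch comparing an owning back-reference with a non-owning one. All type and function names (`CycleNode`, `CycleOp`, `SafeNode`, `SafeOp`, `gLiveNodes`, and the two helper functions) are hypothetical stand-ins, not part of the code base. A global counter tracks how many node objects are alive once every external handle has gone out of scope.

```cpp
#include <cassert>
#include <memory>
#include <utility>

int gLiveNodes = 0;  // number of node objects currently alive

struct CycleOp;
struct CycleNode {
    CycleNode() { ++gLiveNodes; }
    ~CycleNode() { --gLiveNodes; }
    std::shared_ptr<CycleOp> op;
};
struct CycleOp {
    std::shared_ptr<CycleNode> node;  // owning back-reference: forms a cycle
};

struct SafeOp;
struct SafeNode {
    SafeNode() { ++gLiveNodes; }
    ~SafeNode() { --gLiveNodes; }
    std::shared_ptr<SafeOp> op;
};
struct SafeOp {
    std::weak_ptr<SafeNode> node;     // non-owning back-reference: no cycle
};

// Nodes still alive after all external handles are gone (shared/shared).
int liveNodesAfterCycle() {
    gLiveNodes = 0;
    CycleNode* raw = nullptr;
    {
        auto node = std::make_shared<CycleNode>();
        node->op = std::make_shared<CycleOp>();
        node->op->node = node;        // Node -> Operator -> Node
        raw = node.get();
    }
    const int leaked = gLiveNodes;    // the cycle keeps the node alive
    // Break the cycle by hand so that this demo itself does not leak.
    std::shared_ptr<CycleNode> last = std::move(raw->op->node);
    last.reset();
    return leaked;
}

// Nodes still alive after all external handles are gone (shared/weak).
int liveNodesAfterWeak() {
    gLiveNodes = 0;
    {
        auto node = std::make_shared<SafeNode>();
        node->op = std::make_shared<SafeOp>();
        node->op->node = node;        // back-reference does not own the node
    }
    return gLiveNodes;
}
```

With two `shared_ptr` links, neither reference count can reach zero on its own, so the objects are never destroyed unless the cycle is broken explicitly. With a `weak_ptr` back-reference, destroying the last external handle releases both objects.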
Why is the Node the one to hold the shared_ptr?
So that the Operator is not deleted as long as at least one Node points to it.
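The lifetime guarantee described here can be checked with a short sketch. `Op` and `Holder` are hypothetical stand-ins for Operator and Node, and the function name is made up for the illustration; a `weak_ptr` observer is used only to watch when the Operator is destroyed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Op {};                        // hypothetical stand-in for Operator
struct Holder {                      // hypothetical stand-in for Node
    explicit Holder(std::shared_ptr<Op> o) : op(std::move(o)) {}
    std::shared_ptr<Op> op;          // owning reference to the Operator
};

// Returns true if the Operator survives as long as one Node holds it
// and is destroyed once the last Node is gone.
bool operatorLivesAsLongAsANodeHoldsIt() {
    auto op = std::make_shared<Op>();
    std::weak_ptr<Op> observer = op;         // watches without owning
    auto n1 = std::make_shared<Holder>(op);
    auto n2 = std::make_shared<Holder>(op);
    op.reset();                              // the user drops their own handle
    const bool aliveWithTwo = !observer.expired();
    n1.reset();
    const bool aliveWithOne = !observer.expired();
    n2.reset();
    const bool goneWithNone = observer.expired();
    return aliveWithTwo && aliveWithOne && goneWithNone;
}
```

Note that if the user keeps their own `shared_ptr` to the Operator, which the proposed API suggests they would, the Operator also stays alive after its last Node is deleted.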
OK, the index points to the Node in the GraphView currently in use, but what if two Nodes share the same Operator? One index is not enough then.
Indeed, but this simply cannot happen, as it would imply that the two Nodes have the same type and exactly the same parent/child Nodes. They would be the same Node, so it makes no sense to have two of them. You just need one.
Main advantages
- Way more user-friendly access to operator attributes.
- Conv(), FC() and other generation functions no longer generate a Node but a real Operator, which is less confusing for the end user
- Access to the true operator class instead of the abstract class
Main disadvantages
- An indirection is created to access topological functions. How is a manual GraphView creation handled from here?
- Operators can access their topological information if needed (is it really a disadvantage?)