Proposal to clarify Data/Param/Attributes

Problem description

The current management of Data, OptionalData, Param and OptionalParam is a little lackluster.

For example, Attributes and OptionalData overlap, which is currently handled in a hacky way by forwardDims().

The initial definition of inputs was:

The main update was the introduction of optional inputs:

OptionalData for Attributes that could also be inputs of Operators, like the axis attribute of Slice
OptionalParam for parameters that are not mandatory, like the bias of the convolution

This update was not designed with the whole framework in mind, and it regularly creates tricky issues that require hack after hack, e.g.:

Getting graph inputs is tricky, as we need to exclude Param and, depending on interpretation, OptionalData...
Redundancy between Attributes and Inputs

In this issue we (@pineapple and I) propose to update the system as follows:

Everything is a Data.
Learnable parameters are defined using an attribute "learnable" (stored at Node level) that can be true or false (potential link with skip backward !391 (merged) @jeromeh)
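
To make the first two points concrete, here is a minimal sketch in Python. The `Node` class and the `learnable_nodes` helper are hypothetical names used only for illustration, not the framework's actual API. It assumes every input is plain data and that the only thing distinguishing a parameter is a boolean flag stored on the node.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node used for illustration, not the framework's class."""
    name: str
    # Proposed node-level flag: True for learnable parameters, False otherwise.
    learnable: bool = False
    # Every input is plain data; there is no separate Param category.
    inputs: List[Optional["Node"]] = field(default_factory=list)


def learnable_nodes(nodes: List[Node]) -> List[str]:
    """Return the names of the nodes flagged as learnable."""
    return [n.name for n in nodes if n.learnable]


image = Node("image")
weight = Node("conv_w", learnable=True)
bias = Node("conv_b", learnable=True)
conv = Node("conv", inputs=[image, weight, bias])

print(learnable_nodes([image, weight, bias, conv]))  # ['conv_w', 'conv_b']
```

With such a flag, code that needs the trainable parameters (an optimizer, for instance) would filter on `learnable` instead of relying on a dedicated input category.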

We noticed that optional data comes in three kinds of "optionality":

A default value exists, or an Attribute is used. In this case, we propose not to use an Attribute. For a default value, we propose to create a Producer holding the default value at construction time.
The Data is truly optional (example: Bias). In this case, we propose to block the input with a Producer holding an empty, undefined Tensor. This avoids always retrieving this input as "free".
The Data is "XOR optional" (example: Resize uses either Scale or Size). In this case, we also propose to block the unused input with an empty Producer.
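
The three cases above could be sketched as follows. This is an illustration under stated assumptions, not the framework's implementation: `Tensor`, `Producer`, `Operator` and the `make_*` factories are simplified, hypothetical stand-ins, and the input names are chosen for readability only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Tensor:
    """Hypothetical tensor: values is None when the tensor is empty/undefined."""
    values: Optional[list] = None

    def is_defined(self) -> bool:
        return self.values is not None


@dataclass
class Producer:
    """Hypothetical producer: a node that only holds a Tensor."""
    tensor: Tensor
    learnable: bool = False


class Operator:
    """Hypothetical operator with named inputs, all unconnected at first."""

    def __init__(self, name: str, input_names: List[str]):
        self.name = name
        self.inputs: Dict[str, Optional[Producer]] = {n: None for n in input_names}

    def connect(self, input_name: str, producer: Producer) -> None:
        if input_name not in self.inputs:
            raise KeyError(f"{self.name} has no input named {input_name!r}")
        self.inputs[input_name] = producer


def make_slice(axis: int = 0) -> Operator:
    # Case 1: the default value becomes a Producer at construction time,
    # instead of an Attribute.
    op = Operator("Slice", ["data", "axis"])
    op.connect("axis", Producer(Tensor([axis])))
    return op


def make_conv(weight: list, bias: Optional[list] = None) -> Operator:
    # Case 2: a truly optional input is blocked with an empty Producer
    # when no value is given, so it is never reported as free.
    op = Operator("Conv", ["data", "weight", "bias"])
    op.connect("weight", Producer(Tensor(weight), learnable=True))
    op.connect("bias", Producer(Tensor(bias), learnable=bias is not None))
    return op


def make_resize(scale: Optional[list] = None, size: Optional[list] = None) -> Operator:
    # Case 3: "XOR optional" inputs, exactly one of the two must be given;
    # the unused one is blocked with an empty Producer.
    if (scale is None) == (size is None):
        raise ValueError("Resize needs exactly one of scale or size")
    op = Operator("Resize", ["data", "scale", "size"])
    op.connect("scale", Producer(Tensor(scale)))
    op.connect("size", Producer(Tensor(size)))
    return op


conv = make_conv(weight=[1.0, 2.0])
print(conv.inputs["bias"].tensor.is_defined())  # False
```

In all three factories only `data` is left unconnected, which is what lets the free-input query stay simple.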

Advantages:

With this, FreeDataInputs are clearer (and can be renamed freeInputs), since they are simply inputs that are not linked to a Tensor
The handling of Attributes and Data is clearer
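
As a rough illustration of the first advantage, the free-input query reduces to a single filter once every optional or defaulted input is blocked by a Producer. The graph representation below (a dictionary mapping each operator's input names to the name of the node feeding them, or `None`) and the `free_inputs` function are assumptions made for this sketch only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical graph encoding: operator name -> {input name -> source node name or None}.
Graph = Dict[str, Dict[str, Optional[str]]]


def free_inputs(graph: Graph) -> List[Tuple[str, str]]:
    """Return (operator, input) pairs that are not linked to anything."""
    return [
        (op_name, input_name)
        for op_name, inputs in graph.items()
        for input_name, source in inputs.items()
        if source is None
    ]


graph: Graph = {
    # The absent bias is blocked by an empty Producer, so it is not free.
    "conv": {"data": None, "weight": "conv_w", "bias": "empty_bias"},
    # Resize uses scale; the unused size input is blocked as well.
    "resize": {"data": "conv", "scale": "resize_scale", "size": "empty_size"},
}

print(free_inputs(graph))  # [('conv', 'data')]
```

No special case is needed for Param or OptionalData: the only input reported is the one the user still has to connect.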
