Use a Tensor as the input to the ConstantOfShape kernel to avoid redundant size computation.
Kernel performance is unchanged. (The x-axis is the number of elements in the output.)
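
For illustration only, here is a minimal sketch of the idea, not the actual kernel code. The `Tensor` struct and the `fill_from_shape` / `fill_from_tensor` functions are hypothetical names invented for this example. It assumes the tensor already knows its element count, so a fill routine that receives the tensor does not need to recompute the product of the dimensions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical minimal tensor: the element count is computed once, at construction.
struct Tensor {
    std::vector<int64_t> shape;
    std::vector<float> data;

    explicit Tensor(std::vector<int64_t> s) : shape(std::move(s)) {
        const int64_t n = std::accumulate(shape.begin(), shape.end(), int64_t{1},
                                          std::multiplies<int64_t>());
        assert(n >= 0);  // sketch only: no further validation of the dimensions
        data.resize(static_cast<std::size_t>(n));
    }

    std::size_t size() const { return data.size(); }
};

// "Before" (assumed): the routine receives raw dimensions and has to
// recompute the element count on every call.
void fill_from_shape(float* out, const std::vector<int64_t>& shape, float value) {
    const int64_t n = std::accumulate(shape.begin(), shape.end(), int64_t{1},
                                      std::multiplies<int64_t>());
    std::fill(out, out + n, value);
}

// "After" (assumed): the routine receives the tensor and reuses the size
// the tensor already holds.
void fill_from_tensor(Tensor& out, float value) {
    std::fill(out.data.begin(), out.data.end(), value);
}
```

Both functions write the same values. The only difference in this sketch is where the element count comes from, which is consistent with the kernel performance being unchanged.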