Fix loss derivative
Context
The derivative of the ReduceMean operator is not correct when computing the loss gradient for MSE and BCE loss.
Detailed major modifications
Replace `target->dims()[0]` with `target->size()` in the computation of the loss gradient.
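For intuition, here is a minimal NumPy sketch (illustrative only, not Aidge's actual C++ kernel) of why the normalization factor matters: with a mean-reduced MSE, the gradient must be divided by the total number of elements (the `target->size()` analogue), whereas dividing by the batch dimension alone (the `target->dims()[0]` analogue) scales it up by the number of elements per sample.

```python
# Minimal NumPy sketch of the normalization issue for a mean-reduced MSE loss.
import numpy as np

rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 10))     # 4 samples in the batch, 10 channels each
target = rng.normal(size=(4, 10))

# loss = sum((pred - target)^2) / N, with N the total number of elements,
# so d(loss)/d(pred) = 2 * (pred - target) / N.
n_total = pred.size        # 40 -> what target->size() gives after the fix
n_batch = pred.shape[0]    # 4  -> what target->dims()[0] gave before the fix

grad_correct = 2.0 * (pred - target) / n_total
grad_buggy = 2.0 * (pred - target) / n_batch   # 10x too large in this example

print(np.allclose(grad_buggy, grad_correct * (n_total / n_batch)))  # True
```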
Activity
added 6 commits
- 40dfa0ea...9c7303e5 - 5 commits from branch eclipse/aidge:dev
- 608963e9 - Fix loss derivative
@olivierbichler : This is also an important fix for v0.6.0
No, I don't think we should normalize by the batch size only. PyTorch normalizes by both the batch size and the number of channels. This makes sense for a network with multiple outputs whose channel counts differ widely (say, two outputs with 10 and 100 channels): normalizing by the number of channels gives each output roughly the same impact on the overall loss, which is a good thing.
Currently, Aidge's implementation of the MSE/BCE loss matches PyTorch's with the "reduction" parameter set to "mean", which is perfect. The only problem is the loss derivative: the loss value itself is correct, but its gradient is not. After the proposed changes, I verified that the gradient tensors in Aidge and PyTorch are identical.
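For reference, a small PyTorch-only sketch of that kind of check (the Aidge side of the comparison is not reproduced here): with `reduction="mean"`, `mse_loss` divides by the total element count, i.e. batch size times number of channels, and its autograd gradient matches the analytic `2 * (pred - target) / N` formula, which is the normalization the fix adopts.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pred = torch.randn(4, 10, requires_grad=True)   # 4 samples, 10 channels
target = torch.randn(4, 10)

loss = F.mse_loss(pred, target, reduction="mean")
loss.backward()

# Mean reduction divides by the total element count, not by the batch size.
manual_loss = ((pred.detach() - target) ** 2).sum() / pred.numel()
print(torch.allclose(loss.detach(), manual_loss))              # True

# The autograd gradient equals the analytic 2 * (pred - target) / N.
analytic_grad = 2.0 * (pred.detach() - target) / pred.numel()
print(torch.allclose(pred.grad, analytic_grad))                # True
```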
If this is not clear, please ask @cmoineau, who can clarify it for you.
changed milestone to %aidge v0.6.0
added Fix 🔥🔥 label
enabled an automatic merge when all merge checks for 608963e9 pass
requested review from @olivierbichler
assigned to @oantoni
mentioned in commit d2e420ea