Accumulation of the backward gradients

Merged: Benjamin Halimi requested to merge accumulate into dev
2 unresolved threads

Context

The CUDA implementation of the backward propagation is currently heavily used for the development of QAT (quantization-aware training).

While it works fine for streamlined single-branch architectures, the current policy of writing the gradient directly into the appropriate tensors does not work well for multi-branch architectures.

Indeed, when a branch forks, the gradient coming from the left branch must be added to the gradient coming from the right one. A solution that handles all kinds of topologies is to accumulate the gradient into the tensors instead of overwriting them.
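
To make the fork case concrete, below is a minimal host-side sketch (hypothetical names, not the actual Aidge code) of why each branch's backward must add its contribution into the shared gradient buffer instead of overwriting it:

```cpp
// Minimal sketch (hypothetical names): a tensor x feeds two branches, so by
// the chain rule dL/dx is the SUM of the two branches' contributions.
#include <cstddef>
#include <vector>

// Each branch's backward adds its contribution into the shared gradient buffer.
void addBranchGrad(std::vector<float>& gradX, const std::vector<float>& contribution) {
    for (std::size_t i = 0; i < gradX.size(); ++i) {
        gradX[i] += contribution[i]; // accumulate, do not overwrite
    }
}

int main() {
    std::vector<float> gradX(4, 0.0f);                           // gradient of the forked tensor
    const std::vector<float> fromLeft  {1.0f, 2.0f, 3.0f, 4.0f}; // dL/dx through the left branch
    const std::vector<float> fromRight {0.5f, 0.5f, 0.5f, 0.5f}; // dL/dx through the right branch
    addBranchGrad(gradX, fromLeft);
    addBranchGrad(gradX, fromRight); // gradX now holds the sum of both contributions
    return 0;
}
```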

This MR does exactly that: it switches every operator's backward implementation (when one exists) to accumulation mode.

Modifications

Every operator that provides a backward implementation has it updated accordingly.

We use an alpha/beta blending scheme similar to the one cuDNN already offers, that is:

curr_grad = alpha * computed_grad + beta * curr_grad

And we set both alpha and beta to 1.
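
For illustration, here is a minimal CUDA sketch (not the actual aidge_backend_cuda code) of this blending applied element-wise to a gradient buffer; for the operators that go through cuDNN, the alpha/beta arguments of the corresponding cuDNN backward calls play the same role:

```cpp
// Minimal CUDA sketch of the blending rule applied to a gradient buffer:
//   curr_grad = alpha * computed_grad + beta * curr_grad
// With alpha = beta = 1 this is a plain accumulation of the new gradient.
#include <cuda_runtime.h>

__global__ void blendGradKernel(float* currGrad, const float* computedGrad,
                                float alpha, float beta, int size) {
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < size) {
        currGrad[i] = alpha * computedGrad[i] + beta * currGrad[i];
    }
}

// Hypothetical host-side helper: launched with alpha = beta = 1 so that the
// freshly computed gradient is added to whatever the buffer already holds.
void accumulateGrad(float* dCurrGrad, const float* dComputedGrad, int size,
                    cudaStream_t stream = 0) {
    const float alpha = 1.0f;
    const float beta  = 1.0f;
    const int threads = 256;
    const int blocks  = (size + threads - 1) / threads;
    blendGradKernel<<<blocks, threads, 0, stream>>>(dCurrGrad, dComputedGrad,
                                                    alpha, beta, size);
}
```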

Edited by Benjamin Halimi

Merge request reports

Merge request pipeline #61376 passed

Merge request pipeline passed for e50feb47

Test coverage 74.60% (0.05%) from 2 jobs

Merged by Benjamin Halimi 3 months ago (Dec 11, 2024 9:30am UTC)


Activity

105 105 tensorDesc = std::dynamic_pointer_cast<TensorImpl_cuda_>(input1.getImpl())->getCudnnTensorDesc(input1);
106 106 }
107 107
108 if (op.trainingMode())
    • The MR looks good. However, it looks like you reintroduced the Scaling operator, which is now supposed to be a MetaOperator? Is that intended?

    • Yes, you are right, it looks like the good old Scaling node is now deprecated in aidge_core!

      I'm going to remove those files :)

  • Cyril Moineau requested review from @cmoineau

  • Benjamin Halimi added 25 commits

    • 65cd7698...75102058 - 13 commits from branch dev
    • 75102058...a55884fc - 2 earlier commits
    • e5dd3f9d - add sqrt cuda backend
    • 73659134 - accumulate the gradients instead of replacing them (done for several nodes but not all)
    • 1c40c8bd - add the alpha and beta for the Padding operator
    • cb88eb71 - move every node with a backward in gradient accumulation mode
    • 6d78b49e - minor changes
    • 374e351a - add the backend of the Scaling node
    • 24ab9fce - add sqrt cuda backend
    • 73989490 - add sqrt cuda backend
    • 3465120c - move every node with a backward in gradient accumulation mode
    • 63b2d000 - Merge remote-tracking branch 'refs/remotes/origin/accumulate' into accumulate

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • added 1 commit

    • f684d1cc - remove the backend of the old Scaling node

    Compare with previous version

  • added 1 commit

    • 0bfa13a7 - minor fix (remove the scaling op include)

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Benjamin Halimi changed the description

  • Benjamin Halimi added 26 commits

    • 0517b3bd...d9c094f6 - 2 commits from branch dev
    • d9c094f6...c22df87b - 14 earlier commits
    • d1355343 - move every node with a backward in gradient accumulation mode
    • bd2dbe48 - base setup for scaling branch
    • 283ddbf8 - add the backend of the Scaling node
    • c819699a - add sqrt cuda backend
    • 69d4560f - add sqrt cuda backend
    • fff9b4cd - move every node with a backward in gradient accumulation mode
    • de94d8ac - remove the backend of the old Scaling node
    • 830add3b - minor fix (remove the scaling op include)
    • 08f3083e - remove duplicate include
    • 94e29872 - Merge branch 'accumulate' of gitlab.eclipse.org:eclipse/aidge/aidge_backend_cuda into accumulate

    Compare with previous version

  • added 1 commit

    Compare with previous version

  • Benjamin Halimi enabled an automatic merge when all merge checks for a0191317 pass

  • Benjamin Halimi canceled the automatic merge

  • added 1 commit

    • e50feb47 - add alpha/beta for the sqrt operator

    Compare with previous version

  • Olivier BICHLER approved this merge request

  • mentioned in commit 57faa45a

  • mentioned in merge request aidge_learning!34 (merged)
