Add Scaling Factors by inserting Scaling Nodes every time in the PTQ

added Enhancement, Structure 🏗, Language C++, Refactoring 🎨 labels

changed title from Add Scaling Factors by inserting Scaling Nodes every time in the PTQ to [Temporary] Add Scaling Factors by inserting Scaling Nodes every time in the PTQ

changed title from [Temporary] Add Scaling Factors by inserting Scaling Nodes every time in the PTQ to Add Scaling Factors by inserting Scaling Nodes every time in the PTQ

We are now resuming work on this issue, which should help improve the structure of the PTQ code. Additionally, thanks to constant folding, this should not introduce any additional problems.

Initially, we experimented with a method to set all scaling factors (SF) to 1 in order to bypass this issue. However, this approach has been deemed unsuitable as it compromises code cleanliness and maintainability.

Problem Description

After several attempts to integrate the proposed changes into the current PTQ pipeline, I realized that this constitutes a major change impacting the entire pipeline in a cascading manner. The initial approach to refactor the pipeline turned out to require more time and effort than anticipated.

Proposed Solution

To address this issue, I propose a more straightforward solution:

Encoding the Scaling Factor:
- Encode the scaling factor directly into each Producer of weights and biases within the graph (using Dynamic Attributes).
Providing Restoration Recipes:
- Supply recipes to automatically restore the graph by applying the scaling factor to tensors.
- This approach replicates the effect of constant folding, which would otherwise require inserting additional nodes.
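A minimal sketch of the idea, not the actual implementation: the `Producer` class, the `attrs` dictionary (standing in for Dynamic Attributes), and the helper names `encode_scaling` and `restore` are all hypothetical and only illustrate how a factor could be recorded on a producer and later folded into its tensor.

```python
from dataclasses import dataclass, field


@dataclass
class Producer:
    """Toy stand-in for a weight/bias Producer node (hypothetical class)."""
    name: str
    values: list
    # Plays the role of dynamic attributes attached to the node.
    attrs: dict = field(default_factory=dict)


def encode_scaling(producer, factor):
    """Record a scaling factor on the producer; repeated calls accumulate."""
    current = producer.attrs.get("scaling_factor", 1.0)
    producer.attrs["scaling_factor"] = current * factor


def restore(producer):
    """Recipe: apply the recorded factor to the tensor, then clear it."""
    factor = producer.attrs.pop("scaling_factor", 1.0)
    producer.values = [v * factor for v in producer.values]


weights = Producer("conv1_weights", [0.5, -1.0, 2.0])
encode_scaling(weights, 0.5)   # e.g. one rescaling step
encode_scaling(weights, 4.0)   # a later step; accumulated factor is 2.0
restore(weights)               # tensor is rescaled, no extra node was inserted
```

After `restore`, the tensor holds the scaled values and the attribute is gone, which is the same end state constant folding would produce from an explicit scaling node.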

Benefits of the Proposed Solution

Graph Simplification:
- This method significantly simplifies complex graphs with a large number of Producers.
Efficiency Gains:
- Avoids the need for extensive refactoring across the PTQ pipeline.
- Reduces the risk of cascading issues caused by pipeline changes.
Maintainability:
- Simplifies debugging and maintenance by keeping the graph clean and consistent.

Can you explain the issues you faced when trying to refactor the PTQ?

During the refactoring, I initially tried to simply insert scaling nodes behind the producers of weights and biases. However, I realized this caused an execution error because the new scaling nodes were being processed by the PTQ pipeline. For instance, the NodeVector used to iterate through the graph would include these new nodes, causing them to go through every step of the PTQ pipeline.
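The effect described above can be reproduced on a toy graph. This is only an illustration under assumed names: `Node`, the `inserted_by_ptq` flag, and both helper functions are hypothetical, and filtering on a flag is one possible guard, not necessarily what the pipeline should do.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy graph node (hypothetical, not the real node class)."""
    name: str
    kind: str                      # e.g. "Producer", "Conv", "Scaling"
    inserted_by_ptq: bool = False  # marks helper nodes added by the pipeline


def insert_scaling_after_producers(nodes):
    """Return a new node list with a scaling node after each producer."""
    result = []
    for node in nodes:
        result.append(node)
        if node.kind == "Producer":
            result.append(Node(node.name + "_scaling", "Scaling", True))
    return result


def pipeline_nodes(nodes):
    """Nodes a later step should visit: skip the inserted helper nodes."""
    return [n for n in nodes if not n.inserted_by_ptq]


graph = [Node("w1", "Producer"), Node("conv1", "Conv"), Node("b1", "Producer")]
graph = insert_scaling_after_producers(graph)

# A step that iterates over every node now also sees the scaling nodes.
naive = [n.name for n in graph]
# A step that filters on the flag sees the same nodes as before the insertion.
filtered = [n.name for n in pipeline_nodes(graph)]
```

Any step that walks `graph` directly processes five nodes instead of three, which is the kind of unintended processing that led to the execution error.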

Even after resolving the execution errors, accuracy degraded to roughly chance level (e.g., 8.5% on MNIST). Investigating step by step, I found that, for instance, the computed histograms differed slightly between the pipeline with scaling nodes and the original pipeline.

My conclusion is that inserting scaling nodes for the producers is not as straightforward as I initially thought. While it is certainly feasible, it would require significant rework on the PTQ pipeline and a redesign of certain functions. This would likely take more time than initially anticipated.

Oh I see the kind of error you got.

What I propose is to simply make a shallow-copy clone of the graph. Insert the scaling nodes in one graph and use it to update the scalings, and keep the other one for parsing the graph and computing statistics.
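A rough sketch of what this could look like, using toy classes only: `Tensor`, `Node`, `Graph`, `shallow_clone`, and `max_abs` are hypothetical names for illustration, and the statistic shown (maximum absolute value) is just a placeholder for whatever the pipeline computes.

```python
class Tensor:
    def __init__(self, values):
        self.values = list(values)


class Node:
    def __init__(self, name, tensor=None):
        self.name = name
        self.tensor = tensor


class Graph:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def shallow_clone(self):
        # New node list, same Node objects, so tensors are shared.
        return Graph(self.nodes)


def max_abs(graph):
    """Placeholder statistic: largest absolute value over all tensors."""
    return max(
        abs(v)
        for n in graph.nodes
        if n.tensor is not None
        for v in n.tensor.values
    )


stats_graph = Graph([Node("w1", Tensor([1.0, -3.0])), Node("conv1")])
work_graph = stats_graph.shallow_clone()

# Scaling nodes go into the working copy only.
work_graph.nodes.insert(1, Node("w1_scaling"))

stats_names = [n.name for n in stats_graph.nodes]
work_names = [n.name for n in work_graph.nodes]
statistic = max_abs(stats_graph)
```

Because the copy is shallow, both graphs refer to the same tensors, so an update made through one view is visible through the other, while the node list used for statistics stays free of scaling nodes.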

Though I have a question: why do you compute the histograms after inserting the scaling nodes? Shouldn't the scaling be added afterwards?

Cheers,

Cyril

That's a good idea, I will try that!

@bhalimi @cmoineau

closed
