The goal of this issue is to modify the Post-Training Quantization (PTQ) code so that the quantization of weights and biases is handled through a Meta-Operator Scaling node, as is already done for activations. This will make it possible to retrieve the scaling factors after PTQ, which is needed to support ONNX import. In parallel, we are experimenting with a method that sets the scaling factors to 1, thereby bypassing this issue.
For more information, see aidge_onnx#42 (comment 2851563).
For now, this issue is on hold, since we are trying to bypass all ONNX LinearConv nodes by setting the scaling factors (SF) to 1.
Noam Zerah changed the description
Noam Zerah changed title from "Add Scaling Factors by inserting Scaling Nodes every time in the PTQ" to "[Temporary] Add Scaling Factors by inserting Scaling Nodes every time in the PTQ"
Noam Zerah changed the description
Noam Zerah changed title from "[Temporary] Add Scaling Factors by inserting Scaling Nodes every time in the PTQ" to "Add Scaling Factors by inserting Scaling Nodes every time in the PTQ"
We are now resuming work on this issue, which should help improve the structure of the PTQ code. Moreover, thanks to constant folding, the extra Scaling nodes should not introduce any new problems.
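To illustrate why constant folding should absorb the extra nodes, here is a minimal Python sketch, assuming a Scaling node attached to a weight Producer simply multiplies that constant tensor by a factor. The `Producer` and `Scaling` classes and `fold_scaling` are hypothetical stand-ins, not the Aidge API, and the real Scaling meta-operator may also round or clip its output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Producer:
    """Hypothetical constant Producer (e.g. a weight tensor), not the Aidge API."""
    name: str
    values: List[float]


@dataclass
class Scaling:
    """Hypothetical Scaling node fed by a constant Producer."""
    name: str
    factor: float
    parent: Producer


def fold_scaling(node: Scaling) -> Producer:
    """Constant-fold Producer -> Scaling into a single Producer with scaled values."""
    folded = [v * node.factor for v in node.parent.values]
    return Producer(name=node.parent.name, values=folded)


weights = Producer("conv1_w", [0.5, -1.0, 2.0])
scaled = Scaling("conv1_w_scaling", factor=4.0, parent=weights)
folded = fold_scaling(scaled)
print(folded.values)  # [2.0, -4.0, 8.0]
```

After folding, the graph again contains a single Producer per weight tensor, which is why inserting the nodes should not change the final structure.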
Initially, we experimented with a method to set all scaling factors (SF) to 1 in order to bypass this issue. However, this approach has been deemed unsuitable as it compromises code cleanliness and maintainability.
After several attempts to integrate the proposed changes into the current PTQ pipeline, I realized that this is a major change whose effects cascade through the entire pipeline. Refactoring the pipeline as initially planned turned out to require more time and effort than anticipated.
Proposed Solution
To address this issue, I propose a more straightforward solution:
Encoding the Scaling Factor:
Encode the scaling factor directly into each Producer of weights and biases within the graph (using Dynamic Attributes).
Providing Restoration Recipes:
Supply recipes to automatically restore the graph by applying the scaling factor to tensors.
This approach reproduces the effect of constant folding without requiring any additional nodes to be inserted first.
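A minimal Python sketch of the encode/restore idea, assuming the scaling factor is stored as a plain float attribute on each Producer and that restoring the graph means multiplying the tensor by that factor. `Producer`, `attrs`, `SF_KEY`, `encode_scaling_factor` and `restore_scaling_factors` are hypothetical names, not Aidge's Dynamic Attributes API; the multiplicative accumulation of repeated factors is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SF_KEY = "scaling_factor"  # hypothetical attribute name


@dataclass
class Producer:
    """Hypothetical stand-in for a weight/bias Producer (not the Aidge API)."""
    name: str
    values: List[float]
    # Plays the role of the Producer's Dynamic Attributes.
    attrs: Dict[str, float] = field(default_factory=dict)


def encode_scaling_factor(producer: Producer, factor: float) -> None:
    """Record a scaling factor on the Producer instead of inserting a Scaling node.

    Assumption: successive factors compose multiplicatively.
    """
    producer.attrs[SF_KEY] = producer.attrs.get(SF_KEY, 1.0) * factor


def restore_scaling_factors(producers: List[Producer]) -> None:
    """Recipe: apply each stored factor to its tensor, then reset it to 1."""
    for p in producers:
        factor = p.attrs.get(SF_KEY, 1.0)
        if factor != 1.0:
            p.values = [v * factor for v in p.values]
        p.attrs[SF_KEY] = 1.0


# Usage: two PTQ steps rescale the weights, one rescales the bias.
w = Producer("fc1_w", [1.0, -2.0])
b = Producer("fc1_b", [0.25])
encode_scaling_factor(w, 2.0)
encode_scaling_factor(w, 3.0)
encode_scaling_factor(b, 0.5)
restore_scaling_factors([w, b])
print(w.values, b.values)  # [6.0, -12.0] [0.125]
```

Resetting the attribute to 1 after restoring makes the recipe idempotent: running it twice does not rescale the tensors a second time.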
Benefits of the Proposed Solution
Graph Simplification:
This method significantly simplifies complex graphs with a large number of Producers.
Efficiency Gains:
Avoids the need for extensive refactoring across the PTQ pipeline.
Reduces the risk of cascading issues caused by pipeline changes.
Maintainability:
Simplifies debugging and maintenance by keeping the graph clean and consistent.
During the refactoring, I first tried to simply insert Scaling nodes after the weight and bias Producers. However, this caused an execution error, because the new Scaling nodes were themselves being processed by the PTQ pipeline. For instance, the NodeVector used to iterate through the graph would include these new nodes, so they went through every step of the PTQ pipeline.
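The failure mode described above can be reproduced with a toy node list (plain strings here, not the Aidge NodeVector). Iterating the live list while inserting Scaling nodes makes the loop visit the new nodes too; iterating over a snapshot taken before the step is one generic way to keep them out. The `_w`/`_b` naming convention and both functions are hypothetical.

```python
from typing import List


def is_param_producer(name: str) -> bool:
    """Toy convention: weight/bias Producers end in '_w' or '_b'."""
    return name.endswith(("_w", "_b"))


def ptq_step_live(graph: List[str]) -> List[str]:
    """Problematic pattern: iterate the live node list while inserting nodes."""
    visited = []
    i = 0
    while i < len(graph):
        node = graph[i]
        visited.append(node)  # stands in for "apply this PTQ step to node"
        if is_param_producer(node):
            graph.insert(i + 1, node + "_scaling")
        i += 1  # the inserted node is visited on the next iteration
    return visited


def ptq_step_snapshot(graph: List[str]) -> List[str]:
    """Iterate a snapshot taken before the step, so new nodes are not processed."""
    visited = []
    for node in list(graph):  # frozen copy of the nodes that existed beforehand
        visited.append(node)
        if is_param_producer(node):
            graph.insert(graph.index(node) + 1, node + "_scaling")
    return visited


nodes = ["conv1_w", "conv1", "fc1_w", "fc1_b", "fc1"]
print(ptq_step_live(list(nodes)))
print(ptq_step_snapshot(list(nodes)))
```

Both versions leave the same Scaling nodes in the graph; only the snapshot version keeps them out of the current step.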
Even after resolving the execution errors, performance degraded significantly (e.g., a near-random accuracy of 8.5% on MNIST). Investigating step by step, I found, for instance, that the computed histograms differed slightly between the pipeline with scaling nodes and the original pipeline.
My conclusion is that inserting scaling nodes for the producers is not as straightforward as I initially thought. While it is certainly feasible, it would require significant rework on the PTQ pipeline and a redesign of certain functions. This would likely take more time than initially anticipated.
What I propose is to simply make a shallow-copy clone of the graph: insert the Scaling nodes into one copy and use it to update the scaling factors, while the other copy is kept for parsing the graph and computing statistics.
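A minimal sketch of the shallow-clone idea, using hypothetical `Producer`/`ScalingNode` classes and a toy rule (factor = 1 / max |value|) in place of the real PTQ computation. Statistics are computed on the unmodified graph, the Scaling nodes live only in the clone, and because the clone is shallow, both graphs share the same Producer objects.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Producer:
    """Hypothetical weight Producer (not the Aidge API)."""
    name: str
    values: List[float]


@dataclass
class ScalingNode:
    """Hypothetical Scaling node attached to a Producer."""
    name: str
    factor: float


Graph = List[Union[Producer, ScalingNode]]


def compute_ranges(graph: Graph) -> Dict[str, float]:
    """Statistics pass; expects the unmodified graph (Producers only)."""
    return {n.name: max(abs(v) for v in n.values) for n in graph}


def insert_scaling_nodes(graph: Graph) -> None:
    """Insert a ScalingNode right after every Producer, in place."""
    i = 0
    while i < len(graph):
        if isinstance(graph[i], Producer):
            graph.insert(i + 1, ScalingNode(graph[i].name + "_scaling", 1.0))
            i += 1  # skip the node just inserted
        i += 1


def update_scalings(graph: Graph, ranges: Dict[str, float]) -> None:
    """Toy rule: scale each Producer so that its max |value| becomes 1."""
    for n in graph:
        if isinstance(n, ScalingNode):
            n.factor = 1.0 / ranges[n.name[: -len("_scaling")]]


original: Graph = [Producer("conv1_w", [0.5, -2.0]), Producer("fc1_w", [1.5, 0.25])]
clone = copy.copy(original)  # new list, same Producer objects
insert_scaling_nodes(clone)
ranges = compute_ranges(original)  # statistics on the graph without Scaling nodes
update_scalings(clone, ranges)
print(len(original), len(clone))  # 2 4
print(clone[0] is original[0])  # True: Producers are shared
```

Because the Producers are shared, any in-place change to a weight tensor is visible through both graphs; whether that is desired depends on when the PTQ pipeline modifies the weights.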
That said, I have a question: why do you compute histograms after inserting the scaling nodes? Shouldn't the scaling be added afterwards?