Feature: Fast Input-Based Non-Linearity Pruning (FIBNLR)
This MR is a proposal for aidge#305, adding a new recipe to aidge_core.
It depends on !526 (merged).
Overview
This recipe is a 3-step process:
- Preparation: add hooks and attributes.
- Computation: compute metrics by running inferences until convergence.
- Pruning: replace prunable nodes with identity layers.
Due to the nature of the method, there is no one-size-fits-all strategy. That is why you can tweak its behaviour through a parameters struct. You will likely need to experiment quite a bit to obtain results relevant to your model.
Some metrics
- Accuracy before
- Accuracy after recipes
- Accuracy after recipes & training
Parameters
| Parameter | Values | Description |
|---|---|---|
| method | Method:: | How strictly the methods below are applied. |
| | Strict | There are no exceptions to how each method below is applied. |
| | Loose | Groups of size one are not normalized and keep their value. This is to mimic Baptiste's algorithm. |
| normalisation | NormalisationMethod:: | How data should be normalized. |
| | None | No normalization is applied, so ablation uses the raw NPR. |
| | MinMax | Min-max normalisation, useful if you want to rank layers within a group: an NNPR value of 0 is the least useful and 1 the most useful. |
| | Sum | Sum normalisation, useful if you want to see how much a layer participated in its group: 0 means 0% and 1 means 100%. |
| | Average | Values are divided by the average of their group. Required to mimic the paper's algorithm. |
| group | GroupMethod:: | How non-linearity layers are grouped. |
| | None | No grouping is done, meaning all layers end up in the same default group. |
| | ImageSize | Layers are grouped by their image size (height * width). This is relevant if you want to group layers by their "stage". |
| | Manual | You must specify a group for each relevant node, through the group_assignments parameter. |
| ablation_criteria | AblationCriteria:: | How layers are selected for removal. |
| | None | No selection is performed. |
| | XLeastPerGroup | The X first layers of each group are eligible for pruning if they are under the threshold. |
| | Threshold | All layers with an NNPR value below the threshold are eligible. The exact value is highly dependent on your model and the other parameters. Without normalisation, this threshold will most likely produce nonsensical ablations. |
| ablation_method | AblationMethod:: | How eligible layers are removed. |
| | None | No eligible layers are removed. |
| | Identity | Eligible layers are replaced by an Identity layer. |
| | Remove | Eligible layers are removed. |
| ablation_threshold | 0.1 | The threshold value used in the ablation process. Any value below it may be considered for ablation, depending on the chosen AblationMethod. |
| ablation_x_least | 1 | The minimum number of elements that must be kept during ablation, preventing all elements from being ablated. |
| max_iteration | 100 | The maximum number of iterations the algorithm will run. Setting it to 0 disables the limit, allowing the process to run until convergence (or indefinitely). |
| convergence_threshold | 1e-06 | When the change between iterations is smaller than this value, the algorithm is considered to have converged and stops. |
| group_assignments | {} | A map used to store explicit group assignments for specific nodes (used with GroupMethod::Manual). |
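To illustrate, here is a configuration sketch in Python. The field and enum names below are assumed to mirror the parameter table; adjust them to the actual bindings:

```python
import aidge_core

# Hypothetical configuration sketch: field and enum names are assumed to
# mirror the parameter table above.
params = aidge_core.fibnlr_Parameters()
params.normalisation = aidge_core.NormalisationMethod.MinMax    # rank layers within each group
params.group = aidge_core.GroupMethod.ImageSize                 # one group per feature-map size
params.ablation_criteria = aidge_core.AblationCriteria.Threshold
params.ablation_method = aidge_core.AblationMethod.Identity     # replace pruned layers with Identity
params.ablation_threshold = 0.1
params.max_iteration = 100
params.convergence_threshold = 1e-6
```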
Some Results
First of all, as stated in the paper, convergence is very quick. On average, for ResNet18 on CIFAR100, it takes 8 to 9 iterations (with a batch size of 64).
Now, to illustrate how the different parameters affect the pruning, here are a few plots showcasing which layers are pruned for a given configuration. Here, grouping is done by image size and the ablation criterion is the threshold.
For each layer, you can read below its label its group (G:1 means the layer belongs to group 1) and the number of neurons from the preceding conv layer (N:256).
The main thing to notice is that layers are pruned quite differently depending on the configuration, and that you need to be very careful with how you set ablation_threshold: the right value depends not only on the configuration, but also on your model and dataset!
Source Code Example (Python)
Here is an example of how to use this recipe, with:
- ResNet18 v1-7 (from an ONNX model)
- CIFAR-10 (from PyTorch)
```python
import aidge_core
import aidge_backend_cpu

#-------------------------------------------------------------------------------
# INITIALISATION
#-------------------------------------------------------------------------------
# aidge_cifar10() and load_model() are helpers from the linked test project.
aidge_database = aidge_cifar10()
cifar10_dataprovider = aidge_core.DataProvider(aidge_database,
    backend="cpu", batch_size=64, shuffle=True, drop_last=True)

model = load_model("data/resnet18/resnet18-v1-7.onnx")

# Retrieve the first input tensor to forward dims.
(tensor, lbl) = next(iter(cifar10_dataprovider))
model.set_mandatory_inputs_first()
model.compile("cpu", datatype=aidge_core.dtype.float32, dims=[tensor.dims])

#-------------------------------------------------------------------------------
# Fast Input-Based Non-Linearity Pruning
#-------------------------------------------------------------------------------
# The recipe can be tweaked by altering the default parameters.
params = aidge_core.fibnlr_Parameters()
pruned_nodes = aidge_core.fibnlr(model, cifar10_dataprovider, params)
# ^
# |
# This function is equivalent to applying the following steps:
#
# Step 1: Model preparation
# Add the necessary attributes and hooks to relevant nodes, to compute NPR and NNPR.
# Group relevant nodes according to `params.group`.
prepared_nodes = aidge_core.fibnlr_prepare(model, params)

# Step 2: Compute NPR and NNPR
# Run inferences until:
# - NPR values have converged (fairly quick, < 100 iterations),
# - or the maximum number of iterations has been reached,
# - or the provider has no data left.
# Then compute NNPR according to `params.normalisation`.
aidge_core.fibnlr_compute(model, cifar10_dataprovider, params)

# Step 3: Prune non-linearity layers
# Eligible layers are replaced with identity layers.
# The pruning strategy can be tweaked through the ablation parameters.
pruned_nodes = aidge_core.fibnlr_prune(model, params)
```
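As a quick sanity check, you can then inspect what was pruned. A minimal sketch, assuming only the standard aidge_core Node accessors:

```python
# `pruned_nodes` is the list returned by the recipe above.
print(f"{len(pruned_nodes)} node(s) pruned:")
for node in pruned_nodes:
    print(f" - {node.name()} ({node.type()})")
```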
Full source code can be found here: https://gitlab.eclipse.org/tblauwe/fibnlr-test
Changes
New files
- New header file:
  - aidge_core\include\recipes\fibnlr.hpp
- New source files:
  - aidge_core\src\recipes\fibnlr.cpp
  - aidge_core\unit_test\recipes\test_fibnlr.cpp
  - aidge_core\python_bindings\recipes\pybind_fibnlr.cpp
Attribute Helpers
Quite a few helpers have been added.
If you find them useful, maybe they could be added directly to Aidge:: ?
For now, I added them to Aidge::fibnlr.
For example, to simplify the usage of attributes, the following API has been added:
```cpp
// Add attribute @c T to @c node
template <typename T>
void add(const NodePtr node);

// Set attribute @c T with @c value to @c node
template <typename T>
void set(const NodePtr node, T&& value);

// Set attribute @c T with @c value to @c node
template <typename T>
void set(const NodePtr node, const T& value);

// Returns true if @c node has attribute @c T
template <typename T>
bool has(const NodePtr node);

/**
 * Returns a @c T& if @c node has it, otherwise asserts.
 * Use @c tryGet instead if unsure @c node has it.
 */
template <typename T>
T& get(const NodePtr node);

/**
 * Returns a @c T* if @c node has it, otherwise a nullptr.
 * Use @c get instead if sure @c node has it.
 */
template <typename T>
T* tryGet(const NodePtr node);
```
An upside of these templated versions is that we do not need to define and pass a name. If, for some reason, we needed to add the same struct multiple times, we could introduce the concept of a "relation". We would keep the same API, accepting another template type, for example:

```cpp
// get<Input, T>(node)
template <typename R, typename T>
T& get(const NodePtr node);
```

We could also support "runtime" types/relations:

```cpp
// get<T>(relation, node)
template <typename T>
T& get(const Id relation, const NodePtr node);
```

And finally, if these were added to Node, we could do:

```cpp
// node.get<T>(relation)
template <typename T>
T& get(const Id relation);
```
There is also a helper to quickly define their Python bindings:
```cpp
template<typename T>
void defineAttributesPythonBindings(pybind11::module& m, const char* name) {
    m.def(fmt::format("fibnlr_add_{}", name).c_str(), &Aidge::fibnlr::add<T>, py::arg("node"));
    m.def(fmt::format("fibnlr_set_{}", name).c_str(), static_cast<void(*)(const Aidge::NodePtr, const T&)>(&Aidge::fibnlr::set<T>), py::arg("node"), py::arg("value"));
    m.def(fmt::format("fibnlr_has_{}", name).c_str(), &Aidge::fibnlr::has<T>, py::arg("node"));
    m.def(fmt::format("fibnlr_get_{}", name).c_str(), &Aidge::fibnlr::get<T>, "Returns a non-owning reference to the attribute of the given node, or asserts if the node doesn't have it.", py::arg("node"), py::return_value_policy::reference);
    m.def(fmt::format("fibnlr_try_get_{}", name).c_str(), &Aidge::fibnlr::tryGet<T>, py::arg("node"), py::return_value_policy::reference);
}

// e.g.
// defineAttributesPythonBindings<Aidge::fibnlr::NPR>(m, "npr");
// defineAttributesPythonBindings<Aidge::fibnlr::NNPR>(m, "nnpr");
// defineAttributesPythonBindings<Aidge::fibnlr::ActivityTracker>(m, "activity_tracker");
// defineAttributesPythonBindings<Aidge::fibnlr::Group>(m, "group");
```
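From Python, the generated bindings can then be used as follows. A sketch, assuming the npr/nnpr bindings above and a model on which the recipe has already been prepared and computed:

```python
import aidge_core

# Hypothetical usage: binding names follow the examples above.
for node in model.get_nodes():
    nnpr = aidge_core.fibnlr_try_get_nnpr(node)  # None if the node has no NNPR attribute
    if nnpr is not None:
        print(f"{node.name()}: NNPR = {nnpr}")
```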
Conv Helpers
Again, these functions were defined to reduce boilerplate. They are mostly relevant for merging convolutions, but I still use some of them even though convolutions are not merged in this recipe.
```cpp
/**
 * @brief Returns `true` if @c node's operator is of type Conv_Op<DIM>.
 */
bool isConv(const NodePtr node);

/**
 * @brief Returns `true` if @c node's operator is of type OperatorType::Tensor.
 */
bool isTensor(const NodePtr node);

/**
 * @brief Returns `true` if @c child is a child of @c parent.
 */
bool isChildOf(const NodePtr child, const NodePtr parent);

/**
 * @brief Returns kernel dimensions of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
std::array<DimSize_t, DIM> getKernelDims(const NodePtr node);

/**
 * @brief Returns kernel strides of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
std::array<DimSize_t, DIM> getKernelStrides(const NodePtr node);

/**
 * @brief Returns inChannels of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
DimSize_t inChannels(const NodePtr node);

/**
 * @brief Returns outChannels of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
DimSize_t outChannels(const NodePtr node);

/**
 * @brief Compute new kernel dimensions from two other kernels.
 *
 * The formula for a single dimension is: k = (k1 - 1) * s2 + k2
 */
template <DimIdx_t DIM>
std::array<DimSize_t, DIM> computeKernelDims(
    const std::array<DimSize_t, DIM>& aDims,
    const std::array<DimSize_t, DIM>& bDims,
    const std::array<DimSize_t, DIM>& bStrides
);

/**
 * @brief Compute new kernel strides from two other kernels.
 *
 * The formula for a single dimension is: s = s1 * s2
 */
template <DimIdx_t DIM>
std::array<DimSize_t, DIM> computeKernelStrides(
    const std::array<DimSize_t, DIM>& aStrides,
    const std::array<DimSize_t, DIM>& bStrides
);

/**
 * @brief Create a new convolution node from two convolution nodes, without affecting graph views.
 *
 * Input and Output nodes are not set, meaning weights and bias are not set!
 *
 * The merged convolution node will have the following properties:
 * - inChannels equal to @c a's inChannels.
 * - outChannels equal to @c b's outChannels.
 * - A merged kernel, with its dimensions and strides computed by computeKernelDims and computeKernelStrides.
 *
 * @return A NodePtr if the merge could happen, `nullptr` otherwise.
 */
template<DimIdx_t DIM>
NodePtr convFrom(const NodePtr a, const NodePtr b);
```
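To make the merge formulas concrete, here is a minimal Python sketch of the same arithmetic (standalone helpers for illustration, not part of the MR):

```python
def merged_kernel_dims(a_dims, b_dims, b_strides):
    # k = (k1 - 1) * s2 + k2, applied per dimension
    return [(k1 - 1) * s2 + k2 for k1, k2, s2 in zip(a_dims, b_dims, b_strides)]

def merged_kernel_strides(a_strides, b_strides):
    # s = s1 * s2, applied per dimension
    return [s1 * s2 for s1, s2 in zip(a_strides, b_strides)]

# Two stacked 3x3 convolutions, the second one with stride 2:
print(merged_kernel_dims([3, 3], [3, 3], [2, 2]))   # [7, 7] -> merged receptive field
print(merged_kernel_strides([1, 1], [2, 2]))        # [2, 2] -> strides multiply
```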
Python bindings
All bindings are prefixed with fibnlr_.
Maybe I could use a submodule instead? (linked issue #197)
Tests
- 11 test cases have been added, with a total of 298 assertions. Coverage should be good enough, even if some functions require a backend; when pertinent, I added dummy implementations. However, the most relevant test is the associated Python project (https://gitlab.eclipse.org/tblauwe/fibnlr-test).