Feature: Fast Input-Based Non-Linearity Pruning (FIBNLR)
This MR is a proposal for aidge#305, adding a new recipe to aidge_core.
It depends on !526 (merged).
Overview
This recipe is a 3-step process:
- Preparation: add hooks and attributes.
- Computation: compute metrics by running inferences until convergence.
- Pruning: replace prunable nodes with identity layers.
Due to the nature of the method, there is no one-size-fits-all strategy. That is why you can tweak its behaviour through a parameters struct. You will likely need to experiment quite a bit to obtain results relevant to your model.
Some metrics
- Accuracy before
- Accuracy after recipes
- Accuracy after recipes & training
Parameters
| Parameter | Values | Description |
|---|---|---|
| method | Method:: | How strictly the methods below are applied. |
| | Strict | There are no exceptions to how each method below is applied. |
| | Loose | Groups of size one are not normalized and keep their value. This is to mimic Baptiste's algorithm. |
| normalisation | NormalisationMethod:: | How data should be normalized. |
| | None | No normalization is applied, so ablation uses the raw NPR. |
| | MinMax | Min-max normalisation, useful if you want to rank layers within a group: an NNPR value of 0 is the least useful and 1 the most useful. |
| | Sum | Sum normalisation, useful if you want to see how much a layer participated in its group: 0 means 0% and 1 means 100%. |
| | Average | Values are divided by the average of their group. Required to mimic the paper's algorithm. |
| group | GroupMethod:: | How non-linearity layers are grouped. |
| | None | No grouping is done, meaning all layers end up in the same default group. |
| | ImageSize | Layers are grouped by their image size (height * width). This is relevant if you want to group layers by their "stage". |
| | Manual | You must specify a group for each relevant node, through the group_assignments parameter. |
| ablation_criteria | AblationCriteria:: | How layers are selected for removal. |
| | None | No selection is performed. |
| | XLeastPerGroup | The X first layers of each group are eligible for pruning if they are under the threshold. |
| | Threshold | All layers with an NNPR value below the threshold are eligible. The exact value is highly dependent on your model and the other parameters. Without normalisation, this threshold will most likely produce nonsensical ablations. |
| ablation_method | AblationMethod:: | How eligible layers are removed. |
| | None | No eligible layers are removed. |
| | Identity | Eligible layers are replaced by an Identity layer. |
| | Remove | Eligible layers are removed. |
| ablation_threshold | 0.1 | The threshold value used in the ablation process. Any value below it may be considered for ablation, depending on the chosen AblationMethod. |
| ablation_x_least | 1 | The minimum number of elements that must be kept during ablation, preventing all elements from being ablated. |
| max_iteration | 100 | The maximum number of iterations the algorithm will run. Setting it to 0 disables the limit, allowing the process to run until convergence (or indefinitely). |
| convergence_threshold | 1e-06 | When the change between iterations is smaller than this value, the algorithm is considered to have converged and stops. |
| group_assignments | {} | A map used to store explicit group assignments for specific nodes (used with GroupMethod::Manual). |
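To illustrate, here is a configuration sketch in Python. The field and enum names below are assumed to mirror the parameter table; adjust them to the actual bindings:

```python
import aidge_core

# Hypothetical configuration sketch: field and enum names are assumed to
# mirror the parameter table above.
params = aidge_core.fibnlr_Parameters()
params.normalisation = aidge_core.NormalisationMethod.MinMax    # rank layers within each group
params.group = aidge_core.GroupMethod.ImageSize                 # one group per feature-map size
params.ablation_criteria = aidge_core.AblationCriteria.Threshold
params.ablation_method = aidge_core.AblationMethod.Identity     # replace pruned layers with Identity
params.ablation_threshold = 0.1
params.max_iteration = 100
params.convergence_threshold = 1e-6
```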
Some Results
First of all, as stated in the paper, convergence is very quick. On average, for ResNet18 on CIFAR100, it takes 8 to 9 iterations (with a batch size of 64).
Now, to illustrate how the different parameters affect the pruning, here are a few plots showcasing which layers are pruned for a given configuration. Here, grouping is done by image size and the ablation criterion is the threshold.
For each layer, you can read below its label its group (G:1 means the layer belongs to group 1) and the number of neurons from the preceding conv layer (N:256).
The main thing to notice is that layers are pruned quite differently depending on the configuration, and that you need to be very careful with how you set ablation_threshold: the right value depends not only on the configuration, but also on your model and dataset!
Source Code Example (Python)
Here is an example of how to use this recipe, with:
- ResNet18 v1-7 (from an ONNX model)
- CIFAR-10 (from PyTorch)
```python
import aidge_core
import aidge_backend_cpu

#-------------------------------------------------------------------------------
# INITIALISATION
#-------------------------------------------------------------------------------
# aidge_cifar10() and load_model() are helpers from the linked test project.
aidge_database = aidge_cifar10()
cifar10_dataprovider = aidge_core.DataProvider(aidge_database,
    backend="cpu", batch_size=64, shuffle=True, drop_last=True)

model = load_model("data/resnet18/resnet18-v1-7.onnx")

# Retrieve the first input tensor to forward dims.
(tensor, lbl) = next(iter(cifar10_dataprovider))
model.set_mandatory_inputs_first()
model.compile("cpu", datatype=aidge_core.dtype.float32, dims=[tensor.dims])

#-------------------------------------------------------------------------------
# Fast Input-Based Non-Linearity Pruning
#-------------------------------------------------------------------------------
# The recipe can be tweaked by altering the default parameters.
params = aidge_core.fibnlr_Parameters()
pruned_nodes = aidge_core.fibnlr(model, cifar10_dataprovider, params)
# ^
# |
# This function is equivalent to applying the following steps:
#
# Step 1: Model preparation
# Add the necessary attributes and hooks to relevant nodes, to compute NPR and NNPR.
# Group relevant nodes according to `params.group`.
prepared_nodes = aidge_core.fibnlr_prepare(model, params)

# Step 2: Compute NPR and NNPR
# Run inferences until:
# - NPR values have converged (fairly quick, < 100 iterations),
# - or the maximum number of iterations has been reached,
# - or the provider has no data left.
# Then compute NNPR according to `params.normalisation`.
aidge_core.fibnlr_compute(model, cifar10_dataprovider, params)

# Step 3: Prune non-linearity layers
# Eligible layers are replaced with identity layers.
# The pruning strategy can be tweaked through the ablation parameters.
pruned_nodes = aidge_core.fibnlr_prune(model, params)
```
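As a quick sanity check, you can then inspect what was pruned. A minimal sketch, assuming only the standard aidge_core Node accessors:

```python
# `pruned_nodes` is the list returned by the recipe above.
print(f"{len(pruned_nodes)} node(s) pruned:")
for node in pruned_nodes:
    print(f" - {node.name()} ({node.type()})")
```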
Full source code can be found here: https://gitlab.eclipse.org/tblauwe/fibnlr-test
Changes
New files
- New header file:
  - aidge_core\include\recipes\fibnlr.hpp
- New source files:
  - aidge_core\src\recipes\fibnlr.cpp
  - aidge_core\unit_test\recipes\test_fibnlr.cpp
  - aidge_core\python_bindings\recipes\pybind_fibnlr.cpp
Attribute Helpers
Quite a few helpers have been added.
If you find them useful, maybe they could be added directly to Aidge:: ?
For now, I added them to Aidge::fibnlr.
For example, to simplify the usage of attributes, the following API has been added:
```cpp
// Add attribute @c T to @c node
template <typename T>
void add(const NodePtr node);

// Set attribute @c T with @c value to @c node
template <typename T>
void set(const NodePtr node, T&& value);

// Set attribute @c T with @c value to @c node
template <typename T>
void set(const NodePtr node, const T& value);

// Returns true if @c node has attribute @c T
template <typename T>
bool has(const NodePtr node);

/**
 * Returns a @c T& if @c node has it, otherwise asserts.
 * Use @c tryGet instead if unsure @c node has it.
 */
template <typename T>
T& get(const NodePtr node);

/**
 * Returns a @c T* if @c node has it, otherwise a nullptr.
 * Use @c get instead if sure @c node has it.
 */
template <typename T>
T* tryGet(const NodePtr node);
```
An upside of these templated versions is that we do not need to define and pass a name. If, for some reason, we needed to add the same struct multiple times, we could introduce the concept of a "relation". We would keep the same API, accepting another template type, for example:

```cpp
// get<Input, T>(node)
template <typename R, typename T>
T& get(const NodePtr node);
```

We could also support "runtime" types/relations:

```cpp
// get<T>(relation, node)
template <typename T>
T& get(const Id relation, const NodePtr node);
```

And finally, if these were added to Node, we could do:

```cpp
// node.get<T>(relation)
template <typename T>
T& get(const Id relation);
```
There is also a helper to quickly define their Python bindings:
```cpp
template<typename T>
void defineAttributesPythonBindings(pybind11::module& m, const char* name) {
    m.def(fmt::format("fibnlr_add_{}", name).c_str(), &Aidge::fibnlr::add<T>, py::arg("node"));
    m.def(fmt::format("fibnlr_set_{}", name).c_str(), static_cast<void(*)(const Aidge::NodePtr, const T&)>(&Aidge::fibnlr::set<T>), py::arg("node"), py::arg("value"));
    m.def(fmt::format("fibnlr_has_{}", name).c_str(), &Aidge::fibnlr::has<T>, py::arg("node"));
    m.def(fmt::format("fibnlr_get_{}", name).c_str(), &Aidge::fibnlr::get<T>, "Returns a non-owning reference to the attribute of the given node, or asserts if the node doesn't have it.", py::arg("node"), py::return_value_policy::reference);
    m.def(fmt::format("fibnlr_try_get_{}", name).c_str(), &Aidge::fibnlr::tryGet<T>, py::arg("node"), py::return_value_policy::reference);
}

// e.g.
// defineAttributesPythonBindings<Aidge::fibnlr::NPR>(m, "npr");
// defineAttributesPythonBindings<Aidge::fibnlr::NNPR>(m, "nnpr");
// defineAttributesPythonBindings<Aidge::fibnlr::ActivityTracker>(m, "activity_tracker");
// defineAttributesPythonBindings<Aidge::fibnlr::Group>(m, "group");
```
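From Python, the generated bindings can then be used as follows. A sketch, assuming the npr/nnpr bindings above and a model on which the recipe has already been prepared and computed:

```python
import aidge_core

# Hypothetical usage: binding names follow the examples above.
for node in model.get_nodes():
    nnpr = aidge_core.fibnlr_try_get_nnpr(node)  # None if the node has no NNPR attribute
    if nnpr is not None:
        print(f"{node.name()}: NNPR = {nnpr}")
```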
Conv Helpers
Again, these functions were defined to reduce boilerplate. They are mostly relevant for merging convolutions, but I still use some of them even though convolutions are not merged in this recipe.
```cpp
/**
 * @brief Returns `true` if @c node's operator is of type Conv_Op<DIM>.
 */
bool isConv(const NodePtr node);

/**
 * @brief Returns `true` if @c node's operator is of type OperatorType::Tensor.
 */
bool isTensor(const NodePtr node);

/**
 * @brief Returns `true` if @c child is a child of @c parent.
 */
bool isChildOf(const NodePtr child, const NodePtr parent);

/**
 * @brief Returns kernel dimensions of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
std::array<DimSize_t, DIM> getKernelDims(const NodePtr node);

/**
 * @brief Returns kernel strides of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
std::array<DimSize_t, DIM> getKernelStrides(const NodePtr node);

/**
 * @brief Returns inChannels of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
DimSize_t inChannels(const NodePtr node);

/**
 * @brief Returns outChannels of given node @c node, if it is a convolution.
 */
template<DimIdx_t DIM>
DimSize_t outChannels(const NodePtr node);

/**
 * @brief Compute new kernel dimensions from two other kernels.
 *
 * The formula for a single dimension is: k = (k1 - 1) * s2 + k2
 */
template <DimIdx_t DIM>
std::array<DimSize_t, DIM> computeKernelDims(
    const std::array<DimSize_t, DIM>& aDims,
    const std::array<DimSize_t, DIM>& bDims,
    const std::array<DimSize_t, DIM>& bStrides
);

/**
 * @brief Compute new kernel strides from two other kernels.
 *
 * The formula for a single dimension is: s = s1 * s2
 */
template <DimIdx_t DIM>
std::array<DimSize_t, DIM> computeKernelStrides(
    const std::array<DimSize_t, DIM>& aStrides,
    const std::array<DimSize_t, DIM>& bStrides
);

/**
 * @brief Create a new convolution node from two convolution nodes, without affecting graph views.
 *
 * Input and Output nodes are not set, meaning weights and bias are not set!
 *
 * The merged convolution node will have the following properties:
 * - inChannels equal to @c a's inChannels.
 * - outChannels equal to @c b's outChannels.
 * - A merged kernel, with its dimensions and strides computed by computeKernelDims and computeKernelStrides.
 *
 * @return A NodePtr if the merge could happen, `nullptr` otherwise.
 */
template<DimIdx_t DIM>
NodePtr convFrom(const NodePtr a, const NodePtr b);
```
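To make the merge formulas concrete, here is a minimal Python sketch of the same arithmetic (standalone helpers for illustration, not part of the MR):

```python
def merged_kernel_dims(a_dims, b_dims, b_strides):
    # k = (k1 - 1) * s2 + k2, applied per dimension
    return [(k1 - 1) * s2 + k2 for k1, k2, s2 in zip(a_dims, b_dims, b_strides)]

def merged_kernel_strides(a_strides, b_strides):
    # s = s1 * s2, applied per dimension
    return [s1 * s2 for s1, s2 in zip(a_strides, b_strides)]

# Two stacked 3x3 convolutions, the second one with stride 2:
print(merged_kernel_dims([3, 3], [3, 3], [2, 2]))   # [7, 7] -> merged receptive field
print(merged_kernel_strides([1, 1], [2, 2]))        # [2, 2] -> strides multiply
```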
Python bindings
All bindings are prefixed with fibnlr_.
Maybe I could use a submodule instead? (linked issue #197)
Tests
- 11 test cases have been added, with a total of 298 assertions. Coverage should be good enough, even if some functions require a backend; when pertinent, I added dummy implementations. However, the most relevant test is the associated Python project (https://gitlab.eclipse.org/tblauwe/fibnlr-test).