
[Add] benchmark scripts

Maxence Naud requested to merge feat_benchmark into dev

Context

The aim is to add a universal system to measure the accuracy and time performance of each Aidge module, whether it is a backend or an export module. It is part of #262.

Usage

To explain how this benchmarking system works, we will benchmark the current Conv2D operator implementations.

1. Create a test configuration file

The first step to benchmark an Operator is to list the parameter and input configurations to be tested. For that, we use a JSON configuration file.

Several test config files are provided in this MR, including one for Conv2D. A test configuration is made of 4 parts:

1. Description of the tested Operator

This description provides parameters for the ONNX file generation.

"operator": "Conv",     // operator type
"opset_version": 21,    // opset
"initializer_rank": 1,  // number of data inputs (does not include initializers)

2. Meta-data about the test

Exports are designed to infer on a stream of data, so their current implementation does not support multiple batches. For now, this part only tells whether inputs have multiple batches.

"test_meta_data": {
    "multiple_batchs": false
},

3. Base configuration

Each individual test should tweak a single parameter to assess its impact on performance and outputs (for example, the number of input channels). Every other parameter should be set to its base configuration value.

"base_configuration": {
    "input_shapes": [
        ["input_0", [1, 10, 200, 200]],
        ["weight_1", [10, 10, 3, 3]],
        ["bias_2", [10]]
    ],
    "attributes": {
        "kernel_shape": [3, 3],
        "strides": [1, 1],
        "dilations": [1, 1]
    }
},

4. Tested parameters

This part lists the values of each tested parameter, grouped by parameter name. Each value leads to an independent test. Below, the other_parameters section updates operator attributes and input shapes accordingly; a sketch of how these pieces combine follows the configuration.

"test_configuration": {
    "main_parameters": {
        "feature_map_size": [
            10, 100, 500
        ],
        "kernel_shape": [
            [1, 1],
            [3, 3],
            [5, 5]
        ],
        "strides": [
            [1, 1],
            [2, 2],
            [3, 3]
        ],
        "dilations": [
            [1, 1],
            [2, 2],
            [3, 3]
        ]
    },
    "other_parameters": {
        "feature_map_size": {
            "10": {
                "attributes": {},
                "input_shapes": [
                    ["input_0", [1, 10, 10, 10]]
                ]
            },
            "100": {
                "attributes": {},
                "input_shapes": [
                    ["input_0", [1, 10, 100, 100]]
                ]
            },
            "500": {
                "attributes": {},
                "input_shapes": [
                    ["input_0", [1, 10, 500, 500]]
                ]
            }
        },
        "kernel_shape": {
            "[1, 1]": {
                "attributes": {},
                "input_shapes": [
                    ["weight_1", [10, 10, 1, 1]]
                ]
            },
            "[3, 3]": {
                "attributes": {},
                "input_shapes": [
                    ["weight_1", [10, 10, 3, 3]]
                ]
            },
            "[5, 5]": {
                "attributes": {},
                "input_shapes": [
                    ["weight_1", [10, 10, 5, 5]]
                ]
            }
        },
        "strides": {
            "[1, 1]": {
                "attributes": {},
                "input_shapes": []
            },
            "[2, 2]": {
                "attributes": {},
                "input_shapes": []
            },
            "[3, 3]": {
                "attributes": {},
                "input_shapes": []
            }
        },
        "dilations": {
            "[1, 1]": {
                "attributes": {},
                "input_shapes": []
            },
            "[2, 2]": {
                "attributes": {},
                "input_shapes": []
            },
            "[3, 3]": {
                "attributes": {},
                "input_shapes": []
            }
        }
    }
}
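
To make this concrete, here is a hedged sketch of how each individual test can be derived from the file above (the actual merging logic in benchmark.py may differ; expand_tests is a hypothetical helper):

import json

def expand_tests(config_path):
    # Yield one (parameter, value, attributes, input_shapes) tuple per individual
    # test, starting from the base configuration and applying per-value overrides.
    with open(config_path) as f:
        cfg = json.load(f)
    base = cfg["base_configuration"]
    tests = cfg["test_configuration"]
    for param, values in tests["main_parameters"].items():
        for value in values:
            attributes = dict(base["attributes"])
            input_shapes = dict(base["input_shapes"])  # name -> shape
            # Tested parameters that are operator attributes override the base value.
            if param in attributes:
                attributes[param] = value
            # Apply the matching overrides from "other_parameters".
            override = tests["other_parameters"].get(param, {}).get(str(value), {})
            attributes.update(override.get("attributes", {}))
            for name, shape in override.get("input_shapes", []):
                input_shapes[name] = shape
            yield param, value, attributes, list(input_shapes.items())

For example, the feature_map_size -- 100 test keeps the base attributes but replaces the shape of input_0 with [1, 10, 100, 100].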

2. Assess modules

The main script to run the benchmark is benchmark/benchmark.py.

usage: benchmark.py [-h] --config-file CONFIG_FILE --module-to-bench MODULE_TO_BENCH
                    [--compare-with-onnxruntime] [--time] --results-directory RESULTS_DIRECTORY
                    [--results-filename RESULTS_FILENAME]

Operator Kernel Performance Benchmarking

options:
  -h, --help            show this help message and exit
  --config-file CONFIG_FILE, -cf CONFIG_FILE
                        Path to configuration JSON with operator type, attributes, and input sizes.
  --module-to-bench MODULE_TO_BENCH, -mtb MODULE_TO_BENCH
                        Name of the module containing the inference functions
  --compare-with-onnxruntime, -cwo
                        Compare output with ONNXRuntime
  --time, -t            Compute inference time
  --results-directory RESULTS_DIRECTORY
                        Directory to save the results
  --results-filename RESULTS_FILENAME
                        Name of the saved result file. If not provided, it defaults to
                        '<operator_name>_<module_to_bench>.json'. If a file with that name already
                        exists at that location, it is overwritten, with elements individually
                        replaced only if new ones are computed

Here, let's assess the aidge_backend_cpu, aidge_backend_cuda, aidge_export_cpp, onnxruntime, and torch libraries. I will only show the command for aidge_backend_cpu:

python aidge/aidge_core/benchmark/benchmark.py \
    --time \
    --compare-with-onnxruntime \
    --config-file ./aidge/aidge_core/benchmark/operator_config/conv2d_config.json \
    --results-directory ./aidge/aidge_core/benchmark/results/ \
    --module-to-bench aidge_backend_cpu
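
The other modules follow the same pattern; as a convenience, here is a small sketch that loops over all of them (assuming each module is installed; every call is equivalent to the command above):

import subprocess

# Run the same Conv2D benchmark once per module to compare.
for module in ["aidge_backend_cpu", "aidge_backend_cuda", "aidge_export_cpp",
               "onnxruntime", "torch"]:
    subprocess.run([
        "python", "aidge/aidge_core/benchmark/benchmark.py",
        "--time", "--compare-with-onnxruntime",
        "--config-file", "./aidge/aidge_core/benchmark/operator_config/conv2d_config.json",
        "--results-directory", "./aidge/aidge_core/benchmark/results/",
        "--module-to-bench", module,
    ], check=True)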

Running it for aidge_backend_cpu outputs the following:

'aidge_backend_cpu' module successfully imported
Starting tests...
▷ feature_map_size -- 10
 ├┬─Measuring kernel inference time...
 │└─[ time = 1.80e-05 ± 1.21e-06 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ feature_map_size -- 100
 ├┬─Measuring kernel inference time...
 │└─[ time = 7.82e-04 ± 9.67e-06 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ feature_map_size -- 500
 ├┬─Measuring kernel inference time...
 │└─[ time = 2.02e-02 ± 3.05e-03 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ kernel_shape -- [1, 1]
 ├┬─Measuring kernel inference time...
 │└─[ time = 5.70e-04 ± 1.45e-04 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ kernel_shape -- [3, 3]
 ├┬─Measuring kernel inference time...
 │└─[ time = 2.95e-03 ± 4.63e-06 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ kernel_shape -- [5, 5]
 ├┬─Measuring kernel inference time...
 │└─[ time = 4.74e-02 ± 3.76e-04 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ strides -- [1, 1]
 ├┬─Measuring kernel inference time...
 │└─[ time = 2.99e-03 ± 2.19e-04 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ strides -- [2, 2]
 ├┬─Measuring kernel inference time...
 │└─[ time = 1.96e-03 ± 1.91e-05 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ strides -- [3, 3]
 ├┬─Measuring kernel inference time...
 │└─[ time = 8.89e-04 ± 3.84e-06 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ dilations -- [1, 1]
 ├┬─Measuring kernel inference time...
 │└─[ time = 2.97e-03 ± 5.48e-05 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ dilations -- [2, 2]
 ├┬─Measuring kernel inference time...
 │└─[ time = 2.00e-02 ± 6.51e-05 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

▷ dilations -- [3, 3]
 ├┬─Measuring kernel inference time...
 │└─[ time = 1.96e-02 ± 9.84e-05 seconds ]
 └┬─Assessing results are the same as 'onnxruntime' module...
  └─[ o ]

Printing results to JSON './aidge/aidge_core/benchmark/results/conv2d_aidge_backend_cpu.json'

As requested, you get the average time over 50 iterations and the result of the comparison with ONNXRuntime. If the results are not equal, you get [ x ] instead.
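
Conceptually, the reported figures are a mean ± standard deviation over repeated timed runs, plus an element-wise comparison against the reference outputs. A minimal sketch (hedged: the measurement code in benchmark.py may differ, and the warm-up count here is an assumption):

import statistics
import time

import numpy as np

def time_inference(run, n_warmup=5, n_iterations=50):
    # Return (mean, std) of the wall-clock time of run() in seconds.
    for _ in range(n_warmup):  # warm-up runs, excluded from the measurement
        run()
    samples = []
    for _ in range(n_iterations):
        start = time.perf_counter()
        run()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

def same_results(outputs, reference_outputs, rtol=1e-5, atol=1e-8):
    # True ("[ o ]") if every output matches the reference within tolerance.
    return all(np.allclose(out, ref, rtol=rtol, atol=atol)
               for out, ref in zip(outputs, reference_outputs))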

3. Generate a graph to view results

Using the generated results JSON files, you can generate a bar plot comparing the implementations' relative performance for each parameter with the generate_graph.py script.

You must have results for the reference library and for each other library. Here are the command line and the generated image:

python benchmarks/generate_graph.py \
    --operator-config benchmarks/operator_config/conv2d_config.json \
    --ref  benchmarks/results/conv2d_onnxruntime.json \
    --libs benchmarks/results/conv2d_torch.json \
           benchmarks/results/conv2d_aidge_backend_cpu.json \
           benchmarks/results/conv2d_aidge_backend_cuda.json \
           benchmarks/results/conv2d_aidge_export_cpp.json

[Generated image: Conv_inference_time_comparison.svg]
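
As a rough idea of what such a plot involves, here is a hedged matplotlib sketch that plots each library's time normalized to the reference for one parameter (the actual schema of the results files and of generate_graph.py may differ; the flat "parameter:value" to mean-time mapping below is an assumption):

import json

import matplotlib.pyplot as plt
import numpy as np

def plot_relative_times(ref_path, lib_paths, parameter):
    # Assumed schema: each results JSON maps "<parameter>:<value>" test names
    # to mean inference times in seconds.
    with open(ref_path) as f:
        ref = json.load(f)
    labels = [k for k in ref if k.startswith(parameter)]
    x = np.arange(len(labels))
    width = 0.8 / (len(lib_paths) + 1)
    plt.bar(x, [1.0] * len(labels), width, label="reference")
    for i, path in enumerate(lib_paths, start=1):
        with open(path) as f:
            lib = json.load(f)
        plt.bar(x + i * width, [lib[k] / ref[k] for k in labels], width,
                label=path.rsplit("/", 1)[-1])
    plt.xticks(x + len(lib_paths) * width / 2, labels, rotation=45, ha="right")
    plt.ylabel("time relative to reference")
    plt.legend()
    plt.savefig("Conv_inference_time_comparison.svg")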

Major modifications

  • add: initial operator configuration files
  • add: benchmark python script benchmark.py
  • add: inference and output scripts for torch and onnxruntime libraries
  • upd: new main cpp scripts for export
  • add: generate_graph.py to generate a bar plot of the results

Next improvements

  • choose the number of warm-up runs and iterations via command-line parameters of benchmark.py
  • add compatibility with complete models
  • allow running benchmark.py with several libraries at once, to avoid repeatedly creating and loading an ONNX model, which can be heavy with big input Tensors
