CIF benchmark scripts overview has bad relative averages and unbalanced weighted averages
The CIF benchmarks have scripts to generate an overview. There are two problems:
bad relative averages
- For absolute metric values, the average of a row is computed as the average of the absolute metric values of that row.
- To change absolute values into relative values, we divide all values by the best value of the column.
- The average relative value is thus computed as the relativized value of the absolute average. This is confusing, as the absolute values are no longer shown. One would expect the average to still be the average of the values in the row.
- I propose to always compute the average as the row average, also for relative values.
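To make the difference concrete, here is a minimal sketch (hypothetical table layout, assuming rows are models and columns are configurations, with lower values being better) showing that relativizing the row average gives different numbers than averaging the relative values of the row:

```python
def relativize(table):
    """Divide each value by the best (lowest) value of its column."""
    n_cols = len(table[0])
    col_best = [min(row[c] for row in table) for c in range(n_cols)]
    return [[v / col_best[c] for c, v in enumerate(row)] for row in table]

def row_averages(table):
    """Average of the values in each row."""
    return [sum(row) / len(row) for row in table]

table = [[2.0, 10.0], [4.0, 5.0]]
rel = relativize(table)          # [[1.0, 2.0], [2.0, 1.0]]

# Proposed: the average of the relative values shown in the row.
print(row_averages(rel))         # [1.5, 1.5]

# Current (confusing): relativize the absolute row averages, even though
# the absolute values themselves are not shown in the relative overview.
abs_avgs = row_averages(table)   # [6.0, 4.5]
print([a / min(abs_avgs) for a in abs_avgs])  # roughly [1.33, 1.0]
```

Note that the two schemes even disagree on which model looks better, which is why always using the row average is less surprising.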
unbalanced weighted averages
- The weighted averages are based on weights, which are currently based on the best absolute value of the row.
- Some models take a lot longer than others, and thus get a huge weight that completely overshadows all other models (e.g., wafer_scanner_n1).
- I propose to use different weights: the model with the lowest best row value gets weight 1, the next one weight 2, etc., until the model with the highest best row value gets the number of models as its weight. The more complex models then still get higher weights, but the differences are not as extreme anymore.
- This also fixes another issue I ran into in the past (only once though): with really bad configurations, multiplying all values by their weights and summing them up can overflow the long value range. With much lower weights, that shouldn't be an issue anymore.
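The proposed rank-based weights can be sketched as follows (a hypothetical helper, assuming one best absolute value per model row):

```python
def rank_weights(best_row_values):
    """Rank-based weights: the model with the lowest best row value gets
    weight 1, the next weight 2, ..., the highest gets len(values)."""
    order = sorted(range(len(best_row_values)), key=lambda i: best_row_values[i])
    weights = [0] * len(best_row_values)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return weights

# Current scheme uses the raw best values as weights, so one long-running
# model (such as wafer_scanner_n1) overshadows everything else.
best = [3, 40, 1_000_000]
print(best)                 # current weights: [3, 40, 1000000]
print(rank_weights(best))   # proposed weights: [1, 2, 3]
```

With ranks, the largest possible weight is the number of models, so the weighted sums stay small and the ordering of model complexity is still respected.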