Skip to content

CIF data-based synthesis: variable ordering may be slow

I applied data-based synthesis to the following CIF model anonymous.cif, and noticed that it takes ~8 minutes before synthesis even start with reachability computations. I looked a bit deeper, and found that the bottleneck is in VarOrderHelper#createGraph, in particular in the three nested for-loops at the top of that method to populate Map<Pair<Integer, Integer>, Integer> graphEdges. By default data-based synthesis executes createGraph four times, and the times it took to populate graphEdges is:

  • First: 88931.2296ms
  • Second: 137134.374ms
  • Third: 88294.367101ms
  • Fourth: 142364.6363ms

I looked further into why this takes so long. It seems that graphEdges.merge(pair(i, j), 1, (a, b) -> a + b) uses the hash function of pair(i, j) internally, to efficiently store the keys of the mapping. And the type of keys is Pair<Integer, Integer>. Now it seems that Pair#hashCode has a very poor hash distribution when dealing with Integers, since their hash values are the integer values themselves. As a result, graphEdges.merge has to deal with many hash collisions, which means that it effectively becomes a linear-time operation instead of a contant-time one. (Other map operations, like put, would have the same problem.)

A solution could be to improve Pair#hashCode. I changed it to:

@Override
public int hashCode() {
    long bits = Objects.hashCode(left);
    bits ^= (long)Objects.hashCode(right) * 31;
    return (((int)bits) ^ ((int)(bits >> 32)));
}

which I adapted from java.awt.geom.Point2D#hashCode. With that, the times it took to populate graphEdges became:

  • First: 8165.9029ms
  • Second: 8903.2127ms
  • Third: 8521.618201ms
  • Fourth: 9969.175401ms

Which is a 10x improvement for this example. Another solution would be to not use Pair<Integer, Integer> for the keys, but some other, locally-defined data type with a good hash function.