KR'23: research paper

Active Disjunctive Constraint Acquisition


General topic

constraint acquisition, machine learning, contract inference

Motivation

Precondition inference. Function annotations under the form of function pre/postconditions are crucial for many software engineering, program verification and more generaly reverse engineering applications. Unfortunately, such annotations are rarely available and must be retrofit by hand. In prior work, we explored how Constraint Acquisition (CA), a learning framework from Constraint Programming, can be leveraged to automatically infer program preconditions in a black-box manner, from input-output observations. Such preconditions specify on which inputs a function execution terminates without crashing. This led to our PreCA proposal.

Constraint Acquisition (CA) limitations. CA is a method for learning users’ concepts by representing them as a conjunction of constraints. To do so, it takes as input a set of variables and a set of constraints (called the bias) over these variables. From such knowledge, it extracts membership queries (i.e., full assignments of the problem variables), and asks an oracle to classify them. Such an oracle is usually the user, but it can also be an automated one like in precondition inference. A positive classification means that the assignment is a solution of the target concept, while a negative one means the opposite. CA will thus generate a finite number of membership queries to converge to a unique solution as a conjunction of the constraints from the bias.

While this approach works well for many combinatorial problems over finite domains, some applications like precondition inference require the acquisition of disjunctive constraints, possibly coming from logical implications or negations. For example, the precondition of the sum function is: size <= 0 or list != NULL.

int sum(int* list, int size) {
    int res = 0;

    for (int i = 0; i < size; i++) {
        res += list[i];
    }

    return res;
}

What happens if we feed CA with the variables {list, size} and the bias {list != NULL, list = NULL, size <= 0, size > 0} over this example? In fact, it will not be able to infer the correct precondition and may end up learning list != NULL. Indeed, CA cannot infer disjunctions. The workaround proposed in prior work is to add all disjunctions up to some fixed size into the bias. Still, it is not satisfactory. Indeed, it leads to the bias size explosion, utlimately impeding inference.

DCA: Disjunctive Constraint Acquisition

In this paper, we propose the first CA algorithm tailored to the automatic inference of disjunctive constraints, named DCA. Especially, over the previous example, DCA easily infers the correct precondition size <= 0 or list != NULL, automatically generating the following four queries:

list size classification
NULL 1 no
NULL 0 yes
[1, 2] 1 yes
[1, 3] 0 yes

How does it work? DCA relies on MSS enumeration to infer the target concept. Given an input set of constraints (where the negation of each constraint is also included), it infers the correct concept by asking only membership queries. If a membership query e is classified negatively, DCA excludes from the result the best approximation available for e – computed through MSS enumeration. As usual CA, we prove that DCA always terminates, generates informative queries only, and returns a result that agrees with all tested queries. Moreover, if the target concept can be expressed as a conjunction of disjunctive constraints from the input set of constraints, then DCA infers it.

Experimental results. We compared DCA to usual CA to infer random, domain, and ultra-metric constraints. We also show that DCA outperforms our prior PreCA proposal. Especially, DCA is able to infer preconditions with disjunctions of size up to 7 while PreCA is limited to disjunctions of size 3. Overall, DCA is able to infer more concepts than basic CA-based approches even if we give it the good disjunction size.

Major Contributions

In summary, this paper makes the following major contributions:

  • We propose DCA the first inference framework extending CA to infer conjunctions of disjunctive constraints. It elegantly leverages maximal satisfiable subsets (MSS) enumeration to render CA more expressive to efficiently handle disjunctions;
  • We prove that DCA enjoys good theoretical properties. Especially, it shares the same guarantees as usual CA, showing that DCA is an appropriate generalization of CA for disjunctive contexts.
  • We evaluate our new learning framework DCA over two different benchmarks, including precondition inference tasks. We show that it highly outperforms CA-based approaches, being faster and more expressive.

Further information