Sophie Acceptance Heuristic

Overview

The Sophie heuristic, introduced by Ermelinda DeLaViña, filters conjectures based on the novelty of their hypothesis coverage. It encourages exploration by ensuring that each accepted conjecture contributes something new to the covered portion of the dataset.

This heuristic evaluates a candidate conjecture by checking whether it covers any rows not already covered by previously accepted ones.

Acceptance Criteria

Let \(H_{\text{new}}\) be the hypothesis of the candidate conjecture. For any hypothesis \(H\), define:

  • \(\text{cover}(H)\) — the set of rows where H(df) evaluates to True.

Then the Sophie heuristic accepts a new conjecture if:

\[\text{cover}(H_{\text{new}}) \not\subseteq \bigcup_i \text{cover}(H_i),\]

where \(H_i\) are the hypotheses of all previously accepted conjectures.
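
The criterion above can be sketched with plain pandas boolean masks. This is only an illustration of the set condition, not the library's implementation: the helper name `covers_new_rows` is hypothetical, and each hypothesis is assumed to be already evaluated into a boolean Series over the same index.

```python
from typing import Sequence

import pandas as pd


def covers_new_rows(new_mask: pd.Series, accepted_masks: Sequence[pd.Series]) -> bool:
    """Hypothetical helper: True if new_mask holds on a row no accepted mask covers."""
    # Union of the accepted cover sets; all False when nothing is accepted yet.
    covered = pd.Series(False, index=new_mask.index)
    for mask in accepted_masks:
        covered = covered | mask.astype(bool)
    # cover(H_new) is not a subset of the union exactly when some row
    # is True in new_mask and False in covered.
    return bool((new_mask.astype(bool) & ~covered).any())


df = pd.DataFrame({"connected": [True, False, True, False]})
connected = df["connected"]                   # covers rows 0 and 2
everything = pd.Series(True, index=df.index)  # covers all four rows

same_cover = covers_new_rows(connected, [connected])    # no new rows
wider_cover = covers_new_rows(everything, [connected])  # adds rows 1 and 3
first_candidate = covers_new_rows(connected, [])        # nothing accepted yet
```

In this sketch a hypothesis that holds on no rows is never accepted, because the empty set is a subset of any union.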

Function Signature

txgraffiti.heuristics.delavina.sophie_accept(new_conj, accepted, df)

Decide whether to accept a new conjecture based on its cover set.

A conjecture’s cover set is the set of rows where its hypothesis holds. Under the Sophie heuristic, we accept new_conj only if its cover set includes at least one row not already covered by the union of cover sets of all accepted conjectures.

Parameters:
  • new_conj (Conjecture) – The candidate conjecture whose hypothesis cover set is tested.

  • accepted (list of Conjecture) – Previously accepted conjectures. Their hypothesis masks are unioned to form the existing coverage.

  • df (pandas.DataFrame or KnowledgeTable) – The data on which hypotheses are evaluated.

Returns:

True if new_conj covers at least one additional row beyond the union of all accepted cover sets, False otherwise.

Return type:

bool

Examples

>>> import pandas as pd
>>> from txgraffiti.logic import Property, Predicate, Conjecture
>>> from txgraffiti.heuristics.delavina import sophie_accept
>>> df = pd.DataFrame({
...     'alpha':     [1, 2, 3, 4],
...     'connected': [True, False, True, False],
... })
>>> A = Property('alpha', lambda df: df['alpha'])
>>> P = Predicate('connected', lambda df: df['connected'])
>>> # conj1 covers rows 0 and 2
>>> conj1 = P >> (A <= 10)
>>> # conj2 covers the same rows → no new coverage
>>> conj2 = P >> (A >= 0)
>>> sophie_accept(conj2, [conj1], df)
False
>>> # conj3's hypothesis (P | ~P) holds on every row, so it also covers
>>> # rows 1 and 3, which conj1 does not cover → new coverage
>>> conj3 = (P | ~P) >> (A >= 0)
>>> sophie_accept(conj3, [conj1], df)
True

Notes

  • This heuristic works well in conjunction with significance-based heuristics such as the Dalmatian Acceptance Heuristic or generality-based heuristics such as the Morgan Acceptance Heuristic.

  • It encourages diversity among the discovered conjectures by requiring each newly accepted hypothesis to cover at least one row that no previously accepted hypothesis covers.
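
As a rough illustration of the diversity effect, the sketch below applies the same coverage test to a sequence of candidates, keeping each one only if it adds an uncovered row. It is an assumption about how the heuristic might be used in a filtering loop, not part of txgraffiti: the function `sophie_filter` and the labelled masks are hypothetical, and hypotheses are represented as pre-evaluated boolean Series.

```python
import pandas as pd


def sophie_filter(candidate_masks):
    """Hypothetical greedy filter: keep labels whose mask adds an uncovered row."""
    kept = []
    covered = None
    for label, mask in candidate_masks:
        mask = mask.astype(bool)
        if covered is None:
            # Start with an empty cover over the same index as the first mask.
            covered = pd.Series(False, index=mask.index)
        if (mask & ~covered).any():
            kept.append(label)
            covered = covered | mask  # extend the accepted coverage
    return kept


df = pd.DataFrame({
    "alpha": [1, 2, 3, 4],
    "connected": [True, False, True, False],
})
candidates = [
    ("connected", df["connected"]),                                   # rows 0, 2
    ("connected and alpha <= 3", df["connected"] & (df["alpha"] <= 3)),  # rows 0, 2
    ("all rows", pd.Series(True, index=df.index)),                    # rows 0..3
]
kept = sophie_filter(candidates)
# kept == ["connected", "all rows"]
```

The result of this loop depends on the order of the candidates: if "all rows" were processed first, neither of the other two would add a new row, and both would be rejected.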

See Also