DeLaViña’s Sophie Heuristic
Overview
The Sophie heuristic was developed by Ermilinda DeLaViña and Bill Waller and first appeared in DeLaViña’s Graffiti.pc. It governs which new conjectures to keep in a discovery pipeline by examining the cover set of each hypothesis: a conjecture is accepted only if it “covers” at least one data point not already covered by any previously accepted conjecture.
How It Works
Cover set For a conjecture
cwith hypothesis predicateH, its cover set on a pandas DataFramedfis the Boolean mask:cover_c = c.hypothesis(df)
Union of existing covers Given a list
acceptedof conjectures already kept, form their union:if accepted: old_union = pd.concat( [c.hypothesis(df) for c in accepted], axis=1 ).any(axis=1) else: old_union = pd.Series(False, index=df.index)
Acceptance rule Accept the new conjecture
c_newif and only if there exists at least one row wherec_newcovers a point not inold_union:new_cover = c_new.hypothesis(df) delta = new_cover & ~old_union accept = bool(delta.any())
Example
Here is a complete example demonstrating Sophie in action on a small dataset:
import pandas as pd
from txgraffiti.logic import Property, Predicate
from txgraffiti.heuristics import sophie_accept
# 1) Sample data
df = pd.DataFrame({
'alpha': [1, 2, 3, 4],
'p': [True, True, False, False],
'q': [False, True, True, False],
})
# 2) Lift into TxGraffiti objects
A = Property('alpha', lambda df: df['alpha'])
P = Predicate('p', lambda df: df['p'])
Q = Predicate('q', lambda df: df['q'])
# 3) Build three conjectures
# c1 covers rows 0 and 1
c1 = P >> (A <= 10)
# c2 covers rows 1 and 2
c2 = Q >> (A <= 10)
# c3 covers rows 0,1,2 but adds no new row beyond c1,c2
P_or_Q = P | Q
c3 = P_or_Q >> (A <= 10)
# 4) Apply Sophie
accepted = []
# c1 is first → covers new rows ⇒ accept
assert sophie_accept(c1, accepted, df) is True
accepted.append(c1)
# c2 adds row 2 beyond c1’s cover ⇒ accept
assert sophie_accept(c2, accepted, df) is True
accepted.append(c2)
# c3 adds no new rows beyond {0,1,2} ⇒ reject
assert sophie_accept(c3, accepted, df) is False