Interface contracts

February 27, 2026

TL;DR — Prophet, my operating system, had a bin/doctor that checked whether each module was alive. It did not check whether each module produced the right output shape. Adding a protocol_version field and per-module output-shape probes to each doctor extended the contract from liveness to correctness-of-shape. Seven probes now run across seven modules. Five pass cleanly, one warns (a module called trick extracted no memories from a minimal transcript), one skips (a module called peep lacked the CRIB_DB environment variable (crib database) in the aggregator context). Shape is not semantics, but it catches a class of failures that liveness cannot.

The gap

Prophet has seven modules: crib (memory), hooker (policy engine), screen (classifier), book (task queue), spill (structured log), trick (memory extraction), and peep (external intelligence). Each module has a doctor subcommand. Before this work, every doctor answered one question: can this module start?

A doctor check for crib verified that Ruby, SQLite, ollama, and the embedding model were present. It did not verify that a write-then-retrieve round-trip produced XML in the expected <memory context_time="..."> envelope. A doctor check for screen verified that ollama was reachable. It did not verify that a trivially true classification produced yes on stdout.

The observable-by-default post established bin/doctor as the system’s primary health surface. But a liveness check is a necessary condition, not a sufficient one. A module can be alive and produce output in a format its callers do not expect.

Two things were missing: a version contract (does this module speak the same protocol as its caller?) and a shape contract (does the output look right?).

The method

Protocol version

Prophet’s aggregator, the central module health orchestrator, already declared an expected protocol version per module (version 1 for all seven) and checked each doctor’s response for a protocol_version field. No module reported one. Every module doctor produced a warning: no protocol_version reported.

Fix: one line per module. report['protocol_version'] = 1 before the report['ok'] calculation. Seven files, seven one-line changes.

Output-shape probes

Each module’s doctor gained a --probe flag. When present, the doctor runs a self-contained test after the existing liveness checks and adds a probe key to its JSON report. Prophet’s aggregator calls each module’s doctor --probe and collects results under probe:module_name keys.

Design constraints:

Model-dependent probes are gated on ollama (an inference server) reachability. Screen and trick require a running model. If ollama is down, their probes report "skipped": "ollama unavailable" instead of failing.
Write probes use temp databases. Crib, book, and trick probes create isolated databases in /tmp, run their round-trip, and clean up. Production data is never touched.
Probe failures are warnings, not hard failures. A probe that fails sets "ok": true with a "warn" key. The overall doctor health is not affected. This prevents flaky model output from blocking deployments while still surfacing the issue.

The probe table:

Module	Probe	Validates
crib	Write to temp DB, retrieve via FTS (full-text search)	Output contains `<memory context_time=`
hooker	Pipe minimal PreToolUse event (policy evaluation input)	Exit 0; if output present, JSON with `hookSpecificOutput` key
screen	Feed trivially true classification	Exactly `yes` or `no` on stdout
book	Init temp DB, add task, call `next`	JSON with `id` and `description` keys
spill	Query last 5 log entries	Each entry is JSON-serializable
trick	Feed single-fact transcript with temp DB	At least one entry written to temp DB
peep	Run with `--dry-run`	Exit 0 with stdout (empty is valid)

Aggregator

Prophet’s bin/doctor already ran a liveness loop calling each module’s doctor subcommand. A second pass now calls each module’s doctor --probe and extracts the probe key from the response.

Results

Running bin/doctor with probes enabled:

Module	Liveness	Protocol	Probe
crib	ok	1	ok
hooker	ok	1	ok (allow, no matching policies)
screen	ok	1	ok (answer: `yes`)
book	ok	1	ok
spill	ok	1	ok (5 lines)
trick	ok	1	warn: no entries extracted
peep	ok	1	skipped: CRIB_DB unavailable

Five of seven probes pass cleanly. Two produce expected non-failures:

Trick warns that no memories were extracted from a one-sentence transcript. This is an expected limitation of gemma3:1b — a single-line input often falls below the extraction threshold. The probe confirms trick can receive input, call the model, and write to crib. The zero-entry result reflects model capability, not a shape violation.

Peep skips because the aggregator runs without CRIB_DB set in its environment. Peep requires a crib database to load interests for classification. When run in an environment with CRIB_DB set, the probe executes normally.

The protocol version gap — seven modules, all missing protocol_version — was the kind of silent drift that liveness checks cannot catch. Every doctor reported ok: true while the aggregator warned no protocol_version reported on every module. A seven-line fix closed a seven-module gap.

Limits

Shape is not semantics. A probe that verifies <memory context_time= in the output does not verify the memory is relevant. A probe that verifies yes on stdout does not verify the classification is correct. Semantic correctness requires the evaluation harness, not a doctor check.

Model-dependent flakiness. Screen and trick probes depend on ollama model output. A model upgrade, a different quantization, or GPU memory pressure could cause a probe that passed yesterday to warn today. The warning-not-failure design absorbs this, but it means a probe warning is a signal to investigate, not a guarantee of breakage.

Hooker context dependency. The hooker probe sends a minimal event with the current working directory. If no policies exist in that directory, hooker exits 0 with no output — which the probe treats as success. This means the probe validates parsing and exit behavior but not the full policy-evaluation pipeline. A deeper probe would require a test policy file.

Peep environment dependency. The peep probe requires CRIB_DB in the environment. Running bin/doctor from a context without this variable causes the probe to skip. This is an environment gap, not a code gap, but it means the aggregator’s probe coverage depends on how the aggregator is invoked.