Merkle Proofs
Merkle proofs let you prove that a specific relationship exists in a knowing snapshot without sharing the full graph. A proof is a compact JSON object (~3KB) that anyone can verify offline with a few SHA-256 computations.
Why
Three use cases:
-
Audit and compliance. "Prove that service A called service B at the snapshot we deployed on Tuesday." The proof is cryptographic: if the edge existed, the proof verifies. If it didn't, no valid proof can be constructed.
-
CI gates. "This PR introduces a cross-repo dependency. Prove it." A CI step generates the proof and attaches it to the PR. Reviewers verify without database access.
-
Federated trust. Two knowing instances exchange proofs about shared edges. Each side verifies against its own snapshot root. Agreement on the root means agreement on the graph.
How It Works
The proof follows the hierarchical Merkle tree from a specific edge up to the repo root. At each level, the verifier needs the target hash and its sibling to recompute the parent.
Level 1: edge hash -> edge-type root
(sibling edge hashes within the same package + edge type)
Level 2: edge-type root -> package root
(sibling edge-type roots within the same package)
Level 3: package root -> repo root
(sibling package roots)
Each level is a standard binary Merkle tree. The proof records the sibling hash and whether it's on the left or right at each step.
Proof Format
{
"source": "ForTask",
"target": "ComputeHITS",
"edge_type": "calls",
"snapshot_hash": "f479567d...",
"proof": {
"edge_hash": "6ea630b9...",
"package_path": "github.com/blackwell-systems/knowing/internal/context",
"edge_type": "calls",
"edge_to_edge_type_root": [
{"sibling": "6e917a7f...", "is_left": true},
{"sibling": "c5ad6cbb...", "is_left": false}
],
"edge_type_root": "420eb578...",
"edge_type_to_package_root": [
{"sibling": "a1b2c3d4...", "is_left": false}
],
"package_root": "1c5c84e7...",
"package_to_repo_root": [
{"sibling": "de194914...", "is_left": true},
{"sibling": "b759198e...", "is_left": false}
],
"repo_root": "1cc3b4c6..."
}
}
Fields
| Field | What it is |
|---|---|
edge_hash |
The content-addressed hash of the edge being proved |
package_path |
The source symbol's package |
edge_type |
The relationship type (calls, imports, etc.) |
edge_to_edge_type_root |
Binary proof steps from the edge to its edge-type Merkle root |
edge_type_root |
The Merkle root of all edges of this type in this package |
edge_type_to_package_root |
Binary proof steps from the edge-type root to the package root |
package_root |
The Merkle root of all edge-type roots in this package |
package_to_repo_root |
Binary proof steps from the package root to the repo root |
repo_root |
The Merkle root of the entire repository snapshot |
Proof Steps
Each step contains:
- sibling: the hash of the node paired with the target at this tree level
- is_left: whether the sibling is on the left (target is right) or right (target is left)
The verifier combines the target with the sibling using ComputeMerkleNodeHash(left, right) (which applies the "merkle\0" domain prefix) and uses the result as the target for the next level.
Verification
Verification is O(proof steps): typically 10-20 SHA-256 computations.
computed = edge_hash
for each step in edge_to_edge_type_root:
if step.is_left:
computed = hash("merkle\0" || step.sibling || computed)
else:
computed = hash("merkle\0" || computed || step.sibling)
assert computed == edge_type_root
(repeat for edge_type_to_package_root -> package_root)
(repeat for package_to_repo_root -> repo_root)
If all three levels match, the edge existed in the snapshot.
Domain separation for single-group levels: When a tree level has exactly one element (e.g., a package with only one edge type, or a repo with only one package), the proof includes a self-paired step where the sibling equals the target itself. This ensures the tree root is always distinct from its single child root, preventing structural collisions across tree levels. The verification logic handles this transparently (the self-paired step is just another combine operation).
Performance
Measured on the knowing live graph (~30,242 edges, 68 packages):
| Metric | Value |
|---|---|
| Proof generation | 72us median |
| Proof verification | 1.2us median |
| Proof size | 16-18 steps, ~3KB JSON |
| Proof generation contract | < 10ms |
| Proof verification contract | < 100us |
CLI
# Generate a proof
knowing prove -source "%ForTask" -target "%ComputeHITS" -type calls -o proof.json
# Verify offline (no database)
knowing verify proof.json
See CLI reference for full flag documentation.
Batch Proofs via knowing audit
knowing audit -proofs generates Merkle proofs for all cross-package edges in one call, not just individual edges. This is the preferred path when you need a complete compliance artifact rather than a proof for a single relationship.
knowing audit -proofs -o quarterly-audit.json
The output JSON includes the integrity check, graph summary, all cross-package edges with provenance and confidence, and a Merkle proof for each one. Performance: 5 proofs in ~210us. For the full flag reference, see knowing audit in the CLI reference.
Proof of Absence
knowing prove-absent proves that a specific relationship does NOT exist in the current snapshot. No tree restructuring was needed: the sorted binary tree already provides the ordering invariant required for absence proofs.
How It Works
BuildMerkleTree sorts all edge hashes by bytes.Compare before building the tree. To prove edge hash H is absent:
- Find the two adjacent sorted leaves
H_prevandH_nextsuch thatH_prev < H < H_next. - Generate standard inclusion proofs for both
H_prevandH_nextagainst the same repo root. - Because the tree is sorted and
H_prevandH_nextare adjacent (no hash between them),Hcannot exist in the tree.
Both neighbor proofs are verified against the same root, so an auditor can confirm the adjacency claim without database access.
CLI
knowing prove-absent -source "%PaymentService" -target "%UserDataService" -type calls
knowing prove-absent -source "%AuthHandler" -target "%StripeClient" -type calls -o absence.json
See knowing prove-absent in the CLI reference for full flag documentation.
Implementation
internal/snapshot/proof.go:GenerateProof,VerifyProof,binaryProofinternal/snapshot/proof_test.go: 9 tests including tamper detectionbench/merkle-diff/proof_bench_test.go: performance benchmark with contractscmd/knowing/prove.go,cmd/knowing/verify.go: CLI commands
What Proofs Do Not Cover
- Cross-snapshot proofs. A proof is valid for one snapshot. To prove a relationship persisted across snapshots, you need proofs from each.
- Semantic correctness. A proof proves the edge hash existed in the tree, not that the edge correctly represents the source code. Correctness depends on the extractor that produced the edge.
Relationship to Other Features
| Feature | How proofs interact |
|---|---|
knowing fsck |
Fsck verifies the entire graph; proofs verify one edge. Complementary. |
| Federated sync | Proofs are the trust primitive: "prove your graph includes this edge." |
| Bisection | Binary search uses proof verification at each step. |
| Context pack dedup | PackRoot is a different kind of proof: "this context selection is identical." |
| Incremental RWR cache | Per-package Merkle roots in cache keys. When a package changes, only walks seeded from that package miss. Same hierarchical tree, different consumer. |
| Vocab association expiration | Per-package roots anchored to vocab associations at recording time. When a package changes, only that package's learned associations expire. |
| Feedback expiration | neighborhood_root on feedback records. Same Merkle-based expiration pattern applied to implicit feedback. |