Skip to content

Merkle Proofs

Merkle proofs let you prove that a specific relationship exists in a knowing snapshot without sharing the full graph. A proof is a compact JSON object (~3KB) that anyone can verify offline with a few SHA-256 computations.

Why

Three use cases:

  1. Audit and compliance. "Prove that service A called service B at the snapshot we deployed on Tuesday." The proof is cryptographic: if the edge existed, the proof verifies. If it didn't, no valid proof can be constructed.

  2. CI gates. "This PR introduces a cross-repo dependency. Prove it." A CI step generates the proof and attaches it to the PR. Reviewers verify without database access.

  3. Federated trust. Two knowing instances exchange proofs about shared edges. Each side verifies against its own snapshot root. Agreement on the root means agreement on the graph.

How It Works

The proof follows the hierarchical Merkle tree from a specific edge up to the repo root. At each level, the verifier needs the target hash and its sibling to recompute the parent.

Level 1: edge hash -> edge-type root
  (sibling edge hashes within the same package + edge type)

Level 2: edge-type root -> package root
  (sibling edge-type roots within the same package)

Level 3: package root -> repo root
  (sibling package roots)

Each level is a standard binary Merkle tree. The proof records the sibling hash and whether it's on the left or right at each step.

Proof Format

{
  "source": "ForTask",
  "target": "ComputeHITS",
  "edge_type": "calls",
  "snapshot_hash": "f479567d...",
  "proof": {
    "edge_hash": "6ea630b9...",
    "package_path": "github.com/blackwell-systems/knowing/internal/context",
    "edge_type": "calls",
    "edge_to_edge_type_root": [
      {"sibling": "6e917a7f...", "is_left": true},
      {"sibling": "c5ad6cbb...", "is_left": false}
    ],
    "edge_type_root": "420eb578...",
    "edge_type_to_package_root": [
      {"sibling": "a1b2c3d4...", "is_left": false}
    ],
    "package_root": "1c5c84e7...",
    "package_to_repo_root": [
      {"sibling": "de194914...", "is_left": true},
      {"sibling": "b759198e...", "is_left": false}
    ],
    "repo_root": "1cc3b4c6..."
  }
}

Fields

Field What it is
edge_hash The content-addressed hash of the edge being proved
package_path The source symbol's package
edge_type The relationship type (calls, imports, etc.)
edge_to_edge_type_root Binary proof steps from the edge to its edge-type Merkle root
edge_type_root The Merkle root of all edges of this type in this package
edge_type_to_package_root Binary proof steps from the edge-type root to the package root
package_root The Merkle root of all edge-type roots in this package
package_to_repo_root Binary proof steps from the package root to the repo root
repo_root The Merkle root of the entire repository snapshot

Proof Steps

Each step contains: - sibling: the hash of the node paired with the target at this tree level - is_left: whether the sibling is on the left (target is right) or right (target is left)

The verifier combines the target with the sibling using ComputeMerkleNodeHash(left, right) (which applies the "merkle\0" domain prefix) and uses the result as the target for the next level.

Verification

Verification is O(proof steps): typically 10-20 SHA-256 computations.

computed = edge_hash
for each step in edge_to_edge_type_root:
    if step.is_left:
        computed = hash("merkle\0" || step.sibling || computed)
    else:
        computed = hash("merkle\0" || computed || step.sibling)
assert computed == edge_type_root

(repeat for edge_type_to_package_root -> package_root)
(repeat for package_to_repo_root -> repo_root)

If all three levels match, the edge existed in the snapshot.

Domain separation for single-group levels: When a tree level has exactly one element (e.g., a package with only one edge type, or a repo with only one package), the proof includes a self-paired step where the sibling equals the target itself. This ensures the tree root is always distinct from its single child root, preventing structural collisions across tree levels. The verification logic handles this transparently (the self-paired step is just another combine operation).

Performance

Measured on the knowing live graph (~30,242 edges, 68 packages):

Metric Value
Proof generation 72us median
Proof verification 1.2us median
Proof size 16-18 steps, ~3KB JSON
Proof generation contract < 10ms
Proof verification contract < 100us

CLI

# Generate a proof
knowing prove -source "%ForTask" -target "%ComputeHITS" -type calls -o proof.json

# Verify offline (no database)
knowing verify proof.json

See CLI reference for full flag documentation.

Batch Proofs via knowing audit

knowing audit -proofs generates Merkle proofs for all cross-package edges in one call, not just individual edges. This is the preferred path when you need a complete compliance artifact rather than a proof for a single relationship.

knowing audit -proofs -o quarterly-audit.json

The output JSON includes the integrity check, graph summary, all cross-package edges with provenance and confidence, and a Merkle proof for each one. Performance: 5 proofs in ~210us. For the full flag reference, see knowing audit in the CLI reference.

Proof of Absence

knowing prove-absent proves that a specific relationship does NOT exist in the current snapshot. No tree restructuring was needed: the sorted binary tree already provides the ordering invariant required for absence proofs.

How It Works

BuildMerkleTree sorts all edge hashes by bytes.Compare before building the tree. To prove edge hash H is absent:

  1. Find the two adjacent sorted leaves H_prev and H_next such that H_prev < H < H_next.
  2. Generate standard inclusion proofs for both H_prev and H_next against the same repo root.
  3. Because the tree is sorted and H_prev and H_next are adjacent (no hash between them), H cannot exist in the tree.

Both neighbor proofs are verified against the same root, so an auditor can confirm the adjacency claim without database access.

CLI

knowing prove-absent -source "%PaymentService" -target "%UserDataService" -type calls
knowing prove-absent -source "%AuthHandler" -target "%StripeClient" -type calls -o absence.json

See knowing prove-absent in the CLI reference for full flag documentation.


Implementation

  • internal/snapshot/proof.go: GenerateProof, VerifyProof, binaryProof
  • internal/snapshot/proof_test.go: 9 tests including tamper detection
  • bench/merkle-diff/proof_bench_test.go: performance benchmark with contracts
  • cmd/knowing/prove.go, cmd/knowing/verify.go: CLI commands

What Proofs Do Not Cover

  • Cross-snapshot proofs. A proof is valid for one snapshot. To prove a relationship persisted across snapshots, you need proofs from each.
  • Semantic correctness. A proof proves the edge hash existed in the tree, not that the edge correctly represents the source code. Correctness depends on the extractor that produced the edge.

Relationship to Other Features

Feature How proofs interact
knowing fsck Fsck verifies the entire graph; proofs verify one edge. Complementary.
Federated sync Proofs are the trust primitive: "prove your graph includes this edge."
Bisection Binary search uses proof verification at each step.
Context pack dedup PackRoot is a different kind of proof: "this context selection is identical."
Incremental RWR cache Per-package Merkle roots in cache keys. When a package changes, only walks seeded from that package miss. Same hierarchical tree, different consumer.
Vocab association expiration Per-package roots anchored to vocab associations at recording time. When a package changes, only that package's learned associations expire.
Feedback expiration neighborhood_root on feedback records. Same Merkle-based expiration pattern applied to implicit feedback.