Semantic PR Diff
knowing generates a relationship-level diff for pull requests: not what text changed, but what the change does to the system graph. This is exposed as MCP tools, a CLI command, and a CI integration (GitHub Actions workflow).
Why
Code review today is text review. A reviewer sees that 40 lines changed in auth/middleware.go and makes a judgment about blast radius based on experience and intuition. They might grep for callers, or they might not. They almost certainly do not check cross-repo impact.
Semantic PR diff makes relationship impact visible without effort. It answers the questions reviewers should ask but often do not: Does this change add new cross-repo dependencies? Does it increase the blast radius of a critical function? Does it affect symbols owned by other teams?
This is the most visible feature knowing can ship. Developers see it on every PR. It demonstrates the value of the graph without requiring anyone to change their workflow or learn a new tool.
Output Format
The knowing diff command produces a graph-level summary:
knowing diff --base main --head feature/auth-refactor
Graph impact for PR #482: refactor auth middleware

Symbols changed: 4
Edges added: 3
Edges removed: 1
Edges modified: 2

+ auth-service -> user-service.GetUser (calls, confidence 1.0)
  New cross-repo dependency. user-service is owned by @platform-team.
+ auth-service -> billing-service.ValidateSubscription (calls, confidence 1.0)
  New cross-repo dependency. billing-service is owned by @billing-team.
+ auth-service -> notification-service.SendAlert (calls, confidence 0.8)
  New cross-repo dependency (inferred from import, no direct call site found).
- auth-service -> legacy-session-store.Lookup (calls, confidence 1.0)
  Cross-repo dependency removed.

~ AuthMiddleware.Validate blast radius: 12 callers -> 47 callers
  Gained 35 transitive callers via new edges to user-service and billing-service.
~ AuthMiddleware.TokenRefresh signature changed
  8 direct callers across 3 repos. 2 callers are in repos not owned by PR author.

Ownership impact:
  Before: consumers in 1 team (@auth-team)
  After: consumers in 3 teams (@auth-team, @platform-team, @billing-team)

Staleness:
  2 edges in the blast radius were last verified > 14 days ago.
  Run `knowing index --repo github.com/org/billing-service` to refresh.
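
The staleness note in the sample output can be read as a simple age filter over the edges in the blast radius. The following is a minimal sketch of that idea, not the actual knowing implementation: the Edge fields, the staleEdges helper, and the fixed staleAfter constant are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// staleAfter mirrors the 14-day window from the sample output; treating it
// as a fixed constant is an assumption of this sketch.
const staleAfter = 14 * 24 * time.Hour

// Edge is a simplified, hypothetical stand-in for the graph edge type.
type Edge struct {
	Source       string
	Target       string
	LastVerified time.Time
}

// staleEdges returns the edges whose last verification is older than maxAge
// relative to now.
func staleEdges(edges []Edge, now time.Time, maxAge time.Duration) []Edge {
	var stale []Edge
	for _, e := range edges {
		if now.Sub(e.LastVerified) > maxAge {
			stale = append(stale, e)
		}
	}
	return stale
}

func main() {
	now := time.Date(2024, time.June, 30, 0, 0, 0, 0, time.UTC)
	edges := []Edge{
		{Source: "auth-service", Target: "billing-service.ValidateSubscription", LastVerified: now.AddDate(0, 0, -20)},
		{Source: "auth-service", Target: "user-service.GetUser", LastVerified: now.AddDate(0, 0, -2)},
	}
	for _, e := range staleEdges(edges, now, staleAfter) {
		fmt.Printf("stale: %s -> %s\n", e.Source, e.Target)
	}
}
```

Passing now as a parameter rather than calling time.Now inside the helper keeps the check deterministic in tests.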
How It Works
1. PR opened (or push to PR branch)
|
v
2. knowing indexes the PR branch, producing a new snapshot
|
v
3. Merkle diff between base snapshot and PR snapshot
(only changed subtrees are traversed)
|
v
4. For each changed edge:
- Classify: added, removed, modified
- Look up ownership for affected symbols
- Compute blast radius delta (before vs. after)
|
v
5. Format and post as PR comment or check annotation
The Merkle diff (via DiffHierarchicalTrees in internal/snapshot/hierarchical.go) compares package roots first and only descends into edge-type roots for packages that changed. This makes the diff fast even for large graphs.
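
To illustrate why comparing package roots first keeps the diff cheap, here is a minimal sketch of a two-level hash comparison. It does not reproduce DiffHierarchicalTrees or its signature; the PackageTree and Snapshot types and the changedSubtrees function are hypothetical names chosen for this example, and hashes are shortened to plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// PackageTree is a hypothetical two-level Merkle node for one package:
// a root hash plus one sub-root hash per edge type.
type PackageTree struct {
	Root          string
	EdgeTypeRoots map[string]string
}

// Snapshot maps a package path to its tree (an assumption of this sketch).
type Snapshot map[string]PackageTree

// changedSubtrees returns sorted "package/edgeType" keys whose hashes differ.
// Packages with identical roots are skipped without inspecting their children.
func changedSubtrees(base, head Snapshot) []string {
	pkgs := map[string]bool{}
	for p := range base {
		pkgs[p] = true
	}
	for p := range head {
		pkgs[p] = true
	}
	var changed []string
	for p := range pkgs {
		b, h := base[p], head[p]
		if b.Root == h.Root {
			continue // unchanged package: do not descend
		}
		types := map[string]bool{}
		for t := range b.EdgeTypeRoots {
			types[t] = true
		}
		for t := range h.EdgeTypeRoots {
			types[t] = true
		}
		for t := range types {
			if b.EdgeTypeRoots[t] != h.EdgeTypeRoots[t] {
				changed = append(changed, p+"/"+t)
			}
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	base := Snapshot{
		"auth":    {Root: "a1", EdgeTypeRoots: map[string]string{"calls": "c1", "imports": "i1"}},
		"billing": {Root: "b1", EdgeTypeRoots: map[string]string{"calls": "c9"}},
	}
	head := Snapshot{
		"auth":    {Root: "a2", EdgeTypeRoots: map[string]string{"calls": "c2", "imports": "i1"}},
		"billing": {Root: "b1", EdgeTypeRoots: map[string]string{"calls": "c9"}},
	}
	fmt.Println(changedSubtrees(base, head)) // prints [auth/calls]
}
```

In this example the billing package is never descended into, because its root hash is identical in both snapshots.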
Removed-edge correctness: Migration 013 (add_edge_event_data.sql) added source_hash, target_hash, edge_type, confidence, and provenance columns to edge_events. SnapshotDiff uses COALESCE to read each field from the event record first, falling back to the edges table for pre-migration events. As a result, removed-edge diffs return full edge data, not just hashes.
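
The COALESCE fallback described above can be mirrored in Go to show the intended precedence. This is only a sketch of the idea, assuming nullable event columns are modeled as pointers; the EdgeID join key, the type names, and the resolveEdge helper are hypothetical, and only three of the five migrated columns are shown.

```go
package main

import "fmt"

// EdgeData holds the resolved fields of an edge (subset of columns).
type EdgeData struct {
	SourceHash string
	TargetHash string
	EdgeType   string
}

// EdgeEvent models a row in edge_events. Pointer fields are nil for
// pre-migration rows, where the new columns are NULL.
type EdgeEvent struct {
	EdgeID     string // hypothetical join key into the edges table
	SourceHash *string
	TargetHash *string
	EdgeType   *string
}

// coalesce returns *v when it is set, otherwise the fallback value.
func coalesce(v *string, fallback string) string {
	if v != nil {
		return *v
	}
	return fallback
}

// resolveEdge prefers data recorded on the event and falls back to the
// edges table (here a map) for fields the event does not carry.
func resolveEdge(ev EdgeEvent, edges map[string]EdgeData) EdgeData {
	fb := edges[ev.EdgeID] // zero value when no matching edge row is present
	return EdgeData{
		SourceHash: coalesce(ev.SourceHash, fb.SourceHash),
		TargetHash: coalesce(ev.TargetHash, fb.TargetHash),
		EdgeType:   coalesce(ev.EdgeType, fb.EdgeType),
	}
}

// ptr is a small helper for building nullable string fields.
func ptr(s string) *string { return &s }

func main() {
	edges := map[string]EdgeData{
		"e1": {SourceHash: "s1", TargetHash: "t1", EdgeType: "calls"},
	}
	preMigration := EdgeEvent{EdgeID: "e1"}
	postMigration := EdgeEvent{EdgeID: "e2", SourceHash: ptr("s2"), TargetHash: ptr("t2"), EdgeType: ptr("calls")}

	fmt.Printf("%+v\n", resolveEdge(preMigration, edges))
	fmt.Printf("%+v\n", resolveEdge(postMigration, edges))
}
```

The second event resolves fully even though the edges map has no row for it, which is the case the migration is meant to cover.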
Implementation
The implementation lives in internal/diff/:
- semantic.go: SemanticDiff computes the relationship-level diff between two snapshots. Classifies edges as added, removed, or modified. Annotates with ownership and blast radius delta.
- impact.go: ImpactAnalysis computes per-symbol blast radius before and after, identifying new and lost transitive callers.
- types.go: SemanticDiffResult, EdgeChange, BlastRadiusDelta, OwnershipDelta types.
- ci.go: CI-specific helpers for the GitHub Actions integration.
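
As a rough illustration of the added/removed/modified classification, the sketch below compares two edge sets keyed by edge identity. It is not the code in semantic.go: the EdgeKey and EdgeSet types are hypothetical, and it assumes an edge counts as modified when its identity persists but an attribute (here, confidence) differs.

```go
package main

import "fmt"

// EdgeKey identifies an edge independently of its attributes (assumed shape).
type EdgeKey struct {
	Source, Target, Type string
}

// EdgeSet maps each edge to its confidence value.
type EdgeSet map[EdgeKey]float64

// Classification groups edge keys by how they changed between snapshots.
type Classification struct {
	Added, Removed, Modified []EdgeKey
}

// classify compares base and head: keys only in head are added, keys only
// in base are removed, and shared keys with a different confidence are modified.
func classify(base, head EdgeSet) Classification {
	var c Classification
	for k, headConf := range head {
		baseConf, ok := base[k]
		switch {
		case !ok:
			c.Added = append(c.Added, k)
		case baseConf != headConf:
			c.Modified = append(c.Modified, k)
		}
	}
	for k := range base {
		if _, ok := head[k]; !ok {
			c.Removed = append(c.Removed, k)
		}
	}
	return c
}

func main() {
	base := EdgeSet{
		{Source: "auth-service", Target: "legacy-session-store.Lookup", Type: "calls"}:    1.0,
		{Source: "auth-service", Target: "notification-service.SendAlert", Type: "calls"}: 0.8,
	}
	head := EdgeSet{
		{Source: "auth-service", Target: "notification-service.SendAlert", Type: "calls"}: 1.0,
		{Source: "auth-service", Target: "user-service.GetUser", Type: "calls"}:           1.0,
	}
	c := classify(base, head)
	fmt.Printf("added:    %+v\n", c.Added)
	fmt.Printf("removed:  %+v\n", c.Removed)
	fmt.Printf("modified: %+v\n", c.Modified)
}
```

Map iteration order is not deterministic in Go, so a real implementation would likely sort each slice before formatting the output.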
Key types:
type SemanticDiffResult struct {
	BaseSnapshot     Hash
	HeadSnapshot     Hash
	SymbolsChanged   int
	EdgesAdded       []EdgeChange
	EdgesRemoved     []EdgeChange
	EdgesModified    []EdgeChange
	BlastRadiusDelta []BlastRadiusDelta
	OwnershipImpact  *OwnershipDelta
	StaleEdges       []Edge
}

type EdgeChange struct {
	Edge       Edge
	SourceRepo string
	TargetRepo string
	CrossRepo  bool // true if source and target are in different repos
	OwnerTeam  string
}

type BlastRadiusDelta struct {
	Symbol        Node
	CallersBefore int
	CallersAfter  int
	NewCallers    []Node
	LostCallers   []Node
}

type OwnershipDelta struct {
	TeamsBefore []string
	TeamsAfter  []string
	NewTeams    []string // teams newly affected by this change
}
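
To make the BlastRadiusDelta fields concrete, the following sketch computes transitive callers before and after a change with a breadth-first walk over reverse call edges, then takes the set differences. It is an illustration under simplifying assumptions rather than the ImpactAnalysis implementation: symbols are plain strings instead of Node values, and the callerGraph and delta types are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// callerGraph maps a symbol to its direct callers (reverse call edges).
type callerGraph map[string][]string

// transitiveCallers walks the reverse edges breadth-first and returns every
// symbol that can reach the given symbol, excluding the symbol itself.
func transitiveCallers(g callerGraph, symbol string) map[string]bool {
	seen := map[string]bool{}
	queue := []string{symbol}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range g[cur] {
			if c != symbol && !seen[c] {
				seen[c] = true
				queue = append(queue, c)
			}
		}
	}
	return seen
}

// delta is a string-based stand-in for BlastRadiusDelta.
type delta struct {
	Symbol        string
	CallersBefore int
	CallersAfter  int
	NewCallers    []string
	LostCallers   []string
}

// blastRadiusDelta compares the transitive caller sets of symbol in two graphs.
func blastRadiusDelta(before, after callerGraph, symbol string) delta {
	b, a := transitiveCallers(before, symbol), transitiveCallers(after, symbol)
	d := delta{Symbol: symbol, CallersBefore: len(b), CallersAfter: len(a)}
	for c := range a {
		if !b[c] {
			d.NewCallers = append(d.NewCallers, c)
		}
	}
	for c := range b {
		if !a[c] {
			d.LostCallers = append(d.LostCallers, c)
		}
	}
	sort.Strings(d.NewCallers)
	sort.Strings(d.LostCallers)
	return d
}

func main() {
	before := callerGraph{
		"AuthMiddleware.Validate": {"Router.Handle"},
		"Router.Handle":           {"Server.Start"},
	}
	after := callerGraph{
		"AuthMiddleware.Validate": {"Router.Handle", "Billing.Check"},
		"Router.Handle":           {"Server.Start"},
		"Billing.Check":           {"Billing.API"},
	}
	fmt.Printf("%+v\n", blastRadiusDelta(before, after, "AuthMiddleware.Validate"))
}
```

The seen set also guards against cycles in the call graph, so recursive call chains terminate.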
MCP Tools
Three MCP tools expose semantic diff functionality to agents:
| Tool | Purpose |
|---|---|
| snapshot_diff | Raw edge-level diff between any two snapshot hashes |
| semantic_diff | Relationship-level diff with ownership and blast radius annotations |
| pr_impact | Semantic diff specialized for a PR: resolves base/head from git, formats for review |
Agents use pr_impact before committing to verify a change does not introduce unexpected cross-repo dependencies or blast radius growth.
CLI Command: knowing diff
knowing diff computes the semantic diff between two snapshots:
# Diff between two snapshot hashes
knowing diff -db graph.db <base-hash> <head-hash>
# JSON output for programmatic use
knowing diff -db graph.db -format json <base-hash> <head-hash>
# GCF output (token-efficient, for agent consumption)
knowing diff -db graph.db -format gcf <base-hash> <head-hash>
The output format matches the PR comment format. Edge changes include added/removed status annotations in both GCF and JSON output.
CI Integration
.github/workflows/pr-semantic-diff.yml implements the GitHub Actions integration. It runs on every PR against main:
- Checks out the repo with full history (fetch-depth: 0).
- Builds the knowing binary.
- Indexes the base branch commit into base.db.
- Indexes the head branch commit into head.db.
- Merges base graph data into head.db (so SnapshotDiff has both snapshots in one database).
- Runs knowing diff to produce diff-result.json.
- Posts or updates a PR comment with the diff summary (nodes added/removed/modified, edges added/removed, with formatted lists truncated at 20 nodes and 15 edges).
The workflow uses GOWORK=off to isolate module resolution during CI indexing.
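
The truncation applied to the PR comment (20 nodes, 15 edges) can be sketched as a small formatting helper. This is an assumed shape for illustration only: the function names, the section titles, and the "... and N more" wording are not taken from the actual workflow or from ci.go.

```go
package main

import (
	"fmt"
	"strings"
)

// Limits taken from the workflow description above.
const (
	maxNodes = 20
	maxEdges = 15
)

// truncate returns at most limit items plus the number of items left out.
func truncate(items []string, limit int) (shown []string, omitted int) {
	if len(items) <= limit {
		return items, 0
	}
	return items[:limit], len(items) - limit
}

// formatSection renders a titled bullet list, noting any omitted entries.
func formatSection(title string, items []string, limit int) string {
	shown, omitted := truncate(items, limit)
	var sb strings.Builder
	fmt.Fprintf(&sb, "%s (%d)\n", title, len(items))
	for _, it := range shown {
		fmt.Fprintf(&sb, "- %s\n", it)
	}
	if omitted > 0 {
		fmt.Fprintf(&sb, "... and %d more\n", omitted)
	}
	return sb.String()
}

func main() {
	edges := make([]string, 18)
	for i := range edges {
		edges[i] = fmt.Sprintf("auth-service -> svc%d.Call", i)
	}
	nodes := []string{"AuthMiddleware.Validate", "AuthMiddleware.TokenRefresh"}

	fmt.Print(formatSection("Nodes modified", nodes, maxNodes))
	fmt.Print(formatSection("Edges added", edges, maxEdges))
}
```

Reporting the total count in the section title keeps the summary accurate even when the list itself is cut short.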
What This Does Not Do
- Does not block PRs by default. The diff is informational. Teams can configure thresholds with knowing audit-diff flags to enforce constraints, but the default is comment-only.
- Does not replace code review. It augments it with information reviewers cannot easily get on their own.
- Does not require a running daemon in CI. The GitHub Action builds a fresh knowing binary and operates on temporary database files created during the job.
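
For teams that do opt into enforcement, a threshold check could look roughly like the sketch below. It does not describe the real knowing audit-diff flags, which are not listed here; the Summary and Thresholds fields are hypothetical names, and the sample values are borrowed from the example output earlier in this document.

```go
package main

import "fmt"

// Summary is a hypothetical reduction of a semantic diff to a few counters.
type Summary struct {
	NewCrossRepoEdges int
	CallersBefore     int
	CallersAfter      int
}

// Thresholds holds hypothetical limits a team might choose to enforce.
type Thresholds struct {
	MaxNewCrossRepoEdges int
	MaxBlastRadiusGrowth int
}

// violations returns one message per exceeded threshold; an empty result
// means the change stays within the configured limits.
func violations(s Summary, t Thresholds) []string {
	var out []string
	if s.NewCrossRepoEdges > t.MaxNewCrossRepoEdges {
		out = append(out, fmt.Sprintf("new cross-repo edges: %d (limit %d)",
			s.NewCrossRepoEdges, t.MaxNewCrossRepoEdges))
	}
	if growth := s.CallersAfter - s.CallersBefore; growth > t.MaxBlastRadiusGrowth {
		out = append(out, fmt.Sprintf("blast radius grew by %d callers (limit %d)",
			growth, t.MaxBlastRadiusGrowth))
	}
	return out
}

func main() {
	s := Summary{NewCrossRepoEdges: 3, CallersBefore: 12, CallersAfter: 47}
	t := Thresholds{MaxNewCrossRepoEdges: 5, MaxBlastRadiusGrowth: 20}
	for _, v := range violations(s, t) {
		fmt.Println("threshold exceeded:", v)
	}
}
```

A CI step built on this idea would exit non-zero when the returned slice is non-empty, while the default comment-only mode would simply skip the check.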
Retrofit Cost
Low. Semantic diff is a read-only consumer of the snapshot chain and Merkle diff. It can be added at any time after SnapshotDiff is implemented. The key prerequisite is migration 013: without full edge data in edge_events, removed-edge diffs return incomplete information.