Data Flow
This section traces a single change from developer commit to fully enriched graph state.
End-to-End: One Commit, One Graph Update
Developer commits code
│
▼
┌───────────────────────────────────────────────────────┐
│ 1. GitWatcher detects .git/HEAD change (fsnotify) │
│ ├── Debounce timer fires after 500ms of quiet │
│ ├── Read new HEAD commit hash from .git/HEAD │
│ ├── Compare to last known commit (stored in repos) │
│ └── If different: resolve file diff via git │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 2. GitDiffFiles resolves changed/added/deleted files │
│ ├── Runs: git diff --name-status oldCommit newCommit│
│ ├── Parses status codes: M (modified), A (added), │
│ │ D (deleted), R (renamed → delete old + add new) │
│ └── Returns three slices: changed, added, deleted │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 3. CommitEvent sent to watchLoop via GitWatcher.events │
│ ├── watchLoop combines changed + added + deleted │
│ │ into a single indexRequest │
│ └── Sends indexRequest to indexCh (non-blocking) │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 4. indexWorker receives indexRequest from indexCh │
│ ├── Resolves HEAD commit hash │
│ └── Acquires daemon write lock (d.mu.Lock()) │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 5. IndexFunc runs (write lock held) │
│ │
│ For deleted files: │
│ ├── EdgesBySourceFile() to capture "removed" set │
│ ├── DeleteEdgesBySourceFile() │
│ ├── DeleteNodesByFile() │
│ └── Record "removed" edge events │
│ │
│ For changed files: │
│ ├── Delete old nodes/edges (same as deleted) │
│ ├── Re-extract via tree-sitter worker pool │
│ ├── Compute edge diff (old vs. new) │
│ └── Record "added" and "removed" edge events │
│ │
│ For added files: │
│ ├── Extract via tree-sitter worker pool │
│ └── Record "added" edge events │
│ │
│ Batch insert all new nodes, edges, and files │
│ Compute new snapshot (hierarchical Merkle tree: │
│ repo root -> package roots -> edge-type roots) │
│ Link snapshot to parent; store commit hash │
│ Resolve cross-repo dangling edges │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 6. Release write lock (d.mu.Unlock()) │
│ Graph is now queryable with ast_inferred edges. │
│ MCP queries resume immediately. │
└───────────────────────────┬───────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────┐
│ 7. Trigger scoped LSP enrichment (background goroutine)│
│ No write lock held; enrichment uses SQLite WAL mode │
│ │
│ ├── For each detected language server (gopls, │
│ │ pyright, tsserver, rust-analyzer, jdtls, etc.): │
│ ├── Open changed/added files (textDocument/didOpen) │
│ ├── Edge upgrade pass: │
│ │ For each ast_inferred edge in changed files: │
│ │ Query GetDefinition at call-site position │
│ │ If confirmed: delete old edge, insert │
│ │ lsp_resolved edge (confidence 0.9) │
│ ├── Edge discovery pass: │
│ │ For each changed file: │
│ │ GetDocumentSymbols │
│ │ For types: GetImplementation → implements │
│ │ For funcs: GetReferences → references │
│ ├── Close all files │
│ └── Shutdown language server, repeat for next │
└───────────────────────────────────────────────────────┘
┌───────────────────────────────────────────────────────┐
│ 8. Embedding vector indexing (if --embeddings enabled) │
│ Runs in MCP server background at startup. │
│ │
│ ├── Load all nodes via NodesByName("%") │
│ ├── Filter noise (vendor, dist, mocks) │
│ ├── Batch embed in chunks of 64 via hugot ONNX │
│ │ (nomic-code, 768 dims, ~13ms/text batched) │
│ ├── Add vectors to in-memory HNSW index │
│ ├── Persist vectors to SQLite embeddings table │
│ │ (keyed by node_hash + model for cache reuse) │
│ └── Subsequent queries read cached vectors │
│ (re-rank disabled; gap-fill neutral, session 23)│
└───────────────────────────────────────────────────────┘
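Step 2 of the diagram can be illustrated with a small parser. This is a minimal sketch, not the project's actual `GitDiffFiles`: the `DiffResult` type and `ParseNameStatus` function are hypothetical names introduced here. It assumes the tab-separated output of `git diff --name-status`, where rename lines look like `R<score>`, old path, new path, and it does not handle quoted paths.

```go
package main

import (
	"fmt"
	"strings"
)

// DiffResult (hypothetical) groups file paths by how they changed
// between two commits, mirroring the three slices in step 2.
type DiffResult struct {
	Changed, Added, Deleted []string
}

// ParseNameStatus (hypothetical) parses the text printed by
// `git diff --name-status oldCommit newCommit`. Each line is a status
// code, a tab, and one path (two paths for renames).
func ParseNameStatus(out string) DiffResult {
	var r DiffResult
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimRight(line, "\r")
		fields := strings.Split(line, "\t")
		if len(fields) < 2 || fields[0] == "" {
			continue // blank or malformed line
		}
		switch fields[0][0] {
		case 'M':
			r.Changed = append(r.Changed, fields[1])
		case 'A':
			r.Added = append(r.Added, fields[1])
		case 'D':
			r.Deleted = append(r.Deleted, fields[1])
		case 'R':
			// A rename is treated as delete old + add new.
			if len(fields) >= 3 {
				r.Deleted = append(r.Deleted, fields[1])
				r.Added = append(r.Added, fields[2])
			}
		}
	}
	return r
}

func main() {
	out := "M\tcmd/main.go\nA\tpkg/new.go\nD\tpkg/old.go\nR100\tpkg/a.go\tpkg/b.go\n"
	r := ParseNameStatus(out)
	fmt.Println("changed:", r.Changed)
	fmt.Println("added:  ", r.Added)
	fmt.Println("deleted:", r.Deleted)
}
```

Treating a rename as a delete plus an add keeps the downstream indexing logic to three cases, at the cost of re-extracting a file whose content may not have changed.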
Timing Summary
| Phase | Duration (6,000-node repo) | Lock held | Queries blocked |
|---|---|---|---|
| Git diff resolution | ~10ms | None | No |
| Tier 1 extraction (tree-sitter, parallel) | ~1.8s | Write lock | Yes |
| Snapshot computation (hierarchical Merkle tree) | ~5ms | Write lock | Yes |
| Tier 2 enrichment (LSP) | ~8s | None (WAL) | No (background) |
| Embedding index (if enabled) | ~65s (5K nodes) | None (WAL) | No (background) |
The write lock is held only during Tier 1 extraction and snapshot computation, so queries are blocked for approximately 1.8 seconds per commit on the 6,000-node repo measured above. LSP enrichment and embedding indexing run in the background and do not block queries. Embedding vectors are cached in SQLite, so subsequent MCP server startups skip re-embedding for nodes whose vectors are already cached.
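The lock phases described here might be arranged roughly as in the sketch below. It is an illustration under assumptions, not the daemon's real code: `Daemon`, `HandleCommit`, `Query`, and `Wait` are hypothetical names, and `d.mu` is assumed to be a `sync.RWMutex` (the source only shows `d.mu.Lock()` and `d.mu.Unlock()`). The enrichment callback stands in for work that writes through SQLite in WAL mode and therefore never takes `d.mu`.

```go
package main

import (
	"fmt"
	"sync"
)

// Daemon is a hypothetical stand-in; only the d.mu locking mirrors the text.
type Daemon struct {
	mu sync.RWMutex   // assumed RWMutex so queries can share a read lock
	bg sync.WaitGroup // tracks background enrichment goroutines
}

// HandleCommit runs index under the write lock, releases the lock,
// and only then starts enrich in a background goroutine.
func (d *Daemon) HandleCommit(index, enrich func()) {
	d.mu.Lock()
	index() // Tier 1 extraction and snapshot computation; queries wait here
	d.mu.Unlock()

	d.bg.Add(1)
	go func() {
		defer d.bg.Done()
		enrich() // Tier 2 enrichment; does not take d.mu
	}()
}

// Query runs fn under the read lock, so it waits only while
// indexing holds the write lock.
func (d *Daemon) Query(fn func()) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	fn()
}

// Wait blocks until all background enrichment has finished.
func (d *Daemon) Wait() { d.bg.Wait() }

func main() {
	d := &Daemon{}
	release := make(chan struct{})
	var order []string

	d.HandleCommit(
		func() { order = append(order, "index") },
		func() { <-release; order = append(order, "enrich") },
	)
	// Enrichment is still pending, yet the query completes.
	d.Query(func() { order = append(order, "query") })
	close(release)
	d.Wait()
	fmt.Println(order)
}
```

In this sketch the query finishes while enrichment is still blocked on `release`, which is the property the timing table relies on: only the indexing phase delays readers.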