# Roadmap

## Next Up
| Item | Priority | Description |
|---|---|---|
| `mcp-assert ui` command | High | Web-based GUI served by the Go binary. Three modes: Explorer (connect to a server, browse tools, call them interactively), Tracer (live tool-call timeline between agent and server via WebSocket), Debugger (visual assertion-failure inspector with request/response diff). Frontend embedded via embed.FS; no separate install. Reuses the existing `createMCPClient`, `generateArgsFromSchema`, `runAssertion`, and `auditSingleTool`. This is the foundation for hosted audit and the quality registry. |
| Blog post | Ready | "We tested 38 MCP servers from Anthropic, Google, OpenAI, Microsoft, and AWS. Here's what we found." The scorecard data is the content; needs prose around it. Publish on docs site (mkdocs already deployed). |
| MCP server leaderboard | High | Static page on docs site ranking servers by coverage score and pass rate. Data exists for 39 servers. Becomes valuable once there's external traffic (blog post drives traffic). |
| antvis CI integration PR | Blocked on #292 merge | antvis maintainer asked us to add mcp-assert to their CI. Submit follow-up PR with evals/ directory (25 assertions) + GitHub Actions workflow using mcp-assert-action@v1. This is the first external adoption. |
| C# server suites | Medium | modelcontextprotocol/csharp-sdk has examples. Last major language gap (7th language). |
| Reference suite registry | Medium | Canonical protocol conformance assertions any MCP server can run. Independent of server-specific fixtures. "Does this server speak MCP correctly?" |
| Nix flake | Low | Nix users are quality-focused and vocal. |
## `mcp-assert ui` Design

### Architecture
```
mcp-assert ui --server "npx my-server" --port 7890

┌─────────────────────────────────┐
│ Go binary (mcp-assert)          │
│ ├─ HTTP server (embed.FS)       │
│ ├─ WebSocket (live trace)       │
│ ├─ REST API (/api/tools,        │
│ │   /api/call, /api/run)        │
│ └─ MCP client (reuses all       │
│    existing runner code)        │
└─────────────────────────────────┘
```
### Four modes (two phases)
**Phase 1 (launch): Explorer + Debugger.** Self-contained, no LLM key needed, demonstrates the core value.

- **Explorer:** Connect to any MCP server. See all tools, prompts, and resources in a tree. Click a tool to see its JSON Schema; click "Call" to invoke it with editable args. Responses are displayed with syntax highlighting, and a "Save as assertion" button turns any call into a YAML test case. An interactive version of the audit command.
- **Debugger:** Run a suite from the UI. Failures appear in a list; click one to see the request, the actual response, the expected values, and the specific expectation that failed, with a side-by-side diff view. A "Fix" button suggests YAML edits (a visual version of --fix mode), and "Export suite" generates YAML plus a GitHub Actions workflow.
**Phase 2 (after launch): Agent + Tracer.** These require LLM config and WebSocket proxy infrastructure.

- **Agent:** Connect an LLM (OpenAI, Anthropic, etc.) and let it drive the server's tools via a ReAct loop. Watch the tool call chain in real time, with a tool confirmation mode (approve/deny before execution). Record the full session as a trajectory YAML for CI regression testing. This is ProtoMCP's agent mode plus assertions.
- **Tracer:** Proxy between an external agent (Claude Code, Cursor, etc.) and an MCP server. Every tool call appears in a live timeline via WebSocket; click to expand request args, response body, duration, and isError. Filter by tool name, status, or duration. Export the session as trajectory YAML. Builds on the existing intercept command.
The funnel: Explorer ("does my server work?") leads to Debugger ("why did this fail?") leads to Agent ("how does an LLM use my tools?") leads to Tracer ("what is my production agent doing?"). Each mode feeds the next.
### Frontend stack
Preact + Tailwind CSS, compiled via esbuild to a single bundle.js, embedded in the Go binary via //go:embed. Same API as React (JSX, useState, useEffect), 3KB instead of 45KB. esbuild compiles in ~50ms.
Dev workflow: edit JSX, run esbuild (one command, 50ms), built JS committed to repo. Users never run a build step; the frontend is already inside the Go binary they download.
Why Preact over alternatives:

- **vs React:** same API, 1/15th the size, which matters for an embedded binary.
- **vs Vanilla JS:** component reuse (ToolCard, SchemaForm, TraceEntry), reactive state for WebSocket streams, and list rendering; vanilla JS becomes unmanageable at 10+ interactive components.
- **vs HTMX:** the wrong fit for real-time WebSocket data streams and complex client-side state (trace timeline, form editing).
Inspiration from ProtoMCP (SahanUday/ProtoMCP): three-column layout (server list | main content | JSON-RPC log panel), auto-generated forms from JSON Schema, real-time trace timeline with color-coded events, tool confirmation mode for destructive calls. Our differentiation: "Save as assertion" button, expected vs actual diff, CI export, all three transports (stdio/SSE/HTTP), and the testing/assertion layer ProtoMCP completely lacks.
### Scaling path
The single binary with an embedded UI scales fine for the local tool (one user, localhost, 1-10 servers), and the pattern has headroom: Grafana uses the same Go-binary-plus-frontend approach at millions of lines of frontend TypeScript.
For the hosted platform (multi-user, persistent storage, queued jobs, billing), the same Go engine (internal/runner, internal/assertion, internal/report) gets wrapped in a production web service with a database, auth, and CDN-served frontend. No rewrite; the local UI is both a standalone product and a prototype for the hosted version.
Phase 1: mcp-assert ui → single binary, localhost, embedded frontend
Phase 2: mcp-assert-cloud → deployed service, same Go engine, production frontend
Build local first. Adoption proves demand. Demand justifies hosted.
## Platform Direction
The ui command is the local version. The platform is the hosted version of the same UI, with accounts and persistence.
### Monetization sequence
OSS CLI (free) → UI local (free) → hosted audit (freemium) → registry (paid) → monitoring (SaaS)
| Tier | What | Pricing |
|---|---|---|
| Free (OSS) | CLI, GitHub Action, all assertion types, local UI | Free forever |
| Hosted audit | Paste a server URL, get results in the browser. No CLI install. | Free: 5 audits/month. Paid: unlimited. |
| Quality registry | Public leaderboard. Server authors claim listings, add verified badge, show CI status. | Free listing. Verified badge: paid. |
| Continuous monitoring | Run assertion suite on schedule against live servers. Alert on regression (Slack, email, PagerDuty). | $29/mo per server, $99/mo teams |
| Team dashboard | Shared view of org's MCP servers, coverage, pass rates, trends. Role-based access, audit logs. | Enterprise pricing |
The quality registry (mcp-assert.dev) becomes the "npm audit for MCP": users check before adopting a server, authors add the badge for trust. Revenue comes from verified listings and continuous monitoring.
Viability depends on MCP ecosystem growth. If MCP becomes the standard agent-to-tool protocol (Anthropic, OpenAI, Google all pushing it), the quality layer is infrastructure.
## Open PRs and Issues
| PR/Issue | Description | Status | What happens when it merges |
|---|---|---|---|
| antvis/mcp-server-chart#292 | Fix: isError on chart failures | Open, maintainer engaged | Submit CI integration PR immediately |
| grafana/mcp-grafana#793 | Fix: timestamp validation | Open, CLA signed | Update scorecard, unskip assertion |
| mark3labs/mcp-go#828 | Fix: stderr hooks | Open | Update scorecard |
| modelcontextprotocol/servers#4044 | Fix: blob content type (community) | Open | Update scorecard, unskip filesystem assertion |
| modelcontextprotocol/servers#4051 | Fix: puppeteer_navigate isError | Open (archived branch) | Update scorecard, unskip assertion |
| sammcj/mcp-devtools#258 | Fix: isError instead of internal error | Open | Update scorecard, unskip assertions |
| steipete/Peekaboo#108 | Issue: internal error on missing perms | Open | Swift fix, not pursuing PR |
## Coverage Expansion Opportunities
| Server | Current | Potential | Blocker |
|---|---|---|---|
| Playwright | 67% (14/21) | ~85% | click/hover/drag need snapshot element refs (multi-step chaining) |
| Google Storage | 35% (6/17) | ~80% | Needs GCP credentials (use skip_unless_env) |
| Grafana | 34% (17/50) | ~60% | Needs running Grafana instance (docker-compose with service container) |
| git-mcp (idosal) | 39% (14/36) | ~60% | Many write tools need valid repo state |
| Perplexity | 100% auth errors only | 100% real | Needs API key ($5 free credits) |
## MCP Protocol Coverage
10 of 12 MCP protocol methods covered. Two gaps remain (low priority, rarely used):
| Protocol area | Status |
|---|---|
| Cancellation (notifications/cancelled) | Not covered |
| Ping keepalive | Not covered |
## Assertion Engine
| Item | Priority | Description |
|---|---|---|
| Structured recovery actions | Medium | When an assertion fails, return machine-readable guidance. Agents consuming mcp-assert output could self-correct. |
| Invariant drift detection | Medium | Snapshot state before a tool call, compare after. |
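One way to sketch the before/after comparison behind drift detection (`Snapshot` and `Drift` are illustrative names, not existing mcp-assert APIs; the key-value view could be resource listings, file hashes, or row counts):

```go
package main

import "fmt"

// Snapshot is a hypothetical key→value view of observable server state
// (e.g. file hashes or resource listings) taken before and after a tool call.
type Snapshot map[string]string

// Drift reports keys that were added, removed, or changed between snapshots.
func Drift(before, after Snapshot) (added, removed, changed []string) {
	for k, v := range after {
		if old, ok := before[k]; !ok {
			added = append(added, k)
		} else if old != v {
			changed = append(changed, k)
		}
	}
	for k := range before {
		if _, ok := after[k]; !ok {
			removed = append(removed, k)
		}
	}
	return
}

func main() {
	before := Snapshot{"file:a.txt": "h1", "file:b.txt": "h2"}
	after := Snapshot{"file:a.txt": "h9", "file:c.txt": "h3"}
	a, r, c := Drift(before, after)
	fmt.Println(a, r, c) // [file:c.txt] [file:b.txt] [file:a.txt]
}
```

An assertion could then fail whenever a read-only tool produces a non-empty diff.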
## Recently Shipped
| Item | Version | Description |
|---|---|---|
| getsentry/XcodeBuildMCP suite | Unreleased | 10 assertions, 27 tools discovered, 100% clean. First macOS-specific server. Server #39. |
| `mcp-assert audit` command | Unreleased | Zero-config quality audit. Connects, discovers tools, calls each with schema-generated inputs, and reports a quality score. Generates starter YAML for CI. Discovery on-ramp to the YAML workflow. |
| `skip_unless_env` field | Unreleased | Conditional assertion skipping based on env vars. Live-backend and no-auth assertions coexist in the same suite. |
| Per-assertion Docker isolation | Unreleased | docker: field in server YAML. Fresh container per assertion for safe write testing. |
| Coverage expansion | Unreleased | SQLite 100%, Memory 100%, engram 100%. Anthropic git 92%, Playwright 67%. |
| Perplexity, Peekaboo, CodeGraphContext, deep-research suites | Unreleased | 39 servers, 472 assertions, 6 languages, 15 bugs. |
| pytest plugin | 0.5.0 | pip install pytest-mcp-assert. Published to PyPI via release pipeline. |
| Badge snippet on pass | 0.5.0 | CLI and GitHub Action output ready-to-paste badge markdown. |
| SSE transport fix | 0.4.0 | Start() missing for SSE/HTTP clients. Found by dogfooding. |
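The schema-generated inputs the audit command relies on can be sketched as follows. This is a simplified stand-in, not the real `generateArgsFromSchema`, which would also need to handle required fields, enums, defaults, and nested object schemas:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// placeholderArgs builds minimal arguments from a tool's JSON Schema by
// walking its top-level properties and emitting a placeholder per type.
func placeholderArgs(schema map[string]any) map[string]any {
	args := map[string]any{}
	props, _ := schema["properties"].(map[string]any)
	for name, raw := range props {
		p, _ := raw.(map[string]any)
		switch p["type"] {
		case "string":
			args[name] = "example"
		case "number", "integer":
			args[name] = 1
		case "boolean":
			args[name] = true
		case "array":
			args[name] = []any{}
		default:
			args[name] = map[string]any{}
		}
	}
	return args
}

func main() {
	var schema map[string]any
	json.Unmarshal([]byte(`{"type":"object","properties":{"path":{"type":"string"},"limit":{"type":"integer"}}}`), &schema)
	out, _ := json.Marshal(placeholderArgs(schema))
	fmt.Println(string(out)) // {"limit":1,"path":"example"}
}
```

Even this naive version is enough to exercise every tool once and surface crashes, missing isError flags, and schema/behavior mismatches.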