Skip to content

Roadmap

Next Up

Item Priority Description
mcp-assert ui command High Web-based GUI served by the Go binary. Three modes: Explorer (connect to a server, browse tools, call them interactively), Tracer (live tool call timeline between agent and server via WebSocket), Debugger (visual assertion failure inspector with request/response diff). Frontend embedded via embed.FS, no separate install. Reuses existing createMCPClient, generateArgsFromSchema, runAssertion, auditSingleTool. This is the foundation for hosted audit and the quality registry.
Blog post Ready "We tested 38 MCP servers from Anthropic, Google, OpenAI, Microsoft, and AWS. Here's what we found." The scorecard data is the content; needs prose around it. Publish on docs site (mkdocs already deployed).
MCP server leaderboard High Static page on docs site ranking servers by coverage score and pass rate. Data exists for 39 servers. Becomes valuable once there's external traffic (blog post drives traffic).
antvis CI integration PR Blocked on #292 merge antvis maintainer asked us to add mcp-assert to their CI. Submit follow-up PR with evals/ directory (25 assertions) + GitHub Actions workflow using mcp-assert-action@v1. This is the first external adoption.
C# server suites Medium modelcontextprotocol/csharp-sdk has examples. Last major language gap (7th language).
Reference suite registry Medium Canonical protocol conformance assertions any MCP server can run. Independent of server-specific fixtures. "Does this server speak MCP correctly?"
Nix flake Low Nix users are quality-focused and vocal.

mcp-assert ui Design

Architecture

mcp-assert ui --server "npx my-server" --port 7890

┌─────────────────────────────────┐
│  Go binary (mcp-assert)        │
│  ├─ HTTP server (embed.FS)     │
│  ├─ WebSocket (live trace)     │
│  ├─ REST API (/api/tools,      │
│  │   /api/call, /api/run)      │
│  └─ MCP client (reuses all     │
│      existing runner code)      │
└─────────────────────────────────┘

Four modes (two phases)

Phase 1 (launch): Explorer + Debugger. Self-contained, no LLM key needed, demonstrates core value.

Explorer: Connect to any MCP server. See all tools, prompts, resources in a tree. Click a tool to see its JSON Schema. Click "Call" to invoke with editable args. Response displayed with syntax highlighting. "Save as assertion" button turns any call into a YAML test case. Interactive version of the audit command.

Debugger: Run a suite from the UI. Failures appear in a list. Click a failure: see request, actual response, expected values, specific expectation that failed. Side-by-side diff view. "Fix" button suggests YAML edits (visual version of --fix mode). "Export suite" generates YAML + GitHub Actions workflow.

Phase 2 (after launch): Agent + Tracer. Require LLM config and WebSocket proxy infrastructure.

Agent: Connect an LLM (OpenAI, Anthropic, etc.), let it drive the server's tools via ReAct loop. Watch the tool call chain in real time. Tool confirmation mode (approve/deny before execution). Record the full session as a trajectory YAML for CI regression testing. This is ProtoMCP's agent mode plus assertions.

Tracer: Proxy between an external agent (Claude Code, Cursor, etc.) and an MCP server. Every tool call appears in a live timeline via WebSocket. Click to expand: request args, response body, duration, isError. Filter by tool name, status, duration. Export session as trajectory YAML. Builds on the existing intercept command.

The funnel: Explorer ("does my server work?") leads to Debugger ("why did this fail?") leads to Agent ("how does an LLM use my tools?") leads to Tracer ("what is my production agent doing?"). Each mode feeds the next.

Frontend stack

Preact + Tailwind CSS, compiled via esbuild to a single bundle.js, embedded in the Go binary via //go:embed. Same API as React (JSX, useState, useEffect), 3KB instead of 45KB. esbuild compiles in ~50ms.

Dev workflow: edit JSX, run esbuild (one command, 50ms), built JS committed to repo. Users never run a build step; the frontend is already inside the Go binary they download.

Why Preact over alternatives: - vs React: same API, 1/15th the size. Matters for an embedded binary. - vs Vanilla JS: component reuse (ToolCard, SchemaForm, TraceEntry), reactive state for WebSocket streams, list rendering. Vanilla JS becomes unmanageable at 10+ interactive components. - vs HTMX: wrong fit for real-time WebSocket data streams and complex client-side state (trace timeline, form editing).

Inspiration from ProtoMCP (SahanUday/ProtoMCP): three-column layout (server list | main content | JSON-RPC log panel), auto-generated forms from JSON Schema, real-time trace timeline with color-coded events, tool confirmation mode for destructive calls. Our differentiation: "Save as assertion" button, expected vs actual diff, CI export, all three transports (stdio/SSE/HTTP), and the testing/assertion layer ProtoMCP completely lacks.

Scaling path

The single binary with embedded UI scales for the local tool (one user, localhost, 1-10 servers). Grafana uses the same pattern at millions of lines of frontend TypeScript.

For the hosted platform (multi-user, persistent storage, queued jobs, billing), the same Go engine (internal/runner, internal/assertion, internal/report) gets wrapped in a production web service with a database, auth, and CDN-served frontend. No rewrite; the local UI is both a standalone product and a prototype for the hosted version.

Phase 1: mcp-assert ui       → single binary, localhost, embedded frontend
Phase 2: mcp-assert-cloud    → deployed service, same Go engine, production frontend

Build local first. Adoption proves demand. Demand justifies hosted.

Platform Direction

The ui command is the local version. The platform is the hosted version of the same UI, with accounts and persistence.

Monetization sequence

OSS CLI (free) → UI local (free) → hosted audit (freemium) → registry (paid) → monitoring (SaaS)
Tier What Pricing
Free (OSS) CLI, GitHub Action, all assertion types, local UI Free forever
Hosted audit Paste a server URL, get results in the browser. No CLI install. Free: 5 audits/month. Paid: unlimited.
Quality registry Public leaderboard. Server authors claim listings, add verified badge, show CI status. Free listing. Verified badge: paid.
Continuous monitoring Run assertion suite on schedule against live servers. Alert on regression (Slack, email, PagerDuty). $29/mo per server, $99/mo teams
Team dashboard Shared view of org's MCP servers, coverage, pass rates, trends. Role-based access, audit logs. Enterprise pricing

The quality registry (mcp-assert.dev) becomes the "npm audit for MCP": users check before adopting a server, authors add the badge for trust. Revenue comes from verified listings and continuous monitoring.

Viability depends on MCP ecosystem growth. If MCP becomes the standard agent-to-tool protocol (Anthropic, OpenAI, Google all pushing it), the quality layer is infrastructure.

Open PRs and Issues

PR/Issue Repo Status What happens when it merges
antvis/mcp-server-chart#292 Fix: isError on chart failures Open, maintainer engaged Submit CI integration PR immediately
grafana/mcp-grafana#793 Fix: timestamp validation Open, CLA signed Update scorecard, unskip assertion
mark3labs/mcp-go#828 Fix: stderr hooks Open Update scorecard
modelcontextprotocol/servers#4044 Fix: blob content type (community) Open Update scorecard, unskip filesystem assertion
modelcontextprotocol/servers#4051 Fix: puppeteer_navigate isError Open (archived branch) Update scorecard, unskip assertion
sammcj/mcp-devtools#258 Fix: isError instead of internal error Open Update scorecard, unskip assertions
steipete/Peekaboo#108 Issue: internal error on missing perms Open Swift fix, not pursuing PR

Coverage Expansion Opportunities

Server Current Potential Blocker
Playwright 67% (14/21) ~85% click/hover/drag need snapshot element refs (multi-step chaining)
Google Storage 35% (6/17) ~80% Needs GCP credentials (use skip_unless_env)
Grafana 34% (17/50) ~60% Needs running Grafana instance (docker-compose with service container)
git-mcp (idosal) 39% (14/36) ~60% Many write tools need valid repo state
Perplexity 100% auth errors only 100% real Needs API key ($5 free credits)

MCP Protocol Coverage

10 of 12 MCP protocol methods covered. Two gaps remain (low priority, rarely used):

Protocol area Status
Cancellation ($/cancelRequest) Not covered
Ping keepalive Not covered

Assertion Engine

Item Priority Description
Structured recovery actions Medium When an assertion fails, return machine-readable guidance. Agents consuming mcp-assert output could self-correct.
Invariant drift detection Medium Snapshot state before a tool call, compare after.

Recently Shipped

Item Version Description
getsentry/XcodeBuildMCP suite Unreleased 10 assertions, 27 tools discovered, 100% clean. First macOS-specific server. Server #39.
mcp-assert audit command Unreleased Zero-config quality audit. Connects, discovers tools, calls each with schema-generated inputs, reports quality score. Generates starter YAML for CI. Discovery on-ramp to the YAML workflow.
skip_unless_env field Unreleased Conditional assertion skipping based on env vars. Live-backend and no-auth assertions coexist in same suite.
Per-assertion Docker isolation Unreleased docker: field in server YAML. Fresh container per assertion for safe write testing.
Coverage expansion Unreleased SQLite 100%, Memory 100%, engram 100%. Anthropic git 92%, Playwright 67%.
Perplexity, Peekaboo, CodeGraphContext, deep-research suites Unreleased 39 servers, 472 assertions, 6 languages, 15 bugs.
pytest plugin 0.5.0 pip install pytest-mcp-assert. Published to PyPI via release pipeline.
Badge snippet on pass 0.5.0 CLI and GitHub Action output ready-to-paste badge markdown.
SSE transport fix 0.4.0 Start() missing for SSE/HTTP clients. Found by dogfooding.