# New User Comprehensibility Audit
**Date:** 2026-04-23
**Auditor:** Simulated new user (developer, TypeScript/Go, MCP server author)
**Overall Score:** 8.5/10
The documentation is comprehensive, well-structured, and technically accurate. A new user with MCP experience can go from zero to running assertions in under 5 minutes. However, there are several gaps that would cause confusion, especially around the recommended onboarding path and the relationship between commands.
## Path 1: README → Getting Started
### Strengths
- **Immediate value proposition:** The opening line ("The testing standard for deterministic MCP tools") establishes positioning in 3 seconds. The scope statement ("Works with any language, any transport") addresses the first question a polyglot MCP developer would ask.
- **Accurate Quick Start:** The three-command quick start works as advertised:

  ```bash
  go install github.com/blackwell-systems/mcp-assert/cmd/mcp-assert@latest
  mcp-assert init evals
  mcp-assert run --suite evals/ --fixture evals/fixtures
  ```

  All commands exist, flags are correct, and the path creates a runnable assertion.
- **Clear positioning vs LLM-as-judge:** The "When to use what" table immediately answers a question every MCP server author would have. This is excellent. The table format makes it scannable.
- **Zero-Effort Coverage is compelling:** The generate/snapshot/run flow is a legitimate 30-second path from "I want tests" to "I have tests." This should be the hero feature (see the sketch after this list).
- **Getting Started doc structure:** The progression from template scaffold → manual assertion → server-based generation → zero-effort coverage is logical. Each step builds on the previous one.
- **Accurate examples:** The YAML examples in `getting-started.md` match the actual schema and work when copied. The `{{fixture}}` substitution is explained early and consistently.
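To make the hero feature concrete, here is a minimal sketch of that flow. The `generate` invocation mirrors the one shown in the examples doc; whether `generate` alone captures baselines (versus a separate `snapshot` step) is not verified here.

```bash
# Zero-effort coverage sketch: stub YAMLs and baselines from the server's
# tools/list, then run the generated suite. The server name is illustrative.
mcp-assert generate --server "your-mcp-server" --output evals/
mcp-assert run --suite evals/ --server "your-mcp-server"
```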
### Gaps
- **Quick Start in README omits fixtures:** The Quick Start shows:

  ```bash
  mcp-assert init evals
  mcp-assert run --suite evals/ --fixture evals/fixtures
  ```

  But `init` (without `--server`) creates `evals/fixtures/hello.txt` automatically. A new user might think they need to create `evals/fixtures` themselves. This isn't a blocker (the command will succeed), but it's a minor source of confusion.
- **Two "recommended" paths with no guidance:** The README Quick Start shows the template path (`init evals`), while Getting Started §1 ("One-step suite generation (recommended)") shows the server-based path (`init evals --server`). Both are labeled as the default or recommended path. A new user doesn't know which to choose. The README should say "If you have a running server, use `--server`; otherwise, start with the template."
- **Missing context on when to write assertions manually:** Getting Started §3 ("Write an assertion by hand") doesn't explain when you'd choose this over `init` or `generate`. The answer: when you know your server config and want more control than the template provides, but don't want to run `generate`. This is a valid path but needs a one-sentence "when to use" callout.
- **"Zero-Effort Coverage" appears twice:** It's in the README (lines 67-78) and Getting Started (lines 121-147). The duplication isn't harmful, but the README version is more polished (it includes expected output), while the Getting Started version adds the individual-step breakdown, which is useful. Consider: the README gets the one-liner version, Getting Started gets the detailed version with rationale.
- **No mention of `--fix` in Quick Start paths:** The `--fix` flag is documented in cli.md and writing-assertions.md, but never surfaces in the initial "your first run" flow. A new user will hit position-sensitive failures (especially in LSP servers) and won't know `--fix` exists. Add a one-line callout in Getting Started §2 after the first `run` example: "If a position-sensitive assertion fails, pass `--fix` to get a suggested correction."
- **Fixture directory ambiguity:** Getting Started line 14 says `init` with `--server` accepts `--fixture ./fixtures`, but the example output (lines 19-39) doesn't show where fixtures are created or expected. Does `--server` mode create a fixture directory? Does it require one to exist? The answer (from code review): it doesn't create fixtures; it uses `--fixture` only for `{{fixture}}` substitution in generated YAMLs. If `--fixture` is omitted, `{{fixture}}` appears literally in the generated files. This should be clarified (see the sketch after this list).
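A minimal sketch of the `--fixture` behavior just described; the commands use documented flags, and the annotated outcomes restate the code-review finding above rather than verified output.

```bash
mcp-assert init evals --server "my-server"
# -> generated YAMLs contain the literal string {{fixture}}

mcp-assert init evals --server "my-server" --fixture ./fixtures
# -> {{fixture}} is substituted with real paths under ./fixtures
```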
### Suggestions
- Add a one-sentence "which path should I use?" decision tree at the top of Getting Started:

  > **Which path should I use?**
  > - Have a running MCP server? → `init evals --server "my-server"` (generates stubs + baselines)
  > - Want to start from a template? → `init evals` (creates one commented YAML)
  > - Know your server config already? → Write the YAML directly (§3)

- In the README Quick Start, add one line clarifying fixture creation:

  ```bash
  mcp-assert init evals   # Creates evals/read_file.yaml and evals/fixtures/hello.txt
  ```

- Consolidate "Zero-Effort Coverage": the README keeps the polished one-liner version with output, Getting Started keeps the detailed breakdown. Add a cross-reference in the README: "See Getting Started for step-by-step details."
- Add `--fix` to the "Next steps" output of `init`. Currently it prints: `Next steps: Run the suite: mcp-assert run --suite evals --server "my-server"`. It should also print: `Fix position failures: mcp-assert run --suite evals --server "my-server" --fix`.
## Path 2: CI Integration
### Strengths
- **One-liner GitHub Action:** The example at the top (lines 6-9) is copy-paste ready and works. The dedicated action (`blackwell-systems/mcp-assert-action@v1`) is mentioned but not required, which is correct.
- **All flags documented:** `--threshold`, `--fail-on-regression`, `--baseline`, `--save-baseline`, `--junit`, `--markdown`, and `--badge` are all explained with examples.
- **Regression detection is clear:** The baseline/compare flow (lines 61-71) explains exactly what counts as a regression ("PASS to FAIL") and what doesn't ("new tests that fail"). This is critical for CI adoption (see the sketch after this list).
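A sketch of that baseline/compare flow using only flags documented in ci-integration.md; the `baseline.json` file name is illustrative.

```bash
# On main: record the current pass/fail state as the baseline.
mcp-assert ci --suite evals/ --save-baseline baseline.json

# On a branch: fail the build only if a previously passing assertion now fails.
mcp-assert ci --suite evals/ --baseline baseline.json --fail-on-regression
```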
### Gaps
- **Auto-detection of `$GITHUB_STEP_SUMMARY` is mentioned but not explained:** Line 44 says "`ci` mode auto-detects `$GITHUB_STEP_SUMMARY` for markdown output," but doesn't say what happens (markdown is written there automatically) or how to override it. A new user might wonder if they need `--markdown` in GitHub Actions. The answer: no, it's automatic in `ci` mode. Add one sentence: "In `ci` mode on GitHub Actions, markdown is written to `$GITHUB_STEP_SUMMARY` automatically (no `--markdown` flag needed)."
- **No example of `--fail-on-regression` with `--threshold`:** What happens if you set both? Does a regression fail the build even if the threshold is met? The answer (from cli.md and typical CI semantics): both are checked independently; either can fail the build. Clarify: "You can combine `--threshold` and `--fail-on-regression`. Both are checked; either can cause a failure."
- **Badge example is incomplete:** Line 56 shows the badge output format but doesn't explain where to host `badge.json` or how to use the URL. A new user would need to look up shields.io endpoint syntax. Add one sentence: "Host `badge.json` at a public URL (GitHub Pages, GitHub releases, or a CDN), then use `https://img.shields.io/endpoint?url=<badge-url>`." (A sketch follows this list.)
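For the badge gap, a hedged example of what that sentence could point to; the hosting URL is hypothetical, and the shields.io endpoint syntax is the standard one.

```markdown
<!-- badge.json published via GitHub Pages (hypothetical URL) -->
![assertions](https://img.shields.io/endpoint?url=https://example.github.io/my-repo/badge.json)
```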
### Suggestions
- Add a "Typical CI workflow" section that combines all the pieces:

  ```yaml
  - name: Run assertions
    run: |
      go install github.com/blackwell-systems/mcp-assert/cmd/mcp-assert@latest
      mcp-assert ci --suite evals/ --threshold 95 --baseline baseline.json --fail-on-regression --junit results.xml
  - name: Upload results
    if: always()
    uses: actions/upload-artifact@v4
    with:
      name: test-results
      path: results.xml
  ```

  This shows how all the flags work together in a realistic workflow.
- Clarify the `--markdown` behavior in GitHub Actions (see Gap #1 above).
## Path 3: CLI Reference
### Strengths
- **Command table is accurate:** All 10 commands (audit, init, run, ci, matrix, coverage, generate, snapshot, watch, intercept) are listed, plus `version`. I verified these against `main.go` and all exist.
- **Flag tables are complete:** Checked `run` flags against `commands.go` and all documented flags exist. The `--fix` flag is documented in both `run` and `ci` (lines 61, 78), which is correct.
- **`intercept` is documented:** Lines 178-192 explain the proxy behavior, the `--trajectory` flag, and the use case. This is a non-obvious command and the explanation is clear.
- **"YAML-level feature" callouts:** Lines 245-339 explain which features (client_capabilities, assert_resources, assert_prompts, trajectory, progress, transport) are YAML-only and have no CLI equivalents. This prevents a new user from searching for a `--client-capabilities` flag that doesn't exist.
- **Docker isolation caveats:** Line 241 notes that Docker isolation only works with stdio transport, not HTTP/SSE. This is critical for HTTP server authors to know.
### Gaps
- **`--fix` is not in the command usage summary:** Lines 6-14 show the usage for each command, but `--fix` is never listed. It appears in the flag table (line 61) and the description (line 78), but not in the `run` or `ci` usage lines. The usage should show: `mcp-assert run --suite <path> [--fix] [flags]` and `mcp-assert ci --suite <path> [--fix] [flags]`. (This is consistent with the main.go printUsage output, which also omits `--fix`; it should be added there too.)
- **`intercept --trajectory` is shown as required in usage but the description never says so:** Line 14 shows `--trajectory <path>` with no brackets (implying required), but the description on line 188 ("YAML file containing trajectory assertions to validate on disconnect") carries no "required" marker. The code (intercept.go line 22) shows it IS required. The description should say "required" explicitly.
- **No explanation of what happens when the `--server` CLI flag overrides the YAML `server:` block:** Line 52 says "`--server` overrides server command from CLI instead of per-YAML" but doesn't explain what happens to `server.args` or `server.env` from the YAML. Does the CLI override everything, or just `command`? The answer (from code review): CLI `--server` replaces the entire server block (command + args), and the YAML `env` is ignored. This should be clarified with an example.
- **`--interval` for watch mode is listed but not explained:** Line 171 shows `--interval <duration>` with a default of `2s`, but doesn't say what it does (the polling interval for file changes). This is mentioned in the intro (line 173: "Polls for changes"), but the flag description should be explicit: "How often to check for YAML file changes (default: 2s)." (A sketch follows this list.)
- **The Server Override section (lines 206-211) duplicates the `--server` flag description:** This section repeats information from the flag table. It adds a concrete example, which is useful, but the duplication is noticeable. Consider: move the example up to the flag table, or frame this section as "Example: Override server config for an entire suite."
- **Reliability metrics example is excellent but misplaced:** Lines 342-366 show a multi-trial run with pass@k/pass^k output. This is one of the best examples in the entire doc, but it sits at the end of a long reference. Consider: move it to a "Key Features" section near the top, or add a callout in the `--trials` flag description (line 53) pointing to the example.
### Suggestions
- Add `[--fix]` to the `run` and `ci` usage lines (lines 7, 8) and to the main.go printUsage output.
- Mark `--trajectory` as required in the flag table (line 188): "Path to YAML file with trajectory assertions (required)."
- Expand the `--server` flag description (line 52) with a behavioral note: "Override the server command for all assertions. Replaces the entire `server:` block in each YAML (command + args); the YAML's `env:` block is ignored. Example: `--server "agent-lsp go:gopls"`." (A YAML sketch follows this list.)
- Expand the `--interval` description (line 171): "How often to poll the `--suite` directory for YAML file changes (default: 2s)."
- Add a "Key Features & Examples" section at the top of cli.md, before the command table, highlighting reliability metrics, Docker isolation, and fix mode with their example outputs. Then link to them from the flag descriptions.
## Path 4: Examples
### Strengths
- **Coverage is excellent:** 18 suites (17 server + 1 trajectory), 12 different servers, 3 languages, 174 assertions. This is the best examples section I've seen in any testing tool.
- **Each example has clear setup instructions:** Every suite shows the install command (npm, uvx, git clone) and the `mcp-assert run` invocation. A new user can copy-paste and run any example in under 60 seconds.
- **Annotations explain what's being tested:** Each suite description lists the tools/features covered and notes the coverage percentage ("92% tool coverage (13/14 tools)"). This is extremely helpful for understanding what a comprehensive test suite looks like.
- **Trajectory suite is well-explained:** Lines 215-236 explain that trajectory assertions validate skill protocols, run without a server, and can use inline traces or audit logs. The table showing constraints per skill is a great reference.
- **Real-world servers:** filesystem, memory, sqlite, fastmcp, agent-lsp, mcp-go, github-mcp. These are the servers people actually use. The examples aren't toy demos.
- **Transport diversity:** stdio (most suites) and HTTP (mcp-go-everything-http). This shows the tool works with both transports.
### Gaps
- **The fastmcp note about `/tmp/fastmcp` is easy to miss:** Lines 75-79 have a note saying you need to clone the fastmcp repo to `/tmp/fastmcp` before running. This is critical (the assertions will fail without it), but it's in a note block that could be skipped. The "install" line should be:

  ```bash
  git clone --depth 1 https://github.com/PrefectHQ/fastmcp.git /tmp/fastmcp
  mcp-assert run --suite examples/fastmcp-testing-demo
  ```

  (This is shown in lines 75-76, so it's not wrong, just easy to miss.)
- **The agent-lsp example requires fixtures but doesn't show where to get them:** Line 88 shows `--fixture /path/to/go/fixtures` but doesn't say where these fixtures are or how to create them. Are they in the agent-lsp repo? Should the user create them? The answer (from code review): the agent-lsp repo includes test fixtures in `test/fixtures/go`. This should be stated: "Use the fixtures from the agent-lsp repo: `--fixture /path/to/agent-lsp/test/fixtures/go`."
- **No example showing how to adapt an example to your own server:** A new user might think they need to copy the entire `examples/filesystem/` directory and modify every YAML. They don't; they can just run `generate` or `init --server`. But the examples doc never says this. Add an "Adapting Examples" section at the end: "To test your own server, use `mcp-assert generate --server "your-server" --output evals/` instead of copying these examples."
- **The summary table doesn't show which examples use advanced features:** The table (lines 7-26) shows server, language, transport, and assertion count, but not which examples demonstrate setup steps, capture, client_capabilities, trajectory, etc. A new user looking for "an example with setup steps" would need to read every description. Add a "Key Features" column showing: setup, capture, client_capabilities, negative tests, stateful, trajectory.
- **The mcp-go longRunningOperation skip isn't explained in the main examples table:** Line 95 mentions the known bug (a stdio transport issue), but this isn't visible in the summary table. If a new user runs `mcp-assert run --suite examples/mcp-go-everything`, they'll see a SKIP and wonder why. Add a note in the suite description: "One assertion (longRunningOperation) is skipped due to a known mcp-go stdio bug."
- **The GitHub MCP Server example requires a token but doesn't explain how to get one:** Line 207 shows `GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_TOKEN mcp-assert run ...` but doesn't explain what scopes the token needs or where to create it. Add one sentence: "Create a token at https://github.com/settings/tokens with `repo` and `read:user` scopes."
### Suggestions
- Add a "Key Features" column to the summary table showing which suites demonstrate: setup, capture, client_capabilities, negative tests, stateful, trajectory, multi-file, auth (see the sketch after this list).
- Expand the agent-lsp fixture note (line 88):

  ```bash
  # Clone agent-lsp if not already present
  git clone https://github.com/blackwell-systems/agent-lsp.git /tmp/agent-lsp
  mcp-assert run --suite examples/agent-lsp-go --fixture /tmp/agent-lsp/test/fixtures/go
  ```

- Add an "Adapting Examples to Your Server" section at the end:

  > **Adapting Examples to Your Server**
  >
  > These examples are for reference. To test your own MCP server, use:
  >
  > ```bash
  > mcp-assert generate --server "your-mcp-server" --output evals/
  > ```
  >
  > This queries your server's `tools/list`, generates stub YAMLs, and captures baselines automatically.

- Add the GitHub token scope note (line 207):

  ```bash
  # Create a token at https://github.com/settings/tokens with repo + read:user scopes
  GITHUB_PERSONAL_ACCESS_TOKEN=$GITHUB_TOKEN mcp-assert run --suite examples/github-mcp
  ```
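A hedged sketch of the proposed "Key Features" column. Suite names and languages come from this audit; the per-suite feature tags and counts are illustrative placeholders, not verified data.

| Suite | Language | Transport | Assertions | Key Features |
|-------|----------|-----------|------------|--------------|
| filesystem | TypeScript | stdio | … | setup, negative tests |
| agent-lsp-go | Go | stdio | … | capture, stateful |
| skill-trajectories | — | — | … | trajectory |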
## Critical Gaps (would block a new user)
**1. No guidance on choosing between `init` (template) and `init --server` (generate)**

Impact: A new user reading the README will use `init evals`, then read Getting Started and see `init evals --server` described as "recommended." They'll wonder if they took the wrong path.

Fix: Add a decision tree at the top of Getting Started (see the Path 1 suggestions above).

**2. `--fixture` behavior with `init --server` is unclear**

Impact: A new user running `init evals --server "my-server"` without `--fixture` will get YAMLs with literal `{{fixture}}` strings. They won't know this is wrong until they run the assertions and get cryptic errors.

Fix: Document in Getting Started §1 that `--fixture` is optional but recommended: "If omitted, `{{fixture}}` appears literally in generated YAMLs. Pass `--fixture ./fixtures` to substitute real paths."

**3. No mention of `--fix` in initial onboarding**

Impact: A new user writing LSP server assertions will hit "no identifier found at line X, column Y" errors and won't know how to fix them. They'll manually adjust line/column values, which is tedious and error-prone.

Fix: Add `--fix` to the "Next steps" output of `init` and mention it in Getting Started §2 after the first `run` command.

**4. agent-lsp and fastmcp examples require external fixtures/repos, but this isn't obvious**

Impact: A new user running `mcp-assert run --suite examples/agent-lsp-go --fixture /path/to/go/fixtures` will get a "directory not found" error if they don't have the agent-lsp repo cloned.

Fix: Expand the setup instructions for agent-lsp and fastmcp to show the `git clone` step explicitly (see the Path 4 suggestions above).
## Nice-to-haves (would improve experience)
**1. Consolidate the "Zero-Effort Coverage" duplication**

The README and Getting Started both explain this flow. The duplication isn't harmful, but it's noticeable. Consolidate as suggested in Path 1.
2. Add "Key Features" to examples summary table
Helps new users find relevant examples faster. See Path 4, Gap #4.
**3. Move the reliability metrics example higher in cli.md**

The pass@k/pass^k example (lines 342-366) is one of the best features, but it's buried at the end. Surface it earlier.
**4. Explain the `--server` CLI override behavior more clearly**

What happens to `args` and `env` when you override from the CLI? See Path 3, Gap #3.
**5. Add a "Common Workflows" section to cli.md**

Show how flags combine in realistic scenarios:

- One-shot local run: `run --suite evals/ --fixture ./fixtures`
- CI with threshold: `ci --suite evals/ --threshold 95 --junit results.xml`
- Regression detection: `ci --suite evals/ --baseline baseline.json --fail-on-regression`
- Multi-trial reliability: `run --suite evals/ --trials 5 --json`
- Position fix: `run --suite evals/ --fix`
**6. Add a shields.io badge hosting example**

The `--badge` flag is documented, but the doc doesn't explain where to host the JSON or how to use the URL. See Path 2, Gap #3.
**7. Clarify `$GITHUB_STEP_SUMMARY` auto-detection**

Mention it in ci-integration.md: "In `ci` mode on GitHub Actions, markdown is written to `$GITHUB_STEP_SUMMARY` automatically."
**8. Add a "Troubleshooting" section**

Common issues a new user might hit (a YAML sketch follows this list):

- "no identifier found at line X" → use `--fix`
- "{{fixture}} not substituted" → pass the `--fixture` flag
- "SKIP: destructive tool" → edit the YAML and remove `skip: true`
- "timeout: context deadline exceeded" → increase `timeout:` in the YAML
## Summary
The documentation is excellent for a new testing tool. A motivated developer can go from "never heard of mcp-assert" to "running assertions on my server" in under 10 minutes. The writing is clear, examples are realistic, and technical accuracy is high.
The main friction points are:

- Two "recommended" onboarding paths with no decision guidance (template vs generate)
- `--fix` is never surfaced in initial onboarding (will cause pain for LSP server authors)
- Fixture behavior with `init --server` is ambiguous (what happens if you omit `--fixture`?)
- Examples requiring external repos don't show the full setup (agent-lsp, fastmcp)
Fixing these four issues would eliminate all blocking friction. The nice-to-haves are polish: consolidating duplication, surfacing advanced features earlier, and adding troubleshooting guidance.
**Recommendation:** Fix the four critical gaps before the next release. The nice-to-haves can wait. Even without them, this is already better than 90% of OSS testing tools.