CLI Reference

Commands

mcp-assert audit    --server <cmd> [--output <dir>] [--docker <image>] [--include-writes]
mcp-assert init     [dir] [--server <cmd>] [--fixture <dir>] [--timeout <duration>]
mcp-assert run      --suite <path> [--fix] [flags]
mcp-assert ci       --suite <path> [--fix] [flags]
mcp-assert matrix   --suite <dir> --languages <list> [flags]
mcp-assert coverage --suite <dir> --server <cmd> [flags]
mcp-assert generate --server <cmd> --output <dir> [flags]
mcp-assert snapshot --suite <dir> --server <cmd> [flags]
mcp-assert watch    --suite <dir> [flags]
mcp-assert intercept --server <cmd> [--trajectory <path>] [--timeout <duration>]
mcp-assert version

`mcp-assert audit`

Zero-config quality audit. Connects to a server, discovers all tools, calls each one with schema-generated inputs, and reports which tools are healthy vs. which crash. No YAML required.

mcp-assert audit --server "npx my-mcp-server" [--output evals/]

Flag	Description
`--server <cmd>`	Server command (stdio) or URL (http/sse) (required)
`--transport <type>`	Transport type: `stdio` (default), `http`, `sse`
`--headers <pairs>`	Custom headers as `key=value` pairs, comma-separated
`--docker <image>`	Run destructive tools in fresh Docker containers (stdio only)
`--timeout <duration>`	Per-tool call timeout (default: `15s`)
`--output <dir>`	Generate starter assertion YAML files in this directory
`--include-writes`	Also call destructive/write tools without Docker isolation (skipped by default)
`--json`	Output results as JSON

What it tests: Crash resistance and error handling. The audit calls every tool with valid-shaped inputs generated from JSON Schema. Tools that respond (whether with data or a proper isError: true) are scored as healthy. Tools that crash with internal errors (-32603), stack traces, or panics are scored as crashed.

What it doesn't test: Business logic, expected output content, multi-step workflows, or state verification. For those, use the YAML assertion workflow (see run, ci).

Destructive tool handling: By default, tools annotated as destructive are skipped. Two ways to test them:

--docker <image>: Spins up a fresh Docker container per destructive tool. Each tool gets an isolated environment; the container is destroyed afterward. Safe for write/delete tools.
--include-writes: Calls destructive tools directly on the host, without isolation. Use only when you understand the side effects.

Generating YAML for CI: Pass --output <dir> to generate one assertion YAML per tool. These stubs use not_error: true as the default expectation. Edit them to add expected content checks, setup steps, and multi-step flows, then run them in CI with mcp-assert ci --suite <dir>.

`mcp-assert init`

Scaffold an assertion template, or generate a complete working suite from a live server.

# Template mode (no server required)
mcp-assert init [dir]

# One-step suite generation (queries the server, creates stubs, captures snapshots)
mcp-assert init [dir] --server <cmd> [--fixture <dir>]

Flag	Description
`--server <cmd>`	Server command to query for `tools/list`. When provided, runs generate + snapshot in one step
`--fixture <dir>`	Fixture directory for `{{fixture}}` substitution in generated assertions
`--timeout <duration>`	Timeout for `tools/list` call (default: `15s`)

Without --server: Creates <dir>/read_file.yaml (a commented assertion template) and <dir>/fixtures/hello.txt (a fixture file). Default directory is evals.

With --server: Connects to the server, queries tools/list, generates one stub YAML per tool, then runs snapshot capture with --update to record baseline responses. The result is a complete working suite with 100% tool coverage and zero manual assertion writing. Destructive tools are generated with skip: true by default.

`mcp-assert run`

Execute assertions against an MCP server.

mcp-assert run --suite <path> [flags]

Flag	Description
`--suite <path>`	Directory or single YAML file containing assertions (required)
`--fixture <dir>`	Fixture directory for `{{fixture}}` substitution
`--server <cmd>`	Override server command from CLI instead of per-YAML
`--trials <n>`	Run each assertion N times for reliability metrics
`--docker <image>`	Run each assertion in a fresh Docker container
`--json`	Output full result array as JSON to stdout
`--junit <path>`	Write JUnit XML results
`--markdown <path>`	Write GitHub Step Summary markdown
`--badge <path>`	Write shields.io endpoint JSON
`--baseline <path>`	Compare against saved baseline
`--save-baseline <path>`	Save current results as baseline JSON
`--fix`	Scan nearby positions when position-sensitive assertions fail and suggest corrections
`--timeout <duration>`	Per-assertion timeout (default: `30s`)

Exit codes: 0 = all passed, 1 = one or more failures.

`mcp-assert ci`

Run with CI-specific exit codes and reporting. Supports all run flags plus CI-specific flags:

mcp-assert ci --suite <path> [flags]

Flag	Description
`--threshold <n>`	Minimum pass percentage (e.g., `95`)
`--fail-on-regression`	Exit 1 if a previously-passing assertion now fails (requires `--baseline`)
`--fix`	Scan nearby positions when position-sensitive assertions fail and suggest corrections

Auto-detects $GITHUB_STEP_SUMMARY for markdown output.

`mcp-assert matrix`

Run assertions across multiple language servers.

mcp-assert matrix --suite <dir> --languages <list> [--fixture <dir>]

mcp-assert matrix \
  --suite evals/ \
  --languages go:gopls,typescript:typescript-language-server,python:pyright-langserver

Output:

                     hover           definition        references     completions
Go (gopls)           PASS            PASS              PASS           PASS
TypeScript (tsserver) PASS            PASS              PASS           PASS
Python (pyright)     PASS            PASS              SKIP           PASS

`mcp-assert coverage`

Report which server tools have assertions and which don't.

mcp-assert coverage --suite <dir> --server <cmd> [--coverage-json <path>]

Starts the server, calls tools/list, compares against assertion tool names, and reports coverage percentage with covered/uncovered tool lists.

Server exposes 50 tools, 50 have assertions (100% coverage)

Covered (50):
  + add_workspace_folder (1 assertion)
  + call_hierarchy (1 assertion)
  ...

Not covered (0):
  (none)

`mcp-assert generate`

Auto-generate stub assertions from a server's tools/list response.

mcp-assert generate --server <cmd> --output <dir> [--fixture <dir>] [--include-writes]

Flag	Description
`--server <cmd>`	Server command to query for `tools/list` (required)
`--output <dir>`	Directory to write generated YAML files (required)
`--fixture <dir>`	Fixture directory for `{{fixture}}` substitution in generated stubs
`--include-writes`	Include destructive/write tools that are skipped by default

Queries tools/list, reads input schemas, and creates one YAML per tool with sensible defaults. Edit the generated YAMLs to replace TODO placeholders with real values.

Destructive tool handling: Tools annotated as destructive (destructiveHint: true) or not explicitly read-only (readOnlyHint: false) are skipped by default. This prevents accidentally running tools that modify state during testing. The skipped tools are generated with skip: true in their YAML. Pass --include-writes to include all tools without the skip marker.

Auth detection: If a tool's input schema includes properties with names like token, api_key, or password, the generated YAML includes a comment hinting that authentication may be required. Review these stubs and configure credentials via environment variables before running.

`mcp-assert snapshot`

Capture or compare tool response snapshots.

mcp-assert snapshot --suite <dir> --server <cmd> [--fixture <dir>] [--update] [--docker <image>]

Flag	Description
`--update`	Capture actual outputs and save as `.snapshots.json`
(no `--update`)	Assert current outputs match saved snapshots

`mcp-assert watch`

Rerun assertions automatically when YAML files change.

mcp-assert watch --suite <dir> [--server <cmd>] [--fixture <dir>] [--interval <duration>]

Flag	Description
`--interval <duration>`	Polling interval (default: `2s`)

Polls for changes, clears terminal between runs. The assertion development loop: edit YAML, save, see result.

When an assertion's status changes between iterations (e.g., PASS to FAIL), watch mode displays a unified diff of the expected vs actual response to help diagnose the change.

`mcp-assert intercept`

Proxy stdio between an agent and an MCP server, capturing every tool call in real time.

mcp-assert intercept --server <cmd> [--trajectory <path>] [--timeout <duration>]

Flag	Description
`--server <cmd>`	MCP server command to proxy traffic to (required)
`--trajectory <path>`	YAML file containing trajectory assertions to validate on disconnect
`--timeout <duration>`	Timeout for the proxy session (default: `30s`)

Sits between your agent (on stdin/stdout) and the MCP server, forwarding all JSON-RPC messages transparently while recording every tools/call invocation. When the agent disconnects, intercept validates the captured call sequence against any trajectory assertions in the --trajectory file and reports the results. Use this as an alternative to trace: or audit_log: when you want to validate a real agent session without modifying the agent itself.

`mcp-assert version`

Print the installed version.

mcp-assert version

mcp-assert v0.1.3

Server Override

Override the server config from CLI instead of repeating it in every YAML file:

mcp-assert run --suite evals/ --server "agent-lsp go:gopls" --fixture test/fixtures/go

Skipping Assertions

Add skip: true to any assertion YAML to exclude it from run and ci execution:

name: dangerous tool that modifies state
skip: true
server:
  command: my-server
assert:
  tool: delete_everything
  args: {}
  expect:
    not_error: true

Skipped assertions appear as SKIP in output and do not count toward pass or fail totals. This is useful for temporarily disabling flaky tests, for assertions that require external services, or for destructive tools that should not run in CI.

The generate command automatically sets skip: true on tools detected as destructive. Use --include-writes to generate stubs without the skip marker.

Docker Isolation

Run each assertion in a fresh Docker container for reproducibility:

mcp-assert run --suite evals/ --docker ghcr.io/blackwell-systems/agent-lsp:go --fixture /workspace

The fixture directory is mounted into the container. Each assertion gets a clean environment: no cross-test contamination, no "works on my machine."

Docker isolation is only supported with stdio transport (the default). HTTP/SSE transports connect to an already-running server and do not use Docker wrapping.

Client Capabilities

Client capabilities are configured per-assertion in YAML, not via CLI flags. Set client_capabilities in the server block to make mcp-assert respond to server-initiated requests (roots, sampling, elicitation):

server:
  command: /path/to/server
  client_capabilities:
    roots:
      - "{{fixture}}"
    sampling:
      text: "mock response"
    elicitation:
      content:
        confirmed: true

See Writing Assertions for full examples of each capability type.

Resource Assertions

assert_resources: is a YAML-level feature with no CLI flag equivalent. It replaces assert: to test resources/list or resources/read instead of tools/call:

assert_resources:
  list: {}          # or: read: "uri://resource"
  expect:
    not_empty: true
    contains: ["expected-resource"]

See Writing Assertions for full examples.

Prompt Assertions

assert_prompts: is a YAML-level feature with no CLI flag equivalent. It replaces assert: to test prompts/list or prompts/get instead of tools/call:

assert_prompts:
  list: {}          # or: get: {name: "my_prompt", arguments: {key: val}}
  expect:
    not_empty: true
    contains: ["expected_prompt"]

See Writing Assertions for full examples including pagination.

Trajectory Assertions

trace: and audit_log: are YAML-level features that replace server: for trajectory-based assertions. No CLI flag equivalent. No server is started.

trace:
  - tool: prepare_rename
    args: { file_path: "main.go", line: 6, column: 6 }
  - tool: rename_symbol
    args: { file_path: "main.go", new_name: "Entity" }
trajectory:
  - type: order
    tools: ["prepare_rename", "rename_symbol"]
  - type: absence
    tools: ["apply_edit"]

Replace trace: with audit_log: path/to/agent.jsonl to validate real agent behavior from a recorded JSONL log.

See Writing Assertions for the full format and all four assertion types.

Progress Capture

capture_progress and min_progress are YAML-level features on the assert: block, not CLI flags:

assert:
  tool: long_operation
  args: {}
  capture_progress: true
  expect:
    min_progress: 3

See Writing Assertions for details.

HTTP/SSE Transport

Transport is configured per-assertion in YAML, not via CLI flags. Set transport: sse or transport: http with a url field to connect to HTTP-based MCP servers instead of launching a subprocess:

server:
  transport: sse
  url: "http://localhost:8080/sse"

See Writing Assertions for full examples.

Reliability Metrics

Run multiple trials to measure consistency:

mcp-assert run --suite evals/ --trials 5

PASS  hover returns type info                 690ms
PASS  hover returns type info                 650ms
PASS  hover returns type info                 710ms
FAIL  get_references finds cross-file callers 90001ms
      tool call get_references failed: context deadline exceeded
PASS  get_references finds cross-file callers 27305ms

Reliability:
  Assertion                                     Trials  Passed    pass@k  pass^k
  ------------------------------------------    ------  ------  --------  ------
  hover returns type info                            3       3       YES     YES
  get_references finds cross-file callers            2       1       YES      NO

  pass@k: 2/2 capable, pass^k: 1/2 reliable

pass@k (capability): Did the assertion pass at least once? If NO, the tool is broken.
pass^k (reliability): Did the assertion pass every time? If NO, the tool is flaky.

Regression Detection

Save a baseline, then detect regressions on future runs:

# Save current results as baseline
mcp-assert run --suite evals/ --save-baseline baseline.json

# Later: compare against baseline
mcp-assert ci --suite evals/ --baseline baseline.json --fail-on-regression

Regressions detected (1):
  get_references finds cross-file callers: was PASS, now FAIL
error: 1 regression(s) detected

Only flags transitions from PASS to FAIL. Previously-failing tests that still fail are not regressions. New tests that fail are not regressions.

Terminal Output

mcp-assert uses color in interactive terminals: green for pass, red for fail, yellow for skip. A progress counter ([1/21], [2/21], ...) prints to stderr while assertions run. The summary line only shows non-zero counts.

Color and progress are automatically disabled in pipes and CI environments. Set NO_COLOR=1 to force plain PASS/FAIL/SKIP output explicitly.

Structured Reporting

# JUnit XML for CI test result tabs (GitHub Actions, Jenkins, CircleCI)
mcp-assert run --suite evals/ --junit results.xml

# GitHub Step Summary (auto-detects $GITHUB_STEP_SUMMARY in ci mode)
mcp-assert ci --suite evals/ --markdown summary.md

# shields.io badge endpoint
mcp-assert run --suite evals/ --badge badge.json
# Then use: ![mcp-assert](https://img.shields.io/endpoint?url=<badge-url>)