Architecture
Overview
mcp-assert is a command-line testing tool for MCP servers. It reads test definitions from YAML files, starts each MCP server as a subprocess (or connects to a remote one), sends requests using the MCP protocol, and checks the responses against expectations you define. If the response matches, the test passes. If not, it fails with a clear error message.
The problem it solves: MCP servers expose tools, prompts, and resources to AI agents, but there is no built-in way to verify that those capabilities return correct results. mcp-assert fills that gap by providing deterministic, repeatable correctness tests that run in CI or locally, for any MCP server written in any language.
The tool is a single Go binary with no runtime dependencies. You write YAML files describing what to call and what to expect, point mcp-assert at them, and get pass/fail results in the terminal, JUnit XML, markdown, or JSON.
How MCP Works (Brief Primer)
The Model Context Protocol (MCP) is a standard for AI agents to interact with external services. If you are already familiar with MCP, skip to the next section.
Servers and clients. An MCP server is a program that exposes capabilities (tools, prompts, resources) over a well-defined protocol. An MCP client connects to the server and makes requests. In mcp-assert's case, mcp-assert itself is the client.
Tools are functions the server offers. A client calls a tool by name with JSON arguments and receives a text response. For example, a filesystem server might expose a read_file tool that accepts {"path": "/tmp/foo.txt"} and returns the file contents.
Prompts are reusable prompt templates the server provides. A client can list available prompts and retrieve a specific one (optionally with arguments that fill template variables).
Resources are data the server exposes for reading. A client can list available resources and read a specific one by URI, similar to a REST GET endpoint.
JSON-RPC is the wire format. Every MCP message is a JSON-RPC 2.0 request or response. The client sends {"jsonrpc":"2.0", "method":"tools/call", "params":{...}, "id":1} and the server replies with {"jsonrpc":"2.0", "result":{...}, "id":1}.
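Here is a minimal Go sketch of that envelope, built with encoding/json. The rpcRequest type and toolsCall helper are hypothetical and are not part of mcp-assert or mcp-go. Only the method name and the name/arguments parameter shape come from MCP.

```go
package main

import "encoding/json"

// rpcRequest mirrors the JSON-RPC 2.0 request envelope (illustrative type).
type rpcRequest struct {
	JSONRPC string         `json:"jsonrpc"`
	Method  string         `json:"method"`
	Params  map[string]any `json:"params,omitempty"`
	ID      int            `json:"id"`
}

// toolsCall builds a tools/call request for the given tool name and arguments.
func toolsCall(id int, tool string, args map[string]any) ([]byte, error) {
	req := rpcRequest{
		JSONRPC: "2.0",
		Method:  "tools/call",
		Params:  map[string]any{"name": tool, "arguments": args},
		ID:      id,
	}
	return json.Marshal(req)
}
```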
Transports determine how JSON-RPC messages travel between client and server:
- stdio (default): The client launches the server as a subprocess. JSON-RPC messages flow over the subprocess's stdin and stdout. This is the most common transport for local development.
- SSE (Server-Sent Events): The client connects to an HTTP endpoint. The server pushes responses as SSE events. This is a legacy transport.
- Streamable HTTP: The client sends HTTP POST requests and receives streamed responses. This is the modern remote transport.
The handshake. Before any tool calls, the client sends an initialize request declaring its name, version, and the protocol version it supports. The server responds with its own capabilities (which tools, prompts, and resources it offers). After this exchange, the client can make requests.
Bidirectional requests. MCP is not strictly client-to-server. The server can also request things from the client: the list of filesystem roots (roots/list), an LLM completion (sampling/createMessage), or user input (elicitation/create). mcp-assert supports mocking all three of these via the client_capabilities YAML field.
Lifecycle of an Assertion
This section walks through exactly what happens when you run:
```sh
mcp-assert run --suite evals/ --fixture ./test-data
```
Step 1: CLI dispatch
cmd/mcp-assert/main.go reads the first argument (run) and calls runner.Run(), passing the remaining flags. The Run function in internal/runner/commands.go parses --suite, --fixture, --trials, output flags, and others.
Step 2: Load the suite
assertion.LoadSuite("evals/") (in internal/assertion/loader.go) reads the directory. It collects every .yaml and .yml file, recursing one level into subdirectories. Each file is parsed into an Assertion struct via Go's yaml.v3 library. If the name field is omitted, the filename becomes the assertion name. The result is a Suite containing a slice of Assertion values.
Step 3: Iterate and isolate
The runner loops over each assertion. Before executing, it calls isolateFixture() (in internal/runner/fixture.go), which copies the entire fixture directory to a temporary location. This ensures each assertion gets a pristine copy of the test data. The original fixture is never modified.
Step 4: Start the MCP server
createMCPClient() (in internal/runner/client.go) reads the assertion's server block and selects the transport:
- stdio: launches the server command as a subprocess, piping stdin/stdout for JSON-RPC.
- sse: connects to the server's URL via SSE.
- http: connects via streamable HTTP.
If client_capabilities is set (roots, sampling, or elicitation), the stdio path uses a lower-level construction (createStdioClientWithCapabilities) that registers bidirectional request handlers before the client starts. This ensures the handlers are active before the initialize handshake, since the server may immediately request roots or sampling.
Step 5: Initialize
The runner sends an initialize JSON-RPC request with the protocol version and client identity (mcp-assert v1.0). The server responds with its capabilities. This is the standard MCP handshake.
Step 6: Route to the correct handler
runAssertion() (in internal/runner/execute.go) inspects which block is present on the assertion and dispatches accordingly:
| Block present | Handler called | MCP method used |
|---|---|---|
| trajectory: | runTrajectoryAssertion | None (no server) |
| assert_resources: | runResourceAssertion | resources/list or resources/read |
| assert_prompts: | runPromptAssertion | prompts/list or prompts/get |
| assert_completion: | runCompletionAssertion | completion/complete |
| assert_sampling: | runSamplingAssertion | tools/call (triggers server-side sampling) |
| assert_logging: | runLoggingAssertion | logging/setLevel + tools/call |
| assert: (default) | inline in runAssertion | tools/call |
Step 7: Run setup steps
If the assertion has a setup: block, the runner executes each setup tool call sequentially. Setup calls establish the state the assertion needs (for example, starting a language server or opening a document). Template substitution replaces {{fixture}} with the isolated fixture path and {{variable_name}} with values captured from prior setup responses.
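Here is a sketch of the recursive replacement, in the spirit of substituteAll() but not its actual code. The substitute function is hypothetical, and so is its vars map, which holds fixture plus any captured variables. It walks the map[string]any and []any values that YAML decoding typically produces and rewrites every string it finds.

```go
package main

import "strings"

// substitute replaces {{name}} placeholders (including {{fixture}}) in every
// string inside v, recursing through maps and slices; other values pass through.
func substitute(v any, vars map[string]string) any {
	switch t := v.(type) {
	case string:
		for k, val := range vars {
			t = strings.ReplaceAll(t, "{{"+k+"}}", val)
		}
		return t
	case map[string]any:
		out := make(map[string]any, len(t))
		for k, val := range t {
			out[k] = substitute(val, vars)
		}
		return out
	case []any:
		out := make([]any, len(t))
		for i, val := range t {
			out[i] = substitute(val, vars)
		}
		return out
	default:
		return v
	}
}
```

Returning fresh maps and slices, rather than editing in place, keeps the loaded assertion untouched. This matters under --trials, where the same arguments are substituted again on each run.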
Step 8: Snapshot (if needed)
If the assertion's expect block contains file_unchanged entries, the runner reads those files from disk before the tool call and stores their contents in memory. After the tool call, it compares the files to detect modifications.
Step 9: Call the tool under test
The runner sends the tools/call JSON-RPC request with the tool name and arguments (after template substitution). It captures the response text and the isError flag.
Step 10: Check expectations
assertion.Check() (in internal/assertion/checker.go) evaluates every expectation in the expect block against the response. Expectations are checked in a fixed order (see "Key Abstractions" below). The first failure short-circuits: only the first failing expectation is reported, to keep error messages actionable.
If capture_progress: true was set, the runner also checks min_progress via assertion.CheckProgress(), verifying that enough notifications/progress messages arrived during the tool call.
Step 11: Clean up
The MCP client is closed (which kills the subprocess for stdio transport). The temporary fixture directory is removed. The Result struct (pass, fail, or skip, plus timing) is appended to the results list.
Step 12: Report
After all assertions finish, the runner dispatches results to output sinks. The terminal always gets a human-readable table. Optional flags produce JUnit XML (--junit), markdown (--markdown), shields.io badge JSON (--badge), or raw JSON (--json). If --trials was greater than 1, reliability metrics (pass@k, pass^k) are also printed.
```text
┌──────────┐     ┌──────────┐     ┌──────────────┐     ┌────────────┐
│   YAML   │────>│  Loader  │────>│    Runner    │────>│  Reporter  │
│  files   │     │          │     │ (per assert) │     │            │
└──────────┘     └──────────┘     │              │     │ Terminal   │
                                  │ Isolate      │     │ JUnit XML  │
                                  │ Start server │     │ Markdown   │
                                  │ Initialize   │     │ Badge JSON │
                                  │ Setup calls  │     │ Raw JSON   │
                                  │ Tool call    │     └────────────┘
                                  │ Check        │
                                  │ Close        │
                                  └──────────────┘
```
Package Structure
The codebase is organized into three packages under internal/, plus the entry point.
cmd/mcp-assert/main.go
The binary entry point. It reads the first CLI argument and dispatches to the appropriate function in the runner package (Audit, Run, Matrix, CI, Init, Coverage, Generate, Snapshot, Watch, Intercept). There is no framework; it is a simple switch statement on os.Args[1]. This file also defines printUsage() for help text and exposes a Version variable set at build time.
internal/assertion/ (types, loading, checking)
This package defines the data model and all validation logic. Its only I/O is reading YAML files and checking files on disk for the file_* expectations (file_contains, file_not_contains, file_not_exists, file_unchanged). It does not import the runner or report packages.
| File | Responsibility |
|---|---|
| types.go | All core types: Suite, Assertion, ServerConfig, AssertBlock, Expect, Result, Status, and the block types for resources, prompts, completion, sampling, logging, and trajectory. |
| loader.go | LoadSuite() reads a directory (or single file) of YAML, parses each into an Assertion, and returns a Suite. Recurses one level into subdirectories. Defaults the name field to the filename if omitted. |
| checker.go | Check() evaluates all 18 expectation types against a response string. CheckWithSnapshots() adds file_unchanged comparison. CheckProgress() checks progress notification counts. Also contains jsonPathLookup() for simple $.dot.path[N] queries. |
| trajectory.go | CheckTrajectory() evaluates the 4 trajectory assertion types (order, presence, absence, args_contain) against a trace of tool calls. LoadAuditLog() parses JSONL files into trace entries. |
| sampling_types.go | SamplingAssertBlock type for assertions that test tools which trigger server-side LLM sampling. |
| logging_types.go | LoggingAssertBlock, LoggingExpect, and LogMessage types for assertions that test log level setting and message capture. |
| logging_checker.go | Logging-specific assertion checking logic. |
internal/runner/ (execution engine)
This package contains all the execution logic: CLI flag parsing, server lifecycle, assertion routing, fixture management, and every CLI command.
| File | Responsibility |
|---|---|
| audit.go | Audit(): zero-config quality audit. Connects to a server, discovers tools via tools/list, calls each with schema-generated inputs, classifies results (healthy/crash/timeout), reports a quality score, optionally generates starter YAML files. |
| commands.go | Run(), Matrix(), CI(): CLI entry points that parse flags, load suites, iterate assertions, collect results, and trigger reporting. |
| runner.go | Package doc comment only (the actual runner logic is in execute.go and commands.go). |
| execute.go | runAssertion(): the core execution function. Routes to the correct handler based on which block is present. Contains inline logic for the default assert: (tool call) path, plus runResourceAssertion, runPromptAssertion, runCompletionAssertion, and runTrajectoryAssertion. |
| client.go | createMCPClient(): transport selection (stdio/SSE/HTTP), Docker wrapping, and createStdioClientWithCapabilities() for bidirectional handlers. Also defines the static handler types for roots, sampling, and elicitation. |
| substitute.go | substituteAll(): recursive template replacement for {{fixture}} and captured variables in tool arguments. Also extractJSONPath() for pulling values from setup responses. |
| fixture.go | isolateFixture() and copyDir(): per-assertion fixture directory copying to a temp location. |
| coverage.go | Coverage() command: starts the server, calls tools/list, compares against assertion tool names, reports coverage percentage. |
| generate.go | Generate() command: connects to a server, queries tools/list, and writes stub YAML assertion files. |
| init.go | Init() command: scaffolds a template assertion directory, or generates a complete suite with --server. |
| snapshot.go | Snapshot() command: captures tool responses for regression comparison, similar to Jest snapshot testing. |
| watch.go | Watch() command: polls YAML files for changes and reruns assertions, showing unified diffs when assertion status flips. |
| intercept.go | Intercept() command: proxies stdio between an agent and MCP server, capturing tool calls for live trajectory validation. |
| sampling.go | runSamplingAssertion(): handles assertions that test tools triggering server-side sampling/createMessage. |
| logging.go | runLoggingAssertion(): handles logging/setLevel plus notifications/message capture. |
| fix.go | --fix mode: ScanNearbyPositions() tries nearby line/column values when position-sensitive assertions fail, and generates YAML patch suggestions. |
| util.go | Shared helpers: writeReports(), applyServerOverride(), countFails(), countPasses(), extractText(). |
internal/report/ (output formatting)
This package consumes []assertion.Result and produces output in various formats. It depends on the assertion package for types but nothing else. All write errors go to stderr and do not fail the run.
| File | Responsibility |
|---|---|
| audit.go | PrintAuditHeader(), PrintAuditResults(), PrintAuditSummary(), PrintAuditNextSteps(): audit-specific report formatting with quality score and CI guidance. |
| report.go | PrintResults(): terminal table with color (TTY) or plain text (pipe/CI). PrintMatrix(): cross-language comparison table. |
| color.go | ANSI color codes, TTY detection (os.ModeCharDevice), NO_COLOR env var support, progress indicator on stderr. |
| diff.go | FormatDiff(), FormatStatusChange(): unified diff output for the watch command when an assertion's status changes. |
| junit.go | JUnit XML serialization via encoding/xml. Includes pass@k/pass^k properties when --trials > 1. |
| markdown.go | GitHub Step Summary table. Includes a reliability section when --trials > 1. Auto-detects $GITHUB_STEP_SUMMARY in CI mode. |
| badge.go | shields.io endpoint JSON (schemaVersion, label, message, color). |
| reliability.go | pass@k (passed at least once in k trials) and pass^k (passed every time in k trials) computation. |
| baseline.go | Baseline JSON write/load and regression detection. Only PASS-to-non-PASS transitions count as regressions. |
| coverage.go | Coverage JSON serialization for the coverage command. |
| snapshot.go | Snapshot file read/write/compare for the snapshot command. |
Package dependency graph
```text
cmd/mcp-assert/main.go
└── internal/runner
    ├── internal/assertion        (types, loader, checker)
    ├── internal/report           (all output formats)
    ├── mark3labs/mcp-go/client   (MCP transport: stdio, SSE, streamable HTTP)
    └── mark3labs/mcp-go/mcp      (MCP protocol types)
```
No circular dependencies. The assertion and report packages do not import each other. report depends on assertion for the Result and Status types. Neither package imports runner.
Key Abstractions
Suite and Assertion
A Suite is a collection of Assertion values loaded from a directory. Each Assertion represents a single test: connect to a server, optionally run setup steps, make a request, check the response.
```go
type Suite struct {
	Assertions []Assertion
	Dir        string // directory the suite was loaded from
}

type Assertion struct {
	Name             string
	Server           ServerConfig           // how to connect
	Setup            []ToolCall             // prerequisite tool calls
	Assert           AssertBlock            // the tool call to test (default block)
	AssertResources  *ResourceAssertBlock   // or test resources
	AssertPrompts    *PromptAssertBlock     // or test prompts
	AssertCompletion *CompletionAssertBlock // or test completion
	AssertSampling   *SamplingAssertBlock   // or test sampling
	AssertLogging    *LoggingAssertBlock    // or test logging
	Trace            []TraceEntry           // or validate a tool call trace
	Trajectory       []TrajectoryAssertion  // trajectory checks (no server)
	Timeout          string
	Skip             bool
}
```
Exactly one block type is active per assertion. The runner checks them in priority order (trajectory first, then resources, prompts, completion, sampling, logging, and finally the default assert: block).
ServerConfig
Describes how to connect to the MCP server under test.
```go
type ServerConfig struct {
	Command            string             // executable to launch (stdio)
	Args               []string           // arguments to the command
	Env                map[string]string  // environment variables
	Transport          string             // "stdio", "sse", or "http"
	URL                string             // endpoint for sse/http
	ClientCapabilities ClientCapabilities // mock bidirectional responses
}
```
Expect
The Expect struct holds all possible expectations. You set only the fields you care about; unset fields are skipped during checking.
```go
type Expect struct {
	Contains        []string          // response must contain these substrings
	ContainsAny     []string          // response must contain at least one of these
	NotContains     []string          // response must not contain these
	Equals          *string           // exact match (whitespace-trimmed)
	JSONPath        map[string]any    // $.field.path must equal expected value
	MinResults      *int              // array must have at least N items
	MaxResults      *int              // array must have at most N items
	NotEmpty        *bool             // response must not be empty/null/[]/{}
	NotError        *bool             // isError flag must be false
	IsError         *bool             // isError flag must be true
	MatchesRegex    []string          // response must match all patterns
	FileContains    map[string]string // file on disk must contain text
	FileNotContains map[string]string // file on disk must NOT contain text
	FileNotExists   []string          // file must NOT exist on disk
	FileUnchanged   []string          // file on disk must not have changed
	NetDelta        *int              // net_delta field must equal N
	InOrder         []string          // substrings must appear in this order
	MinProgress     *int              // at least N progress notifications
}
```
The checker evaluates expectations in this fixed order:
1. not_error / is_error (check the isError flag)
2. not_empty (reject empty, null, [], {})
3. equals (exact match, whitespace-trimmed)
4. contains / contains_any / not_contains (substring checks)
5. matches_regex (compiled regex matching)
6. json_path (dot-notation lookup on parsed JSON)
7. min_results / max_results (array length bounds)
8. net_delta (numeric field comparison)
9. file_contains / file_not_contains (read file from disk, check content)
10. file_not_exists (verify file does not exist on disk)
11. in_order (ordered substring search)
file_unchanged is handled separately by CheckWithSnapshots(). min_progress is handled by CheckProgress() after the main check.
First failure short-circuits: only the first failing expectation is reported.
Result
The outcome of running a single assertion.
```go
type Result struct {
	Name     string        // assertion name
	Status   Status        // "PASS", "FAIL", or "SKIP"
	Detail   string        // error message on failure
	Duration time.Duration // wall-clock time
	Language string        // set in matrix mode
	Trial    int           // trial number when --trials > 1
}
```
Results flow from the runner to the report package. Every output format (terminal, JUnit, markdown, badge, JSON) consumes the same []Result slice.
Transport Layer
How mcp-assert connects to servers
The createMCPClient() function in client.go is the single point where transport selection happens. It reads the transport field from the assertion's ServerConfig and creates the appropriate mcp-go client:
stdio (default). The function calls client.NewStdioMCPClient(command, env, args...) from the mcp-go library. This launches the server command as a child process and wires stdin/stdout for JSON-RPC. If --docker is set, the command is rewritten to docker run --rm -i -v fixture:fixture <image> <original-command>, so Docker's -i flag provides the same bidirectional pipe transport.
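The command rewrite amounts to building a new argument list. dockerWrap is a hypothetical helper. Skipping the -v mount when no fixture is given is an assumption, not documented behavior.

```go
package main

// dockerWrap rewrites a stdio server command to run inside a container.
// -i keeps stdin open so JSON-RPC can flow over the pipe, --rm discards the
// container afterwards, and the fixture is mounted at the same path inside.
func dockerWrap(image, fixture, command string, args []string) (string, []string) {
	wrapped := []string{"run", "--rm", "-i"}
	if fixture != "" {
		wrapped = append(wrapped, "-v", fixture+":"+fixture)
	}
	wrapped = append(wrapped, image, command)
	wrapped = append(wrapped, args...)
	return "docker", wrapped
}
```

Mounting the fixture at an identical path means {{fixture}} values substituted on the host stay valid inside the container.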
SSE. The function calls client.NewSSEMCPClient(url). The client connects to the server's SSE endpoint for receiving responses and sends requests over a standard HTTP POST channel.
Streamable HTTP. The function calls client.NewStreamableHttpClient(url). This is the modern remote transport where both requests and responses use HTTP with streaming.
All three return the same client.MCPClient interface, so the rest of the runner is transport-agnostic. After creation, the runner calls Initialize() on the client to perform the MCP handshake.
Client capabilities (bidirectional path)
When client_capabilities is configured in the YAML, the stdio transport takes a different construction path. Instead of the convenience NewStdioMCPClient, the runner uses the lower-level client.NewClient with explicit handler options:
- Create a raw StdioTransport and start it.
- Build handler options: WithRootsHandler, WithSamplingHandler, WithElicitationHandler.
- Call client.NewClient(transport, opts...) to create the client with handlers registered.
- Call c.Start(ctx) to activate the bidirectional channel.
This ordering is critical. If handlers are registered after Start, the server's roots/list or sampling/createMessage requests would arrive before handlers exist, causing errors.
The three static handlers are simple:
- staticRootsHandler: returns a fixed list of filesystem roots (with {{fixture}} substituted).
- staticSamplingHandler: returns a mock LLM response with configurable text, model name, and stop reason.
- staticElicitationHandler: returns preset form values, with support for accept/decline/cancel actions.
Fixture Isolation
The problem
MCP servers often modify files on disk. A filesystem server might create, edit, or delete files. A language server might apply refactoring edits. If two assertions share the same fixture directory, one assertion's side effects can break subsequent assertions or produce non-deterministic results.
The solution
Before each assertion executes, isolateFixture() (in fixture.go) copies the entire fixture directory to a unique temporary directory. The assertion receives the path to this copy. After the assertion finishes (pass or fail), the temporary directory is deleted.
```text
Original fixture:  ./test-data/
        ↓ (copy)
Temp copy:         /tmp/mcp-assert-fixture-abc123/test-data/
        ↓ (used by assertion)
        ↓ (deleted after assertion)
```
The {{fixture}} template in YAML arguments is replaced with the temp copy path, not the original. This means any file paths the server sees point to the disposable copy.
When isolation is skipped
- If no --fixture is provided, there is nothing to isolate.
- If --docker is used, Docker already provides isolation through fresh containers, so copying is redundant.
Implementation
copyDir() recursively walks the source directory, preserving file permissions and directory structure. Symlinks are not followed. The copy target is placed inside the temp directory with the same base name as the original, so relative paths within the fixture remain valid.
Block Types
Each YAML assertion file uses exactly one block type. The block type determines which MCP protocol method is called and how the response is processed.
1. assert: (tool calls)
The default and most common block. Calls tools/call with a named tool and JSON arguments.
```yaml
assert:
  tool: read_file
  args:
    path: "{{fixture}}/example.txt"
  capture_progress: true  # optional: collect notifications/progress
  expect:
    not_error: true
    contains: ["hello world"]
```
The optional capture_progress: true field registers a notification listener before the tool call that counts notifications/progress messages. This enables the min_progress expectation.
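Here is a sketch of the counting side of capture_progress. The progressCounter type and its string-based onNotification method are simplifications, since a real MCP client hands its listener a full notification value. The counter is atomic because notifications may arrive on a different goroutine from the one that issued the tool call.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// progressCounter counts notifications/progress messages seen during a call.
type progressCounter struct {
	n atomic.Int64
}

// onNotification would be registered as the listener before the tool call.
func (p *progressCounter) onNotification(method string) {
	if method == "notifications/progress" {
		p.n.Add(1)
	}
}

// checkMinProgress compares the observed count against min_progress (nil = unset).
func checkMinProgress(p *progressCounter, minProgress *int) error {
	if minProgress == nil {
		return nil
	}
	if got := p.n.Load(); got < int64(*minProgress) {
		return fmt.Errorf("min_progress: got %d progress notifications, want at least %d", got, *minProgress)
	}
	return nil
}
```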
2. assert_prompts: (prompt listing and retrieval)
Tests the prompts/list and prompts/get MCP methods. Set exactly one of list or get.
```yaml
assert_prompts:
  get:
    name: "code_review"
    arguments:
      language: "go"
  expect:
    contains: ["review"]
```
For prompts/get, the response text is built by joining the prompt's description and message contents.
3. assert_resources: (resource listing and reading)
Tests resources/list, resources/read, and resource subscriptions.
```yaml
assert_resources:
  read: "test://static/resource"
  expect:
    not_empty: true
```
Supports subscribe/unsubscribe fields and expect_notification to verify resource update notifications arrive.
4. assert_completion: (autocompletion)
Tests completion/complete for argument autocompletion on prompts or resources.
```yaml
assert_completion:
  ref:
    type: "ref/prompt"
    name: "complex_prompt"
  argument:
    name: "style"
    value: ""
  expect:
    contains: ["formal"]
```
5. assert_sampling: (sampling-triggered tools)
Tests tools that cause the server to request an LLM completion via sampling/createMessage. The block configures both the tool call and the mock LLM response in one place.
```yaml
assert_sampling:
  tool: ask_llm
  args:
    question: "What is the capital of France?"
  mock_text: "The capital of France is Paris."
  mock_model: mock-gpt
  expect:
    not_error: true
    contains: ["Paris"]
```
This is a convenience wrapper. You can achieve the same result with assert: plus client_capabilities.sampling, but assert_sampling keeps the mock and assertion together.
6. assert_logging: (log level and message capture)
Tests logging/setLevel and captures notifications/message log events during a tool call.
```yaml
assert_logging:
  set_level: debug
  tool: echo
  args:
    message: "test"
  expect:
    min_messages: 1
    contains_level: ["debug"]
    contains_data: ["test"]
```
The runner first calls logging/setLevel, then executes the tool while listening for notifications/message. The logging-specific expect fields (min_messages, contains_level, contains_data) are checked by a dedicated logging checker.
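Here is a hypothetical sketch of the dedicated logging checker. For simplicity it treats each captured message's data as a string, although in MCP a log message's data can be any JSON value. logMessage, checkLogging, and anyMessage are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// logMessage is a simplified notifications/message payload.
type logMessage struct {
	Level string
	Data  string
}

// checkLogging applies min_messages, contains_level, and contains_data to the
// captured messages, stopping at the first failure.
func checkLogging(msgs []logMessage, minMessages int, levels, data []string) error {
	if len(msgs) < minMessages {
		return fmt.Errorf("min_messages: got %d, want at least %d", len(msgs), minMessages)
	}
	for _, lvl := range levels {
		if !anyMessage(msgs, func(m logMessage) bool { return m.Level == lvl }) {
			return fmt.Errorf("contains_level: no %q message", lvl)
		}
	}
	for _, d := range data {
		if !anyMessage(msgs, func(m logMessage) bool { return strings.Contains(m.Data, d) }) {
			return fmt.Errorf("contains_data: no message containing %q", d)
		}
	}
	return nil
}

// anyMessage reports whether at least one message satisfies pred.
func anyMessage(msgs []logMessage, pred func(logMessage) bool) bool {
	for _, m := range msgs {
		if pred(m) {
			return true
		}
	}
	return false
}
```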
7. trajectory: (tool call sequence validation)
Validates a sequence of tool calls without starting any server. The trace comes from inline YAML (trace: field) or an external JSONL audit log (audit_log: field).
```yaml
trace:
  - tool: prepare_rename
    args: { file_path: "main.go", line: 6, column: 6 }
  - tool: rename_symbol
    args: { file_path: "main.go", new_name: "Entity" }
trajectory:
  - type: order
    tools: ["prepare_rename", "rename_symbol"]
  - type: absence
    tools: ["apply_edit"]
```
Four trajectory assertion types are available:
| Type | What it checks |
|---|---|
| order | Listed tools appear in this sequence (not necessarily adjacent) |
| presence | All listed tools appear at least once |
| absence | None of the listed tools appear |
| args_contain | A specific tool was called with specific argument values (partial match) |
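Each of the four trajectory checks is a straightforward scan over the trace. This is a sketch under stated assumptions. traceEntry and the check functions are hypothetical. args_contain compares values through their fmt.Sprint rendering so that 6 and 6.0 from different decoders still match.

```go
package main

import "fmt"

// traceEntry is one recorded tool call.
type traceEntry struct {
	Tool string
	Args map[string]any
}

// calledTool reports whether tool appears anywhere in the trace.
func calledTool(trace []traceEntry, tool string) bool {
	for _, e := range trace {
		if e.Tool == tool {
			return true
		}
	}
	return false
}

// checkOrder: tools appear in this sequence, not necessarily adjacent.
func checkOrder(trace []traceEntry, tools []string) error {
	next := 0
	for _, e := range trace {
		if next < len(tools) && e.Tool == tools[next] {
			next++
		}
	}
	if next < len(tools) {
		return fmt.Errorf("order: %q not found after %v", tools[next], tools[:next])
	}
	return nil
}

// checkPresence: every listed tool was called at least once.
func checkPresence(trace []traceEntry, tools []string) error {
	for _, t := range tools {
		if !calledTool(trace, t) {
			return fmt.Errorf("presence: %q was never called", t)
		}
	}
	return nil
}

// checkAbsence: none of the listed tools was called.
func checkAbsence(trace []traceEntry, tools []string) error {
	for _, t := range tools {
		if calledTool(trace, t) {
			return fmt.Errorf("absence: %q was called", t)
		}
	}
	return nil
}

// checkArgsContain: some call to tool has every wanted key with a matching value.
func checkArgsContain(trace []traceEntry, tool string, want map[string]any) error {
	for _, e := range trace {
		if e.Tool != tool {
			continue
		}
		match := true
		for k, v := range want {
			if got, ok := e.Args[k]; !ok || fmt.Sprint(got) != fmt.Sprint(v) {
				match = false
				break
			}
		}
		if match {
			return nil
		}
	}
	return fmt.Errorf("args_contain: no %s call with args %v", tool, want)
}
```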
Routing logic
In execute.go, runAssertion() checks block types in priority order using nil checks on the optional block pointers:
```text
trajectory → assert_resources → assert_prompts → assert_completion
           → assert_sampling → assert_logging → assert (default)
```
Each handler follows the same pattern: validate inputs, create MCP client, initialize, run setup, execute the protocol-specific call, check expectations, return result.
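The priority order reduces to a switch over block presence. The assertionBlocks type below is a hypothetical stand-in for Assertion, with the block types reduced to placeholders. route only returns the name of the handler that would run.

```go
package main

// assertionBlocks is a hypothetical stand-in for Assertion: only the presence
// of each optional block matters for routing.
type assertionBlocks struct {
	Trajectory       []string
	AssertResources  *struct{}
	AssertPrompts    *struct{}
	AssertCompletion *struct{}
	AssertSampling   *struct{}
	AssertLogging    *struct{}
}

// route checks blocks in priority order; the first one present wins and the
// plain assert: path is the fallback.
func route(a assertionBlocks) string {
	switch {
	case len(a.Trajectory) > 0:
		return "runTrajectoryAssertion"
	case a.AssertResources != nil:
		return "runResourceAssertion"
	case a.AssertPrompts != nil:
		return "runPromptAssertion"
	case a.AssertCompletion != nil:
		return "runCompletionAssertion"
	case a.AssertSampling != nil:
		return "runSamplingAssertion"
	case a.AssertLogging != nil:
		return "runLoggingAssertion"
	default:
		return "inline tools/call"
	}
}
```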
Reporting Pipeline
Results flow through a simple pipeline. The runner collects []assertion.Result from all assertions, then passes the slice to each output sink.
Terminal output (report.go, color.go)
Always produced. Each assertion prints as a single line: status icon, name, and duration. Failed assertions include the error detail on the next line. A summary line at the end shows total/passed/failed/skipped counts.
Color behavior adapts to the environment:
| Condition | Behavior |
|---|---|
| stdout is a TTY | Green checkmarks, red Xs, ANSI color |
| stdout is a pipe | Plain PASS/FAIL/SKIP, no escape codes |
| NO_COLOR=1 set | Plain output regardless of TTY |
| TERM=dumb | Plain output |
A progress indicator ([3/21] assertion name) prints to stderr during execution, so it does not interfere with piped stdout.
JUnit XML (junit.go)
Standard JUnit format consumed by CI systems (GitHub Actions, Jenkins, GitLab). Each assertion becomes a <testcase>. Failed assertions include <failure> elements with the error detail. When --trials > 1, pass@k and pass^k metrics are included as <property> elements.
Markdown (markdown.go)
A GitHub Step Summary table. The ci command auto-detects $GITHUB_STEP_SUMMARY and writes to it. When --trials > 1, a reliability section is appended. The table includes assertion name, status, and duration.
Badge JSON (badge.go)
A shields.io endpoint JSON file (schemaVersion, label, message, color). Host the file via GitHub Pages to get a live pass-rate badge in your README.
Raw JSON (--json)
The full []Result array serialized as JSON to stdout. Useful for programmatic consumption.
Reliability metrics (reliability.go)
When --trials N is used with N > 1, each assertion runs N times. Two metrics are computed:
- pass@k: passed at least once in k trials (measures capability).
- pass^k: passed every time in k trials (measures reliability).
These appear in terminal output, JUnit XML properties, and the markdown table.
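For each assertion, both metrics reduce to any/all over its trial outcomes. This sketch assumes outcomes are grouped by assertion name. suiteRates reports the fraction of assertions that meet each metric. It is an illustrative aggregate and may not match how mcp-assert summarizes the metrics.

```go
package main

// reliability holds one assertion's metrics over k trials.
type reliability struct {
	Trials   int
	PassAtK  bool // passed at least once
	PassHatK bool // passed every time
}

// computeReliability maps assertion name -> per-trial pass/fail outcomes.
func computeReliability(trials map[string][]bool) map[string]reliability {
	out := make(map[string]reliability, len(trials))
	for name, outcomes := range trials {
		once, every := false, len(outcomes) > 0
		for _, ok := range outcomes {
			once = once || ok
			every = every && ok
		}
		out[name] = reliability{Trials: len(outcomes), PassAtK: once, PassHatK: every}
	}
	return out
}

// suiteRates returns the fraction of assertions meeting pass@k and pass^k.
func suiteRates(m map[string]reliability) (atK, hatK float64) {
	if len(m) == 0 {
		return 0, 0
	}
	for _, r := range m {
		if r.PassAtK {
			atK++
		}
		if r.PassHatK {
			hatK++
		}
	}
	n := float64(len(m))
	return atK / n, hatK / n
}
```

The gap between the two rates measures flakiness. An assertion that passes on some trials but not all counts toward pass@k but not pass^k.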
Baseline and regression detection (baseline.go)
--save-baseline results.json persists the current results. --baseline results.json on a subsequent run compares against the saved baseline. Only PASS-to-non-PASS transitions are flagged as regressions. Previously-failing tests that still fail are not regressions. New tests not in the baseline are not regressions.
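The regression rule fits in a few lines. This sketch assumes results are keyed by assertion name with status strings. The real baseline file format is not described here.

```go
package main

import "sort"

// regressions returns names that were PASS in the baseline but are not PASS
// now. Tests that were already failing, or are new, are never reported.
func regressions(baseline, current map[string]string) []string {
	var out []string
	for name, now := range current {
		if was, ok := baseline[name]; ok && was == "PASS" && now != "PASS" {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}
```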
Best-effort writes
All report outputs are best-effort. If writing a JUnit file fails, the error goes to stderr but the run itself does not fail. This prevents flaky CI permissions from blocking test results.
Extension Points
Adding a new assertion type
To add a new expectation (like max_length to check response string length):
- Add the field to Expect in internal/assertion/types.go:
  ```go
  MaxLength *int `yaml:"max_length"`
  ```
- Add the check to Check() in internal/assertion/checker.go, placing it in the appropriate position in the evaluation order.
- Add unit tests in internal/assertion/checker_test.go.
The checker is side-effect free: it takes a response string and returns an error, holds no state, and touches the filesystem only for the file_* expectations. This makes new response-based assertion types trivially testable.
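Continuing the max_length example, the check itself might look like this. checkMaxLength is hypothetical. Counting characters (runes) rather than bytes is a design assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// checkMaxLength fails when the response has more than *limit characters.
// A nil limit means the expectation is unset and is skipped.
func checkMaxLength(text string, limit *int) error {
	if limit == nil {
		return nil
	}
	if n := utf8.RuneCountInString(text); n > *limit {
		return fmt.Errorf("max_length: response has %d characters, want at most %d", n, *limit)
	}
	return nil
}
```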
Adding a new block type
To add a new block type (like assert_notifications: for testing arbitrary server notifications):
- Define the block struct in internal/assertion/types.go:
  ```go
  type NotificationAssertBlock struct { ... }
  ```
- Add the field to Assertion:
  ```go
  AssertNotifications *NotificationAssertBlock `yaml:"assert_notifications,omitempty"`
  ```
- Add a handler function in a new file internal/runner/notifications.go:
  ```go
  func runNotificationAssertion(a assertion.Assertion, ...) assertion.Result { ... }
  ```
- Add the routing check in runAssertion() in execute.go, following the existing priority pattern:
  ```go
  if a.AssertNotifications != nil {
  	return runNotificationAssertion(a, fixture, timeout, dockerImage, start)
  }
  ```
Adding a new CLI command
- Add the function in a new file in internal/runner/ (e.g., mycommand.go):
  ```go
  func MyCommand(args []string) error { ... }
  ```
- Add the dispatch in cmd/mcp-assert/main.go:
  ```go
  case "mycommand":
  	if err := runner.MyCommand(os.Args[2:]); err != nil { ... }
  ```
- Update printUsage() with the new command's documentation.
Adding a new output format
- Add a file in internal/report/ (e.g., csv.go) with a function that consumes []assertion.Result.
- Add a flag in commands.go for the new format.
- Call the new function from writeReports() in util.go.
Key Design Decisions
One server per assertion. Each assertion starts a fresh MCP server subprocess. This prevents state leakage between tests but means server startup cost is paid per assertion. For fast servers (filesystem, memory) this is negligible. For slow servers (gopls, jdtls) it dominates test duration. The setup block amortizes some of this by allowing warmup calls within a single assertion's server lifetime.
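Checker is side-effect free. Check() takes a response string and returns an error. It holds no state and writes nothing, and its only I/O is reading the files named by file_contains, file_not_contains, and file_not_exists. CheckWithSnapshots() adds file comparison, but the snapshots are passed in rather than read internally. This makes the checker trivially testable.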
Transport is pluggable. createMCPClient is a single function that all execution paths use. All three transports return the same MCPClient interface, so the runner never needs to know which transport is active.
Docker is a command wrapper. --docker <image> does not use the Docker SDK. It rewrites the server command to docker run --rm -i .... Since MCP uses stdio, Docker's -i flag provides bidirectional pipe transport. The server runs inside the container; assertions run outside. Docker is only supported with stdio transport.
Color degrades gracefully. Color is used only when stdout is a TTY (detected via os.ModeCharDevice), and setting NO_COLOR or TERM=dumb turns it off. In CI (pipes), output is plain PASS/FAIL/SKIP with no escape codes.
Setup tools are not counted as "tested" by coverage. The coverage command only counts the assert.tool field, not tools that appear in setup: blocks. This correctly reflects that setup tools are prerequisites, not the subject of the test.
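The coverage calculation is a set comparison. toolCoverage is a hypothetical helper. It takes tool names from tools/list and from assert.tool fields only, so setup-only tools never count as tested.

```go
package main

import "sort"

// toolCoverage returns the percentage of server tools named by some
// assert.tool field, plus the sorted list of tools no assertion tests.
func toolCoverage(serverTools, assertedTools []string) (pct float64, untested []string) {
	tested := make(map[string]bool, len(assertedTools))
	for _, t := range assertedTools {
		tested[t] = true
	}
	covered := 0
	for _, t := range serverTools {
		if tested[t] {
			covered++
		} else {
			untested = append(untested, t)
		}
	}
	sort.Strings(untested)
	if len(serverTools) == 0 {
		return 0, untested
	}
	return 100 * float64(covered) / float64(len(serverTools)), untested
}
```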