[RFC] Tests using the Everything server #582

richardkmichael · 2025-07-04T03:09:41Z

Add Playwright e2e tests which connect to the reference Everything server.

This implementation is in forks:

GitHub Action run of tests with the Everything server
New structuredContent tool being tested here

👋 @cliffhall @olaservo Your comments here seem to be discussing a similar idea.
👋 @jerome3o-anthropic In your MCP Dev conf talk, you mentioned working on example full-featured servers. This is a tiny contribution, but I thought you might be interested.

Motivation and Context

The Inspector lacks automated testing against real MCP servers, making it difficult to catch regressions and validate new functionality. (Much of the UI depends on a connected server.)

The Everything server provides example implementations of many MCP protocol features.

Together, these create an opportunity for a feedback loop to drive MCP specification validation and compliance:

Inspector needs functionality to test all aspects of MCP
Everything server needs complete implementation of all MCP aspects

I think there is a lot of potential in this direction.

I'm particularly interested in validation and compliance of MCP clients and servers, and in how I can help. I know about the focus on validation in the Roadmap and about the SDK compliance spec schema.

How Has This Been Tested?

Running in GitHub Actions, sample run

Request for Comments

I'm seeking feedback on the concept overall, and specifically on:

Approach to e2e testing with external MCP servers
Test coverage priorities and scope
Auth handling approaches for testing scenarios

Current scope

Server connection via STDIO transport
Tool listing and execution of one tool only (structuredContent implements the new MCP 2025-06-18 specification feature)
Error handling scenarios
CI integration with GitHub Actions
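As an illustration of what a structuredContent assertion can check, here is a minimal sketch. It assumes the MCP 2025-06-18 convention that a tool returning `structuredContent` should also mirror it as serialized JSON in a text content block. The `ToolResult` type and the `checkStructuredResult` helper are hypothetical names written out for this example; they are not taken from the Inspector or an SDK.

```typescript
// Simplified subset of an MCP tool result, declared locally so the sketch is
// self-contained. These types are assumptions, not SDK imports.
type TextBlock = { type: "text"; text: string };
type OtherBlock = { type: string; [key: string]: unknown };

interface ToolResult {
  content: Array<TextBlock | OtherBlock>;
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

function isTextBlock(block: TextBlock | OtherBlock): block is TextBlock {
  return block.type === "text" && typeof (block as TextBlock).text === "string";
}

// Serialize JSON-like data with sorted object keys so that key order does not
// affect equality.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Returns a list of human-readable problems; an empty list means the result
// looks like a well-formed structured tool result.
function checkStructuredResult(result: ToolResult): string[] {
  const problems: string[] = [];
  if (result.isError) problems.push("result is flagged isError");

  const sc: unknown = result.structuredContent;
  if (typeof sc !== "object" || sc === null || Array.isArray(sc)) {
    problems.push("structuredContent is missing or not a JSON object");
    return problems;
  }

  // Look for at least one text block whose JSON matches structuredContent.
  const mirrored = result.content.filter(isTextBlock).some((block) => {
    try {
      return canonical(JSON.parse(block.text)) === canonical(sc);
    } catch {
      return false; // text block is not JSON
    }
  });
  if (!mirrored) problems.push("no text block mirrors structuredContent");
  return problems;
}
```

In an e2e test, a helper like this would run against the result the Inspector displays after the tool call, so failures report what is wrong with the result rather than only that an element is missing.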

Current limitations

Clones a fork/branch to get the structuredContent tool (can't npx ... from a sub-package branch)
Separate Playwright config (needs merge with existing configuration)
Auth disabled (token capture not implemented; somewhat complex and unnecessary for initial benefit)
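For the auth limitation, one possible direction is to capture the server's startup output and pull the token out of the URL it prints. The sketch below is only that: it assumes the token appears as a query parameter named `MCP_PROXY_AUTH_TOKEN` in a URL on stdout, and `extractProxyToken` is a hypothetical helper, not existing Inspector code.

```typescript
// Extract an auth token from captured startup output.
// Assumption: the token is a query parameter on a URL printed to stdout.
// The default parameter name is an assumption and can be overridden.
function extractProxyToken(
  stdout: string,
  param: string = "MCP_PROXY_AUTH_TOKEN",
): string | null {
  // Strip ANSI color sequences so they do not get glued onto the URL.
  const plain = stdout.replace(/\x1b\[[0-9;]*m/g, "");

  // Collect anything that looks like an http(s) URL.
  const candidates = plain.match(/https?:\/\/[^\s"'<>]+/g) ?? [];

  for (const candidate of candidates) {
    try {
      const token = new URL(candidate).searchParams.get(param);
      if (token) return token;
    } catch {
      // Not a parseable URL; try the next candidate.
    }
  }
  return null;
}
```

A Playwright global setup could call a helper like this on the spawned server's output and pass the token to the tests, which would let auth stay enabled. Whether that is worth the added complexity is the open question above.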

Next Steps

Switch from git clone to npx @modelcontextprotocol/server-everything (once the structuredContent tool is merged; otherwise, change the tested tool)
Merge with existing Playwright configuration instead of separate config
Enable opt-in local testing with npx approach
Remaining code cleanups and issues

Future Test Coverage

Expand to cover more MCP specification
- Resources (incl templates and a Tool result with embedded resources), Prompts, Sampling,
  Elicitation, Roots, Change Notifications in various contexts
Expand to cover more Inspector functionality and behaviours
- History, Notifications, UI functionality (navigation, toasts, pane resizing)

Background

I'm curious about plans for the Inspector. I'd like to see it grow not only for debugging, but also learning and teaching MCP.

A few ideas:

tagging and grouping Inspector tests with areas of the spec
command line scripting of the Inspector to validate a given MCP server against [areas of] the spec
- e.g., out-of-repo use of the Inspector to validate an Everything server implemented in each MCP SDK
a web-based MCP "playground" with demos or walk-throughs
- especially once the MCP registry spec finishes; i.e., from the Inspector: search the registry, connect and test
- connect the Inspector to more teaching materials, e.g., "For Server Developers" examples

I switched from Claude.ai to Desktop last winter to use reference MCP servers (filesystem & git) to eliminate copy/paste from Claude to vim. This was a huge boost. I read the MCP spec and started writing my own MCP servers, which included tools and dynamic resources. Watching Claude use my own tools got me super fired up about MCP and building. 😄

I was surprised by Claude's lack of automatic use of Resources and awareness of server-level instructions. I found the client feature matrix, which shows that most clients, notably Claude, lack discoverability and various other aspects of MCP. Since I exclusively use Claude, I'd like to see it reach complete compliance. Since Anthropic initiated MCP, I think it would be great if its client(s) led in this area.

Implements basic end-to-end tests for the Everything server, focusing on connection setup and tool functionality validation.

Current test coverage includes:
- Server connection via STDIO transport
- Tools listing and discovery
- Single tool execution (structuredOutput) with various input scenarios
- Error handling for missing required inputs
- Proper disconnect handling with expected network error filtering

The Everything server implements many MCP protocol features and serves as a comprehensive example server, but this initial test implementation focuses on establishing the test framework and validating core tool execution.

Auth is disabled in the test configuration because the server-generated token would need to be parsed from the emitted startup URL, which is complicated, if it is possible at all.

Includes dedicated Playwright configuration for Everything server testing with appropriate timeouts and debugging setup.
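The "expected network error filtering" on disconnect could look roughly like the sketch below. The pattern list and the `unexpectedErrors` helper are hypothetical and for illustration only; the actual tests may filter on different messages.

```typescript
// Error messages that are tolerated once the client has disconnected.
// Assumption: these patterns are illustrative, not the list used in the PR.
const EXPECTED_AFTER_DISCONNECT: RegExp[] = [
  /net::ERR_CONNECTION_REFUSED/,
  /net::ERR_ABORTED/,
  /Failed to fetch/i,
];

// Return only the messages that should fail the test. Before disconnect,
// every error counts; after disconnect, known network noise is dropped.
function unexpectedErrors(messages: string[], disconnected: boolean): string[] {
  return messages.filter((message) => {
    const expected =
      disconnected &&
      EXPECTED_AFTER_DISCONNECT.some((pattern) => pattern.test(message));
    return !expected;
  });
}
```

A test would collect browser console errors into an array and assert that `unexpectedErrors(collected, true)` is empty after the disconnect step. The allow-list is applied only after disconnect, so the same errors still fail the test while the connection is supposed to be up.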

Adds GitHub Actions workflow to run e2e tests against the Everything MCP server. The workflow clones both inspector and servers repos, builds the Everything server, and runs Playwright tests against it.

Required solving several CI-specific challenges:
- Repository layout: Used explicit checkout paths to place inspector and servers repos as siblings, preventing GitHub Actions from nesting servers inside inspector
- Server setup: Everything server needs npm install and build before testing
- Dependency management: Created custom setup-playwright action to handle package.json location and dependency caching across multiple repos

olaservo · 2025-07-04T03:41:34Z

Hi @richardkmichael, I like this idea. I wanted to see if you already knew about the Community Working Groups; there is more info here: https://github.com/modelcontextprotocol-community/working-groups

The reason I mention it is that we've been talking on that Discord about putting together a working group for community-driven reference implementation, validation, etc., so I think it could be a good place to have more discussions around stuff like this.

richardkmichael · 2025-07-04T04:25:04Z

Thank you! I didn't know about it, and I will take a look.

richardkmichael added 5 commits July 3, 2025 19:12

Adjust existing Playwright configuration

f51ad09

- Move auto open disable closer to server startup (`playwright test ...` doesn't need that env)
- Emit stdout for operational clarity and debugging
- Extract common inspector URL to a variable

Add npm scripts for Everything e2e tests

6464aab

Teach Claude how to explore the Inspector UI

1ece6da
