[RFC] Tests using the Everything server #582

Draft
wants to merge 5 commits into
base: main

richardkmichael
Contributor

Add Playwright e2e tests which connect to the reference Everything server.

This implementation is in forks:

👋 @cliffhall @olaservo Your comments here seem to be discussing a similar idea.
👋 @jerome3o-anthropic In your MCP Dev conf talk, you mentioned working on example full-featured servers. This is a tiny contribution, but I thought you might be interested.

Motivation and Context

The Inspector lacks automated testing against real MCP servers, making it difficult to catch regressions and validate new functionality. (Much of the UI depends on a connected server.)

The Everything server provides example implementations of many MCP protocol features.

Together, these create an opportunity for a feedback loop to drive MCP specification validation and compliance:

  • Inspector needs functionality to test all aspects of MCP
  • Everything server needs complete implementation of all MCP aspects

I think there is a lot of potential in this direction.

I'm particularly interested in validation and compliance of MCP clients and servers, and how I can help. I'm aware of the focus on validation in the Roadmap and of the SDK compliance spec schema.

How Has This Been Tested?

Running in GitHub Actions, sample run

Request for Comments

Seeking feedback on the concept, and specifically on:

  1. Approach to e2e testing with external MCP servers
  2. Test coverage priorities and scope
  3. Auth handling approaches for testing scenarios

Current scope

  • Server connection via STDIO transport
  • Tool listing and execution of one tool only (structuredContent implements the new MCP 2025-06-18 specification feature)
  • Error handling scenarios
  • CI integration with GitHub Actions
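
For readers unfamiliar with the feature under test: in the 2025-06-18 specification a tool result may carry a `structuredContent` object alongside the usual `content` array, and for backward compatibility a server should also return the serialized JSON in a text content block. Below is a minimal sketch of a check an e2e test might apply to a captured result. The types and function names are hypothetical simplifications, not taken from this PR, the Inspector, or the SDK.

```typescript
// Hypothetical, simplified shape of a tools/call result (not the SDK's types).
interface TextBlock {
  type: "text";
  text: string;
}

interface OtherBlock {
  type: "image" | "audio" | "resource" | "resource_link";
}

interface ToolResult {
  content: Array<TextBlock | OtherBlock>;
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Serializes a JSON value with sorted object keys so key order does not
// affect the comparison.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonical).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

// Returns true when at least one text block parses as JSON that is
// structurally equal to structuredContent.
function textMirrorsStructuredContent(result: ToolResult): boolean {
  if (result.structuredContent === undefined) return false;
  const expected = canonical(result.structuredContent);
  return result.content.some((block) => {
    if (block.type !== "text") return false;
    try {
      return canonical(JSON.parse(block.text)) === expected;
    } catch {
      return false; // this text block is not JSON
    }
  });
}
```

In a Playwright test, the result would first have to be read from the Inspector UI or from a captured response. How that is done depends on the Inspector's markup and is not shown here.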

Current limitations

  • Uses a clone of a fork/branch for the structuredContent tool (can't npx ... from a sub-package branch)
  • Separate Playwright config (needs merge with existing configuration)
  • Auth disabled (token capture not implemented; somewhat complex and unnecessary for initial benefit)
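
The token capture mentioned in the last item could, in principle, be done by scanning the server's startup output for a URL and reading the token from its query string. The sketch below assumes the token travels in a query parameter named `MCP_PROXY_AUTH_TOKEN` and that the URL appears intact in stdout. Both are assumptions for illustration, and `extractAuthToken` is a hypothetical helper, not part of this PR.

```typescript
// Assumed query parameter name; adjust to whatever the startup URL really uses.
const TOKEN_PARAM = "MCP_PROXY_AUTH_TOKEN";

// Scans captured stdout for http(s) URLs and returns the first non-empty
// value of the token parameter, or null when none is found.
function extractAuthToken(stdout: string, param: string = TOKEN_PARAM): string | null {
  const urlPattern = /https?:\/\/[^\s"'<>]+/g;
  const candidates = stdout.match(urlPattern) ?? [];
  for (const candidate of candidates) {
    try {
      const token = new URL(candidate).searchParams.get(param);
      if (token) return token;
    } catch {
      // Not a parseable URL; keep scanning the remaining candidates.
    }
  }
  return null;
}
```

The harder part, which this sketch does not address, is capturing stdout from the server process at the right moment and handing the token to the browser context before the first page load.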

Next Steps

  1. Switch from git clone to npx modelcontextprotocol/server-everything (once structuredContent
    tool is merged; otherwise, change the tested tool)
  2. Merge with existing Playwright configuration instead of separate config
  3. Enable opt-in local testing with npx approach
  4. Remaining code cleanups and issues

Future Test Coverage

  • Expand to cover more MCP specification
    • Resources (incl templates and a Tool result with embedded resources), Prompts, Sampling,
      Elicitation, Roots, Change Notifications in various contexts
  • Expand to cover more Inspector functionality and behaviours
    • History, Notifications, UI functionality (navigation, toasts, pane resizing)

Background

I'm curious about plans for the Inspector. I'd like to see it grow not only for debugging, but also learning and teaching MCP.

A few ideas:

  • tagging and grouping Inspector tests with areas of the spec
  • command line scripting of the Inspector to validate a given MCP server against [areas of] the spec
    • e.g., out-of-repo use of the Inspector to validate an Everything server implemented in each MCP SDK
  • a web-based MCP "playground" with demos or walk-throughs
    • especially once the MCP registry spec finishes; i.e., from the Inspector: search the registry, connect and test
    • connect the Inspector to more teaching materials, e.g., "For Server Developers" examples
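
As a rough illustration of the first idea (tagging and grouping tests by spec area), the sketch below indexes tests by the areas they exercise and reports areas with no coverage. The area names come from the scope lists in this PR; `TaggedTest`, `groupBySpecArea`, and `uncoveredAreas` are hypothetical names, and the sketch is independent of any test runner.

```typescript
// Spec areas named in this PR's scope and future coverage lists.
type SpecArea = "tools" | "resources" | "prompts" | "sampling" | "elicitation" | "roots";

// Hypothetical record describing one e2e test and the areas it exercises.
interface TaggedTest {
  title: string;
  areas: SpecArea[];
}

// Builds an index from spec area to the titles of the tests tagged with it.
function groupBySpecArea(tests: TaggedTest[]): Partial<Record<SpecArea, string[]>> {
  const index: Partial<Record<SpecArea, string[]>> = {};
  for (const test of tests) {
    for (const area of test.areas) {
      const titles = index[area] ?? [];
      titles.push(test.title);
      index[area] = titles;
    }
  }
  return index;
}

// Lists the required areas that no test covers yet.
function uncoveredAreas(tests: TaggedTest[], required: SpecArea[]): SpecArea[] {
  const index = groupBySpecArea(tests);
  return required.filter((area) => index[area] === undefined);
}
```

Playwright's own test tags and `--grep` filtering might serve the same purpose without a separate index. Which approach fits better probably depends on whether coverage reports are wanted outside the test runner.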

I switched from Claude.ai to Desktop last winter to use reference MCP servers (filesystem & git) to eliminate copy/paste from Claude to vim. This was a huge boost. I read the MCP spec and started writing my own MCP servers, which included tools and dynamic resources. Watching Claude use my own tools got me super fired up about MCP and building. 😄

I was surprised that Claude neither uses Resources automatically nor appears aware of server-level instructions. I found the client feature matrix, which shows that most clients, notably Claude, lack discoverability and various other aspects of MCP. Since I exclusively use Claude, I'd like to see it reach complete compliance. Because Anthropic initiated MCP, I think it would be great if its client(s) were leading in this area.

- Move auto open disable closer to server startup (`playwright test ...` doesn't need that env)
- Emit stdout for operational clarity and debugging
- Extract common inspector URL to a variable

Implements basic end-to-end tests for the Everything server, focusing on connection setup and tool functionality validation.

Current test coverage includes:
- Server connection via STDIO transport
- Tools listing and discovery
- Single tool execution (structuredOutput) with various input scenarios
- Error handling for missing required inputs
- Proper disconnect handling with expected network error filtering

The Everything server implements many MCP protocol features and serves as a comprehensive example server, but this initial test implementation focuses on establishing the test framework and validating core tool execution.

Auth is disabled in the test configuration because the server-generated token would need to be parsed from the emitted startup URL, which is complicated, if possible at all.

Includes dedicated Playwright configuration for Everything server testing with appropriate timeouts and debugging setup.
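
The "expected network error filtering" in the commit above can be pictured roughly as follows: errors logged after a deliberate disconnect are ignored when they match known patterns, and everything else still fails the test. This is a sketch under stated assumptions. The patterns listed and the `LoggedError` shape are hypothetical and are not the ones used in this PR.

```typescript
// Assumed patterns for errors that are normal once the server is disconnected.
const EXPECTED_AFTER_DISCONNECT: RegExp[] = [
  /net::ERR_CONNECTION_REFUSED/,
  /net::ERR_ABORTED/,
  /Failed to fetch/,
];

// Hypothetical record of an error collected during a test run.
interface LoggedError {
  message: string;
  afterDisconnect: boolean; // true when logged after the test disconnected
}

// Returns the errors that should still fail the test: anything logged before
// the disconnect, and anything afterwards that matches no expected pattern.
function unexpectedErrors(
  errors: LoggedError[],
  expected: RegExp[] = EXPECTED_AFTER_DISCONNECT,
): LoggedError[] {
  return errors.filter(
    (error) =>
      !(error.afterDisconnect && expected.some((pattern) => pattern.test(error.message))),
  );
}
```

Keeping the `afterDisconnect` flag separate from the pattern match means a connection error during normal operation is still reported, which seems closer to the intent than ignoring the patterns globally.
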
Adds GitHub Actions workflow to run e2e tests against the Everything MCP server. The workflow clones both inspector and servers repos, builds the Everything server, and runs Playwright tests against it.

Required solving several CI-specific challenges:
- Repository layout: Used explicit checkout paths to place inspector and servers repos as siblings, preventing GitHub Actions from nesting servers inside inspector
- Server setup: Everything server needs npm install and build before testing
- Dependency management: Created custom setup-playwright action to handle package.json location and dependency caching across multiple repos

@olaservo
Member

olaservo commented Jul 4, 2025

Hi @richardkmichael, I like this idea. I wanted to see if you already knew about the Community Working Groups. There is more info here: https://github.com/modelcontextprotocol-community/working-groups

The reason I mention it is that we've been talking on that Discord about putting together a working group for community-driven reference implementation, validation, etc., so I think it could be a good place to have more discussions around stuff like this.

@richardkmichael
Contributor Author

Thank you! I didn't know about it, and I will take a look.
