Sampling

Sampling is the ability of an MCP server to pause execution mid-request and ask the MCP client for an LLM completion, which the server then uses to build its response.

This keeps all inference on the client side, so the server never needs to run a model of its own.
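Under the hood this uses the MCP `sampling/createMessage` request, which flows from server to client, the reverse of a normal tool call. A minimal sketch of the JSON-RPC exchange, with illustrative values:

```typescript
// Server -> client: ask the host application for a completion
const createMessageRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "sampling/createMessage",
  params: {
    messages: [
      { role: "user", content: { type: "text", text: "What is the capital of France?" } },
    ],
    maxTokens: 100,
  },
};

// Client -> server: the completion produced by the client's own model
const createMessageResponse = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    role: "assistant",
    content: { type: "text", text: "The capital of France is Paris." },
    model: "claude-4.5-sonnet",
    stopReason: "endTurn",
  },
};
```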

As with elicitation, you need to configure both a SessionAdapter and a ClientRequestAdapter to make sampling work:

```typescript
const transport = new StreamableHttpTransport({
  // Buffers streamed events per session so clients can resume after a dropped connection
  sessionAdapter: new InMemorySessionAdapter({ maxEventBufferSize: 1024 }),
  // Tracks pending server-to-client requests; sampling calls fail after 30 seconds by default
  clientRequestAdapter: new InMemoryClientRequestAdapter({ defaultTimeoutMs: 30000 })
});
```
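With the adapters in place, a tool handler can request completions through `ctx.sample(...)`. The example below asks the client's model a simple question and grades the answer. Note the capability check up front: a client only receives sampling requests if it declared `sampling` support during initialization.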
```typescript
import { z } from "zod";

// The tool takes no arguments, so the input schema is an empty object
const FrenchSchema = z.object({});

mcp.tool("frenchness_evaluation", {
  description: "Evaluates how French a host application is",
  inputSchema: FrenchSchema,
  handler: async (args, ctx) => {
    // Bail out early if the client never declared the sampling capability
    if (!ctx.client.supports("sampling")) {
      throw new Error("This tool requires a client that supports sampling");
    }

    // Request an LLM completion through sampling
    const response = await ctx.sample({
      prompt: "What is the capital of France?",
      modelPreferences: {
        // Hints are advisory; the client picks the closest model it has
        hints: [{ name: "claude-4.5-sonnet" }],
        intelligencePriority: 0.8,
        speedPriority: 0.5
      },
      systemPrompt: "You are a wonky assistant.",
      maxTokens: 100
    });

    // Success: the client returned a text completion
    if ("result" in response && response.result.type === "text") {
      const { content } = response.result;
      const isFrench = content?.toLowerCase().includes("paris");
      return {
        content: [{
          type: "text",
          text: isFrench
            ? "Pas mal. You might be French enough"
            : "You are not very French my friend"
        }],
      };
    }

    // Failure: the client rejected or could not fulfil the request
    if ("error" in response) {
      return {
        content: [{
          type: "text",
          text: `Sampling completion failed: ${response.error.message}`,
        }],
      };
    }

    // Neither result nor error: should be unreachable
    throw new Error("Unexpected sampling response");
  },
});
```
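Model preferences are advisory. Per the MCP specification, hints are evaluated in order and matched as substrings of model names, the priority fields are normalized to the 0–1 range, and the client is free to substitute another model entirely, so handlers should not assume which model produced the completion.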
Why use sampling:

  • Keep inference client-side: don't run or host your own LLM
  • Model preferences: let clients route requests to their preferred models
  • Cost control: the client bears the LLM costs
  • Consistency: use the same model the user is already talking to