
MCP Servers: Bridging AI Agents and Production Systems

by Royce Carbowitz
AI Engineering
MCP
Developer Tools
Automation

AI coding agents are remarkably capable at reading, writing, and reasoning about code. But they operate in a sealed environment by default. They can see the files in your repository, parse your terminal output, and generate new code based on what they observe. What they cannot do, without additional infrastructure, is reach beyond the repository boundary to interact with the live systems your code actually serves. They cannot query your bug tracker for the latest open issues, pull real-time analytics from your dashboard, or trigger a deployment pipeline to ship what they just built.

This gap between what AI agents can reason about and what they can act on has been a persistent limitation. The Model Context Protocol, or MCP, closes that gap. Over the past several months I’ve built MCP servers for both Pinpoint and the SPOQ methodology, and the results have fundamentally changed how I think about AI-assisted development. This post walks through what MCP is, how it works architecturally, and the concrete lessons I’ve learned from putting MCP servers into production.

What Is the Model Context Protocol and Why Does It Matter?

MCP is an open standard created by Anthropic that defines how AI coding assistants connect to external tools and data sources. It provides a structured protocol for AI agents to discover, invoke, and receive results from tools that live outside the model’s native context window. Instead of an AI agent being limited to the files it can see in your repository, MCP lets it interact with production databases, project management systems, deployment pipelines, monitoring dashboards, and any other system you expose through an MCP server.

The protocol matters because it solves the “last mile” problem of AI-assisted development. Consider what happens when you ask an AI agent to fix a bug. Without MCP, the agent can read the code, identify a likely fix, and generate a patch. But it cannot verify whether the fix actually addresses the reported issue because it has no way to read the original bug report from your issue tracker. It cannot check whether similar bugs have been filed before. It cannot run the fix against a staging environment. It cannot update the ticket status after the fix ships. Every one of these actions requires reaching beyond the codebase, and MCP is the bridge that makes that reach possible.

What separates MCP from earlier approaches like custom tool integrations or API wrapper scripts is standardization. Before MCP, every team that wanted to give their AI agents access to external systems had to build bespoke integrations. Each integration used its own protocol, its own authentication model, its own error handling patterns. MCP standardizes all of that into a single protocol that any compatible AI assistant can consume. Build an MCP server once, and it works with Claude Code, Cursor, and any other client that speaks the protocol.

I first encountered MCP while looking for a way to let AI agents interact with Pinpoint’s testing platform during SPOQ orchestration runs. The agents were generating code, but they had no way to verify their work against the QA system that the engineering team actually relied on. MCP gave me a clean path to solve that problem without hacking together fragile shell scripts or API wrappers.

How Does an MCP Server Architecture Work?

An MCP server exposes three categories of capabilities to AI agents: tools, resources, and prompts. Each category serves a distinct purpose, and understanding the differences is essential to designing effective MCP integrations. Tools are executable actions the agent can invoke, like “create a bug report” or “fetch the latest test results.” Resources are data sources the agent can read from, like documentation files or configuration records. Prompts are structured workflow templates that guide the agent through complex multi-step operations.

The communication layer uses JSON-RPC, which provides a lightweight request-response protocol that maps naturally to how AI agents interact with tools. When an agent starts a session with an MCP server, it first performs a discovery step. The server responds with a manifest listing every tool, resource, and prompt it offers, along with typed schemas describing the inputs and outputs for each one. The agent uses these schemas to understand what each tool does, what parameters it requires, and what shape of data it will return. This discovery mechanism is what makes MCP self-documenting. The agent does not need prior knowledge of your server’s capabilities because the server advertises them at runtime.
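The discovery round trip can be sketched as plain JSON-RPC messages. This is an illustrative simplification rather than the exact MCP wire format; the method name, manifest fields, and the getBugById tool are assumptions made for the example.

```python
import json

# Illustrative sketch of the discovery step (not the exact MCP wire format).
# The agent asks the server what it offers; the server answers with a manifest.
discovery_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",  # resources and prompts have analogous listings
}

# A hypothetical server response advertising one tool with a typed schema.
discovery_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "getBugById",
                "description": "Fetch a single bug report by its ID",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "bugId": {"type": "string", "description": "UUID of the bug"}
                    },
                    "required": ["bugId"],
                },
            }
        ]
    },
}

# The agent pairs responses with requests by id, then reads the manifest.
assert discovery_response["id"] == discovery_request["id"]
tool_names = [t["name"] for t in discovery_response["result"]["tools"]]
print(tool_names)  # ['getBugById']
```

Because the manifest travels as data, the agent needs no prior knowledge of the server: everything it can do is spelled out in the response.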

Each tool definition includes a JSON Schema describing its input parameters. These schemas serve double duty: they validate incoming requests to catch malformed calls, and they provide the AI agent with enough context to construct correct invocations. A well-designed schema includes not just type constraints but also human-readable descriptions for each parameter. The agent reads these descriptions when deciding how to use the tool, so the quality of your schema descriptions directly affects how effectively the AI can leverage your server.
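Here is a sketch of that double duty: one schema both documents a parameter for the agent and backs a structural check on incoming calls. The tool name and the validator are hypothetical; a real server would use a full JSON Schema validator rather than this minimal check.

```python
# Hypothetical tool definition whose JSON Schema does double duty:
# it documents parameters for the agent and validates incoming calls.
create_test_request_tool = {
    "name": "createTestRequest",
    "description": "Request a manual QA test run against a page",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": (
                    "The URL of the page to test. Must be a fully qualified "
                    "URL including the protocol (https://)."
                ),
            },
        },
        "required": ["url"],
    },
}

def validate_call(tool, arguments):
    """Minimal structural check against the tool's schema (a sketch,
    not a complete JSON Schema validator)."""
    schema = tool["inputSchema"]
    missing = [k for k in schema.get("required", []) if k not in arguments]
    if missing:
        return False, f"Missing required parameter(s): {', '.join(missing)}"
    return True, "ok"

ok, msg = validate_call(create_test_request_tool, {"url": "https://example.com"})
assert ok
ok, msg = validate_call(create_test_request_tool, {})
print(msg)  # Missing required parameter(s): url
```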

When the agent invokes a tool, the server receives the JSON-RPC request, executes the underlying logic (which might involve calling external APIs, querying databases, or triggering workflows), and returns a structured result. The result flows back into the agent’s context window, where it becomes part of the reasoning chain. This means the agent can use tool results to inform subsequent decisions, creating a feedback loop where the agent adapts its behavior based on real-world data rather than static code analysis alone.
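The server side of that loop can be sketched as a small dispatcher: parse the JSON-RPC request, run the handler, and send back a structured result. The "tools/call" method name and the stubbed bug lookup are assumptions for illustration, not the behavior of any real server.

```python
import json

# Minimal sketch of handling a tool invocation: receive a JSON-RPC
# request, dispatch to a handler, return a structured result.

def get_bug_by_id(bug_id):
    # A real server would query the bug tracker here; this is a stub.
    return {"id": bug_id, "title": "Login button unresponsive", "status": "open"}

HANDLERS = {"getBugById": lambda args: get_bug_by_id(args["bugId"])}

def handle_request(raw):
    req = json.loads(raw)
    handler = HANDLERS[req["params"]["name"]]
    result = handler(req["params"]["arguments"])
    return {"jsonrpc": "2.0", "id": req["id"], "result": result}

request = json.dumps({
    "jsonrpc": "2.0", "id": 7, "method": "tools/call",
    "params": {"name": "getBugById", "arguments": {"bugId": "bug-123"}},
})
response = handle_request(request)
print(response["result"]["status"])  # open
```

The returned object is what lands in the agent's context window, so its field names and structure are part of your interface design just as much as the input schema is.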

What Tool Design Patterns Produce the Best AI Agent Interactions?

Tools should be atomic and self-describing. This is the single most important design principle I’ve learned from building MCP servers. An atomic tool does one thing completely. It does not require the agent to call a second tool to finish the job. It does not leave the system in a half-completed state if something fails. Self-describing means the tool’s name, parameter descriptions, and return value documentation are clear enough that an AI agent can figure out when and how to use it without any external documentation.

Consider the difference between a tool called “manageBugs” that accepts a “mode” parameter for create, read, update, and delete operations versus four separate tools: “createBug,” “getBug,” “updateBug,” and “listBugs.” The second approach produces dramatically better agent interactions. Each tool has a clear purpose, a focused parameter set, and an unambiguous return type. The agent never has to reason about which mode to pass or how the tool’s behavior changes based on a flag. Decomposition at the tool level mirrors the same principle that makes task decomposition effective in SPOQ: smaller, well-defined units of work produce better outcomes than large, overloaded ones.
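The contrast is easiest to see as data. This sketch shows the overloaded registry next to the decomposed one; all names and parameter lists are illustrative, not a real API.

```python
# Anti-pattern: one tool whose behavior hinges on a "mode" flag.
overloaded = {
    "name": "manageBugs",
    "inputSchema": {
        "properties": {"mode": {"enum": ["create", "read", "update", "list"]}}
    },
}

# Preferred: four atomic tools, each with a focused parameter set.
atomic_tools = {
    "createBug": {"required": ["title", "description"]},
    "getBug": {"required": ["bugId"]},
    "updateBug": {"required": ["bugId", "fields"]},
    "listBugs": {"required": ["projectId"]},
}

# Each atomic tool answers exactly one question: what do I need to call you?
for name, schema in atomic_tools.items():
    print(name, "->", schema["required"])
```

With the decomposed registry, the agent's choice of tool already encodes its intent; with the overloaded one, the intent is buried in a flag the agent has to reason about separately.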

Input schemas should include descriptions that the AI can read and use for decision-making. When I defined a tool for creating test requests in Pinpoint’s MCP server, I included descriptions like “The URL of the page to test. Must be a fully qualified URL including the protocol (https://)” rather than just typing the field as a string. That description gives the agent enough context to construct a valid URL rather than passing a bare domain name or relative path. Every parameter description is an opportunity to guide the agent toward correct usage, so I treat them with the same care I’d give to API documentation for human consumers.

Error messages are another critical design surface. When a tool invocation fails, the error message should guide the AI toward the correct usage rather than simply reporting what went wrong. Instead of returning “Invalid parameter,” return “The projectId parameter must be a UUID in the format xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. You can find valid project IDs by calling the listProjects tool.” This kind of actionable error message lets the agent self-correct on the next attempt rather than getting stuck in a retry loop with the same invalid input.
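A sketch of that pattern: the validation failure reports what went wrong and names the tool that resolves it. The parameter and tool names come from the example above; the checking function itself is hypothetical.

```python
import re

# An actionable error names the corrective action, not just the failure.
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.I
)

def check_project_id(project_id):
    if UUID_RE.match(project_id):
        return {"ok": True}
    return {
        "ok": False,
        "error": (
            "The projectId parameter must be a UUID in the format "
            "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. You can find valid "
            "project IDs by calling the listProjects tool."
        ),
    }

assert check_project_id("123e4567-e89b-12d3-a456-426614174000")["ok"]
bad = check_project_id("not-a-uuid")
print("listProjects" in bad["error"])  # True: the error names the fix
```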

Avoid tools that require multi-step state. If a tool creates a resource and returns an ID that must be passed to a second tool to activate it, you’ve introduced a stateful dependency that agents handle poorly. Agents work best when each tool call is independent and complete. If your workflow genuinely requires multiple steps, consider using an MCP prompt to define the full sequence. Prompts give the agent a structured playbook to follow, reducing the chance of it losing track of intermediate state across multiple tool calls.
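The playbook idea can be sketched as a prompt that names the tool for each step, so the agent carries the sequence rather than reconstructing it. The workflow, step, and tool names here are invented for illustration and do not reflect MCP's exact prompt format.

```python
# Sketch of a prompt as a structured playbook: each step names the tool
# to call, so the agent never has to infer the sequence on its own.
provision_prompt = {
    "name": "provision-and-activate",
    "description": "Create a resource and activate it in one guided flow",
    "steps": [
        {"order": 1, "tool": "createResource", "note": "Save the returned id"},
        {"order": 2, "tool": "activateResource", "note": "Pass the id from step 1"},
        {"order": 3, "tool": "getResourceStatus", "note": "Confirm it is active"},
    ],
}

def render_playbook(prompt):
    """Flatten the prompt into text the agent can follow step by step."""
    lines = [f"Workflow: {prompt['name']}"]
    for step in prompt["steps"]:
        lines.append(f"{step['order']}. Call {step['tool']}: {step['note']}")
    return "\n".join(lines)

print(render_playbook(provision_prompt))
```

Rendered this way, the stateful dependency between steps 1 and 2 is spelled out in the playbook instead of living only in the agent's working memory.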

How Should Credential and Security Concerns Be Handled?

Security in MCP servers requires a different mental model than traditional API security because the consumer is an AI agent rather than a human user. The agent will use whatever credentials it has access to, and it will use them in whatever way its reasoning suggests is appropriate. This means your security boundaries must be enforced at the server level rather than relying on the client to exercise judgment about what actions are appropriate.

Runtime credential configuration is the foundation. Credentials should never be hardcoded in the MCP server itself. Instead, they should be injected through environment variables or configuration files at startup time. This approach lets you run the same server with different permission levels depending on the context. A development instance might use credentials scoped to a sandbox environment, while a production instance uses credentials with access to live data. The server code remains identical in both cases because the credential boundary is external.
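In practice this looks like a small startup routine that reads credentials from the environment and refuses to run without them. The variable names and the sandbox default below are illustrative assumptions, not Pinpoint's actual configuration.

```python
import os

# Sketch of runtime credential injection: the same server code reads its
# token and scope from the environment, so dev and prod differ only in
# configuration, never in code.

def load_credentials(environ=os.environ):
    token = environ.get("PINPOINT_API_TOKEN")
    if not token:
        raise RuntimeError(
            "PINPOINT_API_TOKEN is not set; refusing to start without credentials"
        )
    return {
        "token": token,
        # Default to the most restrictive scope unless told otherwise.
        "scope": environ.get("PINPOINT_API_SCOPE", "sandbox"),
    }

# Simulate a development environment with a sandbox-scoped token.
dev_env = {"PINPOINT_API_TOKEN": "dev-token-123"}
creds = load_credentials(dev_env)
print(creds["scope"])  # sandbox
```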

Scoped API tokens with minimum permissions are essential. When I built the Pinpoint MCP server, I created dedicated API tokens that could read bug reports and create test requests but could not modify user accounts, change billing settings, or delete production data. Even if the AI agent decided that deleting old test data would be a helpful optimization, the token simply would not allow it. This principle of least privilege is standard security practice, but it becomes even more important when the consumer is an autonomous agent that might reason itself into performing actions you did not anticipate.

Audit logging of all tool invocations provides visibility into exactly what the AI agent is doing with your production systems. Every tool call should be logged with the tool name, input parameters, timestamp, and result status. This logging serves two purposes: it creates an accountability trail for compliance and debugging, and it gives you data to analyze the agent’s usage patterns. If you notice the agent calling a particular tool with unexpected parameters, that’s a signal that either your tool design needs refinement or the agent’s reasoning about the tool is flawed.
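One lightweight way to get this for every tool is a decorator that records the call before the result is returned. This is a minimal sketch with an in-memory log and a stubbed tool; a real server would write to a durable sink.

```python
import functools
import time

AUDIT_LOG = []  # in production this would go to a durable log sink

def audited(tool_name):
    """Decorator recording every invocation with its parameters,
    timestamp, and result status."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(**params):
            entry = {"tool": tool_name, "params": params, "ts": time.time()}
            try:
                result = fn(**params)
                entry["status"] = "ok"
                return result
            except Exception as exc:
                entry["status"] = f"error: {exc}"
                raise
            finally:
                AUDIT_LOG.append(entry)  # logged on success and failure alike
        return inner
    return wrap

@audited("getBugById")
def get_bug_by_id(bugId):
    return {"id": bugId, "status": "open"}  # stubbed lookup

get_bug_by_id(bugId="bug-42")
print(AUDIT_LOG[0]["tool"], AUDIT_LOG[0]["status"])  # getBugById ok
```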

Separation of read and write operations adds another layer of safety. I structure my MCP servers so that read-only tools and write tools are clearly delineated. Some deployment configurations expose only the read tools, giving the agent visibility into production data without the ability to modify anything. This is particularly useful during development and testing, where you want the agent to have access to real data for context but do not want it making changes to production systems while you iterate on the server implementation.
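The delineation can be as simple as tagging each tool and filtering the advertised set by a deployment flag, as in this sketch with illustrative tool names.

```python
# Sketch of read/write separation: every tool is tagged with its access
# level, and a deployment flag controls whether write tools are exposed.
ALL_TOOLS = {
    "getBug": {"access": "read"},
    "listBugs": {"access": "read"},
    "createBug": {"access": "write"},
    "updateBug": {"access": "write"},
}

def exposed_tools(read_only):
    """Return only the tools this deployment should advertise."""
    if read_only:
        return {n: t for n, t in ALL_TOOLS.items() if t["access"] == "read"}
    return dict(ALL_TOOLS)

# A read-only deployment advertises only the safe half of the surface area.
print(sorted(exposed_tools(read_only=True)))  # ['getBug', 'listBugs']
```

Because the filtering happens before discovery, a read-only deployment never even advertises the write tools, so the agent cannot reason its way into calling them.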

What Real-World Use Cases Demonstrate MCP Server Value?

Pinpoint’s MCP server was the first production implementation I built, and it demonstrates how MCP transforms QA workflows. Pinpoint is a testing platform that integrates expert manual testing into engineering CI/CD pipelines. Before the MCP server existed, AI agents building features through SPOQ orchestration had no way to interact with the QA layer. They would generate code, write unit tests, and pass validation gates, but verifying their work against manual QA required a human to log into Pinpoint, create test requests, wait for results, and relay findings back to the agent.

With the MCP server, that entire loop became autonomous. The agent can now call tools to list open bugs for a project, read the details of specific bug reports, create new test requests targeting the pages it modified, and check the status of pending test runs. When the QA team flags an issue, the agent can read the bug report directly and begin working on a fix without waiting for a human intermediary. This reduced the feedback cycle from hours to minutes because the agent no longer sits idle while a human manually shuttles information between systems.

The SPOQ MCP server serves a different but equally valuable purpose. SPOQ orchestration involves parsing epic YAML files, computing dependency graphs, dispatching agents in waves, and scoring validation results. Before the MCP server, all of these operations required running CLI commands or manually invoking scripts. The MCP server exposes each operation as a discrete tool: parse an epic file, validate a plan against the scoring rubric, compute wave assignments from the dependency graph, and manage task status transitions.

What made this particularly powerful was combining MCP prompts with tools. I defined prompts that walk an agent through complete SPOQ workflows. The “epic-planning” prompt, for example, guides an agent through the entire planning process: analyze the feature request, decompose it into tasks, define dependencies, generate the YAML file, and then validate the plan against the scoring rubric. Each step in the prompt references specific tools, so the agent knows exactly which tool to call at each stage. This structured guidance eliminates the ambiguity that causes agents to make poor decisions during complex multi-step operations.

Both implementations demonstrate the same core principle: MCP servers are most valuable when they give AI agents access to the operational systems that surround the codebase. The code itself is only one piece of the software delivery puzzle. Bug trackers, testing platforms, deployment pipelines, monitoring dashboards, and project management tools all contain information and capabilities that agents need to do their work effectively. MCP makes all of that accessible through a single standardized protocol.

What Lessons Emerged from Building MCP Servers for Both Pinpoint and SPOQ?

Choosing the right implementation language matters more than you might expect. I built Pinpoint’s MCP server in TypeScript because Pinpoint is a web platform with a Node.js backend, and sharing types, validation logic, and API client libraries between the main application and the MCP server eliminated significant duplication. The SPOQ MCP server, on the other hand, is written in Python because SPOQ’s data processing workflows involve YAML parsing, graph computation, and scoring algorithms that align more naturally with Python’s ecosystem. The lesson is that your MCP server should live in whatever language ecosystem your underlying system already occupies. Fighting that alignment creates friction during development and maintenance.

Tool naming conventions have a measurable impact on AI discoverability. When I initially named Pinpoint’s bug retrieval tool “fetchBugReport,” agents used it correctly about 70% of the time. After renaming it to “getBugById” and adding a companion “listBugsForProject” tool, correct usage jumped to over 95%. The pattern that works best follows REST-like conventions: verb-noun for actions (createTestRequest, updateTaskStatus) and list-noun-by-qualifier for queries (listBugsByProject, getEpicByName). AI agents have been trained on millions of API interactions using these conventions, so following them lets the agent leverage its existing knowledge.

Prompts as structured workflows turned out to be the most underappreciated feature of MCP. When I first built the servers, I focused almost entirely on tools and treated prompts as an afterthought. That was a mistake. Prompts transform MCP from a collection of discrete tools into a guided workflow engine. A well-designed prompt chains together multiple tool calls in a logical sequence, providing the agent with context about why each step matters and what to do with the results. Without prompts, the agent has to figure out the workflow on its own, which works for simple operations but breaks down for complex multi-step processes.

The SPOQ server’s epic-planning prompt is a good example. It defines a six-step sequence: understand the requirements, decompose into tasks, define dependencies, generate YAML, validate the plan, and iterate on any scoring failures. Each step includes guidance about what constitutes good output and what common mistakes to avoid. Agents following this prompt consistently produce higher-quality epic plans than agents that have access to the same tools without the prompt’s structured guidance. The prompt acts as institutional knowledge encoded in a format the agent can consume.

Error recovery patterns differ between MCP and traditional API development. In a typical API, the client is a human developer who reads error messages, diagnoses the issue, and adjusts their approach. In MCP, the client is an AI agent that will attempt to interpret the error and retry automatically. This means your error handling needs to be more prescriptive. Instead of returning generic HTTP status codes, return structured error objects that include a suggested corrective action. I learned this the hard way when an agent entered a retry loop calling the same tool with the same invalid parameters because the error message told it what went wrong but not how to fix it.

Finally, versioning your MCP server matters because AI agents cache the tool manifest from their initial discovery call. If you add a new tool or change the schema of an existing one, agents that connected before the change will not see the update until they reconnect. I now version my MCP servers explicitly and include the version in the server manifest. When I deploy a breaking change, I can detect clients running against an outdated manifest and return a clear error directing them to reconnect. This is a small detail that prevents confusing failures in production.
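The version check can be sketched like this: the server stamps its manifest, and a client presenting a stale version gets a clear reconnect instruction instead of a confusing schema mismatch. The field names and version strings are illustrative.

```python
# Sketch of manifest versioning with an explicit staleness check.
SERVER_MANIFEST_VERSION = "2.1.0"

def handle_call(client_manifest_version, tool_name):
    if client_manifest_version != SERVER_MANIFEST_VERSION:
        return {
            "error": (
                f"Your cached manifest (v{client_manifest_version}) is out of "
                f"date; the server is now v{SERVER_MANIFEST_VERSION}. "
                "Reconnect to refresh the tool list before retrying."
            )
        }
    return {"ok": True, "tool": tool_name}

stale = handle_call("2.0.0", "getBugById")
assert "Reconnect" in stale["error"]
fresh = handle_call("2.1.0", "getBugById")
print(fresh["ok"])  # True
```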


Interested in connecting your AI development tools to production systems? Schedule a conversation and I’ll walk you through how MCP servers can accelerate your team’s AI-assisted workflows.
