The paradox of building with AI in the loop
As modern software engineers, we’re trained in the agile way: ship fast, iterate based on feedback, let design emerge. We start minimal and evolve based on what we learn from users.
But architecting robust, performant, and useful MCP servers demands the opposite: with LLMs in the loop, the system’s effectiveness is significantly improved by intentionally and explicitly defining constraints upfront. These constraints improve the predictability, discoverability, and usability of our tools. Why?
Because for LLMs to make reasonable choices, they must be carefully guided. They can't read between the lines, infer unstated intentions, or choose between multiple paths to the same goal. They work purely with (their interpretation of) what we explicitly provide. The context we give them becomes their entire universe of what's possible.
Because LLMs rely heavily on clarity, precision, and explicit structure, defining intentional constraints upfront is critical.
This fundamentally shifts how we measure success. Instead of evaluating how fast we ship and iterate, now we prioritize testing how reliably our constraints allow LLMs to discover and effectively compose our tools.
The Constraint Paradox
- Traditional software development: minimize constraints, maximize flexibility
- MCP server development: maximize constraints, maximize flexibility
The clearer our constraints, the more confidently LLMs can compose our tools into helpful solutions.
A competitive landscape amplifies this new reality. Given the great usability that MCP provides, users are able to seamlessly switch between different servers and tools. And, as the ecosystem matures, they will have an increasing number of options from which to choose.
Instead of thinking of this as me suggesting to abandon agile principles 😱, I’d rather 🙏 you saw it as a realization that building with AI in the loop has different rules. Where traditional software rewards emergent flexibility, MCP servers reward intentional constraints.
There's absolutely nothing wrong with yolo'ing your way through your first MCP servers. I'm just telling you that when I did, I had to reel myself back from a much deeper abyss than was fun (or profitable), all because I didn't have this guide that's now in front of your eyeballs! 👁👁
What follows is a practical framework for architecting MCP servers that embraces constraints and upfront intention as competitive advantages. I’ll share concrete pointers for how to architect MCP servers intentionally enough to get considerable gains 💪 quickly, but not so much that it doesn’t feel playful or experimental or productive.
As my Gen Z kid says: “Trust.”
PS:
The core principles I discuss here apply to agentic systems in general, not only MCP servers/tools.
WARNING
No developer soul is meant to be hurt in the process of reading this article. While I know no one would architect a system in the way of some of the examples here, they are still very useful for the contrast needed to highlight the new (old?) thinking for building useful agentic systems.
PS 2:
This article helps developers design agentic systems/MCP servers that LLMs can use effectively. Security patterns are orthogonal to the usability patterns I'm covering here: they have a different intention (protecting systems from misuse) and deserve their own focused treatment. They were left out on purpose.
PS 3:
I don't show how to plug business logic and tool definitions into the MCP API handlers; that's out of scope for this article. I find it quite trivial (especially with SDKs) once you sort out what is business logic and what is "hook up" code.
Why MCP makes clear intention so very critical
When developers encounter an unexpected behavior with a tool, we experiment, ask questions, read code, read the logs, read docs. I’m kidding!!! We never read the docs.
When machines (you know, the regular kind) encounter unknown inputs, they follow their programming: retry, retry with different options, follow alternative logic paths, or fail gracefully with error codes. How deterministic of them.
When LLMs encounter a vaguely described tool with seemingly ambiguous parameters, they will either:
- Bypass the tool entirely (missing its value)
- Misuse it with wrong parameters (wasting tokens)
- Chain it incorrectly with other tools (breaking workflows)
The impact varies by client type
The unintended impact from tools designed with vague intentions manifests differently based on the type of MCP client.
When humans are in the loop
- Every interaction tests clarity: ambiguous tools lead to failed attempts and user frustration
- Trust compounds or erodes with each use: consistent behavior builds confidence, unpredictability drives users away
- Users develop workarounds based on what happens to work, building fragile patterns around specific tool quirks
When MCP clients are agents
- Failure modes cascade silently: agents propagate errors downstream without correction
- Integration assumptions become permanent contracts: some agent developers might hard-code expectations that break with any change
- Latency compounds dramatically: unclear tools trigger multiple attempts with different parameters, multiplying both time and token costs
- There’s no forgiveness: agents interpret tools literally, without human ability to infer intent
Tool metadata: what LLMs actually see
When we build an MCP server, our primary users are LLMs. These “users” must:
- Select appropriate tools based solely on metadata
- Reason about when and how to use them
- Compose them into solutions
- Work within context window limits
If there is ONE THING that LLMs are terrible at, it’s handling ambiguity. Clear metadata is what enables confident tool selection. Vague metadata, then, forces expensive experimentation on the part of the LLMs.
Clear metadata with intentional boundaries
{
"name": "search_documents",
"description": "Search markdown documents by content or title",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search term to find in documents"
},
"searchIn": {
"type": "string",
"enum": ["content", "title", "both"],
"description": "Where to search"
}
},
"required": ["query"]
}
}
Vague metadata that hurts LLM usage
{
"name": "process", // Process what?
"description": "Processes data", // How? What kind?
"inputSchema": {
"type": "object",
"properties": {
"data": {
"type": "string",
"description": "The data" // What format? What content?
},
"mode": {
"type": "string", // What are valid modes?
"description": "Processing mode"
}
}
}
}
The difference:
- Clear: LLMs know exactly when to use search_documents and what parameters to provide
- Vague: LLMs must guess what "process" does, what "modes" exist, and what "data" format is expected
Beyond this point, I will demonstrate how to use intention to define boundaries, enable composition, and guide evolution.
The intention statement framework for agentic systems
The beauty of MCP development is that we can start with a single tool and grow from there. We don't need a 50-page specification (didn't I tell you to trust??!). For maximum precision in your intention, I suggest using The agentic intention framework to define it completely, before you write any code:
This MCP server helps [WHO] to [DO WHAT] with [WHAT CONSTRAINTS]
so that [WHY THIS MATTERS]
Examples:
- “This MCP server helps individual developers to manage their personal notes with minimal friction so that context-switching doesn’t kill productivity”
- “This MCP server helps SREs to debug production issues with read-only access so that they can find root causes without risking further damage”
- "This MCP server helps data analysts to explore CSV files with read-only safety so that they avoid costly accidental data corruption"
- “This MCP server helps developers and technical writers to validate any MCP-related content against official specifications with strict adherence to official definitions on a per-version basis so that they can have high confidence that MCP-related content they read or write is free from misinformation”
Note: The devil is in the precision.
Here’s how to know if your intention is strong:
- Weak intention: “This server processes log files” (missing WHO and WHY)
- Strong intention: “This server helps SREs to debug production issues faster so that they can reduce system downtime”
Test: Can you measure if you’re succeeding?
- Processing log files → What does success look like? Who benefits? Why does it matter?
- Helping SREs debug production issues → Measurable: time to root cause, escalation rates, downtime minutes ✓
Tip
I personally always start with the WHY, then play with the WHAT and WHO interchangeably.
A tale of three MCP servers
Let me tell you about Eduardo and Monica (😬 if you know, you know, lolz), and Bruno, three developers who learned this lesson very differently.
Keep in mind:
A tool (handler) can either contain all the code to be executed or simply call another function that serves as an entry point for the actual work. Below, I’m showing only these entry-point functions.
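For orientation only, here's a minimal, hypothetical sketch of that split (no particular MCP SDK assumed; handleTool and AnalyzeDocument are made-up names): the handler decodes arguments and delegates, while the entry-point function owns the business logic.
// Hypothetical "hook up" handler: decode arguments, delegate, nothing else.
// Assumes encoding/json and fmt are imported.
func handleTool(rawArgs json.RawMessage) (any, error) {
	var args struct {
		Path string `json:"path"`
	}
	if err := json.Unmarshal(rawArgs, &args); err != nil {
		return nil, fmt.Errorf("invalid arguments: %w", err)
	}
	// AnalyzeDocument is the entry-point function, like the ones shown below.
	return AnalyzeDocument(args.Path)
}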
Monica’s FlexiServer: why just one thing?
Monica built FlexiServer, aiming to create the “Swiss Army knife of MCP servers” that could handle text processing tasks, and be extensible:
type OperationType string // stringly-typed
type ProcessConfig map[string]any // a grab bag
type ProcessResult struct {
Data interface{} `json:"data"` // returns anything
Type string `json:"type"` // requires interpretation
}
// Process executes any text processing operation based on the operation type.
// The config parameter requirements vary by operation.
// Returns ProcessResult whose structure depends on the operation.
func Process(input string, operation OperationType, config ProcessConfig) (*ProcessResult, error)
// Apply runs a processing pipeline on text.
// Each step is executed in sequence with the output of one feeding the next.
// Returns a PipelineResult whose structure varies based on the final step.
func Apply(text string, steps []PipelineStep) (*PipelineResult, error)
// Execute performs text operations based on natural language commands.
// Interprets commands like "make this shorter" or "find key points".
// Returns an ExecutionResult with command-specific structure.
func Execute(content string, command NaturalCommand) (*ExecutionResult, error)
// Transform modifies text based on the provided ruleset.
// Rules can be regex patterns, templates, or custom transformers.
// Output structure depends on the rule type applied.
func Transform(input TextInput, rules TransformRules) (*TransformOutput, error)
// RegisterExtension adds new processing capabilities at runtime.
// Extensions are identified by ExtensionID like "sentiment" or "translate.spanish".
func RegisterExtension(id ExtensionID, handler ExtensionHandler) error
With unconstrained flexibility and ambiguous interface definitions, this design creates uncertainty for LLM usage:
• Tool selection token burn - With Process, Execute, and Transform all modifying text, LLMs waste tokens reasoning through which to use and explaining their choice to users
• Configuration discovery overhead - ProcessConfig map[string]any forces trial-and-error. Each failed attempt adds request + error + retry reasoning to the context window
• Verbose operation explanations - Since OperationType values like "enhance" have no clear meaning, LLMs burn tokens explaining what they think might happen
• Compound ambiguity costs - Execute with natural language commands like "make it better" requires tokens for: interpreting the command + explaining uncertainty + handling unpredictable results
• Runtime capability confusion - RegisterExtension means LLMs can't cache which operations exist, requiring fresh discovery and adding explanation tokens each session
• Error cascade verbosity - When Process fails due to wrong config keys, the LLM needs tokens to explain the error, guess correct parameters, and retry
• Result interpretation overhead - With ProcessResult.Data interface{}, LLMs waste tokens explaining what type of data they received and how they're interpreting it, instead of just presenting results
NOTE
The flexible design undermines reliability at every stage: selection (ambiguous boundaries), execution (unpredictable behavior), and integration (inconsistent outputs).
Bruno’s DocInspector: bring my own APIs, all of them
Bruno built DocInspector. He already had a comprehensive REST API for document analytics, so his intention was simple: be efficient and reuse all 47 existing endpoints by mapping each one directly to an MCP tool. Why reinvent the wheel when the APIs were already tested and deployed?
// DocumentWordCount returns the total word count
func DocumentWordCount(path string) (int, error)
// DocumentCharacterCount returns the total character count
func DocumentCharacterCount(path string) (int, error)
// DocumentLineCount returns the total line count
func DocumentLineCount(path string) (int, error)
// DocumentParagraphCount returns the total paragraph count
func DocumentParagraphCount(path string) (int, error)
// DocumentHeading1Count returns count of H1 headings
func DocumentHeading1Count(path string) (int, error)
// DocumentHeading2Count returns count of H2 headings
func DocumentHeading2Count(path string) (int, error)
// DocumentCodeBlockCount returns count of code blocks
func DocumentCodeBlockCount(path string) (int, error)
// DocumentPythonCodeBlockCount returns count of Python code blocks
func DocumentPythonCodeBlockCount(path string) (int, error)
// DocumentJavaScriptCodeBlockCount returns count of JavaScript code blocks
func DocumentJavaScriptCodeBlockCount(path string) (int, error)
// CheckDocumentHasTableOfContents returns true if doc has a TOC
func CheckDocumentHasTableOfContents(path string) (bool, error)
// CheckDocumentHasIntroduction returns true if doc has an intro section
func CheckDocumentHasIntroduction(path string) (bool, error)
// DocumentReadingTimeInSeconds returns estimated reading time
func DocumentReadingTimeInSeconds(path string) (int, error)
// DocumentFleschScore returns Flesch readability score
func DocumentFleschScore(path string) (float64, error)
// ... 30+ more ultra-specific tools
With excessive granularity and fragmented operations, this design creates orchestration overhead for LLM usage:
• Tool discovery overhead - Before executing any request, LLMs must parse and evaluate 50+ tool definitions (each 100-200 tokens), consuming 5,000-10,000 tokens just to understand available capabilities
• Orchestration complexity - A “document analysis” request requires the LLM to construct a directed acyclic graph of 15-30 tool calls, reason about dependencies, and maintain execution order
• Context window exhaustion - Each tool call adds ~200-500 tokens (request + response + reasoning). A 20-tool sequence consumes 4,000-10,000 tokens in intermediate state alone, leaving insufficient room for document content
• Latency amplification - Serial dependencies prevent parallelization. If ReadingTimeInSeconds depends on WordCount, and FleschScore depends on both, you have a critical path of 3+ sequential round-trips
• High failure surface area - More tools mean more failure points. When tool 15 of 20 fails, the LLM must decide whether partial results are acceptable or if the entire analysis is compromised
• Semantic ambiguity at scale - LLMs must infer relationships between DocumentHeading1Count, DocumentHeading2Count…DocumentHeading6Count versus a single DocumentHeadingStructure that returns hierarchical data
NOTE
The atomic design degrades user experience at every stage: selection (overwhelming choices), execution (excessive latency), and integration (fragmented results).
Eduardo’s DocQualityAdvisor: why so picky?
Eduardo built DocQualityAdvisor. He had one precise intention: help developers understand and improve their documentation quality with clear, actionable feedback, so that their tools become reliably discoverable and easy to use:
// AnalyzeDocumentationQuality examines technical documentation for common quality issues
// including missing sections, unclear explanations, and structural problems.
// Use this to get specific improvement recommendations for developer documentation.
// Returns a QualityReport with scored issues and actionable suggestions for each problem found.
func AnalyzeDocumentationQuality(path string) (QualityReport, error)
// CheckLinkValidity validates all links in documentation, identifying broken links,
// redirects, and invalid anchor references. Use this to ensure all references work correctly.
// Set checkExternal to true to also validate external URLs (slower but more thorough).
// Returns a LinkReport containing broken links, redirect chains, and anchor mismatches.
func CheckLinkValidity(path string, checkExternal bool) (LinkReport, error)
// CalculateReadabilityMetrics analyzes text complexity and reading difficulty.
// Use this to ensure documentation matches your target audience's reading level.
// Returns grade level, estimated reading time, and technical jargon density.
func CalculateReadabilityMetrics(path string) (ReadabilityMetrics, error)
// ExtractDocumentStructure parses the hierarchical organization of a document.
// Use this to analyze navigation flow and identify structural issues like
// missing sections or imbalanced content depth.
// Returns a DocumentStructure with heading hierarchy, section balance, and navigation tree.
func ExtractDocumentStructure(path string) (DocumentStructure, error)
With intentional constraints and boundaries, this design optimizes for LLM efficiency:
• Deterministic tool selection - Names like CheckLinkValidity and CalculateReadabilityMetrics create clear decision boundaries, minimizing selection tokens because LLMs don't need exploratory reasoning
• Zero overlap design - Each tool owns exclusive functionality (links OR readability OR structure), preventing token waste because LLMs never compare similar tools
• Predictable parameter contracts - Fixed, typed parameters like (path string, checkExternal bool) eliminate configuration discovery overhead because LLMs can see exactly what's required from the schema
• Structured return types - LinkReport, ReadabilityMetrics provide consistent schemas, allowing direct result formatting because LLMs know the exact structure in advance
• Composability without coupling - Tools can be called independently or together, enabling simpler orchestration because there are no hidden dependencies between tools
• Context window efficiency - Clear, focused tools need minimal description tokens while providing maximum clarity because each tool does one thing well
NOTE
The design optimizes for LLM effectiveness at every stage: selection (obvious choice), execution (predictable behavior), and integration (composable results).
Six months later
Monica’s FlexiServer:
- 47 open GitHub issues: “Process returns string but AI expects array”, “What does operation=‘enhance’ actually do?”, “ProcessConfig completely undocumented”
- Monica spends more time explaining parameter combinations than adding features
- 3 frustrated blog posts: “Why I Gave Up on MCP After 2 Weeks”
- Average token usage: 3x higher due to LLMs repeatedly guessing valid combinations
Bruno’s DocInspector:
- 12 open issues: “Why does analyzing a document require 47 API calls?”, “Timing out after 30 seconds”
- Bruno’s server works perfectly for direct API users, but fails for LLM interactions
- One viral tweet: “MCP servers: Death by a thousand tool calls”
- 90% of users only use the newly added AnalyzeDocument aggregate tool
Eduardo’s DocQualityAdvisor:
- Integrated into 4 major documentation platforms
- Pull request: “This is exactly what we needed, each tool does one thing perfectly!”
- Featured in MCP showcase: “Example of thoughtful MCP server design that LLMs love”
- A community fork expanded it following the same easy-to-follow agentic intention framework
The key differences? Eduardo designed for agentic interaction. Monica designed for flexibility. Bruno designed for API reuse.
The pitfalls of unclear intention
After seeing Eduardo, Monica, and Bruno’s six-month outcomes, you might be ready to craft your own clear intention for your MCP servers and tools using The agentic intention framework.
But if you're still unconvinced that unclear intentions create real problems, here are specific pitfalls across three critical categories. These aren't theoretical concerns; they're inevitable challenges for LLMs.
1) LLM interaction pitfalls
a) The tool granularity trap
Without clear intention, developers swing between extremes:
Too granular (Bruno): 50+ atomic tools force LLMs to orchestrate complex call sequences for simple requests.
Too broad (Monica): Ambiguous mega-tools leave LLMs guessing about parameters, outputs, and chaining behavior.
Just right (Eduardo): Each tool serves one specific need with clear boundaries.
b) The discovery paradox
The more flexible the tools, the less discoverable they become:
- Monica’s “process any text” → When would an LLM choose this?
- Bruno’s “get any metric” → Which of 50 tools to call?
- Eduardo’s “find quality issues” → Clear purpose, obvious choice
The paradox: trying to be useful for everything makes you useful for nothing specific.
c) The composition breakdown
Unclear intention sabotages tool composition, imo MCP’s most powerful feature:
- Monica: Unpredictable outputs (enhance → ??? → format → ???)
- Bruno: Excessive orchestration (4 calls for what should be one)
- Eduardo: Natural chaining (structured outputs enable parallelization)
Good composition requires:
- Predictable outputs that become inputs
- Independence (tool B doesn’t break if tool A fails)
- Clear data flow (obvious what chains with what)
2) Technical design pitfalls
a) The context window waste
Every token counts, and unclear designs burn them recklessly:
Discovery: Bruno’s 50+ tool definitions consume 5,000+ tokens before work begins. Monica’s vague descriptions require lengthy explanations. Eduardo’s focused tools need minimal description.
Execution: Bruno needs 20 calls × 200 tokens = 4,000 tokens of orchestration. Monica multiplies tokens through failed attempts and retries. Eduardo uses single-purpose calls with predictable results.
Results: Bruno accumulates 20 small results. Monica requires explanation of ambiguous outputs. Eduardo’s structured results speak for themselves.
b) The hidden dependencies trap
Unclear intention creates non-obvious coupling:
- Monica: Runtime registration (RegisterExtension before Process) creates fragile dependencies
- Bruno: Each call depends on previous results, filling context with intermediate state
- Eduardo: Every tool is self-contained and independently callable
The real cost isn't the dependencies themselves; it's forcing LLMs to become orchestration engines instead of problem solvers, burning tokens on state management rather than solutions.
3) Evolution pitfalls
The feature creep spiral
Without clear intention, every user feature request seems reasonable:
Monica’s spiral:
- "Can Process handle JSON?" → Add JSON mode
- "What about YAML?" → Add YAML mode
- “Can it validate too?” → Add validation modes
- “Transform between formats?” → More modes
Result: 47 modes, none work reliably
Bruno’s spiral:
- “We need WordCount” → Add endpoint
- “Also need CharCount” → Add endpoint
- “And LineCount” → Add endpoint
Result: 50+ endpoints, terrible UX
Eduardo’s triage:
- “Can it fix the issues it finds?” → No, that’s a different intention
- "Add markdown-to-HTML conversion?" → No, that's transformation, not quality analysis
- "Add auto-translation?" → No, that's content generation, not quality checking
- "Add plagiarism detection?" → No, that's content verification, not documentation quality
- "Add SEO optimization?" → No, that's marketing, not documentation quality
Result: Focused tools that excel at their purpose
Upfront, clear intention is our defense against the feature creep spiral. It gives us clarity and permission to say “that’s a great idea for a different MCP server.”
Defining tighter boundaries using intention
The agentic intention framework can be used as a daily decision-making filter. If every choice we face flows through this framework, we will succeed in keeping our MCP servers focused and effective.
Scope boundaries
Before writing any code, our intention helps us define what belongs in our MCP server and what doesn’t.
The Goldilocks test for server scope
Too broad: “Help with text processing”
- Can’t list 10 specific tools without getting vague
- LLMs can’t determine when to use it
Too narrow: “Validate Python 3.11 async functions in .md files between 1-5KB”
- Too specific to last 6 months
- Not enough tools to justify a server
Just right: “Help developers validate code examples in their documentation”
- Clear tool ideas: ValidateCode, CheckImports, VerifyOutput
- Natural boundaries between validation and fixing
- Room to grow with more languages
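Sketched as signatures (hypothetical, in the same doc-comment style used throughout this article), those tool ideas might look like:
// ValidateCode checks that code examples in a markdown document compile or run cleanly.
// Returns a ValidationReport listing each example with its status and any errors.
func ValidateCode(path string, language string) (ValidationReport, error)

// CheckImports verifies that imports used in code examples actually resolve.
// Returns an ImportReport of unresolved or unused imports per example.
func CheckImports(path string) (ImportReport, error)

// VerifyOutput compares documented example output against the output actually produced.
// Returns an OutputReport with any mismatches and their locations.
func VerifyOutput(path string) (OutputReport, error)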
One server or many?
Signs we need multiple servers:
- Different user personas (developers vs. marketing)
- Conflicting constraints (read vs. write)
- No natural workflow between tool groups
- Different core purposes
Signs we need one server:
- Tools naturally chain: Parse → Analyze → Report
- Same user persona throughout
- Shared constraints (all read-only, same file types)
- Combined value exceeds individual tools
Example: Eduardo’s decision
Eduardo considered adding documentation fixing to DocQualityAdvisor:
Analyze tools → Fix tools:
- WHO: Developers → Developers ✓
- WHAT: Find issues → Fix issues ✗
- CONSTRAINTS: Read-only → Write ✗
- WHY: Understand quality → Improve quality ✗
Three mismatches = separate server. DocQualityAdvisor finds problems, DocFixer solves them. They chain beautifully but maintain clear boundaries.
Scoping features
When requests arrive, we can also run them through the filter:
Request: “Can DocQualityAdvisor check Python code examples?”
- WHO: ✓ Developers writing docs
- WHAT: ✓ Quality includes working examples
- CONSTRAINTS: ✓ Still read-only analysis
- WHY: ✓ Reduces documentation issues
Decision: Yes, add ValidateCodeExamples
Implementation boundaries
Once we know our scope, intention also guides how we structure our tools.
Tool granularity decisions
// Intention: Help SREs debug production issues quickly and safely
// Clear boundaries from intention
func FindErrorPatterns(logs LogQuery) ([]ErrorPattern, error)
// Serves WHO (SREs), enables WHAT (find issues),
// respects CONSTRAINTS (read-only), advances WHY (faster identification)
// Would violate intention
func AutoFixErrors(pattern ErrorPattern) error
// Different WHAT (fixing vs debugging), violates CONSTRAINTS (modifies production)
Parameter design choices
Choice: Return all metrics in one call vs separate tools?
- One call serves “quick overview” (aligns with WHY: fast understanding)
- Separate tools serve “detailed analysis” (aligns with WHAT: thorough checking)
Decision: Both: GetOverview for quick checks, individual tools for deep dives
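As a sketch, that decision could translate into something like this (GetOverview and DocumentOverview are hypothetical; the individual tools are DocQualityAdvisor's):
// DocumentOverview is a compact summary for the "quick overview" path.
type DocumentOverview struct {
	QualityScore    float64
	BrokenLinkCount int
	ReadingLevel    string
}

// GetOverview returns all key metrics in one call for fast understanding.
func GetOverview(path string) (DocumentOverview, error)

// The focused tools remain available for thorough, detailed analysis.
func AnalyzeDocumentationQuality(path string) (QualityReport, error)
func CheckLinkValidity(path string, checkExternal bool) (LinkReport, error)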
Communication boundaries
Clear communication helps LLMs understand and use our tools effectively.
Writing tool descriptions
// Weak: Generic description
"Analyzes text for various metrics"
// Strong: Intention-aligned description
"Analyzes API documentation to identify quality issues that frustrate developers,
including broken examples, missing parameters, and unclear descriptions"
// - Clear WHO: developers reading API docs
// - Clear WHAT: quality issues
// - Clear WHY: reduce frustration
Error message as guide
// Weak: Technical error
"Error: Invalid parameter type"
// Strong: Intention-guiding error
"Error: Code validation requires a markdown file with code blocks.
Supported languages: Python, JavaScript, Go"
// Reinforces WHAT: validating code in documentation
// Clarifies CONSTRAINTS: specific file types
Composition patterns that work
One of MCP’s most powerful features is enabling LLMs to combine tools into solutions. In this section I want to give you ideas for how to craft effective composition patterns.
Three principles for composable tools
1. Predictable contracts
Each tool has clear inputs and outputs. When AnalyzeDocumentationQuality returns a QualityReport, the LLM knows exactly what fields it contains and can confidently access .Issues or .Score.
2. Independent operation
CheckLinkValidity doesn't need AnalyzeDocumentationQuality to run first. Each tool is self-contained, taking a document path and returning complete results.
3. Complementary purposes
Each tool provides a different lens on the same problem space. Quality analysis, link checking, and readability metrics all contribute to understanding documentation health without overlapping.
Sequential patterns
Sequential composition creates workflows where each tool’s output enhances the next step.
Pattern: Progressive refinement
AnalyzeDocumentationQuality(doc)
→ identifies missing sections
→ CheckLinkValidity(doc, focus_on_sections=identified_sections)
→ checks links in problem areas first
→ GenerateImprovementReport(quality_results, link_results)
Pattern: Filter and focus
ExtractDocumentStructure(doc)
→ identifies all code blocks
→ ValidateCodeExamples(doc, languages=found_languages)
→ validates only relevant languages
→ CalculateCodeCoverage(validation_results)
Each step narrows focus based on previous discoveries, allowing LLMs to explain their reasoning clearly.
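As a rough Go sketch of the first pattern's data flow (GenerateImprovementReport and ImprovementReport are hypothetical; in practice the LLM typically performs this chaining one tool call at a time):
// Progressive refinement: earlier results feed the final report.
func progressiveRefinement(path string) (ImprovementReport, error) {
	// Step 1: broad quality analysis identifies problem areas.
	quality, err := AnalyzeDocumentationQuality(path)
	if err != nil {
		return ImprovementReport{}, err
	}
	// Step 2: link validation (internal links only in this sketch).
	links, err := CheckLinkValidity(path, false)
	if err != nil {
		return ImprovementReport{}, err
	}
	// Step 3: combine both results into actionable output.
	return GenerateImprovementReport(quality, links)
}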
Parallel patterns
Parallel composition leverages independence for efficiency.
Pattern: Comprehensive analysis
Parallel:
├── AnalyzeDocumentationQuality(doc)
├── CheckLinkValidity(doc)
├── CalculateReadabilityMetrics(doc)
└── ExtractDocumentStructure(doc)
Then: CombineIntoReport(all_results)
Pattern: Multi-perspective validation
For each code example in parallel:
├── CheckSyntax(example)
├── CheckImports(example)
├── CheckOutput(example)
└── CheckComplexity(example)
No tool depends on another’s output. If one fails, others still provide value, allowing LLMs to optimize for speed without worrying about ordering.
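For illustration, here's a rough Go sketch of the comprehensive-analysis pattern wrapped in a hypothetical aggregate (CombinedReport is made up; an MCP client could just as well issue the parallel tool calls itself). Because the tools are independent, one failure only adds an error and the other results still come back.
// CombinedReport collects independent results; failures don't discard the rest.
type CombinedReport struct {
	Quality   *QualityReport
	Links     *LinkReport
	Structure *DocumentStructure
	Errors    []error
}

// ComprehensiveAnalysis runs independent tools concurrently (assumes "sync" is imported).
// CalculateReadabilityMetrics would follow the same pattern as the three shown here.
func ComprehensiveAnalysis(path string) CombinedReport {
	var (
		out CombinedReport
		wg  sync.WaitGroup
		mu  sync.Mutex
	)
	fail := func(err error) {
		mu.Lock()
		out.Errors = append(out.Errors, err)
		mu.Unlock()
	}
	wg.Add(3)
	go func() {
		defer wg.Done()
		if q, err := AnalyzeDocumentationQuality(path); err != nil {
			fail(err)
		} else {
			out.Quality = &q
		}
	}()
	go func() {
		defer wg.Done()
		if l, err := CheckLinkValidity(path, false); err != nil {
			fail(err)
		} else {
			out.Links = &l
		}
	}()
	go func() {
		defer wg.Done()
		if s, err := ExtractDocumentStructure(path); err != nil {
			fail(err)
		} else {
			out.Structure = &s
		}
	}()
	wg.Wait()
	return out
}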
The role of structured outputs
Structured outputs enable reliable composition. When tools return structured data:
- LLMs know what fields are available
- Results can be filtered intelligently
- Tools can be chained predictably
// Structured output with clear fields for downstream consumption
type QualityReport struct {
Score float64
Issues []Issue
MissingSections []string
}
// Organized output - ready for composition
type LinkReport struct {
BrokenLinks []BrokenLink
BySection map[string][]BrokenLink // Pre-grouped by document section
BySeverity map[string][]BrokenLink // "critical", "warning", "info"
}
Each tool can efficiently access the exact subset of data it needs:
- The MCP client asks: “What critical links need fixing in the API docs?”
- LLM combines: report.BySection["api-docs"] ∩ report.BySeverity["critical"]
- No filtering through hundreds of links needed
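Whether the LLM does that intersection in its head or a downstream tool does it in code, the pre-grouped shape keeps it cheap. A sketch (assuming, hypothetically, that BrokenLink carries a URL field):
// criticalLinksIn returns broken links that are both in the given section and critical.
func criticalLinksIn(report LinkReport, section string) []BrokenLink {
	critical := make(map[string]bool, len(report.BySeverity["critical"]))
	for _, link := range report.BySeverity["critical"] {
		critical[link.URL] = true
	}
	var result []BrokenLink
	for _, link := range report.BySection[section] {
		if critical[link.URL] {
			result = append(result, link)
		}
	}
	return result
}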
Composition best practices
1. Design tools that transform, not just extract
- AnalyzeReadability() ReadabilityReport provides rich data for next steps
- Better than a simple GetWordCount() int
2. Organize outputs for easy filtering
Pre-group data by common use cases (by section, by severity) so downstream tools can efficiently access what they need.
3. Provide both summary and detail
Let LLMs choose the appropriate level for their current task.
type QualityReport struct {
Summary Summary // For quick decisions
Details []Issue // For deep analysis
Suggestions []Suggestion // For next steps
}
4. Use consistent identification
All tools should use the same parameter names and types for common inputs like file paths.
TIP
Useful composition doesn't happen by stacking iterations; it emerges from clear intentions, predictable contracts, and precise thinking about how tools can work together.
When intention meets reality, aka iterations
How do we evolve without losing sight of the intention we started out with? How do we adapt to real needs without becoming Monica’s FlexiServer?
Evolving without losing focus
The key to healthy evolution is treating our intention as a compass, not a cage. It guides direction while allowing for growth. Every successful MCP server evolves, but the ones that thrive do so deliberately.
Signs of healthy evolution:
- New tools make existing ones more valuable
- Original users get more power without more complexity
- Each addition serves the core intention better
- LLMs can still explain the server’s purpose in one sentence
Signs of drift:
- New tools serve different user types
- Original tools feel disconnected from new ones
- We’re adding parameters to make tools do double duty
- LLMs hedge when describing what the server does
The three patterns for evolution: Deepen, Fork, Pivot
1. Deepen (most common)
Intention stays the same, but we serve it better:
// V1: Help developers understand code
func ExplainFunction(name string) (string, error)
// V2: Same intention, deeper capability
func ExplainFunction(name string, detail Level) (Explanation, error)
func VisualizeCallGraph(name string) (GraphData, error)
func ExplainWithExamples(name string) (ExplanationWithCode, error)
Deepening feels natural because:
- Original users get more value
- New tools complement existing ones
- The core intention gets stronger, not diluted
2. Fork (when users pull in new directions)
Users want something adjacent but different:
// Original: DocQualityAdvisor - helps understand docs
// Users want: "Can it fix the broken links it finds?"
// Wrong: Add fixing to DocQualityAdvisor (breaks read-only constraint)
// Right: Create DocFixer as a sibling server
Forking preserves clarity:
- Original server stays focused
- New server has its own clear intention
- They compose beautifully together
- Neither suffers from scope creep
3. Pivot (rare, when intention was wrong)
Only when we discover our original intention missed the mark entirely:
// Started: "Help developers write better commit messages"
// Discovered: They actually needed "Help developers understand what changed"
// Pivot: Refocus on change analysis, not message writing
Pivoting requires honesty:
- Admit the original intention was off
- Define a new, clearer intention
- Potentially rename/rebrand
- Communicate the change clearly
Version evolution example
Let’s follow a server through realistic growth:
Version 1.0: Personal use
Intention: Help me find security issues in my Node.js projects
Tools:
- FindHardcodedSecrets()
- CheckDependencyVulnerabilities()
- IdentifyInsecurePatterns()
Version 2.0: Team adoption (Deepen)
Same intention, expanded for team needs:
- FindHardcodedSecrets(severity Level)
- CheckDependencyVulnerabilities(includeDevDeps bool)
- IdentifyInsecurePatterns(customRules []Rule)
- GenerateSecurityReport() // New: aggregate findings
Version 3.0: Adjacent need emerges (Fork decision)
Users: "Can it automatically update vulnerable dependencies?"
Decision point:
- Updating != Finding (different WHAT)
- Requires write access (different CONSTRAINT)
- Solution: Fork into SecurityScanner + SecurityFixer
Each version deepened value for the original use case without losing focus.
When to say no
The hardest part of evolution might be saying no to good ideas that don’t fit. Here’s some guidance:
Immediate no:
- Violates core constraints (write access for read-only tool)
- Serves different users (enterprise features for personal tool)
- Requires architecture change (real-time for batch tool)
Consider forking when:
- Great idea but different intention
- Would help users but breaks existing patterns
- Valuable but changes core assumptions
Consider deepening when:
- Makes existing tools more powerful
- Serves same users better
- Respects all constraints
- Natural extension of current capabilities
Real example: Eduardo’s evolution decisions
Remember Eduardo’s decision to keep “fixing” separate from “analysis”? Here’s how he applied the same filter to feature requests:
Request: "Add README template generator" → No, generating != analyzing
Request: "Add API example validation" → Yes, code quality IS documentation quality
The filter works consistently: features that serve the core intention get deepened, others get redirected.
Conclusion: trust the process
We’ve seen three developers, three approaches, three outcomes. Monica chased flexibility and created confusion. Bruno mapped existing APIs and created fragmentation. Eduardo started with intention and created clarity.
And the difference really wasn’t skill or effort, it was mindset and process.
The path forward
Building MCP servers with AI in the loop requires us to flip our hard-earned instincts. Before, we minimized constraints to stay adaptive, favoring continuous flexibility through iteration. Now we maximize constraints upfront to maximize effectiveness.
The process needed is relatively simple, but requires precision:
- Start with clear intention: before writing code, write the WHO, WHAT, CONSTRAINTS, and WHY
- Let intention guide every decision: server and tool boundaries, feature requests, error messages
- Compose through clarity: predictable tools with structured outputs work together naturally
- Evolve deliberately: deepen to serve better, fork to serve different, pivot when necessary
Your turn
The next time you sit down to build an MCP server (or any agentic system):
- Write your intention statement before your next commit
- Put it in your README as the first thing users see
- Use it to evaluate every decision: “Does this serve our intention?”
- Let it guide your evolution: deepen rather than broaden
Building an MCP server? Start with WHY, then WHO, then WHAT. The HOW will become obvious.