MCP, Agents, Skills, Subagents: The Definitive Guide to AI's New Building Blocks
Everyone's building with agents, MCP servers, skills, and subagents. Almost nobody can explain when to use which. This is the guide that fixes that — with architecture patterns, production code, and a decision framework you can apply today.
You've heard the terms. You've seen the Twitter threads. You've watched three conference talks and read five blog posts, and you're still not sure when to use an MCP server versus a skill versus a subagent versus just giving your agent a tool.
You are not alone. The AI tooling ecosystem in 2026 has a terminology problem. Four concepts — MCP, Agents, Skills, and Subagents — are used interchangeably by people who should know better. They are not interchangeable. They solve different problems, operate at different layers, and compose in specific ways. Confusing them leads to architectures that are either painfully over-engineered or dangerously under-specified.
This guide is the fix. By the end, you will have a precise mental model for each concept, know exactly when to reach for which, and have production-ready patterns for combining them. No hand-waving. No "it depends." Concrete architecture, concrete code, concrete decisions.
The Four Building Blocks: A 60-Second Overview
Before we go deep, here is the entire mental model at a glance.
Here's the one-liner for each:
| Concept | What It Is | Analogy |
|---|---|---|
| MCP Server | A standardized interface that connects AI to external tools and data | A USB-C port on a device |
| Agent | An LLM that can reason, plan, and take actions in a loop | A senior engineer working on a task |
| Skill | A reusable bundle of instructions, knowledge, and workflows | A playbook or runbook |
| Subagent | A specialized agent that a parent agent delegates tasks to | A team member with a specific expertise |
If that table is enough for you, great. But the devil is in how these compose, when each is the right choice, and what happens when you pick the wrong one. That's what the rest of this guide is about.
Part 1: MCP — The Universal Connection Layer
Already read our MCP protocol guide?
This section covers MCP from an architectural perspective — how it fits into the agent ecosystem. For the protocol specification, JSON-RPC internals, and building your first server, see our full MCP developer guide.
What MCP Actually Is
Model Context Protocol is a connection standard. It does not reason. It does not plan. It does not decide anything. It is pure plumbing — an open protocol (JSON-RPC 2.0 over stdio or HTTP) that lets any AI application talk to any external system through a unified interface.
An MCP server exposes three types of things:
- Tools — Functions the AI can call (query a database, create a PR, send a message)
- Resources — Data the AI can read (files, records, live system state)
- Prompts — Reusable prompt templates for common workflows
The critical insight: MCP servers are dumb. They don't know what the AI is trying to accomplish. They don't make decisions. They execute what they're told and return results. All intelligence lives in the agent that calls them.
The MCP Ecosystem Today
The numbers tell the story: 97 million monthly SDK downloads, 13,000+ public servers on GitHub, and first-party support in Claude, ChatGPT, VS Code Copilot, Cursor, Windsurf, and Zed. MCP is not experimental. It is infrastructure.
When to Build an MCP Server
Build an MCP server when you want any AI application to be able to interact with your system. The key word is "any." If you only need one specific agent to call one specific API, a direct tool call is simpler. MCP's value is the write-once-use-everywhere guarantee.
Build an MCP server when:
- Multiple AI tools (Claude, Copilot, Cursor) need access to the same system
- You want a clean, standardized interface that survives model changes
- You are exposing internal tools to your organization's AI stack
- You want to publish a public integration others can use
Don't build an MCP server when:
- You have a single agent talking to a single API — use a direct tool definition
- The "tool" is just an LLM prompt — use a skill instead
- You need complex, multi-step reasoning — that's an agent's job, not a server's
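To make the first "don't" concrete, here is a minimal sketch of a direct tool definition, the simpler alternative when one agent talks to one API. The tool name, endpoint URL, and schema below are hypothetical placeholders, not part of any real system:

```python
import urllib.request

# A direct tool definition: no MCP server, no protocol overhead.
# The tool name, endpoint, and schema are hypothetical placeholders.
TOOLS = [{
    "name": "get_ticket",
    "description": "Fetch a single support ticket by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"ticket_id": {"type": "string"}},
        "required": ["ticket_id"],
    },
}]

def execute_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call straight to the backing API."""
    if name == "get_ticket":
        url = f"https://tickets.internal/api/tickets/{arguments['ticket_id']}"
        with urllib.request.urlopen(url) as resp:  # sketch only: add auth, timeouts
            return resp.read().decode()
    raise ValueError(f"Unknown tool: {name}")
```

You pass `TOOLS` directly in the model request and call `execute_tool` yourself. No protocol layer, no server process, no interoperability — and for a single-agent integration, none needed.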
MCP Server Anatomy: What a Real One Looks Like
Here's a production-grade MCP server pattern for an internal ticket system — not a weather demo, but the kind of thing you'd actually build at work:
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent, Resource
import httpx
app = Server("ticket-system")
# Tools — actions the AI can take
@app.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="search_tickets",
description=(
"Search support tickets by status, assignee, or keyword. "
"Returns ticket ID, title, status, and assignee. "
"Use this when users ask about open issues, bug reports, "
"or the status of specific features."
),
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
"status": {
"type": "string",
"enum": ["open", "in_progress", "resolved", "closed"],
"description": "Filter by ticket status"
},
"limit": {
"type": "integer",
"default": 10,
"description": "Max results to return"
}
},
"required": ["query"]
}
),
Tool(
name="create_ticket",
description=(
"Create a new support ticket. Requires title and description. "
"Priority defaults to 'medium'. Only create tickets when the "
"user explicitly asks — never speculatively."
),
inputSchema={
"type": "object",
"properties": {
"title": {"type": "string"},
"description": {"type": "string"},
"priority": {
"type": "string",
"enum": ["low", "medium", "high", "critical"],
"default": "medium"
}
},
"required": ["title", "description"]
}
),
]
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
async with httpx.AsyncClient(base_url="https://tickets.internal") as client:
if name == "search_tickets":
resp = await client.get("/api/tickets/search", params=arguments)
return [TextContent(type="text", text=resp.text)]
elif name == "create_ticket":
resp = await client.post("/api/tickets", json=arguments)
return [TextContent(type="text", text=resp.text)]
raise ValueError(f"Unknown tool: {name}")
# Resources — data the AI can read
@app.list_resources()
async def list_resources() -> list[Resource]:
return [
Resource(
uri="tickets://metrics/summary",
name="Ticket Metrics Summary",
description="Current ticket counts by status and priority",
mimeType="application/json"
)
]
async def main():
async with stdio_server() as (read, write):
await app.run(read, write, app.create_initialization_options())
if __name__ == "__main__":
import asyncio
asyncio.run(main())

Notice the tool descriptions. They are not afterthoughts — they are the primary interface between the LLM and your system. The description for create_ticket explicitly says "only create tickets when the user explicitly asks" because otherwise the model will speculatively create tickets during exploratory conversations. Tool descriptions are prompt engineering. Treat them accordingly.
Part 2: Agents — The Reasoning Layer
An agent is an LLM running in a loop. That's it. The rest is implementation detail.
More precisely: an agent is an LLM that can observe its environment (through tools and context), reason about what to do next, and act (by calling tools or generating output) — repeatedly, until the task is done. The loop is what separates an agent from a one-shot LLM call.
The Agent Loop
Every agent framework — LangGraph, CrewAI, AutoGen, Claude's agent SDK, or your custom code — implements some variation of this loop:
The think-act-observe cycle is not metaphorical. Here's a minimal agent implementation that makes the loop explicit:
import anthropic
import json
client = anthropic.Anthropic()
def run_agent(task: str, tools: list[dict], tool_executor) -> str:
"""A minimal agent loop. Think → Act → Observe → Repeat."""
messages = [{"role": "user", "content": task}]
while True:
# THINK: Let the model reason and decide
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
tools=tools,
messages=messages,
)
# Check if the model wants to use tools
tool_calls = [b for b in response.content if b.type == "tool_use"]
if not tool_calls:
# RESPOND: No tools needed, return the text response
return "".join(
b.text for b in response.content if b.type == "text"
)
# ACT: Execute each tool call
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for tool_call in tool_calls:
# OBSERVE: Get the result
result = tool_executor(tool_call.name, tool_call.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_call.id,
"content": str(result),
})
messages.append({"role": "user", "content": tool_results})
# Loop back to THINK

This is 30 lines. Every agent framework is a variation of this with added features: memory, planning, error recovery, parallel tool execution, guardrails, and observability. But the core is always: think, act, observe, repeat.
What Makes an Agent Different from a Chatbot
A chatbot takes input and produces output — one pass. An agent takes a goal and works toward it across multiple steps. The distinction matters because it changes what you need to design for:
| | Chatbot | Agent |
|---|---|---|
| Turns | Single turn (or simple multi-turn) | Multiple internal turns per task |
| Autonomy | None — user drives every step | High — decides its own next action |
| Failure mode | Wrong answer | Wrong action (potentially destructive) |
| Cost profile | Predictable (1 LLM call) | Variable (3-50+ LLM calls per task) |
| When to use | Q&A, content generation, translation | Code changes, research, multi-step workflows |
Agent Autonomy Is a Spectrum, Not a Switch
Don't confuse "agent" with "fully autonomous AI." In practice, most production agents operate with human-in-the-loop checkpoints: they plan, ask for confirmation on destructive actions, and present results for review. The loop gives them capability. Guardrails give them safety. You need both.
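One way to add that safety layer is to wrap the tool executor so destructive actions pause for human approval before they run. A minimal sketch; the tool names in `DESTRUCTIVE_TOOLS` are illustrative, not from any specific framework:

```python
# A minimal human-in-the-loop guardrail: destructive tools require an
# explicit confirmation callback before they execute.
DESTRUCTIVE_TOOLS = {"delete_branch", "drop_table", "send_email"}

def guarded_executor(tool_executor, confirm):
    """Wrap a tool executor so destructive calls pause for approval.

    `confirm(name, arguments)` returns True only if the human approves.
    """
    def execute(name: str, arguments: dict):
        if name in DESTRUCTIVE_TOOLS and not confirm(name, arguments):
            # Return a result the model can read and recover from.
            return f"Call to '{name}' was declined by the user."
        return tool_executor(name, arguments)
    return execute
```

In an interactive setting, `confirm` might be as simple as `lambda n, a: input(f"Allow {n}? [y/N] ") == "y"`. The important design choice is that the declined call returns a readable result instead of raising, so the agent loop can explain the situation rather than crash.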
Agent Architecture Patterns
Not all agents are created equal. Here are the three patterns you'll see in production, from simplest to most complex:
Pattern 1: Single Agent with Tools
One LLM with direct access to tools. Good for well-scoped tasks with clear boundaries.
Agent → Tool A, Tool B, Tool C → Result
Pattern 2: Router Agent
A lightweight agent that classifies the task and routes to a specialized handler. Good when you have distinct task categories with different tool requirements.
Router Agent → [classify] → Code Agent | Search Agent | Data Agent
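The routing step itself can be sketched in a few lines. In production, `classify` is typically one small, fast LLM call returning a single label; the keyword classifier below is a toy stand-in so the dispatch logic is runnable on its own:

```python
# A router-agent sketch: one cheap classification step, then dispatch
# to a specialized handler.
def route(task: str, classify, handlers: dict) -> str:
    """Classify the task, then hand it to the matching specialist."""
    category = classify(task)          # e.g. "code" | "search" | "data"
    handler = handlers.get(category)
    if handler is None:
        raise ValueError(f"No handler registered for category: {category}")
    return handler(task)

def keyword_classify(task: str) -> str:
    """Toy stand-in classifier. Replace with a small LLM call in production."""
    lowered = task.lower()
    if any(word in lowered for word in ("bug", "refactor", "function", "diff")):
        return "code"
    if any(word in lowered for word in ("find", "search", "look up")):
        return "search"
    return "data"
```

Each handler would itself be an agent (or subagent) with its own scoped tool set, which is exactly what makes this pattern a stepping stone to Pattern 3.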
Pattern 3: Orchestrator with Subagents
A planning agent that breaks work into subtasks and delegates to specialized subagents. Good for complex, multi-faceted tasks. This is where subagents come in — we'll go deep in Part 4.
Orchestrator → [plan] → Subagent A + Subagent B + Subagent C → [synthesize] → Result
Part 3: Skills — The Knowledge Layer
This is the concept most people get wrong. A skill is not a tool. A skill is not an agent. A skill is packaged expertise — a bundle of instructions, domain knowledge, and workflows that teaches an agent how to approach a specific type of task.
The Skill Mental Model
Think of it this way:
- A tool gives an agent the ability to do something (call an API, read a file)
- A skill gives an agent the knowledge of how to do something well (the right approach, the right sequence, the gotchas to avoid)
A tool is a hammer. A skill is knowing which nail to hit, at what angle, and in what order.
What a Skill Looks Like
A skill is typically a directory containing instructions, templates, and optionally scripts. Here's a real-world example — a skill that teaches an agent how to do code reviews:
skills/
└── code-review/
├── skill.md # Instructions and knowledge
├── checklist.md # Review checklist template
└── examples/
├── good-review.md
└── bad-review.md
The skill.md file is the core — it's the knowledge payload that gets injected into the agent's context:
# Code Review Skill
## When to Apply
Use this skill when reviewing pull requests, diffs, or code changes.
## Review Process
1. Read the PR description and linked issues first
2. Check the diff size — if >500 lines, suggest splitting
3. Review in this order: architecture → logic → security → style
4. For each issue found, classify as: blocking | suggestion | nit
## What to Look For
- Security: SQL injection, XSS, hardcoded secrets, auth bypass
- Logic: off-by-one errors, null handling, race conditions
- Performance: N+1 queries, unbounded loops, missing indexes
- Maintainability: unclear naming, missing error context, magic numbers
## What NOT to Do
- Don't nitpick formatting if there's a formatter configured
- Don't suggest refactors unrelated to the PR's purpose
- Don't rubber-stamp — if you can't find issues, look harder
## Output Format
Use this template for each finding:
**[BLOCKING/SUGGESTION/NIT]** `file:line` — Description of the issue
and why it matters, with a concrete suggestion for fixing it.

Skills vs. System Prompts
"Wait," you might say, "isn't a skill just a system prompt?" Conceptually, yes — skills are injected as context. But the distinction matters for three reasons:
1. Portability. A skill is a file (or directory) that any agent can discover and apply. System prompts are hardcoded into a specific agent configuration.
2. Composability. An agent can apply multiple skills simultaneously. A code review agent might apply both the "code-review" skill and a "security-audit" skill.
3. Discoverability. Skills can be listed, searched, and selected dynamically. An agent can look at a task, check available skills, and apply the relevant ones — rather than having every skill baked into a monolithic system prompt that grows forever.
Skills vs. MCP Prompts
MCP has a "prompts" primitive that surfaces reusable prompt templates. The difference:
| | MCP Prompt | Skill |
|---|---|---|
| Scope | Single parameterized template | Full knowledge bundle (instructions + templates + examples) |
| Invocation | User/host triggers explicitly | Agent applies based on task context |
| Where it lives | Inside an MCP server | In a skill directory, discoverable by agents |
| Use case | "Run this specific workflow" | "Apply this expertise to whatever you're doing" |
MCP prompts are buttons. Skills are training.
When to Create a Skill
Create a skill when you find yourself writing the same instructions into prompts repeatedly, or when an agent keeps making the same mistakes because it lacks domain knowledge that isn't in its training data.
Good candidates for skills:
- Team-specific coding conventions
- Deployment procedures and checklists
- Code review standards
- Incident response playbooks
- Data pipeline validation steps
- Documentation templates
Bad candidates for skills:
- Generic knowledge the model already has (how to write Python, what REST is)
- One-off instructions you'll never reuse
- Tool configurations — those belong in MCP server definitions
Part 4: Subagents — The Delegation Layer
A subagent is an agent spawned by another agent to handle a specific subtask. The parent agent acts as a supervisor — it decides what to delegate, to whom, and how to synthesize the results.
Why Subagents Exist
The core problem subagents solve is context window management and specialization. A single agent trying to handle a complex task — say, "review this PR, check for security issues, verify test coverage, and update the changelog" — has to hold all of that context simultaneously. As the context grows, the agent's attention degrades. Quality drops.
Subagents solve this by giving each subtask its own context window, its own tool set, and its own focus.
Subagent Properties
Subagents differ from regular agents in several critical ways:
1. Scoped context. Each subagent starts with a clean context window. The parent sends only the information the subagent needs — not the entire conversation history.
2. Scoped tools. Subagents can have different tool sets than the parent. A security-review subagent gets the SAST scanner; the changelog subagent gets the file editor. This enforces least-privilege structurally.
3. Bounded autonomy. The subagent works on a specific task and returns a result. It doesn't get to redefine the goal or decide to work on something else.
4. Parallel execution. Independent subagents can run simultaneously. This is often the biggest practical benefit — a task that takes 60 seconds sequentially takes 20 seconds with three parallel subagents.
Building a Subagent System
Here's a concrete implementation using Claude's tool use to orchestrate subagents:
import anthropic
import asyncio
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class SubagentTask:
name: str
instruction: str
tools: list[dict]
skill_context: str = ""
@dataclass
class SubagentResult:
name: str
output: str
success: bool
async def run_subagent(task: SubagentTask, tool_executor) -> SubagentResult:
"""Run a single subagent with its own context and tools."""
system = f"You are a specialized agent: {task.name}."
if task.skill_context:
system += f"\n\n{task.skill_context}"
messages = [{"role": "user", "content": task.instruction}]
# Subagent runs its own agent loop
for _ in range(10): # max iterations
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
system=system,
tools=task.tools,
messages=messages,
)
tool_calls = [b for b in response.content if b.type == "tool_use"]
if not tool_calls:
text = "".join(b.text for b in response.content if b.type == "text")
return SubagentResult(name=task.name, output=text, success=True)
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for tc in tool_calls:
result = await tool_executor(tc.name, tc.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": tc.id,
"content": str(result),
})
messages.append({"role": "user", "content": tool_results})
return SubagentResult(
name=task.name,
output="Max iterations reached",
success=False,
)
async def orchestrate(task: str, subtasks: list[SubagentTask], tool_executor):
"""Run subagents in parallel, then synthesize results."""
# Execute all subagents concurrently
results = await asyncio.gather(
*[run_subagent(st, tool_executor) for st in subtasks]
)
# Synthesize results with the orchestrator
synthesis_prompt = f"Original task: {task}\n\n"
for r in results:
status = "completed" if r.success else "FAILED"
synthesis_prompt += f"## {r.name} ({status})\n{r.output}\n\n"
synthesis_prompt += (
"Synthesize these results into a coherent response. "
"Flag any failures or conflicts between subagent findings."
)
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": synthesis_prompt}],
)
return response.content[0].text

Usage:
subtasks = [
SubagentTask(
name="Security Reviewer",
instruction="Review this diff for security vulnerabilities:\n" + diff,
tools=security_tools,
skill_context=load_skill("security-audit"),
),
SubagentTask(
name="Test Analyzer",
instruction="Check test coverage for the changed files:\n" + changed_files,
tools=testing_tools,
skill_context=load_skill("testing-standards"),
),
]
result = await orchestrate("Review PR #42", subtasks, execute_tool)

Subagent Supervision Patterns
How the parent manages subagents matters as much as the subagents themselves. Three patterns dominate:
Fan-Out / Fan-In — Best for independent subtasks. Run all subagents in parallel, collect results, synthesize. Maximum speed, minimum coordination overhead.
Pipeline — Best when each step depends on the previous one's output. Analysis → transformation → validation. Sequential, but each step gets clean, focused context.
Iterative Refinement — Best for quality-critical tasks. The parent spawns a subagent, reviews the output, and sends it back with feedback until the quality bar is met. Slower, but produces higher-quality results.
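The pipeline pattern is worth a sketch, because its core is just threading one stage's output into the next while keeping each stage's context clean. Here each stage is a plain callable standing in for a subagent invocation; the stage names are illustrative:

```python
# A pipeline supervision sketch: each stage receives only the previous
# stage's output as its clean, focused input.
def run_pipeline(initial_input: str, stages: list) -> tuple[str, list]:
    """Run stages sequentially, threading each output into the next.

    Returns the final output plus a per-stage trace for observability.
    """
    current = initial_input
    trace = []
    for name, stage in stages:
        current = stage(current)            # stage sees only what it needs
        trace.append((name, len(current)))  # record output size per stage
    return current, trace
```

In a real system each stage would be a `run_subagent` call like the one above, and the trace would feed your observability layer — which stage produced how much, and where quality dropped.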
Part 5: How They All Work Together
Here's the full picture: a real-world architecture in which all four concepts compose into one production system.
Walk through the flow:
- User gives a high-level task: "Fix the auth bug in ticket PROJ-123"
- Orchestrator agent (with the task-planning skill applied) breaks this into subtasks
- Research subagent uses Jira MCP to read the ticket, GitHub MCP to find related code, then reports findings back to the orchestrator
- Orchestrator reviews the findings, decides on the fix approach
- Coding subagent (with team-conventions skill) reads files via Filesystem MCP, writes the fix
- Testing subagent (with testing-standards skill) runs tests via Test Runner MCP
- Orchestrator creates a PR via GitHub MCP, updates the ticket via Jira MCP, returns the PR link to the user
Each layer does what it's best at:
- MCP handles the connections (standardized, reusable across any agent)
- Skills provide the knowledge (team conventions, testing standards)
- Subagents provide the focused execution (clean context, scoped tools)
- The orchestrator agent provides the reasoning (planning, synthesis, quality control)
Part 6: The Decision Framework
This is the section you'll bookmark. When you're building an AI-powered system and need to decide which building block to use, work through the decision matrix and complexity ladder below.
The Decision Matrix
For quick reference, here is every combination and when it applies:
| Scenario | MCP | Agent | Skill | Subagent | Example |
|---|---|---|---|---|---|
| Expose your DB to any AI tool | X | | | | PostgreSQL MCP server |
| Answer user questions with tool access | | X | | | Customer support chatbot |
| Review code following team standards | | X | X | | PR review with style guide skill |
| Complex multi-step research task | | X | | X | "Analyze competitor pricing across 5 sources" |
| Full development workflow | X | X | X | X | "Fix bug PROJ-123 and open a PR" |
| Reusable API for AI ecosystem | X | | | | Stripe MCP server for billing ops |
| Simple one-off API call | | | | | Direct function call — no framework needed |
The Complexity Ladder
Start at the bottom. Only climb when you have a concrete reason.
Level 5: Orchestrator + Subagents + Skills + MCP ← Multi-faceted autonomous work
Level 4: Agent + Skills + MCP ← Expert single-agent tasks
Level 3: Agent + MCP (or direct tools) ← Simple agentic tasks
Level 2: MCP Server (no agent) ← Standardized tool exposure
Level 1: Direct API call ← One tool, one use
The Over-Engineering Trap
The single most common mistake in AI system design is reaching for Level 5 when Level 2 would suffice. Every layer you add increases latency, cost, and failure surface. A subagent architecture for a task that one agent with two tools can handle is not sophisticated — it is waste. Start simple. Add layers only when the simpler approach demonstrably fails.
Part 7: Anti-Patterns — What Not to Do
These are mistakes we see repeatedly in production systems. Each one seems reasonable until you've lived with the consequences.
Anti-Pattern 1: The God Agent
What it looks like: One agent with 40 tools, a 10,000-token system prompt, and responsibility for everything from code review to deployment to Slack notifications.
Why it fails: LLMs degrade with too many tools. At 40+ tools, the model spends more tokens deciding which tool to use than actually solving the problem. Tool selection accuracy drops. Latency increases. Cost explodes.
Fix: Split into a router agent with specialized subagents. Each subagent gets 3-8 tools relevant to its domain.
Anti-Pattern 2: MCP Everything
What it looks like: Building an MCP server for every internal function, including ones that only one agent will ever use.
Why it fails: MCP's value is interoperability — write once, use everywhere. If "everywhere" means "one agent in one application," the MCP abstraction adds complexity without adding value. You're paying the protocol overhead for no benefit.
Fix: Use direct tool definitions for single-agent, single-app integrations. Build MCP servers only when multiple clients need access.
Anti-Pattern 3: Skills as Entire Codebases
What it looks like: A skill that contains 50 pages of documentation, every edge case, every historical decision, and the entire API reference for a framework.
Why it fails: Skills are injected into context. A 50-page skill consumes your context window and dilutes the model's attention. The model can't find the relevant instruction buried in the noise.
Fix: Keep skills focused and concise — under 2,000 tokens. If you need more, split into multiple skills that the agent can selectively apply.
Anti-Pattern 4: Subagents Without Synthesis
What it looks like: An orchestrator that spawns three subagents, collects their outputs, and concatenates them into a response.
Why it fails: Subagent outputs often conflict, overlap, or assume different contexts. Concatenation produces incoherent results. The orchestrator's job is not to collect — it is to synthesize.
Fix: The orchestrator must reason about subagent outputs: resolve conflicts, eliminate redundancy, identify gaps, and produce a unified result. This is the most important step in the orchestration loop.
Anti-Pattern 5: No Error Boundaries
What it looks like: A subagent fails (tool error, hallucination, infinite loop), and the failure cascades to the entire system.
Why it fails: Subagents are semi-autonomous. They will fail. If you don't handle failure at the subagent level, you don't have a system — you have a hope.
Fix: Every subagent should have a max iteration limit, a timeout, and a fallback. The orchestrator should handle subagent failure gracefully — retry, skip, or ask the user for guidance.
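That fix can be wrapped into one small boundary around each subagent call: a timeout, a bounded retry count, and a fallback result so a single failure never cascades. A minimal sketch, assuming the subagent is an async callable like `run_subagent` from Part 4:

```python
import asyncio

async def run_with_boundary(coro_factory, timeout_s: float = 60.0,
                            retries: int = 1, fallback: str = "unavailable"):
    """Run a subagent coroutine with a timeout, limited retries, and a
    fallback result. `coro_factory` builds a fresh coroutine per attempt.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
        except Exception:
            # Tool error, timeout, or bad state: contain it here so the
            # orchestrator can skip, retry, or ask the user for guidance.
            if attempt == retries:
                return fallback
    return fallback
```

Note that `coro_factory` is a factory, not a coroutine — a coroutine can only be awaited once, so each retry needs a fresh one. The orchestrator then treats the fallback as a first-class result ("this subagent's findings are unavailable") during synthesis.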
Part 8: Production Checklist
Before you ship an agent system, walk through this checklist:
MCP Servers
- Tool descriptions are specific, unambiguous, and include negative instructions (what NOT to do)
- Input schemas have proper validation with enums, bounds, and required fields
- Error messages are descriptive enough for the model to self-correct
- Sensitive operations require explicit confirmation (not just the model deciding to call them)
- Resources return focused data, not unbounded dumps
- Rate limiting is implemented for expensive operations
Agent
- Max iteration limit is set (prevent infinite loops)
- Cost ceiling is enforced (stop after $X in API calls)
- Destructive actions require human approval
- Conversation history is pruned to stay within context limits
- Agent outputs are logged for debugging and evaluation
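The iteration and cost ceilings from this checklist can be enforced with a small budget object charged once per LLM call inside the agent loop. A minimal sketch; the per-token prices are made-up placeholders, so substitute your model's actual rates:

```python
# Enforce max iterations and a cost ceiling on the agent loop.
PRICE_PER_INPUT_TOKEN = 3e-6    # placeholder rate, not a real price
PRICE_PER_OUTPUT_TOKEN = 15e-6  # placeholder rate, not a real price

class BudgetExceeded(RuntimeError):
    """Raised when the agent crosses an iteration or cost ceiling."""

class AgentBudget:
    def __init__(self, max_iterations: int = 25, max_cost_usd: float = 2.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one LLM call; raise once either ceiling is crossed."""
        self.iterations += 1
        self.cost_usd += (input_tokens * PRICE_PER_INPUT_TOKEN
                          + output_tokens * PRICE_PER_OUTPUT_TOKEN)
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"Exceeded {self.max_iterations} iterations")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"Exceeded ${self.max_cost_usd:.2f} budget")
```

In the minimal agent loop from Part 2, you'd call `budget.charge(response.usage.input_tokens, response.usage.output_tokens)` right after each `client.messages.create` call and catch `BudgetExceeded` at the top level.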
Skills
- Each skill is under 2,000 tokens
- Skills are tested against real tasks (not just reviewed by humans)
- Skills include negative instructions (what to avoid)
- Skills are versioned and can be updated without redeploying
Subagents
- Each subagent has a clear, bounded task description
- Each subagent has a scoped tool set (least privilege)
- Timeout and max iterations are set per subagent
- The orchestrator synthesizes results (not concatenates)
- Subagent failures are handled gracefully
- Independent subagents run in parallel for performance
Observability
- Every tool call is logged with inputs, outputs, and latency
- Every LLM call is logged with token counts and cost
- Subagent spawning and completion events are tracked
- End-to-end task duration and cost are measurable
- Error rates are tracked per tool, per subagent, and per task type
The One-Page Reference
Print this. Pin it to your wall. Send it to your team.
┌─────────────────────────────────────────────────────────────┐
│ AI BUILDING BLOCKS │
│ Quick Reference │
├─────────────────────────────────────────────────────────────┤
│ │
│ MCP SERVER │
│ ┄┄┄┄┄┄┄┄┄┄ │
│ What: Standardized connection to external system │
│ When: Multiple AI clients need the same integration │
│ Not: A decision-maker (it's plumbing, not brains) │
│ Key: Tool descriptions ARE your interface — write well │
│ │
│ AGENT │
│ ┄┄┄┄┄ │
│ What: LLM in a think→act→observe loop │
│ When: Task requires planning, reasoning, multi-step work │
│ Not: A one-shot API call (that's just an LLM call) │
│ Key: Set max iterations + cost limits ALWAYS │
│ │
│ SKILL │
│ ┄┄┄┄┄ │
│ What: Reusable bundle of expertise and workflows │
│ When: Agent needs domain knowledge not in training data │
│ Not: A tool (skills teach, tools act) │
│ Key: Keep under 2000 tokens — focused beats exhaustive │
│ │
│ SUBAGENT │
│ ┄┄┄┄┄┄┄┄ │
│ What: Specialized agent handling a delegated subtask │
│ When: Task is multi-faceted with independent subtasks │
│ Not: Always necessary (single agent + tools often enough) │
│ Key: Scope tools per subagent — structural least-privilege│
│ │
├─────────────────────────────────────────────────────────────┤
│ COMPOSITION RULE: Start at the bottom, add layers only │
│ when the simpler approach demonstrably fails. │
│ │
│ Direct call → MCP → Agent → Agent+Skill → Subagents │
│ simple ─────────────────────────────── complex │
└─────────────────────────────────────────────────────────────┘
The Bigger Picture
MCP, agents, skills, and subagents are not competing approaches. They are layers in a stack, each solving a different problem:
- MCP standardizes the connection between AI and the world
- Agents provide the reasoning that decides what to do
- Skills encode the expertise that guides how to do it well
- Subagents enable the delegation that scales complex work
The developers who build great AI systems in 2026 are not the ones who use the most sophisticated architecture. They are the ones who use the right architecture — the simplest one that solves their actual problem. Start with a direct tool call. Add MCP when you need interoperability. Add an agent when you need reasoning. Add skills when you need expertise. Add subagents when you need delegation.
And when someone tells you they need an "agentic MCP-powered multi-subagent system with dynamic skill loading" for a feature that sends Slack notifications — send them this article.
Sources & Further Reading
- Model Context Protocol Specification
- MCP Architecture
- Skills Explained — Claude Blog
- Anthropic Agent SDK Documentation
- MCP, Skills, and Agents — David Cramer
- Best Practices for AI Agents, Subagents, Skills & MCP — Foojay
- MCP Reference Servers
- MCP Inspector
- Model Context Protocol — Wikipedia
- Everything About MCP in 2026 — WorkOS