MCP, Agents, Skills, Subagents: The Definitive Guide to AI's New Building Blocks
Everyone's building with agents, MCP servers, skills, and subagents. Almost nobody can explain when to use which. This is the guide that fixes that — with architecture patterns, production code, and a decision framework you can apply today.
You've heard the terms. You've seen the Twitter threads. You've watched three conference talks and read five blog posts, and you're still not sure when to use an MCP server versus a skill versus a subagent versus just giving your agent a tool.
You are not alone. The AI tooling ecosystem in 2026 has a terminology problem. Four concepts — MCP, Agents, Skills, and Subagents — are used interchangeably by people who should know better. They are not interchangeable. They solve different problems, operate at different layers, and compose in specific ways. Confusing them leads to architectures that are either painfully over-engineered or dangerously under-specified.
This guide is the fix. By the end, you will have a precise mental model for each concept, know exactly when to reach for which, and have production-ready patterns for combining them. No hand-waving. No "it depends." Concrete architecture, concrete code, concrete decisions.
The Four Building Blocks: A 60-Second Overview
Before we go deep, here is the entire mental model at a glance.
Here's the one-liner for each:
| Concept | What It Is | Analogy |
|---|---|---|
| MCP Server | A standardized interface that connects AI to external tools and data | A USB-C port on a device |
| Agent | An LLM that can reason, plan, and take actions in a loop | A senior engineer working on a task |
| Skill | A reusable bundle of instructions, knowledge, and workflows | A playbook or runbook |
| Subagent | A specialized agent that a parent agent delegates tasks to | A team member with a specific expertise |
If that table is enough for you, great. But the devil is in how these compose, when each is the right choice, and what happens when you pick the wrong one. That's what the rest of this guide is about.
Part 1: MCP — The Universal Connection Layer
Already read our MCP protocol guide?
This section covers MCP from an architectural perspective — how it fits into the agent ecosystem. For the protocol specification, JSON-RPC internals, and building your first server, see our full MCP developer guide.
What MCP Actually Is
Model Context Protocol is a connection standard. It does not reason. It does not plan. It does not decide anything. It is pure plumbing — an open protocol (JSON-RPC 2.0 over stdio or HTTP) that lets any AI application talk to any external system through a unified interface.
An MCP server exposes three types of things:
- Tools — Functions the AI can call (query a database, create a PR, send a message)
- Resources — Data the AI can read (files, records, live system state)
- Prompts — Reusable prompt templates for common workflows
The critical insight: MCP servers are dumb. They don't know what the AI is trying to accomplish. They don't make decisions. They execute what they're told and return results. All intelligence lives in the agent that calls them.
The MCP Ecosystem Today
The numbers tell the story: 97 million monthly SDK downloads, 13,000+ public servers on GitHub, and first-party support in Claude, ChatGPT, VS Code Copilot, Cursor, Windsurf, and Zed. MCP is not experimental. It is infrastructure.
When to Build an MCP Server
Build an MCP server when you want any AI application to be able to interact with your system. The key word is "any." If you only need one specific agent to call one specific API, a direct tool call is simpler. MCP's value is the write-once-use-everywhere guarantee.
Build an MCP server when:
- Multiple AI tools (Claude, Copilot, Cursor) need access to the same system
- You want a clean, standardized interface that survives model changes
- You are exposing internal tools to your organization's AI stack
- You want to publish a public integration others can use
Don't build an MCP server when:
- You have a single agent talking to a single API — use a direct tool definition
- The "tool" is just an LLM prompt — use a skill instead
- You need complex, multi-step reasoning — that's an agent's job, not a server's
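To make the first "don't" concrete, here is a minimal sketch of a direct tool definition, the simpler alternative when one agent talks to one API. The tool name, endpoint URL, and schema below are hypothetical placeholders, not part of any real system:

```python
import urllib.request

# A direct tool definition: no MCP server, no protocol overhead.
# The tool name, endpoint, and schema are hypothetical placeholders.
TOOLS = [{
    "name": "get_ticket",
    "description": "Fetch a single support ticket by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"ticket_id": {"type": "string"}},
        "required": ["ticket_id"],
    },
}]

def execute_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call straight to the backing API."""
    if name == "get_ticket":
        url = f"https://tickets.internal/api/tickets/{arguments['ticket_id']}"
        with urllib.request.urlopen(url) as resp:  # sketch only: add auth, timeouts
            return resp.read().decode()
    raise ValueError(f"Unknown tool: {name}")
```

You pass `TOOLS` directly in the model request and call `execute_tool` yourself. No protocol layer, no server process, no interoperability — and for a single-agent integration, none needed.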
MCP Server Anatomy: What a Real One Looks Like
Here's a production-grade MCP server pattern for an internal ticket system — not a weather demo, but the kind of thing you'd actually build at work:
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent, Resource
import httpx
app = Server("ticket-system")
# Tools — actions the AI can take
@app.list_tools()
async def list_tools() -> list[Tool]:
return [
Tool(
name="search_tickets",
description=(
"Search support tickets by status, assignee, or keyword. "
"Returns ticket ID, title, status, and assignee. "
"Use this when users ask about open issues, bug reports, "
"or the status of specific features."
),
inputSchema={
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
"status": {
"type": "string",
"enum": ["open", "in_progress", "resolved", "closed"],
"description": "Filter by ticket status"
},
"limit": {
"type": "integer",
"default": 10,
"description": "Max results to return"
}
},
"required": ["query"]
}
),
Tool(
name="create_ticket",
description=(
"Create a new support ticket. Requires title and description. "
"Priority defaults to 'medium'. Only create tickets when the "
"user explicitly asks — never speculatively."
),
inputSchema={
"type": "object",
"properties": {
"title": {"type": "string"},
"description": {"type": "string"},
"priority": {
"type": "string",
"enum": ["low", "medium", "high", "critical"],
"default": "medium"
}
},
"required": ["title", "description"]
}
),
]
@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
async with httpx.AsyncClient(base_url="https://tickets.internal") as client:
if name == "search_tickets":
resp = await client.get("/api/tickets/search", params=arguments)
return [TextContent(type="text", text=resp.text)]
elif name == "create_ticket":
resp = await client.post("/api/tickets", json=arguments)
return [TextContent(type="text", text=resp.text)]
raise ValueError(f"Unknown tool: {name}")
# Resources — data the AI can read
@app.list_resources()
async def list_resources() -> list[Resource]:
return [
Resource(
uri="tickets://metrics/summary",
name="Ticket Metrics Summary",
description="Current ticket counts by status and priority",
mimeType="application/json"
)
]
async def main():
async with stdio_server() as (read, write):
await app.run(read, write, app.create_initialization_options())
if __name__ == "__main__":
import asyncio
asyncio.run(main())

Notice the tool descriptions. They are not afterthoughts — they are the primary interface between the LLM and your system. The description for create_ticket explicitly says "only create tickets when the user explicitly asks" because otherwise the model will speculatively create tickets during exploratory conversations. Tool descriptions are prompt engineering. Treat them accordingly.
Part 2: Agents — The Reasoning Layer
An agent is an LLM running in a loop. That's it. The rest is implementation detail.
More precisely: an agent is an LLM that can observe its environment (through tools and context), reason about what to do next, and act (by calling tools or generating output) — repeatedly, until the task is done. The loop is what separates an agent from a one-shot LLM call.
The Agent Loop
Every agent framework — LangGraph, CrewAI, AutoGen, Claude's agent SDK, or your custom code — implements some variation of this loop:
The think-act-observe cycle is not metaphorical. Here's a minimal agent implementation that makes the loop explicit:
import anthropic
import json
client = anthropic.Anthropic()
def run_agent(task: str, tools: list[dict], tool_executor) -> str:
"""A minimal agent loop. Think → Act → Observe → Repeat."""
messages = [{"role": "user", "content": task}]
while True:
# THINK: Let the model reason and decide
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
tools=tools,
messages=messages,
)
# Check if the model wants to use tools
tool_calls = [b for b in response.content if b.type == "tool_use"]
if not tool_calls:
# RESPOND: No tools needed, return the text response
return "".join(
b.text for b in response.content if b.type == "text"
)
# ACT: Execute each tool call
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for tool_call in tool_calls:
# OBSERVE: Get the result
result = tool_executor(tool_call.name, tool_call.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_call.id,
"content": str(result),
})
messages.append({"role": "user", "content": tool_results})
# Loop back to THINK

This is 30 lines. Every agent framework is a variation of this with added features: memory, planning, error recovery, parallel tool execution, guardrails, and observability. But the core is always: think, act, observe, repeat.
What Makes an Agent Different from a Chatbot
A chatbot takes input and produces output — one pass. An agent takes a goal and works toward it across multiple steps. The distinction matters because it changes what you need to design for:
| | Chatbot | Agent |
|---|---|---|
| Turns | Single turn (or simple multi-turn) | Multiple internal turns per task |
| Autonomy | None — user drives every step | High — decides its own next action |
| Failure mode | Wrong answer | Wrong action (potentially destructive) |
| Cost profile | Predictable (1 LLM call) | Variable (3-50+ LLM calls per task) |
| When to use | Q&A, content generation, translation | Code changes, research, multi-step workflows |
Agent Autonomy Is a Spectrum, Not a Switch
Don't confuse "agent" with "fully autonomous AI." In practice, most production agents operate with human-in-the-loop checkpoints: they plan, ask for confirmation on destructive actions, and present results for review. The loop gives them capability. Guardrails give them safety. You need both.
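One way to add that safety layer is to wrap the tool executor so destructive actions pause for human approval before they run. A minimal sketch; the tool names in `DESTRUCTIVE_TOOLS` are illustrative, not from any specific framework:

```python
# A minimal human-in-the-loop guardrail: destructive tools require an
# explicit confirmation callback before they execute.
DESTRUCTIVE_TOOLS = {"delete_branch", "drop_table", "send_email"}

def guarded_executor(tool_executor, confirm):
    """Wrap a tool executor so destructive calls pause for approval.

    `confirm(name, arguments)` returns True only if the human approves.
    """
    def execute(name: str, arguments: dict):
        if name in DESTRUCTIVE_TOOLS and not confirm(name, arguments):
            # Return a result the model can read and recover from.
            return f"Call to '{name}' was declined by the user."
        return tool_executor(name, arguments)
    return execute
```

In an interactive setting, `confirm` might be as simple as `lambda n, a: input(f"Allow {n}? [y/N] ") == "y"`. The important design choice is that the declined call returns a readable result instead of raising, so the agent loop can explain the situation rather than crash.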
Agent Architecture Patterns
Not all agents are created equal. Here are the three patterns you'll see in production, from simplest to most complex:
Pattern 1: Single Agent with Tools
One LLM with direct access to tools. Good for well-scoped tasks with clear boundaries.
Agent → Tool A, Tool B, Tool C → Result
Pattern 2: Router Agent
A lightweight agent that classifies the task and routes to a specialized handler. Good when you have distinct task categories with different tool requirements.
Router Agent → [classify] → Code Agent | Search Agent | Data Agent
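The routing step itself can be sketched in a few lines. In production, `classify` is typically one small, fast LLM call returning a single label; the keyword classifier below is a toy stand-in so the dispatch logic is runnable on its own:

```python
# A router-agent sketch: one cheap classification step, then dispatch
# to a specialized handler.
def route(task: str, classify, handlers: dict) -> str:
    """Classify the task, then hand it to the matching specialist."""
    category = classify(task)          # e.g. "code" | "search" | "data"
    handler = handlers.get(category)
    if handler is None:
        raise ValueError(f"No handler registered for category: {category}")
    return handler(task)

def keyword_classify(task: str) -> str:
    """Toy stand-in classifier. Replace with a small LLM call in production."""
    lowered = task.lower()
    if any(word in lowered for word in ("bug", "refactor", "function", "diff")):
        return "code"
    if any(word in lowered for word in ("find", "search", "look up")):
        return "search"
    return "data"
```

Each handler would itself be an agent (or subagent) with its own scoped tool set, which is exactly what makes this pattern a stepping stone to Pattern 3.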
Pattern 3: Orchestrator with Subagents
A planning agent that breaks work into subtasks and delegates to specialized subagents. Good for complex, multi-faceted tasks. This is where subagents come in — we'll go deep in Part 4.
Orchestrator → [plan] → Subagent A + Subagent B + Subagent C → [synthesize] → Result
Part 3: Skills — The Knowledge Layer
This is the concept most people get wrong. A skill is not a tool. A skill is not an agent. A skill is packaged expertise — a bundle of instructions, domain knowledge, and workflows that teaches an agent how to approach a specific type of task.
The Skill Mental Model
Think of it this way:
- A tool gives an agent the ability to do something (call an API, read a file)
- A skill gives an agent the knowledge of how to do something well (the right approach, the right sequence, the gotchas to avoid)
A tool is a hammer. A skill is knowing which nail to hit, at what angle, and in what order.
What a Skill Looks Like
A skill is typically a directory containing instructions, templates, and optionally scripts. Here's a real-world example — a skill that teaches an agent how to do code reviews:
skills/
└── code-review/
├── skill.md # Instructions and knowledge
├── checklist.md # Review checklist template
└── examples/
├── good-review.md
└── bad-review.md
The skill.md file is the core — it's the knowledge payload that gets injected into the agent's context:
# Code Review Skill
## When to Apply
Use this skill when reviewing pull requests, diffs, or code changes.
## Review Process
1. Read the PR description and linked issues first
2. Check the diff size — if >500 lines, suggest splitting
3. Review in this order: architecture → logic → security → style
4. For each issue found, classify as: blocking | suggestion | nit
## What to Look For
- Security: SQL injection, XSS, hardcoded secrets, auth bypass
- Logic: off-by-one errors, null handling, race conditions
- Performance: N+1 queries, unbounded loops, missing indexes
- Maintainability: unclear naming, missing error context, magic numbers
## What NOT to Do
- Don't nitpick formatting if there's a formatter configured
- Don't suggest refactors unrelated to the PR's purpose
- Don't rubber-stamp — if you can't find issues, look harder
## Output Format
Use this template for each finding:
**[BLOCKING/SUGGESTION/NIT]** `file:line` — Description of the issue
and why it matters, with a concrete suggestion for fixing it.

Skills vs. System Prompts
"Wait," you might say, "isn't a skill just a system prompt?" Conceptually, yes — skills are injected as context. But the distinction matters for three reasons:
1. Portability. A skill is a file (or directory) that any agent can discover and apply. System prompts are hardcoded into a specific agent configuration.
2. Composability. An agent can apply multiple skills simultaneously. A code review agent might apply both the "code-review" skill and a "security-audit" skill.
3. Discoverability. Skills can be listed, searched, and selected dynamically. An agent can look at a task, check available skills, and apply the relevant ones — rather than having every skill baked into a monolithic system prompt that grows forever.
Skills vs. MCP Prompts
MCP has a "prompts" primitive that surfaces reusable prompt templates. The difference:
| | MCP Prompt | Skill |
|---|---|---|
| Scope | Single parameterized template | Full knowledge bundle (instructions + templates + examples) |
| Invocation | User/host triggers explicitly | Agent applies based on task context |
| Where it lives | Inside an MCP server | In a skill directory, discoverable by agents |
| Use case | "Run this specific workflow" | "Apply this expertise to whatever you're doing" |
MCP prompts are buttons. Skills are training.
When to Create a Skill
Create a skill when you find yourself writing the same instructions into prompts repeatedly, or when an agent keeps making the same mistakes because it lacks domain knowledge that isn't in its training data.
Good candidates for skills:
- Team-specific coding conventions
- Deployment procedures and checklists
- Code review standards
- Incident response playbooks
- Data pipeline validation steps
- Documentation templates
Bad candidates for skills:
- Generic knowledge the model already has (how to write Python, what REST is)
- One-off instructions you'll never reuse
- Tool configurations — those belong in MCP server definitions
Part 4: Subagents — The Delegation Layer
A subagent is an agent spawned by another agent to handle a specific subtask. The parent agent acts as a supervisor — it decides what to delegate, to whom, and how to synthesize the results.
Why Subagents Exist
The core problem subagents solve is context window management and specialization. A single agent trying to handle a complex task — say, "review this PR, check for security issues, verify test coverage, and update the changelog" — has to hold all of that context simultaneously. As the context grows, the agent's attention degrades. Quality drops.
Subagents solve this by giving each subtask its own context window, its own tool set, and its own focus.
Subagent Properties
Subagents differ from regular agents in several critical ways:
1. Scoped context. Each subagent starts with a clean context window. The parent sends only the information the subagent needs — not the entire conversation history.
2. Scoped tools. Subagents can have different tool sets than the parent. A security-review subagent gets the SAST scanner; the changelog subagent gets the file editor. This enforces least-privilege structurally.
3. Bounded autonomy. The subagent works on a specific task and returns a result. It doesn't get to redefine the goal or decide to work on something else.
4. Parallel execution. Independent subagents can run simultaneously. This is often the biggest practical benefit — a task that takes 60 seconds sequentially takes 20 seconds with three parallel subagents.
Building a Subagent System
Here's a concrete implementation using Claude's tool use to orchestrate subagents:
import anthropic
import asyncio
from dataclasses import dataclass
client = anthropic.Anthropic()
@dataclass
class SubagentTask:
name: str
instruction: str
tools: list[dict]
skill_context: str = ""
@dataclass
class SubagentResult:
name: str
output: str
success: bool
async def run_subagent(task: SubagentTask, tool_executor) -> SubagentResult:
"""Run a single subagent with its own context and tools."""
system = f"You are a specialized agent: {task.name}."
if task.skill_context:
system += f"\n\n{task.skill_context}"
messages = [{"role": "user", "content": task.instruction}]
# Subagent runs its own agent loop
for _ in range(10): # max iterations
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
system=system,
tools=task.tools,
messages=messages,
)
tool_calls = [b for b in response.content if b.type == "tool_use"]
if not tool_calls:
text = "".join(b.text for b in response.content if b.type == "text")
return SubagentResult(name=task.name, output=text, success=True)
messages.append({"role": "assistant", "content": response.content})
tool_results = []
for tc in tool_calls:
result = await tool_executor(tc.name, tc.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": tc.id,
"content": str(result),
})
messages.append({"role": "user", "content": tool_results})
return SubagentResult(
name=task.name,
output="Max iterations reached",
success=False,
)
async def orchestrate(task: str, subtasks: list[SubagentTask], tool_executor):
"""Run subagents in parallel, then synthesize results."""
# Execute all subagents concurrently
results = await asyncio.gather(
*[run_subagent(st, tool_executor) for st in subtasks]
)
# Synthesize results with the orchestrator
synthesis_prompt = f"Original task: {task}\n\n"
for r in results:
status = "completed" if r.success else "FAILED"
synthesis_prompt += f"## {r.name} ({status})\n{r.output}\n\n"
synthesis_prompt += (
"Synthesize these results into a coherent response. "
"Flag any failures or conflicts between subagent findings."
)
response = client.messages.create(
model="claude-sonnet-4-6-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": synthesis_prompt}],
)
return response.content[0].text

Usage:
subtasks = [
SubagentTask(
name="Security Reviewer",
instruction="Review this diff for security vulnerabilities:\n" + diff,
tools=security_tools,
skill_context=load_skill("security-audit"),
),
SubagentTask(
name="Test Analyzer",
instruction="Check test coverage for the changed files:\n" + changed_files,
tools=testing_tools,
skill_context=load_skill("testing-standards"),
),
]
result = await orchestrate("Review PR #42", subtasks, execute_tool)

Subagent Supervision Patterns
How the parent manages subagents matters as much as the subagents themselves. Three patterns dominate:
Fan-Out / Fan-In — Best for independent subtasks. Run all subagents in parallel, collect results, synthesize. Maximum speed, minimum coordination overhead.
Pipeline — Best when each step depends on the previous one's output. Analysis → transformation → validation. Sequential, but each step gets clean, focused context.
Iterative Refinement — Best for quality-critical tasks. The parent spawns a subagent, reviews the output, and sends it back with feedback until the quality bar is met. Slower, but produces higher-quality results.
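The pipeline pattern is worth a sketch, because its core is just threading one stage's output into the next while keeping each stage's context clean. Here each stage is a plain callable standing in for a subagent invocation; the stage names are illustrative:

```python
# A pipeline supervision sketch: each stage receives only the previous
# stage's output as its clean, focused input.
def run_pipeline(initial_input: str, stages: list) -> tuple[str, list]:
    """Run stages sequentially, threading each output into the next.

    Returns the final output plus a per-stage trace for observability.
    """
    current = initial_input
    trace = []
    for name, stage in stages:
        current = stage(current)            # stage sees only what it needs
        trace.append((name, len(current)))  # record output size per stage
    return current, trace
```

In a real system each stage would be a `run_subagent` call like the one above, and the trace would feed your observability layer — which stage produced how much, and where quality dropped.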
Part 5: How They All Work Together
Here's the full picture: a real-world architecture in which all four concepts compose into one production system.
Walk through the flow:
- User gives a high-level task: "Fix the auth bug in ticket PROJ-123"
- Orchestrator agent (with the task-planning skill applied) breaks this into subtasks
- Research subagent uses Jira MCP to read the ticket, GitHub MCP to find related code, then reports findings back to the orchestrator
- Orchestrator reviews the findings, decides on the fix approach
- Coding subagent (with team-conventions skill) reads files via Filesystem MCP, writes the fix
- Testing subagent (with testing-standards skill) runs tests via Test Runner MCP
- Orchestrator creates a PR via GitHub MCP, updates the ticket via Jira MCP, returns the PR link to the user
Each layer does what it's best at:
- MCP handles the connections (standardized, reusable across any agent)
- Skills provide the knowledge (team conventions, testing standards)
- Subagents provide the focused execution (clean context, scoped tools)
- The orchestrator agent provides the reasoning (planning, synthesis, quality control)
Part 6: The Decision Framework
This is the section you'll bookmark. When you're building an AI-powered system and need to decide which building block to use, work through the decision matrix and complexity ladder below.
The Decision Matrix
For quick reference, here is every combination and when it applies:
| Scenario | MCP | Agent | Skill | Subagent | Example |
|---|---|---|---|---|---|
| Expose your DB to any AI tool | X | | | | PostgreSQL MCP server |
| Answer user questions with tool access | | X | | | Customer support chatbot |
| Review code following team standards | | X | X | | PR review with style guide skill |
| Complex multi-step research task | | X | | X | "Analyze competitor pricing across 5 sources" |
| Full development workflow | X | X | X | X | "Fix bug PROJ-123 and open a PR" |
| Reusable API for AI ecosystem | X | | | | Stripe MCP server for billing ops |
| Simple one-off API call | | | | | Direct function call — no framework needed |
The Complexity Ladder
Start at the bottom. Only climb when you have a concrete reason.
Level 5: Orchestrator + Subagents + Skills + MCP ← Multi-faceted autonomous work
Level 4: Agent + Skills + MCP ← Expert single-agent tasks
Level 3: Agent + MCP (or direct tools) ← Simple agentic tasks
Level 2: MCP Server (no agent) ← Standardized tool exposure
Level 1: Direct API call ← One tool, one use
The Over-Engineering Trap
The single most common mistake in AI system design is reaching for Level 5 when Level 2 would suffice. Every layer you add increases latency, cost, and failure surface. A subagent architecture for a task that one agent with two tools can handle is not sophisticated — it is waste. Start simple. Add layers only when the simpler approach demonstrably fails.
Part 7: Anti-Patterns — What Not to Do
These are mistakes we see repeatedly in production systems. Each one seems reasonable until you've lived with the consequences.
Anti-Pattern 1: The God Agent
What it looks like: One agent with 40 tools, a 10,000-token system prompt, and responsibility for everything from code review to deployment to Slack notifications.
Why it fails: LLMs degrade with too many tools. At 40+ tools, the model spends more tokens deciding which tool to use than actually solving the problem. Tool selection accuracy drops. Latency increases. Cost explodes.
Fix: Split into a router agent with specialized subagents. Each subagent gets 3-8 tools relevant to its domain.
Anti-Pattern 2: MCP Everything
What it looks like: Building an MCP server for every internal function, including ones that only one agent will ever use.
Why it fails: MCP's value is interoperability — write once, use everywhere. If "everywhere" means "one agent in one application," the MCP abstraction adds complexity without adding value. You're paying the protocol overhead for no benefit.
Fix: Use direct tool definitions for single-agent, single-app integrations. Build MCP servers only when multiple clients need access.
Anti-Pattern 3: Skills as Entire Codebases
What it looks like: A skill that contains 50 pages of documentation, every edge case, every historical decision, and the entire API reference for a framework.
Why it fails: Skills are injected into context. A 50-page skill consumes your context window and dilutes the model's attention. The model can't find the relevant instruction buried in the noise.
Fix: Keep skills focused and concise — under 2,000 tokens. If you need more, split into multiple skills that the agent can selectively apply.
Anti-Pattern 4: Subagents Without Synthesis
What it looks like: An orchestrator that spawns three subagents, collects their outputs, and concatenates them into a response.
Why it fails: Subagent outputs often conflict, overlap, or assume different contexts. Concatenation produces incoherent results. The orchestrator's job is not to collect — it is to synthesize.
Fix: The orchestrator must reason about subagent outputs: resolve conflicts, eliminate redundancy, identify gaps, and produce a unified result. This is the most important step in the orchestration loop.
Anti-Pattern 5: No Error Boundaries
What it looks like: A subagent fails (tool error, hallucination, infinite loop), and the failure cascades to the entire system.
Why it fails: Subagents are semi-autonomous. They will fail. If you don't handle failure at the subagent level, you don't have a system — you have a hope.
Fix: Every subagent should have a max iteration limit, a timeout, and a fallback. The orchestrator should handle subagent failure gracefully — retry, skip, or ask the user for guidance.
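That fix can be wrapped into one small boundary around each subagent call: a timeout, a bounded retry count, and a fallback result so a single failure never cascades. A minimal sketch, assuming the subagent is an async callable like `run_subagent` from Part 4:

```python
import asyncio

async def run_with_boundary(coro_factory, timeout_s: float = 60.0,
                            retries: int = 1, fallback: str = "unavailable"):
    """Run a subagent coroutine with a timeout, limited retries, and a
    fallback result. `coro_factory` builds a fresh coroutine per attempt.
    """
    for attempt in range(retries + 1):
        try:
            return await asyncio.wait_for(coro_factory(), timeout=timeout_s)
        except Exception:
            # Tool error, timeout, or bad state: contain it here so the
            # orchestrator can skip, retry, or ask the user for guidance.
            if attempt == retries:
                return fallback
    return fallback
```

Note that `coro_factory` is a factory, not a coroutine — a coroutine can only be awaited once, so each retry needs a fresh one. The orchestrator then treats the fallback as a first-class result ("this subagent's findings are unavailable") during synthesis.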
Part 8: Production Checklist
Before you ship an agent system, walk through this checklist:
MCP Servers
- Tool descriptions are specific, unambiguous, and include negative instructions (what NOT to do)
- Input schemas have proper validation with enums, bounds, and required fields
- Error messages are descriptive enough for the model to self-correct
- Sensitive operations require explicit confirmation (not just the model deciding to call them)
- Resources return focused data, not unbounded dumps
- Rate limiting is implemented for expensive operations
Agent
- Max iteration limit is set (prevent infinite loops)
- Cost ceiling is enforced (stop after $X in API calls)
- Destructive actions require human approval
- Conversation history is pruned to stay within context limits
- Agent outputs are logged for debugging and evaluation
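The iteration and cost ceilings from this checklist can be enforced with a small budget object charged once per LLM call inside the agent loop. A minimal sketch; the per-token prices are made-up placeholders, so substitute your model's actual rates:

```python
# Enforce max iterations and a cost ceiling on the agent loop.
PRICE_PER_INPUT_TOKEN = 3e-6    # placeholder rate, not a real price
PRICE_PER_OUTPUT_TOKEN = 15e-6  # placeholder rate, not a real price

class BudgetExceeded(RuntimeError):
    """Raised when the agent crosses an iteration or cost ceiling."""

class AgentBudget:
    def __init__(self, max_iterations: int = 25, max_cost_usd: float = 2.0):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record one LLM call; raise once either ceiling is crossed."""
        self.iterations += 1
        self.cost_usd += (input_tokens * PRICE_PER_INPUT_TOKEN
                          + output_tokens * PRICE_PER_OUTPUT_TOKEN)
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(f"Exceeded {self.max_iterations} iterations")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"Exceeded ${self.max_cost_usd:.2f} budget")
```

In the minimal agent loop from Part 2, you'd call `budget.charge(response.usage.input_tokens, response.usage.output_tokens)` right after each `client.messages.create` call and catch `BudgetExceeded` at the top level.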
Skills
- Each skill is under 2,000 tokens
- Skills are tested against real tasks (not just reviewed by humans)
- Skills include negative instructions (what to avoid)
- Skills are versioned and can be updated without redeploying
Subagents
- Each subagent has a clear, bounded task description
- Each subagent has a scoped tool set (least privilege)
- Timeout and max iterations are set per subagent
- The orchestrator synthesizes results (not concatenates)
- Subagent failures are handled gracefully
- Independent subagents run in parallel for performance
Observability
- Every tool call is logged with inputs, outputs, and latency
- Every LLM call is logged with token counts and cost
- Subagent spawning and completion events are tracked
- End-to-end task duration and cost are measurable
- Error rates are tracked per tool, per subagent, and per task type
The One-Page Reference
Print this. Pin it to your wall. Send it to your team.
┌─────────────────────────────────────────────────────────────┐
│ AI BUILDING BLOCKS │
│ Quick Reference │
├─────────────────────────────────────────────────────────────┤
│ │
│ MCP SERVER │
│ ┄┄┄┄┄┄┄┄┄┄ │
│ What: Standardized connection to external system │
│ When: Multiple AI clients need the same integration │
│ Not: A decision-maker (it's plumbing, not brains) │
│ Key: Tool descriptions ARE your interface — write well │
│ │
│ AGENT │
│ ┄┄┄┄┄ │
│ What: LLM in a think→act→observe loop │
│ When: Task requires planning, reasoning, multi-step work │
│ Not: A one-shot API call (that's just an LLM call) │
│ Key: Set max iterations + cost limits ALWAYS │
│ │
│ SKILL │
│ ┄┄┄┄┄ │
│ What: Reusable bundle of expertise and workflows │
│ When: Agent needs domain knowledge not in training data │
│ Not: A tool (skills teach, tools act) │
│ Key: Keep under 2000 tokens — focused beats exhaustive │
│ │
│ SUBAGENT │
│ ┄┄┄┄┄┄┄┄ │
│ What: Specialized agent handling a delegated subtask │
│ When: Task is multi-faceted with independent subtasks │
│ Not: Always necessary (single agent + tools often enough) │
│ Key: Scope tools per subagent — structural least-privilege│
│ │
├─────────────────────────────────────────────────────────────┤
│ COMPOSITION RULE: Start at the bottom, add layers only │
│ when the simpler approach demonstrably fails. │
│ │
│ Direct call → MCP → Agent → Agent+Skill → Subagents │
│ simple ─────────────────────────────── complex │
└─────────────────────────────────────────────────────────────┘
The Bigger Picture
MCP, agents, skills, and subagents are not competing approaches. They are layers in a stack, each solving a different problem:
- MCP standardizes the connection between AI and the world
- Agents provide the reasoning that decides what to do
- Skills encode the expertise that guides how to do it well
- Subagents enable the delegation that scales complex work
The developers who build great AI systems in 2026 are not the ones who use the most sophisticated architecture. They are the ones who use the right architecture — the simplest one that solves their actual problem. Start with a direct tool call. Add MCP when you need interoperability. Add an agent when you need reasoning. Add skills when you need expertise. Add subagents when you need delegation.
And when someone tells you they need an "agentic MCP-powered multi-subagent system with dynamic skill loading" for a feature that sends Slack notifications — send them this article.
Sources & Further Reading
- Model Context Protocol Specification
- MCP Architecture
- Skills Explained — Claude Blog
- Anthropic Agent SDK Documentation
- MCP, Skills, and Agents — David Cramer
- Best Practices for AI Agents, Subagents, Skills & MCP — Foojay
- MCP Reference Servers
- MCP Inspector
- Model Context Protocol — Wikipedia
- Everything About MCP in 2026 — WorkOS