RM-004 · Advanced · 4–6 months

Agent Builder

Design, deploy, and evaluate autonomous agents. From single tool-use loops to production multi-agent systems.

An agent builder ships LLM systems that take actions, not just generate text. The art is knowing when to add another agent, when to add a tool, and when to take the human out of the loop.

28 topics
S01

Agent Basics

What an agent actually is, beyond the marketing.

  1. 01

    What Counts as an Agent

    core

    Loop + tools + autonomy. Anything less is just a chatbot with extra steps.

  2. 02

    The Agent Loop

    core

    Plan → act → observe → reflect → repeat. Every framework is some flavor of this.

  3. 03

    When Not to Use an Agent

    core

    If a single prompt or a small pipeline works, use that. Agents add latency, cost, and failure modes.
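
The loop described in this section can be sketched in a few lines. This is a minimal illustration, not any framework's implementation: `fake_model`, `TOOLS`, and `run_agent` are hypothetical names, and the "model" is a stub that requests one tool call and then finishes. The reflect step is omitted for brevity.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function taking a string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: ask for one tool call, then finish."""
    observations = [h for h in history if h.startswith("observation:")]
    if not observations:
        return {"action": "add", "input": "2+3"}
    return {"action": "finish", "input": observations[-1].split(": ", 1)[1]}

def run_agent(task: str, model=fake_model, max_steps: int = 5) -> str:
    history = [f"task: {task}"]
    for _ in range(max_steps):          # hard step limit so the loop always ends
        step = model(history)           # plan: the model picks the next action
        if step["action"] == "finish":
            return step["input"]
        tool = TOOLS.get(step["action"])
        if tool is None:
            result = f"error: unknown tool {step['action']!r}"
        else:
            result = tool(step["input"])            # act
        history.append(f"observation: {result}")    # observe
    raise RuntimeError("step budget exhausted")

print(run_agent("what is 2+3?"))
```

The step limit is the part worth keeping even in a toy: without it, a model that never emits `finish` runs forever.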

S02

Tool Use

The thing that makes an agent more than a chatbot.

  1. 01

    Designing Tools

    core

    Narrow schemas, clear names, honest error messages. Tools are an API for the model.

  2. 02

    Tool Orchestration

    core

    Parallel tool calls, dependent calls, partial failures. The model can't see your queue.

  3. 03

    MCP Servers

    recommended

    Package tools as MCP servers so any agent runtime can use them.
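
As a rough illustration of "narrow schemas, clear names, honest error messages", the sketch below defines one tool and validates its arguments before running. The schema layout, the `get_weather` tool, and the stubbed temperature are all assumptions made for the example; real runtimes (including MCP) define their own schema formats.

```python
# Hypothetical tool definition; the layout is illustrative only.
GET_WEATHER = {
    "name": "get_weather",
    "description": "Current temperature for one city.",
    "parameters": {"city": str, "unit": str},
}
ALLOWED_UNITS = {"celsius", "fahrenheit"}

def call_get_weather(args: dict) -> dict:
    expected = GET_WEATHER["parameters"]
    missing = sorted(set(expected) - set(args))
    extra = sorted(set(args) - set(expected))
    if missing or extra:
        # Tell the model exactly what was wrong so it can retry correctly.
        return {"ok": False, "error": f"missing={missing} unexpected={extra}"}
    for key, typ in expected.items():
        if not isinstance(args[key], typ):
            return {"ok": False, "error": f"{key} must be {typ.__name__}"}
    if args["unit"] not in ALLOWED_UNITS:
        return {"ok": False, "error": f"unit must be one of {sorted(ALLOWED_UNITS)}"}
    temp_c = 21.0  # stubbed value; a real tool would query a weather service
    value = temp_c if args["unit"] == "celsius" else temp_c * 9 / 5 + 32
    return {"ok": True, "city": args["city"], "temperature": value, "unit": args["unit"]}
```

Returning a structured error instead of raising keeps the failure visible to the model as an ordinary observation.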

S03

Memory & Context

Agents that forget the last thing they did aren't agents.

  1. 01

    Short-Term (Context Window)

    core

    Compaction, summarization, sliding windows. Long sessions decay without help.

  2. 02

    Long-Term Memory

    recommended

    File-based, vector-based, structured. Pick by query pattern, not by hype.

  3. 03

    State Machines

    recommended

    Some workflows aren't free-form — model the state explicitly, let the agent fill in the steps.
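
One simple form of the compaction mentioned under Short-Term memory is to keep the most recent messages verbatim and fold everything older into a single summary entry. The sketch below assumes messages are plain strings and uses a placeholder summarizer; in practice `summarize` would likely be a model call.

```python
def compact(messages: list[str], keep_last: int = 3, summarize=None) -> list[str]:
    """Replace all but the last `keep_last` messages with one summary entry."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if summarize is None:
        # Placeholder summarizer (an assumption for this sketch).
        summarize = lambda old: f"[summary of {len(old)} earlier messages]"
    if len(messages) <= keep_last:
        return list(messages)           # nothing to compact yet
    old, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarize(old)] + recent

print(compact(["a", "b", "c", "d", "e"], keep_last=2))
```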

S04

Orchestration Patterns

Single-agent loops, planner/worker, fan-out, voting, human-in-the-loop.

  1. 01

    Single-Agent Loops

    core

    The default. Master this before adding more agents.

  2. 02

    Sequential Pipelines

    core

    Plan → Build → Review → Deploy. Each stage is a focused agent.

  3. 03

    Fan-Out / Fan-In

    recommended

    Parallel subagents for independent work. Aggregate at the end.

  4. 04

    Voting & Consensus

    optional

    When correctness matters more than latency — multiple agents, majority vote.

  5. 05

    Human-in-the-Loop

    core

    Approval gates at the right points keep autonomy from becoming catastrophe.
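
Fan-out / fan-in and majority voting can both be sketched with the standard library. Here `worker` stands in for a subagent call (an assumption; a real worker would call a model), `fan_out` runs workers in parallel threads, and `majority_vote` aggregates their answers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def fan_out(worker, inputs, max_workers: int = 4) -> list:
    """Run `worker` on each input in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, inputs))

def majority_vote(answers: list):
    """Return the answer given by a strict majority, or None if there is none."""
    if not answers:
        return None
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count * 2 > len(answers) else None

print(fan_out(lambda x: x * 2, [1, 2, 3]))
print(majority_vote(["a", "b", "a"]))
```

Returning `None` on a tie is one reasonable choice: it gives the caller a clear point at which to escalate to a human instead of guessing.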

S05

Agent Evaluation

Agents fail in weirder ways than prompts. You need harnesses, not vibes.

  1. 01

    Trajectory Evaluation

    core

    Score the path, not just the final answer. Wrong tools called early are silent failures.

  2. 02

    Task Success Metrics

    core

    Define 'done' for each task type. Auto-graded where possible, human-graded otherwise.

  3. 03

    Regression Suites

    recommended

    Lock in known-good traces. Block deploys when an agent regresses on them.
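
A very small trajectory check, assuming a trajectory is just the ordered list of tool names the agent called: score how much of the expected sequence appears, in order, in the actual one. This is one possible metric among many, and the tool names in the usage line are hypothetical.

```python
def trajectory_score(actual: list[str], expected: list[str]) -> float:
    """Fraction of expected tool calls matched in order within `actual`.

    Matching is greedy: once an expected step cannot be found in the rest
    of `actual`, no later expected step is counted either.
    """
    if not expected:
        return 1.0
    remaining = iter(actual)
    # `step in remaining` consumes the iterator up to the first match,
    # so later steps can only match calls that happened afterwards.
    matched = sum(1 for step in expected if step in remaining)
    return matched / len(expected)

print(trajectory_score(["search", "open", "summarize"], ["search", "summarize"]))
```

A regression suite could store `expected` for each known-good trace and fail the build when the score drops below a chosen threshold.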

S06

Safety & Permissions

Autonomy without guardrails is a liability. Sandbox, scope, audit.

  1. 01

    Sandboxing

    core

    Filesystem, network, shell — restrict the blast radius before you give an agent the keys.

  2. 02

    Permission Models

    core

    Allowlists, approval gates, scoped credentials. Default deny.

  3. 03

    Audit Logs

    recommended

    Every tool call, every input, every output. Future incidents need this trail.
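
The three ideas in this section fit together in one small wrapper: a default-deny policy, an approval gate for risky tools, and an audit entry for every attempt. The policy format and tool names below are assumptions for illustration, and the audit log is an in-memory list where a real system would use durable storage.

```python
# Hypothetical policy: tools not listed here are denied.
POLICY = {"read_file": "allow", "write_file": "ask"}

def authorize(tool: str, approved_by_human: bool = False, policy=POLICY) -> bool:
    rule = policy.get(tool, "deny")     # default deny
    if rule == "allow":
        return True
    if rule == "ask":
        return approved_by_human        # approval gate
    return False

def guarded_call(tool: str, args: dict, fn, audit_log: list,
                 approved_by_human: bool = False) -> dict:
    allowed = authorize(tool, approved_by_human)
    entry = {"tool": tool, "args": args, "allowed": allowed, "output": None}
    if allowed:
        entry["output"] = fn(**args)
    audit_log.append(entry)             # log denied attempts too
    return entry
```

Logging denied calls matters as much as logging allowed ones: a burst of denials is often the first sign of scope drift.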

S07

Production Agents

What it takes to run agents 24/7 without a human babysitting.

  1. 01

    Cost Budgeting

    core

    Per-task budgets, model tiering, and the kill switch when an agent loops.

  2. 02

    Agent Observability

    core

    Traces, span trees, replay. You'll want to scrub a failed run like a video.

  3. 03

    Failure Modes

    core

    Loops, hallucinated tool calls, scope drift. Detect, retry, escalate.

  4. 04

    SLAs & Reliability

    recommended

    What you can promise when the underlying model is non-deterministic.
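
A per-task budget and a loop kill switch can share one guard object that the runtime consults after every tool call. The sketch below is an assumption about how such a guard might look: `RunGuard` and both exception names are hypothetical, costs are supplied by the caller, and "looping" is defined narrowly as the same call repeated back to back.

```python
class BudgetExceeded(RuntimeError):
    pass

class LoopDetected(RuntimeError):
    pass

class RunGuard:
    def __init__(self, max_cost_usd: float, max_repeats: int = 3):
        self.max_cost_usd = max_cost_usd
        self.max_repeats = max_repeats
        self.spent = 0.0
        self._last_call = None
        self._repeats = 0

    def record(self, tool: str, args: str, cost_usd: float) -> None:
        """Call after each tool use; raises when the run should be stopped."""
        self.spent += cost_usd
        if self.spent > self.max_cost_usd:
            raise BudgetExceeded(f"spent {self.spent:.2f} of {self.max_cost_usd:.2f} USD")
        call = (tool, args)
        # Count consecutive identical calls; any different call resets the count.
        self._repeats = self._repeats + 1 if call == self._last_call else 1
        self._last_call = call
        if self._repeats >= self.max_repeats:
            raise LoopDetected(f"{tool} called {self._repeats} times with the same arguments")
```

The caller decides what each exception means: retry with a cheaper model, escalate to a human, or abort the task.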

S08

Real Agent Patterns

Battle-tested templates you can adapt.

  1. 01

    Auto-PR Pipeline

    recommended

    Issue → plan → patch → review → merge. The canonical multi-agent workflow.

  2. 02

    Doc Team

    optional

    Generate, review, and refresh docs from the codebase on a schedule.

  3. 03

    Triage Swarm

    optional

    Classify, label, and route incoming issues / tickets / emails.

  4. 04

    Migration Squad

    optional

    Codebase-wide refactors broken into parallel agents.