MODULE 01 · ~60–90 min · beginner

Ship Your First Managed Agent

It's 2am. PagerDuty fires: checkout p99 latency is 10× baseline. You know the drill — open the dashboard, grep logs, scroll deploys, guess, check, repeat. Forty minutes later you find it: someone shipped an N+1 query. This module builds the agent that does those forty minutes for you.

The scenario. You're handed a working incident dashboard for a fictional e-commerce stack. Metrics, Logs, and Deploys all run from mock data. On the side of every page is an SRE Agent chat panel — and it's offline. Bringing it online is the module.

By the end of this module you can…

  • Explain the four Managed Agents primitives — Agent → Environment → Session → Events — and the order they're created in.
  • Stand up an agent that writes and runs Python in a sandboxed cloud container you never provision.
  • Wire custom tools so the cloud agent calls functions running on your own machine.
  • Stream a reply event-by-event and handle tool calls mid-stream.
  • List and reload stateful sessions with no database of your own.

1 · Prerequisites & setup

You need Python 3.10+ and an Anthropic API key. Clone the workshop bundle and enter this module's folder:

git clone https://github.com/anthropics/cwc-workshops
cd cwc-workshops/ship-your-first-managed-agent

python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

pip install -r requirements.txt
cp .env.example .env               # then put your ANTHROPIC_API_KEY in .env
streamlit run app.py

The dashboard opens at localhost:8501. Click around — Metrics, Logs, and Deploys all work. The SRE Agent panel on the right reads:

"agent offline — implement setup_agent() in agent.py"

That message is your starting line.

2 · What you build

Open agent.py. It contains seven functions, each of which currently raises NotImplementedError. Fill them in one at a time and the panel comes online step by step. The whole job is about 34 lines of code; everything else (the system prompt, the tool schemas, the chat UI, the session picker) is already provided in provided.py.

#  Function             API call                          Lines
1  setup_agent()        client.beta.agents.create         3
2  setup_environment()  client.beta.environments.create   4
3  upload_log()         client.beta.files.upload          2
4  start_session()      client.beta.sessions.create       5
5  stream_reply()       sessions.events.stream + .send    12
6  handle_tool()        runs locally, reads data/*.json   7
7  delete_session()     client.beta.sessions.delete       1

Stuck? agent_complete.py holds the finished versions of all seven. Treat it as the answer key — reach for it only after you've tried, and in a graded setting your instructor releases it after submission.

3 · Walkthrough

Checkpoint A — the agent wakes up (functions 1–4)

The first four functions stand up the scaffolding in the order the Managed Agents quickstart introduces them: setup_agent(), setup_environment(), upload_log(), then start_session().

After these, the panel stops saying "offline" — but it can't answer yet.

Checkpoint B — it speaks and acts (functions 5–6)

stream_reply() is the heart of the module (~12 lines). It sends the user's message and iterates the Events stream — text deltas render to the chat, and when the agent asks to call one of your tools, you pause, run it, and feed the result back.
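The shape of that loop can be sketched without touching the network. Everything below is a stand-in: the event dictionaries, their type values (text_delta, tool_call, done), and the fake_stream generator are hypothetical shapes chosen for illustration, not the Managed Agents event schema. The real calls are the ones in agent_complete.py.

```python
from typing import Callable, Iterable, Iterator


def fake_stream() -> Iterator[dict]:
    """Stand-in for the real event stream (hypothetical event shapes)."""
    yield {"type": "text_delta", "text": "Checking deploys"}
    yield {"type": "tool_call", "id": "t1", "name": "get_recent_deploys", "input": {}}
    yield {"type": "text_delta", "text": "... done."}
    yield {"type": "done"}


def consume(
    events: Iterable[dict],
    run_tool: Callable[[str, dict], str],
    send_result: Callable[[str, str], None],
) -> str:
    """Render text deltas, run tools when asked, and return the full reply."""
    chunks = []
    for event in events:
        if event["type"] == "text_delta":
            # In the workshop app this is where text would be drawn in the chat.
            chunks.append(event["text"])
        elif event["type"] == "tool_call":
            # Pause, run the tool locally, then feed the result back.
            result = run_tool(event["name"], event["input"])
            send_result(event["id"], result)
        elif event["type"] == "done":
            break
    return "".join(chunks)


sent = []
reply = consume(
    fake_stream(),
    run_tool=lambda name, args: f"{name} ok",
    send_result=lambda call_id, result: sent.append((call_id, result)),
)
print(reply)
print(sent)
```

Passing run_tool and send_result in as callables keeps the loop itself free of I/O, which makes it easy to exercise with canned events before wiring it to a live session.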

handle_tool() is where "cloud agent, local hands" happens. The agent runs in Anthropic's cloud, but these handlers run on your laptop and read the fixtures in your local data/ folder:

get_metrics        → data/metrics.json
get_recent_deploys → data/deploys.json
get_diff           → data/diff.txt
Why this matters. Swap those three mock readers for a real Datadog client and a Git API, and the exact same agent is running in production. The tool boundary is the whole point — the model reasons in the cloud, your code touches your systems.
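One way to picture the tool boundary is a small dispatch table. The tool names and fixture files below come from the list above; the function name handle_tool_sketch, its signature, and the choice to return an error string for unknown tools are assumptions made for this sketch, and it ignores any arguments the real tools may take.

```python
import json
from pathlib import Path

# Tool name -> fixture file, as listed in this module.
TOOL_FILES = {
    "get_metrics": "metrics.json",
    "get_recent_deploys": "deploys.json",
    "get_diff": "diff.txt",
}


def handle_tool_sketch(name: str, data_dir: Path = Path("data")) -> str:
    """Return fixture content for a tool call as a string (hypothetical signature)."""
    filename = TOOL_FILES.get(name)
    if filename is None:
        # Design choice in this sketch: report the problem as data, not an exception.
        return json.dumps({"error": f"unknown tool: {name}"})
    text = (data_dir / filename).read_text(encoding="utf-8")
    if filename.endswith(".json"):
        # Round-trip through json so a corrupt fixture fails here, not in the agent.
        return json.dumps(json.loads(text))
    return text
```

Swapping a mock reader for a real client would then mean replacing one entry's file read with a function call, leaving the names the agent sees unchanged.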

Checkpoint C — clean up (function 7)

delete_session() is one line. Sessions are stateful and listed by the picker above the chat; deleting tears one down.

4 · The incident you're solving

The data/ folder ships a real-feeling outage. At 14:31:18 UTC, commit a3f9c21 deploys to the checkout service, replacing a batched query with a per-row loop. Within minutes p99 latency climbs from 65 ms to 3,600 ms, the DB connection pool saturates, and 20% of checkouts start failing.
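To make "replacing a batched query with a per-row loop" concrete, here is a minimal sketch of the N+1 pattern using an in-memory SQLite database. The items table, its columns, and the function names are hypothetical; this is not the content of the diff in data/, only an illustration of why the query count grows with the number of rows.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.queries = 0

    def run(self, sql: str, params=()) -> list:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price_cents INTEGER)")
conn.executemany(
    "INSERT INTO items (id, price_cents) VALUES (?, ?)",
    [(i, 100 * i) for i in range(1, 51)],
)


def total_batched(db: CountingDB, item_ids: list) -> int:
    # One round trip: fetch every row with a single IN (...) query.
    placeholders = ",".join("?" * len(item_ids))
    rows = db.run(f"SELECT price_cents FROM items WHERE id IN ({placeholders})", item_ids)
    return sum(price for (price,) in rows)


def total_per_row(db: CountingDB, item_ids: list) -> int:
    # N round trips: one query per item. This is the N+1 shape.
    total = 0
    for item_id in item_ids:
        rows = db.run("SELECT price_cents FROM items WHERE id = ?", (item_id,))
        total += rows[0][0]
    return total


ids = list(range(1, 51))
batched_db, per_row_db = CountingDB(conn), CountingDB(conn)
batched_total = total_batched(batched_db, ids)
per_row_total = total_per_row(per_row_db, ids)
print(batched_total, batched_db.queries)   # same total, 1 query
print(per_row_total, per_row_db.queries)   # same total, 50 queries
```

Both functions return the same answer, which is why this kind of change can pass review: only the number of round trips differs, and each one holds a pooled connection while it runs.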

The evidence is deliberately spread across four sources the agent has to correlate: the log, the metrics, the deploy history, and the diff.

5 · Assessment — what "done" looks like

You're finished when you can type into the panel:

"What caused the latency spike?"

…and watch the agent grep the 70k-line log in its sandbox, call your local tools for metrics and deploys, correlate the timestamps, fetch the offending diff, and name commit a3f9c21 as the root cause.
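The "correlate the timestamps" step can be sketched as: find the first metric sample that breaches a threshold, then pick the latest deploy at or before it. The data shapes, the intermediate sample values, and the commit labelled sample01 are made up for illustration; only the 14:31:18 deploy time, commit a3f9c21, the 65 ms baseline, the 3,600 ms peak, and the 10× alert factor come from this module. The real fixtures in data/ may be structured differently.

```python
from datetime import datetime
from typing import Optional


def parse(clock: str) -> datetime:
    """Parse an HH:MM:SS string (date is ignored in this sketch)."""
    return datetime.strptime(clock, "%H:%M:%S")


def first_breach(samples: list, baseline_ms: float, factor: float = 10) -> Optional[datetime]:
    """Return the time of the first sample at or above baseline * factor.

    Assumes samples are (clock, p99_ms) tuples in chronological order.
    """
    for clock, p99_ms in samples:
        if p99_ms >= baseline_ms * factor:
            return parse(clock)
    return None


def suspect_deploy(deploys: list, breach_time: datetime) -> Optional[dict]:
    """Return the most recent deploy at or before the breach, if any."""
    earlier = [d for d in deploys if parse(d["time"]) <= breach_time]
    return max(earlier, key=lambda d: parse(d["time"]), default=None)


# Illustrative data: intermediate values and "sample01" are invented.
samples = [("14:30:00", 65), ("14:32:00", 410), ("14:34:00", 1900), ("14:36:00", 3600)]
deploys = [
    {"time": "13:05:42", "commit": "sample01"},
    {"time": "14:31:18", "commit": "a3f9c21"},
]

breach = first_breach(samples, baseline_ms=65)
suspect = suspect_deploy(deploys, breach)
print(breach.time(), suspect["commit"])
```

A nearest-preceding-deploy rule is only a heuristic for picking a suspect; the agent still has to read the diff to confirm the change explains the symptom.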

A headless check is provided: run python e2e.py to exercise the full path without the UI. In a graded setting this is the auto-marker — a passing e2e.py plus a correct root-cause answer is the completion bar.

6 · What this teaches (and where it goes)

Repo layout (for reference)
agent.py            ← the only file you edit
agent_complete.py   ← reference implementation
provided.py         ← system prompt, tool schemas, chat UI, session picker
e2e.py              ← headless test of the full path
app.py              ← incident overview
pages/              ← Metrics, Logs, Deploys
data/               ← log + metrics + deploys + diff fixtures
ui.py, assets/      ← styling

Stretch goals