Module 1 · Ship Your First Managed Agent

The scenario. You're handed a working incident dashboard for a fictional e-commerce stack. Metrics, Logs, and Deploys all run from mock data. On the side of every page is an SRE Agent chat panel — and it's offline. Bringing it online is the module.

By the end of this module you can…

Explain the four Managed Agents primitives — Agent → Environment → Session → Events — and the order they're created in.
Stand up an agent that writes and runs Python in a sandboxed cloud container you never provision.
Wire custom tools so the cloud agent calls functions running on your own machine.
Stream a reply event-by-event and handle tool calls mid-stream.
List and reload stateful sessions with no database of your own.

1 · Prerequisites & setup

You need Python 3.10+ and an Anthropic API key. Clone the workshop bundle and enter this module's folder:

git clone https://github.com/anthropics/cwc-workshops
cd cwc-workshops/ship-your-first-managed-agent

python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate

pip install -r requirements.txt
cp .env.example .env               # then put your ANTHROPIC_API_KEY in .env
streamlit run app.py

The dashboard opens at localhost:8501. Click around — Metrics, Logs, and Deploys all work. The SRE Agent panel on the right reads:

"agent offline — implement setup_agent() in agent.py"

That message is your starting line.

2 · What you build

Open agent.py. It contains seven functions, each of which currently raises NotImplementedError. Fill them in one at a time and the panel comes online step by step. The whole job is about 34 lines of code; everything else (the system prompt, the tool schemas, the chat UI, the session picker) is already provided in provided.py.

#	Function	API call	Lines
1	`setup_agent()`	`client.beta.agents.create`	3
2	`setup_environment()`	`client.beta.environments.create`	4
3	`upload_log()`	`client.beta.files.upload`	2
4	`start_session()`	`client.beta.sessions.create`	5
5	`stream_reply()`	`sessions.events.stream` + `.send`	12
6	`handle_tool()`	runs locally — reads `data/*.json`	7
7	`delete_session()`	`client.beta.sessions.delete`	1

Stuck? agent_complete.py holds the finished versions of all seven. Treat it as the answer key and open it only after you've made your own attempt. In a graded setting, your instructor releases it after submission.

3 · Walkthrough

Checkpoint A — the agent wakes up (functions 1–4)

The first four functions stand up the scaffolding in the order the Managed Agents quickstart introduces them:

setup_agent() — creates the Agent: the persistent definition (its system prompt and tools).
setup_environment() — creates the Environment: the sandboxed container the agent runs code in.
upload_log() — pushes app.log into that environment so the agent can grep it in the cloud.
start_session() — opens a Session: one conversation against the agent + environment.

After these, the panel stops saying "offline" — but it can't answer yet.
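The order matters because each later resource refers to an earlier one. The sketch below illustrates only that dependency chain. FakeBackend, its create() method, and every field name are hypothetical stand-ins, not the Managed Agents client, so take the real call signatures from the quickstart.

```python
import itertools


class FakeBackend:
    """Hypothetical local stand-in for the API client: hands out IDs, records calls."""

    def __init__(self):
        self.calls = []
        self._counter = itertools.count(1)

    def create(self, kind, **fields):
        resource_id = f"{kind}_{next(self._counter)}"
        self.calls.append((kind, fields))
        return resource_id


def bootstrap(backend, log_name="app.log"):
    """Create the pieces in dependency order and return the session ID."""
    # 1. Agent: the persistent definition (system prompt and tools).
    agent_id = backend.create("agent", system_prompt="<from provided.py>")
    # 2. Environment: the sandbox the agent runs code in.
    environment_id = backend.create("environment")
    # 3. Log upload: makes the log available to the agent.
    file_id = backend.create("file", name=log_name)
    # 4. Session: needs the IDs produced by steps 1-3 (field names are assumptions).
    return backend.create(
        "session", agent=agent_id, environment=environment_id, files=[file_id]
    )
```

Calling bootstrap(FakeBackend()) records the calls in the order agent, environment, file, session. Reordering the steps would leave the session step without the IDs it needs.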

Checkpoint B — it speaks and acts (functions 5–6)

stream_reply() is the heart of the module (~12 lines). It sends the user's message and iterates the Events stream — text deltas render to the chat, and when the agent asks to call one of your tools, you pause, run it, and feed the result back.
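The shape of that loop can be sketched without touching the network. The example below is a minimal illustration, assuming events arrive as plain dicts. The event type names ("text_delta", "tool_call", "turn_done"), the dict keys, and the run_turn helper are all invented here; the real stream uses whatever event types the Managed Agents API defines.

```python
def run_turn(events, handle_tool, send_tool_result, render):
    """Consume one reply: render text as it arrives, answer tool calls inline."""
    for event in events:
        kind = event["type"]  # hypothetical event shape
        if kind == "text_delta":
            render(event["text"])
        elif kind == "tool_call":
            # Pause rendering, run the tool locally, and feed the result back.
            result = handle_tool(event["name"], event["input"])
            send_tool_result(event["id"], result)
        elif kind == "turn_done":
            break


# Usage with an in-memory list standing in for the live stream.
fake_stream = [
    {"type": "text_delta", "text": "Checking metrics. "},
    {"type": "tool_call", "id": "call_1", "name": "get_metrics", "input": {}},
    {"type": "text_delta", "text": "Latency is elevated."},
    {"type": "turn_done"},
    {"type": "text_delta", "text": "never rendered"},
]
rendered, sent = [], []
run_turn(
    fake_stream,
    handle_tool=lambda name, tool_input: f"result of {name}",
    send_tool_result=lambda call_id, result: sent.append((call_id, result)),
    render=rendered.append,
)
print("".join(rendered))
print(sent)
```

Passing render, handle_tool, and send_tool_result in as callables keeps the loop testable on its own. In the workshop they would be backed by the chat UI and the real client.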

handle_tool() is where the magic of "cloud agent, local hands" lives. The agent runs in Anthropic's cloud, but these handlers run on your laptop, reading local data/*.json:

get_metrics        → data/metrics.json
get_recent_deploys → data/deploys.json
get_diff           → data/diff.txt
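One way to express that mapping is a small dispatch table. This is a sketch under assumptions, not the reference implementation: the signature handle_tool(name, tool_input, data_dir) and the choice to return a string are guesses, so check agent.py for the real contract.

```python
import json
from pathlib import Path

# Tool name -> fixture file, exactly as listed above.
TOOL_FILES = {
    "get_metrics": "metrics.json",
    "get_recent_deploys": "deploys.json",
    "get_diff": "diff.txt",
}


def handle_tool(name, tool_input=None, data_dir="data"):
    """Run one tool locally and return its result as a string (assumed contract)."""
    # tool_input is unused by these mock readers; a real client would read it.
    if name not in TOOL_FILES:
        raise ValueError(f"unknown tool: {name}")
    filename = TOOL_FILES[name]
    text = (Path(data_dir) / filename).read_text(encoding="utf-8")
    if filename.endswith(".json"):
        # Round-trip through json so a corrupt fixture fails here, not in the cloud.
        return json.dumps(json.loads(text))
    return text
```

An unknown tool name raises ValueError immediately rather than returning an empty result. That makes a typo in a tool schema easy to spot while you are wiring things up.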

Why this matters. Swap those three mock readers for a real Datadog client and a Git API, and the exact same agent is running in production. The tool boundary is the whole point — the model reasons in the cloud, your code touches your systems.

Checkpoint C — clean up (function 7)

delete_session() is one line. Sessions are stateful and appear in the picker above the chat; deleting a session tears it down.

4 · The incident you're solving

The data/ folder ships a realistic outage. At 14:31:18 UTC, commit a3f9c21 deploys to the checkout service, replacing a batched query with a per-row loop. Within minutes, p99 latency climbs from 65 ms to 3,600 ms, the DB connection pool saturates, and 20% of checkouts start failing.
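If you haven't met this failure mode before, the sketch below contrasts the two query shapes on an in-memory SQLite table. It is not the contents of diff.txt: the table, columns, and function names are invented for illustration.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory price table (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (sku TEXT PRIMARY KEY, cents INTEGER)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [(f"sku-{i}", 100 + i) for i in range(50)],
    )
    return conn


def total_per_row(conn, skus):
    """Per-row loop: one round trip to the database per cart item."""
    total, queries = 0, 0
    for sku in skus:
        row = conn.execute(
            "SELECT cents FROM prices WHERE sku = ?", (sku,)
        ).fetchone()
        queries += 1
        total += row[0]
    return total, queries


def total_batched(conn, skus):
    """Batched query: one round trip regardless of cart size (unique SKUs assumed)."""
    placeholders = ",".join("?" for _ in skus)
    rows = conn.execute(
        f"SELECT cents FROM prices WHERE sku IN ({placeholders})", list(skus)
    ).fetchall()
    return sum(cents for (cents,) in rows), 1
```

Both functions return the same total. What differs is the query count, which grows with cart size in the per-row version. Under load, those extra round trips are what hold connections longer and exhaust a pool.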

The evidence is deliberately spread across four sources the agent has to correlate:

app.log — 70,000 lines of JSON logs, grepped in the agent's sandbox.
metrics.json — what get_metrics returns (the latency climb).
deploys.json — what get_recent_deploys returns (the suspect deploy).
diff.txt — what get_diff returns (the offending change).
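For the first source, the agent writes its own script inside the sandbox. As a rough idea of what such a script might look like, here is a sketch that counts error records per minute in a JSON-lines log. The field names "level" and "ts" and the "HH:MM:SS" timestamp format are assumptions; the actual schema of app.log may differ.

```python
import io
import json
from collections import Counter


def errors_per_minute(lines):
    """Count ERROR-level records per HH:MM bucket, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        if record.get("level") == "ERROR":  # hypothetical field name
            counts[record.get("ts", "")[:5]] += 1  # "14:31:18" -> "14:31"
    return counts


# Invented sample lines standing in for app.log.
sample = io.StringIO(
    '{"ts": "14:30:02", "level": "INFO", "msg": "ok"}\n'
    '{"ts": "14:33:10", "level": "ERROR", "msg": "pool exhausted"}\n'
    "not json at all\n"
    '{"ts": "14:33:41", "level": "ERROR", "msg": "pool exhausted"}\n'
)
print(errors_per_minute(sample))
```

Because the function iterates over lines, it can take an open file handle directly and never needs to load all 70,000 lines into memory.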

5 · Assessment — what "done" looks like

You're finished when you can type into the panel:

"What caused the latency spike?"

…and watch the agent grep the 70k-line log in its sandbox, call your local tools for metrics and deploys, correlate the timestamps, fetch the offending diff, and name commit a3f9c21 as the root cause.
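The correlation step the agent performs can be sketched in a few lines: find when latency first crosses a threshold, then pick the most recent deploy before that moment. The record shapes, the 500 ms threshold, and every sample value other than 14:31:18, a3f9c21, 65 ms, and 3,600 ms are invented for illustration and are not taken from the fixtures.

```python
from datetime import datetime


def to_time(stamp):
    """Parse an "HH:MM:SS" string into a comparable time object."""
    return datetime.strptime(stamp, "%H:%M:%S").time()


def first_breach(metrics, threshold_ms):
    """Return the time of the first sample above the threshold, or None."""
    for point in sorted(metrics, key=lambda p: to_time(p["ts"])):
        if point["p99_ms"] > threshold_ms:
            return to_time(point["ts"])
    return None


def suspect_deploy(deploys, breach_time):
    """Return the latest deploy at or before the breach, or None."""
    earlier = [d for d in deploys if to_time(d["ts"]) <= breach_time]
    return max(earlier, key=lambda d: to_time(d["ts"]), default=None)


metrics = [
    {"ts": "14:25:00", "p99_ms": 65},
    {"ts": "14:33:00", "p99_ms": 900},   # invented intermediate sample
    {"ts": "14:36:00", "p99_ms": 3600},
]
deploys = [
    {"ts": "13:02:44", "service": "search", "commit": "example1"},  # invented
    {"ts": "14:31:18", "service": "checkout", "commit": "a3f9c21"},
]
breach = first_breach(metrics, threshold_ms=500)
print(suspect_deploy(deploys, breach)["commit"])
```

Timing alone only nominates a suspect. The diff is what confirms it, which is why the agent still has to fetch it before naming a root cause.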

A headless check is provided: run python e2e.py to exercise the full path without the UI. In a graded setting this is the auto-marker — a passing e2e.py plus a correct root-cause answer is the completion bar.

6 · What this teaches (and where it goes)

Agent → Environment → Session → Events — the four resources, in quickstart order.
Sandboxed code execution — the agent writes and runs Python in a container you never provisioned.
Custom tools — the cloud agent calls functions on your machine over the event stream.
Stateful sessions — sessions.list() populates the picker; selecting one reloads the conversation from events.list(). No database, no local state.

Repo layout (for reference)

agent.py            ← the only file you edit
agent_complete.py   ← reference implementation
provided.py         ← system prompt, tool schemas, chat UI, session picker
e2e.py              ← headless test of the full path
app.py              ← incident overview
pages/              ← Metrics, Logs, Deploys
data/               ← log + metrics + deploys + diff fixtures
ui.py, assets/      ← styling

Stretch goals

Replace one mock tool reader with a real API client (e.g. a live Git provider for get_diff).
Add a fourth tool — say get_traces — end to end: schema in provided.py, handler in handle_tool().
Add a second incident fixture and confirm the agent generalises instead of pattern-matching the first.
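For the get_traces stretch goal, the two pieces might look roughly like the sketch below. Everything here is an assumption: the schema shape (name, description, input_schema) follows a common JSON Schema style for tool definitions and may not match what provided.py uses, and data/traces.json, its "service" field, and the HANDLERS registry are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical schema; mirror whatever shape provided.py already uses.
GET_TRACES_SCHEMA = {
    "name": "get_traces",
    "description": "Return recent trace summaries for one service.",
    "input_schema": {
        "type": "object",
        "properties": {"service": {"type": "string"}},
        "required": ["service"],
    },
}


def get_traces(tool_input, data_dir="data"):
    """Read a hypothetical traces.json fixture and filter it by service."""
    path = Path(data_dir) / "traces.json"
    traces = json.loads(path.read_text(encoding="utf-8"))
    return [t for t in traces if t.get("service") == tool_input["service"]]


# Registry of local handlers; the existing three tools would be added here too.
HANDLERS = {"get_traces": get_traces}


def handle_tool(name, tool_input, data_dir="data"):
    """Dispatch to a registered handler and serialise the result as JSON."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(tool_input, data_dir))
```

With a registry like this, adding a tool means adding one schema entry and one handler function, and the dispatch code stays unchanged.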