By the end of this module you can…
- Explain the four Managed Agents primitives — Agent → Environment → Session → Events — and the order they're created in.
- Stand up an agent that writes and runs Python in a sandboxed cloud container you never provision.
- Wire custom tools so the cloud agent calls functions running on your own machine.
- Stream a reply event-by-event and handle tool calls mid-stream.
- List and reload stateful sessions with no database of your own.
1 · Prerequisites & setup
You need Python 3.10+ and an Anthropic API key. Clone the workshop bundle and enter this module's folder:
git clone https://github.com/anthropics/cwc-workshops
cd cwc-workshops/ship-your-first-managed-agent
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env # then put your ANTHROPIC_API_KEY in .env
streamlit run app.py
The dashboard opens at localhost:8501. Click around — Metrics, Logs, and Deploys all work. The SRE Agent panel on the right reads:
setup_agent() in agent.py"That message is your starting line.
2 · What you build
Open agent.py. It contains seven functions, each currently raise NotImplementedError. Fill them in one at a time and the panel comes online step by step. The whole job is about 34 lines of code — everything else (the system prompt, the tool schemas, the chat UI, the session picker) is already provided in provided.py.
| # | Function | API call | Lines |
|---|---|---|---|
| 1 | setup_agent() | client.beta.agents.create | 3 |
| 2 | setup_environment() | client.beta.environments.create | 4 |
| 3 | upload_log() | client.beta.files.upload | 2 |
| 4 | start_session() | client.beta.sessions.create | 5 |
| 5 | stream_reply() | sessions.events.stream + .send | 12 |
| 6 | handle_tool() | runs locally — reads data/*.json | 7 |
| 7 | delete_session() | client.beta.sessions.delete | 1 |
agent_complete.py holds the finished versions of all seven. Treat it as the answer key — reach for it only after you've tried, and in a graded setting your instructor releases it after submission.
3 · Walkthrough
Checkpoint A — the agent wakes up (functions 1–4)
The first four functions stand up the scaffolding in the order the Managed Agents quickstart introduces them:
setup_agent()— creates the Agent: the persistent definition (its system prompt and tools).setup_environment()— creates the Environment: the sandboxed container the agent runs code in.upload_log()— pushesapp.loginto that environment so the agent can grep it in the cloud.start_session()— opens a Session: one conversation against the agent + environment.
After these, the panel stops saying "offline" — but it can't answer yet.
Checkpoint B — it speaks and acts (functions 5–6)
stream_reply() is the heart of the module (~12 lines). It sends the user's message and iterates the Events stream — text deltas render to the chat, and when the agent asks to call one of your tools, you pause, run it, and feed the result back.
handle_tool() is where the magic of "cloud agent, local hands" lives. The agent runs in Anthropic's cloud, but these handlers run on your laptop, reading local data/*.json:
get_metrics → data/metrics.json
get_recent_deploys → data/deploys.json
get_diff → data/diff.txt
Checkpoint C — clean up (function 7)
delete_session() is one line. Sessions are stateful and listed by the picker above the chat; deleting tears one down.
4 · The incident you're solving
The data/ folder ships a real-feeling outage. At 14:31:18 UTC, commit a3f9c21 deploys to the checkout service, replacing a batched query with a per-row loop. Within minutes p99 latency climbs from 65 ms to 3,600 ms, the DB connection pool saturates, and 20% of checkouts start failing.
The evidence is deliberately spread across four sources the agent has to correlate:
app.log— 70,000 lines of JSON logs, grepped in the agent's sandbox.metrics.json— whatget_metricsreturns (the latency climb).deploys.json— whatget_recent_deploysreturns (the suspect deploy).diff.txt— whatget_diffreturns (the offending change).
5 · Assessment — what "done" looks like
You're finished when you can type into the panel:
…and watch the agent grep the 70k-line log in its sandbox, call your local tools for metrics and deploys, correlate the timestamps, fetch the offending diff, and name commit a3f9c21 as the root cause.
A headless check is provided: run python e2e.py to exercise the full path without the UI. In a graded setting this is the auto-marker — a passing e2e.py plus a correct root-cause answer is the completion bar.
6 · What this teaches (and where it goes)
- Agent → Environment → Session → Events — the four resources, in quickstart order.
- Sandboxed code execution — the agent writes and runs Python in a container you never provisioned.
- Custom tools — the cloud agent calls functions on your machine over the event stream.
- Stateful sessions —
sessions.list()populates the picker; selecting one reloads the conversation fromevents.list(). No database, no local state.
agent.py ← the only file you edit
agent_complete.py ← reference implementation
provided.py ← system prompt, tool schemas, chat UI, session picker
e2e.py ← headless test of the full path
app.py ← incident overview
pages/ ← Metrics, Logs, Deploys
data/ ← log + metrics + deploys + diff fixtures
ui.py, assets/ ← styling
Stretch goals
- Replace one mock tool reader with a real API client (e.g. a live Git provider for
get_diff). - Add a fourth tool — say
get_traces— end to end: schema inprovided.py, handler inhandle_tool(). - Add a second incident fixture and confirm the agent generalises instead of pattern-matching the first.