MODULE 04 · intermediate · ~60 min

Eval-Driven Agent Development

Iterate a PPTX-generating agent through six variants, from naive to QA-loop, scoring every change against a 10-task suite so each prompt change is measured, not vibed.

This module is being built

Module 1 is the fully developed reference. This page captures what is needed to build out Module 04 next: its scenario, learning objectives, and source material.

See a finished module →

Planned learning objectives

  • Build a two-layer grader: programmatic .pptx XML metrics plus an LLM-as-judge on rendered slides.
  • Run a 10-task suite and read per-variant scores.
  • Iterate naive → visual → typography → palette → density → QA-loop.
  • Make prompt decisions from evidence instead of intuition.
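
To make the first two objectives concrete, here is a minimal sketch of what the programmatic layer and the per-variant roll-up might look like. It is not taken from the upstream workshop. The function names (`pptx_metrics`, `combined_score`, `mean_by_variant`), the thresholds, and the 50/50 weighting are all hypothetical, and the LLM-as-judge layer is stubbed as a plain 0 to 1 number supplied by the caller. It relies only on the fact that a .pptx file is a zip archive holding one XML part per slide.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from statistics import mean

# DrawingML namespace used for text runs inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"^ppt/slides/slide\d+\.xml$")


def pptx_metrics(pptx_file):
    """Return simple structural metrics for a .pptx (path or file-like object)."""
    words_per_slide = []
    font_sizes = set()
    with zipfile.ZipFile(pptx_file) as zf:
        for name in zf.namelist():
            if not SLIDE_RE.match(name):
                continue  # skip layouts, masters, media, relationships
            root = ET.fromstring(zf.read(name))
            text = " ".join(t.text or "" for t in root.iter(A_NS + "t"))
            words_per_slide.append(len(text.split()))
            for rpr in root.iter(A_NS + "rPr"):
                if "sz" in rpr.attrib:
                    # sz is stored in hundredths of a point
                    font_sizes.add(int(rpr.attrib["sz"]) / 100)
    return {
        "slides": len(words_per_slide),
        "max_words_per_slide": max(words_per_slide, default=0),
        "distinct_font_sizes": len(font_sizes),
    }


def combined_score(metrics, judge_score, max_words=60, judge_weight=0.5):
    """Hypothetical blend of pass/fail XML checks with a 0-1 judge score."""
    checks = [
        metrics["slides"] > 0,
        metrics["max_words_per_slide"] <= max_words,  # assumed density limit
        metrics["distinct_font_sizes"] <= 4,          # assumed typography limit
    ]
    programmatic = sum(checks) / len(checks)
    return (1 - judge_weight) * programmatic + judge_weight * judge_score


def mean_by_variant(results):
    """results: iterable of (variant, task_id, score) -> {variant: mean score}."""
    by_variant = {}
    for variant, _task, score in results:
        by_variant.setdefault(variant, []).append(score)
    return {variant: mean(scores) for variant, scores in by_variant.items()}
```

In use, each of the six variants would be run on all 10 tasks, every output deck scored with `combined_score`, and the resulting `(variant, task_id, score)` rows passed to `mean_by_variant` to get one number per variant. Keeping the programmatic checks separate from the judge score makes it easier to see which layer moved when a prompt change shifts the total.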

Source material

This module adapts cwc-workshops/eval-driven-agent-development (Apache-2.0). The build will follow the standard module template, the same shape as Module 1: scenario, objectives, setup, walkthrough with checkpoints, assessment, and stretch goals.

View the upstream workshop on GitHub →