MODULE 05 · intermediate · ~45 min

Picking the Right Model

A sweep is only as trustworthy as the eval underneath it. Audit an eval suite for reliability gotchas, then sweep it across models and inference parameters to find the best quality-per-dollar configuration.

This module is being built

Module 1 is the fully developed reference. This page captures everything needed to expand Module 05 next: its scenario, learning objectives, and source material.

See a finished module →

Planned learning objectives

  • Audit an eval for task-design, harness, metrics-hygiene, and grader-bias gotchas using a Claude Code skill.
  • Wrap any eval in a model × thinking × effort grid.
  • Instrument per-cell pass rate, cost, and latency.
  • Produce comparison plots and a one-sentence model recommendation.
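
As a rough illustration of the grid and per-cell instrumentation objectives above, here is a minimal sketch in Python. It is not taken from the upstream workshop. The names `sweep`, `best_per_dollar`, `CellResult`, and `fake_run_task`, the model labels, and every price are hypothetical placeholders. `fake_run_task` stands in for a call into a real eval harness, which would report its own pass/fail result and cost.

```python
import itertools
import time
from dataclasses import dataclass


@dataclass
class CellResult:
    model: str
    thinking: bool
    effort: str
    pass_rate: float   # fraction of tasks passed in this cell
    cost_usd: float    # total cost of the cell
    latency_s: float   # mean wall-clock seconds per task


def sweep(tasks, models, thinking_opts, efforts, run_task):
    """Run every task once in each model x thinking x effort cell."""
    if not tasks:
        raise ValueError("tasks must not be empty")
    results = []
    for model, thinking, effort in itertools.product(models, thinking_opts, efforts):
        passed, cost = 0, 0.0
        start = time.perf_counter()
        for task in tasks:
            ok, task_cost = run_task(task, model, thinking, effort)
            passed += int(ok)
            cost += task_cost
        elapsed = time.perf_counter() - start
        results.append(
            CellResult(model, thinking, effort,
                       passed / len(tasks), cost, elapsed / len(tasks))
        )
    return results


def best_per_dollar(results, min_pass_rate=0.0):
    """Pick the cell with the highest pass rate per dollar above a quality floor."""
    eligible = [r for r in results
                if r.pass_rate >= min_pass_rate and r.cost_usd > 0]
    return max(eligible, key=lambda r: r.pass_rate / r.cost_usd)


def fake_run_task(task, model, thinking, effort):
    # Hypothetical stand-in for a real eval call; prices and outcomes are invented.
    price = {"model-small": 0.001, "model-large": 0.01}[model]
    cost = price * {"low": 1, "high": 2}[effort] * (2 if thinking else 1)
    passed = model == "model-large" or (thinking and effort == "high")
    return passed, cost


results = sweep(["task-1", "task-2"], ["model-small", "model-large"],
                [False, True], ["low", "high"], fake_run_task)
best = best_per_dollar(results, min_pass_rate=1.0)
print(best.model, best.thinking, best.effort)
```

The quality floor (`min_pass_rate`) matters: without it, a cheap cell with a poor pass rate can win on ratio alone. A real sweep would also repeat each cell several times, since a single run per task says little about variance.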

Source material

This module adapts cwc-workshops/rightmodel (Apache-2.0). The build will follow the standard module template (scenario, objectives, setup, walkthrough with checkpoints, assessment, and stretch goals), which is the same shape as Module 1.

View the upstream workshop on GitHub →