2025-12-08 · Sora Yamane
Golden Files and Device-Class Buckets
How we split baselines without exploding storage costs in CI.
Golden baselines multiply quickly once you account for font scaling and locale. We bucket devices into just three classes: “compact ja”, “compact en”, and “tablet en”. Adding a bucket requires deleting an existing one, which keeps storage flat.
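As a rough illustration of the bucketing rule, here is a minimal Python sketch. The 600 dp tablet threshold, the `Device` fields, and the function names are assumptions made for the example, not details of our actual harness; only the three bucket names come from the setup described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumption: logical widths at or above this value count as "tablet".
TABLET_MIN_WIDTH_DP = 600

# The three buckets named in the post, keyed by (size class, language).
BUCKETS: Dict[Tuple[str, str], str] = {
    ("compact", "ja"): "compact-ja",
    ("compact", "en"): "compact-en",
    ("tablet", "en"): "tablet-en",
}


@dataclass(frozen=True)
class Device:
    width_dp: int
    locale: str  # e.g. "ja", "en", "en-US"


def bucket_for(device: Device) -> Optional[str]:
    """Return the golden bucket for a device, or None if no baseline covers it."""
    size_class = "tablet" if device.width_dp >= TABLET_MIN_WIDTH_DP else "compact"
    language = device.locale.split("-")[0].lower()
    return BUCKETS.get((size_class, language))


def swap_bucket(
    buckets: Dict[Tuple[str, str], str],
    remove: Tuple[str, str],
    add: Tuple[str, str],
    name: str,
) -> Dict[Tuple[str, str], str]:
    """Add a bucket only by removing another, so the bucket count stays flat."""
    if remove not in buckets:
        raise KeyError(f"cannot remove unknown bucket {remove}")
    if add in buckets:
        raise ValueError(f"bucket {add} already exists")
    updated = {key: value for key, value in buckets.items() if key != remove}
    updated[add] = name
    return updated
```

A device that falls outside every bucket (a Japanese-locale tablet, say) gets `None` here, which a harness could treat as "skip golden comparison" rather than silently creating a fourth baseline set.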
CI uploads each failure as a triptych: expected, actual, and a diff mask tinted so color-blind reviewers can read it. Triptych file names match the branch slug so artifact browsers stay searchable.
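One way the naming could work is sketched below in Python. The slug rules, the double-underscore separator, and the `.png` extension are assumptions for illustration; the only requirement stated above is that the names line up with the branch slug.

```python
import re
from typing import Dict

# The three panels of a failure triptych.
PANELS = ("expected", "actual", "diff")


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug or "unnamed"


def triptych_names(branch: str, test_id: str) -> Dict[str, str]:
    """Build one artifact file name per panel, prefixed with the branch slug."""
    prefix = f"{slugify(branch)}__{slugify(test_id)}"
    return {panel: f"{prefix}__{panel}.png" for panel in PANELS}
```

With these assumptions, `triptych_names("feature/Golden-Buckets_v2", "login_screen")["diff"]` yields `feature-golden-buckets-v2__login-screen__diff.png`, so searching the artifact browser for the branch slug surfaces all three panels together.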
Flaky renders often trace to shader warm-up. Our test harnesses render and discard a warm-up frame before capturing goldens; the step is documented in the track workbook so students do not mistake it for cheating.
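The warm-up step can be sketched independently of any rendering toolkit. In the Python below, `render_frame` is a hypothetical callable standing in for whatever the harness uses to draw one frame, and `FakeRenderer` is a stand-in that imitates a cold first frame; neither is a real framework API.

```python
from typing import Callable


def capture_golden(render_frame: Callable[[], bytes], warmup_frames: int = 1) -> bytes:
    """Render and discard warm-up frames, then return the frame to compare."""
    for _ in range(warmup_frames):
        render_frame()  # discarded: gives shaders a chance to compile first
    return render_frame()


class FakeRenderer:
    """Illustrative stand-in whose first frame differs from later ones."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> bytes:
        self.calls += 1
        return b"cold" if self.calls == 1 else b"warm"
```

Keeping the discard explicit, with a named `warmup_frames` parameter, makes it visible that the harness is not retrying until a test passes: it always throws away the same fixed number of frames, pass or fail.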
When a baseline changes legitimately, the PR must include a one-paragraph rationale that references the widget diff, not only the image swap, so future auditors understand the intent.