MAC · Model Acceptance Criteria · v0 · coming soon

Model Acceptance Criteria. One cell at a time.

For the data leaders building agentic discipline into their team's workflow. MAC is how we prove our data models work when the agents reading those models won't pause to ask. A free 7-email course on the testing matrix that turns "we have some not_nulls" into a deliberate coverage system.

Free. 7 emails over 14 days. Unsubscribe with one click.

Day 1 ships immediately. Day 14 wraps the course. Then you decide.


The pattern

TDD → code.
Evals → LLMs.
MAC → data.

Every era of software gets the testing discipline its problems demand. Web apps got TDD. Foundation models got evals. Analytical data has always needed a framework that handles aggregates, lineage, and reconciliation, and never had one. MAC is that framework.


The problem

Your tests are in one corner of a much larger room.

Pull up any schema.yml in your repo and count: how many of your tests check a single column in isolation against a fixed rule? In most codebases, north of 90%.

That means every test you have lives in one cell of a much larger space. One corner of a matrix you didn’t know you were supposed to fill.

The X is where most teams test. The ?s are where the bugs hide.

On a recent gold-layer pipeline model, the inherited tests covered 14 cases. Most were not_null, unique, and accepted_values. By the time we'd walked the matrix, the model had 95 tests. The 81 new ones weren't edge cases. They were the cells nobody had thought to fill.

One of those new tests caught thousands of rows in production reporting where opportunity type and sub-type didn’t align for closed-won deals. The bug had been live for months. Nobody caught it because nobody had a framework that told them where to look.
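A bug of that shape is exactly what a row-scope test catches: one record, checked across columns. A minimal sketch as a dbt singular test — the model, seed, and column names here are hypothetical stand-ins, not the client's actual schema:

```sql
-- Row scope: returns the violating rows; dbt fails the test
-- when any rows come back.
select
    opportunity_id,
    opportunity_type,
    opportunity_sub_type
from {{ ref('fct_opportunities') }}              -- hypothetical model
where stage = 'closed_won'
  and opportunity_sub_type not in (
      select sub_type
      from {{ ref('valid_type_subtype_pairs') }} -- hypothetical seed
      where type = opportunity_type
  )
```

No single-column not_null or accepted_values test can see this: each column is individually valid. Only the combination is wrong.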


The framework

Scope × Basis.
Three rows. Six columns.
Eighteen cells.

Every meaningful test on an analytical model lives at the intersection of two questions. Where are you evaluating? And against what?

One cell per intersection. Walk every cell. Decide explicitly.

Scope

Column. One field, in isolation. amount > 0. stage is in an accepted list.

Row. One record, across columns. start_date < end_date. New-business opportunities must have positive ACV.

Aggregate. The whole dataset. Accounting identities. Row-count reconciliation. Total pipeline ties to the executive dashboard.
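In dbt terms, each scope maps to a different shape of check. A sketch of one singular test per scope — in practice each query lives in its own test file, and the model and column names are hypothetical:

```sql
-- Column scope: one field, in isolation.
select * from {{ ref('fct_pipeline') }} where amount <= 0;

-- Row scope: one record, across columns.
select * from {{ ref('fct_pipeline') }} where start_date >= end_date;

-- Aggregate scope: the whole dataset. Row-count reconciliation
-- against staging; returns a row only when the counts diverge.
select 1
from (select count(*) as n from {{ ref('fct_pipeline') }}) f,
     (select count(*) as n from {{ ref('stg_pipeline') }}) s
where f.n <> s.n;
```

The column check is what schema.yml generics already give you. The row and aggregate checks are the rows of the matrix most repos leave empty.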

Basis

Absolute. A fixed rule. No external reference.

Relative: Source. Did the transformation preserve what it was supposed to?

Relative: Production. Does the new model tie to the old truth?

Relative: Recon. Does the model tie to an external system? Stripe, the bank, a vendor export.

Temporal. Did last quarter’s numbers retroactively change?

Human. Show the number to someone who knows the business and watch their face.
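Most of the relative bases reduce to the same SQL shape: compare the model to a reference and return the rows where they disagree. A Relative: Recon sketch against a hypothetical payment-processor export (source, model, and tolerance are all assumptions):

```sql
-- Relative: Recon. Monthly collected revenue must tie to the
-- processor's export within a small rounding tolerance.
with model_totals as (
    select date_trunc('month', paid_at) as month,
           sum(amount) as total
    from {{ ref('fct_payments') }}                   -- hypothetical model
    group by 1
),
recon_totals as (
    select month, total
    from {{ source('stripe_export', 'monthly_totals') }}  -- hypothetical source
)
select m.month, m.total as model_total, r.total as recon_total
from model_totals m
join recon_totals r using (month)
where abs(m.total - r.total) > 0.01
```

Swap the reference CTE and the same skeleton covers Relative: Source (staging layer), Relative: Production (the old model), and Temporal (a snapshot of last quarter's run).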

A model isn’t "tested" until you’ve walked every cell and decided explicitly whether it applies, what the check is, or why you’re skipping it. That’s the discipline.


The agent multiplier

95 tests is a lot.
Agents make it work.

If 95 tests fire alerts at 3am and a human triages every one, the on-call rotation burns out in a week. That's why most data teams can't run this many checks. The noise buries the signal before the tests prove their worth.

Severity tiers change the math. Every test runs, but only Stops wake a human. Pauses go to an agent that knows the lineage, pulls row counts from the last seven runs, cross-references the affected models, and writes a triage note. By the time a human looks, the question isn’t "what happened?" It’s "do I accept this explanation?"
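dbt has no Stop/Pause/Go tiers natively, but its built-in test severities plus a tagging convention can approximate them. A sketch of a Pause-tier singular test — the mac_pause tag is a naming convention assumed here, not a dbt feature:

```sql
-- Pause tier: the test still runs on every build, but
-- severity='warn' keeps it from failing the job, and the tag
-- lets a triage agent pick it up instead of paging a human.
{{ config(severity='warn', tags=['mac_pause']) }}

select order_id
from {{ ref('fct_orders') }}   -- hypothetical model
where order_total < 0
```

Stop-tier tests keep the default severity='error' and page on failure; Go-tier results land in a log nobody reads unless an agent cites them in a triage note.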

And the flip side: agents require this rigor. A model making decisions on your data at 3am doesn’t Slack you when a number looks weird. It acts on what you served. The old testing posture - catch the big stuff, trust the rest - stops being safe.

MAC makes the load bearable for humans and the surface legible for agents. It’s the testing discipline analytical data has always needed and now - finally - has the tooling to deploy.


What you walk away with

Four things, no fluff.

Triage discipline

Stop / Pause / Go severity. Agents handle Pause; humans only see Stop. The on-call rotation survives.

A coverage map

The 3×6 Scope × Basis matrix. Walk every cell, decide every test deliberately. Nothing slips because nobody knew to look.

A diagnostic you run Monday

Pick a model. Walk the matrix. Ship the gap-fill by the end of the day. No new tooling required.

Agent-ready surface area

Tests an agent can read, triage, and explain. The discipline that makes 95-test pipelines bearable for humans.


The 7-email course

One cell at a time, end to end.

  1. Day 1 · Your tests are in one cell. The matrix problem. Why not_null + unique + accepted_values covers ~10% of the surface area, and the 3am question that exposes the gap.
  2. Day 3 · Walking the Scope axis. Column vs. Row vs. Aggregate. What each one catches, and the quick diagnostic to figure out which scope your current tests are covering.
  3. Day 5 · Walking the Basis axis. Absolute, Relative: Source, Relative: Production, Relative: Recon, Temporal, Human. The six lenses, with one example each.
  4. Day 7 · System of Record. The prerequisite question nobody answers. Why half of all dbt projects can’t write trustworthy Relative: Source tests until they fix this.
  5. Day 9 · Severity tiers. Stop, Pause, Go. Layer-aware (Bronze / Silver / Gold). The mapping that lets agents handle Pauses without waking humans.
  6. Day 12 · The agent multiplier. Why MAC works now and didn’t five years ago. Jevons paradox for data testing: lower the per-test triage cost, and the economically viable test surface expands.
  7. Day 14 · What to do Monday. The one-page diagnostic. Pick a model, walk the matrix, ship the gap-fill. Plus what comes next.

Get the course

Free.
No vendor noise.

7 emails over 14 days. Practitioner takes from real client engagements. No "AI will replace you" panic pieces. Unsubscribe with one click.



Who’s sending this

Ben Wilson.
Founder, Ray Data Co.

Twelve years building analytical data systems for managers who have to defend the numbers in front of their executives. Most recently ran a multi-team data-quality discipline across enterprise SaaS engagements: 11 clients, 24 statements of work, 6,866 hours, owning gold-layer models that finance leaders read off in board decks. The MAC framework came out of that work: every quarter, the same shape of bug surfacing the same way, and no language for the testing gap that let it through.

Ray Data Co is the consulting and writing practice. The weekly newsletter is Sanity Check - field notes from the edge of agentic adoption in data engineering, written for the data leaders building agentic discipline into their teams. MAC is the first piece of permissionless leverage coming out of that work: a course, a methodology, and the start of a longer book project.