MAC · Model Acceptance Criteria · v0 · coming soon
For the data leaders building agentic discipline into their teams' workflows. MAC is how we prove our data models work when the agents reading those models won't pause to ask. A free 7-email course on the testing matrix that turns "we have some not_nulls" into a deliberate coverage system.
The pattern
Every era of software gets the testing discipline its problems demand. Web apps got TDD. Foundation models got evals. Analytical data has always needed a framework that handles aggregates, lineage, and reconciliation, and never had one. MAC is that framework.
The problem
Pull up any schema.yml in your repo and count: how many of your tests check a single column in isolation against a fixed rule? In most codebases, north of 90%.
That means every test you have lives in one cell of a much larger space. One corner of a matrix you didn’t know you were supposed to fill.
On a recent gold-layer pipeline model, the inherited tests covered 14 cases. Most were not_null, unique, and accepted_values. By the time the matrix was walked, the model had 95 tests. The 81 new ones weren’t edge cases. They were the cells nobody had thought to fill.
One of those new tests caught thousands of rows in production reporting where opportunity type and sub-type didn’t align for closed-won deals. The bug had been live for months. Nobody caught it because nobody had a framework that told them where to look.
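The check that caught it is small once you know to write it. A minimal sketch of that kind of row-scope test as a dbt singular test (the query returns the violating rows; the model, column, and seed names here are illustrative, not the client's actual schema):

```sql
-- tests/row_closed_won_type_subtype_align.sql
-- Row scope: for closed-won deals, the sub-type must belong to its parent type.
-- 'valid_type_subtype_pairs' is a hypothetical seed listing the allowed combinations.
select opp.*
from {{ ref('fct_opportunities') }} as opp
left join {{ ref('valid_type_subtype_pairs') }} as valid
    on  opp.opportunity_type = valid.opportunity_type
    and opp.opportunity_subtype = valid.opportunity_subtype
where opp.stage = 'closed_won'
  and valid.opportunity_type is null
```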
The framework
Every meaningful test on an analytical model lives at the intersection of two questions. Where are you evaluating (the scope)? And against what (the basis)?
Column. One field, in isolation. amount > 0. stage is in an accepted list.
Row. One record, across columns. start_date < end_date. New-business opportunities must have positive ACV.
Aggregate. The whole dataset. Accounting identities. Row-count reconciliation. Total pipeline ties to the executive dashboard.
Absolute. A fixed rule. No external reference.
Relative: Source. Did the transformation preserve what it was supposed to?
Relative: Production. Does the new model tie to the old truth?
Relative: Recon. Does the number tie to an external system of record? Stripe, the bank, a vendor export.
Temporal. Did last quarter’s numbers retroactively change?
Human. Show the number to someone who knows the business and watch their face.
A model isn’t "tested" until you’ve walked every cell and decided explicitly whether it applies, what the check is, or why you’re skipping it. That’s the discipline.
The agent multiplier
If 95 tests fire alerts at 3am and a human triages every one, the on-call rotation burns out in a week. That’s why most data teams can’t run this many checks. The signal-to-noise ratio buries the team before the tests prove their worth.
Severity tiers change the math. Every test runs, but only Stops wake a human. Pauses go to an agent that knows the lineage, pulls row counts from the last seven runs, cross-references the affected models, and writes a triage note. By the time a human looks, the question isn’t "what happened?" It’s "do I accept this explanation?"
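Stop/Pause/Go is a MAC convention, not something dbt ships. One plausible encoding, sketched below, reuses dbt's built-in severity config plus a tag that the triage agent (or whatever script reads run_results.json) filters on. The file names, tag names, and routing are assumptions, not a prescribed setup.

```sql
-- tests/agg_pipeline_ties_to_dashboard.sql
-- Stop tier: a failure here should wake a human, so it keeps severity 'error'
-- (dbt's default) and carries a tag the alerting layer treats as page-worthy.
{{ config(severity='error', tags=['mac_stop']) }}

with model_total     as (select sum(amount) as total from {{ ref('fct_pipeline') }}),
     dashboard_total as (select sum(amount) as total from {{ ref('exec_dashboard_snapshot') }})
select model_total.total as model, dashboard_total.total as dashboard
from model_total cross join dashboard_total
where model_total.total <> dashboard_total.total

-- tests/row_new_business_requires_positive_acv.sql
-- Pause tier: severity 'warn' keeps the run moving; the 'mac_pause' tag is what the
-- triage agent filters on before pulling lineage and recent row counts and writing its note.
{{ config(severity='warn', tags=['mac_pause']) }}

select * from {{ ref('fct_opportunities') }}
where opportunity_type = 'new_business'
  and acv <= 0
```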
And the flip side: agents require this rigor. A model making decisions on your data at 3am doesn’t Slack you when a number looks weird. It acts on what you served. The old testing posture - catch the big stuff, trust the rest - stops being safe.
MAC makes the load bearable for humans and the surface legible for agents. It’s the testing discipline analytical data has always needed and now - finally - has the tooling to deploy.
What you walk away with
Triage discipline
Stop / Pause / Go severity. Agents handle Pause; humans only see Stop. The on-call rotation survives.
A coverage map
The 3x6 Scope x Basis matrix. Walk every cell, decide every test deliberately. Nothing slips because nobody knew to look.
A diagnostic you run Monday
Pick a model. Walk the matrix. Ship the gap-fill by the end of the day. No new tooling required.
Agent-ready surface area
Tests an agent can read, triage, and explain. The discipline that makes 95-test pipelines bearable for humans.
The 7-email course
Get the course
7 emails over 14 days. Practitioner takes from real client engagements. No "AI will replace you" panic pieces. Unsubscribe with one click.
You’re in. Day 1 of the course is on its way.
The next email lands in two days. Watch your inbox - and your spam folder, just in case.
Who’s sending this
Twelve years building analytical data systems for managers who have to defend the numbers in front of their executives. Most recently ran a multi-team data-quality discipline across enterprise SaaS engagements: 11 clients, 24 statements of work, 6,866 hours, owning gold-layer models that finance leaders read off in board decks. The MAC framework came out of that work: every quarter, the same shape of bug surfacing the same way, and no language for the testing gap that let it through.
Ray Data Co is the consulting and writing practice. The weekly newsletter is Sanity Check - field notes from the edge of agentic adoption in data engineering, written for the manager-level data leads building agentic discipline into their teams. MAC is the first piece of permissionless leverage coming out of that work: a course, a methodology, and the start of a longer book project.