The $1M Agent: Three Lessons From Building Linter Agent
May 2026
Linter Agent optimises data pipelines. So far it has transformed 100+ slow, expensive pipelines into faster, cheaper ones across 5+ enterprises, saving over $1M in under 6 months. Whilst building this agent, three key areas have stood out:
- What should the agent do?
- How to build trust in our agents?
- How do agents compound and self-improve over time?
Let's break these down:
What should the agent do?
Building agents has become much easier.
I can now prompt Claude Code to build a custom agent using an agent SDK, describe its responsibility, tools and data access in natural language, then give it examples and tests so it can run, fail and improve.
The bottleneck has become purpose — where can an agent fit into my company and what should it do?
Here are some perspectives:
- Backlogged skilled work
  - Linter Agent went after code optimisations. This is an abundant space: years (or decades, for pre-cloud stacks) of legacy code have accumulated, and the bottleneck is squarely on skilled human labour.
  - Where are you accepting delays, bringing in consultants or paying overtime to remote workers?
- Increase the search space: work where humans currently sample, audit, or ignore low-value cases because full review is too expensive
  - In large enterprises, approvals below a certain dollar value are mostly automated because the cost of human verification is too high relative to the volume. This changes when you have a set of AI eyes!
  - For example, in procurement, collections, support, etc.
  - Alternatively, think of internal teams that rotate around to do audits (e.g. cybersecurity SWAT teams).
How to build trust in our agents?
Linter Agent has generated over 50 PRs, but what matters is getting them merged. This can be the tough part, from chasing down SMEs to iterating on tacit knowledge.
Here are some questions we should answer early on:
- Who signs off your work? Who are the "Oracles" or SMEs you turn to when unsure?
- What do they need to see to be able to quickly come to a conclusion on your agent's outcome? How can you make it easier/faster for the approver to get to a yes?
I've learnt that if reviewers need extra data to approve/reject, then that data is valuable missing context for your agent.
Do not only optimise the agent's output. Optimise the review path.
For Linter Agent's approvers, code and data outputs are relatively easy to verify: compilation checks, tracking of non-functional attributes (e.g. latency), and SQL analysis over the final datasets (a branch-level data comparison to a user-defined precision).
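As a rough illustration of a branch-level data comparison to a user-defined precision, here is a minimal Python sketch. It is not Linter Agent's implementation: the function name `compare_branches`, the row-dict layout, the `id` key and the `abs_tol` parameter are all assumptions made for the example, and in practice the comparison would more likely run as SQL over the two branch datasets.

```python
import math


def compare_branches(baseline, candidate, key, abs_tol=1e-6):
    """Compare two lists of row dicts and return (key, column, detail) mismatches.

    Hypothetical helper: `baseline` is the output of the original pipeline,
    `candidate` is the output of the optimised branch.
    """
    base_by_key = {row[key]: row for row in baseline}
    cand_by_key = {row[key]: row for row in candidate}
    mismatches = []

    # Rows that exist on only one branch are always reported.
    for k in sorted(base_by_key.keys() ^ cand_by_key.keys()):
        side = "baseline" if k in base_by_key else "candidate"
        mismatches.append((k, None, f"only in {side}"))

    # Rows on both branches are compared column by column.
    for k in sorted(base_by_key.keys() & cand_by_key.keys()):
        b, c = base_by_key[k], cand_by_key[k]
        for col in sorted(b.keys() | c.keys()):
            bv, cv = b.get(col), c.get(col)
            both_numeric = all(
                isinstance(v, (int, float)) and not isinstance(v, bool)
                for v in (bv, cv)
            )
            if both_numeric:
                # Numeric values only need to agree within the chosen precision.
                same = math.isclose(bv, cv, rel_tol=0.0, abs_tol=abs_tol)
            else:
                same = bv == cv
            if not same:
                mismatches.append((k, col, f"{bv!r} != {cv!r}"))
    return mismatches
```

An empty result is something an approver can act on quickly; a non-empty one points at the exact rows and columns to look at.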
A note on change management: Linter Agent mostly runs in the background. Some agents are more front-and-centre and change how people work day to day. In those cases, adoption has required a more personal human touch: helping teams understand what the agent is doing, where it fits, and how their role changes around it.
I am particularly interested in how others are building adoption in organisations with widespread legacy software. How are you using AI to productise what was previously a services-heavy change management motion?
How do agents compound and self-improve over time?
At work, we often value experience, as we've seen people get better at their responsibilities over time.
They learn the edge cases, the shortcuts, the exceptions, the preferences of reviewers, and the hidden constraints that never quite make it into documentation.
For agents to compound the same way, they have to absorb tacit knowledge from the SMEs around them. That sounds straightforward and isn't, for two reasons.
Tacit knowledge is, by definition, not in the trace.
Logging the agent's reasoning tells you what the agent considered. It doesn't tell you what an experienced reviewer would have noticed and the agent missed. The data you need is in the SME's head, and the only way to get it is to ask — at the right moment, with the right structure, in a UX they'll actually engage with. Free-text PR comments put the onus on the reviewer, whereas structured questions tied to the specific data/decision help the reviewer.
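One way to picture a structured question tied to a specific decision is the sketch below. It is an assumption about what such a record could look like, not a description of Linter Agent's UX: the class names `ReviewQuestion` and `ReviewAnswer`, their fields and the three verdict options are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReviewQuestion:
    pr_id: str       # which PR the question belongs to (hypothetical identifier)
    decision: str    # the specific change the agent made
    evidence: dict   # the data the reviewer needs in order to judge it
    options: tuple = ("approve", "reject", "needs_context")


@dataclass
class ReviewAnswer:
    question: ReviewQuestion
    reviewer: str
    verdict: str
    context: str = ""  # what the verdict depended on

    def __post_init__(self):
        if self.verdict not in self.question.options:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        # A non-approval is only useful to the agent if it carries the reason.
        if self.verdict != "approve" and not self.context.strip():
            raise ValueError("a non-approval must record the context it depended on")
```

Requiring context on anything other than an approval is the design choice that matters here: the reviewer answers one narrow question, and the agent gets back the missing constraint rather than a bare "no".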
Experts can disagree, and the disagreement is the signal.
Two SREs will give you different opinions on the same Linter Agent PR. Both will sound right. If your feedback system treats this as noise to average over, you'll get an agent that's mediocre by consensus. The leverage is in surfacing the disagreement and asking the disambiguating question: what about this case made you call it differently from a similar one last month? The answer is a constraint — a context variable the agent didn't know about — and now it does.
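Surfacing disagreement instead of averaging over it can start very simply. The sketch below assumes reviews arrive as `(case_id, reviewer, verdict)` tuples; that shape, the function name `find_disagreements` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def find_disagreements(reviews):
    """Return {case_id: {reviewer: verdict}} for cases where verdicts differ."""
    verdicts_by_case = defaultdict(dict)
    for case_id, reviewer, verdict in reviews:
        # Keep each reviewer's latest verdict for the case.
        verdicts_by_case[case_id][reviewer] = verdict
    return {
        case_id: by_reviewer
        for case_id, by_reviewer in verdicts_by_case.items()
        if len(set(by_reviewer.values())) > 1
    }


reviews = [
    ("case-a", "sre-1", "approve"),
    ("case-a", "sre-2", "reject"),
    ("case-b", "sre-1", "approve"),
    ("case-b", "sre-2", "approve"),
]
conflicts = find_disagreements(reviews)
```

Each entry in `conflicts` is a prompt for the disambiguating question, not a vote to be tallied.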
What I want Linter Agent to compound on:
- the scope and complexity of actionable optimisations increasing
- median end-to-end time falling
- fewer interventions between PR opened and merged
These are the visible outputs. The mechanism underneath them is the same mechanism that grows a junior engineer into a senior one: better intuitions about what matters in a given context, learned from being corrected by someone who already knows.
The custom UX is where this happens. It asks the right question, shows the right comparable cases, and captures not just the verdict but the context the verdict depended on. That's the flywheel.
The agents that compound are the ones whose reviewers are giving them apprenticeship, not approval.