A Framework for Disciplined AI-Assisted Development
At Ocula, we build AI agents that help enterprise retailers make their product catalogues discoverable. We also use AI agents heavily in our own engineering workflow, and our team ships to production every few hours.
That velocity is real. But so is a problem we’ve been forced to confront: when code is cheap to produce, what becomes expensive?
This post lays out the framework we’ve built to answer that question. We treat AI agents as collaborative partners in a structured, disciplined development process, not as autocomplete on steroids.
The Problem: Cognitive Debt
Everyone in software understands technical debt: code that works but is poorly structured. You know it when you see it.
Cognitive debt is different. It’s the hidden cost of code that works, is well-structured, passes every test, but carries no record of why it was built that way, what trade-offs were made, or what edge cases were considered.
AI agents have made code cheap. Producing it is almost free now. But the context behind code, the reasoning, the trade-offs, the decisions, that’s as expensive as it ever was. When code is cheap but context is scarce, cognitive debt accumulates fast.
When you vibe code with AI agents, prompting loosely, accepting output uncritically, and skipping documentation, you produce working code remarkably fast. But you’ve also created a future where:
New team members face code archaeology instead of productive onboarding
Debugging becomes guesswork because the decision context is gone
Other agents working on adjacent features have zero context to draw from
The person who could explain the reasoning might not be available when production breaks
The consequences compound over time. And they compound faster the more quickly you ship. This isn’t a call to slow down. We’re not slowing down. It’s a call to recognise that code is now the cheap part. The expensive part is the context that makes code maintainable, debuggable, and buildable. That’s what we need to protect.
Speed vs Velocity
It’s tempting to equate shipping fast with being productive. Every conversation about AI-assisted development focuses on speed: ship faster, generate more code, merge more PRs.
But speed without context is just forward motion. Speed is a scalar: it tells you how fast you’re going. Velocity is a vector: it tells you how fast you’re going and in what direction.
If half your merge requests create code that only one person understands, if the agents working on feature B have no idea what architectural decisions were made on feature A, if the person who built a critical module leaves and nobody knows why it was designed that way, then you were fast but you weren’t going anywhere useful.
Every time you skip a spec, accept agent output without scrutiny, or merge code that only one person can explain, you’re borrowing velocity from the future. You’re fast today at the cost of being slow tomorrow.
The framework we’ve built isn’t about reducing speed. It’s about converting speed into velocity by making sure every merge request carries direction: clear intent, documented decisions, and enough context that anyone, human or agent, can build on it without archaeology.
The Framework: Spec-Driven Development
The core change is simple: every piece of agent-written code starts with a human-reviewed spec.
The Workflow
1. Collaborate with the agent to produce a spec
The engineer works with their AI agent to think through the problem before writing any production code. This is the creative, architectural phase where you define what you’re building and why. It’s fine to prototype and explore with the agent at this stage, but that exploratory code is not the deliverable.
2. Open a merge request with the spec
The spec goes through human review. The team evaluates architecture, flags risks, and aligns on approach before a line of production code is written.
3. Get approval
Once reviewers sign off on the spec, that approval carries through to implementation. This is the trust gate.
4. Build, self-review against the spec, and merge
With an approved spec in place, the engineer builds with their agent, reviews their own work against the approved spec, and merges. No additional review cycle needed. The spec approval is the gate.
What a Spec Must Include
Every spec should be a living document that gives any team member enough context to understand the work without a walkthrough. At minimum, each spec covers:
Architecture diagram: A visual representation of how the feature fits into the system. A Mermaid diagram in markdown works perfectly.
Key decisions and trade-offs: What options were considered? Why was this approach chosen? What are we knowingly deferring?
Files it will touch: An explicit list of files being created or modified, so reviewers can assess blast radius and potential conflicts.
Security considerations: Auth implications, data exposure risks, input validation requirements, and any relevant threat vectors.
Edge cases: Known boundary conditions, failure modes, and how they’re handled.
Shared patterns to be used: Which shared patterns or internal libraries the implementation will leverage. This is critical for keeping agents consistent across the team.
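To make the checklist above concrete, here is one way a spec could be laid out as a markdown template. The section names, file paths, and the Mermaid diagram are illustrative, not a prescribed format:

````markdown
# Spec: <feature name>

## Architecture

```mermaid
flowchart LR
    Client --> API[Catalogue API]
    API --> Enricher[Enrichment agent]
    Enricher --> DB[(Product DB)]
```

## Key decisions and trade-offs
- Option A vs Option B: chose A because ...
- Knowingly deferred: ...

## Files touched
- `services/catalogue/enricher.py` (new)
- `services/catalogue/api.py` (modified)

## Security considerations
- Auth implications: ...
- Input validation: ...

## Edge cases
- Empty catalogue entry: ...
- Upstream timeout: ...

## Shared patterns
- Uses the team's shared retry/backoff helper rather than ad-hoc retries
```` 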
The Trust Contract
Follow the process and you get more freedom. Skip it and you get less.
If you submit a spec, get it approved, and build against it, you have full autonomy. You self-review against the approved spec and merge on your own timeline. No waiting for reviewers. No bottleneck.
If you don’t submit a spec, your code goes through the full traditional review process. Reviewers can, and do, refuse to review work that doesn’t have an approved spec backing it.
This wasn’t imposed top-down. The engineering team shaped it together, and it works because the trade is genuine: do the upfront thinking and you get full autonomy to ship.
The spec is what makes self-merging safe. It proves you thought through the architecture, considered edge cases, flagged security implications, and identified which files you’re touching. Without that, the safety net of peer review stays in place. With it, you’ve earned the trust to move independently.
Accountability
The person who merges the code owns that code. If something breaks, the spec and the git history together provide the full picture: what was intended, what was approved, and who shipped it.
This doesn’t mean code ships unchecked. Unit tests, pydantic evals, pre-commit hooks for linting and formatting, and automated reviews via CodeRabbit run on every merge request. The automated layer handles code quality. The human layer handles intent.
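As a rough sketch of what the automated layer can look like, a `.pre-commit-config.yaml` along these lines wires linting and formatting into every commit. The specific hooks and versions here are illustrative, not our actual configuration:

```yaml
# .pre-commit-config.yaml (illustrative sketch)
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9           # pin a real release in practice
    hooks:
      - id: ruff          # linting
      - id: ruff-format   # formatting
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: end-of-file-fixer
      - id: check-yaml
```

Running `pre-commit install` once per clone then applies these hooks on every commit, before code ever reaches a merge request.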
The spec isn’t overhead. It’s the ticket to autonomy. The better your spec, the faster you move, because you’re not waiting on anyone. We’re shipping faster now than we were before we introduced this process, not despite it.
Two Collaboration Surfaces
There are two kinds of collaboration happening on every AI-assisted engineering team. Most teams aren’t distinguishing between them.
Human-to-agent collaboration happens locally
In your editor, in your terminal, in your prompts. This is where you and your AI agent think through problems, prototype solutions, and generate code. It’s fast, iterative, and private. You go down rabbit holes. You throw things away. You ask the agent to try three different approaches before picking one. You have a 45-minute conversation about error handling that ends with “actually, scrap all of that.”
That’s fine. That’s how it should be. None of that belongs in a merge request.
Human-to-human collaboration happens in GitLab
Through merge requests, comments, and approvals. This is where the team aligns on intent, reviews architecture, and shares context.
This distinction matters because it changes what code review is for. In the old model, code review was about catching bugs, enforcing style, and verifying logic. In an AI-assisted model, the agent can write syntactically perfect, well-tested code that’s completely wrong architecturally. Or that duplicates patterns another team member already solved differently. Or that introduces a security surface nobody considered.
Line-by-line code review won’t catch any of that. Reviewing the spec will.
By separating our two collaboration surfaces, we’ve made our review process focused on the thing that matters: is the plan sound? The clean output is the spec that captures what you learned and decided. That’s what your team reviews. That’s where the collaboration happens.
Pattern Drift
Here’s a problem your team is about to have, if you don’t already.
Five engineers on your team each use an AI coding agent. Each engineer has their own prompting style, their own conventions, their own way of structuring code. The agents mirror and amplify these individual preferences.
Every single file works. Every test passes. Every feature ships. But you now have five micro-codebases wearing a monorepo as a costume.
We call this pattern drift, and it’s a second-order effect of AI-assisted development that catches teams off guard. When you’re shipping at pace, the code is correct. It’s just inconsistent. And inconsistency compounds silently until the moment someone needs to work across boundaries.
When you open a file written by someone else’s agent, it shouldn’t feel like reading a foreign dialect of the same language. But without deliberate alignment, that’s exactly what happens.
Our solution has been to standardise at the agent level. Shared patterns that define common structures, consistent conventions, and aligned approaches to recurring problems. The goal is simple: when you open a file written by someone else’s agent, you should be able to pattern-match against your own agent’s output. No code archaeology required.
This is the kind of problem that doesn’t show up in metrics. Your velocity looks great. Your test coverage is solid. Your agents are productive. But beneath the surface, your codebase is accumulating a quiet form of cognitive debt: not missing context, but contradictory conventions.
Living Specs
The worst part of joining a new codebase is the archaeology. Why is this service structured this way? What was the alternative that got rejected? Which edge cases did the original author think about, and which did they miss?
Most teams don’t write this down. And in an AI-assisted world, the person who could tell you might not even remember, because their agent did most of the work six weeks ago and the decision context was in a prompt that’s long gone.
All specs at Ocula are stored in a dedicated folder in the repository. These aren’t throwaway planning documents. They’re living references that get updated as the implementation evolves.
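In practice this can be as simple as a dedicated directory alongside the code, one spec file per feature. The layout and file names below are a hypothetical example, not our actual repository structure:

```
repo/
├── specs/
│   ├── catalogue-enrichment.md
│   └── search-indexing.md
├── services/
│   └── catalogue/
└── ...
```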
When a team member needs to work on a part of the codebase they didn’t build, the spec is their first stop. It tells them what was intended, what alternatives were considered, and what constraints shaped the design. No walkthrough needed. No archaeology. No Slack messages asking “hey, who built this and why?”
The compound effect has been clear. Onboarding to any part of our codebase is self-serve. The goal is for our AI agents to reference specs for adjacent features, so they carry context that would otherwise be lost. And we’re not just maintaining our pace. We’re accelerating, because the specs remove the friction that used to slow everything down as codebases grew.
Process at a Glance
Spec approved: You’re free to build and self-merge. Git blame tracks ownership.
No spec submitted: Your code requires full peer review. Reviewers can refuse until a spec is signed off.
Spec exists but is outdated: Update the spec to reflect reality before or alongside the code change. Specs are living documents.
The Bottom Line
Future maintenance is part of the velocity equation. If you can’t sustain your pace through launch and beyond, you haven’t actually moved fast. You’ve just front-loaded speed by taking shortcuts that come due later.
Every spec we write, every pattern we share, every decision we document adds direction to our speed. That’s what converts it into velocity.
This framework isn’t finished. It’s a living process, just like the specs it’s built around. We’re a lean team shipping at a pace that would have been impossible two years ago. The challenge now isn’t going faster. It’s making sure we can sustain this speed through scale and through the inevitable surprises.
Greg Fletcher is CTO and Co-Founder at Ocula Technologies. Evan Blair is Head of Engineering. Ronan Bradley is Principal Engineer. Ocula builds AI agents to generate the data layer for agentic commerce.
This post is based on a 7-part LinkedIn series on Agentic Engineering. If you’re interested in how we’re applying these principles to our product architecture, follow Greg on LinkedIn for the upcoming series on Agentic Architecture.