Vibecoded MVP stopped shipping? Manage your AI like a junior


The same story keeps surfacing in founder communities — the tech press has already named it the vibe-coding hangover — and it always reaches the same point. The MVP exists. It has real users, sometimes real revenue. The first months were the best building experience of the founder's life — describe a feature in Cursor, Claude Code, Lovable, or Bolt, and get a feature. And then, months in, shipping slows to a stop. Not all at once. Releases get scarier, fixes take longer, and changes that used to take an afternoon start taking weeks.

The usual conclusion at this point is that AI coding was a trap, or that the codebase is garbage and needs a rewrite. I don't think either is true. What I see in these codebases is something I'd been fixing for years before an LLM ever wrote a line of production code: the output of fast, unsupervised junior work. The industry has known how to handle that for decades — and inside a typical vibecoded MVP, almost none of that knowledge has been applied. That's the good news: your situation is well understood, and so are the fixes.

The shape of the stall

The pattern repeats so consistently that I can list the symptoms without seeing the codebase:

You add one feature and three others break. Finding out which three takes longer than the feature did.
There are parts of the AI-generated codebase nobody understands — including you, and including the AI that wrote them. Touching those parts is a gamble.
The AI keeps rewriting the same code differently. Each fix undoes a previous fix. You're paying for tokens to go in circles.
The demo flow works. Then a real user does something slightly unusual, production breaks in the middle of the night, and the first you hear of it is a user complaint.

All four symptoms come from the same place: nothing in the codebase or the workflow is holding earlier decisions in place, so every new change re-rolls the dice on everything before it. Research is starting to document the same trade-off: a recent study of vibe coding in practice describes seamless generation paid for later in architectural inconsistency, security gaps, and maintenance overhead.

The fastest junior you've ever hired

Here's the reframe that makes the whole situation tractable: treat the coding agent as a teammate with a very specific profile. It is the fastest junior developer you have ever worked with — it produces a feature in minutes, never pushes back, never gets tired — and it has zero memory of why last month's decisions were made, no fear of touching anything, and no ownership of the outcome. It will confidently hand you code that looks finished, because looking finished is what it optimizes for.

Software teams have employed exactly this person for as long as the industry has existed — every practice built around junior developers exists because of this profile. Experienced engineers have largely converged on this framing; what hasn't reached founders is the half that matters — the management system that comes with it.

I say this as someone who builds with AI agents every day: my own product work is vibecoded, and I have no intention of stopping. Every change my agents produce passes linters, type checks, and tests in CI before it lands. The tools differ by stack (mypy and pytest on Python, tsc and ESLint on Node, PHPStan and PHPUnit on PHP), but the idea stays the same. The speed is real. So is the profile.
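The core of such a merge gate fits in a few lines of Python. This is a minimal sketch, not a copy of any particular CI setup. It assumes a Python project where `mypy .` and `pytest -q` are the checks. The names `PYTHON_GATES` and `run_gates` are hypothetical. The function runs each check in order and stops at the first failure. It treats a missing tool as a failure rather than a silent pass.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical gate list for a Python project; other stacks would swap in
# tsc/ESLint or PHPStan/PHPUnit. Each entry is one command line.
PYTHON_GATES: list[list[str]] = [
    ["mypy", "."],     # type check
    ["pytest", "-q"],  # test suite must stay green
]


def run_gates(gates: Sequence[Sequence[str]]) -> bool:
    """Run each gate command in order; return True only if every one exits 0.

    Stops at the first failing gate, so later gates never run on a red build.
    A tool that isn't installed counts as a failure, never as a pass.
    """
    for cmd in gates:
        try:
            result = subprocess.run(list(cmd))
        except FileNotFoundError:
            print(f"gate missing: {cmd[0]}", file=sys.stderr)
            return False
        if result.returncode != 0:
            print(f"gate failed: {' '.join(cmd)}", file=sys.stderr)
            return False
    return True
```

In CI, the result would become the job's exit code, for example `sys.exit(0 if run_gates(PYTHON_GATES) else 1)`. A non-zero exit is what lets the pipeline block the merge. Most CI systems already stop at the first failing step, so in practice the gate list usually lives in the pipeline config. The principle is the same either way.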

I've cleaned up after the human version of this several times. At one EdTech startup where I worked as a fractional CTO, production could be down for hours before anyone found out: nothing was watching. No alerts, no error tracking. Functional bugs sat in the product with nobody paying attention to them. The code had been written quickly, by people with more speed than experience, and the biggest gaps sat around it: nobody reviewed changes, nothing tested the critical paths, nothing watched production, and a release meant a developer logging into the production server and changing things by hand.

We put monitoring in place, added tests, code review, and a QA step, replaced hand-edits on the server with a release pipeline, and rewrote some parts of the code along the way. The outages stopped, and production stayed stable from then on. Over the engagement, downtime fell more than tenfold, delivery (time-to-market and throughput) more than doubled, running costs dropped by a factor of 1.5, and the team's eNPS went from −100 to +100. Alongside the tooling, I sat down with the developer one-on-one to talk about his goals and how to reach them, because managing a junior includes growing one. None of this is exotic; it's the standard kit, applied late.

No AI was involved at that startup, yet the failure was identical to the one inside an AI-built MVP. AI just gets there faster.

What experienced teams do about juniors — and the AI equivalent of each

The practices that turn junior output into a working business are boringly well known. Every one of them has a direct AI-era equivalent, and a vibecoded MVP is usually missing all of them at once.

How teams handle juniors	The AI equivalent	What it stops
An onboarding doc: how we structure code here, what we never do	A conventions file the agent reads on every task (AGENTS.md / CLAUDE.md)	The agent re-inventing architecture from scratch on every request
Code review before anything merges	A human or senior agent reviews every AI change; linters and type checkers run as a hard gate	“Looks finished” code silently breaking features that worked
A test suite that must stay green	Tests on the critical paths, run automatically; a red suite blocks the merge	The fix that breaks two other things; the regression you find in production
CI that won't let bad changes through	The same pipeline, applied to agent output: lint, types, security scan, build	Hardcoded credentials, unvalidated input, dependencies nobody chose on purpose
Monitoring, so problems surface before customers do	Error tracking and alerts wired to the deployed product	Production down for hours while everyone assumes it's fine
Small, scoped tasks — a junior never gets “rebuild billing” as one ticket	One change per request, sized so a human can actually review the diff	The 4,000-line diff nobody can check, which is where the worst surprises live
Regular one-on-ones and feedback, so the junior grows instead of repeating mistakes	What review catches gets folded back into the conventions file, so the agent stops making that mistake	Paying for the same correction, in tokens and review time, every single week

The left column and the right column are the same list. That's the entire argument of this essay: the discipline for managing AI agents already exists, fully road-tested, and is waiting to be applied.

The security row carries a number worth knowing: in Veracode's 2025 GenAI Code Security Report, 45% of AI-generated code samples — across more than 100 models — failed security checks. The gate exists because the junior ships vulnerabilities at scale.

Better prompting is missing from the list on purpose. A senior fixes the environment around a junior so that careless work physically can't reach production — however the ticket is phrased.

The fixes you already tried, and why they didn't hold

By the time a founder starts looking for outside help, they've usually been through two or three rescue attempts. The attempts fail in predictable ways, and the junior lens explains every one of them.

Switching AI tools. Swapping Cursor for Windsurf or Copilot replaces one junior with another junior. The new one may be smarter; it still has no memory, no fear, and no review waiting for its output. The same patterns return within weeks, because they were never about which model wrote the code.

Better prompts and rules files. Closer — this is the onboarding doc, and it's a real layer. But an instruction the pipeline doesn't enforce is a suggestion. A junior nods at the style guide and then ships whatever they ship; what makes the guide real is the review and the gates behind it.

The refactor sprint. “Let's stop features for two weeks and clean up.” Refactoring without tests is how a junior turns old bugs into new bugs — the codebase ends up differently broken, and you've paid two weeks of standstill for it. Cleanup comes after the net that catches what cleanup breaks.

Hiring a senior developer. The right instinct, often at the wrong moment. Drop a senior into a codebase with no tests, no review history, and no monitoring, and their first months disappear into archaeology: rediscovering, by hand, facts that tooling should be stating outright. Some quit before that work pays off. Build the safety net first and the same hire becomes effective in week one, if you still need them at all.

Restarting delivery

The order of operations matters more than any individual practice.

Diagnose first. Which missing layer is actually stopping shipping now? If production breaks silently, monitoring comes first. If every change causes regressions, tests on the two or three critical user paths come first. A vibecoded MVP can't absorb six new practices in a week; start with the one that hurts most, then add the next.
Build the safety net. Monitoring and error tracking, a CI pipeline with real gates, tests on the paths the business dies without, review on every change — human or agent, the rule is the same: nothing merges unseen. This phase is measured in weeks — a couple of months at the outside — and it's where shipping quietly restarts, because changes stop being frightening.
Only then scale. Architecture cleanup, performance tuning, hiring decisions — the work of scaling AI-generated software. All of it is cheap once the net exists and reckless while it doesn't.
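The error-tracking part of the safety net comes down to one rule: no failure dies silently. Here is a minimal Python sketch, assuming a plain script or worker process. `install_error_alerts` and its `alert` callback are hypothetical names. `alert` stands in for whatever actually reaches a human: chat, email, or an error-tracking service.

```python
import logging
import sys
from types import TracebackType
from typing import Callable, Optional, Type

logger = logging.getLogger("app")


def install_error_alerts(alert: Callable[[str], None]) -> None:
    """Send every uncaught exception to the log and to an alert callback."""

    def hook(
        exc_type: Type[BaseException],
        exc: BaseException,
        tb: Optional[TracebackType],
    ) -> None:
        # Keep the full traceback in the logs for whoever investigates.
        logger.error("unhandled exception", exc_info=(exc_type, exc, tb))
        # Push a short summary to a human, so users aren't the first to know.
        alert(f"{exc_type.__name__}: {exc}")

    # Replace the interpreter's default handler for uncaught exceptions.
    sys.excepthook = hook
```

The sketch has two limits:

- `sys.excepthook` only sees uncaught exceptions in the main thread. Other threads go through `threading.excepthook`.
- Web frameworks usually catch request errors themselves. In a web app, the same callback belongs in the framework's error handler instead.

A hosted error tracker does this wiring for you; the sketch only shows what it has to cover.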

This is the same sequence I've run on junior-built codebases for years, and it's exactly what I offer for vibecoded products — the longer version, with what each phase costs and what it returns, is on the AI vibecoding cleanup page. And if you want to measure where your delivery actually stalls instead of guessing — which tasks stick, and at which stage — I wrote a separate playbook on diagnosing a delivery system with metrics; it applies to a two-person AI-assisted team exactly as it does to a forty-engineer org.

FAQ

Do I need to rewrite the MVP from scratch?

Almost certainly not. Junior-built codebases have been rescued without starting over for decades, and the method transfers directly: stabilize, add the safety net, then rewrite the worst parts piece by piece — with the net there to catch what the rewrite breaks. A rewrite throws away the one asset you actually have — working code that users already exercise daily — and restarts the same process with the same missing practices.

Is this just technical debt?

It's a more specific thing. Technical debt implies someone understood the trade-off and chose speed. An AI-generated codebase contains decisions nobody made consciously and nobody can explain — the debt sits in comprehension. That's why the fix starts with making the codebase observable and testable; cleanup comes after.

Can I keep building with AI while this gets fixed?

Yes — that's rather the point. The goal is to put review, tests, and gates around the junior while the keyboard stays where it is. You keep shipping with AI throughout; the agent's output simply starts passing through the same checks a human's would. I build with AI daily inside exactly this kind of setup.

Can AI fix the codebase it broke?

Partially, and only inside guardrails. Agents are genuinely good at the maintenance work an AI-generated codebase needs — writing tests, adding error handling, mechanical refactors — once a pipeline checks their output. Asking an agent to “clean up the codebase” with no tests and no gates is asking the same junior to mark their own homework.