Building on something that changes every week

The thing nobody says about building AI products right now is that the substrate keeps moving. Not slowly, not predictably. It lurches. A model that was the obvious choice in April is mediocre or deprecated by July. The tooling that shipped as a stable release three months ago has a breaking change in the README. You push code against Claude Sonnet, then Gemini 2.5 Flash starts beating it on your actual workload, then DeepSeek ships something that undercuts both on cost and you have to re-evaluate again. That’s not a quarterly cycle. That’s every few weeks.

I’ve been building agent tooling on top of this for the better part of the year, and the clearest lesson so far is: don’t marry the model.

the abstraction you need

The discipline that actually saves you is provider abstraction from the start. Not as a future-proofing box to check. As a survival mechanism. You write one interface. Behind it you wire Claude, GPT-4.1, Grok 3, whatever OpenRouter exposes this week. When something shifts, pricing, quality on your specific task, context window, rate limits, you rotate the backend without touching the product logic.

This sounds obvious. In practice, almost nobody does it until they’ve been burned. You start with one provider because that’s what the tutorial uses, the code works, and adding an abstraction layer feels like over-engineering for a side project. Then the provider raises prices 40%, or drops a model you depended on, or just gets slower, and now you’re refactoring load-bearing code under a live system.

The abstraction doesn’t have to be clever. A thin wrapper that normalizes the call signature, maps to the right SDK, and lets you swap a config key is enough. The goal is boring. Boring survives.

What doesn’t survive is the opposite approach: vibe-coding something that works once in a notebook, treating it as done, and assuming it’ll hold. That was fine when models were slow to change and you were just prototyping. It isn’t fine now. The notebook demo starts rotting the day you don’t touch it, because the world around it has moved.

what actually stays stable

Here’s the thing about the chaos: most of the hard problems aren’t in the model. Routing, fallback logic, caching, tool validation, retry handling, rate limit management: none of that changes with the model. The mechanics of building reliable systems haven’t changed because inference got fast. Bad async code is still bad. Untested tool execution is still a liability. Shipping without logging is still how you get surprised at 2am.

The boring patterns hold. The model is a dependency, not the architecture.

MCP helped here: having a standard protocol for tools means agent tooling doesn’t have to be fully bespoke anymore. You write a tool once, expose it over MCP, and the model abstraction handles the rest. That’s a real improvement. It doesn’t fix the underlying churn problem, but it raises the floor on interop.

What I try to do now: keep the model-specific surface area as small as possible, and invest the real engineering effort in the layer that doesn’t move. Prompt logic, structured output validation, tool contracts: those are where the actual work lives. The model behind the call is a runtime detail.

Cursor, Claude Code, Codex: these things are useful, and I use them. But using them well is different from letting them drive architecture. An AI pair-programmer that generates a 400-line implementation in one shot is still generating something you need to understand and maintain. The code is yours. The system is yours. “The model wrote it” doesn’t survive the first production incident.

There’s a version of this that goes: just vibe-code everything and swap models as better ones ship, because nothing is permanent anyway. I’ve tried that version. It works until it doesn’t, and the blast radius when it doesn’t is proportional to how much load-bearing code you let the vibes touch.

The actual approach that works for me is mundane: use the fast AI tooling for the repetitive scaffolding where mistakes are cheap, write the core logic like it’s 2019, and keep the model layer shallow and swappable. Assume the thing underneath will be different next month, because it will be. Build accordingly and don’t make it interesting.

The chaos is real. The response to it doesn’t have to be.