Cilantrobyte.

AI Development

Production AI features built the way the studio builds everything else — scoped honestly, evaluated before launch, and engineered to keep working when the model underneath them changes.

Category

AI

Starts with

A scoping call

Status

Booking 2026

(01) Our take

Every week we meet a team that has an AI demo working on a good day and needs it working every day. The gap between those two is the entire content of our AI practice. The demo works because a specific prompt, with a specific input shape, returns a specific output. Production is different: real users asking things the prompt didn’t anticipate, latency budgets the proof-of-concept never had to respect, error states that silently return bad content rather than loudly returning nothing, and a model provider that patches the behaviour underneath you every few months.

We build AI features that survive that transition. Our default architecture is Claude-first, because its reasoning quality and tool-use behaviour are currently the best in the field, but we design every feature with a multi-provider abstraction so that switching providers is a config change rather than a rewrite. The Vercel AI SDK is our standard abstraction layer; when the shape of the feature needs more than it provides, we build directly on the provider APIs with an adapter we can swap.
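The adapter pattern behind that "config change, not rewrite" claim can be sketched in a few lines. This is a minimal illustration, not the studio's actual code: the class names and canned return values are invented stand-ins, and a real adapter would call the provider's API instead.

```typescript
// Minimal sketch of a multi-provider adapter. All names are illustrative.
interface CompletionRequest {
  prompt: string;
  maxTokens: number;
}

interface CompletionAdapter {
  complete(req: CompletionRequest): Promise<string>;
}

// One adapter per provider; feature code never imports a provider SDK directly.
class FakeClaudeAdapter implements CompletionAdapter {
  async complete(req: CompletionRequest): Promise<string> {
    return `claude:${req.prompt}`; // a real adapter calls the provider API here
  }
}

class FakeOpenAIAdapter implements CompletionAdapter {
  async complete(req: CompletionRequest): Promise<string> {
    return `openai:${req.prompt}`;
  }
}

// Provider choice is a config string, so swapping providers is a config change.
function adapterFor(provider: string): CompletionAdapter {
  switch (provider) {
    case "anthropic":
      return new FakeClaudeAdapter();
    case "openai":
      return new FakeOpenAIAdapter();
    default:
      throw new Error(`unknown provider: ${provider}`);
  }
}
```

In practice the Vercel AI SDK plays this role for us; the sketch only shows why call sites stay provider-agnostic.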

For retrieval-augmented features, our default stack is Postgres with pgvector. Most of our clients already run Postgres, and adding pgvector avoids standing up a second database for an early-stage feature that may or may not earn its keep. When the scale genuinely warrants it, we’ll move to a specialised vector store, but we benchmark first rather than assuming. Agentic workflows get a different treatment: tool definitions designed like APIs, agent loops that can be reasoned about, and logging detailed enough to debug a failed run without rerunning it.
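The retrieval query at the centre of that pgvector setup is short. A hedged sketch, assuming a `document_chunks` table with an `embedding` vector column (both names invented for illustration):

```typescript
// Sketch of a pgvector nearest-neighbour query; table and column names are assumptions.
const TOP_K = 5;

// pgvector's `<=>` operator computes cosine distance, so ordering ascending
// returns the K chunks most similar to the query embedding ($1).
const retrievalQuery = `
  SELECT id, content
  FROM document_chunks
  ORDER BY embedding <=> $1
  LIMIT ${TOP_K}
`;
```

Because this is ordinary SQL against the database the client already runs, the feature gets retrieval without a second datastore to operate.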

The part of the work that ages best is the eval infrastructure. Before we ship an AI feature, we ship the eval harness that will tell us whether the next prompt change or model update made the feature better or worse. This isn’t optional, it’s not an upsell, and it’s not something we apologise for scoping in. “It got better” is not a measurement. We build the measurement.
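What "we build the measurement" means can be shown in miniature. A deliberately minimal sketch; the case set and the substring scorer are illustrative stand-ins for real graders:

```typescript
// Minimal eval-harness sketch. Cases and scorer are illustrative, not a real suite.
interface EvalCase {
  input: string;
  expected: string;
}

type Feature = (input: string) => string;

// Score is the fraction of cases where the output contains the expected answer;
// production harnesses use richer graders, but the shape is the same.
function runEval(feature: Feature, cases: EvalCase[]): number {
  const passed = cases.filter((c) => feature(c.input).includes(c.expected)).length;
  return passed / cases.length;
}

const cases: EvalCase[] = [
  { input: "capital of France", expected: "Paris" },
  { input: "2 + 2", expected: "4" },
];
```

The point is that "did the change make it better?" becomes a number you can compare across prompt changes and model updates, rather than an impression.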

We’re honest with clients about where AI currently shouldn’t be the answer. Agents acting autonomously on production data are a smaller category than the demos suggest. Natural language interfaces over databases are a worse experience than a good form, more often than not. Automated customer support that’s actually good requires more scaffolding than most teams assume. If your roadmap assumes one of these and the evidence isn’t there yet, we’ll say so — and we’ll help you scope down to what will actually ship.

(02) What we build

Typical work

  • RAG systems (semantic search, retrieval-augmented generation, hybrid search)
  • Agentic workflows with tool use, guardrails, and observable loops
  • Structured generation and extraction pipelines
  • Classification, summarisation, and content moderation features
  • Internal AI tools (engineering copilots, ops assistants, support co-pilots)

(03) Is this for you

When to pick this

  • You have an AI demo working and need it working in production, at scale, on budget.
  • You’re building an AI-native product and want engineering that treats evals and guardrails as non-negotiable.
  • You’re evaluating providers or architectures and want senior hands on the build.
  • You have a specific workflow where AI earns its keep and you want it built right the first time.

When not to pick this

  • You want to “add AI” to your product without a specific workflow in mind. That’s an AI/LLM Strategy engagement first.
  • You need a chatbot and your support volume doesn’t justify the infrastructure.
  • You’re hoping AI will solve a UX problem that a better form would fix faster.

(04) Engagement shape

How we engage

4–16 week engagements for a scoped AI feature, including eval harness, guardrails, and observability. Larger AI-native product builds run longer and phase into sustained work.

(05) What you walk away with

Deliverable

The headline artefact

A production AI feature — with the eval harness that tells you whether the next change made it better, and the observability to debug it when a user reports something weird.

Signature tools we reach for

Claude · Vercel AI SDK · pgvector · Python

Start an AI Development engagement.