Using AI in 2026: In Your Repos, and In Your Product

Introduction

“Using AI” in a 2026 codebase means two completely different things, and conflating them is the source of a lot of confused architecture conversations.

  1. AI as the tool you build with — a coding agent (Claude Code, in my case) that reads your repo, writes code, runs your tests, opens PRs. This changes your workflow.
  2. AI as a feature in the product — a chat panel, an assistant, a “generate this for me” button that calls a model at runtime on behalf of your users. This changes your architecture.

The first is about making yourself faster. The second is about shipping a capability. They share a vocabulary (agents, tools, prompts) and almost nothing else. This article is the condensed version of both, built on the same scaffolding as the last few posts: the frontend template, the full-stack monorepo, and a pair of standalone dashboard-frontend / dashboard-backend templates I’ll wire an assistant into at the end.

I’ll keep it portable, same as the testing article: the patterns are what matter, so I’ll use the boring stack everyone can run — Claude Code for the agent half, the Vercel AI SDK for the product half — and point at a production gRPC AI service for the “this is what it looks like at scale” notes.


Part 1 — AI working in your repo

The CLAUDE.md is the highest-leverage file you own

A coding agent is only as good as the context it starts with. The single best investment is a CLAUDE.md at the repo root: the operating contract for anyone — human or agent — touching the code. The rule that keeps it useful: write down the decisions and the gotchas, not what the code already says.

The test for a good line: would a competent new contributor get it wrong without this sentence? If yes, it earns its place. If the code makes it obvious, cut it.

Give the agent the gate, then let it close its own loop

The biggest behavioral upgrade is mechanical: make sure the agent can run your checks itself. Typecheck, tests, lint, build — one command each, documented in the CLAUDE.md. An agent that can run pnpm typecheck && pnpm test after it edits will catch its own mistakes before you ever see them. An agent that can’t is guessing, and you become its compiler.

This is also the honesty mechanism. “I added the feature” means nothing; “I added the feature and the 31 tests pass, here’s the output” is a claim you can trust. Insist on the second.
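
One way to give the agent a single entry point for the gate is a small script. This is a sketch under assumptions, not something the templates ship: the file name `scripts/gate.ts`, the `runGates` helper, and the `lint` and `build` script names are hypothetical. Only `pnpm typecheck` and `pnpm test` come from the text above.

```typescript
// scripts/gate.ts (hypothetical file name)
import { spawnSync } from "node:child_process";

type Gate = { name: string; command: string; args: string[] };
type GateResult = { name: string; ok: boolean };

// A runner returns true when the command exits with status 0.
type Runner = (command: string, args: string[]) => boolean;

// Default runner: stream output to the terminal so the agent can quote it.
const spawnRunner: Runner = (command, args) =>
  spawnSync(command, args, { stdio: "inherit" }).status === 0;

// Assumed script names; use whatever your CLAUDE.md documents.
const defaultGates: Gate[] = [
  { name: "typecheck", command: "pnpm", args: ["typecheck"] },
  { name: "test", command: "pnpm", args: ["test"] },
  { name: "lint", command: "pnpm", args: ["lint"] },
  { name: "build", command: "pnpm", args: ["build"] },
];

// Runs the gates in order and stops at the first failure, like `a && b && c`.
function runGates(gates: Gate[], run: Runner = spawnRunner): GateResult[] {
  const results: GateResult[] = [];
  for (const gate of gates) {
    const ok = run(gate.command, gate.args);
    results.push({ name: gate.name, ok });
    if (!ok) break;
  }
  return results;
}
```

The runner is injectable so the script itself can be tested without spawning processes. In a session, the agent would call `runGates(defaultGates)` after each edit and report the per-gate results alongside the raw output.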

What to do, what not to do

A short list earned the hard way:

What an agent actually is

Past the buzzword: an agent (subagent) is a fresh context window with its own system prompt and its own tool allowlist, spawned to do one scoped job, that returns a result without polluting your main conversation. That’s the whole idea. Three properties make it worth using:

Where agents live: in the repo, not in your home dir

Claude Code resolves agents from both your global ~/.claude/agents/ and the project’s .claude/agents/. For anything that encodes a repo’s conventions, put it in the repo. It gets versioned, shared with the team, travels with a clone, and — crucially — it can reference the actual structure (“copy src/modules/projects/ as the reference”). A global agent can only be generic, which defeats the point. Reserve global for personal, cross-project helpers.

The dashboard templates ship convention-aware agents in their .claude/agents/:

dashboard-frontend/.claude/agents/
  scaffold-domain.md     # create a new api/<domain>/ (Zod types + raw fns + SWR hooks)
  scaffold-route.md      # add a file-based route + route-local components + nav
  frontend-reviewer.md   # review a diff against the architecture rules, run the gates
  api-contract-sync.md   # reconcile the frontend Zod schemas with the backend contract

dashboard-backend/.claude/agents/
  scaffold-module.md     # create a new modules/<domain>/ (schemas → service → routes)
  backend-reviewer.md    # review with security/owner-scoping first, run the gates

Each is the same shape as the production agents in our other services: YAML frontmatter (name, description, tools), the reference files to read, a step-by-step process, a verify step, and hard rules. The format matters less than the discipline: a recurring multi-step task becomes an agent so it’s done the same way every time.

What not to do with agents

What an MCP actually is

MCP — Model Context Protocol — is a standard way to hand a model tools and data from an external system. A provider (Figma, Linear, GitHub, your own database) runs an MCP server that exposes a set of typed tools and resources; the agent connects to it and calls those tools exactly like its built-in ones. The win is decoupling: the tool provider and the agent evolve independently, and any MCP-speaking agent can use any MCP server.

Connecting one is a line:

claude mcp add --transport http figma https://…/mcp
# then, in a session, the agent can call the Figma tools directly

The three you’ll actually use

gh CLI vs. GitHub MCP — when to use which

This is the question people actually have, and the answer is a rule of thumb, not a religion:

Can you write the exact command?           →  use `gh` via Bash.
Does the model need to navigate/decide      →  use the GitHub MCP.
  across many operations or read-heavy
  structured queries?

My default: gh for the known path, MCP when I want the model to figure out the path. Don’t add an MCP server for something a single gh command does — you’re paying context for capability you didn’t need.

What not to do with MCPs


Part 2 — AI built into your product

Now flip it. You want a feature: an assistant in the dashboard that can answer “how many active projects do I have?” and “create a project called Q3 Launch.” That model call happens at runtime, for your users, and it needs a home in your architecture.

Start as a module; graduate to a service

The honest answer for most apps: start as a module in the backend you already have. Our dashboard-backend groups one folder per domain (schemas → service → routes); an assistant is just another domain, modules/ai/. Don’t stand up a separate microservice on day one.

You graduate to a dedicated AI service when the pressures show up: provider keys and prompt logic you want isolated, per-tenant cost/usage tracking, heavy streaming, multiple consumers, or a model-ops surface (evals, prompt versioning) that doesn’t belong in your CRUD backend. Our production ai-service is exactly that — a stateless gateway in front of every AI feature. The structure is the same either way; only the deployment boundary moves.

small:   [ dashboard-backend ]──modules/ai/──▶ model provider
                 │ projects/ dashboard/ auth/

at scale: [ backend ] ──HTTP──▶ [ ai-service ] ──▶ model provider(s)
                                   stateless gateway: prompts, tools,
                                   streaming, usage tracking, evals

The five layers of an AI feature

Whether it’s a module or a service, the anatomy is the same — and it rhymes with the BFF layering from the rest of this series:

transport     route/handler: validate input, open a stream                (routes.ts)
orchestration the "Logic": build the system prompt, run the model loop     (ai.service.ts)
tools         the bridge to your domain — each tool calls your services    (tools)
provider      one abstraction over the model(s), injected, never imported  (AI SDK)
persistence   usage/cost records, optional history                         (db)

A few principles carry almost all the weight:

Stateless, like the BFF. The client sends the full message history each request; the server holds no session. This is the same statelessness decision from article 112, and it’s why the AI tier scales horizontally and tests cleanly.

One provider abstraction, injected. Business logic never imports @ai-sdk/anthropic directly. A registry creates models; the Logic receives a LanguageModel. Swapping Claude for another model, or routing different features to different models, becomes config — not a refactor. (Our service does createProviderRegistry({ openai, anthropic }) and injects registry.languageModel("anthropic:…") into every Logic class.)
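
The injection shape can be sketched without any SDK. Everything named here is a hypothetical stand-in: `TextModel` plays the role of the AI SDK's LanguageModel, `ModelRegistry` the role of a provider registry, and the model ids are made up for illustration.

```typescript
// Minimal stand-in for a language model (hypothetical interface).
interface TextModel {
  id: string;
  generate(prompt: string): Promise<string>;
}

// Stand-in for a provider registry: resolves "provider:model" ids to models.
class ModelRegistry {
  private readonly models = new Map<string, TextModel>();

  register(model: TextModel): this {
    this.models.set(model.id, model);
    return this;
  }

  languageModel(id: string): TextModel {
    const model = this.models.get(id);
    if (!model) throw new Error(`Unknown model id: ${id}`);
    return model;
  }
}

// Business logic receives a model; it never imports a provider package.
class SummaryLogic {
  private readonly model: TextModel;

  constructor(model: TextModel) {
    this.model = model;
  }

  summarize(text: string): Promise<string> {
    return this.model.generate(`Summarize in one sentence:\n${text}`);
  }
}

// Fake models make the Logic testable without network access.
const fakeModel = (id: string): TextModel => ({
  id,
  generate: async (prompt) => `[${id}] ${prompt.length} chars`,
});

const registry = new ModelRegistry()
  .register(fakeModel("anthropic:fast"))
  .register(fakeModel("openai:fast"));

// Which model a feature uses is configuration, not code.
const config = { summaryModel: "anthropic:fast" };
const summaryLogic = new SummaryLogic(registry.languageModel(config.summaryModel));
```

Changing `config.summaryModel` reroutes the feature to another provider without touching `SummaryLogic`, which is the property the paragraph above is after.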

Tools are the entire bridge to your app. This is the part people underestimate. The model doesn’t touch your database — it calls tools, and each tool’s execute calls a function you already have. The assistant’s power is exactly the set of tools you give it, no more.

Wiring it into the templates

Here’s the assistant, concretely, as a module in dashboard-backend. The orchestration:

// src/modules/ai/ai.service.ts
import { anthropic } from "@ai-sdk/anthropic";
import { stepCountIs, streamText, tool, type ModelMessage } from "ai";
import { z } from "zod";
import * as projects from "../projects/projects.service.js";

export function streamAssistant(opts: { ownerId: string; messages: ModelMessage[] }) {
  return streamText({
    model: anthropic("claude-..."),       // inject via a registry in real code
    system: buildSystemPrompt(),           // inline, template-literal (see below)
    messages: opts.messages,               // full history — stateless
    stopWhen: stepCountIs(8),              // bound the tool loop
    tools: {
      list_projects: tool({
        description: "List the signed-in user's projects, optionally by status.",
        inputSchema: z.object({
          status: z.enum(["active", "paused", "archived"]).optional(),
        }),
        // ownerId comes from the SESSION, never from the model. This is the line.
        execute: ({ status }) =>
          projects.listProjects(opts.ownerId, { status, page: 1, pageSize: 20 }),
      }),
      create_project: tool({
        description: "Create a new project for the signed-in user.",
        inputSchema: z.object({ name: z.string(), status: z.enum(["active", "paused"]).optional() }),
        needsApproval: true,               // destructive-ish → confirm before running
        execute: (input) => projects.createProject(opts.ownerId, input),
      }),
    },
  });
}

The route is a thin handler that streams, exactly like every other route in the template:

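// src/modules/ai/ai.routes.ts
import { zValidator } from "@hono/zod-validator";
import { Hono } from "hono";
import { streamAssistant } from "./ai.service.js";
// requireAuth, AuthVariables and chatInput come from the template's auth
// middleware and this module's schemas; their import paths are omitted here.
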
export const aiRoutes = new Hono<{ Variables: AuthVariables }>()
  .use(requireAuth)
  .post("/chat", zValidator("json", chatInput), (c) => {
    const result = streamAssistant({
      ownerId: c.get("user").id,           // the verified token, not the body
      messages: c.req.valid("json").messages,
    });
    return result.toUIMessageStreamResponse();   // Server-Sent Events
  });

And the frontend is a new api/ai/ domain plus a chat panel. Because the AI SDK ships a React binding, the streaming plumbing is one hook:

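// a chat route in dashboard-frontend
// (imports: useChat from "@ai-sdk/react", DefaultChatTransport from "ai")
const { messages, sendMessage, status } = useChat({
  // with the sendMessage-style useChat API, the endpoint is set on the transport
  transport: new DefaultChatTransport({ api: "/api/ai/chat" }),
});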
// render messages + tool calls; on a create_project tool result, revalidate the
// projects SWR cache so the table updates live — server state still has one owner.

Notice what didn’t change: auth still rides the same cookie, the request still goes through the proxy to one origin, and tool results that mutate data flow back into the same SWR cache that owns server state. The AI feature slots into the architecture instead of fighting it.

The security line you cannot cross

Re-read the execute closures above. The tools take an ownerId from the verified session, captured in the closure — the model supplies a status, never an owner. This is the single most important rule in product AI, and it’s easy to get wrong because the model is so helpful it’ll happily pass you an advertiserId it inferred from the conversation.

We shipped a fix for exactly this class of bug: “use the server-trusted advertiser for read tools, not the model’s guess.” The model’s job is to choose the action; your server’s job is to choose whose data. The backend is still the security boundary — an AI tool is just another caller, and owner-scoping applies to it the same as to a REST handler. Let the model pick the tenant and you’ve built a confused-deputy data leak with a friendly chat UI.
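
The rule is easy to pin down with a regression check. The sketch below is self-contained and uses hypothetical stand-ins (`projectRows`, `listProjects`, `buildTools`) rather than the real projects service. It shows a tool whose owner comes only from the session, even when the model sends an `ownerId` of its own.

```typescript
type Status = "active" | "paused" | "archived";
type Project = { id: string; ownerId: string; status: Status };

// In-memory stand-in for the projects service (hypothetical data).
const projectRows: Project[] = [
  { id: "p1", ownerId: "alice", status: "active" },
  { id: "p2", ownerId: "alice", status: "archived" },
  { id: "p3", ownerId: "bob", status: "active" },
];

// Like a REST handler's service call: every query is scoped by ownerId.
function listProjects(ownerId: string, status?: Status): Project[] {
  return projectRows.filter(
    (p) => p.ownerId === ownerId && (status === undefined || p.status === status),
  );
}

const STATUSES: readonly Status[] = ["active", "paused", "archived"];

// Accept only known status values from model input; anything else is dropped.
function parseStatus(value: unknown): Status | undefined {
  return STATUSES.find((s) => s === value);
}

// Tools are built per request; ownerId is captured from the verified session.
function buildTools(session: { ownerId: string }) {
  return {
    list_projects: {
      // `input` is whatever the model sent. Only the status field is read, so
      // an ownerId the model adds is never used.
      execute(input: Record<string, unknown>): Project[] {
        return listProjects(session.ownerId, parseStatus(input["status"]));
      },
    },
  };
}

// The model "helpfully" passes another tenant's id; the result stays scoped.
const aliceTools = buildTools({ ownerId: "alice" });
const scoped = aliceTools.list_projects.execute({ ownerId: "bob", status: "active" });
```

Here `scoped` contains only Alice's active project. A test like this is cheap to keep next to every tool that reads tenant data.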

Lessons from a production AI service

Things that aren’t obvious until they bite, condensed from running one for real:

Know what a wire schema can actually enforce. Tool inputSchema constraints are enforced by the provider’s structured-output decoder, but providers accept different subsets of JSON Schema, and a schema that oversteps a provider’s subset fails the whole call. Enums and number bounds (z.enum, .min/.max/.int) are hard-enforced and free, so use them. String length and array size are not: one provider silently tolerates them, another rejects the entire request. Ask for “3–5 items” / “≤160 chars” in the prompt, then clamp in JS after the model returns. We took a production outage learning this.
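
A minimal sketch of the "clamp in JS" step, assuming a hypothetical `Suggestions` output shape and using the 3–5 item and 160-character limits from the example above:

```typescript
// Hypothetical shape of a model's structured output.
type Suggestions = { title: string; items: string[] };

const MAX_CHARS = 160;
const MIN_ITEMS = 3;
const MAX_ITEMS = 5;

// Truncate on code points, not UTF-16 units, so emoji are not cut in half.
function clampString(value: string, maxChars: number): string {
  const chars = Array.from(value);
  return chars.length <= maxChars ? value : chars.slice(0, maxChars).join("");
}

// Enforce the limits the wire schema could not. Extra items are dropped; too
// few cannot be invented here, so the caller decides whether to retry.
function clampSuggestions(raw: Suggestions): { value: Suggestions; tooFew: boolean } {
  const items = raw.items
    .slice(0, MAX_ITEMS)
    .map((item) => clampString(item, MAX_CHARS));
  return {
    value: { title: clampString(raw.title, MAX_CHARS), items },
    tooFew: items.length < MIN_ITEMS,
  };
}
```

The schema sent to the provider stays within what every provider accepts, and the stricter limits live in ordinary code you can unit test.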

Stream typed chunks, not a blob. A real assistant stream isn’t just text deltas. It’s a discriminated union: text, tool-call, tool-result, canvas operations (structured “create/update node” events the UI applies live), approval requests (for destructive tools), and a finish chunk carrying token usage. Design the chunk types up front; the UI subscribes to the kinds it cares about.
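
As a sketch of what designing the chunk types up front can look like: the field names below are hypothetical, while the chunk kinds mirror the list above. The reducer is one possible UI consumer.

```typescript
// Hypothetical chunk shapes; the kinds mirror the list above.
type CanvasOp =
  | { op: "create"; nodeId: string; label: string }
  | { op: "update"; nodeId: string; label: string };

type StreamChunk =
  | { type: "text"; delta: string }
  | { type: "tool-call"; callId: string; tool: string; input: unknown }
  | { type: "tool-result"; callId: string; output: unknown }
  | { type: "canvas"; operation: CanvasOp }
  | { type: "approval-request"; callId: string; tool: string; input: unknown }
  | { type: "finish"; usage: { inputTokens: number; outputTokens: number } };

type ChatViewState = {
  text: string;
  nodes: Record<string, string>;
  pendingApprovals: string[];
  totalTokens: number;
};

const initialState: ChatViewState = {
  text: "",
  nodes: {},
  pendingApprovals: [],
  totalTokens: 0,
};

// Pure reducer. The switch is exhaustive, so adding a chunk kind without
// handling it becomes a compile error at the `never` assignment.
function applyChunk(state: ChatViewState, chunk: StreamChunk): ChatViewState {
  switch (chunk.type) {
    case "text":
      return { ...state, text: state.text + chunk.delta };
    case "canvas":
      return {
        ...state,
        nodes: { ...state.nodes, [chunk.operation.nodeId]: chunk.operation.label },
      };
    case "approval-request":
      return { ...state, pendingApprovals: [...state.pendingApprovals, chunk.callId] };
    case "finish":
      return {
        ...state,
        totalTokens: chunk.usage.inputTokens + chunk.usage.outputTokens,
      };
    case "tool-call":
    case "tool-result":
      return state; // this view ignores them; a debug panel might not
    default: {
      const unhandled: never = chunk;
      return unhandled;
    }
  }
}
```

Because the reducer is pure, a recorded stream can be replayed through `chunks.reduce(applyChunk, initialState)` in a test to check what the UI would end up showing.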

Approval flow for destructive tools. Mark a tool needsApproval: true; the server emits an approval-request chunk instead of executing; the client shows a confirm UI and replies; the server then runs it. “Delete all archived projects” should never fire on a model’s say-so alone.
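
The decision point in that flow can be sketched as one function. This is an illustration with hypothetical names (`resolveToolCall`, `ToolDef`, `Outcome`), not the production implementation. Tool execution is synchronous here only to keep the sketch short; real tools are usually async.

```typescript
type ToolCall = { callId: string; tool: string; input: unknown };
type Decision = "approved" | "denied";

type ToolDef = {
  needsApproval: boolean;
  execute: (input: unknown) => unknown;
};

type Outcome =
  | { type: "approval-request"; callId: string; tool: string; input: unknown }
  | { type: "tool-result"; callId: string; output: unknown }
  | { type: "tool-denied"; callId: string };

// `decisions` holds the user's replies sent with this request, keyed by callId.
function resolveToolCall(
  tools: Record<string, ToolDef>,
  call: ToolCall,
  decisions: ReadonlyMap<string, Decision>,
): Outcome {
  const def = tools[call.tool];
  if (!def) throw new Error(`Unknown tool: ${call.tool}`);

  if (def.needsApproval) {
    const decision = decisions.get(call.callId);
    // No reply yet: ask the client instead of executing.
    if (decision === undefined) return { type: "approval-request", ...call };
    if (decision === "denied") return { type: "tool-denied", callId: call.callId };
  }
  // Either no approval is needed or the user approved this exact call.
  return { type: "tool-result", callId: call.callId, output: def.execute(call.input) };
}
```

The important property is that `execute` is unreachable for an approval-gated tool until a decision for that specific `callId` arrives from the client.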

Track usage at the boundary or you’ll fly blind on cost. Wrap the model so every call records tokens against the tenant/feature. Retrofitting this after you have a bill is miserable; bake it in.
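
A minimal sketch of wrapping the model, assuming a hypothetical `Model` interface and `UsageSink` callback (in production the sink would write to your persistence layer):

```typescript
type Usage = { inputTokens: number; outputTokens: number };
type ModelResult = { text: string; usage: Usage };

// Hypothetical minimal model interface.
interface Model {
  generate(prompt: string): Promise<ModelResult>;
}

type UsageRecord = Usage & { tenantId: string; feature: string; at: Date };
type UsageSink = (record: UsageRecord) => void | Promise<void>;

// Wraps a model so every call is attributed to a tenant and a feature.
function withUsageTracking(
  model: Model,
  scope: { tenantId: string; feature: string },
  record: UsageSink,
): Model {
  return {
    async generate(prompt) {
      const result = await model.generate(prompt);
      // Record before returning, so a call never completes unaccounted for.
      await record({ ...scope, ...result.usage, at: new Date() });
      return result;
    },
  };
}
```

Because the wrapper returns the same `Model` interface, the Logic that receives it does not know tracking exists. Whether a failed write to the sink should fail the request is a design choice; this sketch lets it propagate.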

Prompts live inline in code, on purpose. Build system prompts with template literals in the Logic, versioned with the code — editing a prompt is a code change + deploy + the same review as any other logic. We piloted a DB-backed prompt registry and tore it out; “edit prod prompts live” sounds great until an untested prompt change silently degrades every response with no diff to point at.
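
For illustration, here is one shape the `buildSystemPrompt()` referenced in the orchestration code could take. The prompt wording and the `PromptContext` fields are invented for this sketch. The default of 8 tool steps matches the `stepCountIs(8)` bound used earlier.

```typescript
// Hypothetical inputs; the earlier call site passes none, so all are optional.
type PromptContext = { today?: string; maxToolSteps?: number };

function buildSystemPrompt(ctx: PromptContext = {}): string {
  const { today = new Date().toISOString().slice(0, 10), maxToolSteps = 8 } = ctx;
  // A plain template literal: every edit shows up as a diff in code review.
  return `
You are the assistant inside a project dashboard.
Today's date is ${today}.

Rules:
- Only act on the signed-in user's data, using the provided tools.
- Ask for confirmation before creating or changing anything.
- Use at most ${maxToolSteps} tool steps per request.
`.trim();
}
```

Since the prompt is a pure function of its inputs, it can be snapshot-tested like any other logic, which is the property a live-editable registry gives up.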

What not to do (product AI)

Takeaways