Akshay Sura - Partner
30 Apr 2026
A few weeks ago I gave a talk called "AI Is Rewriting the CMS. Are We Ready?" The follow-up was titled "Nobody Is Stopping Me." That second title came out of a real moment. I was talking with a room of Sitecore practitioners about AI governance, and the most consistent reaction was not pushback. It was a shrug. People wanted to know what they should be doing, and most of them admitted that nobody at their company was actually stopping them from doing whatever they wanted with AI tools.
That is the gap I keep coming back to. Most organizations have an AI policy now. Very few have anything that looks like an audit trail. The policy lives in a PDF. The work lives in tools nobody is watching.
We decided to do something concrete about it on our side. This post is about a small piece of tooling we built for Claude Code, why we built it, and why I think it matters more than it looks.
When a developer uses Claude Code on a codebase, that interaction generates a lot of value. A prompt gets typed. Code gets generated. Decisions get made. The code ends up in git. The reasoning behind it does not.
Six months later somebody opens a file and asks: why does this component work this way? Git tells you who committed it and when. It does not tell you what was asked of the AI to produce it, or what the AI actually said back. That context is gone.
For a consultancy this is more than a curiosity. We have three goals that are hard to meet without it:
Auditability. When a client asks how we used AI on their project, we should have a real answer. Not a policy document. A log.
Memory. When somebody on the team needs to understand what was decided last week, or why we went with one approach over another, the prompts and responses are often more revealing than the final code.
Provenance. When a generated component starts behaving oddly, the fastest path to understanding it is reading the prompt that produced it.
None of those are exotic asks. They are the same things any mature engineering practice expects from its other tools. Compilers leave logs. CI systems leave logs. Code review leaves a record. AI-assisted development, in most shops, leaves nothing.
Claude Code has a feature called hooks. Hooks are shell commands that run at specific points in the agent's lifecycle: when a user submits a prompt, when a tool is called, when the agent finishes responding. They are deterministic. They run every time. The model cannot decide to skip them.
We wrote three hooks that capture what happens during a Claude Code session into a per-session log file inside the repo:
A prompt logger that fires when a user submits a prompt and records what was asked.
A tool logger that fires after every Write, Edit, or MultiEdit call and records which files Claude touched.
A response logger that fires when the agent finishes responding and records what Claude said back.
The logs land in ai-history/YYYY-MM-DD/{session}__{user}__{host}.jsonl. One file per session. Each line is a JSON entry. The directory is committed to the repo. Pull request reviewers can see what prompts produced the diff. Six months from now, anyone can search for a component name and find the prompt that generated it.
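To make that concrete, here is roughly what a single prompt entry might look like. The field names beyond the ones visible in the hook code later in this post (type, session, prompt, git, cwd) are illustrative, not the exact schema:

```json
{
  "ts": "2026-04-30T14:12:09.412Z",
  "type": "prompt",
  "session": "a1b2c3d4",
  "mode": "standard",
  "prompt": "Refactor the hero banner component to read its CTA from the layout service",
  "git": { "branch": "feature/hero-banner", "commit": "9f3c2ae" },
  "cwd": "/work/client-repo"
}
```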
That is the entire system. No database. No external service. No dashboard. Just files in your repo, in a format that search tools already know how to handle.
The first version captured everything: prompts, file paths, file contents, responses. It worked. It was also overkill for most repos.
The thing we kept coming back to: file content is already in git. Logging the content of every file Claude writes duplicates what git already does, and the duplicate goes stale the moment somebody edits the file. Worse, it bloats the log to the point where searching it becomes annoying.
So we settled on three capture modes. The default is the middle one.
| Mode | Prompt | File paths | File content | Response | Use when |
|---|---|---|---|---|---|
| minimal | yes | yes | no | no | Client repos where you want a paper trail of asks but not Claude's prose |
| standard | yes | yes | no | yes | Default. Internal repos and most client work |
| full | yes | yes | yes | yes | When you need to capture exactly what Claude wrote even if files later change |
A single file controls it: .claude/ai-history.config.json. Change the mode, and the next prompt is captured at the new level. Every entry is stamped with the mode it was written under, so a repo that changes modes mid-project is still self-describing.
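Under the hood, the mode just resolves to a set of booleans that each hook consults before writing anything. A minimal sketch of that lookup, using the captureFlags helper name you will see imported in the hook code below (the real implementation may differ):

```js
// Sketch: translate a capture mode into per-field flags, mirroring the
// table above. An unknown or misspelled mode falls back to "standard"
// rather than silently disabling logging.
const MODES = {
  minimal:  { prompt: true, paths: true, content: false, response: false },
  standard: { prompt: true, paths: true, content: false, response: true  },
  full:     { prompt: true, paths: true, content: true,  response: true  },
};

export function captureFlags(mode) {
  return MODES[mode] ?? MODES.standard;
}
```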
For a regulated client engagement we will often start with minimal. The auditability question is satisfied: every prompt against the codebase is logged, with user, host, timestamp, git branch, and git commit. The response text is not. For internal Konabos work we use standard, which adds Claude's responses to the log, because the back-and-forth is genuinely useful to revisit. We rarely use full, because the file-content question is already solved by git itself.
The whole thing is four files in .claude/ and one directory at the repo root. Here is the configuration that wires the hooks into Claude Code:
```json
{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "node .claude/hooks/log-prompt.mjs" }] }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [{ "type": "command", "command": "node .claude/hooks/log-tool.mjs" }]
      }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "node .claude/hooks/log-response.mjs" }] }
    ]
  }
}
```

The mode lives in a separate config so it is easy to find and easy to change without touching the hook wiring:
```json
{
  "mode": "standard"
}
```

The hooks themselves are short. Here is the prompt logger, which is the simplest of the three:
```js
#!/usr/bin/env node
import { readStdin, appendEntry, gitInfo, truncate, loadConfig, captureFlags } from "./_lib.mjs";

(async () => {
  const raw = await readStdin();
  let input = {};
  try { input = JSON.parse(raw); } catch { process.exit(0); }

  const cwd = input.cwd || process.cwd();
  const cfg = loadConfig(cwd);
  const flags = captureFlags(cfg.mode);
  if (!flags.prompt) process.exit(0);

  const session = input.session_id || input.sessionId || "no-session";
  const prompt = input.prompt || input.user_prompt || "";

  appendEntry(session, {
    type: "prompt",
    session,
    prompt: truncate(prompt, cfg.truncate),
    git: gitInfo(cwd),
    cwd,
  }, cwd);

  process.exit(0);
})().catch(() => process.exit(0));
```

A few choices in here are worth calling out. The hook reads from standard input because that is how Claude Code passes event data. It captures the git branch and commit at the moment of the prompt, which is the breadcrumb you actually want six months later. And it catches every error and exits zero, because logging must never block the agent. If our hook crashes, your prompt still goes through.
The shared library handles three things that all the hooks need: loading the mode config, redacting secrets, and computing the per-session file path. Secret redaction matters because pasted content gets logged too, and the last thing you want in a committed file is somebody's API key. The redaction list catches Anthropic keys, GitHub tokens, JWTs, AWS keys, bearer tokens, connection-string passwords, and long hex strings. It is best-effort, not magic. The rule we tell ourselves is the same rule we already had: do not paste secrets into prompts.
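To make that concrete, here is a sketch of the helpers that do the sensitive work: the redaction pass and the per-session file path. The patterns and helper bodies are illustrative, not our exact implementation:

```js
import { mkdirSync, appendFileSync } from "node:fs";
import { join } from "node:path";
import { userInfo, hostname } from "node:os";

// Sketch: best-effort secret redaction. Each matching pattern is replaced
// with a placeholder before the entry hits disk. Illustrative, not
// exhaustive -- "do not paste secrets into prompts" still applies.
const SECRET_PATTERNS = [
  /sk-ant-[A-Za-z0-9_-]{20,}/g,                 // Anthropic API keys
  /gh[pousr]_[A-Za-z0-9]{36,}/g,                // GitHub tokens
  /eyJ[\w-]+\.[\w-]+\.[\w-]+/g,                 // JWTs
  /AKIA[0-9A-Z]{16}/g,                          // AWS access key IDs
  /[Bb]earer\s+[A-Za-z0-9._~+/-]{20,}/g,        // bearer tokens
  /([Pp]assword|[Pp]wd)=[^;"'\s]+/g,            // connection-string passwords
  /\b[0-9a-f]{40,}\b/g,                         // long hex strings
];

export function redact(text) {
  return SECRET_PATTERNS.reduce(
    (out, pattern) => out.replace(pattern, "[REDACTED]"),
    String(text)
  );
}

// Sketch: the per-session path described above:
// ai-history/YYYY-MM-DD/{session}__{user}__{host}.jsonl
export function sessionLogPath(session, cwd) {
  const day = new Date().toISOString().slice(0, 10);
  const dir = join(cwd, "ai-history", day);
  mkdirSync(dir, { recursive: true });
  return join(dir, `${session}__${userInfo().username}__${hostname()}.jsonl`);
}

// Sketch: append one redacted, timestamped JSON line to the session log.
export function appendEntry(session, entry, cwd) {
  const line = JSON.stringify({ ts: new Date().toISOString(), ...entry });
  appendFileSync(sessionLogPath(session, cwd), redact(line) + "\n");
}
```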
The point of all this is the search workflow. Once the logs are in the repo, you can find, among other things:
The prompt that produced a given component. Search for the component's name.
The reasoning behind a decision. The back-and-forth is in the log even when the final code does not show it.
Everything that happened in a given session, on a given branch, or on a given day.
That is the whole interface: a plain-text search across ai-history/, and it stays instant even at year-scale. No web UI, no service to maintain, no vendor to depend on.
The cross-developer angle matters here too. Different people on our team work from different machines. The session filename includes the user and the host, and every entry inside has the same fields. So a question like "who used Claude Code on this repo last quarter" is a search, not a meeting.
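And if raw text search ever feels too blunt, the JSONL format makes a small query script trivial. A sketch, assuming the directory layout and fields described above (search-history.mjs is a hypothetical helper, not part of the kit):

```js
#!/usr/bin/env node
// Sketch: print every log entry that mentions a search term.
// Usage: node search-history.mjs HeroBanner
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

const term = process.argv[2];
if (!term) {
  console.error("usage: node search-history.mjs <term>");
  process.exit(1);
}

for (const day of readdirSync("ai-history")) {
  for (const file of readdirSync(join("ai-history", day))) {
    for (const line of readFileSync(join("ai-history", day, file), "utf8").split("\n")) {
      if (!line.includes(term)) continue;
      const entry = JSON.parse(line);
      const text = entry.prompt || entry.response || JSON.stringify(entry);
      console.log(`${day}  ${file}  [${entry.type}]  ${text.slice(0, 120)}`);
    }
  }
}
```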
Konabos works on Sitecore, headless architecture, composable DXP. The ecosystem is moving fast. AI is now part of how content is produced, how code is written, and increasingly how runtime decisions get made inside a CMS. Our clients are asking about this. The serious ones want to know not just whether we use AI, but how we govern it.
A committed prompt log is a small thing. But it is a concrete answer to a question that most consultancies are still answering with hand-waving. It says: here is what we asked the AI, here is what it did, here is the trail. If you want to audit our work, here it is. If you want to understand a decision, the prompt is in the repo next to the code.
It also pulls AI governance out of the policy document and into the engineering discipline where it belongs. A PDF that says "we use AI responsibly" is not a control. A hook that runs on every prompt and writes to a file in the repo is a control. The difference matters.
There is one more thing this changes, which I did not expect when we started building it. When you know your prompts are being committed, you write better prompts. You think more carefully about what you are asking for. You leave more context in the prompt because you know future-you will read it. The log is not just a record; it nudges the practice in a healthier direction.
A few things this is deliberately not.
It is not a dashboard. We have no plans to build a UI on top of the logs. The terminal and search are good enough.
It is not a policy enforcement tool. The hooks log; they do not block. If a developer is doing something they should not, the log catches it after the fact, not before. Policy enforcement is a different problem with different tools.
It is not free of judgment. We decide per engagement whether to install the kit and which mode to run. Some clients want everything captured. Some prefer minimal. Some regulated environments require a conversation with the client before we turn it on at all. The default is "ask before you turn it on, and document the answer in the repo's main README."
And it is not a substitute for the harder governance work: defining what AI is allowed to do on a project, what it is not, who reviews AI-assisted changes, and how those reviews differ from human code review. The log is a foundation. The actual policy goes on top.
The kit is running on our internal repos. We are rolling it into client engagements deliberately, project by project, with the conversation about mode happening as part of kickoff.
If you are running Claude Code on a codebase that anyone else is going to read, six months from now or six minutes from now, I would encourage you to set up something like this. The implementation is small. The discipline it creates is not.
The receipts are the point. Have them.

Akshay is a ten-time Sitecore MVP and a two-time Kontent.ai MVP. In addition to his work as a solution architect, Akshay is also one of the founders of SUGCON North America 2015, SUGCON India 2018 & 2019, Unofficial Sitecore Training, and Sitecore Slack.
Akshay founded and continues to run the Sitecore Hackathon. As one of the founding partners of Konabos Consulting, Akshay will continue to work with clients, leading projects and mentoring their existing teams.