
Nobody Is Stopping Me: What I Heard After Presenting AI Governance to Sitecore Teams

Akshay Sura - Partner

31 Mar 2026


Over the past few weeks, I have presented "AI Is Rewriting Sitecore. Are We Ready?" to various user groups.

Brazil, Hyderabad, and an in-person lunch-and-learn in Australia.

Different regions. Different audiences. Different levels of experience.

I expected different questions.

What I got instead were the same patterns, the same behaviors, and the same blind spots. In every room.

But the conversations that followed the slides mattered more than the presentation itself. Because practitioners started sharing stories. Not hypotheticals. Things that already happened.

This is what they told me.

The pattern I did not expect

No matter where the conversation happened, the reactions were consistent.

Not "AI is scary." Not "AI is amazing."

Something more subtle.

People are already using it everywhere. They just are not talking about how.

Developers are prompting instead of designing. Marketers are generating instead of structuring. Data teams are integrating systems without traditional engineering oversight. And the justification is always the same.

It works.

One customer I work with told me, very casually, that his team was building an integration between Shopify and their ERP. One developer. The rest of the team were people who spent their days in Excel spreadsheets. All using AI. When I asked what would happen six months from now if something broke, the answer was, "It does not bother me because it works and no one is stopping me."

That phrase came up in every single room. No one is stopping me.

That is not a technology problem. That is a governance vacuum.

AI lowered the cost of building. Not the cost of maintaining.

This came up multiple times, and it is worth stating clearly.

AI absolutely reduces the time it takes to build something. But it does nothing to reduce the time it takes to understand it later. If anything, it makes it worse.

At the lunch-and-learn, an architect made an observation that has stuck with me ever since. He pointed out that we used to have an evolution of a component. You could look at the git history and see how the code changed over time. Each commit told a story. You could trace decisions and backtrack if something went wrong.

AI destroys that. It does not make incremental changes. It replaces the entire thing. Every iteration is new. There is no continuity between one version and the next. You cannot look at the diff and understand what changed because everything changed.

2,000 lines of code and serialized items appear overnight in your Sitecore CLI or Unicorn repository. No trail. No progression from simple to complex. Just a block of generated output that appeared fully formed.

Six months later, when something breaks during a publishing operation, you open the file and find nothing to read except the code itself. No history of how the content tree was structured or why. The code is not necessarily bad. It is unknowable.

When AI gets stuck, so do you

In Hyderabad, one of the attendees shared a painfully relatable debugging story.

She had a bug. Stale media content was being served from the edge delivery CDN. The Vercel layer showed the correct image, but the edge content was wrong. She started feeding logs to the AI. It suggested regenerating environment credentials. She pushed back. It gave another suggestion. Then another. Twenty or thirty options over the course of a full day. None of them was right.

The next morning, she woke up, thought about it manually for a few minutes, and found the answer. Two images had the same name. The system was picking the first one published.

Five-minute diagnosis for a human who steps back and thinks about the system. A full day wasted with AI because the AI does not understand the system. It is pattern matching against symptoms. And when the answer is simple but contextual, it spirals.

I have been there. I spent a 17-hour day once, hyper-focused on one thing with Claude, going back and forth. Code changes. Recompile. Test. The AI kept attacking the problem from different angles. There is a lot of recursion when you do AI coding. You get stuck in a loop together and do not realize it until you step out.

The next morning, I solved it in five minutes.

The double-edged sword is real. On another occasion, I had a HubSpot production integration break in a React project. No React developers available. I am not a React developer. Will never be one. The AI reviewed the entire codebase in about 25 minutes and found the fix. One line change. I would not have found that without it.

Sometimes it saves you. Sometimes it burns an entire day generating plausible wrong answers while you chase them. The problem is that you do not realize which one you are in until it is too late.

AI confidence is the actual risk

At the lunch-and-learn, someone shared a story that stopped the room.

His team was migrating a system with 10,000 items. Every item had an image. The customer wanted AI to generate alt tags for all of them. Classic use case. You do not want a human to look at 10,000 images and write descriptions by hand.

He set it up. Tested a few images. The results were perfect. Ran it on the entire database.

Then he spot checked.

Some were great. Some were completely wrong. Not close. Not even on the right topic. One image was a photograph of a river. The AI described it as a woman in black, wearing a white blouse and sunglasses, on a table.

He went back and asked why. The AI said it could not read the image.

The prompt had explicit guardrails. Do not hallucinate. If you cannot read the image, return blank. Do not make anything up. Return null, and we will handle those by hand.

It did not follow those instructions. It generated confident, fluent, completely fabricated descriptions instead of admitting it could not see what it was looking at.

Now he has 10,000 items and cannot trust any of them. Because the ones that are wrong look exactly like the ones that are right. There is no signal. No flag. Just polished, confident output that happens to be fiction.

That is the core problem with AI validation. The failures do not look like failures. They look like success.
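When failures look like successes, "I tested a few and they were perfect" proves almost nothing. One defense is structured sampling with an explicit confidence target. The sketch below is not from the story; it uses a standard zero-failure sampling bound, and the function name and thresholds are illustrative. It answers a practical question: how many of 10,000 generated alt tags must a human review, and find clean, before claiming the error rate is probably low?

```python
import math

def sample_size_for_zero_failures(confidence: float, max_error_rate: float) -> int:
    """How many randomly sampled items must pass human review before
    we can claim, at the given confidence level, that the true error
    rate is below max_error_rate. Derived from the requirement that a
    batch with error rate p survives the sample only with probability
    (1 - p)^n <= 1 - confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - max_error_rate))

# To claim fewer than 1% bad alt tags at 95% confidence, review this
# many randomly sampled items and find zero errors:
n = sample_size_for_zero_failures(0.95, 0.01)  # 299 items
```

The same arithmetic explains why testing "a few images" meant nothing: a clean sample of five only rules out error rates above roughly 45 percent.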

The real bottleneck is not generation. It is review.

AI can generate 50 content variants, 12 regional pages, and thousands of lines of code in minutes. But your review process did not change.

Output grows. Review does not. Something has to give. Either you slow down output, or you lower review standards. Most teams are choosing the second without realizing it.

I saw this play out in a specific way with a customer. Their marketing team was generating landing pages for twelve regional markets using Sitecore AI content agents. Three of them referenced a product feature not available in those regions. Nobody caught it because the output looked polished. The pages cleared workflow, passed through publishing, and went live. The content was fluent, clean, and confidently wrong.

And that was just twelve pages. Imagine the same team generating hundreds of variations, with Sitecore Personalize running AI-suggested rules on top of it, pushing content through publishing workflows that nobody reviewed.

The false sense of validation

This one is more subtle, but more concerning.

The same architect from the lunch-and-learn made a point I have not been able to shake.

He said unit tests only have value if you write them first.

If you write the test after the code, the test has no value. It is not there to exercise the code. It is there to reflect the business rule. You do not need a unit test to prove the computer can add one and one. You need a unit test to prove that if you put one in this field and one in that field, the answer should be three.

When AI generates the code and then generates the tests for that code, every test passes. Green across the board. And it creates a false sense of security that is worse than having no tests at all. Someone looks at the results, sees green, and says we are covered. But the tests are not verifying business rules. They are verifying that the code does what the code already does. Circular. Not validation.

He proposed something worth thinking about. Have AI write business rules from the requirements. Then have AI write unit tests from those rules. Then have AI summarize the tests back into business rules. If the round trip does not meet the original requirements, there is a gap.

Like translating from one language to another and back. If the meaning survived, you probably have a reasonable representation. Nobody does this today. But it is the kind of thinking that separates teams who are deliberate from teams who are just fast.

The Replit escalation

This story captures the governance problem better than any framework.

A customer called and asked us to decommission a site. They had built a replacement using Replit. I looked at it. It was for their charity wing. A brochure site for a program that gives to schools. Simple. Low risk. I thought, fair enough.

A couple of weeks later, the same customer. Can you decommission another site?

This time it was different. This site had been built over two and a half years. It was the result of conversations with 35 brands inside the company. It coordinated with the sticker manufacturer for serial numbers. It had QR codes printed on millions of physical products that had already shipped and were sitting on shelves.

They replaced it with something built in Replit.

The new version kind of worked. Not completely. It was calling their backend APIs. It was hitting their Elastic indexes. It was bypassing every architectural decision that had been made about how their content tree, their data layer, and their publishing pipeline were supposed to work. I do not know if they used a read-only key.

When I pushed back, the answer was the same. No one is stopping me. I am just being asked to produce.

The first request was reasonable. The second one was reckless. And no one within that organization flagged the difference. That is what governance erosion looks like. It is not dramatic. It is incremental. Each step feels small. Until you are standing in a place where millions of physical products point to a site built by marketers in a weekend.

Who should set the guardrails?

In Brazil, someone asked whether Sitecore should be doing more to provide governance frameworks for customers.

My answer was no. And I feel strongly about it.

Sitecore should provide the tools. And they are. Sitecore AI, Agentic Studio, the marketplace, the APIs. That is a massive step forward from where Stream was a year and a half ago.

But governance has to come from the individual organization. Every company operates differently. Every company has different risk tolerances, compliance requirements, and team structures. It is not right for a platform vendor to tell a pharma company, a retail brand, and a nonprofit to follow the same guidelines.

That is the company's job. Legal, the C-suite, and engineering leadership. Some are doing it. One of our pharma customers requires every contractor and employee to complete four hours of responsible AI training. They label documentation as public or restricted. They define what can go into a public AI prompt and what cannot. They still require pen testing, intrusion testing, and code coverage thresholds on every build.

That is what governance maturity looks like. It does not come from the CMS vendor. It comes from the people using the CMS.

The role shift nobody is talking about

Another theme that surfaced in every room.

The job is changing.

Developers are not just building anymore. They are reviewing and validating what AI generates. Marketers are not just creating content. They are filtering and approving AI output. The role is shifting from creator to reviewer. Builder to validator.

That is not a bad thing. But it requires a different kind of discipline. And that discipline is not yet fully formed.

One of the attendees in Brazil raised a point that hit hard. He said organizations are using AI as an excuse to not value experience. Architects with over a decade of experience are struggling to find work because companies are asking whether they even need senior people anymore. Whether they even need agencies. Whether three developers with AI tools can replace a team of fifteen.

The short-term math might look like it works. The six-month math almost certainly does not. Because the people you need most are not the ones who can generate. They are the ones who can validate, review, and catch mistakes the AI will not catch on its own.

What three rooms taught me

The published post was the thesis. Unsupervised AI is the risk. Governance matters. Own what you ship.

The conversations taught me something the slides did not capture.

The problems are not theoretical. They are happening right now. Alt tags that hallucinate confidently. Production sites replaced by Replit builds. Debugging sessions where the AI leads you in circles for a day. Codebases with no history. Unit tests that validate nothing. Marketing teams deploying without engineering awareness. And in every room, the same quiet recognition.

We know this is a problem. We just have not figured out how to talk about it yet.

I think that is the real value of presenting this in person. The slides give people a framework. The conversations give people permission to say what they have been experiencing.

Every team I talked to is experimenting with AI. That is normal. Most do not yet have formal governance. Also normal. But the gap between adoption and oversight is growing faster than most people realize.

The goal is still the same. Not perfection. Guardrails.

But now I have a much clearer picture of what the absence of guardrails actually looks like in practice.

It looks like a picture of a river described as a woman in sunglasses.

It looks like millions of QR codes pointing to a site that kind of works.

It looks like a developer and an AI stuck in a loop at 2 am, both equally confused.

Those are not hypotheticals. Those are from last month.


Akshay Sura


Akshay is a ten-time Sitecore MVP and a two-time Kontent.ai MVP. In addition to his work as a solution architect, Akshay is also one of the founders of SUGCON North America 2015, SUGCON India 2018 & 2019, Unofficial Sitecore Training, and Sitecore Slack.

Akshay founded and continues to run the Sitecore Hackathon. As one of the founding partners of Konabos Consulting, Akshay will continue to work with clients, leading projects and mentoring their existing teams.

