
Top 8 Issues of AI-Driven Development and How to Keep Them Out of Production

Catherine Edelveis


AI coding is not the problem. Blind trust in AI coding is the problem.

And oh, how tempting it is to trust! The output is clean. It's confident. It even has comments. It looks like it was written by someone experienced who knew exactly what they were doing. But that is the trap, and plenty of developers have already burned themselves on it.

So let's go through the eight ways AI-generated code turns from a productivity boost into debugging debt, security exposure, and production chaos, and what to do about each one.

1. The Trust Gap: “Almost Right” Code is Worse than Wrong

The question stopped being "Can AI write code?" a long time ago. Of course it can. The question every developer should be asking is: can I trust this code?

Stack Overflow's 2025 survey found that 66% of developers are frustrated by AI solutions that are "almost right, but not quite."

Almost right is the dangerous part. Completely wrong code announces itself: it won't compile, or it fails the first time you run it. But what does “almost right” mean? In an enterprise system, it means:

  • The happy path works but the edge cases fail;
  • The timeout handling on a perfectly valid API call does the wrong thing;
  • The concurrency code passes tests locally and then falls over the first time real traffic hits it.

Meanwhile, the work moves on, and the bug just waits.

Which is why developers now spend their time on the unglamorous part: reading generated code line by line, reproducing the problem, checking whether a dependency actually exists, inspecting whether a generated test tests anything at all. Then asking the model to redo it, or redoing it themselves.
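To make "almost right" concrete, here is a minimal, hypothetical Python illustration (all function names are made up for this sketch): a pagination helper whose happy path works while a partially filled last page is silently dropped, next to the kind of generated test that passes anyway and the edge-case checks that actually catch the bug.

```python
def page_count_almost_right(total_items: int, page_size: int) -> int:
    # Plausible-looking, and correct whenever total_items divides evenly...
    return total_items // page_size


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items items, page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    # Ceiling division: a partially filled last page still counts as a page.
    return (total_items + page_size - 1) // page_size


def happy_path_only(fn) -> bool:
    # The kind of generated test that passes for both versions and proves little.
    return fn(100, 10) == 10


def edge_cases(fn) -> bool:
    # Partial last page, empty input, and fewer items than fit on one page.
    return fn(101, 10) == 11 and fn(0, 10) == 0 and fn(5, 10) == 1


if __name__ == "__main__":
    for fn in (page_count_almost_right, page_count):
        print(fn.__name__, happy_path_only(fn), edge_cases(fn))
    # page_count_almost_right True False
    # page_count True True
```

The happy-path check passes for both versions, which is exactly why a suite made only of checks like that proves little; the edge-case checks are what tell the two apart.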

ITPro reports that 45% of developers lose time debugging AI-generated code, and a METR study of experienced open-source developers found that AI assistance actually made them slower, even though they felt faster.

That's the dark side of the trust gap: you feel faster, but in reality you may well be slower.

The fix starts in the developer's head, before any tooling gets involved: distrust AI code by default. However polished it looks, route it through real testing and vulnerability checks. And do the planning up front. Define your requirements, constraints, and goals before the model writes a line, in the spirit of spec-driven development. Hand it a detailed execution plan to follow rather than one bulk "go build it" prompt, and you keep the model inside boundaries you chose instead of cleaning up after the ones it invented.

2. Security Vulnerabilities in Generated Code

Buggy code is one problem. Insecure code that compiles, passes every test, and ships with a bunch of vulnerabilities is a worse one.

Veracode's 2025 GenAI Code Security Report found that AI-generated code introduced security flaws in 45% of tests, and no major language came out clean. Java had a particularly rough showing, with over 70% of outputs in Veracode's task set landing as vulnerable. Weak defaults, unsafe input handling, poor authentication, SQL injection, insecure serialization, vulnerable dependencies… A little pet shop of horrors, I would say.
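SQL injection is the easiest item on that list to show. The sketch below is a hypothetical Python example using the standard-library `sqlite3` module and an in-memory table (the table and function names are made up): the first function splices user input into the query string, a pattern that looks clean and works for ordinary names, while the second uses a parameterized query so the input is always treated as data.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Looks fine for "alice", but user input becomes part of the SQL itself.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized query: the driver passes username as a value, never as SQL.
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchall()


def demo() -> tuple[list, list]:
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
        conn.executemany(
            "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
        )
        payload = "nobody' OR '1'='1"  # classic injection string
        return find_user_unsafe(conn, payload), find_user(conn, payload)
    finally:
        conn.close()


if __name__ == "__main__":
    leaked, safe = demo()
    print(len(leaked), len(safe))  # 2 0
```

With the crafted input, the first function returns every row in the table, while the parameterized version returns nothing. The same principle applies in Java: use `PreparedStatement` with bound parameters instead of concatenating strings.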

But wait, it gets worse! A 2026 study on AI coding assistants found that developers relocated security from "write secure code" to "review generated code later." In observed sessions, participants didn’t put security requirements in their initial prompt, even when they knew the risks.

So the productivity gain becomes a vulnerability pipeline: generate fast, review later, miss something, patch under pressure, repeat. Run that loop long enough and one of the misses becomes a data breach.

Security has to be a design-time decision, made before the model writes anything. Put your security requirements into the prompt itself; the OWASP Secure Coding Practices checklist is a good starting catalog. Then run SAST, SCA, and secret scanning over whatever comes back, and require a human security review for anything sensitive. Build on a base you trust to receive timely security patches, preferably hardened base images, which come with tighter security, built-in provenance, and low-to-zero CVEs by default. BellSoft Hardened Images are a solid, fully open-source choice for that.

The model doesn't know your threat model, and using it doesn't make you any less accountable.

3. Supply Chain Risk: Hallucinated Dependencies

AI does not only hallucinate code. It can also hallucinate dependencies. That quirk has spawned an entire attack class called slopsquatting.

The mechanics are simple. The model confidently suggests a plausible-sounding package that doesn't exist. An attacker registers that name and ships it with a malicious payload. In this scenario, the attacker doesn't have to break your code at all. They just wait for your toolchain to trust a name your AI made up. Trend Micro has a good write-up on how this plays out in practice.

The long game is poisoned automation. A coding agent generates the dependency, CI installs it, scanners don't catch the mismatch, and the malicious package gets normalized inside your build. After that, the attacker is helping themselves to credentials, environment variables, build secrets, and quite possibly your source and infrastructure.

Therefore, never install an AI-suggested dependency on faith. Before anything enters your tree, verify not only that the package exists but also who maintains it, its provenance, release history, license, vulnerability status, and whether it belongs in your approved catalog. Lean on SCA, lockfiles, SBOMs, provenance checks, and dependency allowlists, and isolate the install in a sandbox so only validated artifacts make it through. Because as it turns out, a package name is the easiest thing in the world to fake.
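As one small piece of that, a dependency allowlist can run as a CI step before anything is installed. The sketch below is a hypothetical Python check over `requirements.txt`-style lines: the `APPROVED` catalog and the messages are assumptions, names are normalized the way PEP 503 describes, and anything unpinned or outside the catalog (including a name an assistant invented) is reported. It does not replace SCA, lockfiles, or provenance checks; it only ensures that no unknown name reaches the installer.

```python
import re

# Hypothetical approved catalog, e.g. exported from an internal artifact repository.
APPROVED = {"requests", "sqlalchemy", "pydantic"}

# Accepts only plain "name==version" lines; extras, URLs, etc. are rejected.
PINNED = re.compile(r"([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*(\S+)")


def normalize(name: str) -> str:
    # PEP 503: names are case-insensitive and runs of "-", "_", "." are equivalent.
    return re.sub(r"[-_.]+", "-", name).lower()


def check_requirements(lines: list[str]) -> list[str]:
    """Return one problem description per line that should block the build."""
    approved = {normalize(n) for n in APPROVED}
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = PINNED.fullmatch(line)
        if not match:
            problems.append(f"unpinned or unsupported requirement: {line}")
            continue
        name = normalize(match.group(1))
        if name not in approved:
            problems.append(f"not in approved catalog: {name}")
    return problems
```

A non-empty result should fail the pipeline. The sketch deliberately rejects anything it doesn't recognize rather than guessing what it means.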

4. Untested AI Code Reaching Production

You wouldn't push human-written code to production without testing it. Apparently, a lot of teams feel differently about machine-written code.

Tricentis' 2026 Quality Transformation Report says 60% of organizations ship untested AI-generated code, and 32% do it deliberately because executives are pushing for speed. Meanwhile, GitClear's 2025 research found a sharp rise in copy-pasted and cloned code, including 4x more cloning than before the AI boom.

The short-term result is more bugs and security gaps. The long-term result is the one that costs money. A codebase full of duplicated logic, inconsistent style, weak abstractions, and missing tests is a codebase nobody wants to refactor. But eventually, somebody has to! And the bill arrives as slower releases, fragile changes, higher incident rates, and a quarter of your engineering capacity spent cleaning up the vibes.

The remedy is boring and effective: make AI code earn its way to production through the same gates everything else does. Define CI quality gates with linting, static analysis, type checks, unit, integration, and security tests, and scan the finished artifacts for known CVEs so critical ones don't ride along into prod.
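The CVE gate at the end of that list can be as simple as a severity threshold. The sketch below assumes a hypothetical, simplified report format (a JSON array of objects with `id` and `severity` fields); real scanners emit their own schemas, so in practice you would map their output into this shape or rely on the scanner's own fail-on-severity option if it has one. Unknown or missing severities are treated as blocking, so the gate fails closed.

```python
import json

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def blocking_findings(report_json: str, threshold: str = "CRITICAL") -> list[str]:
    """Return the IDs of findings at or above the threshold severity."""
    limit = SEVERITY_ORDER.index(threshold)
    blocked = []
    for finding in json.loads(report_json):
        severity = str(finding.get("severity") or "").upper()
        # Unknown or missing severities block too: the gate fails closed, not open.
        if severity not in SEVERITY_ORDER or SEVERITY_ORDER.index(severity) >= limit:
            blocked.append(finding["id"])
    return blocked


def gate(report_json: str, threshold: str = "CRITICAL") -> int:
    """CI-style exit code: 0 lets the artifact through, 1 stops the pipeline."""
    blocked = blocking_findings(report_json, threshold)
    for finding_id in blocked:
        print(f"blocking finding: {finding_id}")
    return 1 if blocked else 0
```

In a pipeline script, ending with `sys.exit(gate(report_text))` turns any blocking finding into a failed build step.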

5. Privacy, IP, and Secret Leakage

There's a quieter risk that has nothing to do with whether the code works: where your code goes while the assistant is helping you write it.

Developers paste proprietary source, stack traces, logs, and even secrets into tools whose data handling isn't always clear enough for an enterprise risk model. Personal AI accounts make it worse, because now company code is flowing through services that sit entirely outside your retention policies and audit controls. A 2025 privacy scorecard found broad opacity across AI coding assistants: training on user data switched on unless you opt out, and a near-universal failure to proactively filter secrets out of prompts. Netwrix went further and found credentials stored as plaintext JSON in predictable local paths. And SC World reported GitGuardian findings that public GitHub secret leaks rose 34% in 2025, with AI-assisted commits twice as likely to leak a secret.

Then there's the licensing side. Sometimes generated code closely matches existing public code, which drags in attribution and IP questions. That public snippet might be GPL, Apache, or MIT, each with its own obligations. Plus, it might carry the original's bugs and vulnerabilities along for the ride.

So, lock the perimeter. Use approved enterprise tools only, block secrets in prompts, and forbid personal AI accounts for proprietary code. Run secret scanning locally and in CI, scan your git history, rotate anything that leaked, and reference vaults or environment variables instead of hardcoding credentials. Configure your assistant to flag or block suggestions that match public code, and read what it pulls in before you accept it.
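Blocking secrets in prompts can start with a filter that runs before any text leaves the developer's machine. The sketch below is a minimal Python redactor, assuming you have a hook or proxy in front of the assistant where such a filter can run. It covers only three well-known credential shapes (AWS access key IDs, classic GitHub personal access tokens, and PEM private key blocks); dedicated secret scanners ship far larger rule sets and should still run locally and in CI.

```python
import re

# A few well-known credential shapes; dedicated secret scanners ship far more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,  # key blocks span multiple lines
    ),
}


def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace likely secrets before a prompt is sent; return the text and what was found."""
    found: list[str] = []
    for name, pattern in SECRET_PATTERNS.items():
        prompt, count = pattern.subn(f"[REDACTED:{name}]", prompt)
        found.extend([name] * count)
    return prompt, found
```

The list of findings is worth logging: a redacted secret was still pasted somewhere it shouldn't have been, and it probably needs rotating.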

6. Agentic Tooling as a New Attack Surface

Agentic tools have outgrown autocomplete. They read files, run commands, install packages, call other tools, and edit your repo on your behalf. Handing more work to the agent is genuinely great, but only until we remember that it also means a bigger blast radius. And we are not only talking about a wiped-out production database: once an agent can act, prompt injection turns into real system actions.

A 2026 paper describes how hidden instructions tucked into external artifacts can hijack a coding assistant and turn it into, in their words, an "attacker's shell," running unauthorized commands with the developer's own privileges. The reported attack success rates ran from 41% to 84% across tested payloads, with assistants hunting for credentials, rewriting authentication config, and exfiltrating data.

The IDE is exposed, too: Tom's Hardware covered the IDEsaster research, which uncovered over 30 critical AI IDE flaws enabling data theft and remote code execution. This is what happens when your IDE becomes both the assistant and the attack surface.

The OWASP AI Security Verification Standard (AISVS) states the problem explicitly: the AI coding agent is an actor in your supply chain, with an identity and authority of its own. It can act on its own behalf, or be acted upon by an attacker. AISVS recommends written policies for when AI tools may generate, refactor, or review code, covering the whole secure software development lifecycle (SSDLC), from design through deployment and monitoring. Give the agent the same scrutiny you'd give a new hire with production access. Because that's what it is.
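One concrete form of that scrutiny is deny-by-default command execution. The sketch below is a hypothetical Python policy check that an agent harness could call before running a shell command: the `ALLOWED` table is an assumption you would tailor to your project, anything not explicitly listed is refused, and commands containing shell metacharacters are refused outright, so an injected `&& curl ...` cannot ride along behind an allowed command.

```python
import shlex

# Hypothetical policy: executable -> allowed first argument (None = any arguments).
ALLOWED = {
    "git": {"status", "diff", "log"},
    "pytest": None,
}

# Characters that would let one "command" chain, substitute, or redirect others in a shell.
SHELL_METACHARACTERS = set(";|&$`<>\n")


def is_allowed(command: str) -> bool:
    """Deny by default: allow only allowlisted executables and subcommands."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return False
    if not argv or argv[0] not in ALLOWED:
        return False
    allowed_args = ALLOWED[argv[0]]
    if allowed_args is None:
        return True
    return len(argv) > 1 and argv[1] in allowed_args
```

Approved commands should then run as an argument list without a shell (for example, `subprocess.run(argv, shell=False)` inside a sandbox). Even this is coarse: an allowed subcommand can still take dangerous options, so a real policy also has to constrain arguments, file paths, and network access.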

7. Governance Chaos and Skill Erosion

Some of these risks will never show up in a scanner, because they're organizational, which makes them harder to notice and to mitigate.

JetBrains' State of Developer Ecosystem 2025 reports that 68% of developers expect their employers to require AI-tool proficiency. At the same time, they worry about their own skills eroding. Forced adoption compounds the issue, because now agentic AI is one more distributed system to govern, observe, secure, budget, and debug. Teams start optimizing for "AI usage" as if it were the goal, instead of the things that have always been the goal: reliability, maintainability, security, performance, correctness. Down that road lie cognitive overload, surprising AI bills, dulled engineering instincts, and policies that lag behind what developers are actually doing.

And there's a generational trap forming. Companies hesitate to hire juniors because they expect AI to cover that work. Juniors who do get hired lean on AI so heavily that they don't build the instincts seniors are made of. So where, exactly, are the next seniors coming from? That one we get to find out the hard way.

Alas, there's no clear fix for skill erosion, and no best-practice checklist to follow. But governance you can do. Write an AI coding policy that names approved tools, review requirements, attribution and traceability, ownership, training, logging, and cost controls; the NIST AI Risk Management Framework is a solid backbone for it. The core discipline is refusing to treat "AI usage" as a success metric, and continuing to measure code quality, reliability, security, and maintainability the old-fashioned way. Whatever else changes, you still own the code you shipped, even the parts you didn't type.

8. Missing Context in Real Codebases

AI is brilliant at isolated problems and clean little demos. Fortunately or not, your production system is neither.

Real codebases come with conventions, weird build logic, and undocumented constraints. The model doesn't know why the code is shaped the way it is. It doesn’t know that one branch handles a customer-specific edge case, or that a migration got abandoned half-finished three years ago, or that there's a method nobody dares refactor because touching it might take the whole app down. So the AI tool suggests a change that looks perfectly reasonable in isolation but violates an architectural rule, a domain invariant, a compatibility guarantee, or a performance assumption nobody documented.

Over time that produces architectural drift: more inconsistent patterns and less shared understanding of how the system fits together. It also thins out ownership, because changes can land without anyone fully grasping how they interact with everything around them. The bigger and older the codebase, the more dangerous context-light automation gets.

The answer here is less prompt engineering and more context engineering. Maintain explicit context for the tool: requirements, tests, architecture notes, and repository instructions. Create a dedicated context file that spells out the relevant quirks of your codebase and that the assistant reads before it touches anything. This file might do more for output quality than any amount of prompt sorcery.

So, should you stop using AI to code?

No. I'm not here to talk you out of AI-assisted development, and I'm not going to pretend the safe move is to switch it off. What I’m trying to convey is a simple idea: stop trusting it blindly!

AI-generated code isn't automatically dangerous. It's automatically unverified, and unverified code earns its place the same way everything else does, whoever or whatever wrote it: review, test, scan, isolate, and own the result.

So, set up the guardrails before the generated code bites you:

  • Wire real quality gates into CI: linting, static analysis, type checks, the full test suite;
  • Run SAST, SCA, and secret scanning over everything the model hands you;
  • Scan your built artifacts for CVEs.

None of this is new. It's the discipline we already apply to human code.

The eight problems above are the dark side. The engineering practices that keep it in check are the bright side, and a good place to start writing your own guardrails.
