Rethinking Code Review in the Age of AI

AI promised to help engineers move faster without sacrificing quality, and in many respects it has. Many developers now use AI tools in their daily workflow and report that the tools help them work faster and produce more code. In fact, our developer survey shows that nearly 70% of developers feel AI agents have increased their productivity. But when speed outpaces scrutiny, it introduces a new kind of risk: problems that are hard to detect and often more expensive to fix than the time saved generating them.
The problem is not that AI produces obviously sloppy code. It is often the opposite: AI-generated code tends to be readable, well organized, and consistent with common patterns. At first glance, it looks production-ready. But that surface quality can be misleading; clean-looking code can still cause chaos. The real gaps tend to surface later, once more code has been built on top of it.
Quality Masks Problems That Are Hard to Spot
AI does not fail the way humans do. When an inexperienced or rushed developer makes a mistake, it is usually obvious to a reviewer: an edge case is missed, the work is incomplete, or the logic is off. When AI-generated code fails, it is rarely because of syntax; it is because of missing context. The same confidence a model shows when it hallucinates facts carries over to the code it produces.
Without a full understanding of the system it is contributing to, the model fills in gaps based on patterns that may not match the specifics of the codebase. That can produce code that mishandles data structures, misinterprets how an API behaves, or applies generic security measures that don't fit the actual environment, all because the model lacks the context developers carry about the system.
Developers have made these new challenges known: their top reported frustration is AI-generated solutions that are almost right but not quite, and their second is the time it takes to debug those near-miss solutions. Teams see big gains at the front of the workflow from fast code generation, then pay for them downstream with extra review cycles, duplicated testing effort, and drawn-out debugging.
A recent Anthropic research report reveals another layer to this picture: developers who use AI tools for code generation are less likely to identify missing context or question the model's assumptions than those who use AI for other kinds of work.
The result is buggy code that slips through early reviews and into releases, where it becomes much harder to fix because subsequent code is often built on top of it.
Review Alone Is Not Enough to Catch AI Slop
If the root problem is missing context, then the most effective place to fix it is at the prompting stage, before the code is ever generated.
In practice, however, most prompts are still very high level. They describe the desired outcome but lack the details that explain how to get there. Without careful context engineering, the model has to fill in those gaps on its own, and that is where it goes wrong. The mismatched assumptions can come from developers, from requirements, or from other AI tools.
In addition, prompting should be treated as an iterative process. Asking a model to explain its approach or call out potential weaknesses can surface problems before the code is submitted for review. This turns a single request into a back-and-forth exchange in which the engineer interrogates the model's assumptions before accepting its output. This human-in-the-loop approach keeps developer expertise layered on top of AI-generated code rather than replaced by it, reducing the risk of subtle errors making it into production.
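The back-and-forth exchange described above can be sketched in a few lines. This is a minimal illustration, not a real integration: `ask_model` is a hypothetical stand-in for whatever LLM client a team uses, stubbed here with canned responses so the flow runs locally.

```python
# Sketch of a human-in-the-loop generation step: after asking for code,
# ask the model to enumerate its assumptions so an engineer can review
# each one before the code is accepted.

def ask_model(prompt: str) -> str:
    # Stub: a real implementation would call an LLM API here.
    if "list your assumptions" in prompt.lower():
        return "Assumed user IDs are unique; assumed the API returns JSON."
    return "def get_user(uid): ..."

def generate_with_review(task: str) -> dict:
    """Request code, then request the model's assumptions as a second turn,
    giving the engineer explicit items to accept or reject."""
    code = ask_model(task)
    assumptions = ask_model(
        f"For the code you wrote for this task: {task}\nList your assumptions."
    )
    return {"code": code, "assumptions": [a.strip() for a in assumptions.split(";")]}

result = generate_with_review("Fetch a user record by ID")
for item in result["assumptions"]:
    print("REVIEW:", item)  # each assumption becomes a reviewable line item
```

The point of the sketch is the shape of the loop, not the stubbed answers: the model's assumptions become explicit artifacts the engineer checks, instead of implicit guesses buried in the generated code.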
Because different developers will always have different prompting habits, introducing a shared structure can also help. Teams don't need complex processes, but they do benefit from common expectations about what a good prompt looks like and how assumptions should be validated. Even simple guidelines can reduce recurring problems and make results more predictable.
A New Method of Verification
AI hasn't removed complexity from software development; it has shifted where that complexity sits. Teams that once spent most of their time writing code now have to spend that time validating it. Unless the development process adapts to the new AI coding tools, problems will keep surfacing downstream, where costs rise and debugging gets harder, and the savings made earlier will never be realized.
In an AI-assisted workflow, better results start with better input. Validation is now a core part of the engineering process, and good code depends on giving the model clear context grounded in human-validated company knowledge from the start. Getting that part right has a direct impact on the quality of everything that follows.
Rather than focusing only on reviewing finished code, developers now play a more important role in ensuring the right context is embedded from the start.
When done deliberately and carefully, speed and quality no longer have to be at odds. Teams that successfully integrate early validation into their workflows will spend less time fixing late-stage issues and actually reap the benefits of a faster coding cycle.



