AI coding tools make developers slower but they think they’re faster, study finds

https://www.theregister.com/2025/07/11/ai_code_tools_slow_down/

Download the full paper from metr.org

I want this paper to join the regular stream of AI articles appearing in everybody’s news feed, especially those of the CEOs of Anthropic, OpenAI, Meta, Google, Microsoft, and every other company out there banking on being able to downsize their workforce someday on the promise of the AI hype. As is so often the case in the tech industry, the CEOs are out of touch with reality.

The setup

The study was a randomized controlled trial (RCT) involving 16 experienced open-source developers, each with an average of 5 years of prior experience on mature projects. Developers completed 246 tasks – GitHub issues, to be more specific – with each task randomly assigned to either allow or disallow the use of AI tools. When AI tools were allowed, developers primarily used Cursor Pro with Claude 3.5/3.7 Sonnet as the underlying LLM. The study included onboarding and training, required developers to record their screens while working, and collected both quantitative and qualitative data through surveys, interviews, and screen recordings. The setup was designed to mirror real-world development closely.

The results

Developers were quite confident in AI before the study. On average, they thought they would shave 24% off their development time. Instead, they ended up increasing their development time by 19%!

The authors examined a number of factors that could contribute to the slowdown. With such a small population, it'd be hard to find many strong predictors. However, a few reached significance, and they were quite telling:

  • Over-optimism about AI usefulness – On average, the developers predicted their implementation time would decrease by 24%. Instead, it increased by 19%.
  • High developer familiarity with repositories – This confused me at first, but as a long-time developer, it made sense. Developers who are intimately familiar with their repositories should trust themselves to work on their issues rather than trust AI. AI will be hard-pressed to match human expertise on a project in which you are already the expert.
  • Large and complex repositories – This should come as no surprise. Some of the repositories with the worst outcomes had over 1,000,000 lines of code. No human or AI is going to easily make sense of that, especially when the repo has been around a long time and is riddled with legacy, outdated, poorly maintained code. What makes people think that AI is going to somehow do a better job of fixing crap code? It can help a developer refactor that code up to a high standard, and only then would it stand a better chance of helping improve and fix it.
  • Low AI reliability – Most developers (myself included) regularly deal with AI generating useless code. But it's confident about its useless output! So convincing! Thus, devs often erroneously just accept it. Unfortunately, we eventually fall into a loop: we accept buggy code, ask AI to fix its own buggy code, and it introduces another bug. Sometimes it'll even go back and re-introduce the bug it originally fixed. It's a vicious cycle.

I would strongly advise any company considering downsizing its developer workforce to carefully reconsider its plans. AI has its strengths. The developer who blindly uses it with no real knowledge of software engineering is an incredibly dangerous developer who should be avoided. However, a developer who knows AI's strengths and weaknesses, who knows when to trust it, when to avoid it, and how to control and leverage it carefully in their workflow has tremendous promise. Those developers will be in sought-after positions for quite some time.
