HK Businesswire
Can AI really code? Study maps the roadblocks to autonomous software engineering

by David Lee
16 July 2025
in Science

Imagine a future where artificial intelligence quietly shoulders the drudgery of software development: refactoring tangled code, migrating legacy systems, and hunting down race conditions, so that human engineers can devote themselves to architecture, design, and the genuinely novel problems still beyond a machine’s reach. Recent advances appear to have nudged that future tantalizingly close, but a new paper by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and several collaborating institutions argues that reaching it demands a hard look at present-day challenges.

Titled “Challenges and Paths Towards AI for Software Engineering,” the work maps the many software-engineering tasks beyond code generation, identifies current bottlenecks, and highlights research directions to overcome them, aiming to let humans focus on high-level design while routine work is automated.

“Everyone is talking about how we don’t need programmers anymore, and there’s all this automation now available,” says Armando Solar-Lezama, MIT professor of electrical engineering and computer science, CSAIL principal investigator, and senior author of the study. “On the one hand, the field has made tremendous progress. We have tools that are way more powerful than any we’ve seen before. But there’s also a long way to go toward really getting the full promise of automation that we would expect.”

Solar-Lezama argues that popular narratives often shrink software engineering to “the undergrad programming part: someone hands you a spec for a little function and you implement it, or solving LeetCode-style programming interviews.” Real practice is far broader. It includes everyday refactors that polish design, plus sweeping migrations that move millions of lines from COBOL to Java and reshape entire businesses.
It requires nonstop testing and analysis (fuzzing, property-based testing, and other methods) to catch concurrency bugs or patch zero-day flaws. And it involves the maintenance grind: documenting decade-old code, summarizing change histories for new teammates, and reviewing pull requests for style, performance, and security.

Industry-scale code optimization (think re-tuning GPU kernels or the relentless, multi-layered refinements behind Chrome’s V8 engine) remains stubbornly hard to evaluate. Today’s headline metrics were designed for short, self-contained problems, and while multiple-choice tests still dominate natural-language research, they were never the norm in AI-for-code. The field’s de facto yardstick, SWE-Bench, simply asks a model to patch a GitHub issue: useful, but still akin to the “undergrad programming exercise” paradigm. It touches only a few hundred lines of code, risks data leakage from public repositories, and ignores other real-world contexts, such as AI-assisted refactors, human–AI pair programming, or performance-critical rewrites that span millions of lines. Until benchmarks expand to capture those higher-stakes scenarios, measuring progress, and thus accelerating it, will remain an open challenge.

If measurement is one obstacle, human-machine communication is another. First author Alex Gu, an MIT graduate student in electrical engineering and computer science, sees today’s interaction as “a thin line of communication.” When he asks a system to generate code, he often receives a large, unstructured file and even a set of unit tests, yet those tests tend to be superficial. This gap extends to the AI’s ability to effectively use the wider suite of software engineering tools, from debuggers to static analyzers, that humans rely on for precise control and deeper understanding. “I don’t really have much control over what the model writes,” he says.
“Without a channel for the AI to expose its own confidence (‘this part’s correct … this part, maybe double-check’), developers risk blindly trusting hallucinated logic that compiles, but collapses in production. Another critical aspect is having the AI know when to defer to the user for clarification.”

Scale compounds these difficulties. Current AI models struggle profoundly with large code bases, often spanning millions of lines. Foundation models learn from public GitHub, but “every company’s code base is kind of different and unique,” Gu says, making proprietary coding conventions and specification requirements fundamentally out of distribution. The result is code that looks plausible yet calls non-existent functions, violates internal style rules, or fails continuous-integration pipelines. This often leads to AI-generated code that “hallucinates,” meaning it creates content that looks plausible but doesn’t align with the specific internal conventions, helper functions, or architectural patterns of a given company. Models also often retrieve the wrong context, because they match code with a similar name (syntax) rather than similar functionality and logic, which is what a model needs in order to write the function. “Standard retrieval techniques are very easily fooled by pieces of code that are doing the same thing but look different,” says Solar-Lezama.

The authors note that, since there is no silver bullet for these issues, they are calling instead for community-scale efforts: richer data that captures the process of developers writing code (for example, which code developers keep versus throw away, and how code gets refactored over time); shared evaluation suites that measure progress on refactor quality, bug-fix longevity, and migration correctness; and transparent tooling that lets models expose uncertainty and invite human steering rather than passive acceptance.
Gu frames the agenda as a “call to action” for larger open-source collaborations that no single lab could muster alone. Solar-Lezama imagines incremental advances, “research results taking bites out of each one of these challenges separately,” that feed back into commercial tools and gradually move AI from autocomplete sidekick toward genuine engineering partner.

“Why does any of this matter? Software already underpins finance, transportation, health care, and the minutiae of daily life, and the human effort required to build and maintain it safely is becoming a bottleneck. An AI that can shoulder the grunt work, and do so without introducing hidden failures, would free developers to focus on creativity, strategy, and ethics,” says Gu. “But that future depends on acknowledging that code completion is the easy part; the hard part is everything else. Our goal isn’t to replace programmers. It’s to amplify them. When AI can tackle the tedious and the terrifying, human engineers can finally spend their time on what only humans can do.”

“With so many new works emerging in AI for coding, and the community often chasing the latest trends, it can be hard to step back and reflect on which problems are most important to tackle,” says Baptiste Rozière, an AI scientist at Mistral AI, who wasn’t involved in the paper. “I enjoyed reading this paper because it offers a clear overview of the key tasks and challenges in AI for software engineering. It also outlines promising directions for future research in the field.”

Gu and Solar-Lezama wrote the paper with University of California at Berkeley Professor Koushik Sen and PhD students Naman Jain and Manish Shetty, Cornell University Assistant Professor Kevin Ellis and PhD student Wen-Ding Li, Stanford University Assistant Professor Diyi Yang and PhD student Yijia Shao, and incoming Johns Hopkins University assistant professor Ziyang Li.
Their work was supported, in part, by the National Science Foundation (NSF), SKY Lab industrial sponsors and affiliates, Intel Corp. through an NSF grant, and the Office of Naval Research.

The researchers are presenting their work at the International Conference on Machine Learning (ICML).
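
To make the retrieval problem Solar-Lezama describes more concrete, here is a minimal Python sketch. It is not taken from the paper: the snippets, the function names (`total_price`, `total_price_label`, `accumulate`), and the use of Jaccard overlap on identifier fragments as a stand-in for "standard retrieval" are all assumptions chosen for illustration. The behavioral check simply runs each candidate on a few sample inputs, which is only safe for trusted toy code like this.

```python
import re

def tokens(source):
    """Lowercase identifier fragments of a snippet (snake_case is split apart)."""
    words = re.findall(r"[A-Za-z_]+", source)
    return {part for word in words for part in word.lower().split("_") if part}

def lexical_similarity(a, b):
    """Jaccard overlap of identifier fragments: a crude, name-driven retriever."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

# Hypothetical query: the function a model has been asked to (re)write.
QUERY = "def total_price(items):\n    return sum(items)\n"

# Hypothetical code base: one look-alike and one true functional equivalent.
CANDIDATES = {
    "total_price_label": (
        "def total_price_label(items):\n"
        "    return 'items: ' + str(len(items))\n"
    ),
    "accumulate": (
        "def accumulate(xs):\n"
        "    acc = 0\n"
        "    for x in xs:\n"
        "        acc += x\n"
        "    return acc\n"
    ),
}

SAMPLE_INPUTS = [[], [1, 2, 3], [10, -4]]

def load_function(source):
    """Execute a trusted snippet and return the single function it defines."""
    namespace = {}
    exec(source, namespace)  # illustration only; never do this with untrusted code
    funcs = [v for k, v in namespace.items()
             if callable(v) and not k.startswith("__")]
    return funcs[0]

def behavior(source):
    """Outputs of the snippet's function on the shared sample inputs."""
    func = load_function(source)
    return [func(list(xs)) for xs in SAMPLE_INPUTS]

# Name-driven retrieval picks the snippet that merely looks similar.
best_lexical = max(
    CANDIDATES, key=lambda name: lexical_similarity(QUERY, CANDIDATES[name])
)

# Behavior-driven comparison finds the snippet that does the same thing.
behavioral_matches = [
    name for name in CANDIDATES if behavior(CANDIDATES[name]) == behavior(QUERY)
]

print("lexical pick:      ", best_lexical)
print("behavioral matches:", behavioral_matches)
```

In this toy setup the name-based score favors `total_price_label`, which shares identifiers with the query but computes something else, while the functionally equivalent `accumulate` is found only when behavior is compared. Real retrieval systems are far more sophisticated than a Jaccard score, so this shows the failure mode in miniature rather than reproducing any particular tool.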

Tags: Science
© 2025 by HKBusinesswire.com
