Are reasoning models becoming sounding boards for human thought? Inside France’s effort to shape the AI conversation; Amazon is working on a smarter Alexa; SoftBank and OpenAI bet on each other
European startups embrace DeepSeek despite risks; an AI founder’s struggle to buy out his investors; no one knows how to price AI tools; AI-native companies think, act and grow differently
In recent months, a new breed of reasoning models has begun to emerge from the research labs of the tech world. These systems, which display early signs of what some might call a rudimentary form of intelligence (and are marketed as “AI agents”), rely on a technique called test-time compute: they spend additional computation at inference time, for example by dividing a larger prompt into smaller tasks that then become new prompts for the model to tackle.
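To make that decomposition concrete, here is a minimal sketch in Python of how a prompt can be split into sub-tasks at inference time. The complete() helper is a hypothetical stand-in for a call to whichever model API you use, not any specific product feature.

```python
from typing import List

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a hosted language model."""
    raise NotImplementedError("wire this up to your model provider of choice")

def plan_subtasks(problem: str) -> List[str]:
    # Ask the model to split the problem into a numbered list of smaller steps.
    plan = complete(
        "Break the following problem into a numbered list of small, "
        f"self-contained sub-tasks:\n\n{problem}"
    )
    # Keep only lines that look like "1. do something" and strip the numbering.
    return [line.split(".", 1)[1].strip()
            for line in plan.splitlines()
            if line.strip() and line.strip()[0].isdigit() and "." in line]

def solve_with_decomposition(problem: str) -> str:
    # Each sub-task becomes a fresh prompt; earlier answers are carried forward
    # so the model can build on its own intermediate results.
    notes: List[str] = []
    for task in plan_subtasks(problem):
        answer = complete("Context so far:\n" + "\n".join(notes) + f"\n\nNow solve: {task}")
        notes.append(f"{task} -> {answer}")
    return complete("Combine these partial results into a final answer:\n" + "\n".join(notes))
```

The point is not the plumbing but the loop: the model’s own intermediate answers become part of the next prompt, which is what “spending more compute at test time” amounts to in practice.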
An early adopter of reasoning models has been the software engineering community. Software engineers, traditionally engaged in solving complex analytical problems that can be broken into sub-problems, have integrated these new reasoning models into their creative processes. Instead of expecting the models to autonomously produce flawless code, engineers are leveraging them as interactive partners—similar to having a knowledgeable colleague or an advanced debugging tool. When an engineer explains a problem or conceptualizes an algorithm, the model responds with suggestions, counterpoints, and even alternative perspectives that the engineer might not have considered.
This interplay is not dissimilar to the dynamic seen during pair programming, where two engineers work a little like the two-person crew of a rally car, with one person driving (or typing) and the other navigating. The driver carries out the navigator’s instructions, but has the opportunity to make corrections or ask for clarification. Very often, the driver and navigator also switch roles during pair programming. As these reasoning models get better at coding, they provide a reflective surface for ideas: they challenge, refine, and sometimes reaffirm an engineer’s logic.
Four developments this week went mostly unnoticed, but they have made me question the current narrative around AI agents.
First, GitHub announced significant enhancements to its AI-powered coding assistant, GitHub Copilot, introducing a new "agent mode" that enables Copilot to autonomously iterate on its code outputs, identify errors, and implement automatic fixes. The GitHub “agent” can suggest terminal commands and ask engineers to run them, as well as analyze runtime errors with self-healing capabilities. In this mode, Copilot performs the tasks requested by the engineer and also infers and executes additional necessary subtasks to ensure the primary request functions correctly. This includes catching and correcting its own errors, reducing the need for developers to manually intervene. GitHub’s CEO Thomas Dohmke introduced the new Copilot feature by claiming that AI pair programming is now entering its “peer programming” phase, where any engineer can have access to dozens of helpful AI colleagues to complete a project.
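GitHub hasn’t published agent mode as code, but the iterate-run-fix loop it describes can be sketched in a few lines. The snippet below is an illustration of that pattern under my own assumptions (a generic complete() model call and a test command you supply), not GitHub’s implementation.

```python
import subprocess

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a code-generation model."""
    raise NotImplementedError("replace with a real model client")

def agent_fix_loop(task: str, test_cmd: list[str], max_rounds: int = 5) -> str:
    # Draft an initial solution, then iterate: run the checks, feed any
    # failure back to the model, and ask for a corrected version.
    code = complete(f"Write a Python module that does the following:\n{task}")
    for _ in range(max_rounds):
        with open("candidate.py", "w") as f:
            f.write(code)
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return code  # checks pass; nothing left to fix
        code = complete(
            "The code below fails with this error:\n"
            f"{result.stdout}\n{result.stderr}\n\n"
            f"Code:\n{code}\n\nReturn a corrected version of the full module."
        )
    return code
```

In this sketch, a call like agent_fix_loop("parse a CSV of orders and sum the totals", ["pytest", "tests/"]) would keep regenerating candidate.py until the tests pass or the round budget runs out, which is the self-healing behaviour the announcement describes.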
Then, Meta's engineering team unveiled the Automated Compliance Hardening (ACH) tool, a new system that leverages language models to enhance software testing. ACH employs mutation-guided, LLM-based test generation to strengthen platforms against regressions by automatically creating and detecting specific faults within source code. Engineers provide plain-text descriptions of potential bugs, which ACH then uses to generate relevant faults (mutants) in the code, even if the descriptions are incomplete or contradictory. ACH also generates corresponding unit tests to identify and address the faults, ensuring that the tests effectively catch the specified bugs. Unlike traditional automated test generation methods that primarily aim to increase code coverage, ACH concentrates on identifying and rectifying particular types of faults, often enhancing coverage as a byproduct. Meta has deployed ACH across various app surfaces, including Facebook Feed, Instagram, Messenger, and WhatsApp. The company’s engineers have found it valuable for fortifying code against specific concerns, and have observed additional benefits even when the generated tests do not directly address a particular issue.
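Meta hasn’t released ACH itself, but the mutation-guided idea is easy to sketch: inject a fault that matches a plain-text bug description, then ask for a test that fails on the faulty version and passes on the original. The snippet below is my own illustration of that shape, again with a hypothetical complete() model call rather than Meta’s tool.

```python
def complete(prompt: str) -> str:
    """Hypothetical stand-in for a call to a language model."""
    raise NotImplementedError("replace with a real model client")

def generate_mutant(source: str, bug_description: str) -> str:
    # Ask the model to introduce one realistic fault matching the description,
    # changing as little of the original code as possible.
    return complete(
        "Introduce a single realistic fault matching this description, "
        "changing as little as possible.\n"
        f"Description: {bug_description}\n\nCode:\n{source}"
    )

def generate_catching_test(source: str, mutant: str, bug_description: str) -> str:
    # The useful property: the test passes on the original code but fails on
    # the mutant, so it demonstrably guards against that class of regression.
    return complete(
        "Write a unit test that PASSES on the ORIGINAL code and FAILS on the "
        f"MUTANT, targeting this fault: {bug_description}\n\n"
        f"ORIGINAL:\n{source}\n\nMUTANT:\n{mutant}"
    )
```

Unlike coverage-driven test generation, the test here is anchored to a specific, named fault, which is the distinction Meta’s engineers emphasize.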
These emergent use cases in coding invite a broader reflection: perhaps we have mischaracterized the role of reasoning models from the outset. Instead of viewing them as standalone repositories of knowledge and problem solvers on a path to (super)human intelligence, it may be more fruitful to consider them as external cognitive aids—sounding boards that amplify and refine human thought processes.
Such a reframing carries implications for the current positioning of AI agents. A sounding board does not operate in isolation; its effectiveness is intrinsically linked to the quality of the input it receives. In this light, the phenomenon of varied output quality becomes less an indictment of the model’s performance and more a testament to the variability in human expertise. A post-doctoral researcher, armed with a deep and nuanced understanding of their field, is likely to extract richer, more complex insights from the model than an undergraduate student still grappling with foundational concepts.
The crux of this paradigm lies in the interaction between human and machine. Just as a high-quality conversation requires thoughtful questions, well-articulated challenges, and a readiness to critically evaluate feedback, so too does effective engagement with reasoning models. The models are adept at echoing the sophistication of the prompts they receive—a quality that underscores the role of user expertise in determining the ultimate utility of these systems.
Should we, perhaps, focus more on developing a collaborative cognitive ecosystem where human intelligence is augmented—not replaced—by machine reasoning? This question leads to the week’s third development: in a pre-print paper published on arXiv and presented at the International Association for Safe and Ethical AI Conference, several researchers from Hugging Face bluntly argued that fully autonomous AI agents should not be developed, because we shouldn’t cede all human control to "agentic systems.” It’s not a view I agree with fully, but what I’m open to is the possibility that, while chasing after AGI (or ASI), we may discover more useful human-machine interfaces along the way.
In disciplines ranging from biomedical research to economics, the integration of reasoning models as intellectual sounding boards could improve collaborative problem-solving, with human creativity and computational logic intersecting in new ways. The fourth development of the week is a case in point: Joshua Gans, an economist and professor at the Rotman School of Management at the University of Toronto, used OpenAI’s o1-pro model to generate a paper in an hour based on an idea of his, and then published it in a peer-reviewed journal, with adequate disclosure. Reflecting on his experience, Gans discussed the evolving role of AI in the preliminary stages of research, which he terms "presearch." He believes AI tools, particularly large language models, might make research “cheaper than search” and therefore transform how individuals gather and synthesize information before engaging in deeper research activities.

So if the true promise of these reasoning models lies not in their capacity to independently solve problems, but in their ability to catalyze human insight, then the trajectory of AI development may need to be reexamined.
Rather than investing solely in models that attempt to simulate or replicate human thought, a more balanced approach might emphasize the symbiosis between human and machine. In this vision, the role of the AI agent becomes that of an architect of dialogue—a designer of systems that not only process data but also actively engage with the creative, often unpredictable, flow of human reasoning.
And now, here is this week’s news:
❤️Computer loves
Our top news picks for the week - your essential reading from the world of AI
WSJ: AI-Native Companies Are Growing Fast and Doing Things Differently
FT: SoftBank’s Masayoshi Son and OpenAI’s Sam Altman bet on AI — and each other
Time: Inside France’s Effort to Shape the Global AI Conversation
Sifted: European startups are rolling out DeepSeek despite safety concerns: ‘There’s still a risk’
WSJ: Why Amazon is Betting on ‘Automated Reasoning’ to Reduce AI’s Hallucinations
TechCrunch: European AI startups raised $8B in 2024
MIT Technology Review: Four Chinese AI startups to watch beyond DeepSeek
Forbes: This AI Founder’s Audacious Plan To Buy Out His Own VCs
FT: Liang Wenfeng, the DeepSeek founder panicking the tech world
Reuters: DeepSeek gives Europe's tech firms a chance to catch up in global AI race