Elon Musk tries to sell us Grok 4 in a lab coat; tech platforms are struggling with AI video; Amazon deepens Anthropic investment; Meta is trying to win the AGI race with money
Games are the new battleground for actors and AI; the most popular AI coding tools for engineers; inside the AI scraping fight that is changing the web; India scrambles to achieve AI independence
You have to hand it to Elon Musk: nobody sells a moonshot like the guy who is literally selling us a moonshot. His latest pitch (posted, naturally, on X) is that xAI’s new model, Grok 4, is already “PhD-level in everything, better than PhD.” He then adds that he’d be “shocked” if Grok 4 fails to invent useful new technologies before 2026.
Bold claim, followed by an awkward silence from the four participants in the hostage video (sorry, "livestream"). Also: pixie dust with a sprinkle of nonsense.
Reasoning models are still autocorrect on rocket fuel, powered by inference-time compute. In fact, the best way to think about reasoning models is this: if the machine learning of the 2010s was a car with cruise control, the reasoning models of today are vehicles with Level 2 autonomy. Don't get me wrong: Level 2 is incredibly impressive compared to cruise control, but it's not full autonomy. The problem, of course, is that some goofballs fall for Musk's marketing and think Tesla's "Full Self-Driving" is a Level 5 system when in fact it's not, resulting in so many collisions that there's a dedicated Wikipedia page to keep track of them.
Reasoning models work by hoovering up oceans of text and learning statistically plausible ways to arrange words, which are then refined with chain-of-thought (CoT) prompting. But as Yann LeCun has pointed out, CoT is still (incredibly good!) knowledge accumulation and retrieval, not human intelligence. A PhD (and I know this because I'm surrounded by a lot of them) does more than regurgitate facts. They frame questions, decide what not to believe, design experiments, and argue with colleagues in the hallway (particularly about the quality and quantity of what's inside the snack drawer). Grok 4 and its equivalents from OpenAI or Google do none of those things unaided.
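If you've never seen CoT in action, here's roughly what it looks like; a minimal sketch in which `generate` is a hypothetical stand-in for whatever model API you happen to use, not a real library call:

```python
def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returns a placeholder.
    return f"<model output for a {len(prompt)}-character prompt>"

question = "A lab has 12 samples and each assay consumes 3. How many assays can it run?"

# Direct prompt: the model must jump straight to an answer.
direct = generate(question)

# CoT prompt: nudge the model to emit intermediate reasoning tokens first.
# Those extra tokens are the inference-time compute that boosts accuracy
# on multi-step problems.
cot = generate(question + "\nLet's think step by step.")

print(direct)
print(cot)
```

The entire trick is in that last line: spend more tokens "thinking" before answering. Powerful, yes; a doctorate, no.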
Even some of the best reasoning benchmarks mostly boil down to multiple-choice quizzes or very advanced trivia questions. Get enough training data and a big enough GPU cluster, and your model will ace the test. That’s impressive, and it gets product teams a shiny leaderboard badge, but it isn’t the same as actually operating at the frontier of a discipline.
AI companies love to present a chart where performance climbs smoothly past undergraduate, graduate, and finally PhD. Terrific marketing. In the real world, graduating from PhD-level capability to genuine discovery involves touching messy reality: running assays, shipping prototypes, persuading grant committees, enduring peer review. None of that lives in a token stream.
Yes, language models can spit out novel protein sequences or plausible circuit diagrams. But novel is not the same as useful, and certainly not the same as commercially adopted. Saying Grok 4 will “discover new, useful technologies” by 2026 is like saying Google Search would cure cancer by 2010 because it could find every article about oncogenes. Tools accelerate discovery; they do not replace it.
A core limitation is that today's models are still largely one-shot conversationalists. They reason once, in a vacuum, and return an answer. A scientist, by contrast, iterates: build hypothesis → test → dump half the theory in the trash → repeat. Until an AI can operate an autonomous feedback loop (planning experiments, controlling robots, interpreting raw data, revising hypotheses), it remains a super-autocomplete, not a scientist.
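To make the contrast concrete, here's a toy sketch of that loop; every function below (`propose_hypothesis`, `run_experiment`, `revise`) is a hypothetical placeholder, and real lab automation is vastly messier:

```python
# One-shot vs. iterative: a toy sketch of the autonomous research loop
# today's models don't yet run. All functions are invented placeholders,
# not a real lab-automation API.
import random

def propose_hypothesis(observations: list[str]) -> str:
    return f"hypothesis based on {len(observations)} observations"

def run_experiment(hypothesis: str) -> bool:
    # Stand-in for assays, robots, simulations; here, a coin flip.
    return random.random() > 0.5

def revise(hypothesis: str, observations: list[str]) -> str:
    return hypothesis + " (revised)"

# One-shot conversationalist: reason once, answer, done.
answer = propose_hypothesis(["the prompt"])

# Scientist: hypothesize -> test -> discard half the theory -> repeat.
observations: list[str] = ["initial data"]
hypothesis = propose_hypothesis(observations)
for _ in range(10):
    if run_experiment(hypothesis):
        observations.append(f"supporting result for: {hypothesis}")
    else:
        observations.append(f"refuting result for: {hypothesis}")
        hypothesis = revise(hypothesis, observations)
```

Today's chatbots live in the first two lines of that loop. The rest is where the science happens.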
Could we bolt on robotics, simulation, and automated lab platforms? Absolutely. Companies such as Insilico Medicine and Isomorphic Labs are already walking that path. But the leap from a plausible X demo to a fully automated R&D pipeline is gargantuan, and definitely not scheduled by a Musk-approved calendar app.
None of this is meant to dismiss the raw power of modern models, despite the attempts from ghouls like Gary Marcus and Emily M. Bender who keep yapping about the limitations while flogging human-made slop in the form of lousy books and newsletters. “Regurgitation” at trillion-token scale is transformative: patent lawyers draft claims faster, junior programmers ship code that compiles on the first try, and small biotech teams get literature reviews that used to take months. That’s real economic juice.
Moreover, as AI labs chase the mirage of superintelligence (RIP AGI, it was nice knowin' ya!), they will keep stumbling into practical breakthroughs (better compression, clever retrieval augmentation, at-scale GPU networking) that migrate into everyday software. One could argue the search for superintelligence is the best R&D subsidy Earth has ever seen; even if AGI forever recedes over the horizon, the side effects are priceless.
Musk’s bravado is partly brand management: he needs xAI to look as inevitable as SpaceX felt in 2008. Investors fund inevitability. Regulators fear it. Engineers quit comfortable jobs to join it. Declaring Grok 4 a PhD-slayer signals we’re already there—come aboard or get left behind. The statement works even if it’s untrue; what matters is narrative gravity.
Instead of asking whether Grok 4 beats every PhD, ask simpler questions:
Does it reduce the cost of getting to “draft one” of a technical document to nearly zero?
Does it surface non-obvious connections across literature that a human might miss?
Does it shorten iteration cycles in simulation-heavy fields such as materials science?
Early evidence says yes on all three. That is plenty revolutionary. It does not require anthropomorphizing the model or pretending we’re about to birth “a nation of benevolent AI geniuses.” (I almost vomited in my mouth when I copied/pasted that headline from Wired.)
Musk’s prophecy of world-changing discoveries by 2026 will almost certainly be wrong on specifics. But the direction of travel (models growing more capable, more integrated with tools, more useful to human experts) is entirely right. Dismissing the tech because its loudest champions oversell it would be as silly as ignoring early internet protocols because the pets.com sock puppet promised eternal profits.
The smartest stance is realism wrapped in optimism: call out overreach, while doubling down on the real advantages reasoning engines already deliver. Humanity’s next breakthroughs will almost certainly involve an AI partner. They just won’t arrive on a timeline dictated by a livestream on X.
And now, here's the week's news:
❤️Computer loves
Our top news picks for the week - your essential reading from the world of AI
WSJ: Was That Amazing Video in Your Feed Real or AI? Tech Platforms Are Struggling to Let You Know
FT: Amazon weighs further investment in Anthropic to deepen AI alliance
The Verge: Meta is trying to win the AI race with money — but not everyone can be bought
Wired: How Video Games Became the New Battleground for Actors and AI Protections
Business Insider: These are the most popular AI coding tools among engineers
WSJ: The AI Scraping Fight That Could Change the Future of the Web
Business Insider: How Google found its AI hype guy
MIT Technology Review: Inside India’s scramble for AI independence
WSJ: The Companies Betting They Can Profit From Google Search’s Demise
The New York Times: The Coder ‘Village’ at the Heart of China’s A.I. Frenzy