Researchers have a better definition for low resource languages; Physical Intelligence is a $1bn robotics startup; Slack surveys 17,000 workers about AI; tech giants investing in "sovereign AI"

Big labs struggle to build more advanced AI; xAI's supercomputer freaks out rivals; Amazon bets big on in-house AI chips; Donald Trump's AI policies explained; Andrew Ross Sorkin meets his AI clone

Nov 15, 2024


I’m from Romania, the sixth largest member of the European Union by population. Despite being surrounded by mostly Slavic nations, Romanians speak a language descended from Vulgar Latin that separated from Western Romance languages such as Italian or Spanish between the 5th and the 8th centuries. Romanian has grown from fewer than 2,500 words to a contemporary lexicon of over 150,000 words thanks to a high degree of lexical permeability. That growth reflects contact with the indigenous Thraco-Dacian language, with the languages of the various nations that invaded or ruled over parts of Romania (Russian, Greek, Hungarian, German and Turkish), and with the languages that served as cultural models during and after the Age of Enlightenment, in particular French, Italian and English.

Today, there are over 25 million speakers of Romanian around the world and yet, for many AI researchers, it is considered a low resource language. This may surprise you because the term low resource language immediately brings up the image of an isolated community with thousands of speakers living somewhere in the Global South, and not an entire country with tens of millions of people communicating in a language directly derived from Latin.

A new paper presented this week by researchers from MBZUAI and UC Berkeley at the EMNLP conference in Miami explains why Romanian and Cherokee (a language spoken by 2,000 people in the United States) can both be considered low resource, albeit for different reasons.

The paper reasons that the relationship between low and high resource languages can be best understood through Zeno’s paradox of Achilles and the tortoise. Imagine Achilles forever pursuing a slow-moving tortoise with a head start. Despite his speed, Achilles can never quite reach the tortoise—a metaphor for the perpetual struggle of low resource languages to catch up to the ever-moving target set by well-resourced counterparts like English or Chinese.

Historically, low resource languages have been defined in a binary way by putting them in direct contrast with high-resource languages in terms of data availability. However, this approach oversimplifies the reality that resourcedness sits on a spectrum that is also influenced by critical socio-political and cultural dimensions. Criteria for labeling languages as low resource range from the number of speakers and available linguistic resources to economic conditions and digital presence. For example, while Quechua is spoken by millions, it is still considered low resource due to its lack of linguistic data and infrastructure for AI tasks. By contrast, some languages with smaller speaker populations but a richer linguistic database are better represented in AI research.

With these factors in mind, the researchers propose that we should evaluate low resource languages by looking at four key dimensions that affect the resourcedness of a language:

Socio-political context: Economic and historical factors shape language resources, especially for languages marginalized within their countries. Many communities lack the financial means to create language resources, and in some cases, language policies prioritize dominant languages, further marginalizing others.
Human and digital resources: Low resource languages often lack essential human resources, such as linguistic experts or native speakers, as well as digital tools. The scarcity of trained AI researchers from these communities exacerbates the issue.
Artifacts and infrastructure: These include curated linguistic data, computational tools, and other resources for language technology development. For many languages, even if data exists, it may lack consistency or standardization, complicating efforts to build effective language technologies such as AI models. In the case of Romanian, many existing datasets contain a significant amount of noise: samples in Slavic languages are incorrectly labeled as Romanian, partly because of the mistaken assumption that Romanian, as the language of an Eastern European country, must be Slavic.
Community agency: The degree of involvement by native-speaking communities significantly impacts the creation and adoption of language tools. Technologies built without considering community needs may have minimal real-world application, limiting their effectiveness.

This precise classification of low resource languages is crucial for developing targeted interventions. Without a clear definition, measuring progress and addressing specific needs for each language becomes nearly impossible. For example, if one language lacks community agency while another lacks linguistic data, they require different strategies to achieve technological equity.
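To make that last point concrete, here is a minimal sketch of what treating resourcedness as a profile rather than a binary label could look like. Everything in it is my own assumption for illustration: the `ResourceProfile` class, the 0-to-1 scale and the scores for the two unnamed languages are invented placeholders, not anything taken from the paper.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ResourceProfile:
    """Hypothetical 0-to-1 scores for the four dimensions listed above."""
    socio_political_context: float
    human_and_digital_resources: float
    artifacts_and_infrastructure: float
    community_agency: float

    def weakest_dimension(self) -> str:
        """Return the name of the lowest-scoring dimension."""
        return min(fields(self), key=lambda f: getattr(self, f.name)).name


# Placeholder scores, invented purely for illustration (not real measurements).
language_a = ResourceProfile(0.7, 0.6, 0.8, 0.2)  # lacks community agency
language_b = ResourceProfile(0.7, 0.6, 0.2, 0.8)  # lacks linguistic data

print(language_a.weakest_dimension())  # community_agency
print(language_b.weakest_dimension())  # artifacts_and_infrastructure
```

A binary label would put both languages in the same "low resource" bucket, while the profile shows that each one needs a different intervention.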

But there’s another reason why this paper stood out to me. When ChatGPT exploded into the mainstream, the AI community began to debate when and how we will make the leap from foundation models to AGI. The problem was that everyone was using their own definition of AGI, so Google DeepMind, Anthropic, OpenAI and others began publishing their views on the topic, including frameworks for classifying the capabilities of AI. These positions then converged on a relatively stable definition of AGI that allowed society at large to move from debating the progress we’re making in advancing AI to measuring it.

So while low resource languages may never fully overtake high resource languages, we now have a clearer definition of these concepts, which will hopefully lead to faster progress or, at least, a more accurate measurement of the distance between Achilles and the tortoise.

And now, here is this week’s news:

❤️Computer loves

Our top news picks for the week - your essential reading from the world of AI

⚙️Computer does

AI in the wild: how artificial intelligence is used across industry, from the internet, social media, and retail to transportation, healthcare, banking, and more

TechCrunch: ChatGPT can now read some of your Mac’s desktop apps
VentureBeat: This startup's AI platform could replace 90% of your accounting tasks—here's how
The Verge: YouTube is testing music remixes made by AI
The Verge: Particle is a new app using AI to organize and summarize the news
MIT Technology Review: Generative AI taught a robot dog to scramble around a new environment
Variety: Jerry Garcia’s AI-Created Voice Can Now Narrate Audiobooks, Articles and More
The Verge: Instagram could let AI generate a profile picture for you
Axios: Putting AI to work for public defenders
The Telegraph: NHS doctors get AI assistant to listen to appointments and make notes
The Telegraph: Blind woman has better than 20/20 vision after AI surgery
Washington Post: Randy Travis’s beautiful baritone was lost. AI helped him sing again.

🧑‍🎓Computer learns

Interesting trends and developments from various AI fields, companies and people

Washington Post: AI travel influencers are here. Human travelers hate it.
TechCrunch: TikTok plugs Getty Images into its AI-generated ads and avatars
Wired: The First Entirely AI-Generated Video Game Is Insanely Weird and Fun
Axios: NHL updates massive video trove, readying for an AI world
TechCrunch: Tiger Global-backed InVideo launches gen AI-based video creation
TechCrunch: AI pioneer Francois Chollet leaves Google
WSJ: The Wall Street Journal is testing AI article summaries
The Verge: More AI-generated ads are coming to TikTok
New York Times: Are A.I. Clones the Future of Dating? I Tried Them for Myself.
The Information: The Enterprise Search App That Got Google and OpenAI’s Attention
MIT Technology Review: Google DeepMind has a new way to look inside an AI’s “mind”
Business Insider: Amazon's AI chatbot Q is entering enemy turf by integrating with Microsoft's Office 365
Business Insider: Instead of killing jobs, there's a strange AI hiring boom happening, according to Marc Andreessen
Fortune: Europe’s AI industry watches Trump’s return with a mix of fear and hope
Business Insider: Golin's first chief AI officer shares the company's strategy for using AI to transform public relations
The Telegraph: Shakespeare’s poetry ‘not as good as AI’
Fortune: Elon Musk’s xAI safety whisperer just became an advisor to Scale AI
Bloomberg: OpenAI Nears Launch of AI Agent Tool to Automate Tasks for Users
VentureBeat: You can now run the most powerful open source AI models locally on Mac M4 computers, thanks to Exo Labs
Business Insider: The race for the best AI model is 'heated,' a TeamViewer tech executive says — here's how the company is leveraging it
Business Insider: Inside Forward's failed attempt to revolutionize the doctor's office
Axios: Study: Growth of AI adoption slows among U.S. workers
TechCrunch: DeepL launches DeepL Voice, real-time, text-based translations from voices and videos
TechCrunch: Marc Benioff says it’s ‘crazy talk’ that AI will hurt Salesforce, wants a billion AI agents in a year
TechCrunch: Almost all of this year’s top 40 startups at Station F use AI
TechCrunch: Perplexity brings ads to its platform
New York Times: Stand-Up, Drama and Spambots: The Creative World Takes On A.I.
CNBC: Startup CEO says humans won’t be needed for translation in 3 years as it launches AI app
VentureBeat: ServiceNow rolls out enterprise AI governance capabilities to accelerate production deployment
FT: Recruiters urge candidates to use AI to apply for jobs
Fortune: Glassdoor CEO talks about the hottest jobs in the AI boom—and the one job he thinks is phasing out
Sifted: Google, Meta and some of France's top universities: Where Mistral poaches its top talent from
MIT Technology Review: The AI lab waging a guerrilla war over exploitative AI
TechCrunch: Amazon attempts to lure AI researchers with $110M in grants and credits
Bloomberg: OpenAI Co-Founder Returns to Startup After Monthslong Leave
The Information: Ex-OpenAI CTO Murati’s New Team Takes Shape
VentureBeat: Qwen2.5-Coder just changed the game for AI programming—and it's free
VentureBeat: Magic Story launches AI-based media platform for children to create their own adventures
VentureBeat: Box continues to expand beyond just data sharing, with agent-driven enterprise AI studio and no-code apps
Fortune: Recession could create an ‘abrupt shift’ in AI adoption: ‘That’s when you really see the effects of automation’
Fortune: AT&T’s CEO says AI may cause power shortages and it could be ‘the next big social issue in the United States’
Reuters: Baidu bolsters AI lineup with enhanced text-to-image tech, no-code app builder
The Verge: Google’s AI ‘learning companion’ takes chatbot answers a step further
CNBC: China’s Alibaba releases AI search tool for small businesses in Europe and the Americas
FT: China’s Baidu joins Meta in race to make AI-integrated smart glasses
Lex Fridman: Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity
VentureBeat: Google DeepMind open-sources AlphaFold 3, ushering in a new era for drug discovery and molecular biology
Reuters: Vatican unveils AI services for St. Peter's Basilica ahead of Jubilee
Import AI: Tencent's new Hunyuan model is a MoE triumph, and by some measures is world class
Wired: I Went Birding With the World’s First AI-Powered Binoculars
TechCrunch: OpenAI loses another lead safety researcher, Lilian Weng
TechCrunch: X is testing a free version of AI chatbot Grok
The Verge: Spotify’s AI is no match for a real DJ
The Verge: How to use the latest AI video editing tools in Google Photos
VentureBeat: Multimodal RAG is growing, here's the best way to get started
Wired: The AI Machine Gun of the Future Is Already Here
AFP: Robot artist Ai-Da’s portrait of Alan Turing shatters auction records selling for over $1 million—a first for AI artwork
MIT Technology Review: A bold AI movement is underway in Africa—but it is being held up
Business Insider: Google's head of research on whether 'learn to code' is still good advice in the age of AI
Business Insider: Nvidia CEO says there's 'no question' that we'll all be working alongside AI employees
Business Insider: Here's how far we are from AGI, according to the people developing it
Business Insider: Indeed prepares for 2025 rollout of its new AI tool, Pathfinder, which aims to help job seekers
Reuters: Web Summit kicks off in Lisbon as tech leaders weigh Trump’s return


Subscribe to Computerspeak by Alexandru Voica to keep reading this post and get 7 days of free access to the full post archives.