Hello and welcome to Eye on AI. In this newsletter…why a legal AI startup shows there’s more to the AI boom than just foundation models; Zoox starts offering robotaxi rides in San Francisco; it’s worryingly easy to jailbreak LLM-powered robots; is foundation model progress topping out?
There’s been a lot of talk this past week about whether the progress of general-purpose foundation models might be hitting a wall and what that means for the AI boom. (More on this in the Brain Food section below.)
Some skeptics, such as Gary Marcus, are predicting a reckoning on par with the dotcom crash. I will be discussing this very topic tomorrow at Web Summit in Lisbon, moderating a center stage conversation at 4:25 pm local time on “Is the AI Bubble About to Burst?” with Moveworks CEO Bhavin Shah and the AI Now Institute’s co-executive director Sarah Myers West. You can check it out on the Web Summit livestream!
My view is that even if foundation model progress is decelerating, it may not matter as much for companies building AI applications for specific industries as it does for companies such as OpenAI, whose $157 billion valuation seems largely predicated on achieving artificial general intelligence (AGI). Or at least it’s predicated on a scenario in which OpenAI remains at the forefront of model development and has some kind of defensible moat around its business, which won’t be the case if building ever bigger LLMs doesn’t confer a capability advantage significant enough to justify the cost.
It’s about solutions, not models
Many of these AI application companies are in the business of selling a solution to a specific industry problem, not selling one particular AI model or some vague concept like “general purpose intelligence.” In many cases, these solutions do not require AGI—or even necessarily any further leaps in AI capabilities. In some cases, just coupling together several existing models and fine-tuning them on data relevant to a particular professional task is all that’s required to create a pretty good business.
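To make that concrete, here is a minimal sketch of what the fine-tuning half of that recipe can look like, using OpenAI’s fine-tuning API. The training file name and the clause-labeling task are invented for illustration; any foundation model with a fine-tuning endpoint would do.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical training set: chat-formatted examples of one narrow
# professional task, e.g. labeling clauses pulled from NDAs.
training_file = client.files.create(
    file=open("nda_clause_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Fine-tune an off-the-shelf general-purpose model on that task.
# No new capabilities required, just specialization of existing ones.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```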
A great example of this from the world of legal tech is Robin AI. The company was founded in 2019 by Richard Robinson, a former lawyer at the firm Clifford Chance, and James Clough, a former machine learning researcher at Imperial College London and King’s College London. Robin doesn’t just sell companies a particular piece of technology. Instead, it sells legal services to large corporations—with some of those services delivered automatically through AI software, and some delivered by human lawyers and paralegals on Robin’s payroll, who are assisted by technology, including AI, that Robin has developed.
“It’s a combination of doing things that the models are currently capable of, but also investing in what is just out of reach today, and then using humans in the loop to bridge the capability gap,” Robinson tells me.
Bridging the Expectations Gap
He acknowledges that “there is a gap between what people expect the models can do and what they can actually do reliably.” For instance, he says, the most advanced AI models are now excellent at summarization and pretty good at translation. But they can’t yet reliably negotiate a complex legal document, nor can they accurately draft a brief for a court case. “They can do parts of the task, but nothing like the whole thing,” he says.
But—and here’s the crucial thing—Robin AI has a viable business even with those gaps. And it will still have a viable business even if those gaps close only slowly, or perhaps even never close at all.
That’s because, while some customers do just buy the software from Robin, others outsource an entire legal task to the company—and it is up to Robin to figure out how best to deliver that task at a given price.
“We have people, but they are highly optimized with our technology and that massively reduces the cost,” Robinson says, noting that the company does not engage in labor arbitrage by hiring paralegals in low-cost countries like India or the Philippines. Instead, it has lawyers and paralegals on the payroll in New York, London, and Singapore—but they can work much faster assisted by Robin’s legal copilot technology. And that tech doesn’t just consist of foundation models developed by the likes of OpenAI, Anthropic, and Meta, but also a whole host of other technologies, including search algorithms and old-fashioned hard-coded rules, all chained together in a complex workflow.
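Robin hasn’t published the internals of that workflow, but the shape Robinson describes (search, hard-coded rules, and foundation models chained together, with cheap steps filtering work before it reaches expensive ones) might look roughly like the sketch below. Every name, rule, and keyword here is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    contract_id: str
    text: str

def keyword_search(clauses: list[Clause], terms: list[str]) -> list[Clause]:
    """Step 1: plain old search, ranking clauses by hard-coded term hits."""
    return sorted(clauses, key=lambda c: -sum(t in c.text.lower() for t in terms))

def rule_flag(clause: Clause) -> bool:
    """Step 2: an old-fashioned hand-written rule, no ML involved."""
    text = clause.text.lower()
    return "notify" in text and "breach" in text

def llm_review(clause: Clause) -> str:
    """Step 3: only clauses that survive the cheap filters reach a
    foundation model (stubbed out here), keeping per-contract costs down."""
    return f"[model summary of {clause.contract_id}]"

def review_pipeline(clauses: list[Clause]) -> list[str]:
    candidates = keyword_search(clauses, ["data breach", "notification"])
    flagged = [c for c in candidates if rule_flag(c)]
    return [llm_review(c) for c in flagged]
```

A human lawyer or paralegal then checks the model’s output on the flagged clauses, which is the “humans in the loop” part of Robinson’s description.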
A $25 million “B Plus” Round for Robin
In a sign of confidence in Robin’s prospects, Eye on AI can report that the company has closed a “Series B Plus” round of $25 million, on top of its initial $26 million Series B fundraising announced in January. This brings the total amount Robin AI has raised to date to $61.5 million.
Investors in the new funding round include the venture arm of PayPal; Willets, billionaire Michael Bloomberg’s family office; and the University of Cambridge—all of which are also customers of Robin AI. The original Series B round was led by Temasek. The company did not disclose its valuation following the latest investment. It said it is currently earning $10 million in annual recurring revenue.
Robinson says the company wanted to take on further investment, even though it still has plenty of financial runway left from the initial Series B, in part to add additional features to a product called “Reports” that has proved especially popular with customers. Reports allows users to ask unlimited questions about a set of documents. It uses Anthropic’s Claude model under the hood to help power its responses. Robinson says the company is hoping to add even more reasoning abilities to what Reports can do—but that using the most advanced foundation models adds to the company’s costs, which is why having additional funding in the bank is helpful.
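Robin hasn’t said exactly how Reports is wired beyond its use of Claude, but answering questions over a set of documents with Anthropic’s Python SDK generally follows a pattern like the sketch below. The model name, system prompt, and document handling are assumptions for illustration, not Robin’s actual design.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask_about_documents(question: str, documents: list[str]) -> str:
    """Answer a user question grounded in a set of contract excerpts."""
    context = "\n\n---\n\n".join(documents)
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # assumed; Robin hasn't said which Claude it uses
        max_tokens=1024,
        system="Answer strictly from the provided contracts and cite the excerpt you relied on.",
        messages=[{"role": "user", "content": f"Contracts:\n{context}\n\nQuestion: {question}"}],
    )
    return message.content[0].text
```

Note the trade-off Robinson mentions: calls to a frontier model like Claude are metered by the token, so longer documents and deeper reasoning translate directly into higher costs.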
Robin AI is also in competition with a lot of deep-pocketed rivals, including Harvey AI, which is backed by OpenAI and this past summer raised a $100 million funding round at a $1.5 billion valuation. It is also competing with products from Thomson Reuters, which owns Westlaw and which has acquired several legal AI startups, including Casetext, which it bought for $650 million in 2023.
Analyzing 10,000 contracts in hours
In one recent case, Robin says it helped an unnamed U.S. biotech firm deal with a data breach—reviewing 10,000 contracts, across 30 different contract types, to understand what the biotech’s obligations were in terms of notifying counterparties about the breach. Using the Reports product, as well as Robin’s human legal experts, Robin says the biotech was able to identify the 50 highest priority contracts that required notification in just hours, and have an action plan for all 10,000 contracts within 72 hours. It estimated that this saved the biotech company 93% of the time and 80% of the estimated $2.6 million it would have taken to hire an outside law firm to manually review the contracts. That’s value companies are deriving from AI today. And it’s value that is not going away, even if GPT-5 proves not to be as big an advance on GPT-4 as GPT-4 was on GPT-3.
With that, here’s more AI news.
Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn
Before we get to the news: If you want to learn more about what’s next in AI and how your company can derive ROI from the technology, join me in San Francisco on December 9 and 10 for Fortune Brainstorm AI. We’ll hear about the future of Amazon Alexa from Rohit Prasad, the company’s senior vice president and head scientist, artificial general intelligence; we’ll learn about the future of generative AI search at Google from Liz Reid, Google’s vice president, search; and about the shape of AI to come from Christopher Young, Microsoft’s executive vice president of business development, strategy, and ventures; and we’ll hear from former San Francisco 49er Colin Kaepernick about his company Lumi and AI’s impact on the creator economy. You can view the agenda and apply to attend here. (And remember, if you write the code KAHN20 in the “Additional comments” section of the registration page, you’ll get 20% off the ticket price—a nice reward for being a loyal “Eye on AI” reader!)
AI IN THE NEWS
Robotaxi company Zoox launches in San Francisco. The company’s autonomous taxis will initially be available only to Zoox employees and operate only in one neighborhood, SoMa, Zoox said in a blog post. Still, this marks Zoox’s entry into a second market, following Las Vegas, where the company has operated autonomously on public roads since 2023 and has now expanded its operations to cover the Vegas Strip. Unlike some competing self-driving cars, Zoox’s robotaxis lack manual controls.
Google DeepMind and sister company Isomorphic make AlphaFold 3 publicly available for research. The two Alphabet divisions said in an updated blog post they were making the model weights and code of AlphaFold 3 available for free to academic researchers on GitHub. The model can predict the structure and interactions of every type of biological molecule, including proteins, DNA, RNA, ligands, and more. It could help researchers in myriad ways. But commercial use of the model by drug discovery companies is not permitted under the AlphaFold 3 license terms.
Chinese company Tencent claims title of most capable open-weight AI model. The Chinese internet giant unveiled its Hunyuan-Large model and said it beat Meta’s Llama 3.1 405B model on a range of benchmark tests. As with Meta’s models, Hunyuan is an “open model” but not truly an open-source one, since the model weights are made public but not the data on which the model was trained. You can read more about Hunyuan and the benchmark results in a paper Tencent published here.
EYE ON AI RESEARCH
It turns out that jailbreaking LLM-powered robots is just as easy as jailbreaking LLM-powered chatbots. That’s perhaps not surprising, but it is disturbing. Researchers have found that large language models are relatively easy to jailbreak—coaxing the AI system past its guardrails to produce outputs it isn’t supposed to, some of them dangerous (a recipe for building a bomb, say, or encouragement to self-harm). But this kind of jailbreaking is even more dangerous when the LLM controls a real robot that can take actions in the world and might cause direct physical harm.
Researchers at the University of Pennsylvania developed a piece of software called RoboPAIR, designed to automatically find prompts that will jailbreak an LLM-controlled robot, and tested it on three different robot systems. In each case, RoboPAIR achieved a 100% success rate in overcoming the robot’s guardrails within a few days of trying. The system even worked against Go2, a robot control system whose code is not publicly available, meaning RoboPAIR could only look at the robot’s responses to prompts for clues as to how to shape an attack that would beat its guardrails. You can read more about the research in a story in IEEE Spectrum here.
FORTUNE ON AI
Art made by humanoid robot sells for $1 million at auction—by Chris Morris
Think Donald Trump’s AI policy plans are predictable? Prepare to be surprised—by Sharon Goldman
Duolingo’s new eye-rolling emo chatbot Lily briefly replaces CEO on investor call to showcase its AI technology—by Christiaan Hetzner
AI CALENDAR
Nov. 19-22: Microsoft Ignite, Chicago
Nov. 20: Cerebral Valley AI Summit, San Francisco
Nov. 21-22: Global AI Safety Summit, San Francisco
Dec. 2-6: AWS re:Invent, Las Vegas
Dec. 9-10: Fortune Brainstorm AI, San Francisco (register here)
Dec. 10-15: Neural Information Processing Systems (NeurIPS) 2024, Vancouver, British Columbia
Jan. 7-10: CES, Las Vegas
BRAIN FOOD
Are AI’s scaling laws broken? Back in 2020, researchers at OpenAI posited that LLMs followed what they called scaling laws—that taking the same basic model design but making the model larger and training it on more data would yield predictable, power-law improvements in performance. The OpenAI researchers called these scaling laws because they wanted to evoke the laws of physics—inexorable truths—but they were never more than observations of what seemed to work at the time. And now there is growing evidence that they aren’t holding any longer—that increases in model size and data may, after a certain point, yield diminishing returns.
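For reference, the 2020 paper (Kaplan et al., “Scaling Laws for Neural Language Models”) fit test loss to power laws of roughly the following form, where N is the number of model parameters and D the dataset size; the exponents are the paper’s empirical fits, not fundamental constants:

```latex
% Kaplan et al. (2020): test loss as a power law in parameters N and data D
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \quad \alpha_N \approx 0.076
\qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \quad \alpha_D \approx 0.095
```

Because those exponents are so small, even enormous multiplicative increases in parameters or data buy only modest reductions in loss, which is exactly where diminishing returns would bite.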
OpenAI has found that its latest AI model, codenamed Orion, which was supposed to be a successor to its GPT-4 model, has, despite being larger and trained on more data, failed to beat GPT-4 on some key metrics, according to a blockbuster report from The Information that cited unnamed company employees. In particular, Orion’s skill at tasks such as coding was no better, and may even have been worse, than GPT-4o’s.
As a result, the publication reported, OpenAI is having to fall back on other techniques to improve Orion’s performance. This may include fine-tuning the model more after its initial training, as well as merging the base Orion model with a system more similar to OpenAI’s o1 “Strawberry” model, which is trained with reinforcement learning to use a search process across multiple possible response pathways to “reason” its way to a better answer.
What will this mean for the whole AI boom? It’s unclear, but it certainly makes OpenAI’s path to AGI—and that of the other companies that now say they are pursuing that goal, from Google DeepMind to Meta to Amazon—look more difficult. The good news, though, is that this setback may mean companies will look more seriously at other AI architectures and algorithms that might be much more learning-efficient—using less data, less computing hardware, and less energy. And that should be good for the world, even if it might not be good news for OpenAI.