Introduction
Air Street Press recently co-published a piece with our friend Moritz on how data acquisition strategies for AI-first companies have changed since 2016. In doing so, we encountered the inevitable think pieces claiming that ‘data is the new oil’. In fact, the phrase continues to appear in government consultations and in Parliament to this day. In this piece, we dig into why the analogy is misleading.
While it’s easy to brush this off, language matters, and the forms of short-hand we use have an outsized impact on how we perceive the world.
This is only heightened when we approach the new and the unfamiliar. The number of AI-related analogies has exploded in the past few years. At the same time, advances in capabilities have created a new generation of AI-first companies that are fighting to position themselves in emerging markets.
Both these trends shed some light on the promise and perils of analogies and provide lessons for policymakers and entrepreneurs alike.
Oil, parrots, and Shoggoth
Comparisons between data and oil have been used to make points about a range of issues from geopolitics and antitrust through to the environmental impact of AI and the value of data.
The analogy never made much sense. Unlike oil, the following is true of data (and by extension AI):
It’s a growing, not a depletable, resource;
It’s routinely reused;
Synthetic variants are easy to generate at scale thanks to LLMs;
Obtaining or creating it is not a high-risk, capital-intensive business, and moving or copying it carries little marginal cost;
Its price has only ever trended downwards, thanks to growth in supply, lower storage and transmission costs, and cheaper compute;
Any geopolitical advantage it confers remains hypothetical.
As well as being a bad analogy in and of itself, it conjured up a number of unhelpful associations from a policy standpoint.
Firstly, a scarcity mindset. Data became innately precious, a source of strategic advantage in and of itself, to be guarded through localisation laws. In the UK, government and activist groups began assigning implausibly high value to the data held by the country’s National Health Service, contributing to a reluctance to share it with potential private technology partners.
Secondly, it fuelled a number of distorted but influential assumptions about AI and geopolitics. Kai-Fu Lee emerged as the leader of this camp, arguing that, as the “Saudi Arabia of data”, China had an innate advantage in the AI race. The analogy was always shaky: oil gives Saudi Arabia leverage, not global leadership, and much of that leverage lies in its ability to control supply (and, in effect, print money). This is clearly not the case with China and data. Nevertheless, these snappy arguments drowned out the more sophisticated voices pointing out that the relationship between the volume of data and advantage in specific domains of strategic importance isn’t always clear-cut.
The challenge with these phrases is that they simultaneously sound meaningful (they trigger associations), while lacking any specific stable meaning (the associations are largely subjective). When the data scientist Clive Humby first said “data is the new oil” at a marketing summit in 2006, he was making a relatively banal point about how data needed to be refined before it was useful. The dozens of permutations and interpretations all came later.
In a similar way, we often see the phrase ‘stochastic parrot’ thrown around by people expressing skepticism about the capabilities or significance of LLMs. At a recent event, a senior member of the national security community actually told us that the government didn’t need to engage closely with the technology as “it’s just a stochastic parrot”. As well as being a poor analogy that doesn’t account for emergent capabilities, it’s been routinely … parroted with little regard for its origins. The authors of the original stochastic parrot paper never argued that LLMs lacked power or relevance; their concern was that the models’ lack of reasoning ability made them actively dangerous, precisely because they were powerful.
When we get to the frontier and increasingly think ahead to hypothetical future capabilities, the lure of colorful shorthand becomes even stronger. While few people in the AI world were sad to see the death of the 2016-2019 era Terminator comparisons, its successors weren’t improvements, and they have contributed to an irrational, vibes-driven conversation about safety. ‘God-like AI’, which helped fuel x-risk panic in the corridors of power, fell into the category of sounding meaningful but lacking specificity. ‘Shoggoth’ memes, on closer inspection, are less a darker manifestation of ‘stochastic parrots’ and more the suggestion of a digital incarnation of an actively malicious actor.
We’re seeing this play out in the governance debate, with calls for an ‘International Atomic Energy Agency (IAEA) for AI’ implicitly evoking comparisons between advanced AI and nuclear. Not only is the way AI is developed completely different (highly decentralized, overwhelmingly by private actors), but the comparison also places a hypothetical armageddon at the center of the conversation. This does not frame the discussion in a balanced or constructive way.
Does it matter?
Matthijs Maas of the Centre for Law & AI published a paper last year that persuasively made the case that popular metaphors have shaped policymakers’ responses to technology throughout history. The framing of the internet as ‘cyberspace’ led the US military to view it as another ‘domain’ of conflict, alongside land, sea, air, and space, and was used to support the creation of US Cyber Command.
On the legal front, tech companies have fought off calls to hold them responsible for the content they host by arguing that their decisions about that content, like newspapers’ editorial choices, are protected speech. We’re currently seeing a fresh ‘battle of the analogies’ taking off in copyright disputes around generative models, with the makers of Stable Diffusion and Midjourney being accused of building “sophisticated collage tools, with the output representing nothing more than a mash-up of training data”.
From law through to military strategy, it turns out words matter very much.
ChatGPT for analogies
The temptation of the easy analogy isn’t just a challenge for policymakers. It’s equally dangerous for entrepreneurs.
In our line of work, we encounter thousands of ideas a year - whether it’s a pitch deck, a LinkedIn profile, or a snatched conversation at a meet-up (our events page is here). The single most lackluster thing anyone can say is that they’re building a “ChatGPT for X”.
Defining what you’re building by reference to somebody else’s product is a sign to investors and customers that you lack confidence. By definition, you’re starting a conversation by saying what you’re not working on. It also ties perceptions of your company to someone else’s brand, which by definition you cannot control. If GPT-5 disappoints an expectant world, while Claude 4 stuns us, do you hurriedly re-do your elevator pitch?
There is precedent for this. Many readers will remember the 2016-2018 enthusiasm for the “Uber for X” model of positioning that followed on from the long-forgotten “Facebook for X” era.
In fact, when we look back at Uber’s original pitch deck from 2008, we see that they fell into the same trap.
If Uber had launched as “the NetJets of Limos”, we would likely never have heard from it again. Fortunately, they went in a different direction and positioned themselves boldly, without reference to the incumbents they were disrupting.
If we consider the companies we admire, they never do this. Lambda Automata don’t describe themselves as the ‘BAE Systems for AI’ or ‘Anduril but European’. Similarly, our friends at Profluent don’t tell the world they’re building the ChatGPT for proteins.
In much the same way that hazy reasoning distorts policy, it can also lead to bad strategy. The “Uber for X” businesses largely failed, whether it was Uber for meal ingredients, laundry, parking spaces, car washes, or home repairs. This was partly down to market conditions, but also, we believe, because the snappy shorthand obscured what made Uber successful. Many interpreted the Uber formula as “a service people find useful, but on an app”. This skipped over multiple layers of complexity.
Uber took something:
That was a commodity, where there’s only limited variation in skill or user need - taxi rides;
Where immediacy was genuinely important - no one likes missing a flight or being stood in the rain;
Where the existing market was inefficient - demand spikes weren’t consistently matched with under-utilized supply.
Instead, many of these start-ups focused on more specialized services, where user need or the skill of the provider genuinely varied, or they targeted pain points that weren’t real. They’d not understood Uber at all.
Closing thoughts
Abolishing all shorthand would be both impossible and profoundly undesirable. It’s incredibly useful to be able to explain a complex or unfamiliar concept by reference to something known. However, policymakers, investors, entrepreneurs, and others would be better served by doing this more selectively.
This could mean using analogies or comparisons for individual features, while avoiding lazy, totalising shorthand. In the policy world, it involves accepting that there isn’t a comprehensive or perfect analogy for advanced AI. Some future capabilities might seem ‘God-like’, while the Shoggoth meme can remind us that reinforcement learning from human feedback should be approached with caution.
You may not want to call your product “ChatGPT for X”, but it’s perfectly reasonable to say that, for instance, users would see a familiar ChatGPT-style interface. Alongside help with hiring, marketing, and product, we work closely with founders on story-telling precisely because these nuances matter. Either way, a founder should be the chief evangelist for their company, which means having the courage to chart their own course, simplify the complex, and convince other people that it matters.