The AI-Powered Web Is Eating Itself

Credits

Hamilton Mann is the 2025 recipient of the Thinkers50 Distinguished Achievement Award for Digital Thinking. An AI researcher, he is also a lecturer at INSEAD and HEC Paris. He is the author of “Artificial Integrity: The Paths to Leading AI Toward a Human-Centered Future” (Wiley, 2024).

Suppose you’re craving lasagna. Where do you turn for a recipe? The internet, of course.

Typing “lasagna recipe ideas” into Google used to surface a litany of food blogs, each with its own story: a grandmother’s family variation, step-by-step photos of ingredients laid out on a wooden table, videos showing technique and a long comment section where readers debated substitutions or shared their own tweaks. Clicking through didn’t just deliver instructions; it supported the blogger through ads, affiliate links for cookware or a subscription to a weekly newsletter. That ecosystem sustained a culture of experimentation, dialogue and discovery.

That was a decade ago. Fast forward to today. The same Google search can now yield a neatly packaged “AI Overview,” a synthesized recipe stripped of voice, memory and community, delivered without a single user visit to the creator’s website. Behind the scenes, their years of work, including their page’s text, photos and storytelling, may have already been used to help train or refine the AI model.

You get your lasagna, Google gets monetizable web traffic and, for the most part, the person who created the recipe gets nothing. The living web shrinks further into an interface of disembodied answers, convenient but ultimately sterile.

This isn’t hypothetical: More than half of all Google searches in the U.S. and Europe in 2024 ended without a click, a report by the market research firm SparkToro estimated. Similarly, the SEO intelligence platform Ahrefs published an analysis of 300,000 keywords in April 2025 and found that when an AI overview was present, the number of users clicking into top-ranked organic search results plunged by an average of more than a third.

Users are finding their questions answered and their needs satisfied without ever leaving the search platform.

Until recently, an implicit social contract governed the web: Creators produced content, search engines and platforms distributed it, and in return, user traffic flowed back to creators’ websites, sustaining the system. This reciprocal bargain of traffic in exchange for content underwrote the economic, cultural and informational fabric of the internet for three decades.

Today, the rise of AI marks a decisive rupture. Google’s AI Overviews, Bing’s Copilot Search, OpenAI’s ChatGPT, Anthropic’s Claude, Meta’s Llama and xAI’s Grok effectively serve as a new oligopoly of what are increasingly being called “answer engines” that stand between users and the very sources from which they draw information.

This shift threatens the economic viability of content creation, degrades the shared information commons and concentrates informational power.

To sustain the web, a system of Artificial Integrity must be built into these AI “answer engines” that prioritizes three things: clear provenance that consistently makes information sources visible and traceable; fair value flows that ensure creators share in the value even when users don’t click through to their content; and a resilient information commons that keeps open knowledge from collapsing behind paywalls.

In practical terms, that means setting enforceable design and accountability guardrails that uphold integrity, so AI platforms cannot keep all the benefits of instant answers while pushing the costs onto creators and the wider web.

Ruptured System

AI “answer engines” haven’t merely made it easier to find information; they have ruptured the web’s value loop by separating content creation from the traffic and revenue that used to reward it.

AI companies have harvested and used the creative labor of writers, researchers, artists and journalists to train large language models without clear consent, attribution or compensation. The New York Times has sued OpenAI and Microsoft, alleging that the tech giants used its copyrighted articles for this purpose. In doing so, the news organization claims, they are threatening the very business model of journalism.

In fact, AI threatens the business model of digital content creation across the board. As publishers lose traffic, there remains little incentive for them to keep content free and accessible. Instead, paywalls and exclusive licensing are increasingly the norm. This will continue to shrink the freely available corpus of information upon which both human knowledge and future AI training depend.

The result will be a degraded and privatized information base. It will leave future AI systems working with a narrower, more fragile foundation of information, making their outputs increasingly dependent on whatever remains openly accessible. This will limit the diversity and freshness of the underlying data, as documented in a 2024 audit of the “AI data commons.” 

“The living web is shrinking into an interface of disembodied answers, convenient but ultimately sterile.”

At the same time, as more of what is visible online becomes AI-generated and is then reused in future training, these systems will become more exposed to “model collapse,” a dynamic documented in a 2024 Nature study. It showed that when real data are replaced by successive synthetic generations, the tails of the original distribution begin to disappear and the model’s synthetic outputs gradually overwrite the underlying reality they were meant to approximate.

Think of it like making a photocopy of a photocopy, again and again. Each generation keeps the bold strokes and loses the faint details. Both trends, in turn, weaken our ability to verify information independently. In the long run, this will leave people relying on systems that amplify errors, bias and informational blind spots, especially in niche domains and low-visibility communities.

Picture a procurement officer at a mid-sized bank tasked with evaluating vendors for a new fraud-detection platform. Not long ago, she would likely have turned to Google, LinkedIn or industry portals for information, wading through detailed product sheets, analyst reports and whitepapers. By clicking through to a vendor’s website, she could access whatever technical information she needed and ultimately contact the company. For the vendor, each click also fed its sales pipeline. Such traffic was not incidental; it was the lifeblood of an entire ecosystem of marketing metrics, campaigns and specialized research, and of the jobs that traffic underwrote.

These days, the journey looks different. A procurement officer’s initial query would likely yield an AI-generated comparison condensing the field of prospects into a few paragraphs: Product A is strong on compliance; product B excels at speed; product C is cost-effective. Behind this synthesis would likely lie numerous whitepapers, webinars and case studies produced by vendors and analysts — years of corporate expertise spun into an AI summary.

As a result, the procurement officer might never leave the interface. Vendors’ marketing teams, seeing dwindling click-driven sales, might retreat from publishing open materials. Some might lock reports behind steep paywalls, others might cut report production entirely and still others might sign exclusive data deals with platforms just to stay visible.

The once-diverse supply of open industry insight would contract into privatized silos. Meanwhile, the vendors would become even more dependent on the very platforms that extract their value.

Mechanisms At Play

The rupture we’re seeing in the web’s economic and informational model is driven by five mutually reinforcing mechanisms that determine what content gets seen, who gets credited and who gets paid. Economists and product teams might call these mechanisms intent capture, substitution, attribution dilution, monetization shifts and the learning loop break. 

Intent capture happens when the platform turns an online search query into an on-platform answer, keeping the user from ever needing to click through to the original source of information. This mechanism essentially transforms a search engine’s traditional results page from an open marketplace of links into a closed surface of synthesized answers, narrowing both visibility and choice.

Substitution, which takes place when users rely on AI summaries instead of clicking through to source links and giving creators the traffic they depend on, is particularly harmful. This harm is most pronounced in certain content areas. High substitution occurs for factual lookups, definitions, recipes and news summaries, where a simple answer is often sufficient. Conversely, low substitution occurs for content like investigative journalism, proprietary datasets and multimedia experiences, which are harder for AI to synthesize into a satisfactory substitute.

The incentives of each party diverge: Platforms are rewarded for maximizing query retention and ad yield; publishers for attracting referral traffic and subscribers; and regulators for preserving competition, media plurality and provenance. Users, too, prefer instant, easily accessible answers to their queries. This misalignment ensures that platforms optimize for closed-loop satisfaction while the economic foundations of content creation remain externalized and underfunded.

Attribution dilution compounds the effect. When information sources are pushed behind dropdowns or listed in tiny footnotes, the credit exists in form but not in function. Search engines’ practice of simply displaying source links, which many do inconsistently, does not solve the issue. These links are often de-emphasized and generate little or no economic value, creating a significant consent gap for content used in AI model training. When attribution is blurred across multiple sources and no value accrues without clicks or compensation, that gap becomes especially acute.

“AI ‘answer engines’ have ruptured the web’s value loop by separating content creation from the traffic and revenue that used to reward it.”

Monetization shifts refer to the redirected monetary value that now often flows solely to AI “answer engines” instead of to content creators and publishers. This shift is already underway, and it extends beyond media. When content promoting or reviewing various products and services receives fewer clicks, businesses often have to spend more to be discovered online, which can raise customer acquisition costs and, in some cases, prices. 

This shift can also impact people’s jobs: Fewer roles may be needed to produce and optimize web content for search, while more roles might emerge around licensing content, managing data partnerships and governing AI systems. 

The learning loop break describes the shrinking breadth and quality of the free web as a result of the disruptive practices of AI “answer engines.” As the information commons thins, high-quality data becomes a scarce resource that can be controlled. Analysts warn that control of valuable data can act as a barrier to entry and concentrate gatekeeper power.

This dynamic is comparable to what I refer to as a potential “Data OPEC,” a metaphor for a handful of powerful platforms and rights-holders controlling access to high-quality data, much as the Organization of Petroleum Exporting Countries (OPEC) controls the supply of oil.

Just as OPEC can restrict oil supply or raise prices to shape global markets, these data gatekeepers could restrict or monetize access to information used to build and improve AI systems, including training datasets, raising costs, reducing openness and concentrating innovation power in fewer hands. In this way, what begins as an interface design choice cascades into an ecological risk for the entire knowledge ecosystem.

The combined effect of these five mechanisms is leading to a reconfiguration of informational power. If AI “answer engines” become the point of arrival for information rather than the gateway, the architecture of the web risks being hollowed out from within. The stakes extend beyond economics: They implicate the sustainability of public information ecosystems, the incentives for future creativity and the integrity of the informational commons.

Left unchecked, these forces threaten to undermine the resilience of the digital environment on which both creators and users depend. What is needed is a systemic redesign of incentives, guided by the framework of Artificial Integrity rather than artificial intelligence alone.

Artificial Integrity

Applied to the current challenge, Artificial Integrity can be understood across three dimensions: information provenance integrity, economic integrity of information flows and integrity of the shared information commons.

Information provenance integrity is about ensuring that sources are visible, traceable and properly credited. This should include who created the content, where it was published and the context in which it was originally presented. The design principle is transparency: Citations must not be hidden in footnotes. 

Artificial Integrity also requires that citations carry active provenance metadata, a verifiable, machine-readable signature linking each fragment of generated output to its original source, allowing both users and systems to trace information flows with the same rigor as a scientific citation. 

That introduces something beyond just displaying source links: It’s a systemic design where provenance is cryptographically or structurally embedded, not cosmetically appended. In this way, provenance integrity becomes a safeguard against erasure, ensuring that creators remain visible and credited even if the user doesn’t click through to the original source.
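To make this concrete, here is a minimal sketch of what an active provenance record might look like, assuming a simple signed JSON structure. The field names, the shared signing key and the recipe fragment are illustrative stand-ins; a production scheme would more plausibly use asymmetric signatures that third parties could verify independently.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

# Hypothetical signing key held by the answer engine; in practice this would be
# an asymmetric key pair so outside parties could verify signatures themselves.
SIGNING_KEY = b"demo-key-not-for-production"


def make_provenance_record(fragment: str, source_url: str, author: str, published: str) -> dict:
    """Attach a machine-readable, verifiable provenance record to one generated fragment."""
    record = {
        "fragment_sha256": hashlib.sha256(fragment.encode("utf-8")).hexdigest(),
        "source_url": source_url,
        "author": author,
        "published": published,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
    }
    # Sign the canonical form of the record so tampering with any field is detectable.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["signature"] = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return record


def verify_provenance_record(record: dict) -> bool:
    """Recompute the signature over the unsigned fields and compare."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    canonical = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(SIGNING_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])


if __name__ == "__main__":
    rec = make_provenance_record(
        fragment="Layer the noodles, sauce and cheese, then bake at 190 C for 45 minutes.",
        source_url="https://example.com/family-lasagna",  # hypothetical source
        author="Example Food Blog",
        published="2015-03-02",
    )
    print(json.dumps(rec, indent=2))
    print("verified:", verify_provenance_record(rec))
```

The point of the sketch is the structure, not the cryptography: every fragment of a generated answer carries a record naming who created the underlying content, where it lives and when it was retrieved, and that record can be checked rather than taken on faith.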

Economic integrity of information flows is about ensuring that value flows back to creators, not only to platforms. Artificial Integrity requires rethinking how links and citations are valued. In today’s web economy, a link matters only if it is clicked, which means that sources that are cited but not visited capture no monetary value. In an integrity-based model, the very act of being cited in an AI-generated answer would carry economic weight, ensuring that credit and compensation flow even when user behavior stops at the interface.

This would realign incentives from click-chasing to knowledge contribution, shifting the economy from performance-only to provenance-aware. To achieve this, regulators and standards bodies could require that AI “answer engines” compensate not only for traffic delivered, but also for information cited. Such platforms could implement source prominence rules so that citations are not hidden in footnotes but embedded in a way that delivers measurable economic value. 
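As an illustration of how such source prominence rules might translate into compensation, the sketch below assumes per-citation and per-click rates and prominence weights that are entirely hypothetical; any real figures would come from regulation or negotiation, not from this example.

```python
from dataclasses import dataclass

# Illustrative rates: what a platform pays per citation shown and per referral click.
RATE_PER_CITATION = 0.002   # paid even when the user never clicks
RATE_PER_CLICK = 0.05       # paid when the citation drives a visit

# Prominence multipliers: a citation shown inline in the answer is worth more
# than one buried behind a dropdown or footnote.
PROMINENCE_WEIGHT = {"inline": 1.0, "expandable": 0.5, "footnote": 0.2}


@dataclass
class CitationStats:
    source: str
    placement: str   # "inline", "expandable" or "footnote"
    citations: int   # times the source was cited in generated answers
    clicks: int      # times users clicked through to the source


def payout(stats: CitationStats) -> float:
    """Compensation owed to one source: citations earn value even without clicks."""
    weight = PROMINENCE_WEIGHT[stats.placement]
    return stats.citations * RATE_PER_CITATION * weight + stats.clicks * RATE_PER_CLICK


if __name__ == "__main__":
    month = [
        CitationStats("food-blog.example", "inline", citations=12_000, clicks=300),
        CitationStats("vendor-whitepaper.example", "footnote", citations=8_000, clicks=40),
    ]
    for s in month:
        print(f"{s.source}: ${payout(s):,.2f}")
```

The design choice worth noticing is that the payout function never requires a click to be nonzero: being used as a source is itself a compensable event, and placement determines how much that use is worth.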

Integrity of the shared information commons is about ensuring that the public information base remains sustainable, open and resilient rather than degraded into a paywalled or privatized resource. Here, Artificial Integrity calls for mandatory reinvestment of AI platform revenues into open datasets as a built-in function of the AI lifecycle. This means that large AI platforms such as Google, OpenAI and Microsoft would be legally required to dedicate a fixed percentage of their revenues to sustaining the shared information commons. 

“AI platforms cannot keep all the benefits of instant answers while pushing the costs onto creators and the wider web.”

This allocation would be architecturally embedded into their model development pipelines. For example, a “digital commons fund” could channel part of Google’s AI revenues into keeping resources like Wikipedia, PubMed or open academic archives sustainable and up to date. Crucially, this reinvestment would be hardcoded into retraining cycles, so that every iteration of a model structurally refreshes and maintains open-access resources alongside its own performance tuning. 

In this way, the sustainability of the shared information commons would become part of the AI system’s operating logic, not just a voluntary external policy. In effect, it would ensure that every cycle of AI improvement also improves the shared information commons on which it depends, aligning private platform incentives with public information sustainability.
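One way to picture what “hardcoded into retraining cycles” could mean in practice is a gate in the training pipeline, sketched below under the assumption of a hypothetical fixed 2% reinvestment rate and a simple shortfall check; both the rate and the revenue figures are made up for illustration.

```python
# Hypothetical reinvestment rate; the actual percentage would be set by regulation.
COMMONS_REINVESTMENT_RATE = 0.02   # 2% of AI-attributable revenue per retraining cycle


def commons_contribution(ai_revenue: float) -> float:
    """Amount owed to a digital commons fund for one retraining cycle."""
    return ai_revenue * COMMONS_REINVESTMENT_RATE


def start_retraining_cycle(ai_revenue: float, contribution_paid: float) -> None:
    """Gate the retraining pipeline on the commons reinvestment having been made."""
    owed = commons_contribution(ai_revenue)
    if contribution_paid + 1e-9 < owed:
        raise RuntimeError(
            f"Commons reinvestment shortfall: paid {contribution_paid:,.0f}, owed {owed:,.0f}"
        )
    print("Reinvestment verified; retraining may proceed.")


if __name__ == "__main__":
    start_retraining_cycle(ai_revenue=500_000_000, contribution_paid=10_000_000)
```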

We need to design an ecosystem where these three dimensions are not undermined by the optimization-driven focus of AI platforms but are structurally protected, both in how the platforms access and display content to generate answers, and in the regulatory environment that sustains them.

From Principle To Practice

To make an Artificial Integrity approach work, we would need systems for transparency and accountability. AI companies would have to be required to publish verifiable aggregated data showing whether users stop at their AI summaries or click outward to original sources. Crucially, to protect users’ privacy, this disclosure would need to include only aggregated interaction metrics reporting overall patterns. This would ensure that individual user logs and personal search histories are never exposed.
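A rough sketch of what such an aggregate-only disclosure could look like follows. The record fields, the sample data and the reported rates are assumptions for illustration; the essential property is that user identity and query text have already been stripped out before anything is counted.

```python
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    """One query's outcome, already stripped of user identity and query text."""
    ai_answer_shown: bool
    clicked_source: bool


def aggregated_disclosure(outcomes: list[QueryOutcome]) -> dict:
    """Aggregate-only report: overall patterns, never individual search histories."""
    with_ai = [o for o in outcomes if o.ai_answer_shown]
    without_ai = [o for o in outcomes if not o.ai_answer_shown]

    def click_rate(group: list[QueryOutcome]) -> float:
        return sum(o.clicked_source for o in group) / len(group) if group else 0.0

    return {
        "queries_total": len(outcomes),
        "share_with_ai_answer": len(with_ai) / len(outcomes),
        "click_out_rate_with_ai": click_rate(with_ai),
        "click_out_rate_without_ai": click_rate(without_ai),
    }


if __name__ == "__main__":
    sample = (
        [QueryOutcome(True, False)] * 70
        + [QueryOutcome(True, True)] * 10
        + [QueryOutcome(False, True)] * 15
        + [QueryOutcome(False, False)] * 5
    )
    print(aggregated_disclosure(sample))
```

Run on the toy sample, the report shows a click-out rate of 12.5% when an AI answer is present versus 75% when it is not, exactly the kind of gap an auditor would need to see without ever touching an individual user’s history.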

Independent third-party auditors, accredited and overseen by regulators much like accounting firms are today, would have to verify these figures. Just as companies cannot self-declare their financial health but must submit audited balance sheets, AI platforms would no longer be able to simply claim they are supporting the web without independent validation.

In terms of economic integrity of information flows, environmental regulation offers a helpful analogy. Before modern environmental rules, companies could treat pollution as an invisible side effect of doing business. Smoke in the air or waste in the water imposed real costs on society, but those costs did not show up on the polluter’s balance sheet.

Emissions standards changed this by introducing clear legal limits on how much pollution cars, factories and power plants are allowed to emit, and by requiring companies to measure and report those emissions. These standards turned pollution into something that had to be monitored, reduced or paid for through fines and cleaner technologies, instead of being quietly pushed onto the public. 

In a similar way, Artificial Integrity thresholds could ensure that the value that AI companies extract from creators’ content comes with financial obligations to those sources. An integrity threshold could simply be a clear numerical line, like pollution limits in emissions standards, that marks the point at which an AI platform is taking too much value without sending enough traffic or revenue back to sources. As long as the numbers stay under the acceptable limit, the system is considered sustainable; once they cross the threshold, the platform has a legal duty to change its behavior or compensate the creators it depends on.
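For illustration, the sketch below encodes one possible form of such a threshold: a hypothetical rule that at least 25% of the value a platform derives from cited content must flow back to sources as referral value or payments. The ratio, the 25% line and the dollar figures are assumptions, not proposals.

```python
# Hypothetical threshold: if less than 25% of the value a platform derives from
# cited content flows back as referrals or payments, it must rebalance or compensate.
INTEGRITY_THRESHOLD = 0.25


def value_return_ratio(value_extracted: float, referral_value: float, payments: float) -> float:
    """Share of extracted value that is returned to the sources it came from."""
    if value_extracted <= 0:
        return 1.0
    return (referral_value + payments) / value_extracted


def breaches_threshold(value_extracted: float, referral_value: float, payments: float) -> bool:
    """True when the platform crosses the line and owes a duty to rebalance or compensate."""
    return value_return_ratio(value_extracted, referral_value, payments) < INTEGRITY_THRESHOLD


if __name__ == "__main__":
    # Illustrative audit-period figures, in dollars.
    print(breaches_threshold(value_extracted=10_000_000,
                             referral_value=1_200_000,
                             payments=800_000))
    # -> True: only 20% of extracted value is returned, below the 25% line.
```

Like an emissions limit, the number itself matters less than the fact that it is measurable, audited and attached to a legal consequence once crossed.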

This could be enforced by national or regional regulators, such as competition authorities, media regulators or data protection bodies. Similar rules have begun to emerge in a handful of jurisdictions that regulate digital markets and platform-publisher relationships, such as the EU, Canada or Australia, where news bargaining and copyright frameworks are experimenting with mandatory revenue-sharing for journalism. Those precedents could be adapted more broadly as AI “answer engines” reshape how we search online.

These thresholds could also be subject to standardized independent audits of aggregated interaction metrics. At the same time, AI platforms could be required to provide publisher-facing dashboards exposing the same audited metrics in near real-time, showing citation frequency, placement and traffic outcomes for their content. These dashboards could serve as the operational interface for day-to-day decision-making, while independent audit reports could provide a legally verified benchmark, ensuring accuracy and comparability across the ecosystem.

In this way, creators and publishers would not be left guessing whether their contributions are valued. They would receive actionable insight for their business models and formal accountability. Both layers together would embed provenance integrity into the system: visibility for creators, traceability for regulators and transparency for the public. 

“Artificial Integrity thresholds could ensure that the value that AI companies extract from creators’ content comes with financial obligations to those sources.”

Enforcement could mix rewards and penalties. On the reward side, platforms that show where their information comes from and that help fund important public information resources could get benefits such as tax credits or lighter legal risk. On the penalty side, platforms that ignore these integrity rules could face growing fines, similar to the antitrust penalties we already see in the EU.

This is where the three dimensions come together: information provenance integrity in how sources are cited, economic integrity of information flows in how value is shared and the integrity of the shared information commons in how open resources are sustained.

Artificial Integrity for platforms that deliver AI-generated answers represents more than a set of technical fixes. By reframing AI-mediated information search not as a question of feature tweaks but as a matter of design, code and governance in AI products, it points toward a necessary rebalancing: a fairer and more sustainable distribution of value on which the web depends, now and in the future.
