Just as traditional SEO no longer guarantees visibility in AI search engines, HTML alone isn’t enough either.
AI systems prefer structured formats or APIs to ingest and surface content more efficiently. And “markdown” has quickly become the common language used by AI systems and agents.
Plus, the volume of non-human agents visiting web pages keeps rising (one in every 31 visits to a site is from a non-human agent, per Tollbit’s latest report), so catering to these agents’ requirements is becoming unavoidable.
Now, background tech is being adjusted to accommodate this. Take Cloudflare’s Markdown for Agents feature, unveiled last week, as a prime example. It means that publishers – or any digital IP owner that’s a Cloudflare customer – can automatically convert HTML into structured markdown with a single toggle.
Let’s strip back what markdown means for the wider AI ecosystem.
What exactly is markdown?
Markdown is a lightweight, plain-text formatting syntax. The term itself has existed for decades, but it has more recently come up in connection with large language models (LLMs) and how they prefer to ingest information.
HTML is for browsers, not AI models. It contains layout, style, and navigation information that humans or browsers use. But that’s mostly irrelevant to LLMs.
“Lots of the core parts of an HTML website – the footer, the CSS (styling) – are necessary to render the web page, but aren’t necessary if you just want the ideas in the content,” said Will Allen, vp of product for Cloudflare.
And as the company says on its blog announcing its new feature: “A simple ‘About Us’ on a page in markdown costs roughly three tokens; its HTML equivalent – <h2 class="section-title" id="about">About Us</h2> – burns 12 to 15 tokens, and that’s before you account for the <div> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.”
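To get a feel for that gap, here is a rough sketch that compares the two headings. It is only a crude proxy: it counts words and individual symbols, whereas real LLM tokenizers use learned subword vocabularies, so the absolute numbers will not match Cloudflare's figures. The markdown form `## About Us` and the helper name `rough_piece_count` are assumptions made for illustration.

```python
import re


def rough_piece_count(text: str) -> int:
    """Crude stand-in for a tokenizer: count word runs and single symbols.

    Real tokenizers split text differently, so treat the result as a
    relative signal, not an actual token count.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


# Assumed markdown form of the heading quoted above.
markdown_heading = "## About Us"
html_heading = '<h2 class="section-title" id="about">About Us</h2>'

md_pieces = rough_piece_count(markdown_heading)
html_pieces = rough_piece_count(html_heading)

print(f"markdown: {md_pieces} pieces, HTML: {html_pieces} pieces")
```

Even with this blunt measure, the HTML version carries several times more pieces than the markdown one, and almost all of the extra is markup rather than content.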
Wait, what?
Think of it like this: Feeding an LLM raw HTML is like giving a chef your entire kitchen – the utensils, the fridge, the sink and every random ingredient – when all they really need is the recipe. Markdown is like handing them the recipe: structured, essential and easy to follow. It keeps things like headings, lists, links and tables, all clearly structured and easy for AI to parse.
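For the curious, the "kitchen versus recipe" idea can be sketched in a few lines of Python. This is a toy converter built on the standard library's `html.parser`, not how Cloudflare or any other vendor does it. It handles only flat headings, paragraphs and list items (no links, tables or inline formatting), and the class name `MarkdownSketch` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

# Elements whose contents carry no meaning for a reader of the text.
SKIPPED = {"script", "style", "nav", "footer"}
HEADINGS = {"h1": "#", "h2": "##", "h3": "###"}


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-markdown converter for flat headings, paragraphs and lists."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.prefix = None    # markdown prefix for the current block, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.prefix = HEADINGS[tag] + " "
        elif tag == "li":
            self.prefix = "- "
        elif tag == "p":
            self.prefix = ""

    def handle_endtag(self, tag):
        if tag in SKIPPED and self.skip_depth:
            self.skip_depth -= 1
        elif tag in HEADINGS or tag in ("li", "p"):
            self.prefix = None

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when it sits in a known block outside skipped elements.
        if text and not self.skip_depth and self.prefix is not None:
            self.lines.append(self.prefix + text)


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


page = """
<html><head><style>h2 { color: red; }</style></head>
<body>
<nav><a href="/">Home</a></nav>
<div class="wrapper">
<h2 class="section-title" id="about">About Us</h2>
<p>We publish recipes.</p>
<ul><li>Soups</li><li>Stews</li></ul>
</div>
<footer>Copyright notice</footer>
<script>trackVisit();</script>
</body></html>
"""

print(html_to_markdown(page))
```

The styling, navigation, footer and script all disappear; what is left is the heading, the sentence and the two list items.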
So how does it work?
When an agent requests content from a page, it can include a specific ‘header’ in its request that effectively says ‘we’d prefer if you sent us the text only, not the entire HTML,’ said Allen. In Cloudflare’s version, if a website owner selects the Markdown for Agents feature, their HTML will automatically be converted into markdown – for text content, not images or video.
But many companies support markdown for agents; it’s not a proprietary product.
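Here is a minimal sketch of what that request could look like. The article doesn't name the header, so this assumes the preference travels in the standard HTTP `Accept` header using the `text/markdown` media type. The URL and both function names are placeholders. Nothing is sent over the network; the code only builds the request and shows how a server might read it.

```python
from urllib.request import Request

MARKDOWN_TYPE = "text/markdown"


def build_agent_request(url: str) -> Request:
    """Build (but do not send) a request asking for markdown, with HTML as fallback."""
    return Request(url, headers={"Accept": f"{MARKDOWN_TYPE}, text/html;q=0.8"})


def accepts_markdown(accept_header: str) -> bool:
    """Server-side check: is text/markdown listed anywhere in the Accept header?"""
    media_types = [
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    ]
    return MARKDOWN_TYPE in media_types


# Placeholder URL; pass the request to urllib.request.urlopen to actually send it.
request = build_agent_request("https://example.com/article")
accept = request.get_header("Accept")

print(accept)
print(accepts_markdown(accept))
print(accepts_markdown("text/html,application/xhtml+xml"))
```

A regular browser request, like the last example, never mentions markdown, so human visitors would keep getting the normal HTML page.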
How is this good for AI companies?
In short: less waste. Crawling gazillions of sites is (surprise!) not actually that efficient for AI crawlers. Rather than crawling every single piece of information on the open web, they get a shortcut to the “goods” needed to fulfil a prompt query. That wastes less computation, which lowers processing complexity and costs for the AI models.
Tokens…remind me?
They’re the chunks of text that AI models like LLMs process. Chunks of words, spaces, symbols and punctuation all count as tokens. There are input tokens, which are the prompt instructions a user sends to the AI chatbot, and output tokens, which are what the AI generates. The more tokens needed, the slower the responses and the higher the cost (for the LLMs), because extra tokens drive up compute costs without improving the results.
Markdown sounds a lot like LLMs.txt…
They’re very similar. LLMs.txt is a specific type of markdown file that sits at the root of a website (digiday.com/llms.txt, for example) to help AI models understand the site’s content.
There has been hockey stick-like growth (1,835%) in the number of sites using LLMs.txt since last June, according to visual website experience platform Webflow. And over 20% of enterprise brands are experimenting with LLMs.txt in Webflow, per the company.
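As a rough illustration of what such a file might contain, the sketch below generates one. The layout (a title, a one-line summary, then sections of annotated links) follows our reading of the public llms.txt proposal rather than anything described in this article, and the site name, URLs and function name are invented.

```python
def build_llms_txt(site_name, summary, sections):
    """Assemble an llms.txt-style markdown file.

    `sections` maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Entirely made-up site and links, for illustration only.
llms_txt = build_llms_txt(
    "Example News",
    "Independent coverage of media and marketing.",
    {
        "Key pages": [
            ("About", "https://example.com/about.md", "Who we are"),
            ("Archive", "https://example.com/archive.md", "All published stories"),
        ],
    },
)

print(llms_txt)
```

The output is a short, link-heavy markdown file: closer to a table of contents for machines than to a page meant for readers.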
“You need to think now about how you design and build your site both for the human audience and for a bot or LLM,” said Webflow CEO Linda Tong.
So does this mean publishers get paid?
If only! Not in terms of payment, no. But there are some upsides. For example, AI answer engines are still littered with errors, whether you’re a news publisher or a brand with products to sell. And if your brand is associated with false information, it can result in a loss of consumer trust or a loss in product sales.
“The clarity element from AEO [answer engine optimization – also known as GEO] is that it wants to really understand facts, and it really wants to be able to pull information out,” said Tong. “And so if the way that you structure those paragraphs, or you structure content on your site, isn’t easily understood by an LLM, it starts to hallucinate and it misrepresents you,” she said.
But errors can also be made simply because an LLM or any AI model can’t understand the nuance of intricately human-written prose.
Take a beautifully written article that’s laden with metaphors and carries a build-up or theme across multiple paragraphs with smooth transitions – an enjoyable way for a reader to absorb it. An LLM will process that text chunk by chunk, often treating each paragraph as a separate unit. That means ideas that span multiple paragraphs can get lost or fragmented, because the model handles each block independently, noted Tong.
So this could reduce the number of errors that arise in AI answer engines?
In theory, yes. That could be helpful for brands that want their products to appear with the correct information and context around them. And brands will want their products surfaced within answer engines – their business models perhaps aren’t under as much direct threat as those of publishers reliant on referral traffic and digital ad revenue.
“I think for a business that doesn’t make money from ads these sorts of things are great,” said Paul Bannister, chief strategy officer at Raptive. “For ad-supported businesses, these are not very useful until there is also a payment model in place. But likely, these tools (like markdown for agents) are a necessary ingredient to get to a place where AI platforms do pay,” he said.
That’s what Cloudflare’s vp of strategic partnerships for media, creators and AI, Lara Cohen, says too. “Our ultimate goal here is to create a flywheel where there’s benefit back to the publishers and content owners and to the LLMs, and keep a healthy internet that has a lot of different LLMs who can access our content, and a lot of different content owners who are continuing to be able to flourish, even though, you know, traditional referral search has been dropping so dramatically,” she said.
The hope is that AI companies, by saving a ton of money on the inefficient computing costs of crawling everything, will free up funds.
As for whether those savings will be returned to publishers, well, you’ll be hard pushed to find a publisher exec who believes they will. But who knows? “If it’s cheaper to pay a publisher for the content than to pay a scraper company – which the AI companies pay tons of money to – then the AI companies will do it,” said Bannister.
So what happens if you’ve got bot blockers on?
You can keep them on, or block some bots and allow others, such as those you have an AI licensing deal with. Any bots you block will also be blocked from accessing the markdown.
“If you do have markdown, I would definitely route the allowed bots there,” said Justin Wohl, vp of strategy at Aditude and consultant for Salon.
Wohl said that how you use markdown will depend on your crawler strategy. If you’re trying to block all bots and await direct compensation for allowing them to crawl, then “don’t waste resources on markdown versions of your site right now, work on product for your human readers,” he said.
But if you are letting bots crawl your sites and hope to be rewarded in the form of search traffic or citations/links in generative AI outputs, then put a markdown version of content on your development roadmap, he added.
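The choices Wohl describes boil down to a small routing decision, sketched below. The bot names and the `route` function are hypothetical, and matching on the user-agent string is a simplification: user agents can be spoofed, so real bot management relies on more than this.

```python
from enum import Enum


class Action(Enum):
    BLOCK = "block"
    MARKDOWN = "markdown"
    HTML = "html"


# Hypothetical bot names; a real list depends on your own crawler strategy.
ALLOWED_BOTS = {"licensedbot"}
BLOCKED_BOTS = {"scraperbot"}


def route(user_agent: str, markdown_available: bool) -> Action:
    """Decide what to serve based on a (spoofable) user-agent string."""
    ua = user_agent.lower()
    if any(bot in ua for bot in BLOCKED_BOTS):
        # Blocked bots get nothing, including the markdown version.
        return Action.BLOCK
    if any(bot in ua for bot in ALLOWED_BOTS):
        # Allowed bots are routed to markdown when a markdown version exists.
        return Action.MARKDOWN if markdown_available else Action.HTML
    # Everyone else, including human visitors, gets the normal page.
    return Action.HTML


print(route("ScraperBot/2.0", markdown_available=True))
print(route("LicensedBot/1.0", markdown_available=True))
print(route("Mozilla/5.0", markdown_available=True))
```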