A post hit the Hacker News frontpage this week that sparked one of those rare discussions where everyone walks away thinking differently. The claim: the "small web" - personal blogs, indie projects, single-person SaaS tools, hobby sites - is not a niche curiosity. It is actually massive, and it is growing faster than the corporate web.
The data backing this up is interesting. Someone ran a large-scale crawl of the web and categorized sites by characteristics - single author, no tracking scripts, no cookie banners, minimal JavaScript, chronological content. The small web, by this definition, is not a handful of nostalgic bloggers. It is millions of active sites producing original content that Google's algorithm increasingly buries under SEO-optimized corporate content farms.
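As a rough sketch of how that kind of categorization might work in practice, here is a page-level heuristic. The tracker list, cookie-banner phrases, and script threshold below are illustrative assumptions, not the actual criteria the crawl used:

```python
import re

# Illustrative signals only -- any real classifier would use far larger
# tracker lists and more robust parsing than substring matching.
TRACKER_HOSTS = ("googletagmanager.com", "google-analytics.com", "facebook.net")
BANNER_HINTS = ("cookie consent", "accept all cookies", "we use cookies")

def looks_like_small_web(html: str, max_scripts: int = 5) -> bool:
    """Crude small-web check: no known trackers, no cookie banner, few scripts."""
    lowered = html.lower()
    if any(host in lowered for host in TRACKER_HOSTS):
        return False                        # tracking scripts present
    if any(hint in lowered for hint in BANNER_HINTS):
        return False                        # cookie banner present
    script_count = len(re.findall(r"<script\b", lowered))
    return script_count <= max_scripts      # minimal JavaScript
```

Signals like single authorship or chronological content need more than one page to detect, but even this page-level filter separates a quiet personal blog from a tag-manager-laden content farm.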
Why This Matters for AI and Automation
If you are building AI agents that consume web content, the small web is probably the most valuable data source you are ignoring. Corporate content is optimized for search engines, not for information density. It is padded, keyword-stuffed, and structured around advertising. Small web content is written by people who actually know things, for an audience that actually wants to learn.
The training data problem compounds this. Most AI models are trained predominantly on corporate web content because that is what crawlers find most easily. The small web is harder to crawl (diverse hosting, no sitemaps, non-standard structures) but richer in genuine expertise. There is a real opportunity for AI systems that specifically target small web content for retrieval and training.
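One concrete workaround for the no-sitemaps problem: most small sites still expose an RSS or Atom feed via standard autodiscovery links in the page head. A minimal finder for those links, using only the standard library (the class name and helper are mine, for illustration):

```python
from html.parser import HTMLParser

class FeedFinder(HTMLParser):
    """Collects RSS/Atom autodiscovery hrefs from a page's HTML."""
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in self.FEED_TYPES and a.get("href")):
            self.feeds.append(a["href"])

def find_feeds(html: str) -> list:
    """Return feed URLs advertised by a page, in document order."""
    parser = FeedFinder()
    parser.feed(html)
    return parser.feeds
```

A crawler that follows feeds instead of sitemaps fits the small web's shape: feeds are chronological, author-centric, and present on most blogging platforms by default.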
The Discovery Problem
The biggest challenge with the small web is finding things. Google used to be good at this - early Google surfaced small, high-quality sites remarkably well. Modern Google buries them under corporate results, ads, and AI-generated summaries. The HN discussion spawned several alternative approaches:
Curated directories. People are literally going back to Yahoo-style human-curated link directories. It sounds retro, but it works. A thoughtful human categorizing 10,000 small web sites produces a better discovery experience than an algorithm processing 10 billion corporate pages.
Webring revivalism. Webrings - the 1990s concept where related sites link to each other in a ring - are making a comeback. Several modern webring platforms launched in the past year, and they are growing fast. The social graph of small web sites linking to each other creates a natural discovery mechanism.
Small web search engines. Dedicated search engines that only index the small web. They use signals like site size, author count, tracking presence, and content originality to filter out corporate content. The results are refreshingly useful for technical topics.
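To make the filtering idea concrete, here is one way such signals could be combined into a ranking score. The weights and thresholds are assumptions for illustration, not the formula of any actual small web search engine:

```python
def small_web_score(page_count: int, author_count: int,
                    has_tracking: bool, originality: float) -> float:
    """Higher scores favor small, original, tracker-free sites.

    originality is assumed to be a 0..1 estimate of non-duplicated content.
    All weights below are illustrative guesses.
    """
    size_penalty = min(page_count / 10_000, 1.0)    # big sites score lower
    author_penalty = 0.0 if author_count <= 2 else 0.5
    tracking_penalty = 0.7 if has_tracking else 0.0
    score = originality - size_penalty - author_penalty - tracking_penalty
    return max(score, 0.0)
```

The key point is not the exact weights but the inversion: signals that big-web ranking treats as authority (site size, publishing volume) become penalties.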
What the Agent Community Should Do
The agent builder community is particularly well-positioned to benefit from the small web. Here is why:
Most agent builders write about their work. They publish blog posts, maintain documentation, share configurations, write tutorials. This content lives on personal sites, small hosting platforms, GitHub Pages, and niche blogs. It is exactly the kind of high-signal, practitioner-written content that agents should be trained on and retrieving from.
But our tools are optimized for the big web. When an agent does a web search, it gets corporate results. When it does RAG over web content, the embeddings are dominated by SEO-optimized text. We need better tooling for targeting small web content specifically.
A practical step: build a curated index of small web sites in your domain. For AI agents, that might be 500-1000 personal blogs, indie project sites, and practitioner-written resources. Use this as a preferred retrieval source before falling back to general web search. The quality difference is noticeable.
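The curated-first retrieval step might look something like this. The index format, match logic, and fallback hook are all sketch-level assumptions; a real version would use embeddings over fetched content rather than keyword matching over summaries:

```python
# Hypothetical curated index: URL -> short summary of what the site covers.
CURATED_INDEX = {
    "https://example-practitioner.blog": "agent evaluation harness notes",
    "https://indie-tools.example": "prompt caching benchmarks",
}

def retrieve(query: str, web_search=None) -> list:
    """Return curated hits first; fall back to general search only if empty."""
    terms = query.lower().split()
    hits = [url for url, summary in CURATED_INDEX.items()
            if any(term in summary for term in terms)]
    if hits:
        return hits
    return web_search(query) if web_search else []
```

Even this crude version captures the policy: the agent prefers a small, trusted, practitioner-written corpus and only widens to the big web when the curated index has nothing.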
The Bigger Picture
The small web represents something important - humans creating for humans, without intermediaries optimizing for engagement or advertising. In an era of AI-generated content flooding the web, the small web is the last reliable source of genuine human expertise and opinion.
That the small web is thriving and growing - not just surviving as a niche curiosity - is one of the most encouraging things I have read all year. We should be building tools that support it, not tools that make it invisible.