There's a famous saying that every founder's first company is consumer. Every founder's second company is B2B.

Consumer startups are notoriously hard. You aren't selling software to enterprises that can afford to pay thousands of dollars every month for a B2B AI SaaS wrapper. You are trying to tap into fundamental human behaviors, productize them, and then capitalize on them. You are vying for attention in an age where the dominant players (Meta, TikTok) essentially own the entire market. A good friend once told me that if you want to succeed in consumer, you have to make people rich, make them famous, or get them laid. It takes a particular type of founder to even want to play in the space.

I've been building in consumer with Gabble for the last ~2 years, and for a while I wondered if I'd made a mistake. I don't wonder anymore. The shift happening in AI training, RL, and fine-tuning has quietly made consumer interaction data one of the most strategically valuable assets in tech.

Talk to any investor today and you'll find it's very hard to convince them to invest in a pure consumer play. The past decade of consumer "failures" has essentially made the category untouchable. For every TikTok, there are hundreds of failures. The best-known consumer startups of the 2010s and early 2020s are all "failures," and they are partly responsible for how radioactive the space has become.

Clubhouse, despite still being operational, overraised, couldn't justify its valuation, and faded as the pandemic eased. BeReal, despite getting acquired for roughly €500 million (about $540 million) and arguably being the best-known consumer product in years, eventually stagnated and was forced to sell. Houseparty, despite reaching millions of DAUs, couldn't sustain itself and now sits somewhere in Fortnite's tech stack.

These companies are only "failures" in the eyes of investors whose sole mission is to return a fund. To everyone else, they should be considered massive successes.

HQ Trivia, Yik Yak, Path. I could go on for hours. Each of these companies achieved some success and tapped into mainstream culture and consumer habits, but then died. They raised huge amounts of venture dollars, were hyped by investors, and were well-positioned to take over, but couldn't figure out how to sustain their momentum. Naturally, a pattern emerged, and it became incredibly hard for GPs to convince LPs that it was worth deploying capital into consumer. As a result, there hasn't been a truly generational consumer company in years.

The data gap nobody's talking about

Since 2023, things have shifted. The AI boom changed everything. Building is easier. Shipping is easier. Productivity is through the roof. The problem now, and this is the part nobody is talking about out loud yet, is that the frontier labs are running out of the one thing that actually makes models good: human data.

The next frontier model advantage won't be compute or architecture; it will be proprietary human interaction data, and the companies sitting on it are consumer startups that most VCs wrote off.

The open web has largely been ingested. Common Crawl, Wikipedia, GitHub, and Reddit have all been consumed. The labs are in licensing wars with publishers, fighting legal battles with content platforms, and increasingly turning to synthetic data to fill the gap. However, synthetic data is not the solution they're hoping for. Models trained heavily on model-generated output can start to collapse. They reinforce their own patterns, lose variance, and gradually become less capable of generalizing to the messy, unpredictable texture of actual human behavior. Organic data, generated by humans, reigns supreme.

Mercor was the clearest early proof of this. In its early days, it was a hiring platform. Underneath, it produced one of the most valuable structured datasets of human technical reasoning ever assembled — tens of thousands of interviews, problem-solving sessions, and evaluation outcomes. They figured out that the data their product naturally generated was worth selling to labs, and made a fortune doing so.

The labs can't do this themselves

The obvious objection: doesn't this just consolidate around the labs? If ChatGPT and Claude are becoming default interfaces for everything, surely they capture all the interaction data worth having? Not quite. The data the labs generate from their own products is generic by design: question, answer, task, completion. It feeds back into post-training, but only the shallow end of it. What the labs actually need is differentiated, high-context, behaviorally specific data — the kind that only gets generated when real people are genuinely engaged with something that isn't just a chatbot.

Cooking apps. Language learning. Dating. Fitness. Gaming. Anywhere humans are making decisions, expressing preferences, reasoning through problems, or changing their minds, there is a post-training signal. That data doesn't live inside a ChatGPT conversation log. It lives inside the products people actually use. Differentiated consumer apps aren't competing with the labs. They're producing what the labs cannot generate for themselves.

What comes next

We're about to see something interesting. A proliferation of consumer apps — hundreds, maybe thousands — all quietly positioning their data as an asset. Many will pitch themselves as AI training data plays and, frankly, will be able to justify doing so. More interestingly, retention may not be the only path to a valuable dataset. A viral app with terrible long-term retention can still produce enormously useful data if the interactions themselves are rich and the volume is there.

Houseparty ran from 2016 to 2021, during which millions of people had genuine, emotionally charged social conversations. It died, but the data didn't. Churn doesn't kill data value. Shallowness does.

Given all this, it's likely that Meta will win the AI wars. Meta has been assembling the most comprehensive picture of human social behavior ever constructed, spanning Facebook, Instagram, WhatsApp, and Threads. Billions of users. Two decades of interaction data. Genuine behavioral diversity across cultures, languages, and age groups. The frontier labs are racing to catch up on data that Meta has been accumulating since 2004.

Consumer is still hard. You still have to go viral and build a product people love. That hasn't changed. But something else has. The labs are already spending heavily to acquire data they can't generate themselves. Reddit's data licensing deal with Google is reportedly worth about $60 million per year. That's just one buyer. The demand is real, and the money is there.

The most valuable consumer platforms of the next decade won't just be media companies or ad businesses. They'll be data vendors. What matters is finding a unique, defensible way to generate human interaction data at scale. The more specific and structured, the more valuable.

At Gabble, we're building exactly that — a live debate platform where humans argue against agents and each other in real time. Every debate generates something the labs can't get anywhere else. We've hosted thousands of them. If you're a founder or investor thinking about this space: kit@gabble.world

The smart money missed consumer the first time. It won't miss it twice.