Data · AI · Economics

The Next 5 Trillion-Dollar Frontier

What if the next massive wealth creator isn't in energy or tech hardware, but in the everyday data we all generate? Global GDP is ~$105T annually. A first-principles breakdown of a $5T market in 10 years.

Zion Darko
February 2, 2026
10 min read
"Mansa Musa — The Richest Man That Ever Lived"

Global GDP is ~$105T annually. To assume that a single market will soon be worth nearly 5% of that is preposterous, until you reason from first principles. The following is that first-principles breakdown of a $5T market in 10 years.

The Three Pillars of AI: Compute, Architecture, and Data

In the age of AI, we have three pillars: Compute (everyone has the same GPUs), Architecture (everyone has access to the same models), and Data. Everything else is noise.

NVIDIA, TSMC, ASML, etc. have dominated compute either through clear technological dominance or by betting on a fundamental compute shift before it was obvious.

Currently, architecture companies are the fastest-growing companies of our age (OpenAI, Cursor, xAI, Anthropic, etc.). These companies have pioneered foundational and applied architectures to achieve massive consumer and developer dominance over the entire AI field. As the saying goes: "Sell shovels in a gold rush."

Compute and base architectures are rapidly commoditizing. Like most oil reserves, the capital density here is drying up, or at least consolidating among a few players, especially in the compute space.

Data: The Last Untapped Reserve

The last reserve is Data. Billions have been spent on pre-training, fine-tuning, RL, and now human data.

The reserves we've exhausted:

  • Massive public web
  • Archived data
  • Public-private partnerships
  • Human-labeled data
  • Video data and more

The typical data sources have saturated. How much more data is left, and how many people are trying to capture it? This increased demand has led to a spike in data-acquisition capex, with companies like Scale AI in labelling and Mercor, Surge, and Micro1 in human-labor arbitrage.

Note: It's currently neither easy nor feasible for the core labs to collect this final data themselves. They would spread themselves thin, and given their consumer angle, they see ecosystem lock-in, or leveraging the services of the data acquisition companies, as an equally viable strategy.

The data economy, spanning both traditional and AI data, is currently worth hundreds of billions [1].

With this, we come to the gold mine. If we scale the existing data economies, the market revealed next is worth $5 trillion.

User Data

Realistic estimate: if you live in the West, you produce 5–20 GB of data per day, spiking higher with video uploads and location tracking. This is Passive Background Data.

This is from:

  • Location services — GPS pings, travel routes, places visited
  • Messages — texts, emails, and voice notes across platforms
  • Social media — posts, likes, comments, shares, and follows
  • Apps & games — usage patterns, preferences, and in-app behavior
  • Entertainment — watch history, music taste, reading habits
  • What you say, see, and do — voice assistants, camera rolls, search queries
  • In-person interactions via wearables — health metrics, movement, biometrics, and more

Most of this data is platform-locked and disjointed (links and value can only be drawn once it's combined), and therefore currently of little commercial value. Value only emerges after aggregation (how do my eating habits affect my shopping routine?), identity mapping (this new user has this identity and these attributes), and behavior modelling (measuring the effects of changes to reach desired user behaviors).
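As a toy illustration of why value only emerges after combination, here is a minimal sketch of identity mapping plus aggregation. All records, IDs, and field names are hypothetical:

```python
# Hypothetical per-platform records: each platform sees only its own slice.
grocery = {"user_a1": {"late_night_snacks": 14}}        # shopping app
fitness = {"wearable_77": {"avg_sleep_hours": 5.2}}     # wearable

# Identity mapping: link platform-local IDs to one consented identity.
identity_map = {"user_a1": "alice", "wearable_77": "alice"}

def aggregate(*sources):
    """Merge platform-locked records under a single mapped identity."""
    profiles = {}
    for source in sources:
        for local_id, attrs in source.items():
            person = identity_map[local_id]
            profiles.setdefault(person, {}).update(attrs)
    return profiles

profiles = aggregate(grocery, fitness)
# Only the combined profile can relate eating habits to sleep patterns.
print(profiles["alice"])
```

Neither source alone can answer a cross-domain question; the merged profile can, which is exactly where the commercial value appears.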

In the US, we spend nearly half our waking hours on our devices [2], and this is only increasing (teens spend more than half). This means we will (assuming birth rates don't drop significantly) have an increasing net amount of user data every year. And the more systems and interfaces work in parallel (multiple wearables or apps/screens), the more multiplicative this becomes.

Also consider that wearables haven't yet come close to capturing all of our physical data.

In a grim comparison, humans were used as thermal energy sources (batteries) in The Matrix; in the same way, we are all batteries of large volumes of data that can ethically power our environment.

A New Hero

If the opportunity were purely technical, incumbents would already dominate it. The constraint is neither compute nor models, but fragmented user data, the incentivization of user behavior, and regulation.

There are currently zero players solving this for user data at scale.

No major platform can aggregate cross-app behavioral data without triggering regulatory pressure or user backlash.

They lack one or more of the following:

  • User trust
  • User-aligned permission
  • Cross-platform visibility
  • Solution to user fatigue

Lacking user trust is obvious, but what about the rest?

A shift in user sentiment and data privacy laws (19 states in the US as of 2026 have enacted comprehensive consumer privacy statutes [3][4]) means users are now in a better position than ever to demand and use products with open data or aggregating data engines.

  • Portability rights give rise to new aggregation layers
  • Opt-out gives rise to platform fragmentation and value-focused business models
  • Consent requirements give rise to user fatigue

Google doesn't know what you do on Hinge, nor does Instagram know your ChatGPT activity.

And finally, no one wants to repeat themselves, whether that's typing, sign-ins, or context.

This creates space for a simple user-aligned aggregation layer that operates with explicit permission, portability, and local-first trust.

*These laws give you access, delete, and opt-out control of your digital data. These are the foundations of what I would call data ownership. Currently known as consumer privacy rights.

Why 5T?

On average, if you live in the West, you produce 5–20 GB of real personal data every day [5]: your messages, location pings, social posts, voice clips, camera snaps, app traces, and wearables catching your moves.

Scale that: There are roughly 1 billion people in the West (US, EU, UK, Canada, Australia, etc.). Even at the low end (say 5 GB/day average), that's ~1,825 exabytes per year.
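A quick back-of-the-envelope check of that volume figure, using the article's low-end assumptions (1 EB = 10^9 GB):

```python
people = 1_000_000_000      # rough Western population (assumption from the text)
gb_per_day = 5              # low-end daily data production per person
exabytes_per_year = people * gb_per_day * 365 / 1e9   # 1 EB = 1e9 GB
print(exabytes_per_year)    # -> 1825.0
```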

The value of user data today (which is curated, premium, aggregated, and cleaned) is ~$100 per TB*. Consider also the historical data still being stored by platforms on their users, which goes as far back as 10 years. However, we apply a 50% discount to historical data because it's less valuable — behaviors change, privacy rules require purging stale data, and AI/ad systems prioritize recent signals.

So: every person producing 5 GB of data per day, valued at $100/TB, over the next 10 years, plus 10 years of historical data at 50% value. That comes to a low end of ~$2.7 trillion over 10 years.

Personal data volume is growing rapidly, ~15–20% per year [6]. This growth is driven by the proliferation of connected devices, sensors, video services, and wearables. If we then apply a 15% CAGR over 10 years, this amounts to a conservative value of $5.25T in 10 years.

*This is derived from dividing annual per-user economic value (low end of $182, based on Meta US/Canada ARPU ~$217, Google US ~$460, and Proton 2025 US estimate ≥$700 total extraction) by annual data volume per user (~1.825 TB at 5 GB/day) [7]. ARPU from social/search companies is the widely accepted proxy for the economic value of user data via ads/insights (the primary monetization model). Direct per-TB sales are rare.
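As a sanity check, the arithmetic above can be reproduced in a few lines. This is a sketch using the article's assumptions; the exact growth-adjusted total depends on when the 15% growth is assumed to start:

```python
PRICE_PER_TB = 100                    # curated-data value proxy (see footnote)
PEOPLE = 1e9                          # rough Western population
TB_PER_PERSON_YEAR = 5 * 365 / 1000   # ~1.825 TB/yr at 5 GB/day
FUTURE_YEARS, HIST_YEARS = 10, 10
HIST_DISCOUNT = 0.5                   # historical data valued at 50%
CAGR = 0.15                           # assumed annual data-volume growth

annual_value = TB_PER_PERSON_YEAR * PRICE_PER_TB * PEOPLE   # ~$182.5B/yr
historical = annual_value * HIST_YEARS * HIST_DISCOUNT      # ~$0.91T
flat_total = annual_value * FUTURE_YEARS + historical       # ~$2.74T low end

# Apply 15% annual growth, compounding from year one:
grown = sum(annual_value * (1 + CAGR) ** t for t in range(1, FUTURE_YEARS + 1))
grown_total = grown + historical                            # ~$5.2T
print(round(flat_total / 1e12, 2), round(grown_total / 1e12, 2))
```

Flat volume yields ~$2.74T, matching the $2.7T low end; with 15% growth compounding from year one, the total lands near $5.2T, in the ballpark of the $5.25T figure.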

Incentivizing

Incentivizing the individual is key here. It's clear the only way to do this is to appeal to personal value beyond money.

The problem with pay-for-data business models is that they come off as transactional to consumers. Users then feel they are giving up something too valuable. It's also reminiscent of crypto companies (and some of the "pay for your data" companies are exactly that).

This is not how you get people to hand over their data. Even Google doesn't do this, and they've amassed the largest volume of user data of any single entity thus far.

The small payouts are only significant enough at scale in third world economies, in the same way we outsource labor to third world countries because it's economically attractive there. No pay-for-your-data company has thus far taken off in the West.

Note: Companies like Mercor or Micro1 work on a different plane here, which is arbitrage of human skills whilst capturing data, not recording or accumulating passive user data.

The solution? Make people realize that the utilization of their data can lead to better health, relationships, saving time and money, and more. This is done by personalization.

The exchange is therefore not money-for-data. It is user value-for-context.

Like ChatGPT Health, Gemini Personal Intelligence, Poke, Ash AI, etc. — the more these systems know about you, the better they can serve you. A user can instantly feel a huge benefit. And people only care about what you can do for them.

Personalization also requires trust. Many overlook this since, historically, people have tolerated their lack of trust in big platforms. But what has passed won't necessarily continue; Balaji recently pointed this out. As the cost of storage and compute goes down, we can offer increasing security parity with big tech. This takes the form of on-device/private compute, portability, verifiable services, and more, meaning we can offer personalization and value with the same guarantees people are used to. More on this in separate writing.

Data as Personal Currency: Beyond Consumer Apps

Immediate use cases are personalized assistants, apps, and experiences (the GTM for Onairos). It's expected the general public will be exposed to AI via these channels; however, that's just the surface.

Imagine:

  • What if your commute data was used daily to end traffic? — Real-time route optimization, smart traffic lights, reduced emissions city-wide
  • What if every grocery run you ever made optimized the city's waste system? — No more overflowing bins, predictive restocking, zero-waste supply chains
  • Robot assistants that can match and adapt to your mood and personality — companions that learn your communication style, energy levels, and preferences over time
  • What if your anonymized opinions weren't guessed by Cambridge Analytica 2.0 — but actually aggregated to show what the country really thinks?

Data as currency isn't just about cash — it's about leverage. Leverage over your life, your city, and your government. And the kicker: you decide who uses it, how much, and for what.

Mastering this tone and ethos is very important in order to win over the people, who are the owners of this data market.

The Bottom-Up Problem

Many will then ask: what's stopping Meta or OpenAI from adding this? The obstacle is a fundamental user-fatigue variable, where people:

  • Don't want to repeatedly consent and give permissions
  • Don't trust big tech
  • But want the amazing user experience

The problem is bottom-up, not top-down, and once solved, will result in a >$5T market.

Onairos exists to become this user-aligned aggregation layer and the multitudes of advancements that come after that. Our vision for capturing this market is clear, not because of the market size, but because the market size signifies a greater impact for our users and partner companies.


References

[1] Grand View Research. "Data Broker Market Size, Share & Trends Analysis Report" (2025). grandviewresearch.com

[2] DataReportal / Exploding Topics. "Screen Time Statistics 2025." datareportal.com

[3] IAPP. "US State Privacy Legislation Tracker" (February 2026 update). iapp.org

[4] Bloomberg Law. "State Privacy Legislation Tracker" (2026). bloomberglaw.com

[5] DataReportal. "Digital 2025: Global Overview Report." datareportal.com

[6] Proton. "What's your data really worth? (2025 update)." proton.me

[7] Statista. "Volume of data created worldwide 2010–2029" (2026 update). statista.com

Author

Zion Darko

Founder & CEO

Inventor and Dreamer and CEO.