Siddharth Ramakrishnan

DePIN 🤝 AI

November 19, 2024

The AI boom has pushed datacenter infrastructure into the spotlight. Over the past five years, GPUs have become 50x faster, and the rest of the supporting datacenter infrastructure—networking, storage, and memory—has also improved by 4–5x. Companies continue to race to optimize metrics like MFU (Model FLOPS Utilization) and SFU (System FLOPS Efficiency), all in the hopes of capturing the ultimate prize of AGI (read as: becoming the preferred AI provider for most enterprise workloads). The question for the crypto community is: can DePIN assist the AI ecosystem in this AGI moonshot?
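
For readers who haven't run into it, MFU is just the ratio of the FLOPs a training run actually achieves to what the hardware could theoretically deliver. Here's a quick sketch of the usual back-of-envelope version; the 6-FLOPs-per-parameter-per-token approximation is the common one for transformer training, and every concrete number below is an illustrative assumption, not a measurement:

```python
# Rough sketch of Model FLOPs Utilization (MFU) for transformer training.
# Uses the common ~6 * N FLOPs-per-token approximation for a model with N parameters;
# all concrete numbers below are illustrative assumptions, not measurements.

def mfu(params: float, tokens_per_sec: float, num_gpus: int, peak_flops_per_gpu: float) -> float:
    """Achieved training FLOPs divided by theoretical peak FLOPs."""
    achieved = 6 * params * tokens_per_sec   # approx. FLOPs/s the run is actually doing
    peak = num_gpus * peak_flops_per_gpu     # what the hardware could do in theory
    return achieved / peak

if __name__ == "__main__":
    # Hypothetical 70B-parameter run on 1,024 GPUs rated at ~1e15 FLOP/s each.
    print(f"MFU ≈ {mfu(params=70e9, tokens_per_sec=400_000, num_gpus=1024, peak_flops_per_gpu=1e15):.1%}")
```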

DePIN offers an alternative to traditional centralized datacenter models by tapping into existing, distributed hardware and resources. There's a lot of attention around decentralized training, but the immediate opportunities may lie elsewhere—particularly in inference and data. Let’s break down the current state of the AI infrastructure landscape and explore where DePIN can add the most value today.

Training vs. Inference

Decentralized training is an exciting concept being explored by teams like Nous and Prime Intellect. While these efforts hold promise for the future, colocated GPUs still reign supreme for AI training today. Hyperscalers are continuing to build large, centralized datacenters rather than smaller, distributed ones, and for good reason: the networking demands of training AI models favor proximity and tight integration. Until decentralized training overcomes its significant networking bottlenecks, it's difficult to see how DePIN could meaningfully add compute capacity for leading-edge LLM training in the near term (though it could still help with simpler ML and deep-learning training workloads).
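
To see why proximity matters so much, consider how much gradient traffic plain data-parallel training generates every single step. The sketch below uses the rough ring all-reduce approximation, and the model size and link speeds are entirely illustrative assumptions:

```python
# Back-of-envelope: time to synchronize gradients for one optimizer step over different links.
# Uses the rough ring all-reduce approximation (~2x the gradient volume moved per GPU);
# the model size and link speeds below are illustrative assumptions.

def sync_seconds(model_params: float, bytes_per_param: float, link_gb_per_sec: float) -> float:
    """Approximate seconds to move ~2x the gradient volume over the given link."""
    traffic_bytes = 2 * model_params * bytes_per_param
    return traffic_bytes / (link_gb_per_sec * 1e9)

if __name__ == "__main__":
    params = 70e9  # hypothetical 70B-parameter model, fp16 gradients (2 bytes each)
    for label, gb_per_sec in [("NVLink-class intra-node link", 900), ("1 Gbps home broadband", 0.125)]:
        secs = sync_seconds(params, 2, gb_per_sec)
        print(f"{label}: ~{secs:,.1f} s of gradient traffic per step")
```

Even granting generous overlap of communication with compute, a bandwidth gap of several orders of magnitude between a datacenter fabric and home broadband is hard to engineer around.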

However, inference is a different story. Inference doesn't require the complex GPU interconnects that training does: a GPU only needs to run a pretrained model and return the result, so the hardware setup is much simpler. Companies like Kuzco, with their platform Inference.net, are demonstrating how DePIN can create value here. By incentivizing people to run inference tasks on idle local machines (the network has a lot of MacBooks online), they're unlocking unused compute power.
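
I won't pretend to reproduce Kuzco's actual protocol here, but the general shape of a DePIN inference worker is easy to sketch: poll a coordinator for a job, run it against a locally cached model, and post the result back. The coordinator URL, job schema, and the run_local_model stub below are all hypothetical:

```python
# Hypothetical sketch of a DePIN inference worker: poll a coordinator, run the job locally, return the result.
# The coordinator URL, job schema, and run_local_model() are stand-ins, not any project's real API.
import time
import requests

COORDINATOR = "https://coordinator.example.com"  # hypothetical endpoint

def run_local_model(prompt: str) -> str:
    # Placeholder for a real local inference backend (e.g. llama.cpp or an ONNX runtime).
    return f"[completion for: {prompt[:40]}]"

def worker_loop(worker_id: str) -> None:
    while True:
        resp = requests.get(f"{COORDINATOR}/jobs/next", params={"worker": worker_id}, timeout=30)
        if resp.status_code == 204:   # no work available right now, back off briefly
            time.sleep(5)
            continue
        job = resp.json()
        output = run_local_model(job["prompt"])
        requests.post(f"{COORDINATOR}/jobs/{job['id']}/result",
                      json={"worker": worker_id, "output": output}, timeout=30)

if __name__ == "__main__":
    worker_loop("worker-123")
```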

This is an area where DePIN has clear potential. As we’ve seen with newer cloud companies like Together and SF Compute, the demand for GPUs is still massive. While the cutting-edge chips are snapped up by hyperscalers, there’s a persistent need for GPUs in other segments. SF Compute, for instance, serves academics who are priced out of the H100 market, offering access to GPUs for experiments and smaller-scale workloads.

There’s also a sort of Jevons Paradox at play here: every incremental drop in GPU prices spurs new demand. If GPUs became cheaper still, we’d likely see a wave of innovation in non-LLM machine learning workloads, like classification or metadata tagging. Older GPUs running older models at lower costs could open the door to economic feasibility for tasks that are currently out of reach. (For example, see my earlier post on LLMs and classification.)
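
To make that concrete, here's a toy break-even calculation for a bulk classification or tagging workload. Every number in it (rental prices, throughput) is a made-up assumption, purely to show the shape of the math:

```python
# Toy economics: what does 1M classification/tagging tasks cost at different GPU prices?
# All numbers are illustrative assumptions, not market data.

def cost_per_million_tasks(gpu_hourly_usd: float, tasks_per_second: float) -> float:
    """Cost of running 1M tasks on one GPU at the given throughput."""
    seconds = 1_000_000 / tasks_per_second
    return gpu_hourly_usd * seconds / 3600

if __name__ == "__main__":
    for label, hourly in [("newer datacenter GPU", 2.50), ("older DePIN-sourced GPU", 0.30)]:
        cost = cost_per_million_tasks(gpu_hourly_usd=hourly, tasks_per_second=200)
        print(f"{label}: ~${cost:.2f} per 1M tasks")
```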

Another advantage of inference is its tolerance for hardware degradation. While training requires meticulous monitoring to avoid checkpoint rollbacks due to GPU errors, inference operates more like traditional compute. Failed requests can simply be rerouted to another GPU, making it a more forgiving and scalable use case for decentralized networks.
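
That "just reroute it" property is the whole trick. A minimal sketch of the failover pattern, with placeholder endpoint URLs:

```python
# Minimal failover pattern for decentralized inference: try one GPU endpoint, fall back to the next on failure.
# Endpoint URLs are hypothetical placeholders.
import requests

ENDPOINTS = [
    "https://gpu-node-1.example.com/v1/infer",
    "https://gpu-node-2.example.com/v1/infer",
    "https://gpu-node-3.example.com/v1/infer",
]

def infer_with_failover(prompt: str) -> str:
    last_error = None
    for url in ENDPOINTS:
        try:
            resp = requests.post(url, json={"prompt": prompt}, timeout=10)
            resp.raise_for_status()
            return resp.json()["output"]
        except requests.RequestException as err:
            last_error = err   # node is flaky or offline; just try the next one
    raise RuntimeError(f"all endpoints failed: {last_error}")
```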

Data: The Most Promising Opportunity

The conversation around AI scaling laws is heating up (e.g., Sam Altman’s tweet), and if the major labs are indeed hitting a wall, data becomes even more critical. Bespoke, domain-specific models trained on highly specialized datasets could be the next frontier. We’ve already seen this trend in models optimized for coding (e.g., Code Llama, Qwen 2.5 Coder) and math (e.g., Harmonic.fun).

On the "web2" side, companies like Scale AI and Labelbox dominate the data labeling market for large models. Meanwhile, DePIN projects like Grass are stepping up to provide data through methods like residential VPNs and web scraping. Grass is a strong start, but the AI ecosystem may soon need more than just larger volumes of scraped public data. As the field matures, the focus will likely shift toward specialized, high-quality datasets—for example, datasets curated by domain experts, such as Math PhDs crafting proofs for math-specific models.

DePIN networks have a natural edge here. Their decentralized and global nature allows them to source data at scale, often at lower costs, while also enabling contributions from diverse, specialized communities. DePIN could play a transformative role by building networks that source this kind of data at scale, reward the specialized communities who produce it, and verify the quality of what gets contributed.
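
One way to picture such a network is the record it keeps for each contribution: who supplied the data, what domain it covers, and how it was reviewed. The schema below is a hypothetical sketch with invented field names, not any project's actual design:

```python
# Hypothetical schema for an expert-sourced data contribution in a DePIN data network.
# Field names and the verification flow are invented for illustration.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DataContribution:
    contributor_id: str          # wallet or network identity of the contributor
    domain: str                  # e.g. "formal math proofs", "clinical notes", "Solidity code"
    payload_uri: str             # pointer to the actual dataset shard (IPFS, S3, etc.)
    license: str                 # usage terms attached by the contributor
    reviewer_ids: list[str] = field(default_factory=list)   # domain experts who verified it
    quality_score: float = 0.0   # aggregate of reviewer ratings, used to weight rewards
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Example: a math PhD submitting a batch of worked proofs for a math-specific model.
proof_batch = DataContribution(
    contributor_id="0xabc123",
    domain="formal math proofs",
    payload_uri="ipfs://bafy-example/proofs-batch-17",
    license="CC-BY-4.0",
)
```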

The Path Forward

While DePIN may not yet be poised to revolutionize AI training, its potential in inference and data is already becoming clear. By leveraging untapped resources—whether unused GPUs for inference or global contributors for data—DePIN can meaningfully lower costs and expand the ecosystem’s capacity.

The key will be identifying niches where decentralization provides a real advantage over centralized solutions. As AI continues to evolve, the value of accessible, high-quality infrastructure and data will only grow, and DePIN is uniquely positioned to fill that gap.