Mind the data gap: DeAI requires more diverse datasets | Opinion
![Mind the data gap: DeAI requires more diverse datasets | Opinion](https://crypto.news/app/uploads/2025/02/crypto-news-Mind-the-data-gap-option03-1380x820.webp)
Disclosure: The views and opinions expressed here belong solely to the author and do not represent the views and opinions of crypto.news’ editorial.
Artificial intelligence is all the rage. Yet beneath the hype surrounding decentralized AI (DeAI) lies a critical flaw: a dearth of diverse, secure, verifiable data. Onchain datasets are simply too limited to train truly powerful models. This risks ceding the AI future to centralized behemoths, which have unfettered access to the vast data troves of the web.
DeAI’s promise—democratized, transparent, and robust AI—hinges on bridging this data gap. Clever cryptography offers a route.
The beauty of conventional AI lies in its gluttony. The more data it devours, the smarter it becomes. But this advantage is also its Achilles’ heel. Centralized AI models are trained on data often harvested without explicit consent, raising thorny questions of privacy and control.
DeAI, built on blockchain’s principles of decentralization and transparency, offers an appealing alternative. Yet most onchain data comes from financial transactions and DeFi activity. Small language models in particular need precise data for fine-tuning. This leaves DeAI models starved of the rich and varied datasets needed to refine them to the competitive levels expected of the latest models.
Such datasets are available outside web3. The Pile compiles roughly 800 GB of text from 22 curated sources, and Common Crawl archives billions of web pages. The depth of existing verified web2 data sources, as much as the volume of data, is what has enabled centralized AI providers to refine their models as far and as fast as they have.
Recreating the same level of data onchain is not feasible on a competitive timescale. And while some AI firms have run afoul of data creators who accuse them of stealing exactly the type of nuanced data discussed here, there is another way to get more data onchain—make it safer.
Building bridges
This is where cryptography comes in. Zero-knowledge proofs, already making waves in blockchain scalability and privacy, offer a potent solution. Two techniques in particular—zero-knowledge fully homomorphic encryption (zkFHE) and zero-knowledge TLS (zkTLS)—hold the key to unlocking web2’s data for DeAI.
zkFHE combines two ideas. Fully homomorphic encryption allows computations to be performed on encrypted data without decrypting it, and zero-knowledge proofs let the results of those computations be verified. Imagine training an AI model on sensitive medical records without ever exposing the raw patient data. This is the power of zkFHE. It allows DeAI models to learn from large, privacy-protected datasets, greatly expanding their training possibilities.
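To make "computing on encrypted data" concrete, here is a minimal sketch using the Paillier cryptosystem. It rests on loud assumptions: Paillier is only additively homomorphic, so it is a far simpler cousin of FHE, it includes no zero-knowledge proof, and the toy key size below is much too small for real use. The function names are illustrative rather than taken from any library.

```python
import math
import secrets

# Toy Paillier cryptosystem: additively homomorphic only (not FHE) and
# without any zero-knowledge component. Illustration only.
P = 2**31 - 1  # Mersenne prime (toy size)
Q = 2**61 - 1  # Mersenne prime (toy size)


def keygen(p=P, q=Q):
    """Return the public modulus n and the private pair (lam, mu)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random blinding factor in [1, n)
        if math.gcd(r, n) == 1:
            break
    # With g = n + 1, g^m mod n^2 simplifies to 1 + m*n.
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Ciphertext of (m1 + m2), computed without decrypting either input."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Ciphertext of (k * m), computed without decrypting the input."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, priv = keygen()
    # Two hypothetical patient readings, encrypted before leaving the clinic.
    reading_a = encrypt(n, 120)
    reading_b = encrypt(n, 80)
    # An untrusted party sums them while seeing only ciphertexts.
    total = add_encrypted(n, reading_a, reading_b)
    print(decrypt(n, priv, total))
```

The party doing the arithmetic never sees the readings; only the key holder can decrypt the total. Fully homomorphic schemes extend this idea from sums to arbitrary computation, at a much higher computational cost.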
zkTLS brings a similar guarantee to internet communication. It allows users to prove that certain data came from a website, say a credit score or social media activity, without revealing the underlying information. This is crucial for integrating the wealth of data residing in web2’s silos into DeAI systems. For instance, a decentralized credit scoring model could leverage zkTLS to access authenticated financial data from traditional institutions without compromising its confidentiality.
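Real zkTLS protocols are considerably more involved than anything that fits in a few lines, since they must also tie the proof to an authenticated TLS session. The underlying idea of proving you hold something without disclosing it can still be sketched with a classic Schnorr proof of knowledge, made non-interactive with a hash. Everything here is an assumption for illustration: the toy group parameters, the function names, and the choice of Schnorr itself, which is not the zkTLS protocol.

```python
import hashlib
import secrets

# Toy Schnorr proof of knowledge (Fiat-Shamir variant). This is NOT zkTLS;
# it only shows how a prover can convince a verifier that it knows a secret
# without revealing the secret. Parameters are illustrative, not secure.
P = 2**127 - 1  # Mersenne prime used as the group modulus
G = 3           # base element, chosen for illustration
ORDER = P - 1   # exponents are reduced modulo the group order


def challenge(*values):
    """Derive the challenge by hashing the public transcript."""
    data = b"|".join(str(v).encode() for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % ORDER


def prove(secret):
    """Return the public value and a proof that the prover knows `secret`."""
    public = pow(G, secret, P)
    k = secrets.randbelow(ORDER)          # one-time random nonce
    commitment = pow(G, k, P)
    c = challenge(G, public, commitment)
    response = (k + c * secret) % ORDER   # the nonce masks the secret
    return public, (commitment, response)


def verify(public, proof):
    """Check the proof using public values only."""
    commitment, response = proof
    c = challenge(G, public, commitment)
    return pow(G, response, P) == (commitment * pow(public, c, P)) % P


if __name__ == "__main__":
    public, proof = prove(secret=123456789)
    print(verify(public, proof))
```

The verifier learns that the prover knows the secret behind `public` but never sees the secret itself. zkTLS applies the same kind of guarantee to statements about data received from a web server.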
Advantage, DeAI?
The implications are profound. By combining zkFHE and zkTLS, DeAI can tap into the vastness of web2’s data while preserving the core tenets of privacy and decentralization. This could level the playing field, allowing DeAI to compete with and perhaps even surpass centralized AI.
Consider the development of large language models, a field currently dominated by well-funded tech giants. These models require colossal amounts of text data for training. By leveraging zkTLS, DeAI developers could access and use publicly available web data in a privacy-preserving manner, creating more democratic and transparent LLMs.
There are, of course, challenges. Implementing zkFHE and zkTLS is computationally intensive, requiring significant advances in hardware and software. Standardization and interoperability are also crucial for widespread adoption. But the potential rewards are immense.
In the race for AI supremacy, data is the ultimate fuel. By embracing cryptographic solutions like zkFHE and zkTLS, DeAI can access the fuel it needs to perform. This is not just about building smarter AI; it’s about building a more democratic and equitable AI future.