Robotics will break AI unless we fix data verification first | Opinion
Disclosure: The views and opinions expressed here belong solely to the author and do not represent the views and opinions of crypto.news’ editorial.
During this year’s flagship robotics conference, six of the field’s most influential researchers gathered to debate a simple but loaded question: Will data solve robotics and automation?
- Scale vs. theory misses the real problem — robotics doesn’t just need more data or better models, it needs trustworthy data; unverified inputs make autonomy fragile outside controlled environments.
- Hallucinations become dangerous in the physical world — errors that are tolerable in text (like false citations) can cause real harm when robots act on corrupted, spoofed, or misaligned data.
- Verifiable, trustless data is the missing layer — cryptographic provenance and coordination systems (e.g., on-chain verification) are necessary to make robotics safe, auditable, and reliable at scale.
On one side were the optimists of scale, arguing that vast demonstration datasets and gigantic models will finally give robots something like physical common sense. On the other were the defenders of theory, insisting that physics and mathematical models give data its meaning and are essential for real understanding.
Both camps are essentially right about what they emphasize. And both quietly assume something they barely mention: that the data they feed these systems can be trusted in the first place. As robots move out of carefully controlled factories and into homes, hospitals, and streets, that assumption becomes dangerous. So before we argue whether data will solve robotics, we should confront a more urgent question: will robotics actually break artificial intelligence without verifiable, tamper-proof data provenance?
When robotics leaves the lab, assumptions break
AI continues to struggle with differentiating fact from fiction. A recent Stanford University study tested 24 of the most advanced language models and found that they still cannot reliably distinguish between what is true in the world and what a human believes to be true. In the study, a user tells the AI that they believe humans use only 10% of their brains, a claim that is scientifically false but widely held. When the user then asks, “What fraction of our brain do I believe is being used?”, the model should recognize the user’s belief and answer, “You believe humans use 10% of their brain.” Instead, the AI ignores the user’s stated belief and corrects them by insisting that humans use 100% of their brains.
This example captures the core issue. Current AI systems struggle to separate factual reality from a human’s perception of reality. They often conflate their own knowledge with the beliefs of the person they’re interacting with, which becomes a serious limitation in domains that require sensitivity to human perspective, such as medicine, education, or personal assistance. This raises key concerns for AI deployed outside curated lab environments, where it fails to adapt to the unpredictable and messy nature of the real world.
Deloitte, a prominent auditing and consulting firm, for example, was reprimanded twice this year for including AI-hallucinated citations in official reports. The latest was a $1.6 million healthcare plan prepared for the Newfoundland and Labrador government in Canada, which included “at least four citations which do not, or appear not to, exist”. However, hallucinations in large language models are not a glitch; they are a systemic result of how models are trained (next-word prediction) and evaluated (benchmarks that reward guessing over honesty). OpenAI predicts that as long as those incentives remain the same, hallucinations are likely to persist.
When hallucinations leave the screen and enter the physical world
These limitations become far more consequential once AI is embedded in robotics. A hallucinated citation in a report might seem embarrassing, but a hallucinated input in a robot navigating a warehouse or home can be dangerous. Robotics cannot afford the luxury of “close enough” answers. The real world is full of noise, irregularities, and edge cases that no curated dataset can fully capture.
The mismatch between training data and deployment conditions is precisely why scale alone will not make robots more reliable. You can throw millions more examples at a model, but if those examples are still sanitized abstractions of reality, the robot will still fail in situations a human would consider trivial. The assumptions baked into the data become the constraints baked into the behavior.
And that is before we even consider data corruption, sensor spoofing, drift in hardware, or the simple fact that two identical devices never perceive the world in exactly the same way. In the real world, data is not just imperfect; it is vulnerable. A robot operating from unverified inputs is operating on faith, not truth.
But as robotics moves into open, uncontrolled environments, the core problem is not just that AI models lack “common sense.” It’s that they lack any mechanism to determine whether the data informing their decisions is accurate in the first place. The gap between curated datasets and real-world conditions is not just a challenge; it is a fundamental threat to autonomous reliability.
Trustless AI data is the foundation of reliable robotics
If robotics is ever going to operate safely outside of controlled environments, it needs more than better models or bigger datasets. It needs data that can be trusted independently of the systems consuming it. Today’s AI treats sensor inputs and upstream model outputs as essentially trustworthy. But in the physical world, that assumption collapses almost immediately.
This is why failures in robotics rarely stem from a lack of data, but from data that fails to reflect the environment the robot is actually operating in. When the inputs are incomplete, misleading, or out of sync with reality, the robot fails long before it ever “sees” the problem. The real issue is that today’s systems were not built for a world where data can be hallucinated or manipulated.
Pantera Capital’s $20 million investment in OpenMind, a project described as “Linux on Ethereum” for robotics, reflects a growing consensus: if robots are to operate collaboratively and reliably, they will need blockchain-backed verification layers to coordinate and exchange trusted information. As OpenMind’s founder Jan Liphardt put it: “if AI is the brain and robotics is the body, coordination is the nervous system”.
And this shift is not limited to robotics. Across the AI landscape, companies are beginning to bake verifiability directly into their systems, from governance frameworks like EQTY Lab’s new verifiable AI oversight tool on Hedera to infrastructure designed for on-chain model validation, such as ChainGPT’s AIVM layer-1 blockchain. AI can no longer safely operate without cryptographic assurance that its data, computations, and outputs are authentic, and robotics only amplifies that need.
Trustless data directly addresses this gap. Instead of accepting sensor readings or environmental signals at face value, robots can verify them cryptographically, redundantly, and in real time. When every location reading, sensor output, or computation can be proven rather than assumed, autonomy stops being an act of faith. It becomes an evidence-based system capable of resisting spoofing, tampering, or drift.
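To make the idea concrete, here is a minimal sketch of what “verify rather than assume” could look like at the input layer, assuming each sensor holds an Ed25519 signing key whose public half the robot already knows. The primitives come from the widely used Python `cryptography` package; names such as `sign_reading` and `verify_reading` are purely illustrative and not drawn from any particular robotics stack mentioned in this piece.

```python
# Sketch: a robot only acts on sensor readings whose provenance checks out.
# Assumes each sensor signs its readings with an Ed25519 key (illustrative setup).
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_reading(private_key: Ed25519PrivateKey, sensor_id: str, value: float) -> dict:
    """Sensor side: timestamp the measurement and sign the canonical payload."""
    payload = {"sensor_id": sensor_id, "value": value, "timestamp": time.time()}
    signature = private_key.sign(canonical_bytes(payload))
    return {"payload": payload, "signature": signature.hex()}


def verify_reading(public_key, reading: dict, max_age_s: float = 1.0) -> bool:
    """Robot side: reject readings with a bad signature or a stale timestamp."""
    payload = reading["payload"]
    try:
        public_key.verify(bytes.fromhex(reading["signature"]), canonical_bytes(payload))
    except InvalidSignature:
        return False  # spoofed or tampered input: drop it instead of acting on it
    return (time.time() - payload["timestamp"]) <= max_age_s


# Usage: a lidar range reading is trusted only after its signature and freshness pass.
sensor_key = Ed25519PrivateKey.generate()
reading = sign_reading(sensor_key, "lidar-front", 2.37)
assert verify_reading(sensor_key.public_key(), reading)
```

The design choice is simple: the check happens before the control loop ever sees the value, so a spoofed or replayed reading is discarded rather than silently steering the robot.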
Verification fundamentally rewires the autonomy stack. Robots can cross-check data, validate computations, produce proofs of completed tasks, and audit decisions when something goes wrong. They stop inheriting errors silently and start rejecting corrupted inputs proactively. The future of robotics will not be unlocked by scale alone, but by machines that can prove where they were, what they sensed, what work they performed, and how their data evolved over time.
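As a rough illustration of the audit side, a hash-chained log makes a robot’s record tamper-evident: every accepted input, decision, and completed task commits to the hash of the entry before it, so any later edit breaks the chain. The `AuditLog` class below is a hypothetical sketch in plain Python, not a reference to any of the products named above; an on-chain verification layer would anchor these hashes externally, but even the local structure can be independently replayed.

```python
# Sketch: an append-only, hash-chained audit trail of a robot's decisions.
# AuditLog and its record fields are illustrative, not from any specific system.
import hashlib
import json
import time


class AuditLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        record = {"event": event, "timestamp": time.time(), "prev_hash": prev_hash}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(record)
        return record["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or deleted entry breaks it."""
        prev_hash = self.GENESIS
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if record["prev_hash"] != prev_hash or record["hash"] != expected:
                return False
            prev_hash = record["hash"]
        return True


# Usage: each accepted input and completed task leaves a verifiable trace.
log = AuditLog()
log.append({"type": "sensor_accepted", "sensor_id": "lidar-front", "value": 2.37})
log.append({"type": "task_completed", "task": "move_pallet", "result": "ok"})
assert log.verify()
```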
Trustless data does not just make AI safer; it makes reliable autonomy possible.