Tether’s QVAC pushes multi‑billion‑parameter AI models onto phones and consumer GPUs

By Dorian Batycka

[Diagram: Tether QVAC BitNet LoRA on‑device AI]

Tether’s QVAC Fabric integrates BitNet LoRA to fine‑tune and run multi‑billion‑parameter AI models on consumer GPUs and flagship phones, pushing serious AI work to the edge.

Summary
  • QVAC Fabric brings BitNet LoRA fine‑tuning and inference to AMD and Intel GPUs, Apple’s Metal stack, and high‑end mobile GPUs, claiming 2–11x speedups over CPU baselines and up to 90% lower memory use.
  • Tether says it has fine‑tuned models of up to 3.8 billion parameters on the Pixel 9, Galaxy S25, and iPhone 16, and up to 13 billion parameters on the iPhone 16 alone, pushing on‑device AI far beyond today’s typical sub‑3B demos.
  • The release fits Tether’s pivot from pure stablecoin issuer to infrastructure player, complementing earlier QVAC initiatives like the 41‑billion‑token Genesis I dataset and local AI Workbench to challenge Big Tech’s AI moat.

Tether’s AI division has quietly shipped one of its most aggressive non‑stablecoin bets to date: a cross‑platform BitNet LoRA framework, integrated into its QVAC Fabric stack, that can train and run multi‑billion‑parameter language models directly on consumer‑grade GPUs and flagship smartphones. If the numbers hold up outside Tether’s own benchmarks, this pushes on‑device AI from “cute demo” territory into something systemically relevant for both hardware vendors and crypto‑aligned infra investors.

The new QVAC Fabric release brings BitNet LoRA fine‑tuning and inference to AMD and Intel GPUs, Apple’s Metal ecosystem, and a range of mobile GPUs in a single framework. Tether claims that, on flagship devices, GPU‑based inference is between 2 and 11 times faster than CPU baselines, while memory usage drops by as much as 90% versus full‑precision models. In practice, this means you can squeeze significantly larger models, or more concurrent sessions, onto the same hardware envelope—critical for phones and laptops where thermal and RAM ceilings are non‑negotiable.
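The claimed memory savings are consistent with simple arithmetic on weight precision. A rough back‑of‑envelope sketch, assuming BitNet‑style ternary weights at roughly 1.58 bits versus an FP16 baseline (the numbers are illustrative, not Tether's published figures):

```python
def model_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage footprint in GB.

    Ignores activations, KV cache, and runtime overhead; weights only.
    """
    return n_params * bits_per_weight / 8 / 1e9

# A 3.8B-parameter model, the largest size Tether reports fine-tuning on most phones
fp16 = model_memory_gb(3.8e9, 16)      # full-precision baseline
bitnet = model_memory_gb(3.8e9, 1.58)  # BitNet-style ternary weights

print(f"FP16:   {fp16:.2f} GB")
print(f"BitNet: {bitnet:.2f} GB ({1 - bitnet / fp16:.0%} smaller)")
```

Storing weights at ~1.58 bits instead of 16 cuts the footprint by about 90% on its own, which is why a model that would otherwise exceed a phone's RAM budget can plausibly fit.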

The headline numbers are provocative: Tether’s team says it has completed fine‑tuning of models up to 3.8 billion parameters on devices like the Pixel 9, Galaxy S25, and iPhone 16, and has pushed fine‑tuning to as large as 13 billion parameters on the iPhone 16 specifically. That is a sharp escalation from the current norm, where most “on‑device AI” marketing still revolves around sub‑3B parameter models or offloads heavier workloads to the cloud. If reproducible, this suggests a future where serious personalization and domain‑specific adaptation can happen locally, without shipping user data off‑device.
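Part of what makes on‑device fine‑tuning tractable at all is that LoRA trains only small low‑rank adapter matrices, not the full weight set. A minimal sketch of the arithmetic, using a hypothetical 3.8B‑class configuration (the dimensions and rank are illustrative assumptions, not Tether's actual model):

```python
def lora_trainable_params(d_model: int, rank: int, n_layers: int,
                          mats_per_layer: int = 4) -> int:
    """Trainable params for LoRA: two low-rank factors (d_model x r and
    r x d_model) per adapted weight matrix. Here we assume only the four
    attention projections per layer are adapted."""
    return n_layers * mats_per_layer * 2 * d_model * rank

# Hypothetical 3.8B-class config (illustrative, not a real model card)
full_params = 3.8e9
adapter = lora_trainable_params(d_model=3072, rank=16, n_layers=32)

print(f"LoRA trainable: {adapter / 1e6:.1f}M params "
      f"({adapter / full_params:.3%} of the full model)")
```

With a rank of 16, the trainable set is on the order of tens of millions of parameters, a fraction of a percent of the base model, which is what keeps gradient and optimizer state within a phone's memory envelope.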

Strategically, this fits Tether’s ongoing pivot from pure stablecoin issuer to broader infrastructure operator. The company has already plowed billions into energy, mining, and media; now it is adding edge‑AI tooling to the portfolio, with the related QVAC and BitNet LoRA code open‑sourced on GitHub for developers to inspect and build on. Open sourcing is not altruism—it is distribution. If QVAC becomes a default path for indie devs and small labs to push models onto consumer hardware, Tether buys cultural and technical relevance in a stack that sits well outside banking regulation’s direct line of fire.

For markets, the immediate impact is narrative, not P&L. There is no token here, no obvious “farm this yield” angle. But there is a clear macro story: as more AI work migrates to the edge, infrastructure power shifts from centralized hyperscalers toward whoever controls key toolchains and hardware abstraction layers. Tether is signaling that it intends to be one of those players, leveraging its balance sheet to seed primitives that reduce dependence on any single cloud or jurisdiction. For crypto, an ecosystem increasingly obsessed with AI‑adjacent plays, this is a reminder that not every serious bet needs a ticker symbol attached.

For now, the obvious questions are technical: how BitNet LoRA’s claimed speedups and memory reductions compare against incumbents like llama.cpp, MLC, or Qualcomm’s own SDKs on the same devices; what the energy and thermal trade‑offs look like in real‑world use; and how permissive the licenses are for commercial deployment. But if even a conservative slice of Tether’s claims proves out under independent benchmarking, QVAC Fabric’s BitNet LoRA integration will mark a tangible step toward turning high‑end smartphones into viable training and inference rigs for mid‑sized language models—shifting AI one notch closer to the edge, and giving Tether yet another foothold in critical digital infrastructure.
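Independent benchmarking of the kind described above mostly comes down to measuring tokens per second on identical hardware across backends. A minimal sketch of such a throughput probe, assuming a hypothetical `generate` interface (no real backend API is implied; llama.cpp, MLC, and QVAC each expose their own bindings):

```python
import time

def tokens_per_second(generate, prompt: str, max_tokens: int) -> float:
    """Crude throughput probe: time one generation call and divide by the
    number of tokens produced. `generate` is any backend callable returning
    a sequence of token ids -- a hypothetical interface, not a real API."""
    start = time.perf_counter()
    out = generate(prompt, max_tokens)
    elapsed = time.perf_counter() - start
    return len(out) / elapsed

# Stub backend standing in for a real runtime, just to show the harness runs
def fake_generate(prompt, max_tokens):
    time.sleep(0.01)  # pretend to do work
    return list(range(max_tokens))

rate = tokens_per_second(fake_generate, "hello", 64)
print(f"{rate:.0f} tok/s")
```

A real comparison would also need to pin the same model, quantization, context length, and thermal state across backends, since any of those can swamp a 2–11x framework difference.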