Table of Contents
- 1. Multiverse Computing launches CompactifAI for local AI models
- 2. Introduction to CompactifAI and Its Purpose
- 3. Features of the CompactifAI App
- 3.1 Local AI Model Execution
- 3.2 Integration with Cloud Models
- 4. Compressed Models from Leading AI Labs
- 4.1 Collaboration with OpenAI and Others
- 4.2 Benefits of Model Compression
- 5. The Ash Nazg System for Prompt Routing
- 6. Consumer Adoption and Download Statistics
- 7. Targeting Businesses with API Portal
- 7.1 Self-Serve API for Developers
- 7.2 Real-Time Usage Monitoring Features
- 8. The Future of AI with Compressed Models
Multiverse Computing launches CompactifAI for local AI models
Hybrid Edge-to-Cloud AI Access
– What launched: CompactifAI, combining an offline-capable chat app with a new self-serve API portal for compressed models.
– What it enables: a tiny on-device model (Gilda) for local/offline prompts when hardware allows, plus automatic fallback to cloud models when it doesn’t.
– Why it matters now: as compute availability and cost volatility push teams to consider edge and on-prem options, compression offers a path to reduce reliance on external infrastructure.
- Multiverse Computing has launched CompactifAI, pairing an offline-capable chat app with a new self-serve API portal for compressed AI models.
- The CompactifAI app runs a tiny local model called Gilda on-device when hardware allows, and automatically falls back to cloud models when it doesn’t.
- Multiverse says it has compressed models from major labs including OpenAI, Meta, DeepSeek, and Mistral, aiming to cut inference cost and reduce reliance on external compute.
- A routing layer dubbed Ash Nazg decides whether a prompt runs locally or in the cloud—trading resilience and privacy for capability when it switches.
- Early consumer traction appears limited: Sensor Tower data shows fewer than 5,000 downloads in the past month, underscoring a stronger enterprise focus.
Introduction to CompactifAI and Its Purpose
As AI supply chains face volatility—rising costs, tighter compute availability, and growing concern about dependency on cloud providers—Multiverse Computing is betting on a different path: make powerful models smaller, cheaper to run, and increasingly capable on local hardware.
The Spanish startup has introduced CompactifAI, the name it uses both for its quantum-inspired model compression technology and for a new consumer-facing app designed to demonstrate what compressed models can do. In Multiverse’s own technical framing, CompactifAI uses quantum-inspired tensor networks to restructure and compress model weights, with retraining used to recover much of the original quality. Alongside the app, Multiverse is also rolling out an API portal aimed squarely at developers and enterprises that want production access to its compressed models without going through marketplaces such as the AWS Marketplace.
Efficient Models, Flexible Deployment
A quick way to understand CompactifAI
– Problem: Frontier models are expensive to run and often assume reliable cloud access.
– Approach: CompactifAI compresses a model’s internal weight matrices using quantum-inspired tensor networks (Multiverse describes this as tensor-network decompositions such as Matrix Product Operators), then re-trains to recover quality; a minimal sketch of the underlying low-rank idea follows this list.
– Deployment impact: A smaller model can fit on more hardware (edge/on-prem) and reduce inference spend; when it can’t fit, products like the CompactifAI app can route the request to a larger cloud model.
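To make the mechanics concrete, here is a minimal Python sketch of the general principle behind weight compression: replace a large matrix with smaller factors and accept a small reconstruction error. Plain truncated SVD stands in here; it is not Multiverse’s method, which reportedly relies on tensor-network decompositions such as Matrix Product Operators plus retraining.

```python
# Minimal sketch of weight compression via low-rank factorization.
# NOTE: truncated SVD is a stand-in for CompactifAI's quantum-inspired
# tensor networks (e.g., Matrix Product Operators) and retraining step.
import numpy as np

def compress_weight(W: np.ndarray, rank: int):
    """Replace an (m x n) matrix with factors of total size rank*(m+n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # (m x rank), columns scaled by singular values
    B = Vt[:rank, :]            # (rank x n)
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))
A, B = compress_weight(W, rank=64)

original, compressed = W.size, A.size + B.size
print(f"params: {original:,} -> {compressed:,} ({1 - compressed/original:.0%} fewer)")
# A random matrix has a flat spectrum, so this error is large; trained
# weight matrices are typically far more compressible, and retraining
# recovers further quality on top of the decomposition.
print("relative error:", np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```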
Features of the CompactifAI App
CompactifAI is positioned as a familiar AI chat experience—ask a question, get an answer—but with an emphasis on running models at the edge when possible.
Adaptive Local-to-Cloud Handling
How a prompt gets handled (typical flow; a minimal code sketch follows the steps)
1) User asks a question in the app
2) Device check: available RAM/storage (and practical performance headroom)
3) If sufficient: run Gilda locally → response works offline and data stays on-device for that interaction
4) If insufficient: route via API to a cloud model → response is available, but the interaction no longer has the same offline/on-device privacy and resilience properties
5) User experience goal: keep the chat consistent while hiding the hardware complexity
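In code, that flow might look like the hedged sketch below. The threshold values and function names are illustrative assumptions, not Multiverse’s implementation.

```python
# Hypothetical routing sketch for the local-vs-cloud flow described above.
# MIN_RAM_GB / MIN_STORAGE_GB and both handler functions are assumptions.
from dataclasses import dataclass

@dataclass
class Device:
    ram_gb: float
    free_storage_gb: float

MIN_RAM_GB = 6.0       # assumed requirement for the local model
MIN_STORAGE_GB = 4.0   # assumed on-disk footprint headroom

def run_gilda_locally(prompt: str) -> str:
    return f"[local Gilda] {prompt}"   # stub for the on-device model

def call_cloud_model(prompt: str) -> str:
    return f"[cloud model] {prompt}"   # stub for the API fallback

def handle_prompt(prompt: str, device: Device, online: bool) -> str:
    if device.ram_gb >= MIN_RAM_GB and device.free_storage_gb >= MIN_STORAGE_GB:
        # Local path: works offline; data stays on-device for this prompt.
        return run_gilda_locally(prompt)
    if online:
        # Cloud fallback: more capable, but the offline/on-device privacy
        # and resilience properties no longer hold for this interaction.
        return call_cloud_model(prompt)
    raise RuntimeError("device cannot run the local model and is offline")

print(handle_prompt("hi", Device(ram_gb=8, free_storage_gb=32), online=True))
print(handle_prompt("hi", Device(ram_gb=3, free_storage_gb=32), online=True))
```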
Local AI Model Execution
The app embeds Gilda, which Multiverse describes as small enough to run locally and offline. That enables a version of AI assistance that can keep data on-device and continue functioning without connectivity—an increasingly attractive proposition for users and teams working in constrained or sensitive environments.
There is a practical limitation: local execution depends on a device having sufficient RAM and storage. Multiverse notes that many older phones—particularly older iPhones—may not meet the requirements.
Integration with Cloud Models
When a device can’t run Gilda effectively, the app switches to cloud models via API. This preserves usability and capability, but it changes the core value proposition: once a request is routed to the cloud, the app no longer offers the same “offline, on-device” privacy and resilience advantages for that interaction.
Multiverse also uses the app to showcase access to larger models, including routing to gpt-oss-120b via API, highlighting a hybrid approach similar in spirit to other on-device-plus-cloud strategies in the market.
Compressed Models from Leading AI Labs
Multiverse’s broader pitch is not just “offline chat,” but a portfolio of compressed models that can reduce inference costs and expand deployment options across edge, on-prem, and cloud environments.
Collaboration with OpenAI and Others
The company says it has compressed models originating from major AI labs, including OpenAI, Meta, DeepSeek, and Mistral AI. One recent example is HyperNova 60B 2602, which Multiverse says is built on gpt-oss-120b, an OpenAI model released with openly available weights.
Multiverse claims its compressed derivative can deliver faster responses at lower cost than the original—an advantage it frames as particularly relevant for agentic coding workflows, where models execute multi-step tasks autonomously and inference efficiency becomes a bottleneck.
Compression Gains With Minimal Loss
| Model / family (as described publicly) | Origin / base model | What Multiverse says it changes | Reported deltas (where stated) | Source context |
|---|---|---|---|---|
| Gilda (in CompactifAI app) | Multiverse local model | Runs on-device for offline chat when hardware allows | No public percentage deltas disclosed; positioned as “small enough to run locally and offline” | Company positioning via product launch coverage |
| HyperNova 60B 2602 | Built on gpt-oss-120b (open-weight OpenAI model) | Compressed derivative aimed at faster, lower-cost inference | “Faster responses at lower cost” (no numeric deltas provided) | Company claim reported in launch coverage |
| Llama “Slim” releases (examples cited in Multiverse materials) | Llama 3.1-8B / Llama 3.3-70B | Tensor-network compression + retraining | Company materials report ~80% compression, ~2–3% precision drop, and efficiency gains (up to ~84% better energy efficiency, ~40% faster inference, ~50% lower operational cost) for specific releases | Company-reported figures; an independent benchmark exists for at least one Llama compression (see next row) |
| Independent benchmark (Llama 3.1-8B compression) | Llama 3.1-8B | Evaluates efficiency/energy impacts of CompactifAI-style compression | Reports “significant reductions” in compute/energy with “negligible” accuracy loss (exact metrics depend on the benchmark setup) | Sopra Steria sustAIn team benchmark on arXiv (2025) |
Benefits of Model Compression
The appeal of compression is straightforward: smaller models can be cheaper to run, easier to deploy, and less dependent on scarce external compute. Multiverse’s CompactifAI approach is based on quantum-inspired tensor networks, which the company says can achieve extreme size reductions while preserving much of a model’s utility.
In Multiverse’s published materials, the company claims compression rates of up to 95% with only a ~2–3% drop in precision, alongside efficiency gains such as up to 84% better energy efficiency, around 40% faster inference, and roughly 50% lower operational costs (figures presented by the company for specific compressed-model releases).
In practice, the benefits Multiverse is selling to enterprises cluster around:
– Lower compute spend for inference at scale
– More predictable deployment without relying entirely on third-party cloud capacity
– New form factors, including models that can run on edge devices where connectivity is limited or intermittent (see the back-of-envelope sketch below)
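To put the claimed percentages in perspective, a back-of-envelope calculation shows what compression does to the raw memory footprint of model weights. The parameter count and precision below are assumptions for illustration; only the compression rates come from the company’s figures.

```python
# Illustrative arithmetic only; actual memory use depends on precision,
# architecture, and runtime overhead.
params = 120e9           # assumed 120B-parameter base model
bytes_per_param = 2      # assumed fp16/bf16 weights
base_gb = params * bytes_per_param / 1e9

for rate in (0.80, 0.95):  # company-reported compression rates
    print(f"{rate:.0%} compression: {base_gb:.0f} GB -> {base_gb*(1-rate):.0f} GB of weights")
# 80% compression: 240 GB -> 48 GB; 95% compression: 240 GB -> 12 GB
```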
The Ash Nazg System for Prompt Routing
A key piece of the CompactifAI app is its automatic routing system, Ash Nazg, which decides whether a prompt should be handled locally or sent to the cloud.
The name is a Tolkien reference, but the function is pragmatic: it abstracts away hardware constraints and tries to deliver a consistent user experience. The tradeoff is clear: when Ash Nazg routes a request to the cloud, the interaction loses the app’s main edge advantage—local-only processing—even if it gains access to more capable models.
On-Device vs Cloud Tradeoffs
– Local (on-device) path: strongest on privacy (data stays on the device for that prompt) and resilience (works without connectivity), but limited by RAM/storage and typically by model capability.
– Cloud (API) path: broader capability and compatibility across older devices, but weaker on the app’s core “edge” promise for that interaction (data leaves the device; requires network).
– Operational implication: the more often routing falls back to cloud, the more CompactifAI behaves like a conventional chat app—so device mix and use case constraints matter (a small fleet-estimate sketch follows)
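A practical corollary: a team can estimate up front how often routing will stay local for its device mix. The sketch below is hypothetical; the RAM threshold and fleet composition are illustrative assumptions.

```python
# Hypothetical estimate of the local-capable share of a device fleet.
fleet = [
    {"model": "older iPhone", "ram_gb": 4, "count": 400},
    {"model": "recent flagship", "ram_gb": 8, "count": 500},
    {"model": "mid-range tablet", "ram_gb": 6, "count": 100},
]
MIN_RAM_GB = 6  # assumed requirement for the local model

total = sum(d["count"] for d in fleet)
local = sum(d["count"] for d in fleet if d["ram_gb"] >= MIN_RAM_GB)
print(f"estimated local-capable share: {local / total:.0%}")
# A low share means the app behaves mostly like a conventional cloud
# chat client for that fleet.
```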
Consumer Adoption and Download Statistics
CompactifAI’s early consumer footprint appears modest. According to Sensor Tower, the app recorded fewer than 5,000 downloads in the past month.
Downloads Signal Product Positioning
What the download number can (and can’t) tell you
– What it supports: CompactifAI doesn’t yet look like a mass-market consumer breakout; it reads more like a product showcase for edge/offline capability.
– What it doesn’t prove: downloads alone don’t reveal retention, enterprise pilots, or whether usage is concentrated among developers evaluating the tech.
– Why it still matters: it aligns with the rollout emphasis on an API portal and production controls rather than consumer growth loops.
That reinforces what Multiverse’s product rollout suggests: the app is likely more of a showcase than a mass-market play—an on-ramp to demonstrate compressed-model performance and edge execution, while the company’s commercial momentum remains centered on enterprise deployments.
Targeting Businesses with API Portal
Multiverse’s clearest expansion move is aimed at organizations that want compressed models in production, with tooling that supports operational control and cost visibility.
Self-Serve API for Developers
The company is launching a self-serve API portal that provides direct access to its compressed models—explicitly positioning it as an alternative to distribution through platforms like the AWS Marketplace.
CEO Enrique Lizaso said the portal gives developers access “with the transparency and control needed to run them in production,” framing the offering as infrastructure for real workloads rather than experimentation.
Production API Portal Criteria
If you’re evaluating the API portal for production, look for the following (a hypothetical client sketch follows the list):
– Model catalog clarity: which compressed variants exist, what they’re derived from, and what tradeoffs are expected
– Usage visibility: real-time monitoring (Multiverse highlights this) plus exportable logs/metrics
– Cost controls: quotas, rate limits, and predictable pricing units for inference
– Deployment options: ability to run via API now, and a path to edge/on-prem where required
– Governance basics: access keys/roles and auditability appropriate to your environment
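As a thought experiment, that checklist might translate into client code along these lines. Everything here (base URL, endpoint paths, response fields) is a placeholder assumption; Multiverse has not published the portal’s API surface in the material summarized here.

```python
# Hypothetical client sketch for evaluating a compressed-model API portal.
# All URLs, paths, and field names below are placeholders, not a real API.
import os
import requests

BASE_URL = "https://portal.example.com/v1"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ.get('PORTAL_API_KEY', 'YOUR_KEY')}"}

def list_models() -> dict:
    # Model catalog clarity: which compressed variants exist, and from what.
    r = requests.get(f"{BASE_URL}/models", headers=HEADERS, timeout=10)
    r.raise_for_status()
    return r.json()

def current_usage() -> dict:
    # Usage visibility: real-time spend/quota, ideally also exportable.
    r = requests.get(f"{BASE_URL}/usage", headers=HEADERS, timeout=10)
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    usage = current_usage()
    # Cost controls: alert before hitting an assumed monthly token quota.
    if usage.get("tokens_this_month", 0) > 0.8 * usage.get("token_quota", float("inf")):
        print("warning: approaching token quota")
```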
Real-Time Usage Monitoring Features
One of the portal’s headline features is real-time usage monitoring, reflecting a central enterprise motivation for smaller models: cost management.
As organizations weigh smaller models against frontier-scale LLMs, the decision often comes down to unit economics—latency, throughput, and the ability to forecast and control spend. Multiverse is signaling that compressed models are not just a research novelty, but a lever for predictable, measurable operational efficiency.
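As a sense check on what unit economics means in practice, consider a back-of-envelope comparison. Every number below is an assumption for illustration; it simply applies the roughly 50%-lower-cost framing the company uses to a hypothetical workload.

```python
# Back-of-envelope spend comparison; all prices and volumes are assumed.
monthly_tokens = 2_000_000_000         # hypothetical workload: 2B tokens/month
frontier_per_m = 5.00                  # assumed $/1M tokens, frontier model
compressed_per_m = frontier_per_m / 2  # applies the ~50% lower-cost framing

frontier = monthly_tokens / 1e6 * frontier_per_m
compressed = monthly_tokens / 1e6 * compressed_per_m
print(f"frontier: ${frontier:,.0f}/mo  compressed: ${compressed:,.0f}/mo  "
      f"savings: ${frontier - compressed:,.0f}/mo")
```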
The Future of AI with Compressed Models
Transforming Accessibility and Efficiency
Compressed models are increasingly positioned as a practical counterweight to the “bigger is better” era of AI. As small-model capabilities improve—Mistral, for example, has recently updated its small-model lineup—enterprises are more willing to trade a slice of peak performance for major gains in cost, speed, and deployability.
Multiverse’s bet is that compression can narrow the gap further, enabling models that are “good enough” for many tasks while being dramatically easier to run—especially in settings where cloud access is expensive, unreliable, or undesirable.
The Role of Multiverse Computing in AI Evolution
Multiverse is already working with more than 100 global customers, including the Bank of Canada, Bosch, and Iberdrola, and it is rumored to be raising a new €500 million round at a valuation above €1.5 billion after a $215 million Series B last year.
If compressed models become a default layer in AI deployment—powering edge devices, on-prem installations, and cost-optimized cloud services—Multiverse’s strategy of pairing a consumer demo (CompactifAI) with enterprise plumbing (the API portal) could help it move from a specialist compression vendor to a mainstream supplier in the AI stack.
This lens is shaped by Martín Weidemann’s work building and scaling technology businesses in regulated, cost-sensitive environments across Latin America, where unit economics, reliability, and data-handling constraints often determine whether AI can move from demo to production.
This article reflects publicly available information at the time of writing and summarizes product claims and early adoption signals. Some performance figures come from company statements tied to specific releases, and real-world results may vary by hardware, workload, and evaluation setup. Funding and valuation details described as rumored are uncertain and may change as new information becomes public.
I am MartĂn Weidemann, a digital transformation consultant and founder of Weidemann.tech. I help businesses adapt to the digital age by optimizing processes and implementing innovative technologies. My goal is to transform businesses to be more efficient and competitive in today’s market.