How Is Velum Labs Using AI?

Automated privacy-first data quality OS with ML anomaly detection and encryption.

Velum Labs uses statistical anomaly detection to catch data issues, causal lineage inference for root-cause analysis, and homomorphically encrypted inference to preserve privacy.

Company Overview

Velum Labs builds an automated, privacy-first data quality operating system that continuously monitors, diagnoses, and remediates data issues across any enterprise data stack using ML-driven anomaly detection, automated root cause analysis, and fully homomorphic encryption.

Product Roadmap & Public Announcements

Velum Labs has publicly described building ontology-powered data contracts, a content-level firewall for granular access control, privacy-preserving ML via fully homomorphic encryption, and multi-provider LLM support through their open-source Thor framework. They have also announced platform-agnostic deployment (CLI, social, enterprise) and extensible plugin/manager systems for hot-swappable AI agent components, all aimed at becoming the default infrastructure layer for automated data quality across regulated industries.

Signals & Private Analysis

GitHub activity on their Thor conversational LLM framework (Rust/Go) and Kit function-calling toolkit signals investment in agentic AI orchestration and developer tooling. Job titles like "Head of Cyber Operations" (Kuala Lumpur) hint at international expansion and enterprise security positioning. Conference and community engagement suggest partnerships with dbt and CI/CD ecosystem players. Founder academic ties to Stanford and Harvard quantum computing research point toward next-gen encrypted compute capabilities. Strong indicators of a compliance-first GTM targeting finance, healthcare, and legal verticals where data trust commands premium pricing.


Machine Learning Use Cases

Statistical anomaly detection
For: Risk Reduction (Data)

Automated ML-driven monitoring that continuously detects data drift, schema changes, null spikes, and silent data loss across enterprise pipelines in real time.

Layman's Explanation

It's like having a tireless security guard watching every data pipeline 24/7, instantly flagging anything that looks wrong before it causes damage.

Use Case Details

Velum Labs deploys continuous statistical anomaly detection models across enterprise data pipelines to identify data drift, unexpected null distributions, schema changes, and silent data loss in real time. The system ingests production query traffic to build a live dependency graph, then applies time-series forecasting and distribution-shift detection algorithms to establish dynamic baselines for every monitored dataset. When values deviate beyond learned thresholds — whether a column's null rate spikes, a foreign key relationship breaks, or upstream schema changes propagate silently — the platform triggers alerts with full lineage context. Unlike rule-based approaches that require manual threshold configuration, Velum Labs' ML models learn normal patterns organically and adapt to seasonal and structural changes, dramatically reducing both false positives and missed incidents. The system integrates natively with dbt, CI/CD pipelines, and existing orchestration tools, meaning alerts surface where engineers already work. This approach transforms data quality from a reactive, ticket-driven process into a proactive, automated discipline.
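Velum Labs' detection models are proprietary; as a minimal sketch of the underlying idea, here is a rolling z-score baseline over a column's daily null rate. The window size, threshold, and data are illustrative assumptions, not Velum parameters:

```python
from statistics import mean, stdev

def null_rate_anomalies(rates, window=14, threshold=3.0):
    """Flag days whose null rate deviates sharply from a rolling baseline.

    rates: list of daily null-rate observations (0.0-1.0).
    Returns indices of observations more than `threshold` standard
    deviations from the mean of the preceding `window` days.
    """
    anomalies = []
    for i in range(window, len(rates)):
        baseline = rates[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all is anomalous.
            if rates[i] != mu:
                anomalies.append(i)
        elif abs(rates[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Stable ~2% null rate, then a sudden spike to 40% on the last day.
history = [0.02, 0.021, 0.019, 0.02, 0.022, 0.018, 0.02,
           0.021, 0.019, 0.02, 0.022, 0.02, 0.019, 0.021, 0.40]
print(null_rate_anomalies(history))  # the spike at index 14 is flagged
```

A production system would replace the fixed window with seasonal time-series forecasting so that baselines adapt to weekly and structural patterns, as the paragraph above describes.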

Analogy

It's like replacing a smoke detector that only beeps when the house is already on fire with an AI nose that smells the wiring getting warm.

Causal lineage inference
For: Cost Reduction (Engineering)

Automated root cause tracing and remediation engine that uses ML-powered lineage analysis to pinpoint the source of data quality issues and deploy fixes through existing workflows.

Layman's Explanation

Instead of engineers spending hours tracing a data bug through dozens of tables, the system instantly finds the broken link and fixes it.

Use Case Details

When a data quality anomaly is detected, Velum Labs' ML engine automatically traces the issue through the full data lineage graph — from dashboards and downstream consumers back through transformations, joins, and source ingestion — to identify the precise root cause. The system uses causal inference techniques combined with graph neural network-style traversal of the operational hypergraph to rank the most probable failure points, factoring in historical incident patterns, recent schema changes, and upstream deployment events. Once the root cause is identified, the platform generates a proposed fix — whether it's a SQL transformation patch, a dbt model update, or a source configuration change — and surfaces it as a pull request or CI/CD action for human review. For recurring issues, the system automatically codifies the fix as an enforceable data contract, creating a self-healing feedback loop. This dramatically reduces the engineering hours spent on data firefighting and transforms incident response from a manual, expertise-dependent process into an automated, reproducible workflow. The remediation engine integrates with git-based workflows, ensuring full auditability and version control of every fix applied.
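Velum Labs' causal engine is not public; the sketch below shows the general shape of the approach: walk the lineage graph upstream from the failing asset, then rank candidates with a heuristic that weighs recent deployments and incident history. The graph, weights, and asset names are invented for illustration:

```python
# Lineage edges point downstream: source -> consumers.
# (Graph, weights, and change/incident data are illustrative assumptions.)
LINEAGE = {
    "raw_orders":    ["stg_orders"],
    "stg_orders":    ["fct_revenue"],
    "raw_customers": ["stg_customers"],
    "stg_customers": ["fct_revenue"],
    "fct_revenue":   ["revenue_dashboard"],
}
RECENTLY_CHANGED = {"stg_orders"}                 # e.g. deployed in the last 24h
PAST_INCIDENTS = {"raw_orders": 1, "stg_orders": 3}

def upstream(asset, graph):
    """Return every asset that can reach `asset` through the lineage graph."""
    parents = {src for src, dests in graph.items() if asset in dests}
    found = set(parents)
    for p in parents:
        found |= upstream(p, graph)
    return found

def rank_root_causes(failing_asset, graph):
    """Rank upstream candidates: recent changes and past incidents score higher."""
    scored = [
        (2.0 * (c in RECENTLY_CHANGED) + 0.5 * PAST_INCIDENTS.get(c, 0), c)
        for c in upstream(failing_asset, graph)
    ]
    return [c for score, c in sorted(scored, reverse=True)]

print(rank_root_causes("revenue_dashboard", LINEAGE)[0])  # stg_orders
```

The real system would replace this hand-tuned score with learned models over the operational hypergraph, but the traversal-then-rank structure is the same.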

Analogy

It's like having a plumber who not only finds the exact leaky pipe in your walls using X-ray vision but also patches it and installs a sensor so it never leaks again.

Homomorphic encrypted inference
For: Product Differentiation (IT-Security)

Privacy-preserving machine learning pipeline that enables data quality monitoring, anomaly detection, and analytics to run directly on encrypted data using fully homomorphic encryption (FHE), eliminating the need to decrypt sensitive information.

Layman's Explanation

Your data stays locked in a vault the entire time, but the AI can still read it through the walls and tell you if something's wrong.

Use Case Details

Velum Labs integrates fully homomorphic encryption (FHE) into its data quality platform, allowing all ML inference — including anomaly detection, distribution analysis, and root cause scoring — to execute directly on encrypted data without ever decrypting it. This is a breakthrough capability for regulated industries (finance, healthcare, legal) where data cannot be exposed to third-party systems, even for quality monitoring. The platform encrypts data at the source, runs statistical and ML models on the ciphertext, and returns encrypted results that only the data owner can decrypt. This architecture means Velum Labs never sees customer data in plaintext, eliminating an entire class of privacy and compliance risks. The FHE pipeline is optimized for the specific operations required by data quality workflows — comparisons, aggregations, and threshold checks — rather than general-purpose encrypted compute, which allows practical performance on real-world datasets. Combined with the content-level firewall for granular access control, this creates a zero-trust data quality layer that satisfies even the most stringent regulatory frameworks (HIPAA, SOX, GDPR, PCI-DSS). The founders' background in quantum computing research at Harvard further positions this capability as future-proof against quantum decryption threats.
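Velum Labs' FHE pipeline is proprietary, and production FHE relies on schemes such as CKKS or TFHE. As a self-contained illustration of the core idea (computing on ciphertext while only the key holder can read the result), here is a toy additively homomorphic Paillier scheme with deliberately tiny, insecure keys; it is not FHE, but it shows aggregation over data the server never sees in plaintext:

```python
# Toy Paillier cryptosystem: multiplying two ciphertexts yields an
# encryption of the SUM of the plaintexts. Tiny hardcoded primes for
# illustration only; completely insecure.
import math
import random

p, q = 17, 19
n = p * q                      # public modulus
n2 = n * n
g = n + 1                      # standard generator choice
lam = math.lcm(p - 1, q - 1)   # private key component
# mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = encrypt(12), encrypt(30)
# The server multiplies ciphertexts to aggregate; only the data owner,
# holding lam and mu, can decrypt the total.
print(decrypt((a * b) % n2))   # 42
```

Threshold checks and comparisons, which the paragraph above mentions, require richer schemes than this additive toy, which is exactly why Velum optimizes its FHE pipeline for those specific operations.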

Analogy

It's like a doctor diagnosing your X-ray while it's still sealed in a lead envelope — they give you the results, but they never actually see the image.

Key Technical Team Members

  • Benjamin Muñoz Cerro, Co-Founder & CEO
  • Alen Rubilar Muñoz, Co-Founder & CTO
  • Mohd Sastero, Head of Cyber

Velum Labs combines rare expertise in quantum computing research and privacy-preserving cryptography (homomorphic encryption) with production-grade ML engineering, allowing them to offer automated data quality that works on encrypted data, a capability no competitor currently matches at the infrastructure layer.


Funding & Milestones

  • 2025 | Benjamin Muñoz-Cerro and Alen Rubilar-Muñoz found Velum Labs.
  • 2025 | Open-source Thor LLM framework and Kit toolkit released on GitHub.
  • 2026 | Accepted into Y Combinator Winter 2026 batch.


Competitors

  • Data Observability: Monte Carlo, Bigeye, Anomalo, Metaplane (monitoring-focused).
  • Data Quality Platforms: Ataccama, Talend, Informatica (legacy enterprise).
  • dbt-native Testing: Elementary Data, dbt built-in tests (developer-focused).
  • AI-Native Data Quality: Soda.io, Validio, Great Expectations (open-source).
  • Privacy-Preserving Compute: Duality Technologies, Enveil, Zama (FHE-focused, not data quality).
