A data engineering leader at a multinational automotive manufacturer put it to me directly last quarter: "We have spent four years and tens of millions of dollars building the data foundation. We still can't tell you, with confidence, how many unique customers we have." Every AI and analytics initiative the business is asking them to deliver inherits this foundational gap. When models are trained on fragmented identities and inconsistent customer context, they yield confident but fundamentally wrong answers about who to target, what to offer, and when to act. The infrastructure investment might be substantial, but the customer context layer underneath it is missing. In an AI-driven ecosystem, the cost of this missing layer has become impossible to ignore.
To solve this, we built Bring Your Own Compute (BYOC). Rather than forcing data to migrate to a third-party platform, BYOC allows enterprises to run heavy customer intelligence workloads as governed processes directly within their own Snowflake or Databricks environments.
This represents a deliberate architectural shift away from monolithic data silos and toward a composable, lakehouse-native ecosystem.
The hardest problem in customer data isn't where it lives. It's knowing who it belongs to.
Identity Resolution at petabyte scale is not a storage problem and not an orchestration problem. It is a machine learning (ML) problem with a specific shape.
It combines deterministic matching, probabilistic matching, and transitive logic across messy first-party personally identifiable information (PII), and it has to adapt every time a customer shares a new email, changes a phone number, or buys on a new device.
Snowflake and Databricks are excellent at running ML workloads. Neither was designed to solve that particular problem on its own.
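To make the shape of that problem concrete, here is a minimal Python sketch of the three ingredients named above: a deterministic rule, a probabilistic rule, and transitive closure over the resulting links. It is an illustration under stated assumptions, not Amperity's implementation. The records, field names, matching rules, and the 0.85 threshold are all invented for the example, and the pairwise loop would not survive anything close to petabyte scale.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records: field names and values are invented for illustration.
RECORDS = [
    {"id": "r1", "name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100"},
    {"id": "r2", "name": "Jane Doe", "email": "jdoe@example.org", "phone": "555-0100"},
    {"id": "r3", "name": "Jayne Doe", "email": "jdoe@example.org", "phone": "555-0199"},
    {"id": "r4", "name": "Sam Lee", "email": "sam@example.com", "phone": "555-0142"},
]


def deterministic_match(a, b):
    """Exact agreement on a strong identifier (email, in this sketch)."""
    return a["email"].lower() == b["email"].lower()


def probabilistic_match(a, b, threshold=0.85):
    """Fuzzy name similarity plus a shared phone; the threshold is an assumption."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold and a["phone"] == b["phone"]


def resolve(records):
    """Union-find supplies the transitive step: if A~B and B~C, all share a cluster."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    # Naive all-pairs comparison; real systems block or index candidates first.
    for a, b in combinations(records, 2):
        if deterministic_match(a, b) or probabilistic_match(a, b):
            parent[find(a["id"])] = find(b["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


clusters = resolve(RECORDS)
print(clusters)
```

In this toy data, r1 and r3 share neither an email nor a phone number, so no direct rule links them. They end up in the same cluster only because each matches r2, which is the transitive behavior the paragraph above describes.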
Two categories of vendors have tried to solve it with shortcuts.
The activation-first approach, championed by Hightouch and the broader reverse-ETL category, has done useful work moving data efficiently between warehouses and downstream tools. Its limitation is the assumption that customer identity is already solved. When that foundation is incomplete, activation just moves imperfect data faster.
Most customer data platforms (CDPs) were originally architected for an era when customer data lived primarily outside the warehouse. The vendor's environment was the natural place to run customer intelligence workloads, and for many enterprises, it still is. What's changed is that enterprise data has migrated into the lakehouse. BYOC reflects that shift, giving teams the option to run customer intelligence workloads inside the same environment as the rest of their data and AI stack.
Another option is to build Identity Resolution in-house. Plenty of enterprises have tried. Most of the data leaders I've talked to can tell you the year the project started. None of them are eager to talk about a finish line. The manufacturer I mentioned at the top of this post spent 18 months on a prior implementation effort before partnering with us. Once we stepped in, we helped get the project across the finish line, accelerating implementation by 72% compared to the previous approach.
Should the intelligence come to the data, or the data go to the intelligence?
Every step in our architecture has been an answer to that question. The answer at enterprise scale has been clear for years. Each step has moved Amperity closer to the lakehouse, not further from it.
We started with Amperity Bridge in 2024. Bridge gave us native read and write access against Snowflake, Databricks, and BigQuery so customer data could stay in the lakehouse while Amperity processed it. The replication step previous platforms required largely went away. Bridge became one of the fastest-adopted features in our history.
Bring Your Own Storage (BYOS) came next. Your customer data lives in your own S3 or Azure storage rather than ours, with delegated credentials and your governance policies in force. The duplicate storage footprint went away too.
With BYOC, customer intelligence runs directly inside your governed environment. Processing executes on your Snowflake or Databricks cluster. Your customer data never leaves the perimeter your security team already approved. The only data that surfaces in the Amperity interface is scoped job results.
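One way to picture "only scoped job results surface" is a job that reads and writes rows locally and hands back nothing but a summary. The Python sketch below is purely illustrative: `ScopedJobResult`, `run_inside_perimeter`, and their fields are hypothetical names invented here, not Amperity, Snowflake, or Databricks APIs, and the deduplication step is a toy stand-in for real processing.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ScopedJobResult:
    """Hypothetical summary that crosses the boundary; it holds no customer rows."""
    job_id: str
    status: str
    records_read: int
    profiles_written: int


def run_inside_perimeter(job_id, records):
    """Stand-in for a job executing on the customer's own cluster.

    Row-level data is only touched inside this function; the caller
    receives counts and a status, never the records themselves.
    """
    unique_emails = {r["email"].lower() for r in records}  # toy "resolution" step
    return ScopedJobResult(job_id, "succeeded", len(records), len(unique_emails))


# Invented rows standing in for data that lives in the lakehouse.
rows = [
    {"email": "jane@example.com"},
    {"email": "JANE@example.com"},
    {"email": "sam@example.com"},
]

result = run_inside_perimeter("job-001", rows)
print(asdict(result))
```

The design point is the return type: because the function's contract is a small, fixed summary, there is no code path by which row-level PII reaches the caller.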
We also recognize that enterprise data science teams have unique, domain-specific requirements. So in the near future, we will expand the stack by introducing Bring Your Own Model (BYOM) support. With BYOM, brands will be able to bring their own proprietary ML models directly into the Amperity workflow running inside their lakehouse, on top of the high-fidelity identity graph generated by Amperity.
Why running on your compute changes the math
Running customer intelligence inside your own compute cluster fundamentally alters the deployment timeline and economics of data projects:
Zero-Trust Security Alignment: Because data at rest stays at rest, there is no new third-party data movement path for your Chief Information Security Officer (CISO) to vet. Your existing governance policies, audit trails, and role-based access control (RBAC) frameworks remain active. For highly regulated sectors like healthcare and financial services, and for enterprises operating under frameworks like GDPR, CCPA, or HIPAA, this can eliminate quarters of security reviews.
Streamlined Data Lineage: Processing data where it sits removes the cross-environment syncs that bottleneck petabyte-scale Identity Resolution. The same engineering team that runs your lakehouse runs your customer intelligence workloads, on the same cluster, with the same tooling. For many enterprises, this can compress the path from raw data to trusted segments from quarters to weeks.
Optimized Cloud Spend: The economics work differently as well. With BYOC, selected Amperity workloads can run in the customer’s Snowflake or Databricks environment, helping teams leverage existing lakehouse investments and align customer intelligence workloads with their broader data architecture, governance model, and compute strategy. Teams can also draw down pre-committed cloud spend (such as Snowflake credits or Databricks Units) directly.
What this means for AI
Most conversations I'm having with chief marketing officers (CMOs) and chief data officers (CDOs) right now circle the same question: how do we get from AI ambition to production-grade AI deployment? The limiting factor is rarely the model orchestration layer; it’s the quality of the customer data feeding it.
The cost of leaving that question unanswered is no longer theoretical. An AI agent trained on duplicate profiles will recommend a win-back offer to a customer who is already loyal. A model fed inconsistent identity data will personalize the wrong product to the wrong household. AI doesn't make those mistakes by accident. It makes them faster, more confidently, and at greater scale than any team of marketers could on their own. Worse, when those decisions need to be explained or audited, the data lineage often breaks the moment customer profiles leave the lakehouse environment where the model lives.
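The win-back failure is easy to reproduce in a few lines. The following Python sketch assumes a simple rule (a profile is lapsed after 180 days without a purchase) and two invented profile fragments that belong to the same person; the dates, identifiers, and threshold are all hypothetical.

```python
from datetime import date

TODAY = date(2026, 1, 15)   # fixed so the example is reproducible
LAPSED_AFTER_DAYS = 180     # assumed win-back rule, not a recommendation

# Two fragments of one hypothetical customer, split by a changed email.
fragments = [
    {"profile_id": "p1", "person": "c1", "last_purchase": date(2025, 3, 1)},
    {"profile_id": "p2", "person": "c1", "last_purchase": date(2026, 1, 5)},
]


def is_lapsed(last_purchase):
    """True when the most recent purchase is older than the threshold."""
    return (TODAY - last_purchase).days > LAPSED_AFTER_DAYS


# Unresolved: each fragment is scored alone, so the stale one triggers an offer.
unresolved_targets = [f["profile_id"] for f in fragments if is_lapsed(f["last_purchase"])]

# Resolved: fragments are merged per person before the rule is applied.
latest = {}
for f in fragments:
    person = f["person"]
    latest[person] = max(latest.get(person, f["last_purchase"]), f["last_purchase"])
resolved_targets = [person for person, last in latest.items() if is_lapsed(last)]

print("unresolved:", unresolved_targets)
print("resolved:", resolved_targets)
```

Scored as separate profiles, the stale fragment p1 qualifies for a win-back offer even though the same person bought ten days ago. Once the fragments are merged, the rule finds nobody to target.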
AI-ready customer data requires strict technical guarantees:
Resolved Identity: Deterministic and probabilistic links must be calculated across every disparate channel.
Explainable Attributes: Profile traits must have deterministic, auditable lineage back to the raw source data.
Co-location: The customer context layer must sit natively alongside the vector stores, semantic layers, and LLM runtimes powering your AI stack.
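As a rough illustration of the second guarantee, an explainable attribute can carry references to the exact source rows it was computed from. The Python sketch below is one possible shape under assumed names: `SourceRef`, `Attribute`, `lifetime_spend`, and the table and record identifiers are hypothetical, and nothing here describes how Amperity stores lineage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Pointer to one raw record (hypothetical table and id names)."""
    table: str
    record_id: str


@dataclass(frozen=True)
class Attribute:
    """A profile trait plus the source rows it was derived from."""
    name: str
    value: float
    derived_from: tuple  # tuple of SourceRef


# Invented order rows, already tagged with a resolved customer id.
orders = [
    {"table": "pos_orders", "record_id": "o-17", "customer": "c1", "amount": 40.0},
    {"table": "web_orders", "record_id": "w-02", "customer": "c1", "amount": 25.5},
    {"table": "web_orders", "record_id": "w-09", "customer": "c2", "amount": 12.0},
]


def lifetime_spend(customer, rows):
    """Sum a customer's orders and record which rows contributed."""
    used = [r for r in rows if r["customer"] == customer]
    return Attribute(
        name="lifetime_spend",
        value=sum(r["amount"] for r in used),
        derived_from=tuple(SourceRef(r["table"], r["record_id"]) for r in used),
    )


attr = lifetime_spend("c1", orders)
print(attr.value, [ref.record_id for ref in attr.derived_from])
```

Because the attribute is computed deterministically from the listed rows, anyone auditing a model's decision can recompute the value from `derived_from` and confirm it matches.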
Amperity's own agentic assistants are a useful example. The Identity Resolution Assistant continuously refines match accuracy across contextual identity graphs and surfaces decisions your team can see, explain, and adjust. The Customer Data Assistant translates plain-language business questions into segments and journeys, no SQL required. Both are AI workloads that depend on the same customer intelligence layer they help produce.
When those assistants run inside your lakehouse alongside the rest of your AI stack, they share governance, lineage, and freshness with the models and pipelines around them. A downstream agent that needs to act on a high-value customer profile is not waiting for a sync. A model that needs to explain a prediction can trace the lineage without crossing a vendor boundary. That is what running the customer intelligence layer natively inside the lakehouse delivers, and what BYOC is built to make possible at enterprise scale.
The market is shifting. Customer intelligence is no longer a separate silo bolted onto the marketing stack. It is evolving into what it always should have been: a governed, high-performance workload running natively inside the enterprise data infrastructure organizations already own and trust. That is the shift BYOC is built for, and the direction Amperity is being built around.
The future of customer data infrastructure belongs to platforms that respect data ownership, embrace architectural flexibility, and deliver true operational optionality. That is the architecture we are building with BYOC and BYOM.
BYOC is now in preview for Snowflake and Databricks. If you're attending Snowflake Summit or the Databricks Data + AI Summit in June 2026, we'll be running live demos at both. To see how Amperity runs natively on the modern lakehouse, visit our lakehouse solutions page.
