Lakehouse-native CDP: Why we built Amperity BYOC

Q: What is Amperity BYOC?

Amperity Bring Your Own Compute (BYOC) is an architectural deployment option that runs Amperity's core processing workloads (Identity Resolution, customer data table generation, queries, segments, campaigns, and journeys) directly on your Databricks compute. Your customer data never leaves your environment. Only scoped job results surface in the Amperity interface.

Q: How is BYOC different from Amperity Bridge and BYOS?

The three deployment options work together. Amperity Bridge enables native read and write access against your lakehouse. Bring Your Own Storage (BYOS) keeps your customer data in your own S3 or Azure storage. BYOC runs Amperity's processing workloads on your Databricks compute, completing the architecture where intelligence comes to your data.

Q: Does Amperity BYOC work with Databricks?

Yes. BYOC is in preview for Databricks. Amperity workloads run natively on your existing compute environment. This means workloads draw down the Databricks Units your organization has pre-purchased.

Q: Does customer data leave my environment with Amperity BYOC?

No. Sensitive customer data remains inside your Databricks environment for processing. Security teams evaluate a narrower data path, and only scoped job results surface back to the Amperity interface. Your existing governance policies, audit trail, and access controls apply throughout.

Q: How does Amperity compare to building Identity Resolution in-house?

In-house Identity Resolution at petabyte scale requires sustained investment in machine learning, data engineering, and governance tooling. Most enterprise teams that have attempted it can describe the year the project started but not a finish line. One multinational automotive manufacturer implemented Amperity 72% faster than their previous failed in-house attempt.

Q: What does "Your data is your data" mean in practice?

It means Amperity does not lock your data into a proprietary format or vendor-controlled silo. Your underlying data foundation, resolved identities, and customer attributes reside inside your own cloud storage and lakehouse, preserving complete data ownership and long-term architectural optionality.

A data engineering leader at a multinational automotive manufacturer put it to me directly last quarter: "We have spent four years and tens of millions of dollars building the data foundation. We still can't tell you, with confidence, how many unique customers we have." Every AI and analytics initiative the business asks them to deliver inherits this foundational gap. When models are trained on fragmented identities and inconsistent customer context, they yield confident but fundamentally wrong answers about who to target, what to offer, and when to act. The infrastructure investment may be substantial, but the customer context layer underneath it is missing. In an AI-driven ecosystem, the cost of this missing layer has become impossible to ignore.

To solve this, we built Bring Your Own Compute (BYOC). Rather than forcing data to migrate to a third-party platform, BYOC allows enterprises to run heavy customer intelligence workloads as governed processes directly within their own Databricks environments.

This represents a deliberate architectural shift away from monolithic data silos and toward a composable, lakehouse-native ecosystem.

The hardest problem in customer data isn't where it lives. It's knowing who it belongs to.

Identity Resolution at petabyte scale is not a storage problem and not an orchestration problem. It is a machine learning (ML) problem with a specific shape.

It combines deterministic matching, probabilistic matching, and transitive logic operating across messy first-party personally identifiable information (PII), and it has to adapt every time a customer shares a new email, changes a phone number, or buys on a new device.
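To make the three ingredients concrete, here is a minimal sketch in Python. It is a toy illustration under stated assumptions, not Amperity's algorithm: the record layout, the name-similarity score, and the 0.9 threshold are all hypothetical. Deterministic rules link records that share an exact email or phone, a probabilistic score links near-identical names, and a union-find structure supplies the transitive step.

```python
from difflib import SequenceMatcher

# Hypothetical records for illustration only; not Amperity's data model.
RECORDS = [
    {"id": "r1", "name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100"},
    {"id": "r2", "name": "Jane Doe", "email": "jdoe@example.org", "phone": "555-0100"},
    {"id": "r3", "name": "J. Doe", "email": "jdoe@example.org", "phone": None},
    {"id": "r4", "name": "Jon Smith", "email": "jon@example.com", "phone": "555-0199"},
    {"id": "r5", "name": "John Smith", "email": "john.smith@example.net", "phone": None},
]


def deterministic_match(a, b):
    """Exact agreement on a strong identifier (email or phone)."""
    return any(a[k] is not None and a[k] == b[k] for k in ("email", "phone"))


def probabilistic_score(a, b):
    """Toy similarity in [0, 1] based on name alone (an assumption)."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(records, threshold=0.9):
    """Group record ids into identities using union-find for transitivity."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # Pairwise comparison is quadratic; fine for a sketch, not for scale.
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if deterministic_match(a, b) or probabilistic_score(a, b) >= threshold:
                union(a["id"], b["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


identities = resolve(RECORDS)
print(len(identities), "unique customers:", identities)
```

In this sketch, r1 and r3 share no identifier and their names score below the threshold, so they end up in the same identity only because both link to r2. That transitive step is what turns pairwise matches into an answer to "how many unique customers do we have?"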

Databricks is excellent at running ML workloads, but it wasn't designed to solve that particular problem on its own.

Two categories of vendors have tried to solve it with shortcuts.

The activation-first approach, championed by Hightouch and the broader reverse-ETL category, has done useful work moving data efficiently between warehouses and downstream tools. Its limitation is that it assumes customer identity is already solved. When that foundation is incomplete, activation just moves imperfect data faster.

Most customer data platforms (CDPs) were originally architected for an era when customer data lived primarily outside the warehouse. The vendor's environment was the natural place to run customer intelligence workloads, and for many enterprises, it still is. What's changed is that enterprise data has migrated into the lakehouse. BYOC reflects that shift, giving teams the option to run customer intelligence workloads inside the same environment as the rest of their data and AI stack.

Another option is to build Identity Resolution in-house. Plenty of enterprises have tried. Most of the data leaders I've talked to can tell you the year the project started. None of them are eager to talk about a finish line. The manufacturer I mentioned at the top of this post spent 18 months on a prior implementation effort before partnering with us. Once we stepped in, we helped get the project across the finish line, accelerating implementation by 72% compared to the previous approach.

Should the intelligence come to the data, or the data go to the intelligence?

Every step in our architecture has been an answer to that question. The answer at enterprise scale has been clear for years. Each step has moved Amperity closer to the lakehouse, not further from it.

We started with Amperity Bridge in 2024. Bridge gave us native read and write access so customer data could stay in the lakehouse while Amperity processed it. The replication step previous platforms required largely went away. Bridge became one of the fastest-adopted features in our history.

Bring Your Own Storage (BYOS) came next. Your customer data lives in your own S3 or Azure storage rather than ours, with delegated credentials and your governance policies in force. The duplicate storage footprint went away too.

With BYOC, customer intelligence runs directly inside your governed environment. Processing executes on your Databricks cluster. Your customer data never leaves the perimeter your security team already approved. The only data that surfaces in the Amperity interface is scoped job results.
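One way to picture "only scoped job results surface" is a job whose return value carries status and counts but no row-level data. The Python sketch below is purely illustrative: the `ScopedJobResult` fields and the `run_segment_job` function are hypothetical names, and the exact contents of a scoped result in BYOC are not specified here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScopedJobResult:
    """Hypothetical summary returned to a control plane; no row-level data."""
    job_name: str
    status: str
    rows_processed: int
    unique_profiles: int


def run_segment_job(job_name, rows):
    """Process rows locally and return only aggregate metadata."""
    unique_profiles = {row["profile_id"] for row in rows}
    return ScopedJobResult(
        job_name=job_name,
        status="succeeded",
        rows_processed=len(rows),
        unique_profiles=len(unique_profiles),
    )


# Hypothetical rows that stay inside the customer's environment.
ROWS = [
    {"profile_id": "p1", "email": "jane@example.com"},
    {"profile_id": "p1", "email": "jdoe@example.org"},
    {"profile_id": "p2", "email": "jon@example.com"},
]

result = run_segment_job("loyalty_segment", ROWS)
print(asdict(result))
```

The design point is that the boundary is enforced by the shape of the return type: a security reviewer can read one small type definition to see what crosses the perimeter.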

We also recognize that enterprise data science teams have unique, domain-specific requirements. So in the near future, we will expand the stack by introducing Bring Your Own Model (BYOM) support. With BYOM, brands will be able to bring their own proprietary ML models directly into the Amperity workflow running inside their lakehouse, on top of the high-fidelity identity graph generated by Amperity.

Why running on your compute changes the math

Running customer intelligence inside your own compute cluster fundamentally alters the deployment timeline and economics of data projects:

Zero-Trust Security Alignment: Because data at rest stays at rest, there is no new third-party data movement path for your Chief Information Security Officer (CISO) to vet. Your existing governance policies, audit trails, and RBAC frameworks remain active. For highly regulated sectors like healthcare and financial services, and for enterprises operating under frameworks like GDPR, CCPA, or HIPAA, this can eliminate quarters of security reviews.
Streamlined Data Lineage: Processing data where it sits removes the cross-environment syncs that bottleneck petabyte-scale Identity Resolution. The same engineering team that runs your lakehouse runs your customer intelligence workloads, on the same cluster, with the same tooling. For many enterprises, this can compress the path from raw data to trusted segments from quarters to weeks.
Optimized Cloud Spend: The economics work differently as well. With BYOC, selected Amperity workloads run in your Databricks environment, so you can leverage existing lakehouse investments and align customer intelligence workloads with your broader data architecture, governance model, and compute strategy. Those workloads draw down pre-committed cloud spend (like Databricks Units) directly.

What this means for AI

Most conversations I'm having with chief marketing officers (CMOs) and chief data officers (CDOs) right now circle the same question: how do we get from AI ambition to production-grade deployment? The limiting factor is rarely the model orchestration layer; it’s the quality of the customer data feeding it.

The cost of leaving that question unanswered is no longer theoretical. An AI agent trained on duplicate profiles will recommend a win-back offer to a customer who is already loyal. A model fed inconsistent identity data will personalize the wrong product to the wrong household. AI doesn't make those mistakes by accident. It makes them faster, more confidently, and at greater scale than any team of marketers could on their own. Worse, when those decisions need to be explained or audited, the data lineage often breaks the moment customer profiles leave the lakehouse environment where the model lives.

AI-ready customer data requires strict technical guarantees:

Resolved Identity: Deterministic and probabilistic links must be calculated across every disparate channel.
Explainable Attributes: Profile traits must have deterministic, auditable lineage back to the raw source data.
Co-location: The customer context layer must sit natively alongside the vector stores, semantic layers, and LLM runtimes powering your AI stack.
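As a rough illustration of the "Explainable Attributes" requirement, the Python sketch below computes a profile trait and records which source rows produced it. The attribute name, the order layout, and the lineage format are assumptions chosen for the example, not a description of how Amperity stores lineage.

```python
def lifetime_spend_with_lineage(profile_id, orders):
    """Compute a trait and keep a pointer to every contributing source row."""
    contributing = [o for o in orders if o["profile_id"] == profile_id]
    return {
        "profile_id": profile_id,
        "attribute": "lifetime_spend_cents",
        # Integer cents avoid floating-point rounding in the total.
        "value": sum(o["amount_cents"] for o in contributing),
        "lineage": [(o["source"], o["order_id"]) for o in contributing],
    }


# Hypothetical orders, already tagged with a resolved profile id.
ORDERS = [
    {"profile_id": "p1", "source": "pos", "order_id": "A-1", "amount_cents": 4500},
    {"profile_id": "p1", "source": "ecommerce", "order_id": "B-7", "amount_cents": 12050},
    {"profile_id": "p2", "source": "ecommerce", "order_id": "B-9", "amount_cents": 999},
]

attribute = lifetime_spend_with_lineage("p1", ORDERS)
print(attribute)
```

Because the lineage travels with the value, a downstream model or auditor can trace the trait back to raw rows without leaving the environment where those rows live.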

Amperity's own agentic assistants are a useful example. The Identity Resolution Assistant continuously refines match accuracy across contextual identity graphs and surfaces decisions your team can see, explain, and adjust. The Customer Data Assistant translates plain-language business questions into segments and journeys, no SQL required. Both are AI workloads that depend on the same customer intelligence layer they help produce.

When those assistants run inside your lakehouse alongside the rest of your AI stack, they share governance, lineage, and freshness with the models and pipelines around them. A downstream agent that needs to act on a high-value customer profile is not waiting for a sync. A model that needs to explain a prediction can trace the lineage without crossing a vendor boundary. That is what running the customer intelligence layer natively inside the lakehouse delivers, and what BYOC is built to make possible at enterprise scale.

The market is shifting. Customer intelligence is no longer a separate silo bolted onto the marketing stack. It is evolving into what it always should have been: a governed, high-performance workload running natively inside the enterprise data infrastructure organizations already own and trust. That is the shift BYOC is built for, and the direction Amperity is being built around.

The future of customer data infrastructure belongs to platforms that respect data ownership, embrace architectural flexibility, and deliver true operational optionality. That is the architecture we are building with BYOC and BYOM.

BYOC is now in preview for Databricks. If you're attending Databricks AI Summit in June 2026, we'll be running live demos there. To see how Amperity runs natively on the modern lakehouse, visit our lakehouse solutions page.
