An identity graph is a data structure that maps a person's fragmented identifiers, like emails, device IDs, loyalty accounts, and transaction records, to the real human behind them. It connects records across systems so an enterprise can recognize the same customer across channels, devices, and sessions.
Key takeaways
An identity graph is the connection layer beneath your customer profile, not the profile itself. It maps which records belong to which person, and how confident the system is about each connection.
Production identity graphs combine three matching methods: deterministic, probabilistic, and transitive. Each handles a different kind of evidence, and accurate enterprise identity requires all three working together.
The biggest architecture decision is first-party vs. third-party. Third-party graphs extend reach beyond your customer base; first-party graphs give you accuracy, ownership, and durability as privacy regulation and platform changes reshape the identity landscape.
What an identity graph is
Every enterprise has the same customer recorded in different systems. The same person shows up as one record in your e-commerce platform, another in your loyalty program, another in your call center logs, and several more in your marketing tools. Each system uses its own identifier and its own matching logic. None of them agree.
An identity graph is the layer that connects those records. Technically, it's a graph in the mathematical sense: nodes are identifiers, and edges are confirmed or inferred connections between them. The graph isn't the customer profile. It's the connection map that makes an accurate profile possible.
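To make the nodes-and-edges description concrete, here is a minimal Python sketch. The class names (`Edge`, `IdentityGraph`), the `confidence` and `method` fields, and the identifier strings are all illustrative assumptions, not the schema of any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A connection between two identifiers."""
    a: str
    b: str
    confidence: float  # 1.0 = confirmed, lower = inferred (assumed scale)
    method: str        # e.g. "deterministic" or "probabilistic"


@dataclass
class IdentityGraph:
    nodes: set = field(default_factory=set)    # identifiers
    edges: list = field(default_factory=list)  # connections between them

    def connect(self, a: str, b: str, confidence: float, method: str) -> None:
        """Add both identifiers as nodes and record the edge between them."""
        self.nodes.update((a, b))
        self.edges.append(Edge(a, b, confidence, method))

    def neighbors(self, node: str) -> dict:
        """Identifiers directly linked to `node`, with the edge confidence."""
        out = {}
        for e in self.edges:
            if e.a == node:
                out[e.b] = e.confidence
            elif e.b == node:
                out[e.a] = e.confidence
        return out


# Hypothetical identifiers for one customer.
graph = IdentityGraph()
graph.connect("email:ana@example.com", "loyalty:L-1001", 1.0, "deterministic")
graph.connect("loyalty:L-1001", "device:abc123", 0.82, "probabilistic")
```

Storing a confidence and a method on every edge is the design choice that matters here: it lets downstream consumers decide which connections they are willing to trust.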
The profile is what your teams read. The graph is what determines whether the profile is correct.
How identity graphs work
Three matching methods do most of the work. Most production identity graphs use all three together.
Deterministic matching
Deterministic matching joins records on identifiers that are known to belong to the same person: a hashed email, a logged-in user ID, a loyalty account number, a hashed phone number. Confidence is high. The match is either exact or it isn't.
The limitation is recall. Deterministic matching only sees customers who provide a consistent, persistent identifier across every system. Most customers don't. A customer who buys in-store with a credit card, browses anonymously on the website, opens an email on their phone, and calls support from a different number leaves four different records with no exact join key between them.
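A rough Python sketch of deterministic matching on a hashed email follows. The record layout, the normalization rule (trim and lowercase), and the function names are assumptions for illustration. Note how the record with no email falls out of the join entirely, which is the recall limitation described above.

```python
import hashlib
from collections import defaultdict


def hash_email(email: str) -> str:
    """Normalize, then hash, so the same address always yields the same key."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical records from three source systems.
records = [
    {"id": "ecom-1", "source": "ecommerce", "email": "Ana@Example.com"},
    {"id": "loy-7", "source": "loyalty", "email": "ana@example.com "},
    {"id": "sup-3", "source": "support", "email": None},
    {"id": "ecom-2", "source": "ecommerce", "email": "ben@example.com"},
]


def deterministic_groups(records):
    """Group record IDs by exact hashed-email match; collect the rest."""
    groups = defaultdict(list)
    unmatched = []
    for r in records:
        if r["email"]:
            groups[hash_email(r["email"])].append(r["id"])
        else:
            unmatched.append(r["id"])  # no join key, so no exact match is possible
    return list(groups.values()), unmatched


groups, unmatched = deterministic_groups(records)
```

Here `groups` holds the exact matches and `unmatched` holds the records deterministic logic cannot place.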
Probabilistic matching
Probabilistic matching uses machine learning to evaluate whether two records likely belong to the same person, based on behavioral patterns, demographic signals, name and address variants, device fingerprinting, and timing.
The trade-off is recall versus precision. Probabilistic matching catches connections that deterministic matching misses. It also produces false positives if it's poorly tuned. Enterprise graphs that lean too hard on probabilistic methods without governance create downstream problems: wrong-recipient emails, misattributed purchases, and audience inflation that doesn't survive measurement.
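The sketch below illustrates the precision and recall trade-off with a deliberately simple scorer. It is a stand-in, not a machine learning model: it uses string similarity from Python's standard `difflib` module, and the weights, threshold, and sample records are all assumed values chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] after basic normalization."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


# Assumed weights; a production system would learn these from labeled pairs.
WEIGHTS = {"name": 0.5, "address": 0.3, "zip": 0.2}


def match_score(r1: dict, r2: dict) -> float:
    """Weighted evidence that two records describe the same person."""
    return (
        WEIGHTS["name"] * similarity(r1["name"], r2["name"])
        + WEIGHTS["address"] * similarity(r1["address"], r2["address"])
        + WEIGHTS["zip"] * (1.0 if r1["zip"] == r2["zip"] else 0.0)
    )


def is_match(r1: dict, r2: dict, threshold: float = 0.85) -> bool:
    """Lower thresholds raise recall; higher thresholds raise precision."""
    return match_score(r1, r2) >= threshold


a = {"name": "Katherine Olsen", "address": "12 Elm Street", "zip": "98101"}
b = {"name": "Katherine Olson", "address": "12 Elm St", "zip": "98101"}
c = {"name": "Marcus Reyes", "address": "400 Pine Avenue", "zip": "10001"}
```

With the default threshold, `a` and `b` are treated as the same person and `a` and `c` are not. Raising the threshold to 0.95 rejects the `a`/`b` pair too, trading recall for precision.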
Transitive matching
Transitive matching connects records through chains. If record A matches record B with high confidence, and record B matches record C with high confidence, the graph can infer A and C belong to the same person, even when no direct match between them exists.
Most identity graph explainers skip this step. In practice, transitive matching is what turns a graph from a static join table into a connected customer view. It's also where bad logic does the most damage: one weak link in the chain can collapse hundreds of records into a single incorrect cluster.
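One common way to implement this kind of chaining is a union-find (disjoint-set) structure over edges that clear a confidence threshold. The sketch below assumes that approach; the function name, threshold values, and sample edges are hypothetical. It also shows the failure mode: lowering the threshold lets one weak link pull an unrelated record into the cluster.

```python
def clusters(edges, min_confidence):
    """Group nodes connected by chains of edges at or above min_confidence."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    for a, b, conf in edges:
        # Register both nodes even if the edge is too weak to merge them.
        root_a, root_b = find(a), find(b)
        if conf >= min_confidence and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(g) for g in groups.values())


# A-B and B-C are strong links; C-D is weak.
edges = [("A", "B", 0.97), ("B", "C", 0.95), ("C", "D", 0.55)]

strict = clusters(edges, min_confidence=0.90)  # A, B, C together; D alone
loose = clusters(edges, min_confidence=0.50)   # the weak link merges D in
```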
Identity graph vs. Identity Resolution
The two terms get used interchangeably, but they describe different things.
Identity Resolution is the process of running matching logic against customer data and deciding which records belong together. It's an active workload. It runs on a schedule, ingests new data, and produces output.
An identity graph is the output of that process. It's the structured representation of which records map to which person, and how confident the system is about each connection.
One resolution process can produce more than one graph. The same underlying matching logic, tuned differently, can generate a probabilistic-biased graph optimized for marketing reach and a deterministic-biased graph optimized for compliance traceability. Most identity tooling assumes a single graph and doesn't surface this option, which has implications for how teams should think about the layer underneath their customer data.
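As a sketch of that idea, the Python below takes one set of scored pairs (the resolution output) and derives two graphs from it by applying different tolerances. The profile names, thresholds, and record IDs are assumptions for illustration only.

```python
# Output of a single resolution run: (record, record, confidence, method).
scored_pairs = [
    ("rec-1", "rec-2", 1.00, "deterministic"),
    ("rec-2", "rec-3", 0.88, "probabilistic"),
    ("rec-3", "rec-4", 0.71, "probabilistic"),
]

# Hypothetical tuning profiles, one per consuming team.
PROFILES = {
    "marketing_reach": {
        "min_confidence": 0.70,
        "methods": {"deterministic", "probabilistic"},
    },
    "compliance": {
        "min_confidence": 1.00,
        "methods": {"deterministic"},
    },
}


def build_graph(pairs, profile):
    """Keep only the edges this profile is willing to trust."""
    return [
        (a, b)
        for a, b, conf, method in pairs
        if conf >= profile["min_confidence"] and method in profile["methods"]
    ]


graphs = {name: build_graph(scored_pairs, p) for name, p in PROFILES.items()}
```

The marketing profile keeps all three edges, while the compliance profile keeps only the exact match. Both come from the same scored input.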
First-party vs. third-party identity graphs
Two architectures exist, and they answer different questions.
A third-party identity graph is built and maintained by an external provider. These companies aggregate identifiers across the open web, license them to brands, and provide a syndicated graph that brands query for cross-device or cross-publisher matching. The advantage is reach beyond your own customers. The disadvantage is that you don't own the underlying logic, the data isn't yours, and the durability of the graph depends on the provider's data partnerships and the regulatory environment around third-party identifiers.
A first-party identity graph is built from a brand's own customer records: transactions, loyalty data, support interactions, web and app behavior, email engagement. The brand controls the matching logic, owns the resolved output, and stores the graph inside its own data infrastructure. The trade-off is reach. A first-party graph only knows the customers a brand has direct relationships with. For enterprise brands with significant customer bases, that limitation is increasingly acceptable, because the customers who matter are the ones already in the brand's data.
The market is shifting toward first-party graphs as third-party signals continue to degrade under privacy regulations, browser changes, and platform restrictions.
What identity graphs are used for
The use cases vary, but they share a common requirement: the graph has to be right.
Cross-channel personalization
Recognizing the same customer across email, web, mobile, in-store, and paid channels. Without a graph underneath, every channel reverts to its own definition of the customer, and personalization defaults to lowest-common-denominator messaging.
Measurement and attribution
Connecting touchpoints across the customer journey to outcomes. Without Identity Resolution, marketing measurement counts the same customer multiple times, undercounts cross-channel paths, and produces attribution numbers that no team trusts.
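A small worked example shows how the double counting distorts a metric. The purchase amounts, record IDs, and the `record_to_person` mapping below are made up for illustration; the mapping stands in for whatever a resolved graph would supply.

```python
# Hypothetical purchases keyed by source record ID.
purchases = [("ecom-1", 40.0), ("loy-7", 25.0), ("app-4", 10.0), ("ecom-2", 60.0)]

# Assumed output of Identity Resolution: three records are one person.
record_to_person = {"ecom-1": "P1", "loy-7": "P1", "app-4": "P1", "ecom-2": "P2"}


def revenue_per_customer(purchases, resolve=None):
    """Average revenue per customer, optionally resolving records to people."""
    totals = {}
    for record_id, amount in purchases:
        customer = resolve[record_id] if resolve else record_id
        totals[customer] = totals.get(customer, 0.0) + amount
    return len(totals), sum(totals.values()) / len(totals)


unresolved = revenue_per_customer(purchases)                  # 4 customers, 33.75 each
resolved = revenue_per_customer(purchases, record_to_person)  # 2 customers, 67.5 each
```

Total revenue is the same in both cases. Only the customer count changes, and with it every per-customer metric built on top.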
AI model training and customer-facing AI
Training data quality is the largest determinant of model quality. Models trained on fragmented customer records learn from fragmented examples and produce predictions that reflect the fragmentation. AI agents serving customers in real time need a consistent identity to retrieve the right context for each interaction.
Compliance, consent, and governance
Honoring consent, processing deletion requests, and proving lineage all depend on knowing which records belong to which person. A regulator doesn't accept "we have multiple definitions of this customer" as an answer to a deletion audit.
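For a deletion request, the graph's job is to expand one known record into every record that belongs to the same person. The sketch below assumes a simple mapping from (system, record ID) to a resolved person ID; the system names, IDs, and the `deletion_plan` function are hypothetical.

```python
from collections import defaultdict

# Assumed resolved output: every source record mapped to a person ID.
record_to_person = {
    ("ecommerce", "ecom-1"): "P1",
    ("loyalty", "loy-7"): "P1",
    ("support", "sup-3"): "P1",
    ("ecommerce", "ecom-2"): "P2",
}


def deletion_plan(record_to_person, requested):
    """List, per system, every record belonging to the requester."""
    person = record_to_person.get(requested)
    if person is None:
        return {}  # unknown record: nothing can be traced from it
    plan = defaultdict(list)
    for (system, record_id), mapped_person in record_to_person.items():
        if mapped_person == person:
            plan[system].append(record_id)
    return dict(plan)


# A request that arrives through support still reaches the other systems.
plan = deletion_plan(record_to_person, ("support", "sup-3"))
```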
How to build an identity graph
Two paths, and the right one depends on what the brand is trying to solve.
Buying a third-party graph
Working with a third-party provider is appropriate when the use case requires reach beyond your owned customer base: paid media targeting, prospect enrichment, cross-publisher attribution. The graph is hosted and maintained by the provider, and the brand queries it as a service.
Building a first-party graph in your warehouse
Building inside the brand's data infrastructure is appropriate when the use case requires accuracy on known customers: personalization, lifecycle marketing, compliance, AI training, and customer service. This path requires Identity Resolution capability that runs against the brand's source data, produces a stable customer identifier, and outputs the graph in a form the warehouse and downstream tools can use.
Amperity is one of the platforms enterprise brands use for this path. The resolved output lives inside the brand's Snowflake, Databricks, or BigQuery environment, with no data movement required.
The limits of a single identity graph
For most of this article, we've treated "the identity graph" as singular. In practice, that's the assumption that breaks first in enterprise environments.
Marketing teams need a graph tuned for reach. Compliance teams need a graph tuned for traceability. These aren't preferences. They're different match tolerances applied to the same customer data, and a single graph forces one team's tolerance to win at the expense of the other's.
The most mature enterprise identity strategies have moved past the assumption that a customer needs one definition. They build a shared identity foundation from first-party data, then operate multiple purpose-built graphs on top of it, each tuned for the decisions a specific team has to make. The foundation stays consistent. The graphs above it serve the work.
See what Identity Resolution looks like on your data
Identity Resolution inside your data estate needs specificity and accuracy to be useful.
The Amperity Data Diagnostic uses your actual customer data to show how your records get matched or misidentified, where identity fragmentation is hiding business cost, and how that fragmentation distorts metrics like customer lifetime value, retention, and segment performance. Results in 48 hours.
