What Is an Identity Graph? A Complete Guide

An identity graph is a data structure that maps a person's fragmented identifiers, like emails, device IDs, loyalty accounts, and transaction records, to the real human behind them. It connects records across systems so an enterprise can recognize the same customer across channels, devices, and sessions.

Key takeaways

An identity graph is the connection layer beneath your customer profile, not the profile itself. It maps which records belong to which person, and how confident the system is about each connection.
Production identity graphs combine three matching methods: deterministic, probabilistic, and transitive. Each handles a different kind of evidence, and accurate enterprise identity requires all three working together.
The biggest architecture decision is first-party vs. third-party. Third-party graphs extend reach beyond your customer base; first-party graphs give you accuracy, ownership, and durability as privacy regulation and platform changes reshape the identity landscape.

What an identity graph is

Every enterprise has the same customer recorded in different systems. The same person shows up as one record in your e-commerce platform, another in your loyalty program, another in your call center logs, and several more in your marketing tools. Each system uses its own identifier and its own matching logic. None of them agree.

An identity graph is the layer that connects those records. Technically, it's a graph in the mathematical sense: nodes are identifiers, and edges are confirmed or inferred connections between them. The graph isn't the customer profile; it's the connection map that makes building an accurate profile possible.

The profile is what your teams read. The graph is what determines whether the profile is correct.
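
The nodes-and-edges description can be made concrete with a small sketch. This is an illustration only, not any vendor's schema: the class names (Identifier, Edge, IdentityGraph), the field names, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Identifier:
    """A node: one identifier as it appears in one source system."""
    kind: str   # e.g. "email", "loyalty_id", "device_id"
    value: str


@dataclass(frozen=True)
class Edge:
    """A confirmed or inferred connection between two identifiers."""
    a: Identifier
    b: Identifier
    method: str        # "deterministic", "probabilistic", or "transitive"
    confidence: float  # 0.0 to 1.0


class IdentityGraph:
    def __init__(self):
        self.adjacency = defaultdict(list)

    def add_edge(self, edge: Edge) -> None:
        # Store the edge under both endpoints so lookups work from either side.
        self.adjacency[edge.a].append(edge)
        self.adjacency[edge.b].append(edge)

    def neighbors(self, node: Identifier):
        """Yield (other identifier, confidence) for each edge touching node."""
        for edge in self.adjacency.get(node, []):
            other = edge.b if edge.a == node else edge.a
            yield other, edge.confidence


# Made-up identifiers for one person.
email = Identifier("email", "ana@example.com")
loyalty = Identifier("loyalty_id", "L-1001")
device = Identifier("device_id", "d-77f")

graph = IdentityGraph()
graph.add_edge(Edge(email, loyalty, "deterministic", 1.0))
graph.add_edge(Edge(email, device, "probabilistic", 0.82))

linked = dict(graph.neighbors(email))
```

Note that nothing here stores a name or a purchase history. Those attributes live in the profile; the graph only records which identifiers are connected and how strongly.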

How identity graphs work

Three matching methods do most of the work. Most production identity graphs use all three together.

Deterministic matching

Deterministic matching joins records on identifiers that are known to belong to the same person: a hashed email, a logged-in user ID, a loyalty account number, a hashed phone. Confidence is high. The match is either exact or it isn't.

The limitation is recall. Deterministic matching only sees customers who provide a consistent, persistent identifier across every system. Most customers don't. A customer who buys in-store with a credit card, browses anonymously on the website, opens an email on their phone, and calls support from a different number leaves four different records with no exact join key between them.
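
A minimal sketch of a deterministic join on hashed email shows both the strength and the recall gap. The records, the record IDs, and the normalization rule (trim and lowercase before hashing) are assumptions for illustration; real pipelines typically apply more normalization than this.

```python
import hashlib
from collections import defaultdict


def hashed_email(raw: str) -> str:
    """Normalize, then hash, so trivial formatting differences don't block a match."""
    normalized = raw.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical records from three source systems.
records = [
    {"id": "ecom-1", "email": "Ana@Example.com"},
    {"id": "loyalty-9", "email": " ana@example.com "},
    {"id": "support-4", "email": None},  # caller gave only a phone number
    {"id": "ecom-2", "email": "ben@example.com"},
]

groups = defaultdict(list)
unmatched = []
for record in records:
    if record["email"]:
        groups[hashed_email(record["email"])].append(record["id"])
    else:
        unmatched.append(record["id"])

# Only keys shared by two or more records count as a match.
matches = [ids for ids in groups.values() if len(ids) > 1]
```

Here matches contains one pair, ecom-1 and loyalty-9, and support-4 lands in unmatched. The support record may well belong to the same person, but with no shared key a deterministic join cannot see it.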

Probabilistic matching

Probabilistic matching uses machine learning to evaluate whether two records likely belong to the same person, based on behavioral patterns, demographic signals, name and address variants, device fingerprinting, and timing.

The trade-off is recall versus precision. Probabilistic matching catches connections that deterministic matching misses. It also produces false positives if it's poorly tuned. Enterprise graphs that lean too hard on probabilistic methods without governance create downstream problems: wrong-recipient emails, misattributed purchases, and audience inflation that doesn't survive measurement.
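
The precision and recall trade-off comes down to a score and a threshold. The sketch below is a deliberately simplified stand-in: it uses hand-picked weights and string similarity from Python's standard library rather than a trained machine learning model, and the field names, weights, and threshold are all assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """String similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Illustrative weights; a production system would learn these from labeled pairs.
WEIGHTS = {"name": 0.5, "address": 0.3, "zip": 0.2}


def match_score(r1: dict, r2: dict) -> float:
    """Weighted evidence that two records describe the same person."""
    score = WEIGHTS["name"] * similarity(r1["name"], r2["name"])
    score += WEIGHTS["address"] * similarity(r1["address"], r2["address"])
    score += WEIGHTS["zip"] * (1.0 if r1["zip"] == r2["zip"] else 0.0)
    return score


a = {"name": "Katherine Jones", "address": "12 Oak Street", "zip": "98101"}
b = {"name": "Kathy Jones", "address": "12 Oak St", "zip": "98101"}
c = {"name": "Robert Diaz", "address": "400 Pine Avenue", "zip": "10001"}

THRESHOLD = 0.75  # raising it trades recall for precision

same_person_ab = match_score(a, b) >= THRESHOLD
same_person_ac = match_score(a, c) >= THRESHOLD
```

With these values, a and b clear the threshold even though no field matches exactly except the ZIP code, while a and c do not. Lowering THRESHOLD would catch more true variants and also admit more false positives, which is the governance problem described above.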

Transitive matching

Transitive matching connects records through chains. If record A matches record B with high confidence, and record B matches record C with high confidence, the graph can infer A and C belong to the same person, even when no direct match between them exists.

Most identity graph explainers skip this step. In practice, transitive matching is what turns a graph from a static join table into a connected customer view. It's also where bad logic does the most damage: one weak link in the chain can collapse hundreds of records into a single incorrect cluster.
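
One common way to implement chaining is a union-find (disjoint set) structure over the pairwise matches. The sketch below is an assumption about how such a step could look, not a description of any specific product. The record labels, confidence values, and the min_confidence parameter are made up to show how a single weak link changes the result.

```python
class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the representative of x's cluster, registering x if new."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def clusters(pairs, min_confidence):
    """Merge records connected by any chain of sufficiently confident matches."""
    uf = UnionFind()
    for a, b, confidence in pairs:
        uf.find(a)  # register both records even if the pair is rejected
        uf.find(b)
        if confidence >= min_confidence:
            uf.union(a, b)
    groups = {}
    for record in list(uf.parent):
        groups.setdefault(uf.find(record), set()).add(record)
    return sorted(sorted(group) for group in groups.values())


pairs = [
    ("A", "B", 0.97),
    ("B", "C", 0.95),
    ("C", "D", 0.55),  # weak link, e.g. a shared household address
    ("D", "E", 0.96),
]

strict = clusters(pairs, min_confidence=0.90)
loose = clusters(pairs, min_confidence=0.50)
```

With the strict setting, the result is two clusters: A, B, C and D, E. A and C are joined even though no direct match between them exists. With the loose setting, the C to D link is accepted and all five records collapse into one cluster, which is the failure mode to watch for at scale.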

Identity graph vs. Identity Resolution

The two terms get used interchangeably, but they describe different things.

Identity Resolution is the process of running matching logic against customer data and deciding which records belong together. It's an active workload. It runs on a schedule, ingests new data, and produces output.

An identity graph is the output of that process. It's the structured representation of which records map to which person, and how confident the system is about each connection.

One resolution process can produce more than one graph. The same underlying matching logic, tuned differently, can generate a probabilistic-biased graph optimized for marketing reach and a deterministic-biased graph optimized for compliance traceability. Most identity tooling doesn't surface this possibility, and it has implications for how teams should think about the layer underneath their customer data.
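
To make the process versus output distinction concrete, here is a sketch in which one resolution run produces a list of scored pairs and two policies turn that list into two different graphs. The names ScoredPair and GraphPolicy, the policy names, and the thresholds are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredPair:
    left: str
    right: str
    method: str        # "deterministic" or "probabilistic"
    confidence: float


@dataclass(frozen=True)
class GraphPolicy:
    name: str
    allowed_methods: frozenset
    min_confidence: float

    def accepts(self, pair: ScoredPair) -> bool:
        return (pair.method in self.allowed_methods
                and pair.confidence >= self.min_confidence)


# Output of a single (hypothetical) resolution run.
scored_pairs = [
    ScoredPair("ecom-1", "loyalty-9", "deterministic", 1.0),
    ScoredPair("ecom-1", "web-visitor-33", "probabilistic", 0.81),
    ScoredPair("loyalty-9", "support-4", "probabilistic", 0.64),
]

marketing = GraphPolicy(
    "marketing_reach", frozenset({"deterministic", "probabilistic"}), 0.60)
compliance = GraphPolicy(
    "compliance_traceability", frozenset({"deterministic"}), 1.0)

# Same evidence, different tolerances, different graphs.
graphs = {
    policy.name: [(p.left, p.right) for p in scored_pairs if policy.accepts(p)]
    for policy in (marketing, compliance)
}
```

The marketing graph keeps all three connections; the compliance graph keeps only the exact match. Neither is wrong. They answer different questions from the same scored evidence.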

First-party vs. third-party identity graphs

Two architectures exist, and they answer different questions.

A third-party identity graph is built and maintained by an external provider. These companies aggregate identifiers across the open web, license them to brands, and provide a syndicated graph that brands query for cross-device or cross-publisher matching. The advantage is reach beyond your own customers. The disadvantage is that you don't own the underlying logic, the data isn't yours, and the durability of the graph depends on the provider's data partnerships and the regulatory environment around third-party identifiers.

A first-party identity graph is built from a brand's own customer records: transactions, loyalty data, support interactions, web and app behavior, email engagement. The brand controls the matching logic, owns the resolved output, and stores the graph inside its own data infrastructure. The trade-off is reach. A first-party graph only knows the customers a brand has direct relationships with. For enterprise brands with significant customer bases, that limitation is increasingly acceptable, because the customers who matter are the ones already in the brand's data.

The market is moving toward first-party graphs as third-party signal continues to degrade under privacy regulations, browser changes, and platform restrictions.

What identity graphs are used for

The use cases vary, but they share a common requirement: the graph has to be right.

Cross-channel personalization

Recognizing the same customer across email, web, mobile, in-store, and paid channels. Without a graph underneath, every channel reverts to its own definition of the customer, and personalization defaults to lowest-common-denominator messaging.

Measurement and attribution

Connecting touchpoints across the customer journey to outcomes. Without Identity Resolution, marketing measurement counts the same customer multiple times, undercounts cross-channel paths, and produces attribution numbers that no team trusts.
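
A small sketch shows the counting problem. It assumes the graph has already been reduced to a lookup from source record ID to a resolved person ID; the mapping, the record IDs, and the touchpoint fields are made up for illustration.

```python
# Hypothetical resolved output: source record ID -> person ID.
person_for_record = {
    "email-click-1": "person-1",
    "web-session-7": "person-1",
    "store-txn-3": "person-1",
    "web-session-8": "person-2",
}

touchpoints = [
    {"record": "email-click-1", "channel": "email"},
    {"record": "web-session-7", "channel": "web"},
    {"record": "store-txn-3", "channel": "store"},
    {"record": "web-session-8", "channel": "web"},
]

# Without resolution, every source record looks like a separate customer.
unresolved_customers = len({t["record"] for t in touchpoints})

# With resolution, records roll up to people.
resolved_customers = len({person_for_record[t["record"]] for t in touchpoints})

# Cross-channel paths only exist once touchpoints share a person ID.
paths = {}
for t in touchpoints:
    paths.setdefault(person_for_record[t["record"]], []).append(t["channel"])
```

The unresolved count is four customers; the resolved count is two. The email to web to store path for person-1 is invisible until the three records share an identifier.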

AI model training and customer-facing AI

Training data quality is a major determinant of model quality. Models trained on fragmented customer records learn from fragmented examples and produce predictions that reflect the fragmentation. AI agents serving customers in real time need a consistent identity to retrieve the right context for each interaction.

Compliance, consent, and governance

Honoring consent, processing deletion requests, and proving lineage all depend on knowing which records belong to which person. A regulator doesn't accept "we have multiple definitions of this customer" as an answer to a deletion audit.
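
As a sketch of why deletion depends on the graph, the function below takes the one identifier a requester provides and returns every record to delete, grouped by source system. The system names, record IDs, and both lookup tables are hypothetical, and a real workflow would also need identity verification and an audit log.

```python
from collections import defaultdict

# Hypothetical resolved output: (source system, record ID) -> person ID.
resolved = {
    ("ecommerce", "ecom-1"): "person-1",
    ("loyalty", "loyalty-9"): "person-1",
    ("support", "support-4"): "person-1",
    ("ecommerce", "ecom-2"): "person-2",
}

# Hypothetical index from a known identifier to a person ID.
person_for_email = {
    "ana@example.com": "person-1",
    "ben@example.com": "person-2",
}


def deletion_plan(email: str) -> dict:
    """Return the records to delete per source system for one requester."""
    person = person_for_email.get(email.strip().lower())
    plan = defaultdict(list)
    if person is None:
        return dict(plan)  # unknown requester: nothing to delete
    for (system, record_id), owner in resolved.items():
        if owner == person:
            plan[system].append(record_id)
    return dict(plan)


plan = deletion_plan("Ana@Example.com")
```

The requester supplied only an email address, yet the plan includes the support record that has no email on it. Without the graph, that record would survive the deletion.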

How to build an identity graph

Two paths, and the right one depends on what the brand is trying to solve.

Buying a third-party graph

Working with a third-party provider is appropriate when the use case requires reach beyond your owned customer base: paid media targeting, prospect enrichment, cross-publisher attribution. The graph is hosted and maintained by the provider, and the brand queries it as a service.

Building a first-party graph in your warehouse

Building inside the brand's data infrastructure is appropriate when the use case requires accuracy on known customers: personalization, lifecycle marketing, compliance, AI training, and customer service. This path requires Identity Resolution capability that runs against the brand's source data, produces a stable customer identifier, and outputs the graph in a form the warehouse and downstream tools can use.

Amperity is one of the platforms enterprise brands use for this path. The resolved output lives inside the brand's Snowflake, Databricks, or BigQuery environment, with no data movement required.

The limits of a single identity graph

For most of this article, we've treated "the identity graph" as singular. In practice, that's the assumption that breaks first in enterprise environments.

Marketing teams need a graph tuned for reach. Compliance teams need a graph tuned for traceability. These aren't preferences. They're different match tolerances applied to the same customer data, and a single graph forces one team's tolerance to win at the expense of the others.

The most mature enterprise identity strategies have moved past the assumption that a customer needs one definition. They build a shared identity foundation from first-party data, then operate multiple purpose-built graphs on top of it, each tuned for the decisions a specific team has to make. The foundation stays consistent. The graphs above it serve the work.

See what Identity Resolution looks like on your data

Identity Resolution inside your data estate needs specificity and accuracy to be useful.

The Amperity Data Diagnostic uses your actual customer data to show how your records get matched or misidentified, where identity fragmentation is hiding business cost, and how that fragmentation distorts metrics like customer lifetime value, retention, and segment performance. Results in 48 hours.

Request a free Data Diagnostic.

Identity Graph FAQs

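What's the difference between an identity graph and Identity Resolution?

Identity Resolution is the process; an identity graph is the output. Resolution is the active workload that ingests customer data and decides which records belong together. The graph is the structured representation of those decisions, showing which identifiers map to which person and how confident the system is about each connection.

Is an identity graph the same as a customer profile?

No. The customer profile is what your teams read. The identity graph is the layer underneath that determines which records get rolled up into a profile in the first place. A profile shows attributes like name, email, and purchase history. The graph defines the connections that make those attributes belong to the right person.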

What are the main use cases for an identity graph?

The four most common are cross-channel personalization (recognizing the same customer across email, web, mobile, and in-store), measurement and attribution (connecting touchpoints to outcomes accurately), AI model training and customer-facing AI (giving models a consistent identity to learn from and serve), and compliance, consent, and governance (honoring deletion requests and proving lineage at the customer level).

What's the difference between deterministic and probabilistic matching?

Deterministic matching joins records on exact shared identifiers like a hashed email, loyalty ID, or logged-in user ID. Confidence is high but recall is limited, because most customers don't provide consistent identifiers everywhere. Probabilistic matching uses machine learning to infer connections from behavioral, demographic, and contextual signals. It catches connections deterministic matching misses, but introduces false positives if it isn't carefully governed. Production identity graphs use both, plus transitive matching that chains confirmed connections.

Should I buy a third-party identity graph or build a first-party one?

It depends on what the use case requires. Third-party graphs from providers like LiveRamp or Acxiom extend reach beyond your owned customer base, which makes them useful for paid media targeting and prospect enrichment. First-party graphs are built from your own customer data, give you ownership and durability, and produce more accurate results for personalization, lifecycle marketing, compliance, and AI. The market is moving toward first-party as third-party signal degrades.

Do I need an identity graph if I already have a CDP?

Most CDPs include Identity Resolution capabilities of some kind, but the quality varies significantly. Some rely primarily on deterministic matching and lose accuracy at enterprise scale. Others use probabilistic and transitive methods but produce a single graph that every team has to share. If your CDP is producing a customer view all teams trust, you have an identity graph. If different teams keep maintaining their own identity logic outside the CDP, you have a partial solution.