
Decoding Identity Resolution, Part One: The Basics

May 25, 2022


Welcome to our blog series on decoding identity resolution. This nine-part series aims to offer a friendly, comprehensive view of how to think about identity resolution and how to interpret the way different companies across the tech landscape represent it in their marketing and sales materials. The other articles in the series can be found here:

Introduction

In the data landscape, the term “Identity Resolution” is often tossed around, and with good reason: identity resolution is integral to any enterprise-grade data management strategy, and doubly so when it comes to delivering high-quality customer experiences.

Done right, it makes possible a whole range of valuable use cases that revolve around making sense of chaotic customer data: it helps with website conversion by allowing brands to identify customers and retarget them; it lets companies know which customers have made transactions so they can contact them; and it can give a full history of a customer’s interactions with the brand, so that service reps can provide better care and marketers can craft more relevant and engaging experiences that speak to customers as individuals.

And the customers with the most disorderly data are often a company’s most valuable customers, because they interact with the brand most frequently and across the most channels. These are the high-value customers who bring in an outsized share of revenue, the ones you most want to give a personalized experience that makes them feel good about doing business with you.

But, as mentioned, the term gets tossed around, and we’ve found it isn’t always properly explained or understood. In this series we’ll define identity resolution and look at ways the market talks about it to help you better decipher the truth behind the marketing. 

What is identity resolution?

Identity resolution is the process of comparing different data points and deciding whether they represent the same “entity,” that is, the same person.

Another way to think of it is as “the product or combination of features that compare data to determine which person it is referring to.”

Concepts of identity

People are complex, and the ways they interact with brands are diverse. 

You are a person with a name. You live somewhere that has an address. You communicate using a variety of specific “channels,” each of which is identified by some sort of unique marker, such as an email address, phone number, or social media profile.

In general, though, the market thinks of identity in two major groups: personally identifiable information (PII) and digital identifiers.

Personally Identifiable Information (PII)

PII refers to the concepts we are most familiar with, for example:

  • Name

  • Physical address

  • Phone number

  • Email address

Companies often associate their own unique codes with a person as well, for example:

  • Loyalty numbers

  • Customer IDs

Digital identifiers

Most companies interact with customers via a website or a mobile app. These interactions produce “identifiers” that assume there’s a connection between a device and user. Digital identifiers are a logical link between an application and a person but are otherwise anonymous.

Common digital identifiers are:

  • Cookie (first-party vs. third-party): Information generated by a website and stored by your browser, to help understand users’ online behaviors. First-party cookies are directly created and stored by the website (or domain) when a user visits the site. Third-party cookies are created by a third-party domain via code loaded on the website, which tracks users and collects their data for a third party.

  • Mobile Advertising ID (MAID): Similar to a cookie but associated with the mobile device rather than the browser. MAIDs are used within the advertising and personalization ecosystems.

  • IP Address: The identifying number tied to an internet service; i.e. the network the application was accessed from.

  • Device signature / fingerprinting: A process used to identify a device or browser by determining which technology is installed. For example, what kind of device is being used? An iPhone 13, a Pixel 6 Pro, or a laptop browser? Unlike website cookies, which are stored in the browser, a device signature is derived from the device’s characteristics and can be collected and stored by a company directly.
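To make the fingerprinting idea concrete, here is a minimal sketch in Python. It assumes a hypothetical handful of attributes (`user_agent`, `screen`, `timezone`) and a made-up helper called `device_signature`; real fingerprinting relies on many more signals, so treat this as an illustration rather than a description of any particular product.

```python
import hashlib
import json


def device_signature(attributes: dict) -> str:
    """Derive a stable identifier from device/browser characteristics.

    The attribute names are hypothetical. Nothing is stored on the
    device: the signature is recomputed from what the device reports.
    """
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Two visits from the same (hypothetical) phone, attributes in different order.
visit_1 = {"user_agent": "Mozilla/5.0 (iPhone)", "screen": "390x844", "timezone": "UTC-8"}
visit_2 = {"timezone": "UTC-8", "screen": "390x844", "user_agent": "Mozilla/5.0 (iPhone)"}
# A visit from a different device.
visit_3 = {"user_agent": "Mozilla/5.0 (Linux; Android 12)", "screen": "412x892", "timezone": "UTC-8"}

same_device = device_signature(visit_1) == device_signature(visit_2)
different_device = device_signature(visit_1) != device_signature(visit_3)
```

Because the signature is computed from what the device reports, the company can store it on its own side, which is the contrast with cookies drawn above.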

Types of identity resolution

Definitions of identity resolution vary across the market, but often claims are made without giving too much supporting detail. Here are some of the major forms of identity resolution to familiarize yourself with so you know what providers are offering. 

Digital

Digital identity resolution is a process that compares digital signatures and creates a link when they match. Heavily used in online-focused platforms, the data collected doesn’t contain the concept of an actual person, but rather a “visitor.” The focus is on anonymous interactions from a website.

The idea is that at some point a person will provide PII that identifies themselves and the specific devices they use, giving companies a comprehensive view of the user’s previous activity on their web application.
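As a rough sketch of that idea, the Python below links earlier anonymous activity to a person once the same cookie shows up alongside an email address. The event fields, the cookie values, and the `stitch` function are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical event log: every event carries a digital identifier and,
# only once the visitor logs in, an email address.
events = [
    {"cookie": "c-123", "page": "/shoes", "email": None},
    {"cookie": "c-123", "page": "/cart", "email": None},
    {"cookie": "c-999", "page": "/home", "email": None},
    {"cookie": "c-123", "page": "/login", "email": "pat@example.com"},
]


def stitch(events):
    """Attach anonymous activity to a person once their cookie is identified."""
    # Step 1: learn which cookies have ever been seen together with PII.
    cookie_to_email = {e["cookie"]: e["email"] for e in events if e["email"]}

    # Step 2: group every event under the resolved person, or under the
    # anonymous visitor (the cookie) when no PII has been provided yet.
    history = defaultdict(list)
    for e in events:
        owner = cookie_to_email.get(e["cookie"], f"visitor:{e['cookie']}")
        history[owner].append(e["page"])
    return dict(history)


history = stitch(events)
```

Here the two page views that happened before login end up in the history for `pat@example.com`, while cookie `c-999` remains an anonymous visitor.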

For more details on this approach, check out part six of this series (coming soon!).

Deterministic (“Rules-Based”)

Rules-based identity resolution is one of the most common legacy methods of resolving identity. A program compares PII based on a configured set of rules (to use a simple example, “if the names are the same, then the records are a match”) and creates a link when it finds an explicit match.

This approach can lack nuance, however: anything that satisfies a rule creates a match and anything that does not stays separate, which can result in erroneous merges or duplicate profiles.

For example, most ecommerce or online platforms treat email addresses as unique identifiers because email is a reliable channel for sending receipts to the buyer. It is common for a person to have multiple email addresses, though, so this method inherently creates duplicate profiles. This makes any resulting analytics less accurate, because transactions will seem to be spread over more buyers than actually exist.
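The email example can be sketched in a few lines of Python. The transactions, the names, and the `resolve_by_email` function are invented for illustration; the single rule here is simply that two records match only when their normalized email addresses are equal.

```python
from collections import defaultdict

# Hypothetical transactions. The three "Pat Lee" rows are the same person
# using two different email addresses.
transactions = [
    {"name": "Pat Lee", "email": "pat@example.com", "amount": 40.0},
    {"name": "Pat Lee", "email": "Pat@Example.com ", "amount": 25.0},
    {"name": "Pat Lee", "email": "plee@work.example", "amount": 60.0},
    {"name": "Sam Roy", "email": "sam@example.com", "amount": 15.0},
]


def resolve_by_email(rows):
    """Rule: records are the same person only if normalized emails are equal."""
    profiles = defaultdict(list)
    for row in rows:
        key = row["email"].strip().lower()
        profiles[key].append(row)
    return profiles


profiles = resolve_by_email(transactions)
buyer_count = len(profiles)  # 3 profiles, although there are only 2 real people
spend_per_profile = {
    email: sum(row["amount"] for row in rows) for email, rows in profiles.items()
}
```

The rule copes with casing and stray whitespace, but Pat's spend is still split across two profiles, so the buyer count comes out as three instead of two.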

Part three of this series goes into more detail about the types of algorithms employed in a deterministic ID solution.

Third-party

Third-party identity resolution means working with vendors who “farm” identity and sell it, a practice which is incredibly common. These vendors capture information from all over the internet, create a data asset of “people” with PII and digital signatures, then sell that information.

A company that collects only limited information can use these third-party vendor services to interact with its customers as though it had access to more information. It’s important to understand, though, that behind the scenes almost all third-party data uses deterministic matching. For example, a company will send the vendor an email address, and the vendor sells access to all of the information it has associated with that email.
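In code terms, that exchange amounts to an exact-key lookup. The sketch below uses an in-memory dictionary as a stand-in for a vendor’s data asset; `VENDOR_RECORDS`, its fields, and `enrich` are hypothetical and do not represent any real vendor’s API.

```python
# Hypothetical vendor data asset, keyed by normalized email address.
VENDOR_RECORDS = {
    "pat@example.com": {"postal_code": "98101", "maids": ["maid-001"]},
}


def enrich(email: str) -> dict:
    """Return whatever the stand-in vendor has linked to this exact email."""
    return VENDOR_RECORDS.get(email.strip().lower(), {})


hit = enrich("Pat@Example.com")
miss = enrich("plee@work.example")  # same person, different email: no match
```

The lookup inherits the limits of deterministic matching: a second address belonging to the same person returns nothing unless the vendor has already linked it.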

The other important element when working with a third-party vendor is understanding how they are sourcing the data. Where are they buying it? How old is the data? And of course, there are regulatory considerations and growing restrictions on how third-party data can be used.

For more details on this approach, check out part six of this series (coming soon, promise!).

Advanced data science

The newest innovations in identity resolution come from the data science world. A modern approach is to use the abundant computing resources made available by the explosive growth of cloud computing to run complex algorithms and achieve more accurate results.

This is often referred to, vaguely, as “Machine Learning” or “Artificial Intelligence.”

What this means is that the algorithm will do thousands more comparisons than a rules-based matching system, using several different “models,” and produce an identity graph and match scores. For more details about this method, see part four of our series, or part seven for the Amperity-specific perspective.

This generally takes more processing time than other methods but can yield far more accurate results. Often a company’s most valuable customers have engaged enough to provide a vast amount of information, and a rules-based algorithm will split such a person into many different “profiles,” which can leave massive opportunities on the table without the company even realizing it.
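To give a feel for scored comparisons and an identity graph, here is a deliberately simplified Python sketch. It is not machine learning and not any vendor’s actual method: the records are invented, the hand-picked `WEIGHTS` and `THRESHOLD` stand in for what a trained model would supply, and string similarity comes from the standard library’s `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records; weights and threshold are hand-picked for illustration.
records = [
    {"id": 1, "name": "Patricia Lee", "email": "pat@example.com", "phone": "555-0100"},
    {"id": 2, "name": "Pat Lee", "email": "plee@work.example", "phone": "555-0100"},
    {"id": 3, "name": "Sam Roy", "email": "sroy@mail.example", "phone": "555-7342"},
]
WEIGHTS = {"name": 0.4, "email": 0.3, "phone": 0.3}
THRESHOLD = 0.6


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_score(r1: dict, r2: dict) -> float:
    """Weighted evidence that two records describe the same person."""
    return sum(w * similarity(r1[f], r2[f]) for f, w in WEIGHTS.items())


def build_clusters(records):
    """Score every pair, keep edges above the threshold, and group the graph."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the cluster representative (union-find).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = {}
    for r1, r2 in combinations(records, 2):
        score = pair_score(r1, r2)
        if score >= THRESHOLD:
            edges[(r1["id"], r2["id"])] = round(score, 2)
            parent[find(r1["id"])] = find(r2["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), set()).add(r["id"])
    return edges, sorted(clusters.values(), key=min)


edges, clusters = build_clusters(records)
```

In this toy data, records 1 and 2 share no exact name or email, so a single email rule would keep them apart; the combined score links them into one cluster while record 3 stays on its own.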