
Entity Matching: Unifying Data in Industrial Applications


Entity Matching in the Wild: a Consistent and Versatile Framework to Unify Data in Industrial Applications

Entity matching is a fundamental operation in virtually all modern data management tasks. In this paper, we explain three main challenges that arise when deploying identity resolution systems in real-world, large-scale data applications.

These challenges include:

  1. How to support clustering at multiple confidence levels to serve downstream applications with varying precision/recall trade-off needs.

  2. How to combine different sources of data to create a more comprehensive customer profile without incorrectly merging distinct entities.

  3. How to cluster records over time and assign persistent cluster IDs that can be used for downstream use cases such as A/B tests or predictive model training.
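
To make the first challenge concrete, here is a minimal sketch of clustering the same records at two confidence levels. It is an illustration under stated assumptions, not the method described in the paper: it assumes pairwise match scores are already available, and it groups records by connected components using a simple union-find. The function name `cluster_at_threshold`, the record IDs, the scores, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def cluster_at_threshold(records, scored_pairs, threshold):
    """Group records into connected components over pairs scoring >= threshold.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    parent = {r: r for r in records}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(set)
    for r in records:
        groups[find(r)].add(r)
    # Sort members and clusters so the output is deterministic.
    return sorted(sorted(g) for g in groups.values())


# Made-up records and pairwise match scores.
records = ["r1", "r2", "r3", "r4"]
scored_pairs = [("r1", "r2", 0.95), ("r2", "r3", 0.70), ("r3", "r4", 0.40)]

# A strict threshold favors precision; a looser one favors recall.
high_precision = cluster_at_threshold(records, scored_pairs, 0.9)
high_recall = cluster_at_threshold(records, scored_pairs, 0.6)

print(high_precision)  # [['r1', 'r2'], ['r3'], ['r4']]
print(high_recall)     # [['r1', 'r2', 'r3'], ['r4']]
```

In this sketch, a precision-sensitive application would read the stricter clustering, while a recall-oriented one would read the looser clustering. Assigning cluster IDs that persist as records arrive over time (the third challenge) is not addressed here.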
