Using Machine Learning for Customer Identity Building

Chief Product Officer


Today’s Customer Data Platforms (CDPs) build customer identities from a myriad of data sources, both external and internal to the enterprise. Building these identities is a complex process in which fast-moving data of different types is automatically ingested into the system. However, automation that lacks proper enrichment and intelligence on the ingestion side leaves the customer experience that depends on it prone to major data issues such as poor-quality data, incomplete data, or, worst of all, incorrect data. This is where machine learning can be put to best use in the identity-building process: it helps ensure that only data of the highest quality is used, resulting in customer profiles that marketers can rely on to drive maximum ROI from their campaigns.

As mentioned in an earlier article on Data Preparation for Customer Identity, supervised machine learning algorithms work best when provided with high-quality data sets. The problem is that rich data sets cannot be found in raw data. Hence, enriched customer profiles and their respective identities cannot emerge from data ingestion that lacks the foundation of machine learning.

FirstHive’s innovative approach uses machine learning models for data preparation. These models are stitched to the models that are deployed for customer identity building. Multiple micro-processes come together in what we call the Uniqui-fication process, a four-layered algorithm.

Uniqui-fication — the Customer Identity creation process

Uniqui-fication is a process-driven hybrid algorithm that brings together the customer behavioral patterns that matter for customer identity creation and enrichment. The algorithm is constructed to peel away the layers of customer interactions to build customer identity. It assimilates breadcrumbs from trails of customer interaction data from disparate data sources to build unique, named customer identities.

Why Uniqui-fication?

During the creation of a Single Customer Identity, the algorithm continuously cross-references new data points with existing information about the Customer Identity, using evolving parameters based on Machine Learning and Data Science. As a result, the Customer Identity is continuously enriched in real time! Uniqui-fication is therefore essential for accessing enriched customer data and identity in real time.

Process of Uniqui-fication

FirstHive’s Uniqui-fication algorithm synthesizes data in either real time or near real time. This is a fundamental shift from the way traditional analytics companies look at historical data and deliver a retrospective analysis of personas or customer cohorts.


Once customer data has been prepared, it is classified into sections that can be attributed to the unified, unique customer profile that has been identified. The classification parameters are selected using supervised machine learning models. These parameters are bucketed into high, medium, and low confidence.

The system currently identifies a total of 27 parameters based on supervised learning on training data. Many other classification parameters are derived through unsupervised learning, based on the data stream available to the system and the user-generated inputs.

Customized Clustering algorithm

FirstHive’s clustering algorithm is used to partition a data set into homogeneous groups based on classification parameters such that similar data sets are kept in a group whereas dissimilar data sets are in different groups. For instance, behavioral clustering and segmentation help derive strategic marketing initiatives by using the variables that determine customer shareholder value. 

By conducting demographic clustering and segmentation within the behavioral segments, we can define tactical marketing campaigns and select the appropriate marketing channel and advertising for the tactical campaign. It is then possible to target those customers most likely to exhibit the desired behavior by creating predictive models.

Customer clustering helps in identifying and targeting high-profit, high-value, and low-risk customers.
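
As a rough illustration of partitioning customers into homogeneous groups, here is a minimal k-means sketch in plain Python. It is not FirstHive's customized clustering algorithm, which the article does not describe in detail. The two features (monthly spend and visit count), the sample values, and the function name kmeans are all assumptions made for the example.

```python
import math


def kmeans(points, k, max_iterations=20):
    """Plain k-means: returns (labels, centroids) for a list of numeric tuples.

    Illustrative only. Centroids are seeded with the first k points so the
    result is deterministic; real systems usually use a smarter seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical behavioral features: (monthly spend, visits per month).
customers = [(10, 1), (200, 12), (12, 2), (210, 11), (9, 1), (190, 13)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

In this toy data the low-spend, low-visit customers end up in one group and the high-spend, high-visit customers in the other. A marketer could then run demographic segmentation inside each behavioral group.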

Real-time Uniqui-fication

Real-time Uniqui-fication algorithms are based on Positive Probability distribution. The data sets are grouped and classified as highly unique if the probability of a match is high, and as related data sets if the probability of a match is medium. The classification is further calibrated with the incoming data about each profile to provide the most enriched customer profile to ongoing campaigns, so that marketers achieve the accurate activation they intend.
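
The high and medium match tiers described above can be sketched as follows. This is a simplified illustration, not the Uniqui-fication algorithm itself: the field weights, the two thresholds, and the names match_probability and classify_match are hypothetical, and a weighted field-agreement score stands in for a properly estimated match probability.

```python
# Hypothetical weights (in percent) for how strongly each field signals a match.
FIELD_WEIGHTS = {"email": 50, "phone": 30, "device_id": 20}

# Hypothetical cut-offs for the high and medium tiers.
HIGH_THRESHOLD = 0.8
MEDIUM_THRESHOLD = 0.4


def match_probability(incoming, profile):
    """Crude stand-in for a match probability: weighted share of agreeing fields."""
    agreed = 0
    for field, weight in FIELD_WEIGHTS.items():
        value = incoming.get(field)
        # A field only counts when it is present and identical in both records.
        if value is not None and value == profile.get(field):
            agreed += weight
    return agreed / 100


def classify_match(probability):
    """Map a match probability to the tiers named in the text."""
    if probability >= HIGH_THRESHOLD:
        return "unique"      # high probability: treated as the same identity
    if probability >= MEDIUM_THRESHOLD:
        return "related"     # medium probability: kept as a related data set
    return "unmatched"       # low probability: not linked to this profile


profile = {"email": "ann@example.com", "phone": "555-0100", "device_id": "abc"}
same_person = {"email": "ann@example.com", "phone": "555-0100", "device_id": "abc"}
shared_email = {"email": "ann@example.com", "phone": "555-0199"}
stranger = {"email": "bob@example.com"}

for record in (same_person, shared_email, stranger):
    print(classify_match(match_probability(record, profile)))
```

Recomputing the score whenever new data arrives for a profile is one simple way to picture the ongoing calibration the paragraph mentions.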

Negative Probability impact on Real-time Uniqui-fication process

The probability of the outcome of an experiment is never negative, but quasiprobability distributions can be defined that allow a negative probability for some unlikely events. These distributions apply to unobservable events or conditional probabilities.

The Real-time Uniqui-fication algorithm, in conjunction with the Automation engine, ensures relevant, timely communication between a brand and its customers, leading to tangible, measurable results.

Rule-based vs Uniqui-fication

While rule-based engines are part of our customer data platform, they are primarily used to define and create criteria. Uniqui-fication, which combines clustering and Uniqui-fication algorithms, goes beyond the predetermined criteria set by a rule-based engine. This matters because large customer data sets come with a long list of dimensions that need to be considered for customer identity creation.

Uniqui-fication helps maintain homogeneity within customer segments, unlike a fully-metered rule-based system. It also allows for dynamic clustering, which ensures that each unique customer identity reflects the quality and accuracy achieved with the entire data set.

Combines Deterministic and Probabilistic Mapping

In the process of customer identity creation, we apply a combination of deterministic and probabilistic mapping processes to build the most enriched customer profile.

Deterministic mapping uses PII to authenticate the identity of a customer. It could be implemented using a simple form that asks for basic details or a request to re-confirm a password. Though it is easy to execute, it faces challenges with user adoption.

On the other hand, probabilistic mapping involves authenticating users based on a database called the identity graph. The identity graph includes parameters such as the IP address, unique mobile device IDs, location, WiFi network, data entered in forms, timestamps, and fingerprints.

Applying both mapping methods resolves challenges associated with customer identity. At FirstHive, we also use a Persistent ID, which is an internal identifier that does not change even when other identifiers do.
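
One way to picture how the two mapping methods and a Persistent ID could fit together is the sketch below. It makes several assumptions that do not come from FirstHive: the IdentityResolver class, the "P1"-style IDs, and the rule that two shared identity-graph signals are enough for a probabilistic match are all hypothetical. A simple shared-signal count stands in for a real probabilistic model.

```python
import itertools


class IdentityResolver:
    """Toy resolver: deterministic match on PII first, then shared signals."""

    def __init__(self, min_shared_signals=2):
        # persistent_id -> {"pii": set of PII values, "signals": set of signals}
        self._profiles = {}
        self._counter = itertools.count(1)
        self.min_shared_signals = min_shared_signals  # hypothetical threshold

    def resolve(self, pii=(), signals=()):
        pii, signals = set(pii), set(signals)

        # Deterministic mapping: any exact PII overlap identifies the profile.
        for pid, profile in self._profiles.items():
            if pii & profile["pii"]:
                return self._merge(pid, pii, signals)

        # Probabilistic stand-in: pick the profile sharing the most signals.
        best_pid, best_overlap = None, 0
        for pid, profile in self._profiles.items():
            overlap = len(signals & profile["signals"])
            if overlap > best_overlap:
                best_pid, best_overlap = pid, overlap
        if best_pid is not None and best_overlap >= self.min_shared_signals:
            return self._merge(best_pid, pii, signals)

        # No match: mint a new persistent ID that never changes afterwards.
        pid = f"P{next(self._counter)}"
        self._profiles[pid] = {"pii": pii, "signals": signals}
        return pid

    def _merge(self, pid, pii, signals):
        # Enrich the profile; the persistent ID itself stays the same.
        self._profiles[pid]["pii"] |= pii
        self._profiles[pid]["signals"] |= signals
        return pid


resolver = IdentityResolver()
first = resolver.resolve(pii={"email:ann@example.com"},
                         signals={"device:abc", "ip:10.0.0.1"})
by_email = resolver.resolve(pii={"email:ann@example.com"},
                            signals={"device:xyz"})
by_signals = resolver.resolve(signals={"device:abc", "ip:10.0.0.1"})
other = resolver.resolve(pii={"email:bob@example.com"},
                         signals={"device:qrs"})
print(first, by_email, by_signals, other)
```

Here the second visit is matched deterministically through the email address and the third, anonymous visit is matched through shared device and IP signals. Both resolve to the same persistent ID even though the identifiers attached to the profile have grown.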

Unified Customer Identity

Machine Learning for Marketers

The customer insights that a marketer looks for are hidden in the identity of each customer. This makes identity resolution fundamental to a marketer’s intelligence. In a CDP, machine learning models are implemented at different stages of customer identity building.

Some quick use cases are:

  • Resolving anonymous identities and avoiding duplication
  • Avoiding fraudulent profiles that arise from fraudulent transactions
  • Keeping customer profiles unified despite information input and updates from campaign activation
  • Prioritizing identifiers across datasets
  • Using a unique ID that aids omni-channel interaction to improve customer experience
  • Authenticating a customer and the associated customer profile
  • Connecting a customer to different associated marketing channels, IoT devices, products bought online, etc.
  • Supporting digital transformation

Artificial Intelligence can realize its full potential in automating identity authentication only when it can be trained at scale using machine learning algorithms.

With better customer data, marketers tend to gain better access to growth opportunities, reach targets faster and in an optimal way, spend efficiently, and improve customer experience, leading to happy customers!
