
Customer Match Audience Hashing Explained

Mounir Nejjai


Updated on June 23, 2026

Categories and Tags

SFMC Tips

Customer Match

Audience Matching

SFMC

Cezium Ads Team


When marketers upload customer lists to Meta, Google, or TikTok, the first reassurance they are usually given is: "don't worry, the data is hashed." That statement is true but incomplete in ways that matter — for match rates, for compliance, and for your security team's due diligence when evaluating any vendor that touches customer data.

This post is the technical explainer that security teams and marketers can both read. It covers what hashing actually is, the critical normalization step that most documentation buries in a footnote, the full end-to-end flow from your database to a matched audience, and the honest privacy nuance that "hashed" does not mean what most people assume it means.

What Hashing Is (and What It Is Not)

A hash function takes an input of any length and produces a fixed-length output, called a digest or hash. SHA-256 produces a 256-bit (64-character hexadecimal) output. The function is deterministic — the same input always produces the same output — and it is one-way, meaning you cannot reverse the hash to recover the original input by computation.

For the string john.smith@example.com, SHA-256 produces:

e3d4f2b1a8c7d6e5f4a3b2c1d0e9f8a7b6c5d4e3f2a1b0c9d8e7f6a5b4c3d2e1

(Illustrative placeholder, not the real digest of this string. The real value depends on the exact bytes hashed, which is why normalization matters.)
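To compute the real digest rather than the placeholder, Python's standard-library hashlib module is enough. This is a minimal sketch that assumes the identifier is encoded as UTF-8 before hashing and that the platform expects a lowercase hexadecimal digest; check each platform's spec for its exact requirements.

```python
import hashlib


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as 64 lowercase hex characters."""
    # Hash functions operate on bytes, so the text is encoded (UTF-8 assumed) first.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


digest = sha256_hex("john.smith@example.com")
print(digest)       # the actual digest, not the illustrative value above
print(len(digest))  # 64 hex characters = 256 bits
```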

This is useful for audience matching because it lets two parties — you and an ad platform — check whether they have data on the same person without either party sharing the underlying email address or phone number with the other. You send the hash. The platform compares it against hashes of its own user data. If there is a match, the user is added to the audience. Neither the email you sent nor the email in the platform's records is exposed to the other party.

That is the theory. The practice has complications.

The Step Everyone Skips: Normalization

SHA-256 is deterministic. That is its most important property for matching, and its flip side is the biggest operational hazard: the same input always produces the same output, but different inputs, even trivially different ones, produce entirely different outputs.

john.smith@example.com and John.Smith@example.com are the same email address in any reasonable interpretation. To SHA-256, they are completely different strings. The hashes will not match.
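The case-sensitivity problem is easy to reproduce. This sketch reuses the same hashlib approach; the trim-and-lowercase step shown is the email baseline described in this post, not a complete platform spec.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


lower = sha256_hex("john.smith@example.com")
mixed = sha256_hex("John.Smith@example.com")
print(lower == mixed)  # False: two capital letters change the entire digest

# Normalizing first (trim + lowercase) makes both records hash identically.
print(sha256_hex(" John.Smith@example.com ".strip().lower()) == lower)  # True
```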

This means that before you apply any hash function, you must normalize your data to a canonical form. For audience hashing, the normalization rules are:

Email addresses:

Convert to lowercase
Remove leading and trailing whitespace
Do not remove dots in Gmail addresses unless a platform's current spec explicitly asks for it (some older guidance recommended removing them, and platform behavior varies)

Phone numbers:

Convert to E.164 format: a + followed by the country code and number, with no spaces, hyphens, or parentheses (Meta, for example, documents the same digits without the leading +)
Example (US number): (415) 555-0123 → +14155550123

Names:

Lowercase
Remove leading and trailing whitespace
Some platforms have additional guidance on handling special characters and accents
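A minimal Python sketch of these rules follows. The function names are hypothetical, and the phone helper deliberately handles only North American (+1) numbers to stay short; for international data, a dedicated parser such as the open-source phonenumbers library (a Python port of Google's libphonenumber) is the safer choice. Accent and special-character handling for names is left out because it differs by platform.

```python
import re


def normalize_email(raw: str) -> str:
    # Trim surrounding whitespace and lowercase. Dots (including in Gmail
    # addresses) are left untouched; apply any platform-specific rule separately.
    return raw.strip().lower()


def normalize_name(raw: str) -> str:
    # Trim and lowercase only; accent/special-character rules vary by platform.
    return raw.strip().lower()


def normalize_nanp_phone(raw: str) -> str:
    """Convert a North American (+1) phone number to E.164, e.g. +14155550123.

    Hypothetical helper: it assumes every input is a +1 number and rejects
    anything else rather than guessing a country code.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, hyphens, parentheses, '+'
    if len(digits) == 10:            # national format: add the country code
        digits = "1" + digits
    if len(digits) != 11 or not digits.startswith("1"):
        raise ValueError(f"not a recognizable +1 number: {raw!r}")
    return "+" + digits


print(normalize_email("  John.Smith@Example.com "))  # john.smith@example.com
print(normalize_nanp_phone("(415) 555-0123"))        # +14155550123
print(normalize_nanp_phone("415-555-0123"))          # +14155550123
```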

These rules are documented by each platform (Google, Meta, and TikTok all publish normalization specs), but they are easy to implement inconsistently, especially when data comes from multiple source systems. A phone number stored as 415-555-0123 in your CRM will produce a different hash than +14155550123. The result is a silent match failure: no error, no warning, just a lower match rate that is genuinely difficult to diagnose without auditing the normalization step directly.
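Auditing for this is cheap. Here is a minimal sketch, assuming your identifiers can be pulled into Python as lists of strings: it flags phone values that do not already match the E.164 shape (a +, then at most 15 digits with no leading zero) and emails that change under trim-and-lowercase. The sample CRM values are made up. Passing the shape check does not prove a number is real; the goal is only to find values that would hash differently from their canonical form.

```python
import re

# E.164 shape: '+', a non-zero first digit, at most 15 digits in total.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def non_canonical_phones(values: list[str]) -> list[str]:
    """Return phone values that would hash differently from their E.164 form."""
    return [v for v in values if not E164.fullmatch(v)]


def non_canonical_emails(values: list[str]) -> list[str]:
    """Return email values that change under trim-and-lowercase."""
    return [v for v in values if v != v.strip().lower()]


# Hypothetical sample pulled from two source systems.
crm_phones = ["+14155550123", "415-555-0123", "(415) 555-0123", "+442079460000"]
crm_emails = ["john.smith@example.com", "John.Smith@example.com", "ana@example.org "]

bad_phones = non_canonical_phones(crm_phones)
bad_emails = non_canonical_emails(crm_emails)
print(f"{len(bad_phones)}/{len(crm_phones)} phone values need normalization")  # 2/4
print(f"{len(bad_emails)}/{len(crm_emails)} email values need normalization")  # 2/3
```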

Normalization is not a technical detail. It is the single most impactful lever you have on match rate quality — more than the number of fields you send, more than the freshness of your data. Getting it wrong silently destroys the value of your entire activation investment. If you are trying to improve your ad match rates, normalization is the first thing to audit.

The Full End-to-End Flow

Here is what actually happens when a customer list goes to an ad platform, step by step.

1. Export from your system of record. You pull a segment from your CRM, marketing automation platform, or data warehouse. The output is a set of customer attributes — email, phone, name, postal code, country.

2. Normalize. Each field is converted to its canonical form: lowercase email, E.164 phone, lowercase trimmed name. This step must happen before hashing.

3. Hash. Each normalized value is run through SHA-256. The output for each field is a fixed-length hex string. Your file now contains hashed values instead of plaintext customer attributes.

4. Upload to the platform API. The hashed values are sent to the platform's audience API — Meta's Custom Audiences API, Google's Customer Match via the Data Manager API, TikTok's Custom Audience API. The platform receives only the hashed data.

5. Platform hashes its own user data the same way. The platform applies the same normalization and SHA-256 hashing to the email addresses and phone numbers in its own user identity graph.

6. Match hashes. The platform compares your hashes against its own. Because both sides used the same normalization and the same hash function, records for the same person produce identical hashes. Matching is performed on hashes, never on plaintext.

7. Platform discards your list. The platforms state that the uploaded hashes are deleted after matching. What remains is an audience segment, a set of platform user IDs, with no reference back to your customer data.

8. Platform reports match rate. You see a percentage: the fraction of your uploaded records that found a match in the platform's identity graph. The match rate depends on the overlap between your customer base and the platform's user base, on which identifier fields you send, and on the quality of your normalization. The upload method itself does not change it; a sync tool affects match rate only through how it normalizes and which fields it sends.
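The flow can be compressed into a toy, in-memory simulation. Everything here is hypothetical: the platform side is just a dictionary standing in for an identity graph, step 4 (the upload) is implied rather than performed, only email is used as a key, and real platforms apply their own matching logic across several identifiers. It does show the mechanics: both sides normalize and hash identically, matching happens on digests, and the match rate falls out of the overlap.

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalize_email(raw: str) -> str:
    return raw.strip().lower()


# Steps 1-3 (advertiser side): export, normalize, hash.
advertiser_export = ["John.Smith@example.com ", "ana@example.org", "lee@example.net"]
uploaded = {sha256_hex(normalize_email(e)) for e in advertiser_export}

# Step 5 (platform side, hypothetical data): hash its own users' emails the same way.
platform_users = {
    "u_101": "john.smith@example.com",
    "u_202": "ANA@example.org",
    "u_303": "kim@example.com",
}
platform_index = {sha256_hex(normalize_email(email)): uid for uid, email in platform_users.items()}

# Step 6: compare digests only.
audience = {platform_index[h] for h in uploaded if h in platform_index}

# Step 8 (computed before step 7 clears the upload): matched / uploaded records.
match_rate = len(audience) / len(advertiser_export)

# Step 7: drop the uploaded hashes; only platform user IDs remain.
uploaded.clear()

print(sorted(audience))     # ['u_101', 'u_202']
print(f"{match_rate:.0%}")  # 67%
```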

The Privacy Nuance: Hashing Is Not Anonymization

This is the most important thing your legal and compliance team needs to understand: hashed personal data is still personal data.

Hashing is pseudonymization, not anonymization. Pseudonymization means the data has been transformed so that it cannot be attributed to a specific individual without additional information — in this case, the original unhashed value. Anonymization means the data cannot be re-linked to an individual at all, under any realistic circumstances.

A hashed email address is not anonymous because:

It is deterministic and linkable. Anyone who has the original email address can compute its SHA-256 hash and confirm the match. The hash does not protect against someone who already has the email — it protects against someone who does not.
Rainbow tables and precomputed hash databases exist. For common email addresses (major domains, common name patterns), precomputed hash-to-plaintext mappings are computationally feasible. Hashing does not prevent reverse-engineering for high-probability inputs.
Regulators treat it as personal data. Under the GDPR, the UK GDPR (enforced by the ICO), and most comparable frameworks, pseudonymous data remains in scope. Consent obligations, data subject rights, and retention rules all apply to hashed customer lists.
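The precomputation risk takes only a few lines to demonstrate. This is a plain dictionary attack (hash every plausible candidate and keep a digest-to-plaintext lookup), the simplest form of the precomputed mappings described above; rainbow tables are a more storage-efficient variant of the same idea. The name lists are tiny and hypothetical, and example.com-style domains stand in for major mail providers. SHA-256 is designed to be fast, so enumerating far larger candidate lists is cheap.

```python
import hashlib
from itertools import product


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A hashed email the attacker received without the plaintext.
leaked_hash = sha256_hex("john.smith@example.com")

# Candidate space: common first names x last names x separators x domains.
first_names = ["james", "john", "mary", "sarah"]
last_names = ["brown", "jones", "smith"]
separators = [".", "_", ""]
domains = ["example.com", "example.net", "example.org"]

lookup = {}
for first, last, sep, domain in product(first_names, last_names, separators, domains):
    candidate = f"{first}{sep}{last}@{domain}"
    lookup[sha256_hex(candidate)] = candidate  # digest -> plaintext

print(len(lookup))              # 108 candidates hashed
print(lookup.get(leaked_hash))  # john.smith@example.com
```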

The practical implication: uploading hashed customer lists does not change your consent obligations. If a user has not consented to their data being used for advertising targeting, hashing their email address before sending it to Meta does not make that use lawful. The hash is a security measure, not a legal workaround.

Where Hashing Happens Matters

Not all hashing is equal, and this is the question to ask any vendor that handles your customer data.

There are two architectures:

Client-side hashing (in your environment). Your data is normalized and hashed before it leaves your systems. The vendor or platform API receives only hashed values. Your plaintext customer data never leaves your perimeter.

Server-side hashing (in the vendor's environment). You send plaintext customer data to the vendor's servers, and the vendor hashes it before forwarding to the ad platform. Your data leaves your perimeter in plaintext.

Both approaches produce the same end result at the platform: SHA-256 hashes. But they are fundamentally different from a security and compliance standpoint. If you hand plaintext email addresses to a third-party vendor for them to hash, your data has left your environment unhashed. You have introduced a data processor relationship that requires a DPA under GDPR, and you have accepted risk during transmission and on the vendor's infrastructure.
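In practice, client-side hashing means the normalize-and-hash step runs in your environment and nothing else is allowed out. Below is a minimal sketch of that boundary, with a guard that refuses to ship a payload containing a plaintext identifier. The column name email_sha256 and both functions are hypothetical, not any platform's API schema, and the guard is a defense-in-depth check rather than a substitute for reviewing the pipeline.

```python
import hashlib
import json


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_locally(records: list[dict]) -> list[dict]:
    """Normalize (trim + lowercase) and hash emails inside your own environment."""
    return [{"email_sha256": sha256_hex(r["email"].strip().lower())} for r in records]


def assert_no_plaintext(payload: str, records: list[dict]) -> None:
    """Raise before any network call if a raw or normalized email is in the payload."""
    for r in records:
        for form in (r["email"], r["email"].strip().lower()):
            if form and form in payload:
                raise RuntimeError("plaintext identifier found in outbound payload")


records = [{"email": " Ana@Example.org"}, {"email": "lee@example.net"}]
payload = json.dumps(hash_locally(records))
assert_no_plaintext(payload, records)  # passes: the payload holds only hex digests
# Only now would `payload` be sent to the vendor or platform API.
```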

Questions to ask any audience sync vendor:

At what point is the data hashed? In our system, or yours?
Does your infrastructure ever store plaintext customer identifiers?
If hashing happens on your servers, what are your data retention policies?
Do you have a DPA and sub-processor list available?

The architecture answer should be clear and unambiguous. If the vendor is vague about when hashing happens, treat that as a red flag.

Multiple Keys, Multiple Hashes

Email address is the most commonly sent identifier, but it is not the only one — and match rates improve substantially when you send more. Each additional key gives the platform another opportunity to match a user in its identity graph.

The fields most platforms accept:

Field	Format	Notes
Email address	Lowercase, trimmed	Primary key for most platforms
Phone number	E.164 (`+14155550123`)	Strong signal; often the best complement to email
First name	Lowercase, trimmed	Improves confidence when combined with other keys
Last name	Lowercase, trimmed	Same
Postal/ZIP code	Platform-specific	Varies by country format
Country	ISO 3166-1 alpha-2	Required for phone matching on some platforms
Mobile advertising ID	Raw IDFA or AAID	For mobile-heavy audiences

Each field (apart from raw mobile advertising IDs) is normalized and hashed independently and sent as a separate column. The platform's matching logic combines signals: a record that matches on email AND phone AND name is a higher-confidence match than email alone.
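Per-field handling can be sketched as a table of normalizers keyed by field. This is an assumption-laden illustration: the output column names are hypothetical rather than any platform's schema, the phone normalizer assumes stored values already carry a country code, postal code and country are omitted because their formats vary by platform, and the mobile advertising ID is passed through raw, as in the table above.

```python
import hashlib
import re


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def norm_lower_trim(value: str) -> str:
    return value.strip().lower()


def norm_phone(value: str) -> str:
    # Assumes the stored value already includes its country code, e.g. "+1 (415) 555-0123".
    return "+" + re.sub(r"\D", "", value)


# Each hashed key has its own normalization rule.
NORMALIZERS = {
    "email": norm_lower_trim,
    "phone": norm_phone,
    "first_name": norm_lower_trim,
    "last_name": norm_lower_trim,
}


def to_upload_row(record: dict) -> dict:
    """Normalize and hash each available key independently, one column per key."""
    row = {}
    for field, normalize in NORMALIZERS.items():
        value = record.get(field)
        if value and value.strip():  # skip missing or blank fields
            row[f"{field}_sha256"] = sha256_hex(normalize(value))
    if record.get("maid"):
        row["maid"] = record["maid"].strip()  # raw IDFA/AAID, not hashed
    return row


row = to_upload_row({
    "email": "John.Smith@example.com",
    "phone": "+1 (415) 555-0123",
    "first_name": " John ",
    "last_name": "",  # blank: omitted from the row
})
print(sorted(row))  # ['email_sha256', 'first_name_sha256', 'phone_sha256']
```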

The consistent theme: normalization per field must be correct, and different fields have different normalization rules. Phone normalization is the most commonly botched.

What This Means for Your Sync Tooling

The technical requirements here have direct implications for how you should evaluate any tool that moves data from your systems to ad platforms.

The tool needs to:

Apply correct, platform-specific normalization rules before hashing — not a generic lowercase-and-trim
Hash inside your environment, not on the vendor's servers
Send all available identifier fields, not just email
Handle updates: new customers added, churned customers removed, opt-outs propagated
Provide an audit trail of what was sent, when, and to which platform
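The update and audit requirements reduce to a set difference per sync cycle. Here is a minimal sketch, assuming the tool keeps the set of hashes it last sent (for example, from its own audit log) and has the opted-out customers' hashes available; the function names, the platform label, and the short stand-in strings used in place of real 64-character digests are all hypothetical.

```python
from datetime import datetime, timezone


def plan_sync(previous: set[str], current: set[str], opted_out: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_add, to_remove) for one sync cycle.

    previous:  hashes already sent to the platform audience
    current:   hashes of customers currently in the segment
    opted_out: hashes of customers who withdrew consent
    """
    target = current - opted_out   # opt-outs always win over segment membership
    to_add = target - previous     # new customers
    to_remove = previous - target  # churned customers and opt-outs
    return to_add, to_remove


def audit_entry(platform: str, to_add: set[str], to_remove: set[str]) -> dict:
    """Record what was sent, when, and to which platform."""
    return {
        "platform": platform,
        "sent_at": datetime.now(timezone.utc).isoformat(),
        "added": len(to_add),
        "removed": len(to_remove),
    }


# Stand-in values; real entries would be SHA-256 hex digests.
previous = {"h1", "h2", "h3"}
current = {"h2", "h3", "h4"}
opted_out = {"h3"}

to_add, to_remove = plan_sync(previous, current, opted_out)
print(sorted(to_add), sorted(to_remove))  # ['h4'] ['h1', 'h3']
print(audit_entry("example-platform", to_add, to_remove))
```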

If your workflow relies on CSV exports and manual uploads, normalization consistency depends entirely on whoever built the export query, and opt-out propagation is manual and delayed. Weigh the compliance risk of CSV-based audience operations seriously before you standardize on that workflow.

Where Cezium Fits

Cezium Ads handles hashing inside your own Salesforce Marketing Cloud instance. When you create an audience, Cezium generates an automation within your own MC environment that normalizes identifiers and applies SHA-256 hashing before any data leaves your systems. No plaintext customer data is sent to Cezium's infrastructure, and nothing is stored externally. Connections to ad platforms use OAuth 2.0 and are controlled by your IT team, and opt-outs and deletions propagate on every sync cycle.

For SFMC teams evaluating post-Advertising Studio options, the SFMC Advertising Studio Google Customer Match migration guide covers the specific API and consent changes that took effect in April 2026.

Hashing is the right mechanism for audience matching, and SHA-256 is the right algorithm. But the implementation details — normalization, where hashing happens, what keys you send, how opt-outs propagate — determine whether your activation program is secure, compliant, and effective, or just technically hashed.

The word "hashed" in a vendor pitch is a starting point for questions, not an endpoint.

Mounir Nejjai is the founder of Cezium.

About the author

Mounir Nejjai

Founder & CEO, Cezium

Mounir Nejjai is the founder and CEO of Cezium, the SFMC-native audience activation platform. A recognized Salesforce Marketing Champion, he helps enterprise marketing teams move first-party data out of Salesforce Marketing Cloud and into the ad platforms where it drives revenue. He writes on audience activation, SFMC advertising, and the post–Advertising Studio landscape. Based in Paris.

