PII Detection in Salesforce Data

What Is PII Detection?

PII Detection scans text fields for personally identifiable information using configurable regex patterns. It answers three questions about your Salesforce data:

Does my data contain PII that needs protection?
How exposed is my dataset?
Which fields hold sensitive information?

DQS profiles the type and density of PII exposure across every text field. It uses pattern-based detection: regex patterns match against field values to flag SSNs, credit cards, emails, phone numbers, and other identifiers.

Three properties define how detection works:

Deterministic. Same input produces the same result every time.
Transparent. You see every pattern DQS applies. No black-box scoring.
On-platform. Detection runs entirely within Salesforce. No data leaves your org.
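
As a rough illustration of pattern-based detection, here is a minimal Python sketch. The two regexes and the `detect` helper are assumptions made for this example; they are not the patterns or code DQS ships, and DQS itself runs inside Salesforce rather than in Python.

```python
import re

# Illustrative patterns only (assumed for this sketch, not DQS's own regexes).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

def detect(text: str) -> set[str]:
    """Return the names of every pattern that matches somewhere in text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

comment = "Customer confirmed SSN 123-45-6789 and asked us to reply to jane@example.com."
print(sorted(detect(comment)))             # ['email', 'ssn']
print(detect(comment) == detect(comment))  # True: same input, same result
```

Because the result depends only on the text and the pattern list, rerunning the scan on unchanged data returns the same matches.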

Why It Matters

Compliance. GDPR, CCPA, HIPAA, and PCI DSS all mandate identifying and protecting PII. You can’t protect what you haven’t found. Automated detection gives you an inventory of exposure across every text field in scope.

AI readiness. Before feeding data to Agentforce or any AI system, you need to know which fields contain PII. Undetected PII in training data or retrieval indexes creates exposure that no downstream filter can fully prevent.

Data governance. Text fields accumulate PII over time. Agents paste email threads into case comments. Customers provide SSNs for verification. Integrations write contact details into description fields. Without detection, this PII sits unprotected.

How DQS Detects PII

DQS runs PII detection as a progressive diagnostic. Each step builds on the previous one.

Step 1: Is There a PII Problem?

Records with PII gives the absolute count of records where at least one pattern matched. This is the scoping number.

For example: you scan Case comments using the Standard preset. Records with PII comes back as 847. That means 847 case records need review before you can safely use the data for AI training or share it with third-party analytics.
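
The counting rule can be sketched in a few lines of Python. The sample comments, the two regexes, and the `records_with_pii` helper are hypothetical; the point is only that a record counts once no matter how many matches it contains.

```python
import re

# Assumed example patterns, not the regexes DQS ships.
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def records_with_pii(records, patterns):
    """Count records where at least one pattern matches (each record counts once)."""
    return sum(
        1
        for text in records
        if any(rx.search(text or "") for rx in patterns)  # treat empty fields as ""
    )

comments = [
    "Reset password for jane@example.com",
    "SSN 123-45-6789, backup SSN 987-65-4321, email bob@example.org",  # 3 matches, 1 record
    "Printer on floor 3 is jammed",
    None,
]
print(records_with_pii(comments, [SSN, EMAIL]))  # 2
```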

Step 2: How Bad Is It?

PII Exposure Rate gives the percentage of scanned records containing pattern matches. The rate puts the count in context.

847 records out of 1,000 is 84.7% exposure, a systemic problem requiring a process change. 847 out of 500,000 is 0.17%, isolated incidents you can address with targeted cleanup.
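
The arithmetic behind those two figures, as a small Python sketch. The function name `exposure_rate` is an assumption for illustration, not a DQS API.

```python
def exposure_rate(records_with_pii: int, records_scanned: int) -> float:
    """Percentage of scanned records that contain at least one pattern match."""
    if records_scanned <= 0:
        raise ValueError("records_scanned must be positive")
    return 100.0 * records_with_pii / records_scanned

print(round(exposure_rate(847, 1_000), 2))    # 84.7
print(round(exposure_rate(847, 500_000), 2))  # 0.17
```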

Step 3: What Kind of PII?

The pattern configuration itself tells you what types were scanned. Each pattern has a category: Financial, Contact, Technical, or Identity. By reviewing which patterns triggered matches, you know whether you’re dealing with credit card leaks, email address exposure, or SSN contamination.
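
One way to picture this review step is to roll triggered patterns up into their categories. The sketch below assumes you already have, for each flagged record, the set of pattern names that matched; how DQS exposes that information is not described here, so the input shape and the `category_breakdown` helper are hypothetical. The pattern-to-category mapping follows the tables in the next section.

```python
from collections import Counter

# Pattern -> category, as listed in the detection pattern tables.
CATEGORY = {
    "Social Security Number": "Financial",
    "Credit Card Number": "Financial",
    "IBAN": "Financial",
    "Email Address": "Contact",
    "US Phone Number": "Contact",
    "International Phone": "Contact",
    "IP Address": "Technical",
    "Date of Birth": "Identity",
}

def category_breakdown(matched_patterns_per_record):
    """Count how many records triggered at least one pattern in each category."""
    counts = Counter()
    for matched in matched_patterns_per_record:
        for category in {CATEGORY[name] for name in matched}:
            counts[category] += 1
    return counts

# Hypothetical input: pattern names that matched in three flagged records.
hits = [
    {"Email Address"},
    {"Social Security Number", "Credit Card Number"},
    {"US Phone Number", "Email Address"},
]
print(sorted(category_breakdown(hits).items()))  # [('Contact', 2), ('Financial', 1)]
```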

The 8 Detection Patterns

DQS ships with 8 predefined regex patterns organized into 4 categories.

Financial

Pattern	What It Matches	False Positive Risk
Social Security Number	US SSN in NNN-NN-NNNN format	Low. The hyphenated format is distinctive.
Credit Card Number	13-16 digit sequences with optional spaces/hyphens	Medium. Long numeric sequences (order numbers, tracking IDs) can false-match.
IBAN	International bank account numbers (ISO 13616 format)	Low. The country code + check digit prefix is distinctive.

Contact

Pattern	What It Matches	False Positive Risk
Email Address	Standard name@domain.tld format	Low. The @ symbol structure is distinctive.
US Phone Number	US/Canadian formats: (NNN) NNN-NNNN, NNN-NNN-NNNN, +1 variants	Medium. 10-digit numbers with separators can match non-phone data.
International Phone	E.164-style numbers starting with + country code	Low. The + prefix is a strong signal.

Technical

Pattern	What It Matches	False Positive Risk
IP Address	IPv4 dotted decimal (NNN.NNN.NNN.NNN)	Low-Medium. Software version numbers are the main false-positive source.
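
To see why version numbers are the main false-positive source, consider this Python sketch. The regex is an assumed, simplified IPv4 pattern, not necessarily the one DQS uses.

```python
import re

# Assumed, simplified dotted-decimal pattern (does not range-check octets).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

text = "Upgraded agent to build 10.2.14.3; client connects from 192.168.1.20."
print(IPV4.findall(text))  # ['10.2.14.3', '192.168.1.20']
```

A format-only match cannot tell the build number from the client address; both have four dot-separated numeric groups.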

Identity

Pattern	What It Matches	False Positive Risk
Date of Birth	US date format MM/DD/YYYY or MM-DD-YYYY	High. Matches any US-formatted date. Best paired with field-level targeting.
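
The high false-positive risk is easy to reproduce. The following Python sketch uses an assumed regex for MM/DD/YYYY and MM-DD-YYYY dates; the actual DQS pattern may differ.

```python
import re

# Assumed US date pattern for illustration only.
US_DATE = re.compile(
    r"\b(?:0[1-9]|1[0-2])[/-](?:0[1-9]|[12]\d|3[01])[/-](?:19|20)\d{2}\b"
)

text = "Follow-up meeting moved to 03/15/2024. Customer DOB is 07-04-1985."
print(US_DATE.findall(text))  # ['03/15/2024', '07-04-1985']
```

Only the second match is a birth date, but the pattern has no way to know that. This is why the Date of Birth pattern works best when limited to fields where birth dates are a known risk.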

DQS uses regex-only pattern matching. Detection is format-based, not contextual. There is no checksum validation (Luhn for credit cards, modulo-97 for IBAN), no keyword proximity boosting, and no ML-based confidence scoring. Every match is binary: the pattern matched or it didn’t. This makes detection fully auditable and deterministic, but you need to review matches on fields with high false-positive risk.
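
For contrast, the Python sketch below shows what a Luhn checksum would add on top of a format match. DQS does not perform this check; the card regex and helper names are assumptions, and the code is only meant to show why a format-only match on a long digit sequence deserves manual review.

```python
import re

# Assumed pattern: 13-16 digits with optional space or hyphen separators.
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_valid(number: str) -> bool:
    """Standard Luhn check: double every second digit from the right, sum, mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

text = "Order 1234567890123456 paid with 4111 1111 1111 1111"
candidates = [m.group(0) for m in CARD.finditer(text)]
print(candidates)                                 # both sequences match the format
print([c for c in candidates if luhn_valid(c)])   # ['4111 1111 1111 1111']
```

The order number matches the regex but fails the checksum, so a regex-only scan reports it as a hit that a reviewer has to dismiss.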

Regulatory Coverage

All 8 patterns are grounded in major privacy and security frameworks.

Pattern	NIST 800-122	GDPR	CCPA	PCI DSS	HIPAA	ISO 27701
SSN	X	X	X		X	X
Credit Card	X	X	X	X		X
Email	X	X	X		X	X
US Phone		X	X		X	X
Intl Phone		X	X		X	X
IP Address		X	X		X	X
IBAN		X				X
Date of Birth	X	X	X		X	X

These are the same identifier types detected as built-in patterns by Google Cloud DLP, AWS Macie, and Microsoft Purview. The difference: cloud DLP tools use multi-layered detection (regex + checksum + keyword proximity + ML). DQS uses regex-only matching, which is simpler and fully transparent but does not provide confidence scoring.

Three Detection Presets

Presets configure which patterns are active in a single click.

Preset	Patterns	Count	When to Use
Standard	SSN, Credit Card, Email, US Phone	4	General PII audit. Covers the four most common types with manageable false-positive rates. This is the default.
Critical	SSN, Credit Card	2	Financial compliance check. Minimum scan for identity theft and payment card exposure. Use when you need fast results with few false positives; the Credit Card pattern can still match long numeric IDs.
Extended	All 8 patterns	8	Full scan. Includes IBAN, IP Address, Date of Birth, and International Phone. Higher false-positive rate in exchange for maximum coverage. Best for first-time audits and compliance assessments.
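
The three presets can be thought of as named lists of pattern names. The Python dictionary below is a hypothetical representation for illustration; it is not a DQS configuration format.

```python
# The 8 predefined patterns, named as in the tables above.
ALL_PATTERNS = [
    "Social Security Number",
    "Credit Card Number",
    "IBAN",
    "Email Address",
    "US Phone Number",
    "International Phone",
    "IP Address",
    "Date of Birth",
]

# Hypothetical in-memory view of the presets.
PRESETS = {
    "Standard": ["Social Security Number", "Credit Card Number",
                 "Email Address", "US Phone Number"],
    "Critical": ["Social Security Number", "Credit Card Number"],
    "Extended": list(ALL_PATTERNS),
}

for name, patterns in PRESETS.items():
    print(name, len(patterns))
```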

You can also add custom regex patterns beyond the 8 predefined ones. Custom patterns are validated server-side before they can be saved. Any valid regex works.
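
Validation of a custom pattern usually amounts to trying to compile it. The sketch below shows the idea in Python with a hypothetical `validate_pattern` helper. DQS validates server-side inside Salesforce, and regex dialects differ between engines, so a pattern accepted here is not guaranteed to be accepted there.

```python
import re

def validate_pattern(source: str) -> tuple[bool, str]:
    """Return (True, "") if the regex compiles, else (False, error message)."""
    try:
        re.compile(source)
    except re.error as exc:
        return False, str(exc)
    return True, ""

print(validate_pattern(r"\b\d{9}\b")[0])  # True
print(validate_pattern(r"(\d{3}")[0])     # False: the group is never closed
```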

Metric Reference

Foundation Metrics

Metric	Type	What It Returns
Records with PII	Count (integer)	Number of records where at least one pattern matched. A record is counted once regardless of how many patterns matched or how many matches exist within it.

Advanced Metrics

Metric	Type	What It Returns
PII Exposure Rate	Percentage	Percentage of scanned records containing PII matches. This is the headline exposure number for reports and dashboards.

Field Type Coverage

Metric	String	TextArea	Email	Phone	EncryptedString	LongTextArea	Html
Records with PII	X	X	X	X	X
PII Exposure Rate		X				X	X

Records with PII casts a wide net across all text field types. PII Exposure Rate focuses on longer text fields where PII density is meaningful. A 255-character String field matching an email regex is a single data point. A 32,000-character LongTextArea with 15 SSN matches tells a different story.

Two Analysis Modes

DQS runs PII Detection in two modes.

PII Scan processes all selected fields using the configured patterns and returns Records with PII. This mode answers: “Do I have a PII problem?” Use it for quick audits before data migrations or AI projects.

PII Detection Analysis adds PII Exposure Rate on top of Records with PII. The exposure rate gives context to the raw count, turning “847 records contain PII” into “84.7% of your dataset is exposed” when 1,000 records were scanned. Use this mode for compliance reporting and ongoing governance.

Configuring PII Detection

Input	What It Controls
Detection Patterns	Which of the 8 predefined patterns are active. Pick a preset or toggle individual patterns.
Custom Patterns	Any valid regex pattern, validated server-side. Added alongside predefined patterns.
Per-Field Overrides	Different pattern sets for different fields. Override the global configuration on a field-by-field basis.

Choosing Patterns by Field Type

Different fields need different pattern sets. An Email field already contains email addresses by design. Scanning it for email patterns produces 100% matches, which is expected, not a problem. A Case Description field is free text where any PII type can appear. Configure patterns based on what you expect to find vs. what signals a problem.

Example configurations:

Email fields: Scan for SSN and Credit Card only (email matches are expected)
Description and Notes fields: Use Standard or Extended preset (free text can contain anything)
Short text fields (Subject, Title): Use Critical preset only (low tolerance for false positives)
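
The example configurations above boil down to a simple lookup: use the field's override if one exists, otherwise fall back to the global pattern set. The Python sketch below is a hypothetical model of that rule; the field names and the `active_patterns` helper are illustrative and not part of DQS.

```python
# Global configuration: the Standard preset.
GLOBAL_PATTERNS = ["Social Security Number", "Credit Card Number",
                   "Email Address", "US Phone Number"]

# Per-field overrides (illustrative field names).
FIELD_OVERRIDES = {
    # Email matches are expected in an Email field, so scan for the rest only.
    "Contact.Email": ["Social Security Number", "Credit Card Number"],
    # Short text field: Critical preset only.
    "Case.Subject": ["Social Security Number", "Credit Card Number"],
}

def active_patterns(field: str) -> list[str]:
    """Patterns applied to a field: its override if present, else the global set."""
    return FIELD_OVERRIDES.get(field, GLOBAL_PATTERNS)

print(active_patterns("Contact.Email"))     # override: SSN and Credit Card only
print(active_patterns("Case.Description"))  # no override: global Standard set
```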

Common Issues

Issue	Cause	Fix
100% PII match rate on Email field	Email pattern matches the field’s intended content	Remove the email pattern from that field’s override, or exclude the field from PII scanning
High false positives on Date of Birth	The DOB pattern matches any US-formatted date (meeting dates, deadlines)	Use field-level overrides to apply the DOB pattern only on fields where birth dates are a known risk
No matches found despite known PII	SSN regex only matches hyphenated format (NNN-NN-NNNN), not 9 consecutive digits	Add a custom pattern for the specific format in your data. Example: `\b\d{9}\b` for unformatted SSNs (high false-positive risk)
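
The trade-off in the last row can be checked with a short Python sketch. The hyphenated regex is an assumed stand-in for the predefined SSN pattern; the nine-digit regex is the custom pattern suggested in the table.

```python
import re

HYPHENATED_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # assumed form of the predefined pattern
UNFORMATTED_SSN = re.compile(r"\b\d{9}\b")             # custom pattern from the table

text = "SSN on file: 123456789. Routing number 021000021."
print(HYPHENATED_SSN.findall(text))   # []
print(UNFORMATTED_SSN.findall(text))  # ['123456789', '021000021']
```

The custom pattern finds the unformatted SSN but also flags the nine-digit routing number, which is why it is best applied through a per-field override on fields where unformatted SSNs are a known risk.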

Best Practices

Start with the Standard preset on free-text fields. Run an initial scan to understand your baseline before expanding to Extended.
Use field-level overrides to tune detection per field. Global patterns cast a wide net. Per-field overrides eliminate noise.
Scan unstructured text fields first. Description, Comments, and Notes fields are where PII accumulates through copy-paste and email-to-case. Structured fields (Email, Phone) contain PII by design.
Review matches from patterns with high false-positive risk, such as Date of Birth, before treating them as confirmed PII. These patterns produce more false positives than SSN or Email.
Pair Records with PII (absolute count) with PII Exposure Rate (percentage) for a complete picture. The count scopes your cleanup effort. The rate tells you whether it’s a systemic problem or isolated incidents.

Next Steps

Data Quality in Salesforce: how PII detection fits the bigger picture
Agentforce Preparation: complete deployment readiness guide