What Is PII Detection?
PII Detection scans text fields for personally identifiable information using configurable regex patterns. It answers three questions about your Salesforce data:
- Does my data contain PII that needs protection?
- How exposed is my dataset?
- Which fields hold sensitive information?
DQS profiles the type and density of PII exposure across every text field. It uses pattern-based detection: regex patterns match against field values to flag SSNs, credit cards, emails, phone numbers, and other identifiers.
Three properties define how detection works:
- Deterministic. Same input produces the same result every time.
- Transparent. You see every pattern DQS applies. No black-box scoring.
- On-platform. Detection runs entirely within Salesforce. No data leaves your org.
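As a rough illustration of pattern-based detection, the Python sketch below checks a field value against two regexes and reports which ones matched. The patterns and the `detect_pii` helper are assumptions written for this example; they are not the expressions DQS ships, and DQS itself runs inside Salesforce rather than in Python.

```python
import re

# Illustrative patterns only; the regexes DQS actually applies may differ.
PATTERNS = {
    "Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email Address": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def detect_pii(value: str) -> list[str]:
    """Return the names of every pattern that matches the field value."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(value)]


comment = "Customer confirmed SSN 123-45-6789 and asked us to reply to jane@example.com."
print(detect_pii(comment))
print(detect_pii("Password reset completed."))
```

Because the function depends only on the input text and a fixed pattern list, running it twice on the same value gives the same answer, which is the deterministic property described above.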
Why It Matters
Compliance. GDPR, CCPA, HIPAA, and PCI DSS all mandate identifying and protecting PII. You can’t protect what you haven’t found. Automated detection gives you an inventory of exposure across every text field in scope.
AI readiness. Before feeding data to Agentforce or any AI system, you need to know which fields contain PII. Undetected PII in training data or retrieval indexes creates exposure that no downstream filter can fully prevent.
Data governance. Text fields accumulate PII over time. Agents paste email threads into case comments. Customers provide SSNs for verification. Integrations write contact details into description fields. Without detection, this PII sits unprotected.
How DQS Detects PII
DQS runs PII detection as a progressive diagnostic. Each step builds on the previous one.
Step 1: Is There a PII Problem?
Records with PII gives the absolute count of records where at least one pattern matched. This is the scoping number.
For example: you scan Case comments using the Standard preset. Records with PII comes back as 847. That means 847 case records need review before you can safely use the data for AI training or share it with third-party analytics.
Step 2: How Bad Is It?
PII Exposure Rate gives the percentage of scanned records containing pattern matches. The rate puts the count in context.
847 records out of 1,000 is 84.7% exposure, a systemic problem requiring a process change. 847 out of 500,000 is 0.17%, isolated incidents you can address with targeted cleanup.
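The arithmetic behind Steps 1 and 2 can be sketched in a few lines of Python. The single SSN regex and the function names below are assumptions chosen to keep the example short; they stand in for whatever pattern set is configured.

```python
import re

# Hypothetical stand-in for a configured pattern set (SSN only, for brevity).
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def records_with_pii(records: list[str]) -> int:
    """Count records with at least one match; each record counts once."""
    return sum(1 for text in records if SSN.search(text))


def exposure_rate(matched: int, scanned: int) -> float:
    """Percentage of scanned records that contain a match."""
    return 100.0 * matched / scanned if scanned else 0.0


sample = [
    "SSN 123-45-6789, backup SSN 987-65-4321",  # two matches, counted once
    "No sensitive data here",
    "Caller gave 111-22-3333 for verification",
    "Shipment delayed",
]
count = records_with_pii(sample)
print(count, exposure_rate(count, len(sample)))

# The two scenarios from the text: same count, very different rates.
print(round(exposure_rate(847, 1_000), 2), round(exposure_rate(847, 500_000), 2))
```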
Step 3: What Kind of PII?
The pattern configuration itself tells you what types were scanned. Each pattern has a category: Financial, Contact, Technical, or Identity. By reviewing which patterns triggered matches, you know whether you’re dealing with credit card leaks, email address exposure, or SSN contamination.
The 8 Detection Patterns
DQS ships with 8 predefined regex patterns organized into 4 categories.
Financial
| Pattern | What It Matches | False Positive Risk |
|---|---|---|
| Social Security Number | US SSN in NNN-NN-NNNN format | Low. The hyphenated format is distinctive. |
| Credit Card Number | 13-16 digit sequences with optional spaces/hyphens | Medium. Long numeric sequences (order numbers, tracking IDs) can false-match. |
| IBAN | International bank account numbers (ISO 13616 format) | Low. The country code + check digit prefix is distinctive. |
Contact
| Pattern | What It Matches | False Positive Risk |
|---|---|---|
| Email Address | Standard user@example.com format | Low. The @ symbol structure is distinctive. |
| US Phone Number | US/Canadian formats: (NNN) NNN-NNNN, NNN-NNN-NNNN, +1 variants | Medium. 10-digit numbers with separators can match non-phone data. |
| International Phone | E.164-style numbers starting with + country code | Low. The + prefix is a strong signal. |
Technical
| Pattern | What It Matches | False Positive Risk |
|---|---|---|
| IP Address | IPv4 dotted decimal (NNN.NNN.NNN.NNN) | Low-Medium. Software version numbers are the main false-positive source. |
Identity
| Pattern | What It Matches | False Positive Risk |
|---|---|---|
| Date of Birth | US date format MM/DD/YYYY or MM-DD-YYYY | High. Matches any US-formatted date. Best paired with field-level targeting. |
DQS uses regex-only pattern matching. Detection is format-based, not contextual. There is no checksum validation (Luhn for credit cards, modulo-97 for IBAN), no keyword proximity boosting, and no ML-based confidence scoring. Every match is binary: the pattern matched or it didn’t. This makes detection fully auditable and deterministic, but you need to review matches on fields with high false-positive risk.
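To make the trade-off concrete, the sketch below pairs a format-only credit card regex with a Luhn checksum. The checksum is not part of DQS; it is shown only as the kind of review step you might apply outside DQS to separate card numbers from other long digit strings. The regex is an approximation written for this example, not the pattern DQS uses.

```python
import re

# Approximate format-only pattern: 13-16 digits with optional space/hyphen separators.
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(candidate: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


for text in ("Card on file: 4111 1111 1111 1111", "Order number 1234567890123456"):
    match = CARD.search(text)
    # First value: did the format match? Second: does the match pass Luhn?
    print(bool(match), luhn_valid(match.group()) if match else None)
```

Both strings match the format, but only the first (a widely used test card number) passes the checksum. A regex-only scan flags both, which is why matches from medium-risk patterns need review.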
Regulatory Coverage
Each of the 8 patterns targets an identifier type that major privacy and security frameworks treat as personal or sensitive data.
| Pattern | NIST 800-122 | GDPR | CCPA | PCI DSS | HIPAA | ISO 27701 |
|---|---|---|---|---|---|---|
| SSN | X | X | X | X | X | |
| Credit Card | X | X | X | X | X | |
| Email | X | X | X | X | X | |
| US Phone | X | X | X | X | ||
| Intl Phone | X | X | X | X | ||
| IP Address | X | X | X | X | ||
| IBAN | X | X | ||||
| Date of Birth | X | X | X | X | X |
These are the same identifier types detected as built-in patterns by Google Cloud DLP, AWS Macie, and Microsoft Purview. The difference: cloud DLP tools use multi-layered detection (regex + checksum + keyword proximity + ML). DQS uses regex-only matching, which is simpler and fully transparent but does not provide confidence scoring.
Three Detection Presets
Presets configure which patterns are active in a single click.
| Preset | Patterns | Count | When to Use |
|---|---|---|---|
| Standard | SSN, Credit Card, Email, US Phone | 4 | General PII audit. Covers the four most common types with manageable false-positive rates. This is the default. |
| Critical | SSN, Credit Card | 2 | Financial compliance check. Minimum scan for identity theft and payment card exposure. Use when you need fast results with few false positives; the SSN pattern is low risk, though the Credit Card pattern can still match long numeric IDs. |
| Extended | All 8 patterns | 8 | Full scan. Includes IBAN, IP Address, Date of Birth, and International Phone. Higher false-positive rate in exchange for maximum coverage. Best for first-time audits and compliance assessments. |
You can also add custom regex patterns beyond the 8 predefined ones. Custom patterns are validated server-side before they can be saved. Any valid regex works.
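One way to picture the presets is as a mapping from preset name to active patterns, with custom patterns checked before they are accepted. The Python sketch below models that idea. `PRESETS` and `validate_custom_pattern` are hypothetical names; how DQS validates patterns server-side is not described in this guide, and Python's regex dialect may differ from the one DQS accepts.

```python
import re

ALL_PATTERNS = [
    "SSN", "Credit Card", "IBAN", "Email",
    "US Phone", "International Phone", "IP Address", "Date of Birth",
]

# Preset contents as listed in the table above.
PRESETS = {
    "Standard": ["SSN", "Credit Card", "Email", "US Phone"],
    "Critical": ["SSN", "Credit Card"],
    "Extended": ALL_PATTERNS,
}


def validate_custom_pattern(expression: str) -> bool:
    """Accept the expression only if it compiles as a regex."""
    try:
        re.compile(expression)
    except re.error:
        return False
    return True


print(len(PRESETS["Extended"]))
print(validate_custom_pattern(r"\bEMP-\d{6}\b"))  # hypothetical employee ID format
print(validate_custom_pattern(r"(\d{3}"))         # unbalanced parenthesis, rejected
```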
Metric Reference
Foundation Metrics
| Metric | Type | What It Returns |
|---|---|---|
| Records with PII | Count (integer) | Number of records where at least one pattern matched. A record is counted once regardless of how many patterns matched or how many matches exist within it. |
Advanced Metrics
| Metric | Type | What It Returns |
|---|---|---|
| PII Exposure Rate | Percentage | Percentage of scanned records containing PII matches. This is the headline exposure number for reports and dashboards. |
Field Type Coverage
| Metric | String | TextArea | Phone | EncryptedString | LongTextArea | Html | |
|---|---|---|---|---|---|---|---|
| Records with PII | X | X | X | X | X | ||
| PII Exposure Rate | X | X | X |
Records with PII casts a wide net across all text field types. PII Exposure Rate focuses on longer text fields where PII density is meaningful. A 255-character String field matching an email regex is a single data point. A 32,000-character LongTextArea with 15 SSN matches tells a different story.
Two Analysis Modes
DQS runs PII Detection in two modes.
PII Scan processes all selected fields using the configured patterns and returns Records with PII. This mode answers: “Do I have a PII problem?” Use it for quick audits before data migrations or AI projects.
PII Detection Analysis adds PII Exposure Rate on top of Records with PII. The exposure rate gives context to the raw count, turning “847 records contain PII” into “12.3% of your dataset is exposed.” Use this mode for compliance reporting and ongoing governance.
Configuring PII Detection
| Input | What It Controls |
|---|---|
| Detection Patterns | Which of the 8 predefined patterns are active. Pick a preset or toggle individual patterns. |
| Custom Patterns | Any valid regex pattern, validated server-side. Added alongside predefined patterns. |
| Per-Field Overrides | Different pattern sets for different fields. Override the global configuration on a field-by-field basis. |
Choosing Patterns by Field Type
Different fields need different pattern sets. An Email field already contains email addresses by design. Scanning it for email patterns produces 100% matches, which is expected, not a problem. A Case Description field is free text where any PII type can appear. Configure patterns based on what you expect to find vs. what signals a problem.
Example configurations:
- Email fields: Scan for SSN and Credit Card only (email matches are expected)
- Description and Notes fields: Use Standard or Extended preset (free text can contain anything)
- Short text fields (Subject, Title): Use Critical preset only (low tolerance for false positives)
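The example configurations above amount to a lookup: use the field's own pattern set when one is defined, otherwise fall back to the global set. A minimal Python sketch of that resolution follows. The dictionary layout, the field names, and `patterns_for` are assumptions for illustration; the format DQS uses to store overrides is not shown in this guide.

```python
# Hypothetical configuration shape; not DQS's stored format.
GLOBAL_PATTERNS = ["SSN", "Credit Card", "Email", "US Phone"]  # Standard preset

FIELD_OVERRIDES = {
    "Contact.Email": ["SSN", "Credit Card"],  # email matches are expected here
    "Case.Subject": ["SSN", "Credit Card"],   # Critical preset for short text
}


def patterns_for(field: str) -> list[str]:
    """Use the field's override when one exists, else the global pattern set."""
    return FIELD_OVERRIDES.get(field, GLOBAL_PATTERNS)


print(patterns_for("Contact.Email"))
print(patterns_for("Case.Description"))
```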
Common Issues
| Issue | Cause | Fix |
|---|---|---|
| 100% PII match rate on Email field | Email pattern matches the field’s intended content | Remove the email pattern from that field’s override, or exclude the field from PII scanning |
| High false positives on Date of Birth | The DOB pattern matches any US-formatted date (meeting dates, deadlines) | Use field-level overrides to apply the DOB pattern only on fields where birth dates are a known risk |
| No matches found despite known PII | SSN regex only matches hyphenated format (NNN-NN-NNNN), not 9 consecutive digits | Add a custom pattern for the specific format in your data. Example: \b\d{9}\b for unformatted SSNs (high false-positive risk) |
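The last row is easy to reproduce. In the Python sketch below, the hyphenated regex is an approximation of the predefined SSN format, and the nine-digit regex is the custom pattern from the table. The sample strings are made up for the example.

```python
import re

HYPHENATED_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # approximation of the predefined format
UNFORMATTED_SSN = re.compile(r"\b\d{9}\b")             # custom pattern from the table

texts = ["SSN on file: 123456789", "Routing number 021000021", "SSN 123-45-6789"]
for text in texts:
    # First value: hyphenated pattern matched? Second: nine-digit pattern matched?
    print(bool(HYPHENATED_SSN.search(text)), bool(UNFORMATTED_SSN.search(text)))
```

The nine-digit pattern catches the unformatted SSN that the hyphenated pattern misses, but it also flags the routing number. That is the false-positive cost the table warns about, so a pattern like this is best applied through per-field overrides.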
Best Practices
- Start with the Standard preset on free-text fields. Run an initial scan to understand your baseline before expanding to Extended.
- Use field-level overrides to tune detection per field. Global patterns cast a wide net. Per-field overrides eliminate noise.
- Scan unstructured text fields first. Description, Comments, and Notes fields are where PII accumulates through copy-paste and email-to-case. Structured fields (Email, Phone) contain PII by design.
- Review matches on high-FP patterns like Date of Birth before treating them as confirmed PII. These patterns produce more false positives than SSN or Email.
- Pair Records with PII (absolute count) with PII Exposure Rate (percentage) for a complete picture. The count scopes your cleanup effort. The rate tells you whether it’s a systemic problem or isolated incidents.
Next Steps
- Data Quality in Salesforce: how PII detection fits the bigger picture
- Agentforce Preparation: complete deployment readiness guide