What is Validity?
Validity measures whether data values conform to expected formats and patterns. A value is valid when it matches the defined structure. A value is invalid when it breaks the format rules.
An email address is valid when it contains an “@” symbol and a domain. A URL is valid when it starts with a protocol and contains a domain. A product code is valid when it has the exact character count your system requires.
DQS validates field values using regex (regular expression) patterns. You choose from built-in patterns for common formats like Email, URL, and Fixed Length, or write your own regex for any business-specific format.
Validity Rate = (Records Matching Pattern / Total Records) x 100
If 35,500 of 50,000 Contact records have an email address that matches the email format pattern, your Email validity rate is 71%. The remaining 29% contain values that fail the pattern check.
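The arithmetic above can be checked in a few lines of Python. This is a minimal sketch; `validity_rate` is an illustrative helper name, not part of DQS.

```python
def validity_rate(matching_records: int, total_records: int) -> float:
    """Return the validity rate as a percentage (0 to 100)."""
    if total_records <= 0:
        raise ValueError("total_records must be greater than zero")
    return matching_records / total_records * 100


# Worked example from the text: 35,500 of 50,000 Contact emails match.
rate = validity_rate(35_500, 50_000)
print(f"Validity rate: {rate:.0f}%")            # Validity rate: 71%
print(f"Failing the check: {100 - rate:.0f}%")  # Failing the check: 29%
```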
Validity vs Accuracy
Validity and accuracy are different concepts:
| Value | Valid? | Accurate? |
|---|---|---|
| [email protected] | Yes | Unknown without verification |
| john@company | No | N/A (format is wrong) |
| [email protected] | Yes | No (person left the company) |
| 555-123-4567 | Yes | Unknown without calling |
| 555-12-456 | No | N/A (wrong digit count) |
DQS measures validity because format checks can be automated. Accuracy requires external verification or human confirmation.
Valid data works in your systems even if it does not reflect reality. Invalid data breaks your systems regardless of its real-world truth. Focus on validity first. Address accuracy through verification processes.
Why Validity Matters
Invalid data causes failures across your entire stack. Bounced emails damage sender reputation. Malformed phone numbers waste dialer time. Broken URLs frustrate users and block enrichment tools.
APIs reject malformed data. When your integration sends an invalid email format to a marketing platform, the entire batch can fail. Salesforce flows that parse field values break when the format is unexpected.
AI models process text as-is. When a phone field contains “Phone: 555-1234” instead of a clean number, the model sees inconsistent patterns. Invalid formats reduce AI effectiveness and produce unreliable Agentforce outputs.
| System | Validity Impact |
|---|---|
| Email campaigns | Bounces damage sender reputation |
| Telephony | Invalid numbers waste dialer time |
| Web links | Broken URLs block enrichment and navigation |
| APIs | Malformed data causes sync failures |
| AI and Agentforce | Inconsistent formats reduce model accuracy |
How DQS Measures Validity
DQS produces 6 validity metrics organized around a diagnostic question: “Does the data match the pattern, and is there junk hiding in values that pass?”
Think of these metrics as a diagnostic flow. Each step reveals a deeper layer of the problem.
Step 1: Does It Match the Pattern?
Validity Rate is the headline metric. It calculates the percentage of records where the field value matches your configured pattern. This is the number you put on a dashboard.
You configure the Email pattern on the Email field for Contacts. Validity Rate comes back at 71%. That means 29% of email addresses fail the format check. They are missing the “@” symbol, have no domain, or contain spaces. Any marketing email sent to those addresses will bounce, and automated workflows that depend on the email value can fail silently.
Valid Count tells you the absolute number. Of 50,000 Contacts, 35,500 have valid email addresses. That is your actual addressable audience for email campaigns, not the 50,000 in the system. Marketing can set realistic campaign projections instead of working from inflated numbers.
Step 2: What Is the Full Breakdown?
Rates tell you severity. Counts tell you workload. Two metrics complete the picture:
| Metric | What It Tells You |
|---|---|
| Invalid Rate | The negative framing of your validity score. “29% of our email addresses are structurally invalid” gets more attention in a board presentation than “71% are valid.” Same data, framed for action. |
| Invalid Count | The cleanup workload as a hard number. Your company is migrating to a new telephony system requiring E.164 format. Invalid Count on the Phone field: 23,400. That is the exact number of records that need reformatting before the migration can go live. |
Step 3: Is There Junk Beyond Format Errors?
A value can pass a format check and still be garbage. Your web-to-lead form requires a Company field. Validity Rate on Company is 98%, because almost everything passes a basic text pattern. But Noise Rate reveals that 14% of records hold entries like “asdf”, “test”, “xxxxx”, or “na na na.” These values are format-valid but useless for sales routing, enrichment, or segmentation.
Noisy Records Count gives you the cleanup scope. If Noise Rate is 14% on 50,000 records, that is 7,000 leads with garbage company names. Your ops team can build a cleanup queue, estimate hours, and decide whether to auto-delete or flag for manual review.
Two Categories of Failure
Validity metrics distinguish two fundamentally different problems:
| Problem | Metrics | Root Cause | Fix |
|---|---|---|---|
| Format errors | Validity Rate, Invalid Rate, Valid/Invalid Count | Human mistakes, integration bugs, missing validation rules | Clean the data: field validation rules, data transformation, enrichment |
| Noise and junk | Noise Rate, Noisy Records Count | Bots, forced form submissions, bulk imports with garbage defaults | Fix the source: CAPTCHA, required field redesign, record deletion |
The distinction matters because the fix is completely different. Format errors are remediated by cleaning the data. Noise is remediated by fixing the source that produces it.
Metric Reference
Foundation Metrics
These 2 metrics form the base of every validity analysis. They tell you the match rate and the number of records that pass.
| Metric | Type | What It Measures |
|---|---|---|
| Validity Rate | Percentage | Share of records matching the configured pattern |
| Valid Count | Count | Number of records matching the configured pattern |
Advanced Metrics
These 4 metrics go beyond “does it match?” to give the full breakdown, including noise detection. They require the Advanced Format Validation analysis mode.
| Metric | Type | What It Measures |
|---|---|---|
| Invalid Rate | Percentage | Share of records failing the configured pattern |
| Invalid Count | Count | Number of records failing the configured pattern |
| Noise Rate | Percentage | Share of records containing noise patterns (junk data) |
| Noisy Records Count | Count | Number of records containing noise patterns |
Why Rates and Counts Come in Pairs
Each validity metric comes as a rate (percentage) paired with a count (absolute number). This is intentional:
- Rates are for dashboards, executive reporting, and trend tracking. “Validity improved from 71% to 92% this quarter.”
- Counts are for project planning, workload estimation, and cleanup scoping. “We have 23,400 phone numbers to reformat.”
Use rates to communicate progress. Use counts to plan work.
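One way to keep each rate and its count in agreement is to store only the counts and derive the rates from them. The sketch below is an assumption about how you might model this in your own reporting code; `ValidityMetrics` and its field names are hypothetical, and it assumes every rate uses the number of evaluated records as its denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityMetrics:
    """Hypothetical container: counts are stored, rates are derived from them."""

    evaluated: int    # records included in the scan
    valid_count: int  # records matching the configured pattern
    noisy_count: int  # records flagged by noise detection

    def _rate(self, count: int) -> float:
        # Percentage of evaluated records, rounded for reporting.
        if self.evaluated == 0:
            return 0.0
        return round(count / self.evaluated * 100, 1)

    @property
    def invalid_count(self) -> int:
        return self.evaluated - self.valid_count

    @property
    def validity_rate(self) -> float:
        return self._rate(self.valid_count)

    @property
    def invalid_rate(self) -> float:
        return self._rate(self.invalid_count)

    @property
    def noise_rate(self) -> float:
        return self._rate(self.noisy_count)


# Company field example from earlier: 98% valid and 14% noisy on 50,000 records.
company = ValidityMetrics(evaluated=50_000, valid_count=49_000, noisy_count=7_000)
print(company.validity_rate, company.invalid_rate, company.noise_rate)  # 98.0 2.0 14.0
print(company.invalid_count)  # 1000
```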
Field Type Coverage
The four pattern-based metrics support all 6 field types. The two noise metrics are limited to String and TextArea fields.
| Metric | All 6 Field Types | String and TextArea Only |
|---|---|---|
| Validity Rate | X | |
| Valid Count | X | |
| Invalid Rate | X | |
| Invalid Count | X | |
| Noise Rate | | X |
| Noisy Records Count | | X |
Pattern-based metrics (Validity Rate, Valid Count, Invalid Rate, Invalid Count) work on all 6 supported field types: String, TextArea, Email, Phone, URL, and Picklist.
Noise metrics (Noise Rate, Noisy Records Count) apply only to String and TextArea fields. Noise patterns like repeated characters and keyboard smash are free-text phenomena. A Picklist field with a valid picklist value cannot contain noise. Noise detection only makes sense on fields where users type free text.
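The coverage rules above amount to a small lookup. The sketch below encodes them for illustration only; the constant and function names are hypothetical and do not correspond to any DQS API.

```python
PATTERN_METRICS = ["Validity Rate", "Valid Count", "Invalid Rate", "Invalid Count"]
NOISE_METRICS = ["Noise Rate", "Noisy Records Count"]

SUPPORTED_FIELD_TYPES = {"String", "TextArea", "Email", "Phone", "URL", "Picklist"}
NOISE_FIELD_TYPES = {"String", "TextArea"}  # free-text fields only


def available_metrics(field_type: str) -> list:
    """Return the validity metrics that apply to a given field type."""
    if field_type not in SUPPORTED_FIELD_TYPES:
        return []
    metrics = list(PATTERN_METRICS)
    if field_type in NOISE_FIELD_TYPES:
        metrics += NOISE_METRICS
    return metrics


print(len(available_metrics("TextArea")))  # 6
print(len(available_metrics("Picklist")))  # 4
```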
Two Analysis Modes
DQS offers two validity analysis modes:
Format Validation answers the question: “Do field values match the expected pattern?” It produces the 2 foundation metrics and covers the essentials for a format compliance check or quick audit.
Advanced Format Validation goes deeper. It produces all 6 metrics, including the full valid/invalid breakdown and noise detection. Use this mode when you need to distinguish between format errors and junk data, or when you need precise counts for cleanup project planning.
| Business Need | Recommended Mode |
|---|---|
| Quick format compliance check | Format Validation |
| Compliance reporting or audit | Advanced (full valid/invalid breakdown for regulators) |
| Lead quality assessment | Advanced (Noise Rate catches junk that passes format checks) |
| Pre-migration data assessment | Advanced (full breakdown to scope remediation by category) |
| Ongoing data governance | Start with Format Validation, move to Advanced for noise detection |
Configuring Validity
Unlike completeness (which works automatically on any field), validity requires configuration. You must define what “valid” means for each field before DQS can check it. A validity scan without a pattern is meaningless: valid compared to what?
DQS provides 5 configuration inputs. Each can be set at the global level (applies to all fields) and overridden at the individual field level.
| Setting | What It Controls |
|---|---|
| Pattern Type | The format to validate against. Choose from Email, URL, Fixed Length, or Custom regex. Required: you must select a pattern type before running a scan. |
| Pattern / Fixed Length | The specific value for your chosen type. For Fixed Length, enter a character count (1 to 255). For Custom, enter a regex pattern. Email and URL use built-in patterns. |
| Custom Pattern | Your own regex when Pattern Type is set to Custom. DQS validates your regex before saving and blocks invalid expressions. |
| Include Blanks | When enabled, DQS counts blank values as invalid. When disabled (the default), blanks are excluded from evaluation entirely. |
| Case Sensitive | When enabled, pattern matching considers letter casing. When disabled (the default), matching is case-insensitive. |
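To see how Include Blanks and Case Sensitive change the counts, consider the sketch below. It is an approximation under stated assumptions: a blank is treated as `None` or a whitespace-only string, the pattern must match the whole value, and `evaluate` along with the two-letter, three-digit code pattern are hypothetical. DQS may define these details differently.

```python
import re
from typing import Iterable, Optional, Tuple


def evaluate(
    values: Iterable[Optional[str]],
    pattern: str,
    include_blanks: bool = False,
    case_sensitive: bool = False,
) -> Tuple[int, int]:
    """Return (valid_count, invalid_count) under the two toggles."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    valid = invalid = 0
    for value in values:
        if value is None or value.strip() == "":
            if include_blanks:
                invalid += 1  # blank counted as invalid
            continue  # blank handled; never matched against the pattern
        if regex.fullmatch(value):
            valid += 1
        else:
            invalid += 1
    return valid, invalid


codes = ["AB123", "ab123", "A1", "", None]
pattern = r"[A-Z]{2}\d{3}"  # hypothetical code format

print(evaluate(codes, pattern))                       # (2, 1)
print(evaluate(codes, pattern, include_blanks=True))  # (2, 3)
print(evaluate(codes, pattern, case_sensitive=True))  # (1, 2)
```

With both defaults, the two blanks drop out of the denominator, so the validity rate is computed over 3 records rather than 5.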
Pattern Types
| Type | What It Validates | Example Pass | Example Fail |
|---|---|---|---|
| Email | Standard email address format: [email protected] | [email protected] | user@domain, invalid-email |
| URL | HTTP/HTTPS web addresses with valid domain | https://example.com | example.com, htp://site.com |
| Fixed Length | Exact character count (you define the number) | AAAAAAAAAA (10 chars, if length = 10) | SHORT (5 chars) |
| Custom | Any regex pattern you define | Depends on your pattern | Depends on your pattern |
Example: Your product codes follow the format “DQS-” followed by 6 digits. Set Pattern Type to Custom and enter the regex ^DQS-\d{6}$. DQS flags any product code that does not match this structure.
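Before saving a custom pattern, it can help to try it against a few sample values. The sketch below tests the product code regex from the example and shows the equivalent of a Fixed Length check. The function names are illustrative, and Python's regex engine may differ in small ways from the one DQS uses.

```python
import re

PRODUCT_CODE = re.compile(r"^DQS-\d{6}$")  # pattern from the example above


def matches_product_code(value: str) -> bool:
    # fullmatch also rejects a trailing newline, which "$" alone would allow.
    # This sketch is case-sensitive; pass re.IGNORECASE to compile() to relax it.
    return PRODUCT_CODE.fullmatch(value) is not None


def matches_fixed_length(value: str, length: int) -> bool:
    """Fixed Length check: the value has exactly `length` characters."""
    return len(value) == length


print(matches_product_code("DQS-123456"))      # True
print(matches_product_code("DQS-12345"))       # False (only 5 digits)
print(matches_product_code("XDQS-123456"))     # False (extra prefix)
print(matches_fixed_length("AAAAAAAAAA", 10))  # True
print(matches_fixed_length("SHORT", 10))       # False
```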
Noise Detection
Noise detection catches data that passes format checks but is still garbage. DQS uses two built-in heuristics to identify noisy values:
Heuristic 1: Consecutive identical characters. Three or more of the same character in a row. Values like “aaaa”, “!!!”, “---”, or “xxxxx” trigger this check. These typically come from keyboard holding, padding, or placeholder abuse.
Heuristic 2: Excessive special characters. More than 50% non-alphanumeric characters (excluding spaces). Values like “!@#$%^” or “***///---” trigger this check. These indicate keyboard smash, bot input, or deliberate junk entry.
| Heuristic | What It Catches | Example Noisy Values | Example Clean Values |
|---|---|---|---|
| 3+ consecutive identical characters | Padding, filler, keyboard holding | “aaaa”, “!!!”, “---”, “xxxxx” | “Premium”, “DOT AB3 2024” |
| More than 50% special characters | Keyboard smash, bot input, junk | “!@#$%^”, “***///---”, “//--//” | “[email protected]”, “O’Brien Inc” |
You can also define custom noise patterns using regex for org-specific junk that the built-in heuristics do not cover.
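The two built-in heuristics can be approximated as follows. This is a sketch of the rules as described above, not the DQS implementation. It assumes that "excluding spaces" means whitespace is left out of both the special-character count and the total, and all function names are hypothetical.

```python
import re

REPEATED_CHARS = re.compile(r"(.)\1{2,}")  # same character 3+ times in a row


def has_repeated_chars(value: str) -> bool:
    """Heuristic 1: three or more identical characters in a row."""
    return REPEATED_CHARS.search(value) is not None


def has_excessive_specials(value: str) -> bool:
    """Heuristic 2: more than 50% non-alphanumeric characters, ignoring spaces."""
    chars = [c for c in value if not c.isspace()]
    if not chars:
        return False
    specials = sum(1 for c in chars if not c.isalnum())
    return specials / len(chars) > 0.5


def is_noisy(value: str) -> bool:
    return has_repeated_chars(value) or has_excessive_specials(value)


for sample in ["aaaa", "!@#$%^", "Premium", "DOT AB3 2024", "asdf"]:
    print(f"{sample!r}: {is_noisy(sample)}")
```

Under this sketch, “asdf” and “test” pass both heuristics because they contain no repeated runs and no special characters. Junk of that kind would need a custom noise pattern.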
Tip: Noise detection is most valuable on free-text fields where users can type anything: Company, Description, Notes, and custom text fields. Run it on your web-to-lead fields first, where bot submissions and forced entries are most common.
Common Validity Issues
Invalid Email Addresses
Users enter emails without proper format. Missing “@” symbols, missing domains, double dots, and typos are the most common problems.
| Issue | Example |
|---|---|
| Missing @ | john.company.com |
| Missing domain | john@ |
| Double dots | [email protected] |
| Typos | [email protected] |
Impact: Bounced emails, damaged sender score, lost communication.
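When triaging failed records, it can be useful to label why a value fails rather than only that it fails. The sketch below applies a few simple structural checks in order. It is not the built-in Email pattern, `diagnose_email` is a hypothetical helper, and the sample addresses are made up for illustration.

```python
from typing import Optional


def diagnose_email(value: str) -> Optional[str]:
    """Return a short reason when the structure is broken, else None."""
    if "@" not in value:
        return "missing @"
    local, _, domain = value.rpartition("@")
    if not local:
        return "missing local part"
    if "." not in domain.strip("."):
        return "missing or incomplete domain"
    if ".." in value:
        return "double dots"
    return None


for sample in ["john.company.com", "john@", "john@company",
               "john..doe@company.com", "jane@example.com"]:
    print(f"{sample}: {diagnose_email(sample)}")
```

A misspelled but well-formed domain would return `None` here, since structural checks of this kind cannot tell a typo from a real domain.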
Malformed Phone Numbers
Phone fields accept any text in Salesforce, leading to inconsistent and invalid formats.
| Issue | Example |
|---|---|
| Letters mixed in | 555-CALL-NOW |
| Wrong digit count | 555-12 |
| Extension in field | 555-1234 ext 5 |
| Country code confusion | 1-555-123-4567 vs 555-123-4567 |
Impact: Failed calls, wasted sales time, telephony sync errors.
Invalid URLs
Web address fields often contain partial or malformed values.
| Issue | Example |
|---|---|
| Missing protocol | www.company.com |
| Missing domain | https:// |
| Typos | htps://company.com |
| Social handles | @company (not a URL) |
Impact: Broken links, failed enrichment, navigation errors.
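The issues in the table can be reproduced with Python's standard `urllib.parse` module. This sketch assumes a URL is acceptable when it uses `http` or `https` and has a host containing at least one dot. That rule is an illustrative assumption, not the built-in DQS URL pattern.

```python
from urllib.parse import urlsplit


def is_valid_url(value: str) -> bool:
    """Check for an http/https scheme and a dotted host name."""
    try:
        parts = urlsplit(value.strip())
        host = parts.hostname
    except ValueError:  # raised for malformed hosts, e.g. unbalanced brackets
        return False
    if parts.scheme not in ("http", "https"):
        return False
    return bool(host) and "." in host


for sample in ["https://example.com", "www.company.com", "https://",
               "htps://company.com", "@company"]:
    print(f"{sample}: {is_valid_url(sample)}")
```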
Best Practices
Validate at Entry
The best validity check happens at data entry. Use Salesforce validation rules to enforce formats before data enters your system.
/* Example: Email format validation rule. Fires when Email is populated but does not match the pattern. */
NOT(ISBLANK(Email)) && NOT(REGEX(Email, "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"))
Standardize Formats Before Scanning
Choose one format for each field and enforce it. For phone numbers, E.164 (+15551234567) is the most widely accepted standard. For URLs, require the https:// protocol. Document your format decisions so the team knows the standard.
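A normalization pass for phone numbers might look like the sketch below. It assumes every number belongs to the North American Numbering Plan (country code 1), which will not hold for international data, and it deliberately returns `None` for anything containing letters so that extensions and vanity numbers go to manual review. `to_e164_nanp` is a hypothetical helper name.

```python
import re
from typing import Optional


def to_e164_nanp(raw: str) -> Optional[str]:
    """Normalize a North American number to E.164, or return None."""
    if re.search(r"[A-Za-z]", raw):
        return None  # letters or extensions need manual review
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses, "+"
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None  # wrong digit count


print(to_e164_nanp("555-123-4567"))    # +15551234567
print(to_e164_nanp("1-555-123-4567"))  # +15551234567
print(to_e164_nanp("555-1234 ext 5"))  # None
print(to_e164_nanp("555-12"))          # None
```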
Set Thresholds by Field Priority
Different fields need different validity standards:
| Field | Suggested Threshold | Rationale |
|---|---|---|
| Primary Email | 95%+ | Critical for communication |
| Phone | 90%+ | Important but legacy data expected |
| Website | 85%+ | Often entered incompletely |
| Custom text codes | 98%+ | System-generated, expect high compliance |
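The thresholds in the table can be compared against scan results with a short script. This is a sketch: the observed rates are illustrative (only the 71% email figure comes from this article), and `below_threshold` is a hypothetical helper.

```python
THRESHOLDS = {
    "Primary Email": 95.0,
    "Phone": 90.0,
    "Website": 85.0,
    "Custom text codes": 98.0,
}


def below_threshold(observed: dict) -> dict:
    """Return {field: shortfall in percentage points} for fields under target."""
    return {
        field: round(THRESHOLDS[field] - rate, 1)
        for field, rate in observed.items()
        if field in THRESHOLDS and rate < THRESHOLDS[field]
    }


# Illustrative scan results; replace with your own validity rates.
observed = {"Primary Email": 71.0, "Phone": 92.5, "Website": 80.0}
print(below_threshold(observed))  # {'Primary Email': 24.0, 'Website': 5.0}
```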
Use Noise Detection on Free-Text Fields
Run noise detection on fields where users type free text: Company, Description, custom text fields, and any field populated by web forms. Noise Rate reveals problems that format validation misses.
Document Expected Formats
Create a data dictionary that specifies the expected format for each field, acceptable variations, and examples of valid and invalid values. Share this with your team and reference it during data cleanup projects.