
Validity

All 6 validity metrics DQS measures, the diagnostic flow for finding format errors and noise, and how to configure pattern-based validation.

What is Validity?

Validity measures whether data values conform to expected formats and patterns. A value is valid when it matches the defined structure. A value is invalid when it breaks the format rules.

An email address is valid when it contains an “@” symbol and a domain. A URL is valid when it starts with a protocol and contains a domain. A product code is valid when it has the exact character count your system requires.

DQS validates field values using regex (regular expression) patterns. You choose from built-in patterns for common formats like Email, URL, and Fixed Length, or write your own regex for any business-specific format.

Validity Rate = (Records Matching Pattern / Total Records) x 100

If 35,500 of 50,000 Contact records have an email address that matches the email format pattern, your Email validity rate is 71%. The remaining 29% contain values that fail the pattern check.
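The formula is plain arithmetic. A minimal Python sketch reproduces the Contact email example (validity_rate is a hypothetical helper name, not part of DQS):

```python
def validity_rate(matching: int, total: int) -> float:
    """Validity Rate = (Records Matching Pattern / Total Records) x 100."""
    if total <= 0:
        raise ValueError("total must be a positive record count")
    # Multiply first so whole-number results stay exact in floating point.
    return matching * 100 / total


rate = validity_rate(35_500, 50_000)
print(f"Validity Rate: {rate:.1f}%")        # Validity Rate: 71.0%
print(f"Failing check: {100 - rate:.1f}%")  # Failing check: 29.0%
```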

Validity vs Accuracy

Validity and accuracy are different concepts:

| Value | Valid? | Accurate? |
|---|---|---|
| [email protected] | Yes | Unknown without verification |
| john@company | No | N/A (format is wrong) |
| [email protected] | Yes | No (person left the company) |
| 555-123-4567 | Yes | Unknown without calling |
| 555-12-456 | No | N/A (wrong digit count) |

DQS measures validity because format checks can be automated. Accuracy requires external verification or human confirmation.

Valid data works in your systems even if it does not reflect reality. Invalid data breaks your systems regardless of its real-world truth. Focus on validity first. Address accuracy through verification processes.

Why Validity Matters

Invalid data causes failures across your entire stack. Bounced emails damage sender reputation. Malformed phone numbers waste dialer time. Broken URLs frustrate users and block enrichment tools.

APIs reject malformed data. When your integration sends an invalid email format to a marketing platform, the entire batch can fail. Salesforce flows that parse field values break when the format is unexpected.

AI models process text as-is. When a phone field contains “Phone: 555-1234” instead of a clean number, the model sees inconsistent patterns. Invalid formats reduce AI effectiveness and produce unreliable Agentforce outputs.

| System | Validity Impact |
|---|---|
| Email campaigns | Bounces damage sender reputation |
| Telephony | Invalid numbers waste dialer time |
| Web links | Broken URLs block enrichment and navigation |
| APIs | Malformed data causes sync failures |
| AI and Agentforce | Inconsistent formats reduce model accuracy |

How DQS Measures Validity

DQS produces 6 validity metrics organized around a diagnostic question: “Does the data match the pattern, and is there junk hiding in values that pass?”

Think of these metrics as a diagnostic flow. Each step reveals a deeper layer of the problem.

Step 1: Does It Match the Pattern?

Validity Rate is the headline metric. It calculates the percentage of records where the field value matches your configured pattern. This is the number you put on a dashboard.

You configure the Email pattern on the Email field for Contacts. Validity Rate comes back at 71%. That means 29% of email addresses fail the format check. They are missing the “@” symbol, have no domain, or contain spaces. Every marketing campaign sent to those addresses bounces. Every automated workflow that triggers on email fails silently.

Valid Count tells you the absolute number. Of 50,000 Contacts, 35,500 have valid email addresses. That is your actual addressable audience for email campaigns, not the 50,000 in the system. Marketing can set realistic campaign projections instead of working from inflated numbers.

Step 2: What Is the Full Breakdown?

Rates tell you severity. Counts tell you workload. Two metrics complete the picture:

| Metric | What It Tells You |
|---|---|
| Invalid Rate | The negative framing of your validity score. “29% of our email addresses are structurally invalid” gets more attention in a board presentation than “71% are valid.” Same data, framed for action. |
| Invalid Count | The cleanup workload as a hard number. Your company is migrating to a new telephony system requiring E.164 format. Invalid Count on the Phone field: 23,400. That is the exact number of records that need reformatting before the migration can go live. |

Step 3: Is There Junk Beyond Format Errors?

A value can pass a format check and still be garbage. Your web-to-lead form requires a Company field. Validity Rate on Company is 98%, because almost everything passes a basic text pattern. But Noise Rate reveals 14% of those values are entries like “asdf”, “test”, “xxxxx”, or “na na na.” Format-valid, but completely useless for sales routing, enrichment, or segmentation.

Noisy Records Count gives you the cleanup scope. If Noise Rate is 14% on 50,000 records, that is 7,000 leads with garbage company names. Your ops team can build a cleanup queue, estimate hours, and decide whether to auto-delete or flag for manual review.

Two Categories of Failure

Validity metrics distinguish two fundamentally different problems:

| Problem | Metrics | Root Cause | Fix |
|---|---|---|---|
| Format errors | Validity Rate, Invalid Rate, Valid/Invalid Count | Human mistakes, integration bugs, missing validation rules | Clean the data: field validation rules, data transformation, enrichment |
| Noise and junk | Noise Rate, Noisy Records Count | Bots, forced form submissions, bulk imports with garbage defaults | Fix the source: CAPTCHA, required field redesign, record deletion |

The distinction matters because the fix is completely different. Format errors are remediated by cleaning the data. Noise is remediated by fixing the source that produces it.

Metric Reference

Foundation Metrics

These 2 metrics form the base of every validity analysis. They tell you the match rate and the number of records that pass.

| Metric | Type | What It Measures |
|---|---|---|
| Validity Rate | Percentage | Share of records matching the configured pattern |
| Valid Count | Count | Number of records matching the configured pattern |

Advanced Metrics

These 4 metrics go beyond “does it match?” to give the full breakdown, including noise detection. They require the Advanced Format Validation analysis mode.

| Metric | Type | What It Measures |
|---|---|---|
| Invalid Rate | Percentage | Share of records failing the configured pattern |
| Invalid Count | Count | Number of records failing the configured pattern |
| Noise Rate | Percentage | Share of records containing noise patterns (junk data) |
| Noisy Records Count | Count | Number of records containing noise patterns |
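To make the relationship between the six metrics concrete, here is a minimal Python sketch that computes all of them from a list of field values. The names (compute_metrics, ValidityMetrics) and the two predicate arguments are hypothetical, not a DQS API. It assumes Noise Rate is measured against the same evaluated records as Validity Rate, and the include_blanks flag mirrors the Include Blanks setting described under Configuring Validity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ValidityMetrics:
    validity_rate: float      # foundation metric
    valid_count: int          # foundation metric
    invalid_rate: float       # advanced metric
    invalid_count: int        # advanced metric
    noise_rate: float         # advanced metric
    noisy_records_count: int  # advanced metric


def compute_metrics(
    values: Iterable[Optional[str]],
    matches: Callable[[str], bool],
    is_noisy: Callable[[str], bool],
    include_blanks: bool = False,
) -> ValidityMetrics:
    evaluated = valid = noisy = 0
    for raw in values:
        value = (raw or "").strip()
        if not value:
            if include_blanks:
                evaluated += 1  # blank is evaluated and counted as invalid
            continue            # otherwise blanks are left out entirely
        evaluated += 1
        if matches(value):
            valid += 1
        if is_noisy(value):
            noisy += 1

    def pct(n: int) -> float:
        return n * 100 / evaluated if evaluated else 0.0

    invalid = evaluated - valid
    return ValidityMetrics(pct(valid), valid, pct(invalid), invalid,
                           pct(noisy), noisy)


# Toy predicates, for illustration only.
sample = ["ok@example.com", "bad-email", "", None, "xxxxx@example.com"]
m = compute_metrics(sample,
                    matches=lambda v: "@" in v,
                    is_noisy=lambda v: "xxx" in v)
print(m.valid_count, m.invalid_count, m.noisy_records_count)  # 2 1 1
```

Valid and invalid counts always sum to the evaluated records, so Validity Rate and Invalid Rate always sum to 100%. Noise is tracked independently, which is how a value can be both valid and noisy.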

Why Rates and Counts Come in Pairs

Most metrics come as a rate (percentage) and a count (absolute number). This is intentional:

  • Rates are for dashboards, executive reporting, and trend tracking. “Validity improved from 71% to 92% this quarter.”
  • Counts are for project planning, workload estimation, and cleanup scoping. “We have 23,400 phone numbers to reformat.”

Use rates to communicate progress. Use counts to plan work.

Field Type Coverage

Pattern-based validity metrics support all 6 field types, while noise metrics are limited to text fields.

| Metric | All 6 Field Types | String and TextArea Only |
|---|---|---|
| Validity Rate | X | |
| Valid Count | X | |
| Invalid Rate | X | |
| Invalid Count | X | |
| Noise Rate | | X |
| Noisy Records Count | | X |

Pattern-based metrics (Validity Rate, Valid Count, Invalid Rate, Invalid Count) work on all 6 supported field types: String, TextArea, Email, Phone, URL, and Picklist.

Noise metrics (Noise Rate, Noisy Records Count) apply only to String and TextArea fields. Noise patterns like repeated characters and keyboard smash are free-text phenomena. A Picklist field with a valid picklist value cannot contain noise. Noise detection only makes sense on fields where users type free text.

Two Analysis Modes

DQS offers two validity analysis modes:

Format Validation answers the question: “Do field values match the expected pattern?” It produces the 2 foundation metrics and covers the essentials for a format compliance check or quick audit.

Advanced Format Validation goes deeper. It produces all 6 metrics, including the full valid/invalid breakdown and noise detection. Use this mode when you need to distinguish between format errors and junk data, or when you need precise counts for cleanup project planning.

| Business Need | Recommended Mode |
|---|---|
| Quick format compliance check | Format Validation |
| Compliance reporting or audit | Advanced (full valid/invalid breakdown for regulators) |
| Lead quality assessment | Advanced (Noise Rate catches junk that passes format checks) |
| Pre-migration data assessment | Advanced (full breakdown to scope remediation by category) |
| Ongoing data governance | Start with Format Validation, move to Advanced for noise detection |
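The mode and field-type rules above combine into a small lookup. A sketch in Python with hypothetical names; it assumes that Advanced Format Validation on a non-text field such as Picklist returns the four pattern-based metrics and omits the two noise metrics.

```python
FOUNDATION = ["Validity Rate", "Valid Count"]
PATTERN_BREAKDOWN = ["Invalid Rate", "Invalid Count"]
NOISE = ["Noise Rate", "Noisy Records Count"]

SUPPORTED_TYPES = {"String", "TextArea", "Email", "Phone", "URL", "Picklist"}
FREE_TEXT_TYPES = {"String", "TextArea"}  # only types eligible for noise metrics


def metrics_for(mode: str, field_type: str) -> list:
    if field_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported field type: {field_type}")
    if mode == "Format Validation":
        return FOUNDATION.copy()
    if mode == "Advanced Format Validation":
        metrics = FOUNDATION + PATTERN_BREAKDOWN  # new list, constants untouched
        if field_type in FREE_TEXT_TYPES:
            metrics += NOISE
        return metrics
    raise ValueError(f"unknown analysis mode: {mode}")


print(metrics_for("Advanced Format Validation", "Picklist"))
# ['Validity Rate', 'Valid Count', 'Invalid Rate', 'Invalid Count']
```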

Configuring Validity

Unlike completeness (which works automatically on any field), validity requires configuration. You must define what “valid” means for each field before DQS can check it. A validity scan without a pattern is meaningless: valid compared to what?

DQS provides 5 configuration inputs. Each can be set at the global level (applies to all fields) and overridden at the individual field level.

| Setting | What It Controls |
|---|---|
| Pattern Type | The format to validate against. Choose from Email, URL, Fixed Length, or Custom regex. Required: you must select a pattern type before running a scan. |
| Pattern / Fixed Length | The specific value for your chosen type. For Fixed Length, enter a character count (1 to 255). For Custom, enter a regex pattern. Email and URL use built-in patterns. |
| Custom Pattern | Your own regex when Pattern Type is set to Custom. DQS validates your regex before saving and blocks invalid expressions. |
| Include Blanks | When enabled, DQS counts blank values as invalid. When disabled (the default), blanks are excluded from evaluation entirely. |
| Case Sensitive | When enabled, pattern matching considers letter casing. When disabled (the default), matching is case-insensitive. |
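One way to model global settings with field-level overrides is an immutable config object in which any value set on the field replaces the global one. The sketch below is hypothetical (ValidityConfig and effective_config are not DQS names); it applies the required Pattern Type rule and the 1 to 255 Fixed Length range from the table.

```python
from dataclasses import dataclass, replace
from typing import Optional

PATTERN_TYPES = {"Email", "URL", "Fixed Length", "Custom"}


@dataclass(frozen=True)
class ValidityConfig:
    pattern_type: Optional[str] = None    # required before a scan
    fixed_length: Optional[int] = None    # 1 to 255, for "Fixed Length"
    custom_pattern: Optional[str] = None  # regex, for "Custom"
    include_blanks: bool = False          # default: blanks excluded
    case_sensitive: bool = False          # default: case-insensitive


def effective_config(global_cfg: ValidityConfig, **field_overrides) -> ValidityConfig:
    """Field-level values win; anything not overridden falls back to global."""
    cfg = replace(global_cfg, **field_overrides)
    if cfg.pattern_type not in PATTERN_TYPES:
        raise ValueError("select a Pattern Type before running a scan")
    if cfg.pattern_type == "Fixed Length" and not (
        cfg.fixed_length is not None and 1 <= cfg.fixed_length <= 255
    ):
        raise ValueError("Fixed Length must be between 1 and 255")
    if cfg.pattern_type == "Custom" and not cfg.custom_pattern:
        raise ValueError("Custom requires a regex pattern")
    return cfg


global_cfg = ValidityConfig(pattern_type="Email")
product_code = effective_config(
    global_cfg, pattern_type="Custom", custom_pattern=r"^DQS-\d{6}$",
    case_sensitive=True,
)
print(product_code.include_blanks, product_code.case_sensitive)  # False True
```

Because the global object is frozen, a field override produces a new config and never changes the defaults other fields inherit.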

Pattern Types

| Type | What It Validates | Example Pass | Example Fail |
|---|---|---|---|
| Email | Standard email address format: [email protected] | [email protected] | user@domain, invalid-email |
| URL | HTTP/HTTPS web addresses with valid domain | https://example.com | example.com, htp://site.com |
| Fixed Length | Exact character count (you define the number) | AAAAAAAAAA (10 chars, if length = 10) | SHORT (5 chars) |
| Custom | Any regex pattern you define | Depends on your pattern | Depends on your pattern |

Example: Your product codes follow the format “DQS-” followed by 6 digits. Set Pattern Type to Custom and enter the regex ^DQS-\d{6}$. DQS flags any product code that does not match this structure.
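The four pattern types reduce to two checks: a length comparison for Fixed Length and a full-string regex match for the rest. This Python sketch uses stand-in expressions for Email and URL, because the page does not publish DQS's built-in patterns; build_matcher is a hypothetical name. It also shows why the Case Sensitive default matters for the product-code example: with case-insensitive matching, “dqs-123456” passes. Python's re.compile raises re.error for an invalid expression, which is one way to reject a bad Custom Pattern before saving.

```python
import re
from typing import Callable

# ASSUMED stand-ins: the exact built-in Email and URL expressions DQS uses
# are not documented on this page.
ASSUMED_EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
ASSUMED_URL = r"https?://[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+(/\S*)?"


def build_matcher(pattern_type: str, value: str = "",
                  case_sensitive: bool = False) -> Callable[[str], bool]:
    if pattern_type == "Fixed Length":
        length = int(value)
        return lambda s: len(s) == length
    regex = {"Email": ASSUMED_EMAIL, "URL": ASSUMED_URL,
             "Custom": value}.get(pattern_type)
    if regex is None:
        raise ValueError(f"unknown pattern type: {pattern_type}")
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = re.compile(regex, flags)  # raises re.error on a bad regex
    # fullmatch: the whole value must match, not just a substring
    return lambda s: compiled.fullmatch(s) is not None


product_code = build_matcher("Custom", r"^DQS-\d{6}$")
print(product_code("DQS-123456"), product_code("DQS-12345"))  # True False
print(product_code("dqs-123456"))  # True: matching is case-insensitive by default
```

If lowercase codes should fail, enable Case Sensitive on that field.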

Noise Detection

Noise detection catches data that passes format checks but is still garbage. DQS uses two built-in heuristics to identify noisy values:

Heuristic 1: Consecutive identical characters. Three or more of the same character in a row. Values like “aaaa”, “!!!”, “---”, or “xxxxx” trigger this check. These typically come from keyboard holding, padding, or placeholder abuse.

Heuristic 2: Excessive special characters. More than 50% non-alphanumeric characters (excluding spaces). Values like “!@#$%^” or “***///---” trigger this check. These indicate keyboard smash, bot input, or deliberate junk entry.

| Heuristic | What It Catches | Example Noisy Values | Example Clean Values |
|---|---|---|---|
| 3+ consecutive identical characters | Padding, filler, keyboard holding | “aaaa”, “!!!”, “---”, “xxxxx” | “Premium”, “DOT AB3 2024” |
| More than 50% special characters | Keyboard smash, bot input, junk | “!@#$%^”, “***///---”, “//—//” | “[email protected]”, “O’Brien Inc” |

You can also define custom noise patterns using regex for org-specific junk that the built-in heuristics do not cover.
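The two heuristics and the custom-pattern option can be sketched in Python. This is an illustration under assumptions, not DQS's implementation: the page does not say whether spaces count toward the 50% denominator, so this sketch ignores spaces on both sides, and it treats any character for which str.isalnum() is false as special. is_noisy and special_char_share are hypothetical names.

```python
import re
from typing import Iterable

# Heuristic 1: the same character three or more times in a row.
REPEATED_RUN = re.compile(r"(.)\1{2,}")


def special_char_share(value: str) -> float:
    """Share of non-alphanumeric characters, with spaces ignored (assumption)."""
    chars = [c for c in value if not c.isspace()]
    if not chars:
        return 0.0
    return sum(not c.isalnum() for c in chars) / len(chars)


def is_noisy(value: str, custom_patterns: Iterable[str] = ()) -> bool:
    if REPEATED_RUN.search(value):       # heuristic 1
        return True
    if special_char_share(value) > 0.5:  # heuristic 2
        return True
    # Org-specific junk that the built-in heuristics miss
    return any(re.search(p, value, re.IGNORECASE) for p in custom_patterns)


for v in ["xxxxx", "!@#$%^", "O'Brien Inc", "Premium", "asdf"]:
    print(v, is_noisy(v))
print(is_noisy("asdf", [r"^(asdf|test)$"]))  # True
```

Values such as “asdf” or “test” pass both built-in heuristics, which is where a custom noise pattern helps. Note also that a literal reading of heuristic 1 flags legitimate values like “1000 Main St” or “www.example.com”, so review flagged records before deleting them.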

Tip: Noise detection is most valuable on free-text fields where users can type anything: Company, Description, Notes, and custom text fields. Run it on your web-to-lead fields first, where bot submissions and forced entries are most common.

Common Validity Issues

Invalid Email Addresses

Users enter emails without proper format. Missing “@” symbols, missing domains, double dots, and typos are the most common problems.

| Issue | Example |
|---|---|
| Missing @ | john.company.com |
| Missing domain | john@ |
| Double dots | [email protected] |
| Typos | [email protected] |

Impact: Bounced emails, damaged sender score, lost communication.

Malformed Phone Numbers

Phone fields accept any text in Salesforce, leading to inconsistent and invalid formats.

| Issue | Example |
|---|---|
| Letters mixed in | 555-CALL-NOW |
| Wrong digit count | 555-12 |
| Extension in field | 555-1234 ext 5 |
| Country code confusion | 1-555-123-4567 vs 555-123-4567 |

Impact: Failed calls, wasted sales time, telephony sync errors.

Invalid URLs

Web address fields often contain partial or malformed values.

| Issue | Example |
|---|---|
| Missing protocol | www.company.com |
| Missing domain | https:// |
| Typos | htps://company.com |
| Social handles | @company (not a URL) |

Impact: Broken links, failed enrichment, navigation errors.

Best Practices

Validate at Entry

The best validity check happens at data entry. Use Salesforce validation rules to enforce formats before data enters your system.

/* Example: Email format validation rule (Salesforce formulas use block comments) */
NOT(ISBLANK(Email)) && NOT(REGEX(Email, "^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$"))
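Before deploying a rule like this, you can try the same expression locally. The sketch below ports it to Python's re module. Salesforce's REGEX function uses Java-style regular expressions, and for a simple pattern like this one the two should behave the same, but treat that as an assumption and confirm in a sandbox. rule_would_fire is a hypothetical helper that mirrors the rule's error condition.

```python
import re

# Same expression as the rule. The formula's "\\." (escaped inside a
# formula string literal) is written as "\." in a Python raw string.
RULE_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def rule_would_fire(email: str) -> bool:
    """Mirror of the rule: error when the value is non-blank and does not match."""
    return email != "" and RULE_EMAIL.fullmatch(email) is None


for value in ["john.company.com", "john@", "", "john..doe@company.com"]:
    print(repr(value), rule_would_fire(value))
```

The last case passes: this expression allows consecutive dots, so it does not catch the double-dot problem listed under Invalid Email Addresses. Tighten the pattern if that matters for your data.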

Standardize Formats Before Scanning

Choose one format for each field and enforce it. For phone numbers, E.164 (+15551234567) is the most universally accepted standard. For URLs, require the https:// protocol. Document your format decisions so the team knows the standard.
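An E.164 number is a “+”, a country code that does not start with 0, and at most 15 digits in total, which a short pattern can check. The sketch below also includes a hypothetical to_e164_nanp helper that assumes North American numbers (country code 1) and rejects anything that is not 10 digits after stripping punctuation; numbers from other countries need their own rules.

```python
import re
from typing import Optional

E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_e164(value: str) -> bool:
    return E164.fullmatch(value) is not None


def to_e164_nanp(raw: str) -> Optional[str]:
    """Hypothetical: normalize a North American number to E.164, else None."""
    digits = re.sub(r"\D", "", raw)          # drop spaces, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # strip a leading country code 1
    if len(digits) != 10:
        return None                          # leave for manual review
    return "+1" + digits


for raw in ["555-123-4567", "1-555-123-4567", "555-12", "555-1234 ext 5"]:
    print(raw, to_e164_nanp(raw))
```

Returning None instead of guessing keeps values like “555-CALL-NOW” or numbers with extensions in the Invalid Count for manual cleanup.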

Set Thresholds by Field Priority

Different fields need different validity standards:

| Field | Suggested Threshold | Rationale |
|---|---|---|
| Primary Email | 95%+ | Critical for communication |
| Phone | 90%+ | Important but legacy data expected |
| Website | 85%+ | Often entered incompletely |
| Custom text codes | 98%+ | System-generated, expect high compliance |
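Once you have per-field validity rates, checking them against targets is a simple comparison. A sketch in Python: the field keys come from the threshold table, and fields_below_threshold is a hypothetical helper that reports each shortfall in percentage points. A rate equal to the threshold passes, matching the “95%+” style of the table.

```python
# Suggested thresholds from the table, as percentages.
THRESHOLDS = {
    "Primary Email": 95.0,
    "Phone": 90.0,
    "Website": 85.0,
    "Custom text codes": 98.0,
}


def fields_below_threshold(rates: dict) -> dict:
    """Return {field: shortfall in percentage points} for fields under target."""
    return {
        field: THRESHOLDS[field] - rate
        for field, rate in rates.items()
        if field in THRESHOLDS and rate < THRESHOLDS[field]
    }


print(fields_below_threshold({"Primary Email": 71.0, "Phone": 92.5, "Website": 85.0}))
# {'Primary Email': 24.0}
```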

Use Noise Detection on Free-Text Fields

Run noise detection on fields where users type free text: Company, Description, custom text fields, and any field populated by web forms. Noise Rate reveals problems that format validation misses.

Document Expected Formats

Create a data dictionary that specifies the expected format for each field, acceptable variations, and examples of valid and invalid values. Share this with your team and reference it during data cleanup projects.

Next Steps

You now understand how to validate data formats and detect noisy values. Continue learning about the next dimension: