Skip to content

RegexRedactionMutilator SQL LIKE pre-filter doesn't work with actual regex patterns #93

@jas88

Description

@jas88

Summary

The RegexRedactionMutilator uses a SQL LIKE clause as a pre-filter to find candidate
rows for redaction, but LIKE doesn't support regex syntax. This causes regex patterns
to be treated as literal strings, potentially missing sensitive data that should be
redacted.

Details

In RegexRedactionMutilator.cs:102:
WHERE {column.GetRuntimeName()} LIKE '%{RedactionConfiguration.RegexPattern}%' COLLATE Latin1_General_BIN

And in ExecuteCommandPerformRegexRedactionOnCatalogue.cs:83:
qb.AddCustomLine($"{columnInfo.GetRuntimeName()} LIKE '%{_redactionConfiguration.RegexPattern}%' COLLATE Latin1_General_BIN", QueryComponent.WHERE);

The actual regex matching happens correctly in C# (RegexRedactionHelper.cs:79):
var matches = Regex.Matches(value, _redactionConfiguration.RegexPattern);

However, the SQL pre-filter will fail to fetch rows that should be processed.

Example

If RegexPattern = "\d{3}-\d{4}" (to match phone numbers like 555-1234):

  • The LIKE clause searches for literal string \d{3}-\d{4}
  • No rows match (unless someone literally typed \d{3}-\d{4})
  • Actual phone numbers are never fetched for regex processing
  • Sensitive data is not redacted

Impact

Security risk - Regex patterns won't find all matches in the database, so sensitive
data that should be redacted will be missed. The feature only works correctly when
RegexPattern contains a simple literal string with no regex metacharacters.

Potential solutions

  1. Document limitation - Clarify that RegexPattern should be a literal substring, not
    a regex (rename property to avoid confusion)
  2. Remove pre-filter - Fetch all rows and let C# do the filtering (performance impact
    for large tables)
  3. Use database-native regex - SQL Server doesn't have native regex, but could use
    CLR functions or evaluate in batches
  4. Smarter pre-filter extraction - Parse the regex and extract literal portions for
    LIKE (e.g., pass.*word → LIKE '%pass%' AND column LIKE '%word%')

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions