-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Summary
The RegexRedactionMutilator uses a SQL LIKE clause as a pre-filter to find candidate
rows for redaction, but LIKE doesn't support regex syntax. This causes regex patterns
to be treated as literal strings, potentially missing sensitive data that should be
redacted.
Details
In RegexRedactionMutilator.cs:102:
WHERE {column.GetRuntimeName()} LIKE '%{RedactionConfiguration.RegexPattern}%' COLLATE Latin1_General_BIN
And in ExecuteCommandPerformRegexRedactionOnCatalogue.cs:83:
qb.AddCustomLine($"{columnInfo.GetRuntimeName()} LIKE '%{_redactionConfiguration.RegexPattern}%' COLLATE Latin1_General_BIN", QueryComponent.WHERE);
The actual regex matching happens correctly in C# (RegexRedactionHelper.cs:79):
var matches = Regex.Matches(value, _redactionConfiguration.RegexPattern);
However, the SQL pre-filter will fail to fetch rows that should be processed.
Example
If RegexPattern = "\d{3}-\d{4}" (to match phone numbers like 555-1234):
- The LIKE clause searches for literal string
\d{3}-\d{4} - No rows match (unless someone literally typed
\d{3}-\d{4}) - Actual phone numbers are never fetched for regex processing
- Sensitive data is not redacted
Impact
Security risk - Regex patterns won't find all matches in the database, so sensitive
data that should be redacted will be missed. The feature only works correctly when
RegexPattern contains a simple literal string with no regex metacharacters.
Potential solutions
- Document limitation - Clarify that RegexPattern should be a literal substring, not
a regex (rename property to avoid confusion) - Remove pre-filter - Fetch all rows and let C# do the filtering (performance impact
for large tables) - Use database-native regex - SQL Server doesn't have native regex, but could use
CLR functions or evaluate in batches - Smarter pre-filter extraction - Parse the regex and extract literal portions for
LIKE (e.g., pass.*word → LIKE '%pass%' AND column LIKE '%word%')