CWE-185: Incorrect Regular Expression
Overview
Incorrect regular expressions occur when regex patterns contain logical flaws, catastrophic backtracking paths (ReDoS), missing anchors, or unescaped metacharacters, leading to bypassed validation, denial of service, or unexpected matches that enable injection attacks.
Risk
Medium-High: Incorrect regex enables validation bypass (security control failure), Regular Expression Denial of Service (ReDoS causing CPU exhaustion), unintended matches (allowing malicious input), and application hangs. ReDoS attacks can cause 30-second to infinite execution times with carefully crafted input.
Remediation Strategy
Avoid Catastrophic Backtracking (ReDoS Prevention)
- Avoid nested quantifiers:
(a+)+or(a*)* - Avoid overlapping alternatives:
(a|a)*or(a|ab)* - Use possessive/atomic groups when available
- Implement regex timeouts
Use Anchors and Proper Escaping
- Always use
^and$to match entire string - Escape metacharacters:
.*+?^$\|()[]{} - Use \Q...\E for literal string matching
- Test regex against malicious inputs
Keep Patterns Simple and Testable
- Prefer multiple simple checks over complex regex
- Use regex only for format validation
- Validate semantics separately
- Document and test regex thoroughly
Implement Timeouts and Limits
- Set regex execution timeout (e.g., 100ms)
- Limit input length before regex
- Monitor regex performance
- Use linear-time regex engines when available
Remediation Steps
Core principle: Use correct, tested regular expressions and anchor/limit them; do not rely on flawed regex for security decisions.
Locate the vulnerability in the security findings
- Review the flaw details to identify the specific file, line number, and code pattern
- Understand the data flow from source to sink (if applicable)
Apply the remediation strategy
- Implement the primary defense from the Remediation Strategy section above
- Follow the secure code examples provided
- Consider applying multiple layers of defense
Test the fix
- Verify the specific input that triggered the finding no longer causes the vulnerability
- Test with additional edge cases and malicious inputs
- Ensure legitimate functionality still works correctly
Re-scan with the security scanner
- Re-scan the fixed code to confirm the issue is resolved
- Confirm the finding is resolved
- Check for any new findings introduced by the changes
Common Vulnerable Patterns
Nested Quantifiers (ReDoS)
import re
# DANGEROUS: catastrophic backtracking
def validate_input(text):
# Attack: "aaaaaaaaaaaaaaaaaaaaaaaa!"
# This regex has exponential time complexity
pattern = r'^(a+)+$'
# For 20 'a's followed by '!', takes ~1 second
# For 25 'a's followed by '!', takes ~30 seconds
# For 30 'a's followed by '!', effectively hangs
if re.match(pattern, text):
return True
return False
Overlapping Quantifiers in Email Validation
import re
# DANGEROUS: nested quantifiers
def validate_email_bad(email):
# Attack: "a" * 50 + "@"
pattern = r'^([a-zA-Z0-9._%+-]+)*@([a-zA-Z0-9.-]+)*$'
return re.match(pattern, email) is not None
Overlapping Alternatives
import re
# DANGEROUS: overlapping alternatives
def validate_alphanumeric(text):
# Attack: "a" * 50 + "!"
pattern = r'^([a-zA-Z]+|[a-zA-Z0-9]+)*$'
return re.match(pattern, text) is not None
Missing Anchors in Filename Validation
// DANGEROUS: no anchors - matches substring
function validateFilename(filename) {
// Attack: "malicious.exe.jpg" - matches .jpg substring
const pattern = /\.(jpg|png|gif)/;
return pattern.test(filename); // WRONG: matches anywhere!
}
Unescaped Dot Metacharacter
// DANGEROUS: unescaped dot
function validateIP(ip) {
// Attack: "192X168X1X1" - dot matches any character
const pattern = /^\d{1,3}.\d{1,3}.\d{1,3}.\d{1,3}$/;
return pattern.test(ip);
}
Secure Patterns
Single Quantifiers (ReDoS Prevention)
import re
# Correct: no nested quantifiers
def validate_input_safe(text):
# Use single quantifier
pattern = r'^a+$'
return re.match(pattern, text) is not None
Why this works: Avoiding nested quantifiers prevents exponential backtracking. Single quantifiers have linear time complexity, ensuring the regex completes quickly regardless of input length.
Simplified Email Pattern with Length Check
import re
# Correct: simplified email pattern
def validate_email_safe(email):
if len(email) > 254: # Pre-validate length
return False
# Simple pattern without nested quantifiers
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return re.match(pattern, email) is not None
Why this works: Pre-validating input length prevents processing excessively long inputs. The simplified pattern without nested quantifiers ensures linear time complexity and prevents ReDoS attacks.
Regex with Timeout Wrapper
import re
from timeout_decorator import timeout
# Correct: use timeout wrapper
@timeout(0.1) # 100ms timeout
def validate_with_timeout(text, pattern):
return re.match(pattern, text) is not None
Why this works: Timeout wrappers terminate regex execution if it takes too long, preventing ReDoS attacks from causing indefinite hangs even if the pattern has backtracking issues.
Linear-Time Regex Engine (re2)
import re2
# Correct: use re2 library (linear time)
def validate_safe_engine(text):
# re2 doesn't support backtracking - always linear time
pattern = r'^[a-zA-Z0-9]+$'
return re2.match(pattern, text) is not None
Why this works: The re2 library doesn't support backtracking, ensuring all regex operations complete in linear time regardless of pattern complexity. This makes ReDoS attacks impossible.
Anchored Filename Validation
function validateFilename(filename) {
// Correct: anchored pattern, must END with extension
const pattern = /^[a-zA-Z0-9_-]+\.(jpg|png|gif)$/i;
if (filename.length > 255) {
return false;
}
return pattern.test(filename);
}
Why this works: Anchoring the pattern with ^ and $ ensures it matches the entire filename, not just a substring. This prevents attacks like "malicious.exe.jpg" from passing validation.
Escaped Metacharacters with Semantic Validation
function validateIP(ip) {
// Correct: escaped dots, proper anchors
const pattern = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
if (!pattern.test(ip)) {
return false;
}
// Additional semantic validation
const parts = ip.split('.');
return parts.every(part => {
const num = parseInt(part, 10);
return num >= 0 && num <= 255;
});
}
Why this works: Escaping the dot metacharacter (\.) ensures it matches only literal periods, not any character. Semantic validation ensures each octet is within the valid 0-255 range, preventing inputs like "999.999.999.999" from passing.
Java Regex with Timeout
import java.util.regex.*;
import java.util.concurrent.*;
public class SafeRegex {
private static final int TIMEOUT_MS = 100;
public boolean validateWithTimeout(String input, String regex) {
// Pre-validate input length
if (input.length() > 10000) {
return false;
}
ExecutorService executor = Executors.newSingleThreadExecutor();
try {
Future<Boolean> future = executor.submit(() -> {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
return matcher.matches();
});
return future.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);
} catch (TimeoutException e) {
// Regex took too long - potential ReDoS
return false;
} catch (Exception e) {
return false;
} finally {
executor.shutdownNow();
}
}
// Safer: use simple patterns
public boolean validateEmail(String email) {
if (email == null || email.length() > 254) {
return false;
}
// Simple, non-backtracking pattern
String pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
return email.matches(pattern);
}
}
Why this works: Setting timeouts prevents regex execution from running indefinitely. Using ExecutorService allows terminating runaway regex operations. Pre-validating input length and using simple patterns without nested quantifiers provides defense in depth.
.NET Regex with Timeout Configuration
using System;
using System.Text.RegularExpressions;
public class SafeRegexValidator
{
private static readonly TimeSpan Timeout = TimeSpan.FromMilliseconds(100);
public bool ValidateWithTimeout(string input, string pattern)
{
if (string.IsNullOrEmpty(input) || input.Length > 10000)
{
return false;
}
try
{
var regex = new Regex(pattern, RegexOptions.None, Timeout);
return regex.IsMatch(input);
}
catch (RegexMatchTimeoutException)
{
// Timeout - potential ReDoS
return false;
}
}
// Prefer simple, non-backtracking patterns
public bool ValidateUsername(string username)
{
if (string.IsNullOrEmpty(username) || username.Length > 20)
{
return false;
}
var pattern = @"^[a-zA-Z0-9_]{3,20}$";
var regex = new Regex(pattern, RegexOptions.None, Timeout);
return regex.IsMatch(username);
}
}
Why this works: Timeout configuration prevents ReDoS attacks by terminating regex execution after a specified duration. The RegexMatchTimeoutException catch block handles timeout events gracefully, returning false instead of hanging. Simple patterns avoid backtracking, ensuring predictable performance.