CWE-183: Permissive List of Allowed Inputs
Overview
Permissive allowlists occur when input validation accepts too broad a range of values, allowing attackers to bypass security controls with edge cases, encoding variations, or unexpected but technically valid inputs that enable injection, traversal, or logic bypass attacks.
OWASP Classification
A06:2025 - Insecure Design
Risk
Medium-High: Overly permissive allowlists enable attackers to find valid-but-malicious inputs that bypass intended restrictions, leading to injection attacks, path traversal, authentication bypass, business logic flaws, and data exfiltration.
Remediation Steps
Core principle: Use strict allowlists with default-deny; permissive allowed-input lists are security bugs.
Locate Permissive Validation in Your Code
When reviewing security scan results:
- Find the validation logic: Identify where input is validated (regex patterns, allowlists, format checks)
- Check the allowlist: Review what values/patterns are accepted
- Identify permissiveness: Look for overly broad patterns (.*, \w+, unanchored regex)
- Trace the data flow: Understand how validated data is used (file paths, SQL queries, commands, URLs)
- Check for missing validation: Identify inputs that should be validated but aren't
Common permissive patterns:
# Too broad - any input that merely starts with a word character passes
if re.match(r'\w+', user_input): # Letters, numbers, underscore; unanchored, so "admin<script>" passes
    ...
# Unanchored - allows prefix/suffix attacks
if re.search(r'\.jpg', filename): # Matches "shell.jpg.php" as well as "malware.exe.jpg"
    ...
# No length limit
if re.match(r'^[a-z]+$', username): # Could be 10,000 characters
    ...
# Accepts any valid format without semantic checks
if is_valid_url(redirect): # Accepts javascript:, file:, data: schemes
    ...
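One way to locate permissive validators during review is to run each one against a small list of hostile probe strings and see what gets through. The sketch below assumes validators are plain functions returning a bool; `PROBES`, `accepted_probes`, `loose`, and `strict` are hypothetical names introduced for illustration, not part of any library.

```python
import re
from typing import Callable, Iterable

# Hypothetical probe strings; extend with payloads relevant to your data sink.
PROBES = [
    "admin<script>",      # valid prefix + malicious suffix
    "../../etc/passwd",   # path traversal
    "shell.jpg.php",      # allowed extension in the wrong position
    "alice\n",            # trailing newline
    "a" * 10_000,         # oversized input
]


def accepted_probes(validator: Callable[[str], bool],
                    probes: Iterable[str] = PROBES) -> list[str]:
    """Return every probe the validator wrongly accepts."""
    return [p for p in probes if validator(p)]


def loose(value: str) -> bool:
    # Unanchored and unbounded: only the first characters are checked.
    return re.match(r'\w+', value) is not None


def strict(value: str) -> bool:
    # Whole-string match, narrow character set, explicit length limits.
    return re.fullmatch(r'[a-z0-9_]{3,20}', value) is not None


if __name__ == '__main__':
    for name, fn in (("loose", loose), ("strict", strict)):
        leaked = accepted_probes(fn)
        print(f"{name}: {len(leaked)} hostile probe(s) accepted")
```

A validator that accepts any probe is a candidate for tightening; an empty result does not prove the validator is safe, only that these particular probes are rejected.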
Use Strict, Minimal Allowlists (Primary Defense)
import re
# VULNERABLE - permissive regex without an end anchor
def validate_username_bad(username):
    # Attack: "admin<script>alert(1)</script>"
    # re.match anchors only at the start: the "admin" prefix matches
    # and the suffix is never checked
    if re.match(r'[a-zA-Z0-9_]+', username):
        return True
    return False
# SECURE - strict anchored regex with length limits
def validate_username_good(username):
    # Anchored with ^$ and checked with fullmatch - must match entire string
    # (with re.match, $ would still accept one trailing newline)
    # Specific character set, explicit length limits
    if re.fullmatch(r'^[a-zA-Z0-9_]{3,20}$', username):
        return True
    return False
# VULNERABLE - checks that an extension appears anywhere in the name
def validate_filename_bad(filename):
    # Accepts "shell.jpg.php" and "../../malware.exe.jpg"
    if '.jpg' in filename or '.png' in filename:
        return True
    return False
# SECURE - strict extension at end only
def validate_filename_good(filename):
    # Must end with allowed extension (anchored with $, checked with fullmatch)
    # No path separators, limited characters
    if re.fullmatch(r'^[a-zA-Z0-9_-]{1,50}\.(jpg|png|gif)$', filename.lower()):
        return True
    return False
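# VULNERABLE - prefix check instead of a parsed scheme allowlist
def validate_redirect_bad(url):
    # Accepts any string starting with "http": "httpx://evil.example",
    # "http://localhost/admin", "http:evil"; the host is never checked
    if url.startswith('http'):
        return True
    return False
# SECURE - explicit protocol allowlist
def validate_redirect_good(url):
    from urllib.parse import urlparse
    # Parse URL (malformed input such as a bad IPv6 literal raises ValueError)
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False
    # Strict allowlist: only http or https
    if parsed.scheme not in ('http', 'https'):
        return False
    # Additional checks
    if not host: # Must have domain
        return False
    # Reject obvious loopback names. hostname excludes port and userinfo,
    # so "localhost:8080" and "user@localhost" are caught too.
    # This is not a complete private-IP check.
    if host in ('localhost', '127.0.0.1', '0.0.0.0'):
        return False
    return True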
Why this works: Strict allowlists define exactly what's permitted, rejecting all other inputs including edge cases, encoding variations, and unexpected-but-valid formats that could bypass loose validation.
Key principles:
- Anchor regex with ^$: Match entire string, not substring
- Narrow character sets: [a-z0-9] not [\w\s.] or .*
- Explicit length limits: {3,20} bounds input size, limiting resource exhaustion and oversized values reaching downstream code
- Exact matches for enums: Use sets/constants, not patterns
- Reject unexpected: If not explicitly allowed, reject it
Validate Format AND Semantics
import re
from pathlib import Path
# VULNERABLE - format validation alone is insufficient
def validate_age_bad(age_str):
    # Only checks format (numeric)
    if re.match(r'^\d+$', age_str):
        return True # Accepts 0, 999, 10000
    return False
# SECURE - format + semantic validation
def validate_age_good(age_str):
    # 1. Format validation: one to three ASCII digits, entire string
    if not re.fullmatch(r'^[0-9]{1,3}$', age_str):
        return False
    # 2. Semantic validation (business logic)
    age = int(age_str)
    if age < 0 or age > 150:
        return False # Realistic age range
    return True
# Filename validation with semantic checks
def validate_uploaded_file(filename, file_data):
    # 1. Format validation - allowed characters and extension
    if not re.fullmatch(r'^[a-zA-Z0-9_-]{1,50}\.(jpg|png|gif)$', filename.lower()):
        return False
    # 2. Size validation (cheap, so do it before inspecting content)
    if len(file_data) > 5 * 1024 * 1024: # 5MB max
        return False
    # 3. Semantic validation - check actual file type
    # is_valid_image is an application-defined helper (not shown here)
    if not is_valid_image(file_data): # Check magic bytes
        return False
    return True
# Path validation with canonicalization
def validate_file_path(user_path, base_dir):
    # 1. Format validation - no obvious path traversal
    if '..' in user_path or user_path.startswith('/'):
        return False
    # 2. Canonical path resolution
    base = Path(base_dir).resolve()
    try:
        canonical = (base / user_path).resolve()
    except (OSError, RuntimeError, ValueError):
        return False
    # 3. Semantic check - must be within base directory.
    # Compare path components, not string prefixes: a startswith() check
    # on strings would accept /srv/data-evil for base /srv/data
    try:
        canonical.relative_to(base)
    except ValueError:
        return False
    return True
# Email validation - format + domain checks
# Full RFC5322 compliance can be much more complex.
DISPOSABLE_DOMAINS = set() # Placeholder: load from your own blocklist
def validate_email(email):
    # 1. Length validation (before running the regex)
    if len(email) > 254: # RFC 5321
        return False
    # 2. Format validation
    if not re.fullmatch(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', email):
        return False
    # 3. Semantic validation - check domain (domain names are case-insensitive)
    domain = email.split('@')[1].lower()
    # Reject disposable email domains
    if domain in DISPOSABLE_DOMAINS:
        return False
    # Could add: DNS MX record check, domain reputation
    return True
Avoid Regex Pitfalls
import re
# PITFALL 1: Unanchored regex
def validate_code_bad(code):
    # re.match anchors only at the start; nothing checks the end
    # Attack: "ADMINxyz; rm -rf /" passes
    if re.match(r'[A-Z]{5}', code):
        return True
    return False
def validate_code_good(code):
    # Anchored - must be exactly 5 uppercase letters
    if re.fullmatch(r'^[A-Z]{5}$', code):
        return True
    return False
# PITFALL 2: Overly broad character classes
def validate_name_bad(name):
    # \w includes letters, numbers, underscore
    # \s includes all whitespace (tabs, newlines)
    # Attack: "Alice\nERROR admin login ok" passes (log forging via newline)
    if re.match(r'^[\w\s.]+$', name):
        return True
    return False
def validate_name_good(name):
    # Specific characters: letters, space, hyphen, apostrophe only
    # Limited length
    if re.fullmatch(r"^[a-zA-Z][a-zA-Z '-]{0,49}$", name):
        return True
    return False
# PITFALL 3: Case-insensitive when case matters
def validate_file_ext_bad(filename):
    # Case-insensitive - accepts "file.JPG", "File.JpG" as-is, so the
    # mixed-case name reaches later case-sensitive checks and storage
    if re.match(r'^[a-z0-9_]+\.(jpg|png)$', filename, re.IGNORECASE):
        return True
    return False
def validate_file_ext_good(filename):
    # Normalize to lowercase first, then match case-sensitively;
    # callers should store and compare the normalized name
    filename_lower = filename.lower()
    if re.fullmatch(r'^[a-z0-9_]+\.(jpg|png)$', filename_lower):
        return True
    return False
# PITFALL 4: Unescaped metacharacters
def validate_domain_bad(domain):
    # Dot (.) matches any character, not literal dot
    # Attack: "example@com" passes
    if re.match(r'^[a-z]+.[a-z]+$', domain):
        return True
    return False
def validate_domain_good(domain):
    # Escape dot with backslash: \.
    if re.fullmatch(r'^[a-z]+\.[a-z]+$', domain):
        return True
    return False
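# BEST: Use explicit validation, avoid complex regex
import ipaddress
def validate_ip_address(ip):
    # Instead of complex regex, parse and validate with the standard library.
    # A hand-rolled split('.') + isdigit() check is easy to get wrong:
    # str.isdigit() accepts characters such as '²' that int() cannot parse.
    if not isinstance(ip, str):
        return False
    try:
        ipaddress.IPv4Address(ip)
    except ValueError:
        return False # Wrong part count, out-of-range octet, bad characters
    return True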
Use Enums and Constants for Known Values
import re
from enum import Enum
# `user` below is assumed to be the current user object, defined elsewhere
# VULNERABLE - accepts any string matching pattern
def set_user_role_bad(role):
    # Attack: "administrator", "root", "superuser" all pass;
    # so does "admin\n", because $ allows one trailing newline
    if re.match(r'^[a-z]+$', role):
        user.role = role # Could be any lowercase string
        return True
    return False
# SECURE - use enum for known values
class UserRole(Enum):
    ADMIN = 'admin'
    USER = 'user'
    GUEST = 'guest'
def set_user_role_good(role_str):
    # Convert to lowercase and check against enum
    try:
        role = UserRole(role_str.lower())
        user.role = role
        return True
    except ValueError:
        return False # Not a valid role
# SECURE - use set for allowlist
ALLOWED_COUNTRIES = {'US', 'CA', 'GB', 'FR', 'DE', 'JP', 'AU'}
def validate_country_code(code):
    # Exact match against set - no pattern matching
    if code.upper() in ALLOWED_COUNTRIES:
        return True
    return False
# SECURE - use constants for file types
ALLOWED_EXTENSIONS = {'.jpg', '.png', '.gif', '.pdf', '.doc', '.docx'}
def validate_file_type(filename):
    # Extract extension
    ext = filename.lower().split('.')[-1] if '.' in filename else ''
    ext_with_dot = f'.{ext}'
    # Exact match against set
    if ext_with_dot in ALLOWED_EXTENSIONS:
        return True
    return False
# SECURE - validate HTTP methods
ALLOWED_METHODS = {'GET', 'POST', 'PUT', 'DELETE', 'PATCH'}
def validate_http_method(method):
    return method.upper() in ALLOWED_METHODS
Test with Bypass Payloads
import re
def test_validation_bypasses():
    # Test your validation function
    def validate(input_str):
        # fullmatch: with re.match, $ would accept one trailing newline
        return re.fullmatch(r'^[a-zA-Z0-9]{3,20}$', input_str) is not None
    # Valid inputs - should pass
    assert validate('alice')
    assert validate('user123')
    assert validate('ABC')
    print("✓ Valid inputs pass")
    # Unanchored regex bypass - should fail
    assert not validate('admin<script>') # Suffix attack
    assert not validate('<script>admin') # Prefix attack
    assert not validate('alice\n') # Trailing newline
    print("✓ Blocks unanchored bypass")
    # Special characters - should fail
    assert not validate('user@domain')
    assert not validate('user;DROP')
    assert not validate('user\nmalicious')
    assert not validate('user/../etc')
    print("✓ Blocks special characters")
    # Length violations - should fail
    assert not validate('ab') # Too short
    assert not validate('a' * 21) # Too long
    print("✓ Enforces length limits")
    # Encoding variations - should fail
    assert not validate('user%20name')
    assert not validate('user\x00')
    print("✓ Blocks encoding attacks")
    # Case variations (depends on your rules)
    # If case-sensitive:
    assert validate('Alice')
    # If case-insensitive, normalize first
    print("✓ Handles case correctly")
    print("All validation bypass tests passed!")
# Test file extension validation
def test_file_extension():
    def validate(filename):
        return re.fullmatch(r'^[a-zA-Z0-9_-]+\.(jpg|png|gif)$', filename.lower()) is not None
    # Valid extensions - should pass
    assert validate('photo.jpg')
    assert validate('image.png')
    print("✓ Accepts valid extensions")
    # Double extension attacks - should fail
    assert not validate('malware.exe.jpg')
    assert not validate('file.php.png')
    print("✓ Blocks double extensions")
    # No extension - should fail
    assert not validate('filename')
    print("✓ Requires extension")
    # Path traversal - should fail
    assert not validate('../../../etc/passwd.jpg')
    assert not validate('..\\..\\file.png')
    print("✓ Blocks path traversal")
    print("All file extension tests passed!")
if __name__ == '__main__':
    test_validation_bypasses()
    test_file_extension()
Common Vulnerable Patterns
Unanchored Regular Expressions
- Regex without start (^) and end ($) anchors
- Matches substring instead of entire string
- Example: pattern.match(input) instead of pattern.fullmatch(input)
- Attack: Valid prefix + malicious suffix bypasses check
Overly Broad Character Classes
- Using .*, .+, \w+ without constraints
- Accepting any whitespace or special characters
- No length limits
- Attack: Injection of unexpected but valid characters
Extension/Format Checking Without Position Validation
- Checking if extension exists anywhere in string
- Not validating extension is at the end
- Example: Checking .jpg exists vs. ending with .jpg
- Attack: malware.exe.jpg or file.jpg.php
Path/Filename Validation Without Semantic Checks
- Accepting valid characters but not checking path traversal
- No verification of canonical path
- Allowing .., /, or absolute paths
- Attack: ../../../etc/passwd
Protocol/Scheme Validation Without Allowlist
- Accepting any protocol format
- Not restricting to safe protocols (http, https)
- Attack: javascript:, data:, file: protocols
Secure Patterns
Strict Anchored Regular Expressions
- Always use start and end anchors: ^pattern$
- Validate entire string, not substrings
- Set maximum length limits before regex
- Use language-specific exact match functions
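Why this works: Anchoring regex with ^ and $ tests the pattern against the entire input string, preventing attacks that add malicious prefixes or suffixes to valid data. In Python, $ also matches just before a trailing newline, so prefer re.fullmatch (or \Z) for a true whole-string match.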
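The difference between an anchored pattern and a true whole-string match can be sketched in a few lines of Python. The function names below are hypothetical; the pattern mirrors the username rule used earlier in this guide.

```python
import re

# Anchored pattern used with re.match: $ still matches before a final "\n".
ANCHORED = re.compile(r'^[a-z0-9_]{3,20}$')
# Same character set and length limits, checked with fullmatch.
WHOLE = re.compile(r'[a-z0-9_]{3,20}')

MAX_LEN = 20


def validate_with_dollar(value: str) -> bool:
    return ANCHORED.match(value) is not None


def validate_fullmatch(value: str) -> bool:
    # Cheap length check first, so oversized input never reaches the regex.
    if len(value) > MAX_LEN:
        return False
    return WHOLE.fullmatch(value) is not None


if __name__ == '__main__':
    for candidate in ("alice", "alice\n", "alice<script>"):
        print(repr(candidate),
              validate_with_dollar(candidate),
              validate_fullmatch(candidate))
```

Both functions reject the suffix attack, but only the fullmatch version rejects the input with a trailing newline.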
Minimal Character Sets with Constraints
- Define narrowest acceptable character set
- Add explicit length limits (min and max)
- Use specific character classes, not broad wildcards
- Apply case sensitivity rules
Why this works: Limiting character sets to only what's needed and enforcing length constraints prevents injection of special characters and DoS attacks while ensuring input meets business requirements.
Extension Validation at End of String
- Use anchored regex: ^[allowed_chars]+\.(ext1|ext2|ext3)$
- Validate extension position explicitly
- Check for double extensions
- Normalize to lowercase before checking
Why this works: Anchoring extension checks to the end of the string prevents double-extension attacks (e.g., malware.exe.jpg) and ensures only truly allowed file types are accepted.
Path Validation with Canonicalization
- Use allowlist of specific filenames when possible
- Resolve to canonical/absolute path
- Verify resolved path starts with allowed base directory
- Reject any .., absolute paths, or symbolic links
Why this works: Canonicalization resolves all path references (., .., symlinks) to their true location, allowing you to verify the final path is within allowed boundaries and preventing path traversal attacks.
Protocol Allowlist with Additional Validation
- Explicitly allowlist protocols (http, https only)
- Parse URL/URI using standard library
- Validate each component (scheme, host, port, path)
- Add semantic checks (no private IPs, no localhost)
Why this works: Restricting to safe protocols (http/https) and validating all URL components prevents protocol-based attacks like javascript:, data:, and file: while blocking requests to private infrastructure.
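The component-by-component checks above might look like the following sketch. It is an illustration under stated assumptions, not a complete defense: `is_allowed_url` and `HOSTNAME_RE` are hypothetical names, the hostname shape is deliberately conservative, and a hostname that resolves to a private address via DNS is not detected here.

```python
import ipaddress
import re
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}
# Conservative DNS-name shape (an assumption for this sketch):
# dot-separated labels and an alphabetic top-level domain.
HOSTNAME_RE = re.compile(
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}"
)


def is_allowed_url(url: str) -> bool:
    try:
        parts = urlsplit(url)
        host = parts.hostname   # lowercased, without port or userinfo
        parts.port              # raises ValueError on a malformed port
    except ValueError:
        return False
    # Scheme allowlist: everything except http/https is rejected.
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    # Credentials in URLs are a common way to disguise the real host.
    if parts.username is not None or parts.password is not None:
        return False
    # IP literals: accept only globally routable addresses.
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None
    if ip is not None:
        return ip.is_global
    # Names: single-label hosts such as "localhost" fail the shape check.
    if len(host) > 253 or host.endswith(".localhost"):
        return False
    return HOSTNAME_RE.fullmatch(host) is not None
```

Where the set of legitimate destinations is known in advance, an exact host allowlist (a set of permitted hostnames) is stricter than these shape checks and is usually the better choice.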
Enum/Constant-Based Validation
- Use sets, enums, or constants for known values
- Exact string matching only
- No pattern matching for enumerated values
- Normalize (lowercase) before comparison
Why this works: Using exact matching against a predefined set of values eliminates pattern-matching bypass attempts and ensures only explicitly allowed values are accepted.
Multi-Layer Validation
- Format validation (syntax)
- Semantic validation (business rules)
- Range validation (min/max, boundaries)
- Relationship validation (cross-field checks)
Why this works: Multiple layers of validation ensure that input is both structurally correct and semantically valid, catching bypass attempts that might satisfy one check but not another.
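The four layers can be combined in a single validator. The sketch below uses a hypothetical booking request; the field names, the guest limit, and `MAX_STAY_DAYS` are assumptions chosen for illustration.

```python
import re
from datetime import date

DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
GUESTS_RE = re.compile(r"[0-9]{1,2}")
MAX_GUESTS = 10      # hypothetical business rule
MAX_STAY_DAYS = 30   # hypothetical business rule


def validate_booking(start_str, end_str, guests_str, today=None):
    today = today or date.today()
    # 1. Format validation (syntax): whole-string matches only.
    if not (DATE_RE.fullmatch(start_str) and DATE_RE.fullmatch(end_str)):
        return False
    if not GUESTS_RE.fullmatch(guests_str):
        return False
    # 2. Semantic validation: strings must be real calendar dates.
    try:
        start = date.fromisoformat(start_str)
        end = date.fromisoformat(end_str)
    except ValueError:
        return False   # e.g. "2030-02-30" has the right shape but is no date
    # 3. Range validation: boundaries on each field.
    if not 1 <= int(guests_str) <= MAX_GUESTS:
        return False
    if start < today:
        return False
    # 4. Relationship validation: cross-field checks.
    if not start < end:
        return False
    if (end - start).days > MAX_STAY_DAYS:
        return False
    return True
```

Each layer rejects inputs the previous one lets through: a well-formed but impossible date passes layer 1 and fails layer 2, and two individually valid dates in the wrong order fail only at layer 4.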
Language-Specific Guidance
- Java - Pattern.matches or Matcher.matches (whole-input match, so anchors are implicit), enums, jakarta.validation
- JavaScript/Node.js - Regex with ^ and $, validator.js, path validation
- Python - re.fullmatch, pathlib.Path.resolve, ipaddress module
Security Checklist
- All regex patterns use ^ and $ anchors and are checked with a whole-string match (e.g. re.fullmatch in Python)
- Character classes are specific (not .*, \w+, .+)
- Length limits enforced (min and max)
- Enumerated values use sets/enums, not patterns
- Both format and semantic validation applied
- Path validation includes canonicalization
- URL validation checks scheme, domain, and context
- Tests cover: valid inputs, bypass attempts, encoding variations