CWE-183: Permissive List of Allowed Inputs

Overview

Permissive allowlists occur when input validation accepts too broad a range of values, allowing attackers to bypass security controls with edge cases, encoding variations, or unexpected but technically valid inputs that enable injection, traversal, or logic bypass attacks.

OWASP Classification

A06:2025 - Insecure Design

Risk

Medium-High: Overly permissive allowlists enable attackers to find valid-but-malicious inputs that bypass intended restrictions, leading to injection attacks, path traversal, authentication bypass, business logic flaws, and data exfiltration.

Remediation Steps

Core principle: Use strict allowlists with default-deny; permissive allowed-input lists are security bugs.

Locate Permissive Validation in Your Code

When reviewing security scan results:

Find the validation logic: Identify where input is validated (regex patterns, allowlists, format checks)
Check the allowlist: Review what values/patterns are accepted
Identify permissiveness: Look for overly broad patterns (.*, \w+, unanchored regex)
Trace the data flow: Understand how validated data is used (file paths, SQL queries, commands, URLs)
Check for missing validation: Identify inputs that should be validated but aren't

Common permissive patterns:

# Too broad and unanchored - re.match only anchors at the start
if re.match(r'\w+', user_input):  # Accepts any input that starts with a word character, e.g. "a; rm -rf /"

# Unanchored - allows prefix/suffix attacks
if re.search(r'\.jpg', filename):  # Matches "shell.jpg.php"

# No length limit
if re.match(r'^[a-z]+$', username):  # Could be 10,000 characters

# Accepts any valid format without semantic checks
if is_valid_url(redirect):  # Accepts javascript:, file:, data: schemes
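
The search for overly broad patterns can be partly automated. The following is a rough triage sketch, not a complete scanner: it walks Python source with the standard ast module and reports re.match / re.search calls whose literal pattern has no end anchor. The helper name find_unanchored_patterns and the SAMPLE snippet are hypothetical, and the heuristic ignores precompiled patterns, aliased imports, and escaped dollar signs.

```python
import ast

# Heuristic: a literal pattern ending in one of these is treated as end-anchored.
END_ANCHORS = ('$', r'\Z')

def find_unanchored_patterns(source):
    """Return (line, function, pattern) for re.match/re.search calls
    whose first argument is a string literal without an end anchor."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_re_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == 're'
            and func.attr in {'match', 'search'}
        )
        if not is_re_call or not node.args:
            continue
        pattern = node.args[0]
        if isinstance(pattern, ast.Constant) and isinstance(pattern.value, str):
            if not pattern.value.endswith(END_ANCHORS):
                findings.append((node.lineno, func.attr, pattern.value))
    return sorted(findings)

# Hypothetical source under review; it is parsed, never executed.
SAMPLE = '''
import re
ok = re.match(r'^[a-z]{3,20}$', s)
bad = re.match(r'[a-z]+', s)
'''

for line, func_name, pattern in find_unanchored_patterns(SAMPLE):
    print(f"line {line}: re.{func_name}({pattern!r}) has no end anchor")
```

Each finding still needs manual review: the goal is only to shortlist validation sites worth reading, after which the data flow has to be traced by hand.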

Use Strict, Minimal Allowlists (Primary Defense)

import re

# VULNERABLE - permissive regex without anchors
def validate_username_bad(username):
    # Attack: "admin<script>alert(1)</script>"
    # This matches "admin" prefix and allows suffix
    if re.match(r'[a-zA-Z0-9_]+', username):
        return True
    return False

# SECURE - strict full-string match with length limits
def validate_username_good(username):
    # re.fullmatch must match the entire string; unlike '$', it does not
    # tolerate a trailing newline ("admin\n")
    # Specific character set, explicit length limits
    if re.fullmatch(r'[a-zA-Z0-9_]{3,20}', username):
        return True
    return False

# VULNERABLE - checks that an extension appears anywhere in the name
def validate_filename_bad(filename):
    # Accepts "shell.jpg.php" and "../../evil.png"
    if '.jpg' in filename or '.png' in filename:
        return True
    return False

# SECURE - strict extension at end only
def validate_filename_good(filename):
    # Full-string match: must end with an allowed extension
    # No path separators, limited characters
    if re.fullmatch(r'[a-z0-9_-]{1,50}\.(jpg|png|gif)', filename.lower()):
        return True
    return False

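# VULNERABLE - prefix check instead of a scheme allowlist
def validate_redirect_bad(url):
    # Accepts anything that starts with "http", e.g. "httpx://evil.example",
    # and never checks the host, so "http://evil.example" is an open redirect
    if url.startswith('http'):
        return True
    return False

# SECURE - explicit protocol allowlist
def validate_redirect_good(url):
    import ipaddress
    from urllib.parse import urlparse

    # Parse URL; malformed input (e.g. unbalanced IPv6 brackets) raises ValueError
    try:
        parsed = urlparse(url)
    except ValueError:
        return False

    # Strict allowlist: only http or https
    if parsed.scheme not in ('http', 'https'):
        return False

    # Must have a host; hostname excludes userinfo and port, unlike netloc,
    # so "http://localhost:8080" cannot slip past the checks below
    host = parsed.hostname
    if not host:
        return False

    # Reject localhost and non-public IP literals (loopback, private, link-local)
    if host == 'localhost':
        return False
    try:
        if not ipaddress.ip_address(host).is_global:
            return False
    except ValueError:
        pass  # Not an IP literal; prefer an exact host allowlist where possible

    return True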

Why this works: Strict allowlists define exactly what's permitted, rejecting all other inputs including edge cases, encoding variations, and unexpected-but-valid formats that could bypass loose validation.

Key principles:

Anchor regex with ^ and $, or use a full-match function (re.fullmatch in Python): Match entire string, not substring
Narrow character sets: [a-z0-9] not [\w\s.] or .*
Explicit length limits: {3,20} bounds input size and limits resource-exhaustion risk
Exact matches for enums: Use sets/constants, not patterns
Reject unexpected: If not explicitly allowed, reject it
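
One subtlety behind the anchoring principle: in Python, $ matches at the end of the string and also just before a trailing newline, so an anchored re.match can still accept "admin\n". The sketch below contrasts the two approaches; the function names are illustrative, not part of any library.

```python
import re

USERNAME = re.compile(r'[a-zA-Z0-9_]{3,20}')

def is_valid_username(value):
    # fullmatch succeeds only if the whole string matches the pattern
    return USERNAME.fullmatch(value) is not None

def is_valid_username_dollar(value):
    # '^...$' with re.match: '$' also matches just before a final newline
    return re.match(r'^[a-zA-Z0-9_]{3,20}$', value) is not None

print(is_valid_username_dollar('admin\n'))  # True  - trailing newline slips through
print(is_valid_username('admin\n'))         # False - rejected
print(is_valid_username('admin'))           # True
```

Using \Z instead of $ closes the same gap when a full-match function is not available.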

Validate Format AND Semantics

import re
from pathlib import Path

# Format validation alone is insufficient
def validate_age_bad(age_str):
    # Only checks format (numeric)
    if re.match(r'^\d+$', age_str):
        return True  # Accepts 0, 999, 10000
    return False

# SECURE - format + semantic validation
def validate_age_good(age_str):
    # 1. Format validation (ASCII digits only; \d also matches other Unicode digits)
    if not re.fullmatch(r'[0-9]{1,3}', age_str):
        return False

    # 2. Semantic validation (business logic)
    age = int(age_str)
    if age > 150:
        return False  # Realistic age range; the format check already rules out negatives

    return True

# Filename validation with semantic checks
def validate_uploaded_file(filename, file_data):
    # 1. Format validation - allowed characters and extension
    if not re.fullmatch(r'[a-z0-9_-]{1,50}\.(jpg|png|gif)', filename.lower()):
        return False

    # 2. Size validation - cheap check before inspecting content
    if len(file_data) > 5 * 1024 * 1024:  # 5MB max
        return False

    # 3. Semantic validation - check actual file type
    # is_valid_image is assumed to be defined elsewhere (e.g. a magic-byte check)
    if not is_valid_image(file_data):
        return False

    return True

# Path validation with canonicalization
def validate_file_path(user_path, base_dir):
    # 1. Format validation - no obvious path traversal
    if '..' in user_path or user_path.startswith(('/', '\\')):
        return False

    # 2. Canonical path resolution
    base = Path(base_dir).resolve()
    try:
        canonical = (base / user_path).resolve()
    except (OSError, RuntimeError, ValueError):
        return False

    # 3. Semantic check - must be within base directory
    # Compare path components, not string prefixes: "/srv/uploads_evil"
    # starts with "/srv/uploads" as a string but lies outside it
    try:
        canonical.relative_to(base)
    except ValueError:
        return False

    return True

# Email validation - format + domain checks
# Full RFC5322 compliance can be much more complex.
def validate_email(email):
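    # 1. Length validation first, so the regex never sees oversized input
    if len(email) > 254:  # RFC 5321
        return False

    # 2. Format validation (fullmatch rejects a trailing newline that '$' would allow)
    if not re.fullmatch(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}', email):
        return False

    # 3. Semantic validation - check domain (domain names are case-insensitive)
    domain = email.split('@')[1].lower()

    # Reject disposable email domains
    # DISPOSABLE_DOMAINS is assumed to be a set of lowercase domains defined elsewhere
    if domain in DISPOSABLE_DOMAINS:
        return False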

    # Could add: DNS MX record check, domain reputation

    return True

Avoid Regex Pitfalls

import re

# PITFALL 1: Unanchored regex
def validate_code_bad(code):
    # re.match anchors only at the start, so anything may follow the
    # first five uppercase letters
    # Attack: "ADMIN'; DROP TABLE users--" passes
    # (with re.search, "xxxADMINyyy" would pass as well)
    if re.match(r'[A-Z]{5}', code):
        return True
    return False

def validate_code_good(code):
    # Full-string match - must be exactly 5 uppercase letters
    if re.fullmatch(r'[A-Z]{5}', code):
        return True
    return False

# PITFALL 2: Overly broad character classes
def validate_name_bad(name):
    # \w includes letters, numbers, underscore (for str patterns, any Unicode word character)
    # \s includes all whitespace (tabs, newlines)
    # Attack: "Alice\nadmin logged in" passes (the newline enables log forging)
    if re.match(r'^[\w\s.]+$', name):
        return True
    return False

def validate_name_good(name):
    # Specific characters: letters, space, hyphen, apostrophe only
    # Must start with a letter; limited length (1-50)
    if re.fullmatch(r"[a-zA-Z][a-zA-Z '-]{0,49}", name):
        return True
    return False

# PITFALL 3: Inconsistent case handling
def validate_file_ext_bad(filename):
    # re.IGNORECASE accepts "photo.JPG" and "Photo.Jpg", but the caller keeps
    # the original mixed-case name, so later case-sensitive checks or lookups
    # can disagree with this validator
    if re.match(r'^[a-z0-9_]+\.(jpg|png)$', filename, re.IGNORECASE):
        return True
    return False

def validate_file_ext_good(filename):
    # Normalize to lowercase first, then validate; store and use the
    # normalized name downstream as well
    filename_lower = filename.lower()
    if re.fullmatch(r'[a-z0-9_]+\.(jpg|png)', filename_lower):
        return True
    return False

# PITFALL 4: Unescaped metacharacters
def validate_domain_bad(domain):
    # Dot (.) matches any character, not literal dot
    # Attack: "example@com" passes
    if re.match(r'^[a-z]+.[a-z]+$', domain):
        return True
    return False

def validate_domain_good(domain):
    # Escape dot with backslash: \.
    # fullmatch also rejects a trailing newline ("example.com\n")
    if re.fullmatch(r'[a-z]+\.[a-z]+', domain):
        return True
    return False

# BEST: Use explicit validation, avoid complex regex
def validate_ip_address(ip):
    # Instead of complex regex, parse and validate
    parts = ip.split('.')

    if len(parts) != 4:
        return False

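    for part in parts:
        # isascii() guards against Unicode digits such as "²", which pass
        # isdigit() but make int() raise ValueError; the length cap keeps
        # int() away from absurdly long digit strings
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return False
        num = int(part)
        if num > 255:
            return False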

    return True

Use Enums and Constants for Known Values

import re
from enum import Enum

# VULNERABLE - accepts any string matching pattern
# (`user` in these examples is assumed to be an object already in scope)
def set_user_role_bad(role):
    # Attack: "root", "superuser", or any other lowercase word is accepted
    if re.match(r'^[a-z]+$', role):
        user.role = role  # Could be any lowercase string
        return True
    return False

# SECURE - use enum for known values
class UserRole(Enum):
    ADMIN = 'admin'
    USER = 'user'
    GUEST = 'guest'

def set_user_role_good(role_str):
    # Convert to lowercase and check against enum
    try:
        role = UserRole(role_str.lower())
        user.role = role
        return True
    except ValueError:
        return False  # Not a valid role

# SECURE - use set for allowlist
ALLOWED_COUNTRIES = {'US', 'CA', 'GB', 'FR', 'DE', 'JP', 'AU'}

def validate_country_code(code):
    # Exact match against set - no pattern matching
    if code.upper() in ALLOWED_COUNTRIES:
        return True
    return False

# SECURE - use constants for file types
ALLOWED_EXTENSIONS = {'.jpg', '.png', '.gif', '.pdf', '.doc', '.docx'}

def validate_file_type(filename):
    # Extract extension
    ext = filename.lower().split('.')[-1] if '.' in filename else ''
    ext_with_dot = f'.{ext}'

    # Exact match against set
    if ext_with_dot in ALLOWED_EXTENSIONS:
        return True
    return False

# SECURE - validate HTTP methods
ALLOWED_METHODS = {'GET', 'POST', 'PUT', 'DELETE', 'PATCH'}

def validate_http_method(method):
    return method.upper() in ALLOWED_METHODS

Test with Bypass Payloads

import re

def test_validation_bypasses():
    # Test your validation function
    def validate(input_str):
        # fullmatch rather than match with '$': '$' would accept "admin\n"
        return re.fullmatch(r'[a-zA-Z0-9]{3,20}', input_str) is not None

    # Valid inputs - should pass
    assert validate('alice') == True
    assert validate('user123') == True
    assert validate('ABC') == True
    print("✓ Valid inputs pass")

    # Unanchored regex bypass - should fail
    assert validate('admin<script>') == False  # Suffix attack
    assert validate('<script>admin') == False  # Prefix attack
    print("✓ Blocks unanchored bypass")

    # Special characters - should fail
    assert validate('user@domain') == False
    assert validate('user;DROP') == False
    assert validate('user\nmalicious') == False  # Embedded newline
    assert validate('admin\n') == False  # Trailing newline
    assert validate('user/../etc') == False
    print("✓ Blocks special characters")

    # Length violations - should fail
    assert validate('ab') == False  # Too short
    assert validate('a' * 21) == False  # Too long
    print("✓ Enforces length limits")

    # Encoding variations - should fail
    assert validate('user%20name') == False  # URL-encoded space
    assert validate('user\x00') == False  # Null byte
    print("✓ Blocks encoding attacks")

    # Case variations (depends on your rules)
    # If case-sensitive:
    assert validate('Alice') == True
    # If case-insensitive, normalize first
    print("✓ Handles case correctly")

    print("All validation bypass tests passed!")

# Test file extension validation
def test_file_extension():
    def validate(filename):
        return re.fullmatch(r'[a-z0-9_-]+\.(jpg|png|gif)', filename.lower()) is not None

    # Valid extensions - should pass
    assert validate('photo.jpg') == True
    assert validate('image.png') == True
    print("✓ Accepts valid extensions")

    # Double extension attacks - should fail
    assert validate('malware.exe.jpg') == False
    assert validate('file.php.png') == False
    print("✓ Blocks double extensions")

    # No extension - should fail
    assert validate('filename') == False
    print("✓ Requires extension")

    # Path traversal - should fail
    assert validate('../../../etc/passwd.jpg') == False
    assert validate('..\\..\\file.png') == False
    print("✓ Blocks path traversal")

    print("All file extension tests passed!")

if __name__ == '__main__':
    test_validation_bypasses()
    test_file_extension()

Common Vulnerable Patterns

Unanchored Regular Expressions

Regex without start (^) and end ($) anchors
Matches substring instead of entire string
Example: pattern.match(input) instead of pattern.fullmatch(input)
Attack: Valid prefix + malicious suffix bypasses check

Overly Broad Character Classes

Using .*, .+, \w+ without constraints
Accepting any whitespace or special characters
No length limits
Attack: Injection of unexpected but valid characters
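
A less obvious source of breadth: for str patterns in Python 3, \w matches Unicode word characters, not just ASCII letters and digits. The sketch below shows a lookalike username passing a \w-based check and failing an explicit ASCII class; the function names and the sample value are illustrative.

```python
import re

def is_word_unicode(value):
    # \w on a str pattern matches any Unicode word character
    return re.fullmatch(r'\w{3,20}', value) is not None

def is_word_ascii(value):
    # Explicit ASCII class; re.fullmatch(r'\w{3,20}', value, re.ASCII)
    # would restrict \w in the same way
    return re.fullmatch(r'[A-Za-z0-9_]{3,20}', value) is not None

lookalike = '\u0430dmin'  # first character is CYRILLIC SMALL LETTER A

print(is_word_unicode(lookalike))  # True  - visually identical to "admin"
print(is_word_ascii(lookalike))    # False
print(is_word_ascii('admin'))      # True
```

Whether non-ASCII input should be accepted is a business decision; the point is to make that choice explicit rather than inherit it from \w.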

Extension/Format Checking Without Position Validation

Checking if extension exists anywhere in string
Not validating extension is at the end
Example: Checking .jpg exists vs. ending with .jpg
Attack: malware.exe.jpg or file.jpg.php
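
As a minimal sketch of position-aware extension checking, the helper below uses pathlib's suffix parsing and accepts a name only when it has exactly one suffix and that suffix is allowlisted. The set ALLOWED_IMAGE_SUFFIXES and the single-suffix rule are assumed policies for illustration; it covers the extension only and is meant to sit alongside character and length checks on the rest of the name.

```python
from pathlib import PurePosixPath

ALLOWED_IMAGE_SUFFIXES = {'.jpg', '.png', '.gif'}  # assumed policy

def has_single_allowed_suffix(filename):
    # Reject path separators outright; this helper expects a bare filename
    if '/' in filename or '\\' in filename:
        return False
    # suffixes lists every dot-separated extension, in order
    suffixes = PurePosixPath(filename.lower()).suffixes
    return len(suffixes) == 1 and suffixes[0] in ALLOWED_IMAGE_SUFFIXES

print(has_single_allowed_suffix('photo.JPG'))        # True
print(has_single_allowed_suffix('shell.jpg.php'))    # False - wrong final extension
print(has_single_allowed_suffix('malware.exe.jpg'))  # False - double extension
print(has_single_allowed_suffix('jpg'))              # False - no extension
```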

Path/Filename Validation Without Semantic Checks

Accepting valid characters but not checking path traversal
No verification of canonical path
Allowing .., /, or absolute paths
Attack: ../../../etc/passwd

Protocol/Scheme Validation Without Allowlist

Accepting any protocol format
Not restricting to safe protocols (http, https)
Attack: javascript:, data:, file: protocols

Secure Patterns

Strict Anchored Regular Expressions

Always use start and end anchors: ^pattern$
Validate entire string, not substrings
Set maximum length limits before regex
Prefer language-specific full-match functions (e.g. re.fullmatch in Python, where $ also matches just before a trailing newline)

Why this works: Anchoring regex with ^ and $ ensures the pattern matches the entire input string, preventing attacks that add malicious prefixes or suffixes to valid data.

Minimal Character Sets with Constraints

Define narrowest acceptable character set
Add explicit length limits (min and max)
Use specific character classes, not broad wildcards
Apply case sensitivity rules

Why this works: Limiting character sets to only what's needed leaves less room for injecting special characters, and length constraints bound the work done per input, which limits resource-exhaustion attacks while ensuring input meets business requirements.

Extension Validation at End of String

Use anchored regex: ^[allowed_chars]+\.(ext1|ext2|ext3)$
Validate extension position explicitly
Check for double extensions
Normalize to lowercase before checking

Why this works: Anchoring extension checks to the end of the string prevents double-extension attacks (e.g., malware.exe.jpg) and ensures only truly allowed file types are accepted.

Path Validation with Canonicalization

Use allowlist of specific filenames when possible
Resolve to canonical/absolute path
Verify resolved path is inside the allowed base directory (compare path components, not a raw string prefix)
Reject any .., absolute paths, or symbolic links

Why this works: Canonicalization resolves all path references (., .., symlinks) to their true location, allowing you to verify the final path is within allowed boundaries and preventing path traversal attacks.
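
The containment check can be sketched with pathlib as follows. The helper name resolve_inside and the directory names are hypothetical; the important detail is that relative_to compares path components, so a sibling directory such as uploads_evil is not mistaken for uploads.

```python
import tempfile
from pathlib import Path

def resolve_inside(base_dir, user_path):
    """Return the resolved path if it stays inside base_dir, else None."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, which the
    # containment check below then rejects
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)  # raises ValueError when outside base
    except ValueError:
        return None
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / 'uploads'
    base.mkdir()
    print(resolve_inside(base, 'report.txt') is not None)         # True
    print(resolve_inside(base, '../uploads_evil/x.txt') is None)  # True
    print(resolve_inside(base, '/etc/passwd') is None)            # True
```

Returning the resolved path, rather than a boolean, encourages callers to open exactly the path that was checked.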

Protocol Allowlist with Additional Validation

Explicitly allowlist protocols (http, https only)
Parse URL/URI using standard library
Validate each component (scheme, host, port, path)
Add semantic checks (no private IPs, no localhost)

Why this works: Restricting to safe protocols (http/https) and validating all URL components prevents protocol-based attacks like javascript:, data:, and file: while blocking requests to private infrastructure.
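
Where the set of legitimate redirect targets is known, the strictest form of component validation is an exact allowlist for scheme, host, and port. The sketch below assumes hypothetical values for ALLOWED_HOSTS and ALLOWED_PORTS; an exact host allowlist also rejects private addresses and localhost without a separate check.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {'https'}
ALLOWED_HOSTS = {'example.com', 'www.example.com'}  # hypothetical
ALLOWED_PORTS = {None, 443}                         # None means no explicit port

def is_allowed_redirect(url):
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError for an invalid port
    except ValueError:
        return False
    # hostname is lowercased and excludes userinfo and port
    return (
        parts.scheme in ALLOWED_SCHEMES
        and parts.hostname in ALLOWED_HOSTS
        and port in ALLOWED_PORTS
    )

print(is_allowed_redirect('https://example.com/account'))     # True
print(is_allowed_redirect('javascript:alert(1)'))             # False
print(is_allowed_redirect('https://example.com.evil.test/'))  # False
print(is_allowed_redirect('https://example.com@evil.test/'))  # False
```

Comparing parsed components rather than the raw string is what defeats the lookalike-suffix and userinfo tricks in the last two calls.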

Enum/Constant-Based Validation

Use sets, enums, or constants for known values
Exact string matching only
No pattern matching for enumerated values
Normalize (lowercase) before comparison

Why this works: Using exact matching against a predefined set of values eliminates pattern-matching bypass attempts and ensures only explicitly allowed values are accepted.

Multi-Layer Validation

Format validation (syntax)
Semantic validation (business rules)
Range validation (min/max, boundaries)
Relationship validation (cross-field checks)

Why this works: Multiple layers of validation ensure that input is both structurally correct and semantically valid, catching bypass attempts that might satisfy one check but not another.
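
To illustrate how the four layers combine, including the cross-field check, here is a small sketch for a booking date range. The function name, the MAX_NIGHTS limit, and the rule that bookings cannot start in the past are assumptions made for the example.

```python
import re
from datetime import date

DATE_FORMAT = re.compile(r'[0-9]{4}-[0-9]{2}-[0-9]{2}')
MAX_NIGHTS = 30  # assumed business rule

def validate_booking(check_in, check_out, today):
    # 1. Format validation (syntax)
    if not (DATE_FORMAT.fullmatch(check_in) and DATE_FORMAT.fullmatch(check_out)):
        return False
    # 2. Semantic validation: must be real calendar dates
    try:
        start = date.fromisoformat(check_in)
        end = date.fromisoformat(check_out)
    except ValueError:
        return False
    # 3. Range validation: no bookings in the past
    if start < today:
        return False
    # 4. Relationship validation: check-out after check-in, bounded stay
    nights = (end - start).days
    return 1 <= nights <= MAX_NIGHTS

today = date(2025, 1, 1)
print(validate_booking('2025-03-01', '2025-03-05', today))  # True
print(validate_booking('2025-02-30', '2025-03-05', today))  # False - not a real date
print(validate_booking('2025-03-05', '2025-03-01', today))  # False - range inverted
```

Each input in the failing calls is well-formed on its own; only the later layers catch the problem.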

Language-Specific Guidance

Java - Pattern.matches (matches the entire input, so explicit anchors are optional), enums, jakarta.validation
JavaScript/Node.js - Regex with ^ and $, validator.js, path validation
Python - re.fullmatch, pathlib.Path.resolve, ipaddress module

Security Checklist

All regex checks match the entire input (^ and $ anchors, or a full-match function such as re.fullmatch)
Character classes are specific (not .*, \w+, .+)
Length limits enforced (min and max)
Enumerated values use sets/enums, not patterns
Both format and semantic validation applied
Path validation includes canonicalization
URL validation checks scheme, domain, and context
Tests cover: valid inputs, bypass attempts, encoding variations