CWE-183: Permissive List of Allowed Inputs - Java

Overview

Java-specific guidance for implementing strict input validation using Pattern, URI, and Path classes.

Primary Defence: Validate the whole input, not a fragment of it. Use Matcher.matches() rather than find(), anchor patterns with ^ and $, and enforce length limits. For structured inputs such as URLs, file paths, and host addresses, parse with the URI, Path, and InetAddress classes instead of relying on regex alone. An allowlist that matches only part of the input lets attacker-controlled content ride along with the valid portion.

Common Vulnerable Patterns

Unanchored Pattern Matching

import java.util.regex.*;

public class VulnerableValidator {
    // VULNERABLE - no anchors, matches substring
    public boolean validateUsername(String username) {
        // Attacker: "admin'; DROP TABLE users--"
        Pattern pattern = Pattern.compile("[a-zA-Z0-9]+");
        return pattern.matcher(username).find();  // Matches substring!
    }

    // VULNERABLE - permissive URL validation
    public boolean validateURL(String url) {
        // Attacker: "file:///etc/passwd" (any scheme followed by "://" passes)
        return url.matches(".*://.*");  // Allows any protocol!
    }
}
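
The difference between the two matching modes can be checked directly. The following is a minimal sketch, not part of the original guidance; the class name UnanchoredDemo and its method names are hypothetical and chosen only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class (illustrative names, not from the original guidance).
final class UnanchoredDemo {
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // Mirrors the vulnerable check: true if ANY alphanumeric run exists in the input.
    static boolean looseUsername(String s) {
        return ALNUM.matcher(s).find();
    }

    // Same pattern, but matches() requires the whole input to match.
    static boolean strictUsername(String s) {
        return ALNUM.matcher(s).matches();
    }

    // Mirrors the permissive URL check: anything containing "://" passes.
    static boolean looseUrl(String s) {
        return s.matches(".*://.*");
    }

    public static void main(String[] args) {
        String attack = "admin'; DROP TABLE users--";
        System.out.println(looseUsername(attack));           // true
        System.out.println(strictUsername(attack));          // false
        System.out.println(looseUrl("file:///etc/passwd"));  // true
    }
}
```

The pattern is identical in both username methods; only the choice between find() and matches() changes the outcome.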

Permissive File Extension Check

public boolean validateFilename(String filename) {
    // VULNERABLE - checks if extension appears anywhere
    // Attacker: "shell.jpg.php" (".jpg" appears, but the real extension is .php)
    return filename.matches(".*\\.(jpg|png|gif).*");
}

Secure Patterns

Strict Username Validation

import java.util.regex.*;
import java.util.Set;

public class SecureValidator {
    private static final int MAX_USERNAME_LENGTH = 20;
    private static final Pattern USERNAME_PATTERN = 
        Pattern.compile("^[a-z0-9_]{3,20}$", Pattern.CASE_INSENSITIVE);
    private static final Set<String> RESERVED_NAMES = 
        Set.of("admin", "root", "system", "administrator");

    public boolean validateUsername(String username) {
        if (username == null || username.length() > MAX_USERNAME_LENGTH) {
            return false;
        }

        // Strict: anchored pattern, use matches() not find()
        if (!USERNAME_PATTERN.matcher(username).matches()) {
            return false;
        }

        // Reject reserved names; Locale.ROOT keeps case folding independent of the default locale
        if (RESERVED_NAMES.contains(username.toLowerCase(java.util.Locale.ROOT))) {
            return false;
        }

        return true;
    }
}

Why this works: matches() succeeds only if the entire input matches the pattern, so a string such as "admin'; DROP TABLE users--" is rejected even though it contains a valid-looking substring; find() would accept it. The ^ and $ anchors in ^[a-z0-9_]{3,20}$ are redundant under matches(), but they keep the pattern safer if it is later reused with find(). Checking the length against MAX_USERNAME_LENGTH before running the regex bounds the work the regex engine does on oversized input. The reserved names check stops users from registering names such as admin or root that could be used to impersonate privileged accounts. Because the pattern is compiled once as a static final constant, regex compilation happens at class load time rather than on every validation call.
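
One subtlety in the reserved names check is case folding. String.toLowerCase() with no argument uses the JVM's default locale, and some locales map letters differently. The sketch below illustrates this with the Turkish locale; the class ReservedNameCheck and its method are hypothetical names introduced for this example.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (illustrative only) showing locale-sensitive case folding.
final class ReservedNameCheck {
    private static final Set<String> RESERVED = Set.of("admin", "root");

    // Lowercases with an explicit locale so the two behaviours can be compared.
    static boolean isReserved(String username, Locale locale) {
        return RESERVED.contains(username.toLowerCase(locale));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Locale.ROOT: "ADMIN" becomes "admin" and is caught.
        System.out.println(isReserved("ADMIN", Locale.ROOT)); // true
        // Turkish rules: 'I' becomes dotless 'ı' (U+0131), so the lookup misses.
        System.out.println(isReserved("ADMIN", turkish));     // false
    }
}
```

Passing Locale.ROOT explicitly makes the check behave the same on every server, regardless of how the JVM's default locale is configured.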

Strict URL Validation

import java.net.*;
import java.util.Set;

public class SecureValidator {
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    public boolean validateURL(String urlString) {
        try {
            URI uri = new URI(urlString);
            String scheme = uri.getScheme();

            // Strict: only allow specific protocols
            if (scheme == null || !ALLOWED_SCHEMES.contains(scheme.toLowerCase())) {
                return false;
            }

            // Validate host exists
            String host = uri.getHost();
            if (host == null || host.isEmpty()) {
                return false;
            }

            // Optional: reject loopback, private, link-local, and wildcard addresses
            InetAddress addr = InetAddress.getByName(host);
            if (addr.isLoopbackAddress() || addr.isSiteLocalAddress()
                    || addr.isLinkLocalAddress() || addr.isAnyLocalAddress()) {
                return false;
            }

            return true;
        } catch (URISyntaxException | UnknownHostException e) {
            return false;
        }
    }
}

Why this works: The URI class parses the URL into components and throws URISyntaxException on malformed input. Validating uri.getScheme() against an allowlist (ALLOWED_SCHEMES) rejects dangerous schemes such as javascript:, data:, file:, and vbscript: that could enable XSS or local file access. Checking for a non-null, non-empty host rejects URLs like http:// that have a valid scheme but no destination. InetAddress.getByName() confirms the host resolves, and the loopback and private address checks reduce the risk of SSRF against internal services. Note that this resolves the host once at validation time; if the application later fetches the URL, the HTTP client resolves the name again, so a DNS rebinding attack can still redirect the request unless the connection is pinned to the validated address. Catching both URISyntaxException and UnknownHostException means any parsing or DNS failure results in rejection, following a fail-secure pattern.
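
A hostname can resolve to several addresses, and internal targets are not limited to loopback and site-local ranges. The following is a rough sketch of a broader check, assuming a hypothetical helper class InternalAddressCheck. The JDK predicates it uses do not cover every reserved range (IPv6 unique local addresses, for example), so treat it as a starting point rather than a complete SSRF filter.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper (illustrative only); not a complete SSRF defence.
final class InternalAddressCheck {

    // True if the address is in a range that user-supplied URLs should usually not reach.
    static boolean isInternal(InetAddress addr) {
        return addr.isAnyLocalAddress()      // 0.0.0.0 or ::
            || addr.isLoopbackAddress()      // 127.0.0.0/8 or ::1
            || addr.isLinkLocalAddress()     // 169.254.0.0/16, includes cloud metadata endpoints
            || addr.isSiteLocalAddress()     // 10/8, 172.16/12, 192.168/16
            || addr.isMulticastAddress();
    }

    // Checks EVERY address the host resolves to; fails closed if resolution fails.
    static boolean resolvesToInternal(String host) {
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                if (isInternal(addr)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return true;
        }
    }
}
```

A caller would reject the URL whenever resolvesToInternal(host) returns true. Literal IP addresses are parsed without a DNS lookup, so the check also covers URLs such as http://169.254.169.254/.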

Strict Email Validation

This validator accepts only the simple form local@domain.tld. Full RFC 5322 compliance is much more complex.

import java.util.regex.Pattern;

public class EmailValidator {
    private static final int MAX_EMAIL_LENGTH = 254;
    private static final int MAX_LOCAL_LENGTH = 64;
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
        "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    );

    public boolean validateEmail(String email) {
        if (email == null || email.length() > MAX_EMAIL_LENGTH) {
            return false;
        }

        // Pattern matching with anchored regex
        if (!EMAIL_PATTERN.matcher(email).matches()) {
            return false;
        }

        // Additional semantic checks
        String[] parts = email.split("@");
        if (parts[0].length() > MAX_LOCAL_LENGTH) {
            return false;
        }

        return true;
    }
}

Why this works: The pattern ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ enforces a strict structure: local part, a single @, domain, and TLD. Whole-string matching rejects addresses embedded in larger strings (like "user@example.com<script>alert(1)</script>"). The 254-character limit reflects the maximum address length implied by RFC 5321 and keeps very long inputs away from the regex engine. The 64-character local part check enforces the RFC 5321 local-part limit. Requiring a TLD of at least two letters (.co, .uk) rejects many typos. The pattern is deliberately simplified: it still accepts some invalid addresses, such as domains with consecutive dots, and rejects some valid ones, such as quoted local parts. Full RFC 5322 compliance is extremely complex and rarely needed for web applications.
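
The simplified pattern accepts a few shapes that are not valid addresses, such as doubled dots or domain labels that begin with a hyphen. If that matters for an application, a few extra structural checks can be layered on top. The sketch below is one possible way to do so; the class EmailShapeCheck and the extra rules are assumptions for illustration, not part of the validator above.

```java
import java.util.regex.Pattern;

// Hypothetical extension (illustrative only) of the simplified email pattern.
final class EmailShapeCheck {
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
        "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");

    // The regex alone, for comparison.
    static boolean patternOnly(String email) {
        return email != null && EMAIL_PATTERN.matcher(email).matches();
    }

    // Regex plus length limits and label-level checks.
    static boolean isPlausible(String email) {
        if (email == null || email.length() > 254 || !patternOnly(email)) {
            return false;
        }
        int at = email.indexOf('@');   // the pattern guarantees exactly one '@'
        String local = email.substring(0, at);
        String domain = email.substring(at + 1);
        if (local.length() > 64 || hasBadDots(local) || hasBadDots(domain)) {
            return false;
        }
        // Domain labels must not start or end with a hyphen.
        for (String label : domain.split("\\.")) {
            if (label.startsWith("-") || label.endsWith("-")) {
                return false;
            }
        }
        return true;
    }

    // Leading, trailing, or consecutive dots indicate an empty label.
    private static boolean hasBadDots(String s) {
        return s.startsWith(".") || s.endsWith(".") || s.contains("..");
    }
}
```

For example, "a..b@example.com" passes patternOnly() but is rejected by isPlausible().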

Strict Filename Validation

import java.util.regex.Pattern;

public class FilenameValidator {
    private static final int MAX_FILENAME_LENGTH = 255;
    private static final Pattern FILENAME_PATTERN = 
        Pattern.compile("^[a-zA-Z0-9_-]+\\.(jpg|png|gif)$", Pattern.CASE_INSENSITIVE);

    public boolean validateFilename(String filename) {
        if (filename == null || filename.length() > MAX_FILENAME_LENGTH) {
            return false;
        }

        // Anchored pattern - must END with allowed extension
        if (!FILENAME_PATTERN.matcher(filename).matches()) {
            return false;
        }

        // Additional security checks
        if (filename.contains("..") || filename.contains("/") || filename.contains("\\")) {
            return false;
        }

        return true;
    }
}

Why this works: Whole-string matching against ^[a-zA-Z0-9_-]+\.(jpg|png|gif)$ ensures the filename ends with an allowed extension, so "shell.jpg.php" is rejected. Because the base name allowlist [a-zA-Z0-9_-] excludes dots, double-extension names like "malware.exe.jpg" are rejected as well, along with path separators and shell metacharacters that could be used for path traversal or command injection. Length validation rejects filenames longer than the 255-character limit common to many filesystems. The explicit checks for .., /, and \ are redundant with the regex but provide defense-in-depth if the pattern is later relaxed. Case-insensitive matching accepts "file.JPG" as well as "file.jpg", so legitimate uploads are not rejected over letter case. This validates the name only; it says nothing about the file's actual content.

Path Validation with Canonicalization

import java.io.*;
import java.nio.file.*;
import java.util.Set;

public class FileAccessValidator {
    private static final Path BASE_DIR = Paths.get("/var/data").toAbsolutePath();
    private static final Set<String> ALLOWED_FILES = 
        Set.of("report.pdf", "data.csv", "summary.txt");

    public File getFile(String filename) throws IOException {
        // Strict allowlist
        if (!ALLOWED_FILES.contains(filename)) {
            throw new IllegalArgumentException("File not allowed");
        }

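        // Resolve both paths to canonical form so a symlink in BASE_DIR itself does not break the comparison
        Path baseReal = BASE_DIR.toRealPath();
        Path filePath = baseReal.resolve(filename).toRealPath();

        // Verify within allowed directory
        if (!filePath.startsWith(baseReal)) {
            throw new IllegalArgumentException("Path traversal detected");
        }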

        return filePath.toFile();
    }
}

Why this works: The allowlist ALLOWED_FILES is the primary control: only explicitly listed files can be requested, so a name like "../../etc/passwd" is rejected before any path is built. toRealPath() then resolves symbolic links and removes . and .. segments, producing the canonical path; it also throws an IOException if the file does not exist. The startsWith() check compares whole path components and confirms the canonical path is still inside the base directory, which catches the case where an allowed filename is a symbolic link pointing elsewhere. The base directory must be canonicalized the same way, otherwise a symbolic link anywhere in the base path makes the comparison fail for legitimate files. This defense-in-depth approach combines allowlisting, canonicalization, and boundary checking.

Enum-Based Validation

public enum Role {
    USER, MODERATOR, ADMIN;

    public static boolean isValid(String role) {
        if (role == null) {
            return false;
        }
        try {
            // Locale.ROOT avoids locale-specific case mapping (e.g. Turkish dotted I)
            Role.valueOf(role.toUpperCase(java.util.Locale.ROOT));
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}

// Usage
public boolean validateRole(String role) {
    return Role.isValid(role);
}

Why this works: A Java enum defines a fixed set of allowed values that cannot be extended at runtime. valueOf() throws IllegalArgumentException for any string that does not exactly match an enum constant, which gives strict allowlist validation without regex complexity. Converting the input to uppercase provides case-insensitive matching; passing Locale.ROOT to toUpperCase() keeps that conversion independent of the JVM's default locale. Because the input is compared against constants rather than interpreted, there is no pattern to bypass, though the validated value must still be handled safely wherever it is used. Enums are also more maintainable than string constants: the compiler catches typos, refactoring tools work correctly, and the type system prevents invalid assignments.
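
A boolean check means the caller has to convert the string a second time to get the enum value. One alternative is to return the parsed value directly, so validation and conversion happen in one step. The sketch below assumes a hypothetical enum named AccessRole with a parse() method; neither name comes from the original example.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical variant (illustrative only): validate and convert in one call.
enum AccessRole {
    USER, MODERATOR, ADMIN;

    // Returns the matching constant, or Optional.empty() for null or unknown input.
    static Optional<AccessRole> parse(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        try {
            // No trimming: surrounding whitespace is treated as invalid input.
            return Optional.of(valueOf(raw.toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }
}
```

A caller might write AccessRole.parse(input).orElseThrow(...) and from then on work only with the typed value, never the raw string.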

Numeric ID Validation

import java.util.regex.Pattern;

public class IDValidator {
    private static final Pattern ID_PATTERN = Pattern.compile("^[0-9]{8}$");
    private static final int MIN_ID = 10000000;
    private static final int MAX_ID = 99999999;

    public boolean validateID(String idStr) {
        // Format validation (null check first: matcher(null) throws NullPointerException)
        if (idStr == null || !ID_PATTERN.matcher(idStr).matches()) {
            return false;
        }

        // Semantic validation: check range
        try {
            int id = Integer.parseInt(idStr);
            return id >= MIN_ID && id <= MAX_ID;
        } catch (NumberFormatException e) {
            return false;
        }
    }
}

Why this works: The pattern ^[0-9]{8}$ with matches() accepts exactly 8 ASCII digits, rejecting inputs like "12345678abc" or "abc12345678" that merely contain a valid substring. Format validation happens before parsing, so malformed input is caught early and Integer.parseInt() never sees non-numeric characters. The range check with MIN_ID and MAX_ID enforces semantic validity: since IDs start at 10000000, inputs like "00000001" or "09999999" match the format but fall below the valid range and are rejected. The try-catch is a safeguard only, since an 8-digit number cannot overflow int. This layered validation (format → parsing → range) provides defense-in-depth and clear error handling.

Java-Specific Best Practices

Use `matches()` Not `find()`

import java.util.regex.*;

Pattern pattern = Pattern.compile("^[a-z0-9]+$");
Matcher matcher = pattern.matcher(input);

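// RISKY: find() relies entirely on the anchors, and $ also matches
// just before a trailing line terminator, so "abc\n" passes
if (matcher.find()) { }

// CORRECT: matches() requires the entire input to match
if (matcher.matches()) { }

// ALTERNATIVE: String.matches() for one-off checks (recompiles the pattern on every call)
if (input.matches("^[a-z0-9]+$")) { }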
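
Even with ^ and $ in place, find() and matches() are not equivalent. By default, $ in java.util.regex also matches just before a line terminator at the very end of the input. The sketch below shows the difference; the class AnchorDemo and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo (illustrative only) of $ versus \z with find() and matches().
final class AnchorDemo {
    private static final Pattern ANCHORED = Pattern.compile("^[a-z0-9]+$");
    // \A and \z are absolute start/end of input and ignore line terminators.
    private static final Pattern ABSOLUTE = Pattern.compile("\\A[a-z0-9]+\\z");

    static boolean viaFind(String s) {
        return ANCHORED.matcher(s).find();
    }

    static boolean viaMatches(String s) {
        return ANCHORED.matcher(s).matches();
    }

    static boolean viaAbsoluteFind(String s) {
        return ABSOLUTE.matcher(s).find();
    }

    public static void main(String[] args) {
        String input = "abc\n";
        System.out.println(viaFind(input));         // true: $ matches before the final newline
        System.out.println(viaMatches(input));      // false: the newline is not consumed
        System.out.println(viaAbsoluteFind(input)); // false: \z only matches at the true end
    }
}
```

If find() must be used for some reason, \A and \z are the safer anchors; otherwise matches() avoids the issue.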

Pre-compile Patterns as Constants

public class Validator {
    // Compile once, reuse many times
    private static final Pattern USERNAME_PATTERN = 
        Pattern.compile("^[a-z0-9_]{3,20}$", Pattern.CASE_INSENSITIVE);

    public boolean validate(String username) {
        return USERNAME_PATTERN.matcher(username).matches();
    }
}

Use `Set.of()` for Allowlists (Java 9+)

// Immutable set (Java 9+)
private static final Set<String> ALLOWED_EXTENSIONS = 
    Set.of("jpg", "png", "gif", "pdf");

// Older Java versions
private static final Set<String> ALLOWED_EXTENSIONS = 
    Collections.unmodifiableSet(new HashSet<>(
        Arrays.asList("jpg", "png", "gif", "pdf")
    ));
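
As an illustration of how such a set might be used, the sketch below checks a filename's final extension against the allowlist. The class ExtensionAllowlist and the method hasAllowedExtension() are hypothetical names, and the check looks only at the last extension, so it is meant to complement character-level filename validation, not replace it.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (illustrative only): exact-match lookup of the final extension.
final class ExtensionAllowlist {
    private static final Set<String> ALLOWED_EXTENSIONS =
        Set.of("jpg", "png", "gif", "pdf");

    static boolean hasAllowedExtension(String filename) {
        if (filename == null) {
            return false;
        }
        int dot = filename.lastIndexOf('.');
        // Require a non-empty base name and a non-empty extension.
        if (dot <= 0 || dot == filename.length() - 1) {
            return false;
        }
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED_EXTENSIONS.contains(ext);
    }
}
```

Set.contains() is an exact comparison, so "shell.jpg.php" is rejected because its final extension is php, while "photo.JPG" is accepted after case folding.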

Use NIO Path APIs for File Operations

import java.nio.file.*;

public Path validatePath(String filename) {
    // normalize() is purely lexical: it removes "." and ".." but does not resolve symbolic links
    Path basePath = Paths.get("/var/data").toAbsolutePath().normalize();
    Path filePath = basePath.resolve(filename).normalize();

    // Check if resolved path is within base directory
    if (!filePath.startsWith(basePath)) {
        throw new SecurityException("Path traversal attempt");
    }

    return filePath;
}
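
The containment check can be exercised without touching the filesystem, because resolve() and normalize() work on the path text alone. The sketch below assumes Unix-style paths and a hypothetical class LexicalContainment; it does not detect symbolic links, which requires toRealPath().

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo (illustrative only); assumes Unix-style paths.
final class LexicalContainment {

    // True if userInput, resolved against base, stays inside base after normalization.
    static boolean staysInside(String base, String userInput) {
        Path basePath = Paths.get(base).toAbsolutePath().normalize();
        Path resolved = basePath.resolve(userInput).normalize();
        // Path.startsWith compares whole components, so /var/data2 is NOT inside /var/data.
        return resolved.startsWith(basePath);
    }

    public static void main(String[] args) {
        System.out.println(staysInside("/var/data", "reports/q1.csv"));   // true
        System.out.println(staysInside("/var/data", "../../etc/passwd")); // false
        System.out.println(staysInside("/var/data", "../data2/x"));       // false
    }
}
```

The last case is why the comparison uses Path.startsWith() rather than String.startsWith(): a string prefix test would treat /var/data2/x as being inside /var/data.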

Verification

After implementing the recommended secure patterns, verify the fix through multiple approaches:

Manual testing: Submit payloads that embed valid-looking content inside a larger string (for example "admin'; DROP TABLE users--", "shell.jpg.php", or a username with a trailing newline) and confirm each is rejected
Code review: Confirm every validation uses whole-string matching (matches() with anchored patterns), explicit allowlists, and length limits, with no find() calls or unanchored patterns used for validation
Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
Edge case validation: Test with special characters, boundary lengths, mixed case, and empty or null inputs to verify proper handling
Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
Authentication/session testing: Where validated values feed access decisions (usernames, roles), verify the controls cannot be bypassed with case or encoding variations
Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
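
The manual and edge case checks above can be captured as a small table of inputs and expected results. The following is a minimal sketch without a test framework; the class ValidatorCheck, its sample cases, and the username rule it exercises are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical table-driven check (illustrative only).
final class ValidatorCheck {
    private static final Pattern USERNAME = Pattern.compile("^[a-z0-9_]{3,20}$");

    static boolean strict(String s) { return USERNAME.matcher(s).matches(); }
    static boolean loose(String s)  { return USERNAME.matcher(s).find(); }

    // Input -> whether a correct validator should accept it.
    static Map<String, Boolean> sampleCases() {
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("alice_01", true);
        cases.put("ab", false);                          // too short
        cases.put("alice\n", false);                     // trailing newline
        cases.put("admin'; DROP TABLE users--", false);  // embedded payload
        return cases;
    }

    // Returns the cases where the validator's answer differs from the expected one.
    static Map<String, Boolean> mismatches(Predicate<String> validator,
                                           Map<String, Boolean> expected) {
        Map<String, Boolean> failures = new LinkedHashMap<>();
        expected.forEach((input, want) -> {
            if (validator.test(input) != want) {
                failures.put(input, want);
            }
        });
        return failures;
    }
}
```

Running mismatches(ValidatorCheck::strict, sampleCases()) returns an empty map, while the find()-based variant fails the trailing newline case. The same table can be reused as a regression test after each change to a pattern.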