CWE-115: Misinterpretation of Input
Overview
Misinterpretation of input occurs when an application or system component incorrectly interprets the structure, meaning, or format of incoming data due to ambiguous encoding, conflicting parsers, or inconsistent validation logic. This vulnerability arises when multiple layers of software (web server, application framework, backend service) process the same input differently, allowing attackers to craft payloads that bypass security controls in one layer while being maliciously interpreted by another. Common manifestations include HTTP request smuggling, SQL injection via character encoding differences, and path traversal through URL interpretation inconsistencies.
OWASP Classification
A05:2025 - Injection
Risk
Medium to High: Misinterpretation vulnerabilities can enable request smuggling attacks leading to cache poisoning, credential hijacking, and firewall bypass. They can also facilitate injection attacks when validation logic and execution logic interpret input differently, allowing malicious payloads to evade detection while still being executed.
Remediation Steps
Core principle: Ensure all system layers interpret input identically by using consistent parsing logic, explicit encoding declarations, and strict validation that matches actual interpretation semantics.
Standardize Input Parsing
Use consistent parsing across all layers:
- Character encoding: Explicitly declare UTF-8 everywhere, reject ambiguous encodings
- URL parsing: Use same URL parser for validation and routing
- Header interpretation: Handle multi-line headers, folding, duplicate headers consistently
- Content-Type processing: Enforce strict Content-Type validation matching actual parser behavior
Normalize Input Before Validation
Perform canonicalization before security checks:
# VULNERABLE - validation uses different interpretation than execution
if is_safe_path(user_path): # Checks "../../etc/passwd"
content = read_file(normalize(user_path)) # Actually reads "/etc/passwd"
# SECURE - normalize first, then validate
normalized_path = normalize(user_path)
if is_safe_path(normalized_path):
content = read_file(normalized_path)
Prevent HTTP Request Smuggling
Issue: Frontend and backend servers disagree on request boundaries.
Fix:
- Use HTTP/2 (eliminates CL.TE and TE.CL attacks)
- Disable HTTP request pipelining
- Reject requests with both Content-Length and Transfer-Encoding
- Normalize headers before forwarding through proxies
- Use same HTTP parser across infrastructure (nginx, HAProxy, application)
Address Encoding Ambiguities
Character Encoding Attacks:
# VULNERABLE - validation checks UTF-8, execution uses ISO-8859-1
if contains_xss(input_utf8):
reject()
display(input_iso88591) # Displays decoded differently!
# SECURE - force consistent encoding
normalized = input.encode('utf-8', errors='replace').decode('utf-8')
if contains_xss(normalized):
reject()
display(normalized)
Validate After Final Interpretation
Perform security checks on the exact data form that will be processed:
- If database expects UTF-8, validate UTF-8 not raw bytes
- If OS expects filesystem path, validate canonical path not raw input
- If SQL engine interprets Unicode, validate Unicode not ASCII
Use Strict Content-Type Validation
# VULNERABLE
if content_type.startswith('application/json'):
data = json.loads(request.body)
# SECURE
if content_type == 'application/json; charset=utf-8':
data = json.loads(request.body.decode('utf-8'))
else:
reject()
Implement Input Rejection
When in doubt, reject ambiguous input:
- Multiple encodings specified (UTF-8 and ISO-8859-1)
- Both Content-Length and Transfer-Encoding headers
- Conflicting URL path interpretations (/../, /..;/, /%2e%2e/)
- Non-canonical representations (overlong UTF-8, mixed encodings)
Dynamic Scan Guidance
For guidance on remediating this CWE when detected by dynamic (DAST) scanners:
- Dynamic Scan Guidance - Analyzing DAST findings and mapping to source code