CWE-93: CRLF Injection - Python
Overview
CRLF injection in Python applications occurs when untrusted input containing carriage return (\r, URL-encoded as %0D) or line feed (\n, URL-encoded as %0A) characters is placed in HTTP headers or other line-delimited protocol fields without validation or sanitization. Attackers can exploit this for HTTP response splitting, header injection, log injection, cache poisoning, and cross-site scripting (XSS).
Primary Defence: Reject or strip all newline characters (\r, \n, \r\n) from user input before it reaches HTTP headers or logs. Set headers through framework response objects (Flask's make_response(), Django's HttpResponse), which in current releases refuse header values containing newlines, but do not rely on that as the only control. Validate header values against strict allowlists or regex patterns, and use structured logging (for example JSON) so that newlines in logged data are escaped rather than written verbatim.
Common Vulnerable Patterns
Flask Redirect with User Input
# VULNERABLE - Direct user input in redirect location
from flask import Flask, request, redirect

app = Flask(__name__)

@app.route('/redirect')
def vulnerable_redirect():
    url = request.args.get('url', '')
    # VULNERABLE - User input directly in redirect
    return redirect(url)
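# Attack: /redirect?url=http://example.com%0d%0aSet-Cookie:%20admin=true
# On a stack that writes the value into the header verbatim, this results in
# HTTP response splitting:
# HTTP/1.1 302 Found
# Location: http://example.com
# Set-Cookie: admin=true
# Attacker can inject arbitrary headers.
# Note: recent Werkzeug releases percent-encode or reject newlines in header
# values, so this exact payload typically fails there. The handler is still
# an open redirect and should not depend on that check alone.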
Why this is vulnerable:
- No validation or sanitization
- CRLF characters allow header injection
- Response splitting possible
- Can set malicious cookies or headers
Custom Response Headers
# VULNERABLE - User input in custom headers
from flask import Flask, request, Response

app = Flask(__name__)

@app.route('/api/data')
def vulnerable_headers():
    username = request.args.get('username', '')
    response = Response("User data")
    # VULNERABLE - User input in custom header
    response.headers['X-User-Name'] = username
    response.headers['X-Requested-By'] = request.headers.get('User-Agent', '')
    return response
# Attack: ?username=admin%0d%0aContent-Length:%200%0d%0a%0d%0a<script>alert('XSS')</script>
# Injects new headers and content
Why this is vulnerable:
- Custom headers accept unsanitized input
- Response splitting via CRLF
- XSS via injected content
- Cache poisoning
Django HttpResponse Headers
# VULNERABLE - Django with user-controlled headers
from django.http import HttpResponse
from django.views.decorators.http import require_GET

@require_GET
def vulnerable_view(request):
    callback = request.GET.get('callback', '')
    data = '{"status": "success"}'
    response = HttpResponse(data, content_type='application/json')
    # VULNERABLE - User input in JSONP callback header
    response['X-Callback'] = callback
    return response
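# Attack: ?callback=test%0d%0aSet-Cookie:%20sessionid=stolen
# Attempts to inject a Set-Cookie header
Why this is vulnerable:
- Django does not sanitize header values; it raises BadHeaderError when a value contains a newline, which surfaces as a 500 error unless handled
- JSONP callback can contain CRLF
- Session fixation possible wherever the value is written into a header verbatim
- Header injection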
Email Header Injection
# VULNERABLE - Email headers with user input
import smtplib
from email.message import EmailMessage

def send_feedback(name, email, subject, message):
    msg = EmailMessage()
    # VULNERABLE - User input in email headers
    msg['From'] = email
    msg['To'] = 'admin@example.com'
    msg['Subject'] = subject
    msg.set_content(message)
    # VULNERABLE - Name in additional header
    msg['X-Sender-Name'] = name
    smtp = smtplib.SMTP('localhost')
    smtp.send_message(msg)
    smtp.quit()

# Attack: email = "attacker@evil.com\nBcc: victim@example.com"
# Attack: subject = "Feedback\nTo: victim2@example.com"
# Injects additional recipients when the message is built with the legacy
# email.message.Message / MIME* classes or by hand; EmailMessage with its
# default policy raises ValueError on these values instead.
Why this is vulnerable:
- Email headers vulnerable to CRLF
- Can add Bcc, Cc recipients
- Spam relay possible
- Email spoofing
Log Injection
# VULNERABLE - Logging user input without sanitization
import logging

logger = logging.getLogger(__name__)

def process_login(username, password):
    # VULNERABLE - User input in log message
    logger.info(f"Login attempt for user: {username}")
    if authenticate(username, password):
        logger.info(f"Successful login: {username}")
        return True
    else:
        logger.warning(f"Failed login for: {username}")
        return False
# Attack: username = "admin\nINFO:root:Successful login: attacker\nINFO:root:Admin access granted"
# Creates fake log entries
Why this is vulnerable:
- Log injection via newlines
- Can forge log entries
- Audit trail manipulation
- Security monitoring bypass
FastAPI Response Headers
# VULNERABLE - FastAPI with custom headers
from fastapi import FastAPI, Query, Response

app = FastAPI()

@app.get("/download")
async def download_file(filename: str = Query(...)):
    content = "File content"
    # VULNERABLE - User input in Content-Disposition header
    response = Response(content=content, media_type="application/octet-stream")
    response.headers["Content-Disposition"] = f"attachment; filename={filename}"
    return response
# Attack: ?filename=file.txt%0d%0aX-Injected:%20malicious
# Injects additional headers
Why this is vulnerable:
- FastAPI doesn't sanitize header values
- Content-Disposition vulnerable
- File download manipulation
- Header injection
CSV Export with User Data
# VULNERABLE - CSV export with unsanitized data
import csv
from io import StringIO
from flask import Flask, Response, request  # request was missing from the import

app = Flask(__name__)

@app.route('/export')
def export_csv():
    users = [
        {'name': request.args.get('name', 'User'), 'email': 'user@example.com'},
    ]
    # VULNERABLE - User data in CSV without sanitization
    output = StringIO()
    writer = csv.DictWriter(output, fieldnames=['name', 'email'])
    writer.writeheader()
    writer.writerows(users)
    return Response(
        output.getvalue(),
        mimetype='text/csv',
        headers={'Content-Disposition': 'attachment; filename=users.csv'}
    )
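# Attack: ?name=admin%0aadmin2,admin2@evil.com
# Python's csv module quotes fields that contain newlines or delimiters, so a
# conforming CSV parser still sees one row; line-based consumers (grep, a
# naive split on "\n") see an injected row.
# Attack: ?name=%3D1%2B1  (cell value "=1+1")
# Spreadsheet applications may evaluate the cell as a formula
Why this is vulnerable:
- Newlines in a field inject rows for line-based consumers
- Can inject formulas
- Data exfiltration
- Code execution in Excel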
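The formula-injection risk listed above can be reduced by neutralizing cells before they reach the writer. The following is a minimal sketch, assuming the common mitigation of prefixing formula-like cells with a single quote; the helper names neutralize_csv_cell and rows_to_csv are hypothetical and not part of the csv module.

```python
import csv
from io import StringIO

# Characters that spreadsheet applications commonly treat as formula starters.
FORMULA_PREFIXES = ('=', '+', '-', '@', '\t')

def neutralize_csv_cell(value):
    """Hypothetical helper: make a cell safer for spreadsheet consumers."""
    text = str(value)
    # Collapse embedded line breaks so line-based tools see one record per row.
    text = text.replace('\r', ' ').replace('\n', ' ')
    # Prefix formula-like cells with a single quote so they render as text.
    if text.startswith(FORMULA_PREFIXES):
        text = "'" + text
    return text

def rows_to_csv(rows, fieldnames):
    """Write dict rows to a CSV string with every cell neutralized."""
    output = StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow({key: neutralize_csv_cell(row[key]) for key in fieldnames})
    return output.getvalue()
```

Note that this also prefixes legitimate values such as negative numbers ("-5"), so it suits free-text columns better than numeric ones.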
HTTP Proxy Headers
# VULNERABLE - Proxy forwarding with user headers
from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/proxy')
def proxy_request():
    target_url = request.args.get('url', '')
    # VULNERABLE - Forwarding user-controlled headers
    headers = {
        'X-Forwarded-For': request.headers.get('X-Forwarded-For', ''),
        'X-Real-IP': request.headers.get('X-Real-IP', ''),
        'X-Custom': request.headers.get('X-Custom', '')
    }
    response = requests.get(target_url, headers=headers)
    return response.text
# Attack: X-Forwarded-For: 1.2.3.4%0d%0aX-Admin:%20true
# Injects headers to backend
Why this is vulnerable:
- Proxy headers not sanitized
- Backend header injection
- Authentication bypass
- IP spoofing
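One way to close the forwarding hole is to rebuild the outgoing headers from validated values instead of copying them through. A minimal sketch, assuming the proxy only needs to pass on a client IP and one opaque token; the function names and the X-Custom token format are illustrative assumptions, not part of Flask or requests.

```python
import ipaddress
import re

# Assumed format for the illustrative X-Custom header: a short opaque token.
TOKEN_RE = re.compile(r'[A-Za-z0-9._-]{1,64}')

def safe_ip(value):
    """Return a normalized IP string, or None if the value is not a single IP."""
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        return None

def build_forward_headers(incoming):
    """Build outgoing headers from validated values only."""
    outgoing = {}
    ip = safe_ip(incoming.get('X-Real-IP', ''))
    if ip is not None:
        outgoing['X-Real-IP'] = ip
    custom = incoming.get('X-Custom', '')
    # fullmatch() requires the whole value to match, so a trailing CR/LF fails.
    if TOKEN_RE.fullmatch(custom):
        outgoing['X-Custom'] = custom
    return outgoing
```

Anything that fails validation is dropped rather than repaired, so a value such as "1.2.3.4\r\nX-Admin: true" never reaches the backend. The user-controlled target URL in the vulnerable example needs its own allowlist and is not addressed by this sketch.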
Secure Patterns
Flask Redirect with Validation
# SECURE - Flask redirect with CRLF removal and validation
from flask import Flask, request, redirect, abort
import re
from urllib.parse import urlparse

app = Flask(__name__)

def sanitize_url(url):
    """Remove CRLF characters and validate URL"""
    if not url:
        return None
    # Remove literal CRLF characters
    clean_url = url.replace('\r', '').replace('\n', '')
    # Remove URL-encoded CRLF in either case (%0d/%0D, %0a/%0A)
    clean_url = re.sub(r'%0[dDaA]', '', clean_url)
    # Validate URL format
    try:
        parsed = urlparse(clean_url)
    except ValueError:
        return None
    # Only allow http/https schemes (an empty scheme means a relative URL)
    if parsed.scheme not in ['http', 'https', '']:
        return None
    # Optionally: allowlist domains
    # if parsed.netloc not in ['example.com', 'trusted.com']:
    #     return None
    return clean_url

@app.route('/redirect')
def secure_redirect():
    url = request.args.get('url', '')
    # SECURE - Sanitize and validate URL
    clean_url = sanitize_url(url)
    if not clean_url:
        abort(400, "Invalid redirect URL")
    return redirect(clean_url)

if __name__ == '__main__':
    app.run()
Why this works:
This pattern prevents CRLF injection through multiple defensive layers. The sanitize_url() function first removes literal CRLF characters (\r, \n) and their URL-encoded equivalents (%0d, %0a), preventing attackers from injecting header delimiters. By handling both literal and encoded forms (including uppercase variants), the sanitization catches different encoding variations that attackers might use to bypass simple filters. This comprehensive character removal ensures that even if the URL passes validation, it cannot contain the characters needed for response splitting.
The URL parsing and validation using urlparse() provides structural validation beyond just character filtering. By checking that the scheme is either empty (relative URL) or explicitly http/https, the code prevents javascript:, data:, or other exotic schemes that could be used for XSS attacks. The optional domain allowlist (commented out in the example) demonstrates how you can further restrict redirects to trusted destinations, preventing open redirect vulnerabilities where attackers trick users into visiting malicious sites.
Returning None for invalid URLs and checking this result in the route handler implements secure failure handling. The abort(400) call explicitly rejects malicious requests rather than attempting to redirect to a potentially dangerous location. This "fail securely" approach is critical for security functions - if validation detects an attack, the safest response is to reject the request entirely. The combination of character sanitization, structural validation, scheme allowlisting, and secure error handling creates defense-in-depth that protects against CRLF injection, open redirects, and XSS through the redirect parameter.
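Stripping encoded sequences in a single pass can be bypassed when the removal itself produces a new sequence, so rejecting suspicious input outright is often easier to reason about. A minimal sketch of a reject-based check, assuming the value has already been URL-decoded once by the framework; strip_encoded_crlf and is_safe_redirect_target are hypothetical names.

```python
import re
from urllib.parse import urlparse

ENCODED_CRLF = re.compile(r'%0[dDaA]')
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

def strip_encoded_crlf(value):
    """Single-pass removal, shown only to illustrate its weakness."""
    return ENCODED_CRLF.sub('', value)

def is_safe_redirect_target(url):
    """Reject (rather than repair) URLs that carry CR/LF in any form."""
    if not url or CONTROL_CHARS.search(url) or ENCODED_CRLF.search(url):
        return False
    try:
        parsed = urlparse(url)
    except ValueError:
        return False
    return parsed.scheme in ('http', 'https', '')

# Removing the inner "%0d" from "%0%0dd" leaves a fresh "%0d" behind.
print(strip_encoded_crlf('/next%0%0ddSet-Cookie:x'))
print(is_safe_redirect_target('/next%0%0ddSet-Cookie:x'))
```

This check only covers CR/LF and the scheme. A scheme-relative URL such as //evil.example still passes, so a domain allowlist remains necessary to prevent open redirects.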
Custom Headers with Sanitization
# SECURE - Custom headers with CRLF removal
from flask import Flask, request, Response
import re

app = Flask(__name__)

def sanitize_header_value(value):
    """Remove CRLF and other control characters"""
    if not value:
        return ''
    # Remove CRLF characters (including encoded versions)
    clean = value.replace('\r', '').replace('\n', '')
    clean = clean.replace('%0d', '').replace('%0a', '')
    clean = clean.replace('%0D', '').replace('%0A', '')
    # Remove other control characters
    clean = re.sub(r'[\x00-\x1f\x7f]', '', clean)
    # Limit length
    return clean[:200]

def validate_username(username):
    """Validate username format"""
    if not username:
        return False
    return bool(re.match(r'^[a-zA-Z0-9._-]{3,50}$', username))

@app.route('/api/data')
def secure_headers():
    username = request.args.get('username', '')
    # SECURE - Validate input
    if not validate_username(username):
        return Response("Invalid username", status=400)
    response = Response("User data")
    # SECURE - Sanitize header value
    clean_username = sanitize_header_value(username)
    response.headers['X-User-Name'] = clean_username
    return response
Why this works:
This pattern demonstrates comprehensive input sanitization for HTTP headers through both validation and character filtering. The sanitize_header_value() function removes CRLF characters in multiple forms: literal \r and \n, lowercase URL-encoded %0d and %0a, and uppercase URL-encoded %0D and %0A. This multi-encoding approach prevents bypass attempts where attackers use different encoding schemes to evade simple filters. The regex [\x00-\x1f\x7f] removes all ASCII control characters, including not just CRLF but also null bytes, tabs, and escape sequences that could manipulate header parsing or terminal displays.
The username validation using regex (^[a-zA-Z0-9._-]{3,50}$) enforces a strict allowlist of allowed characters before the value is ever used. This validation-first approach means that only alphanumeric characters, dots, underscores, and hyphens are permitted in usernames. By rejecting any input that doesn't match this pattern, you eliminate entire classes of attacks - not just CRLF injection but also SQL injection attempts, XSS payloads, and other malicious input that might be disguised as a username. Returning HTTP 400 for invalid usernames provides clear feedback that the request was malformed.
The 200-character length limit in sanitize_header_value() provides additional defense against denial-of-service attacks where attackers send extremely long header values to consume server resources or trigger buffer-related vulnerabilities. By validating before sanitizing, the code ensures that only well-formed usernames even reach the sanitization function, while sanitization provides an extra layer of protection if validation is somehow bypassed or if other code paths use the sanitizer. This defense-in-depth approach - validation, sanitization, length limits - ensures that even if one control fails, others prevent the attack.
Django with Header Sanitization
# SECURE - Django with proper header handling
from django.http import HttpResponse, HttpResponseBadRequest
from django.views.decorators.http import require_GET
import re

def sanitize_header_value(value):
    """Remove CRLF and control characters"""
    if not value:
        return ''
    # Remove all newline variations
    clean = re.sub(r'[\r\n\x00-\x1f\x7f]', '', value)
    # Remove URL-encoded CRLF
    clean = re.sub(r'%0[dDaA]', '', clean)
    return clean[:200]

def validate_callback(callback):
    """Validate JSONP callback name"""
    return bool(re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', callback))

@require_GET
def secure_view(request):
    callback = request.GET.get('callback', '')
    # SECURE - Validate callback format
    if not validate_callback(callback):
        return HttpResponseBadRequest("Invalid callback name")
    data = '{"status": "success"}'
    response = HttpResponse(data, content_type='application/json')
    # SECURE - Use validated callback (no sanitization needed)
    response['X-Callback'] = callback
    return response
Why this works:
This Django pattern combines strict input validation with sanitization to prevent CRLF injection in custom headers. The validate_callback() function uses a regex that enforces JSONP callback naming conventions: must start with a letter or underscore, followed by any combination of letters, numbers, and underscores. This allowlist approach is extremely secure because it only accepts characters that are valid in JavaScript identifiers, completely eliminating the possibility of CRLF characters or other special characters being present in the callback parameter.
The sanitize_header_value() function provides defense-in-depth by removing CRLF characters even though the validation should prevent them from occurring. The regex [\r\n\x00-\x1f\x7f] removes literal newlines and all control characters, while the second regex %0[dDaA] removes URL-encoded CRLF sequences in both lowercase and uppercase. This double-layer protection is valuable because it protects against scenarios where the sanitizer might be used elsewhere in the codebase, or if validation is accidentally bypassed due to code changes.
Returning HttpResponseBadRequest for invalid callbacks implements proper error handling for security violations. Rather than attempting to sanitize invalid input or using a default value, the code explicitly rejects malicious requests with HTTP 400. This approach makes attack attempts visible in server logs and prevents attackers from discovering what sanitization is applied. Because the callback passed validation using the strict regex, it doesn't need sanitization before being set as a header value - the comment "no sanitization needed" reflects this. However, having the sanitization function available demonstrates good security architecture for other headers that might not have such strict validation.
Email with Header Validation
# SECURE - Email with header sanitization
import smtplib
from email.message import EmailMessage
from email.utils import parseaddr
import re

def sanitize_email_header(value):
    """Remove CRLF from email headers"""
    if not value:
        return ''
    # Remove CRLF and control characters
    return re.sub(r'[\r\n\x00-\x1f\x7f]', '', value)

def validate_email(email):
    """Validate email format"""
    if not email or len(email) > 254:
        return False
    name, addr = parseaddr(email)
    if not addr:
        return False
    # Additional validation
    return bool(re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', addr))

def send_feedback_secure(name, email, subject, message):
    """Send email with sanitized headers"""
    # SECURE - Validate inputs
    if not validate_email(email):
        raise ValueError("Invalid email address")
    if len(subject) > 200 or len(message) > 5000:
        raise ValueError("Content too long")
    # SECURE - Sanitize all header values
    clean_email = sanitize_email_header(email)
    clean_subject = sanitize_email_header(subject)
    clean_name = sanitize_email_header(name)
    msg = EmailMessage()
    msg['From'] = clean_email
    msg['To'] = 'admin@example.com'
    msg['Subject'] = clean_subject
    msg.set_content(message)
    msg['X-Sender-Name'] = clean_name
    smtp = smtplib.SMTP('localhost')
    smtp.send_message(msg)
    smtp.quit()
Why this works:
This email pattern prevents header injection attacks through comprehensive validation and sanitization of all email header fields. The validate_email() function performs structural validation using parseaddr() from Python's email utilities, which properly parses email addresses according to RFC standards. The length check (254 characters maximum per RFC 5321) and regex validation ensure that only properly formatted email addresses are accepted. This prevents attackers from injecting additional recipients through Bcc headers or manipulating the From field with CRLF sequences.
The sanitize_email_header() function removes all control characters, including CRLF, from header values using the regex [\r\n\x00-\x1f\x7f]. Email headers are particularly exposed to injection because the message format uses CRLF to separate header fields and a blank line to separate the headers from the body. An attacker who injects \r\n into a subject line could add Bcc: attacker@evil.com, turning your email system into a spam relay. By removing these characters from all header fields (From, Subject, custom headers like X-Sender-Name), the code prevents this entire class of attacks.
The length validation (200 characters for subject, 5000 for message) limits resource abuse and the scope of any potential injection. The pattern validates before sanitizing, ensuring that only valid emails are processed, then sanitizes as defense-in-depth. Using Python's EmailMessage class is safer than manually constructing messages, as the class handles header encoding and formatting according to the email RFCs. With its default policy, EmailMessage raises ValueError when a header value contains an embedded line break rather than cleaning it, and the legacy Message and MIME* classes (compat32 policy) do not perform that check when the header is assigned, so explicit sanitization keeps the behavior predictable across both. This pattern demonstrates that even when using high-level libraries, you must still validate and sanitize user input before placing it in protocol-sensitive contexts like email headers.
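How the standard library reacts to a header value with an embedded line break depends on which message class is used. A small sketch that checks this directly, assuming CPython's standard email package with its default policies; try_set_subject is a hypothetical helper used only for the demonstration.

```python
from email.message import EmailMessage, Message

def try_set_subject(msg, value):
    """Return True if the header was stored, False if the class refused it."""
    try:
        msg['Subject'] = value
    except ValueError:
        return False
    return True

payload = "Feedback\r\nBcc: victim@example.com"

# EmailMessage (default policy) refuses values with embedded line breaks.
modern_accepted = try_set_subject(EmailMessage(), payload)

# The legacy Message class (compat32 policy) stores the raw value as given.
legacy = Message()
legacy_accepted = try_set_subject(legacy, payload)

print(modern_accepted, legacy_accepted)
```

A refusal still surfaces as an exception in the request path, so validating and sanitizing before assignment gives cleaner error handling than relying on the ValueError.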
Secure Logging
# SECURE - Logging with sanitization
import logging
import re

logger = logging.getLogger(__name__)

def sanitize_log_input(value):
    """Remove CRLF and control characters for logging"""
    if not value:
        return ''
    # Replace newlines and control characters with spaces
    clean = re.sub(r'[\r\n\x00-\x1f\x7f]', ' ', value)
    # Limit length
    return clean[:200]

def validate_username(username):
    """Validate username format"""
    return bool(re.match(r'^[a-zA-Z0-9._-]{3,50}$', username))

def process_login_secure(username, password):
    """Process login with secure logging"""
    # SECURE - Validate username
    if not validate_username(username):
        logger.warning("Invalid username format in login attempt")
        return False
    # SECURE - Sanitize for logging
    clean_username = sanitize_log_input(username)
    logger.info(f"Login attempt for user: {clean_username}")
    if authenticate(username, password):
        logger.info(f"Successful login: {clean_username}")
        return True
    else:
        logger.warning(f"Failed login for: {clean_username}")
        return False

def authenticate(username, password):
    # Authentication logic
    return True
Why this works:
This logging pattern prevents log injection attacks by sanitizing user input before it's written to log files. The sanitize_log_input() function uses regex to replace all newlines and control characters with spaces, preventing attackers from creating fake log entries. Without this protection, an attacker could provide a username like "admin\nINFO: User hacker performed GRANT ADMIN", which would create a completely fabricated log entry that appears legitimate in log analysis tools, SIEMs, and audit reviews. This could enable attackers to hide their activities or frame other users.
The regex [\r\n\x00-\x1f\x7f] matches not just CRLF but all ASCII control characters. This comprehensive approach prevents attacks using other control characters like tabs or escape sequences that might be used to manipulate log file display, inject terminal escape codes, or interfere with log parsing. Replacing these characters with spaces rather than removing them entirely preserves the readability of log entries while neutralizing the attack - users can still see what input was provided, but it can't break the log structure.
The username validation using regex (^[a-zA-Z0-9._-]{3,50}$) provides a first line of defense by rejecting usernames that don't match expected format. This validation-first approach means most attacks are caught before reaching the sanitization function. The 200-character length limit in sanitize_log_input() prevents excessively long inputs that could fill disk space or trigger buffer issues. By combining validation (rejecting invalid usernames entirely), sanitization (cleaning what passes validation), and length limits, this pattern creates multiple layers of protection. Using f-strings with sanitized values maintains clean, readable code while ensuring all logged user input is safe.
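Structured logging, named in the overview as a defence, sidesteps the problem by escaping newlines rather than filtering them. A minimal sketch using only the standard library, assuming one JSON object per line is an acceptable log format; JsonLineFormatter and the crlf_demo logger name are hypothetical.

```python
import json
import logging
from io import StringIO

class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per log record."""

    def format(self, record):
        payload = {
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
        }
        # json.dumps escapes CR and LF, so each record stays on one line.
        return json.dumps(payload)

stream = StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonLineFormatter())

demo_logger = logging.getLogger('crlf_demo')
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False

# Pass user input as an argument; the forged "entry" stays inside one record.
demo_logger.info("Failed login for: %s", "admin\nINFO:root:Successful login: attacker")
print(stream.getvalue())
```

Because the original input survives inside the JSON string, analysts can still see exactly what was submitted, while line-oriented tools see a single record.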
FastAPI with Pydantic Validation
# SECURE - FastAPI with Pydantic validation
# Assumes a FastAPI release with query parameter models (0.115 or later);
# without Query(), a Pydantic model parameter is read from the request body.
from typing import Annotated
from fastapi import FastAPI, Query, Response
from pydantic import BaseModel, validator  # Pydantic v2 prefers field_validator
import re

app = FastAPI()

class DownloadRequest(BaseModel):
    filename: str

    @validator('filename')
    def validate_filename(cls, v):
        # Remove CRLF and other control characters
        clean = re.sub(r'[\r\n\x00-\x1f\x7f]', '', v)
        # Validate format
        if not re.match(r'^[a-zA-Z0-9._-]+\.[a-zA-Z0-9]+$', clean):
            raise ValueError('Invalid filename format')
        if len(clean) > 100:
            raise ValueError('Filename too long')
        return clean

@app.get("/download")
async def download_file(req: Annotated[DownloadRequest, Query()]):
    content = "File content"
    # SECURE - Use validated filename (quoted in the header value)
    response = Response(content=content, media_type="application/octet-stream")
    response.headers["Content-Disposition"] = f'attachment; filename="{req.filename}"'
    return response
Why this works:
This FastAPI pattern leverages Pydantic's validation framework to prevent CRLF injection at the data model level, ensuring that invalid input is rejected before it reaches any application logic. The @validator decorator on the filename field executes automatically whenever a DownloadRequest is created, providing centralized validation that can't be accidentally bypassed. The regex re.sub(r'[\r\n\x00-\x1f\x7f]', '', v) removes all control characters including CRLF, ensuring the filename is clean before further validation.
The filename format validation using ^[a-zA-Z0-9._-]+\.[a-zA-Z0-9]+$ enforces a strict allowlist: alphanumeric characters, dots, hyphens, and underscores, ending in a dot followed by an alphanumeric extension. This pattern prevents not only CRLF injection but also path traversal attacks, because / and \ are not in the allowed set. Note that it still accepts names with several dots or a leading dot (for example .hidden.txt); anchor the first character to [a-zA-Z0-9] if hidden files must be excluded. If the filename doesn't match this pattern after CRLF removal, the validator raises a ValueError, which Pydantic wraps in a validation error and FastAPI converts to an HTTP 422 Unprocessable Entity response with detailed error information.
The 100-character length limit prevents denial-of-service through excessively long filenames that could cause filesystem issues or consume excessive memory. Because the validation happens in the Pydantic model, it's automatically applied to all code paths that use DownloadRequest - you can't accidentally forget to validate the filename in some route handler. Once the request object is created, you can trust that req.filename has been validated and sanitized, allowing you to use it confidently in the Content-Disposition header. This declarative validation approach is superior to imperative validation scattered throughout route handlers because it's centralized, automatically applied, type-safe, and generates consistent error responses.
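When filenames may legitimately contain characters outside the strict allowlist, percent-encoding them is an alternative to rejecting them. A sketch under that assumption, using the filename* parameter form described in RFC 6266 together with urllib.parse.quote; build_content_disposition is a hypothetical helper and not a FastAPI API.

```python
from urllib.parse import quote

def build_content_disposition(filename):
    """Hypothetical helper: header value with a percent-encoded filename."""
    # quote() with safe='' encodes everything except unreserved characters,
    # so CR and LF become %0D and %0A instead of reaching the header raw.
    encoded = quote(filename, safe='')
    return f"attachment; filename*=UTF-8''{encoded}"

print(build_content_disposition("report 2024.txt"))
print(build_content_disposition("file.txt\r\nX-Injected: malicious"))
```

The encoded value cannot terminate the header line, but the decoded name is still attacker-chosen, so the receiving side should treat it as untrusted when writing to disk.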
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
- Manual testing: Submit payloads containing literal and URL-encoded CR/LF (\r\n, %0d%0a, %0D%0A) in every parameter that reaches a header or a log line, and confirm they are rejected or neutralized
- Code review: Confirm all instances use the secure pattern (validated input, framework header APIs, sanitized log values) with no unvalidated user input concatenated into headers or log messages
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
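The manual and edge-case checks above can be partly automated. A minimal sketch of a payload-driven check, assuming a header sanitizer shaped like the sanitize_header_value() functions in this guide; the local copy, the payload list, and find_unsafe_outputs are illustrative.

```python
import re

def sanitize_header_value(value):
    """Local copy of the guide's sanitizer, repeated so the check is self-contained."""
    if not value:
        return ''
    clean = re.sub(r'[\x00-\x1f\x7f]', '', value)
    clean = re.sub(r'%0[dDaA]', '', clean)
    return clean[:200]

# Illustrative payloads; extend with whatever your scanner reported.
PAYLOADS = [
    "admin\r\nSet-Cookie: admin=true",
    "admin\nX-Injected: 1",
    "admin%0d%0aSet-Cookie:%20admin=true",
    "admin%0D%0ASet-Cookie:%20admin=true",
    "admin\x00\x1b[31m",
]

def find_unsafe_outputs(sanitizer, payloads):
    """Return the payloads whose sanitized form still contains CR/LF markers."""
    marker = re.compile(r'[\r\n]|%0[dDaA]')
    return [p for p in payloads if marker.search(sanitizer(p))]

print(find_unsafe_outputs(sanitize_header_value, PAYLOADS))
```

An empty list means every payload was neutralized. Running the same function against an identity sanitizer is a quick way to confirm the check itself detects unsafe output.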