CWE-80: Cross-Site Scripting (XSS) - Python
Overview
XSS occurs when untrusted data is included in web output without proper encoding. Python web frameworks like Django and Flask provide built-in protection, but you must use them correctly.
Primary Defence: Use framework auto-escaping (Django templates, Flask/Jinja2), or explicit encoding with html.escape() for manual HTML construction.
Common Vulnerable Patterns
Django mark_safe() Misuse
# VULNERABLE - Marking user input as safe
from django.shortcuts import render
from django.utils.safestring import mark_safe

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    safe_bio = mark_safe(user_bio)  # DANGEROUS! Turns off escaping for attacker-controlled text
    return render(request, 'profile.html', {'bio': safe_bio})
Flask Without Auto-Escaping
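# VULNERABLE - Bypassing escaping with Markup
from flask import Flask, request
from markupsafe import Markup  # flask.Markup was removed in Flask 3.0

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    # DANGEROUS! Raw user text is returned as HTML, and Markup() declares it already safe
    return f'<div>{Markup(comment)}</div>'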
Manual HTML Construction
# VULNERABLE - String concatenation
from flask import Flask, request

app = Flask(__name__)

@app.route('/greeting')
def greet():
    name = request.args.get('name', 'Guest')
    html = '<h1>Hello, ' + name + '</h1>'
    return html  # No escaping!
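A stdlib-only sketch of the difference, with the Flask routing stripped away so the string handling can be checked directly. greet_unsafe reproduces the concatenation above and greet_safe is the same function with html.escape() applied first; both names are illustrative.

```python
import html

def greet_unsafe(name: str) -> str:
    # Mirrors the vulnerable route: attacker-controlled text becomes markup
    return '<h1>Hello, ' + name + '</h1>'

def greet_safe(name: str) -> str:
    # Escape <, >, &, " and ' before the value touches the HTML string
    return '<h1>Hello, ' + html.escape(name) + '</h1>'

payload = '<img src=x onerror=alert(1)>'
print(greet_unsafe(payload))  # <h1>Hello, <img src=x onerror=alert(1)></h1>
print(greet_safe(payload))    # <h1>Hello, &lt;img src=x onerror=alert(1)&gt;</h1>
```

In the escaped version the browser sees text that looks like a tag, not a tag, so the onerror handler never exists.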
JavaScript Context Without Escaping
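# VULNERABLE - Injecting into JavaScript
from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#   var searchTerm = '{{ query }}';
#   // HTML escaping is not JavaScript escaping: backslashes and line breaks pass
#   // through, and with |safe or autoescape off a ' ends the string
# </script>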
Secure Patterns
Django Auto-Escaping (Default)
# SECURE - Django templates auto-escape by default
from django.shortcuts import render

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    # Django automatically HTML-escapes user_bio in the template
    return render(request, 'profile.html', {'bio': user_bio})

# Template (profile.html):
# <div class="bio">
#   {{ bio }}  <!-- Automatically escaped -->
# </div>
Why this works: Django's template engine automatically HTML-encodes every variable rendered with the {{ }} syntax. When you render {{ bio }}, Django converts the characters <, >, &, " and ' into the entities &lt;, &gt;, &amp;, &quot; and &#x27; before the response is sent. This happens during template rendering, after your view passes data to the template but before the HTTP response is built. Auto-escaping is controlled by the autoescape option of the DjangoTemplates backend, which defaults to True, and it applies to every template unless explicitly disabled. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly use the |safe filter, {% autoescape off %}, or mark_safe() to bypass escaping, which makes dangerous usages easy to spot in code review. The protection covers HTML element content and quoted attribute values. It does not cover unquoted attributes, JavaScript or CSS contexts, or dangerous URL schemes such as javascript: in an href; those need the context-specific handling described below.
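One limit of HTML auto-escaping is easy to check directly: it changes characters, not URL schemes. Since Django 3.0, django.utils.html.escape() delegates to the standard library's html.escape(), so this stdlib-only sketch (render_link is an illustrative name) reproduces what a template would emit for a user-supplied link:

```python
import html

def render_link(url: str) -> str:
    # What an auto-escaping template produces for <a href="{{ url }}">profile</a>
    return '<a href="{}">profile</a>'.format(html.escape(url))

print(render_link('javascript:alert(document.cookie)'))
# <a href="javascript:alert(document.cookie)">profile</a>   unchanged, runs on click
print(render_link('" onmouseover="alert(1)'))
# <a href="&quot; onmouseover=&quot;alert(1)">profile</a>   quoted attribute holds
```

The first link needs a scheme allowlist on top of escaping; the second shows why attribute values must always be quoted.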
Flask/Jinja2 Auto-Escaping
# SECURE - Flask enables auto-escaping for .html templates
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return render_template('comment.html', comment=comment)

# Template (comment.html):
# <div class="comment">
#   {{ comment }}  <!-- Jinja2 auto-escapes -->
# </div>
Why this works: Flask renders templates with Jinja2 and turns on auto-escaping for templates whose names end in .html, .htm, .xml or .xhtml (Flask 2.2 added .svg), as well as for templates rendered with render_template_string(). When you use {{ comment }} in such a template, Jinja2 HTML-encodes the value at render time, so even if the variable contains <script>alert(1)</script> it is displayed as plain text instead of executing. This escaping is not context-aware: Jinja2 applies the same HTML encoding everywhere, so values placed inside <script> blocks or URL attributes still need the context-specific handling described below. Templates with other extensions, such as .txt, are not escaped, because escaping would be inappropriate for plain text. Developers must explicitly use the |safe filter or Markup() to bypass escaping, making it obvious when raw HTML is being rendered. This matches Django's secure-by-default design.
Explicit Escaping with MarkupSafe
# SECURE - Explicit HTML escaping
from markupsafe import escape

def build_html(user_input):
    escaped = escape(user_input)
    return f'<div>{escaped}</div>'

# Example:
# user_input = '<script>alert("xss")</script>'
# result     = '<div>&lt;script&gt;alert(&#34;xss&#34;)&lt;/script&gt;</div>'
Why this works: MarkupSafe is the library that powers Jinja2's auto-escaping, and its escape() function exposes the same encoding for use in Python code. Calling escape(user_input) converts the HTML special characters (<, >, &, ", ') to entities, preventing browsers from interpreting the content as markup. It returns a Markup object, which Jinja2 recognizes as already safe, so the value is not escaped a second time if it is later passed to a template. Note that the f-string in build_html() produces a plain str again; wrap the result in Markup() only if you intend to pass it on as trusted HTML. Explicit escaping is useful when you build HTML strings in Python rather than in templates, for example when returning a response directly from a Flask view or via make_response(). Using MarkupSafe keeps manual escaping consistent with template auto-escaping, because it is the same encoding logic that Flask and Jinja2 rely on. (Django does not use MarkupSafe; it provides its own django.utils.html.escape().)
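The no-double-escaping guarantee can be modeled with a str subclass that marks text as already safe. This is a toy stand-in for MarkupSafe's Markup type, written with the standard library only; SafeText and escape_once are illustrative names, not MarkupSafe's API.

```python
import html

class SafeText(str):
    """Marker type: the contents are already valid, escaped HTML."""

def escape_once(value) -> SafeText:
    # Already-safe values pass through untouched, so escaping is idempotent
    if isinstance(value, SafeText):
        return value
    return SafeText(html.escape(str(value)))

first = escape_once('<b>Tom & Jerry</b>')
second = escape_once(first)  # no &amp;lt; double-escaping
print(second)  # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
```

Real MarkupSafe goes further: Markup also overrides operations such as + and format() so that plain strings combined with a Markup value are escaped automatically.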
Context-Specific Encoding
HTML Context
# SECURE - HTML body content
from django.http import HttpResponse
from django.utils.html import escape

def display_message(request):
    msg = request.GET.get('msg', '')
    safe_msg = escape(msg)
    return HttpResponse(f'<p>{safe_msg}</p>')
Why this works: Django's escape() function provides HTML entity encoding for Python code that builds output without templates. It converts <, >, &, " and ' to entities, so user input is displayed as text rather than interpreted as markup. This is essential when building HTTP responses directly with HttpResponse() instead of using the template system. It applies the same encoding as template auto-escaping, with the same limits: it is correct for element content and quoted attribute values, but not for JavaScript or URL contexts.
JavaScript Context
# SECURE - JavaScript string context
from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    # Pass the raw value and escape it once, in the template, for the JS context.
    # Calling escapejs() here as well would escape it twice and corrupt the data.
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#   var searchTerm = '{{ query|escapejs }}';
#   console.log(searchTerm);
# </script>
Why this works: JavaScript contexts need different encoding than HTML. Inside a <script> element the browser does not decode HTML entities, so HTML-escaped data arrives corrupted, and HTML escaping leaves JavaScript metacharacters such as backslashes and line breaks untouched. Django's escapejs filter instead replaces quotes, backslashes, <, >, &, control characters and a few other characters with \uXXXX escapes, so the value cannot close the string literal or the surrounding </script> element, while the JavaScript parser decodes the escapes back to the original text. It is meant for values inside a quoted JavaScript string; Django's documentation notes it does not make a value safe for template literals (backticks) or for HTML. For passing structured data from Django to JavaScript, the json_script template filter is the more robust option. Always use JavaScript-specific encoding when embedding data in <script> blocks or event handlers.
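A stdlib sketch of the \uXXXX technique described above. The translation table is modeled on the characters Django's escapejs() handles, but js_string_escape is an illustrative name and this is not Django's code:

```python
import json

# \uXXXX replacements modeled on escapejs(): quotes, backslash, < > & = - ;
# backtick, U+2028/U+2029, and every ASCII control character
_JS_ESCAPES = {ord(c): '\\u%04X' % ord(c) for c in '\\\'"<>&=-;`\u2028\u2029'}
_JS_ESCAPES.update({code: '\\u%04X' % code for code in range(32)})

def js_string_escape(value) -> str:
    """Escape value for use inside a quoted JavaScript string literal."""
    return str(value).translate(_JS_ESCAPES)

payload = "'; alert(1); //</script>"
escaped = js_string_escape(payload)
print(escaped)  # \u0027\u003B alert(1)\u003B //\u003C/script\u003E
# A JavaScript (or JSON) parser turns the escapes back into the original text
print(json.loads('"' + escaped + '"') == payload)  # True
```

Because the replacements are standard \uXXXX sequences, the data survives intact in the browser while never being able to terminate the literal or the script element.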
URL Context
# SECURE - URL encoding
from urllib.parse import quote

def build_search_url(query):
    # safe='' also encodes '/', which matters when the value is used as a path segment
    encoded_query = quote(query, safe='')
    return f'/search?q={encoded_query}'

# Django template filter:
# <a href="/search?q={{ query|urlencode }}">Search</a>
Why this works: URL encoding (percent-encoding) is necessary when embedding user data in URLs, to prevent injection of additional query parameters or other URL manipulation. Python's quote() function from urllib.parse percent-encodes URL metacharacters such as &, =, ?, # and spaces as %XX, where XX is the hexadecimal byte value, so user input is treated as data rather than URL syntax. For example, &admin=true becomes %26admin%3Dtrue and cannot be interpreted as an additional query parameter. By default quote() leaves / unencoded; pass safe='' when the value is a single path segment. Django's urlencode template filter provides the same functionality in templates. Percent-encoding only protects a value placed inside a URL you construct: if the user controls the entire URL (for example, a profile link rendered in an href), also check its scheme and reject javascript: and data: URLs, because neither HTML escaping nor percent-encoding changes the scheme.
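A stdlib sketch covering both URL cases: encoding a value into a query string, and checking the scheme of a URL the user controls entirely. The helper names and the ALLOWED_SCHEMES set are illustrative assumptions, not a framework API.

```python
from urllib.parse import quote, urlencode, urlsplit

def build_search_url(query: str) -> str:
    # safe='' also encodes '/', which quote() leaves alone by default
    return '/search?q=' + quote(query, safe='')

def build_search_url_params(params: dict) -> str:
    # urlencode() encodes several parameters at once; spaces become '+'
    return '/search?' + urlencode(params)

ALLOWED_SCHEMES = {'http', 'https', 'mailto', ''}  # '' means a relative URL
_C0_AND_SPACE = ''.join(chr(i) for i in range(0x21))

def is_safe_href(url: str) -> bool:
    # Browsers drop tabs/newlines anywhere in a URL and strip leading/trailing
    # control characters and spaces, so normalize the same way before checking
    cleaned = ''.join(ch for ch in url if ch not in '\t\r\n').strip(_C0_AND_SPACE)
    try:
        scheme = urlsplit(cleaned).scheme
    except ValueError:  # malformed URL, e.g. an unbalanced IPv6 bracket
        return False
    return scheme.lower() in ALLOWED_SCHEMES

print(build_search_url('cats&admin=true'))        # /search?q=cats%26admin%3Dtrue
print(build_search_url_params({'q': 'a b&c'}))    # /search?q=a+b%26c
print(is_safe_href('java\tscript:alert(1)'))      # False
print(is_safe_href('https://example.com/users'))  # True
```

A URL that passes is_safe_href() must still be HTML-escaped when it is written into the attribute.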
JSON Responses
# SECURE - JSON responses are serialized and served as application/json

# Django (views.py); User is your own model
from django.http import JsonResponse

def api_user(request, user_id):
    user = User.objects.get(id=user_id)
    return JsonResponse({
        'name': user.name,  # Serialized as a JSON string
        'bio': user.bio,
    })

# Flask (app.py); get_user() is your own data-access function
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/user/<int:user_id>')
def api_user(user_id):
    user = get_user(user_id)
    return jsonify({
        'name': user.name,
        'bio': user.bio,
    })
Why this works: Both JsonResponse() in Django and jsonify() in Flask serialize Python objects to JSON and set the Content-Type: application/json header. JSON serialization escapes quotes, backslashes and control characters according to the JSON specification, so the output is always a valid JSON document. The critical security feature is the content type header: it tells browsers to treat the response as data, not HTML. Standard JSON encoding does not escape < or >, so a bio containing <script>alert(1)</script> appears verbatim inside a JSON string literal; it is the application/json content type (ideally reinforced with X-Content-Type-Options: nosniff) that stops the browser from rendering it as HTML. This makes JSON APIs safe from reflected XSS without manual HTML encoding, as long as the correct content type is set and the client-side code that later inserts the data into a page encodes it for that context.
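Framework aside, the two ingredients are JSON serialization and the response headers. A minimal WSGI sketch using only the standard library, where json_app, fake_start_response and the hard-coded record are illustrative; it also sends X-Content-Type-Options: nosniff so browsers do not second-guess the declared type:

```python
import json
from wsgiref.util import setup_testing_defaults

def json_app(environ, start_response):
    # Illustrative record containing attacker-controlled text
    body = json.dumps({'bio': '<script>alert(1)</script>'}).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'application/json'),
        ('X-Content-Type-Options', 'nosniff'),  # don't let browsers sniff it as HTML
        ('Content-Length', str(len(body))),
    ])
    return [body]

# Drive the app directly, the way a WSGI server would
captured = {}

def fake_start_response(status, headers):
    captured['status'] = status
    captured['headers'] = dict(headers)

environ = {}
setup_testing_defaults(environ)
body = b''.join(json_app(environ, fake_start_response))
print(captured['headers']['Content-Type'])  # application/json
print(body)  # b'{"bio": "<script>alert(1)</script>"}'
```

The printed body still contains <script> verbatim: JSON encoding alone does not neutralize it, which is why the headers are the part that keeps browsers from rendering it.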
html.escape() - Standard Library HTML Encoding
# SECURE - Built-in HTML entity encoding (Python 3.2+)
import html

user_input = request.args.get('name', '')  # e.g. Flask's request object
safe_output = html.escape(user_input)
print(f"<h1>Hello, {safe_output}</h1>")

# quote=True is the default, so " and ' are escaped too; passing it
# explicitly documents that the value goes into an attribute
safe_attr = html.escape(user_input, quote=True)
print(f'<input value="{safe_attr}">')

# Encodes: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &#x27;
Why this works: html.escape() is Python's built-in function for HTML encoding, available in the standard library since Python 3.2. It converts HTML special characters into their entity equivalents, preventing browsers from interpreting user input as markup. Its quote parameter defaults to True, which encodes both double and single quotes; this is essential for values placed in HTML attributes, so never pass quote=False for attribute output. Because it is part of the standard library, no external dependencies are needed, which makes it a natural first choice for HTML encoding in plain Python code. It is simple, fast, and maintained as part of Python itself.
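The difference the quote flag makes, shown on an attribute-breakout payload (the payload is illustrative; quote=True is the default, so it only needs to be spelled out to document intent):

```python
import html

payload = '" autofocus onfocus="alert(1)'

with_quotes = html.escape(payload)                  # quote=True by default
without_quotes = html.escape(payload, quote=False)  # only &, < and > are escaped

print(f'<input value="{with_quotes}">')
# <input value="&quot; autofocus onfocus=&quot;alert(1)">   still one attribute
print(f'<input value="{without_quotes}">')
# <input value="" autofocus onfocus="alert(1)">            new attributes injected
```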
markupsafe.escape() / flask.escape() - Framework Escaping
# SECURE - MarkupSafe library (used by Flask/Jinja2)
from markupsafe import escape

user_comment = request.form.get('comment', '')  # e.g. Flask's request object
safe_comment = escape(user_comment)
html_output = f"<p>{safe_comment}</p>"

# Older Flask versions re-exported this function as flask.escape; it was
# deprecated in Flask 2.3 and removed in 3.0, so import it from markupsafe.
Why this works: markupsafe.escape() is the escaping engine underlying Flask and Jinja2 templating. It provides essentially the same HTML encoding as html.escape() but returns a Markup object that records that the content is already safe, which prevents double-escaping when escaped strings are later combined with template rendering. The flask.escape name was only a re-export of markupsafe.escape() and no longer exists in current Flask, so new code should import from markupsafe directly. Every Flask and Jinja2 application depends on this library, and its Markup type enables safe composition of HTML fragments in template contexts.
Template Auto-Escaping (Django/Flask/Jinja2)
# SECURE - Django templates auto-escape by default
from django.shortcuts import render

def view(request):
    return render(request, 'template.html', {
        'user_input': request.GET.get('data'),  # Auto-escaped in template
    })

# Template: {{ user_input }}  <!-- Automatically HTML-escaped -->

# SECURE - Flask/Jinja2 auto-escapes .html templates
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/page')
def page():
    return render_template('page.html',
                           data=request.args.get('input'))  # Auto-escaped
Why this works: Both Django and Flask (via Jinja2) use auto-escaping templates that treat all variable output as plain text by default. When you render {{ user_input }} in a template, the framework HTML-encodes it before sending it to the browser. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly opt out of escaping (|safe or {% autoescape off %} in Django, |safe or {% autoescape false %} in Jinja2) to render raw HTML. The template engine applies the same HTML encoding to every variable interpolation; it does not detect whether the variable sits inside a <script> block, a style attribute or a URL, so those contexts still need the dedicated encoders covered in this section. This separation of logic and presentation, combined with automatic encoding, is why modern Python web frameworks are resistant to XSS when used correctly.
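The secure-by-default contract can be modeled in a few lines. This toy renderer is not how Django or Jinja2 are implemented; it uses string.Formatter from the standard library and an illustrative Trusted marker class to show the rule that every interpolated value is escaped unless the developer opts out explicitly:

```python
import html
import string

class Trusted(str):
    """Explicit opt-out marker, playing the role of mark_safe() or Markup()."""

class AutoEscapeFormatter(string.Formatter):
    def format_field(self, value, format_spec):
        text = super().format_field(value, format_spec)
        # Escape by default; only values wrapped in Trusted skip escaping
        return text if isinstance(value, Trusted) else html.escape(text)

def render(template: str, **context) -> str:
    return AutoEscapeFormatter().format(template, **context)

print(render('<p>{comment}</p>', comment='<script>alert(1)</script>'))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
print(render('<p>{comment}</p>', comment=Trusted('<em>reviewed</em>')))
# <p><em>reviewed</em></p>
```

The opt-out is a visible wrapper at the call site, which is what makes |safe and mark_safe() easy to find in review.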
json.dumps() - JSON/JavaScript Context Encoding
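# SECURE - Safely embed data in JavaScript
import json

user_data = request.args.get('search', '')  # e.g. Flask's request object
# json.dumps() escapes quotes, backslashes and control characters, but not
# <, > or &; replace those too so the value cannot close the <script> element
safe_json = (json.dumps(user_data)
             .replace('<', '\\u003c')
             .replace('>', '\\u003e')
             .replace('&', '\\u0026'))
html = f"""
<script>
  var searchTerm = {safe_json};  // JSON string literal with no raw < > &
  console.log(searchTerm);
</script>
"""

# Never concatenate into JavaScript strings:
#   var x = '{user_data}';  // Can break out with '
#   var x = {safe_json};    // Safe: JSON-encoded, with < > & escaped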
Why this works: JavaScript contexts need different encoding than HTML. json.dumps() produces a valid JSON (and therefore JavaScript) literal: quotes become \", backslashes become \\, and control characters are escaped, so the value cannot break out of the string it is placed in. HTML escaping would be the wrong tool here, because the browser does not decode entities inside a <script> element. However, json.dumps() does not escape <, > or &, so a value containing </script> would still end the script element early; those characters must additionally be replaced with \u escapes before embedding, which is what Jinja2's |tojson filter and Django's json_script filter do. With that extra step, the approach works for strings, numbers, lists and dicts, providing type-preserving data transfer to JavaScript.
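An alternative that keeps data out of executable code entirely is a JSON "data island": the value is written into a non-executing <script type="application/json"> element and read back with JSON.parse(). Django ships this pattern as the json_script filter; the stdlib sketch below mirrors the idea, with json_data_island and the element id as illustrative names.

```python
import html
import json

# Modeled on the escapes Django's json_script applies: no raw <, > or &
_SCRIPT_ESCAPES = {ord('<'): '\\u003C', ord('>'): '\\u003E', ord('&'): '\\u0026'}

def json_data_island(value, element_id: str) -> str:
    payload = json.dumps(value).translate(_SCRIPT_ESCAPES)
    return '<script id="{}" type="application/json">{}</script>'.format(
        html.escape(element_id), payload)

data = {'search': '</script><script>alert(1)</script>'}
print(json_data_island(data, 'search-data'))
# <script id="search-data" type="application/json">{"search": "\u003C/script\u003E\u003Cscript\u003Ealert(1)\u003C/script\u003E"}</script>
```

In the page, read it with JSON.parse(document.getElementById('search-data').textContent); the element is data rather than code, so nothing in it is executed.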
flask.jsonify() - JSON API Responses
# SECURE - Flask JSON responses (sets proper Content-Type)
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/api/user')
def get_user():
    user_input = request.args.get('query')
    return jsonify({
        'query': user_input,  # Automatically JSON-encoded
        'results': get_results(user_input),  # get_results() is your own search function
    })
# Content-Type: application/json (prevents HTML interpretation)

# jsonify() also accepts a single value
@app.route('/api/echo')
def echo():
    return jsonify(request.args.get('text'))
Why this works: Flask's jsonify() function does two critical things for security: (1) it serializes Python objects to JSON using proper encoding (via json.dumps()), and (2) it sets the Content-Type: application/json header. The JSON encoding ensures special characters are properly escaped. The content type header prevents browsers from interpreting the response as HTML, blocking content-type confusion attacks where an attacker tricks the browser into rendering JSON as an HTML page. This combination makes REST APIs secure by default - even if user input contains <script> tags, they're JSON-encoded and the browser never parses them as HTML because of the content type.
bleach.clean() - Rich HTML Sanitization (When HTML Input Needed)
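# SECURE - Allow safe HTML tags while removing XSS
import bleach

user_html = request.form.get('content', '')  # e.g. Flask's request object
clean_html = bleach.clean(
    user_html,
    tags=['p', 'b', 'i', 'a', 'ul', 'ol', 'li', 'strong', 'em'],
    attributes={'a': ['href', 'title']},
    protocols=['http', 'https', 'mailto'],
)
# Disallowed tags such as <script> are escaped (removed with strip=True);
# attributes like onclick= and javascript: URLs are dropped

# For Markdown-to-HTML, run bleach after conversion
# (user_markdown is the Markdown text submitted by the user):
import markdown

html_content = markdown.markdown(user_markdown)
# bleach.ALLOWED_TAGS is a small default set without <p> or headings;
# extend it to match the tags your Markdown output needs
safe_content = bleach.clean(html_content, tags=bleach.ALLOWED_TAGS)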
Why this works: When you need to allow rich HTML content (for example from a WYSIWYG editor or a Markdown converter), simple HTML encoding would destroy the formatting. Bleach provides an allowlist-based HTML sanitizer: it parses the user's HTML with an HTML5 parser, neutralizes elements that are not on the list (such as <script> and <iframe>), drops attributes that are not on the list (such as onclick and onerror), and serializes clean HTML containing only approved tags. The protocols parameter rejects javascript: URLs in links. The allowlist approach is more secure than denylisting because it blocks unknown attack vectors by default: a tag or attribute that was never allowed stays blocked even if a new attack using it is discovered. Note that Bleach was deprecated in January 2023 and now receives only security fixes; for new projects, nh3 (Python bindings for the Rust ammonia sanitizer) is a commonly used, maintained alternative.
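To make the allowlist idea concrete, here is a deliberately small teaching sketch built on the standard library's html.parser. It is not a substitute for a maintained sanitizer such as bleach or nh3, since it ignores many browser parsing quirks; AllowlistSanitizer, ALLOWED and the scheme check are illustrative.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative policy: tag -> attributes allowed on that tag
ALLOWED = {'p': set(), 'b': set(), 'i': set(), 'em': set(), 'strong': set(),
           'ul': set(), 'ol': set(), 'li': set(), 'a': {'href', 'title'}}
SAFE_SCHEMES = {'http', 'https', 'mailto', ''}
DROP_WITH_CONTENT = {'script', 'style'}  # remove these tags and their text
_C0_AND_SPACE = ''.join(chr(i) for i in range(0x21))

def _href_is_safe(value: str) -> bool:
    cleaned = ''.join(ch for ch in value if ch not in '\t\r\n').strip(_C0_AND_SPACE)
    try:
        return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False

class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.out = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tags are dropped; their text is still kept
        kept = ''
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # onclick, style, src, ... never survive
            if name == 'href' and not _href_is_safe(value):
                continue  # javascript:, data:, ...
            kept += ' {}="{}"'.format(name, html.escape(value))
        self.out.append('<{}{}>'.format(tag, kept))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED:
            self.out.append('</{}>'.format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data))  # text is always re-escaped

def sanitize(dirty: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(dirty)
    parser.close()
    return ''.join(parser.out)
```

Production sanitizers also handle parser differentials, SVG/MathML namespaces and CSS, which is why the libraries above remain the right choice in real applications.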
Framework-Specific Guidance
Django
# SECURE - Default behavior is safe
from django.shortcuts import render

def comment_view(request):
    author = request.GET.get('author', '')
    text = request.GET.get('text', '')
    # Template auto-escapes these
    return render(request, 'comment.html', {
        'author': author,
        'text': text,
    })

# Template (comment.html):
# <div class="comment">
#   <strong>{{ author }}</strong>: {{ text }}
# </div>

# For building HTML in Python code:
from django.utils.html import format_html

def build_message(username, msg):
    # format_html() escapes each argument and returns a SafeString
    return format_html(
        '<div class="msg"><b>{}</b>: {}</div>',
        username,
        msg,
    )
Django Settings:
# settings.py - Ensure templates auto-escape
TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'autoescape': True,  # Default; don't disable!
    },
}]
Flask / Jinja2
# SECURE - Jinja2 auto-escapes .html templates
from flask import Flask, render_template, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/profile/<username>')
def profile(username):
    bio = request.args.get('bio', '')
    # Auto-escaped in template
    return render_template('profile.html',
                           username=username,
                           bio=bio)

# profile.html:
# <h1>{{ username }}'s Profile</h1>
# <p>{{ bio }}</p>

# For manual HTML building:
@app.route('/message')
def message():
    text = request.args.get('text', '')
    escaped_text = escape(text)
    return f'<div>{escaped_text}</div>'
FastAPI
# SECURE - Jinja2 templates with FastAPI
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/profile/{user_id}")
async def profile(request: Request, user_id: int, bio: str = ""):
    return templates.TemplateResponse("profile.html", {
        "request": request,
        "user_id": user_id,
        "bio": bio,  # Auto-escaped
    })

# JSON responses are served as application/json, not HTML:
@app.get("/api/user/{user_id}")
async def get_user(user_id: int):
    return {"name": "John", "bio": "<script>alert('xss')</script>"}
# FastAPI serializes the dict to JSON; the browser treats it as data,
# so the <script> text is never rendered or executed
Rich HTML Sanitization
When you need to allow safe HTML (e.g., WYSIWYG editor):
# Use bleach library for HTML sanitization
import bleach
from django.shortcuts import redirect

ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}

def sanitize_html(dirty_html):
    clean = bleach.clean(
        dirty_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True,  # remove disallowed tags instead of escaping them
    )
    return clean

# Django view (Article is your own model):
def save_article(request):
    content = request.POST.get('content', '')
    sanitized = sanitize_html(content)
    article = Article.objects.create(
        title=request.POST.get('title'),
        content=sanitized,
    )
    return redirect('article_detail', pk=article.pk)

# Template (use |safe only on content that was sanitized before saving):
# <div class="article-content">
#   {{ article.content|safe }}
# </div>
Installation:
pip install bleach
Input Validation (Defense in Depth)
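# Django forms with validation
from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator
from django.shortcuts import redirect, render

def reject_script_tags(value):
    # Django validators signal failure by raising ValidationError; a return
    # value is ignored. A denylist like this is easy to bypass
    # (e.g. <img onerror=...>), so treat it as defense in depth only.
    if '<script' in value.lower():
        raise ValidationError('Script tags are not allowed.')

class CommentForm(forms.Form):
    author = forms.CharField(
        max_length=100,
        required=True,
        validators=[
            RegexValidator(
                regex=r'^[a-zA-Z0-9\s]+$',
                message='Only alphanumeric characters allowed',
            )
        ],
    )
    text = forms.CharField(
        max_length=1000,
        widget=forms.Textarea,
        validators=[reject_script_tags],
    )

# View (Comment is your own model):
def post_comment(request):
    form = CommentForm(request.POST)
    if form.is_valid():
        # Even with validation, the template still auto-escapes on output
        comment = form.cleaned_data['text']
        Comment.objects.create(text=comment)
        return redirect('comments')
    # Re-display the form with its errors (template name is illustrative)
    return render(request, 'comment_form.html', {'form': form})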
Content Security Policy
# Django middleware for CSP
from django.utils.deprecation import MiddlewareMixin

class SecurityHeadersMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        response['Content-Security-Policy'] = (
            "default-src 'self'; "
            "script-src 'self' https://trusted-cdn.com; "
            "style-src 'self' 'unsafe-inline'; "
            "img-src 'self' data: https:; "
            "frame-ancestors 'none';"
        )
        response['X-Content-Type-Options'] = 'nosniff'
        response['X-Frame-Options'] = 'DENY'
        # Legacy header: modern browsers removed the XSS auditor, and current
        # guidance is to disable it explicitly rather than use mode=block
        response['X-XSS-Protection'] = '0'
        return response

# settings.py
MIDDLEWARE = [
    'myapp.middleware.SecurityHeadersMiddleware',
    # ... other middleware
]

# Flask:
from flask import Flask

app = Flask(__name__)

@app.after_request
def set_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    return response
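The policies above allow scripts by origin only. A common tightening is a per-response nonce: the server generates a fresh random value, puts it in the script-src directive, and adds the same value as a nonce attribute on each inline <script> it emits itself, so injected inline scripts (which cannot know the nonce) are refused. A stdlib sketch; build_csp, the directive set and initPage() are illustrative, and making the nonce available to templates is not shown.

```python
import secrets

def build_csp(nonce: str) -> str:
    # Illustrative policy: only scripts from 'self' or carrying this nonce may run
    directives = {
        'default-src': ["'self'"],
        'script-src': ["'self'", f"'nonce-{nonce}'"],
        'object-src': ["'none'"],
        'base-uri': ["'none'"],
        'frame-ancestors': ["'none'"],
    }
    return '; '.join(f"{name} {' '.join(sources)}" for name, sources in directives.items())

# A fresh, unguessable nonce per response (16 random bytes, base64url-encoded)
nonce = secrets.token_urlsafe(16)
header = build_csp(nonce)
# The same value goes on every inline script the server itself emits;
# initPage() is a placeholder for your own code
inline_script = f'<script nonce="{nonce}">initPage();</script>'
print(header)
```

The nonce must be regenerated for every response; a fixed value would let an attacker copy it into injected markup.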
Verification and Detection
Security testing requires multiple approaches - unit tests alone are insufficient.
Static Application Security Testing (SAST)
Use automated tools to detect XSS in Python code:
Commercial Tools:
- Checkmarx - Python web framework security scanning
- Fortify - Data flow analysis for Django/Flask
- Snyk Code - Real-time Python security scanning
- Veracode - Python application security testing
Open Source Tools:
- Bandit - Python security linter
- Semgrep - Pattern-based security scanning
- Pylint with security plugins
- SonarQube - Continuous security analysis
Dynamic Application Security Testing (DAST)
Test running Python applications:
- OWASP ZAP - Automated web vulnerability scanner
- Burp Suite Professional - Comprehensive testing
- Arachni - Web application security scanner
- w3af - Python-based web attack framework
Framework-Specific Tools
Django:
# Django security check
python manage.py check --deploy
# Check for security issues
python manage.py check --tag security
# Bandit: medium+ severity findings, skipping tests and migrations
# (quote the patterns so the shell does not expand them)
bandit -r . -ll -i -x '*/tests/*,*/migrations/*'
Flask:
# Flask-Talisman for security headers
pip install flask-talisman
# Safety check for vulnerable dependencies
pip install safety
safety check --json
Limited Role of Unit Tests
Unit tests can verify that encoding functions work, but they cannot show that the application as a whole is secure:
import html
from markupsafe import escape

# Tests verify encoding - NOT comprehensive security
def test_html_escape():
    malicious = '<script>alert("xss")</script>'
    safe = html.escape(malicious)
    assert '<script>' not in safe
    assert '&lt;script&gt;' in safe

def test_markupsafe_escape():
    payload = '<img src=x onerror=alert(1)>'
    safe = escape(payload)
    assert '<img' not in safe
    assert '&lt;img' in safe
Important: Passing these tests does NOT mean your application is secure. Use SAST/DAST tools to find actual vulnerabilities.
Integration Testing
from django.test import TestCase, Client

class XssIntegrationTest(TestCase):
    def test_xss_payload_encoded(self):
        """Verify XSS payloads are encoded in responses"""
        client = Client()
        xss_payload = '<script>alert("XSS")</script>'
        response = client.get('/profile/', {'bio': xss_payload})
        # The raw payload must not appear (the page may contain its own <script> tags)
        self.assertNotContains(response, xss_payload)
        # The encoded version should appear instead
        self.assertContains(response, '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;')

# pytest with Flask (assumes a client fixture that returns app.test_client())
def test_flask_xss_protection(client):
    payload = '<script>alert(1)</script>'
    # query_string= lets the test client URL-encode the payload
    response = client.get('/comment', query_string={'text': payload})
    assert payload.encode() not in response.data
    assert b'&lt;script&gt;alert(1)&lt;/script&gt;' in response.data
Security Headers Testing
# Check security headers with curl
curl -I http://localhost:5000/ | grep -i "content-security-policy\|x-frame-options\|x-content-type"
# Use Django check command
python manage.py check --deploy
Continuous Security
- CI/CD Integration - Run Bandit and Semgrep in GitHub Actions/GitLab CI
- Pre-commit Hooks - Scan code before commits
- Dependency Scanning - Check for vulnerable packages
- Security Champions - Train developers on secure coding
- Penetration Testing - Regular professional assessments
Security Checklist
Manually verify:
- Django templates use {{ }}, not {{ |safe }}, for user data
- Flask/Jinja2 templates use {{ }}, not {{ |safe }} or {% autoescape false %}
- No mark_safe() or Markup() on user input
- All render() calls use template auto-escaping
- JSON responses use jsonify() or json.dumps()
- Manual HTML construction uses html.escape() or markupsafe.escape()
- All input sources identified (request.args, request.form, request.cookies, headers)