CWE-80: Cross-Site Scripting (XSS) - Python
Overview
XSS occurs when untrusted data is included in web output without proper encoding. Python web frameworks like Django and Flask provide built-in protection, but you must use them correctly.
Primary Defence: Use framework auto-escaping (Django templates, Flask/Jinja2), or explicit encoding with html.escape() for manual HTML construction.
Common Vulnerable Patterns
Django mark_safe() Misuse
# VULNERABLE - Marking user input as safe
from django.shortcuts import render
from django.utils.safestring import mark_safe
def profile_view(request):
    user_bio = request.GET.get('bio', '')
    safe_bio = mark_safe(user_bio)  # DANGEROUS! Disables auto-escaping for attacker-controlled text
    return render(request, 'profile.html', {'bio': safe_bio})
Flask Without Auto-Escaping
# VULNERABLE - Bypassing escaping with Markup()
from flask import Flask, request
from markupsafe import Markup
app = Flask(__name__)
@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    # Markup() declares the string already safe, so nothing escapes it
    return f'<div>{Markup(comment)}</div>'  # DANGEROUS!
Manual HTML Construction
# VULNERABLE - String concatenation
from flask import Flask, request
app = Flask(__name__)
@app.route('/greeting')
def greet():
    name = request.args.get('name', 'Guest')
    html = '<h1>Hello, ' + name + '</h1>'
    return html  # No escaping!
JavaScript Context Without Escaping
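# VULNERABLE - Injecting into JavaScript
from django.shortcuts import render
def search_view(request):
    query = request.GET.get('q', '')
    return render(request, 'search.html', {'query': query})
# Template:
# <script>
#   var searchTerm = '{{ query }}';  // Wrong encoding for JS: HTML escaping leaves backslashes
#                                    // and line breaks alone, and with autoescaping off ' breaks out
# </script>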
Secure Patterns
Django Auto-Escaping (Default)
# SECURE - Django templates auto-escape by default
from django.shortcuts import render
def profile_view(request):
    user_bio = request.GET.get('bio', '')
    # Django automatically HTML-escapes user_bio in template
    return render(request, 'profile.html', {'bio': user_bio})
# Template (profile.html):
# <div class="bio">
# {{ bio }} <!-- Automatically escaped -->
# </div>
Why this works: Django's template engine automatically HTML-encodes variable output using the {{ }} syntax by default. When you render {{ bio }} in a template, Django converts dangerous HTML characters (<, >, &, ", ') into their entity equivalents before sending the response. This makes normal HTML body and quoted attribute output safer by default. Developers must explicitly use the |safe filter or mark_safe() function to bypass escaping, making dangerous usages easy to spot in code reviews. Use context-specific helpers such as escapejs, json_script, and urlencode for JavaScript, JSON-in-HTML, and URL contexts.
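The qualification "quoted attribute" above is load-bearing. The minimal sketch below uses Python's html.escape() as a stand-in for template auto-escaping (both convert the same five characters: <, >, &, " and ') and two hypothetical helpers to show one payload staying inert in a quoted attribute but adding an event handler to an unquoted one.

```python
import html

def quoted_attr(value: str) -> str:
    # Escaped value inside a double-quoted attribute: quotes become entities,
    # so the value cannot close the attribute and start a new one.
    return f'<img alt="{html.escape(value)}">'

def unquoted_attr(value: str) -> str:
    # Same escaping, but no quotes around the value: a space is enough to
    # start a new attribute, and html.escape() does not touch spaces.
    return f'<img alt={html.escape(value)}>'

payload = 'x onerror=alert(1)'
print(quoted_attr(payload))    # <img alt="x onerror=alert(1)">
print(unquoted_attr(payload))  # <img alt=x onerror=alert(1)>
```

The fix belongs in the template rather than the escaping: always quote attribute values, e.g. <img alt="{{ bio }}">.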
Flask/Jinja2 Auto-Escaping
# SECURE - Flask enables auto-escaping for .html templates
from flask import Flask, render_template, request
app = Flask(__name__)
@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return render_template('comment.html', comment=comment)
# Template (comment.html):
# <div class="comment">
# {{ comment }} <!-- Jinja2 auto-escapes -->
# </div>
Why this works: Flask configures Jinja to automatically escape templates ending in .html, .htm, .xml, .xhtml, and .svg when using render_template(). When you use {{ comment }} in those templates, Jinja applies HTML entity encoding before rendering. Flask also enables escaping for strings rendered with render_template_string(). Developers must explicitly use the |safe filter, Markup(), or {% autoescape false %} to bypass escaping, and those escape hatches should only be used for trusted or sanitized HTML. For standalone Jinja environments, configure auto-escaping explicitly instead of assuming it is enabled by default.
Explicit Escaping with MarkupSafe
# SECURE - Explicit HTML escaping
from markupsafe import escape
def build_html(user_input):
    escaped = escape(user_input)
    return f'<div>{escaped}</div>'
# Example:
# user_input = '<script>alert("xss")</script>'
# result = '<div>&lt;script&gt;alert(&#34;xss&#34;)&lt;/script&gt;</div>'
Why this works: MarkupSafe is the library that powers Jinja2's auto-escaping, and its escape() function is available for manual HTML encoding. Calling escape(user_input) converts the HTML special characters (<, >, &, ", ') to their entity equivalents, so browsers display the content as text instead of interpreting it as markup. It returns a Markup object that Jinja2 recognizes as already safe, which prevents double-escaping if the value is later used in a template. Explicit escaping is useful when you build HTML strings in Python code rather than templates, for example with Flask's make_response() or other response builders. Because it is the same encoding logic, MarkupSafe keeps manual escaping consistent with Flask and Jinja2 template auto-escaping. (Django does not use MarkupSafe; it has its own django.utils.html.escape(), covered below.) It accepts any str, including non-ASCII text, and leaves everything except the five special characters unchanged.
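The double-escaping guard described above works through a small protocol: MarkupSafe treats any object that has an __html__() method as already-safe HTML. The stdlib-only sketch below imitates that idea. SafeHTML and escape_once are hypothetical names, not MarkupSafe's code, and the entity spellings (&quot;) differ from MarkupSafe's (&#34;).

```python
import html

class SafeHTML(str):
    """Marker type: a string already known to be safe HTML."""
    def __html__(self) -> str:
        return str(self)

def escape_once(value) -> SafeHTML:
    # Objects that declare __html__ are trusted as-is (already escaped or sanitized).
    if hasattr(value, "__html__"):
        return SafeHTML(value.__html__())
    # Everything else is treated as untrusted text and escaped exactly once.
    return SafeHTML(html.escape(str(value)))

first = escape_once('<b>"hi"</b>')
second = escape_once(first)
print(first)            # &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
print(second == first)  # True: no &amp;lt; double-escaping
```

The same mechanism explains the vulnerable Markup(comment) pattern earlier: wrapping untrusted text in the marker type tells every later escape step to leave it alone.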
Context-Specific Encoding
HTML Context
# SECURE - HTML body content
from django.http import HttpResponse
from django.utils.html import escape
def display_message(request):
    msg = request.GET.get('msg', '')
    safe_msg = escape(msg)
    return HttpResponse(f'<p>{safe_msg}</p>')
Why this works: Django's escape() function provides HTML entity encoding for Python code that builds output without templates. It converts <, >, &, " and ' to HTML entities, the same characters the template engine escapes, so user input is displayed as text rather than interpreted as markup. This matters when you build HTTP responses directly with HttpResponse() instead of using the template system. The guarantee matches template auto-escaping, and so does its scope: HTML body text and quoted attribute values, not JavaScript, CSS or URL contexts. For larger fragments, format_html() (shown under Framework-Specific Guidance) escapes each argument for you.
JavaScript Context
# SECURE - JavaScript string context
from django.utils.html import escapejs
from django.shortcuts import render
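def search_view(request):
    query = request.GET.get('q', '')
    safe_query = escapejs(query)  # Returns a SafeString, so auto-escaping won't touch it again
    return render(request, 'search.html', {'safe_query': safe_query})
# Template (escape exactly once: here it was done in the view;
# alternatively pass the raw query and write {{ query|escapejs }}):
# <script>
#   var searchTerm = '{{ safe_query }}';
#   console.log(searchTerm);
# </script>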
Why this works: JavaScript context requires different encoding than HTML. Django's escapejs() escapes characters that could break out of a JavaScript string literal, such as quotes, backslashes, line breaks, and the < and > of a closing </script> tag, by replacing them with \uXXXX escapes. HTML encoding is the wrong tool here. The browser does not decode entities inside <script>, so HTML-escaped values arrive corrupted, and HTML escaping leaves backslashes and line breaks untouched, either of which can break the string. An unencoded </script> in the value ends the script element no matter where it appears. Django's documentation describes escapejs as suitable for a whole string literal within single or double quotes, but not for HTML attributes (such as event handlers) or JavaScript template literals. For anything beyond a simple quoted string, prefer json_script.
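To make the JavaScript-string encoding concrete, here is a simplified stdlib approximation of what escapejs does. The character table is an assumption modelled on the behaviour described above (quotes, backslashes, line breaks, < and > become \uXXXX escapes), not a copy of Django's implementation; use the real filter in Django code.

```python
import json

# Characters that can end a JS string, start an HTML tag/entity, or act as
# line terminators inside a <script> block. Each becomes a \uXXXX escape.
_JS_ESCAPES = {ord(c): f"\\u{ord(c):04X}" for c in "\\'\"<>&=-;`\u2028\u2029"}
# ASCII control characters (including \n and \r) are escaped the same way.
_JS_ESCAPES.update({i: f"\\u{i:04X}" for i in range(32)})

def escape_js_string(value: str) -> str:
    """Escape value for use inside a quoted JavaScript string literal."""
    return str(value).translate(_JS_ESCAPES)

payload = "'; alert(1); //</script>"
escaped = escape_js_string(payload)
print(escaped)  # \u0027\u003B alert(1)\u003B //\u003C/script\u003E
# JSON strings use the same \uXXXX syntax, so decoding restores the original text
print(json.loads(f'"{escaped}"') == payload)  # True
```

The escaped value contains no quote, backslash-terminator or angle bracket, so it can neither close the string nor the script element, yet the JavaScript engine decodes it back to exactly what the user typed.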
URL Context
# SECURE - URL encoding
from urllib.parse import quote
def build_search_url(query):
    encoded_query = quote(query)
    return f'/search?q={encoded_query}'
# Django template filter:
# <a href="/search?q={{ query|urlencode }}">Search</a>
Why this works: URL encoding (percent-encoding) is necessary when embedding user data in URLs to prevent injection of additional query parameters or other URL manipulation. Python's quote() function from urllib.parse percent-encodes special characters like &, =, ?, spaces, and other URL metacharacters as %XX, where XX is the hexadecimal byte value. This ensures user input is treated as data, not URL syntax. For example, &admin=true becomes %26admin%3Dtrue, preventing it from being interpreted as an additional query parameter. Django's urlencode template filter provides the same functionality in templates. Always use URL encoding for query parameters and path components that contain user data; note that quote() leaves / unencoded by default, so pass safe='' when encoding a single path segment.
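A short stdlib sketch of URL building: urlencode() encodes a whole query string, while quote(..., safe='') suits a single path segment. Percent-encoding only protects values placed inside a URL. When users supply the entire URL (say, a profile link rendered as an href), a scheme allowlist is needed as well, because a value like javascript:alert(1) is dangerous because of its scheme. The is_safe_link() helper is a hypothetical, minimal check, not a complete URL validator.

```python
from urllib.parse import quote, urlencode, urlsplit

def build_search_url(query: str) -> str:
    # urlencode() percent-encodes keys and values (spaces become '+'),
    # so '&' and '=' inside the value cannot add new parameters.
    return "/search?" + urlencode({"q": query})

def is_safe_link(url: str) -> bool:
    # Hypothetical allowlist for a *whole* user-supplied URL: relative links,
    # http and https only. urlsplit() lowercases the scheme it finds.
    return urlsplit(url).scheme in {"", "http", "https"}

print(build_search_url("&admin=true"))       # /search?q=%26admin%3Dtrue
print(quote("a b/c", safe=""))               # a%20b%2Fc
print(is_safe_link("javascript:alert(1)"))   # False
print(is_safe_link("https://example.com/"))  # True
```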
JSON Responses
# SECURE - JSON responses are serialized and sent as application/json
from django.http import JsonResponse
from flask import jsonify
# Django (User is the application's model):
def api_user(request, user_id):
    user = User.objects.get(id=user_id)
    return JsonResponse({
        'name': user.name,  # Automatically JSON-encoded
        'bio': user.bio
    })
# Flask (app and get_user() are defined elsewhere in the application):
@app.route('/api/user/<int:user_id>')
def api_user(user_id):
    user = get_user(user_id)
    return jsonify({
        'name': user.name,
        'bio': user.bio
    })
Why this works: Both JsonResponse() in Django and jsonify() in Flask serialize Python objects to JSON and set the Content-Type: application/json header. JSON encoding escapes quotes, backslashes and control characters as the JSON specification requires, but it does not escape < or >: a bio of <script>alert(1)</script> is still present, as text, inside the JSON string. The security property comes from the content type, which tells browsers to treat the response as data rather than HTML, so they never parse or execute the embedded tag. Sending X-Content-Type-Options: nosniff (see Content Security Policy below) stops browsers from second-guessing that declared type. This makes the API response itself safe without manual HTML encoding; the data is still untrusted when client-side code later inserts it into a page, so render it as text (for example with textContent rather than innerHTML).
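The role of the content type can be checked without any framework. This stdlib WSGI sketch (api_user_app and the stub start_response are hypothetical names standing in for what JsonResponse or jsonify() produce) returns a JSON body plus the two headers that matter, and confirms that json.dumps() leaves the <script> text intact.

```python
import json

def api_user_app(environ, start_response):
    """Minimal WSGI app (hypothetical) returning user data as JSON."""
    user = {"name": "Mallory", "bio": "<script>alert(1)</script>"}
    body = json.dumps(user).encode("utf-8")
    start_response("200 OK", [
        # The header, not the encoding, stops the browser rendering this as HTML.
        ("Content-Type", "application/json"),
        # Stops MIME sniffing from overriding the declared type.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call it directly with a stub start_response to inspect the output.
captured = {}
def start_response(status, headers):
    captured["status"], captured["headers"] = status, dict(headers)

body = b"".join(api_user_app({}, start_response))
print(captured["headers"]["Content-Type"])  # application/json
print(b"<script>" in body)                  # True: json.dumps does not escape <
```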
html.escape() - Standard Library HTML Encoding
# SECURE - Built-in HTML entity encoding (Python 3.2+)
import html
user_input = request.args.get('name', '')  # inside a Flask view; '' avoids passing None
safe_output = html.escape(user_input)
print(f"<h1>Hello, {safe_output}</h1>")
# quote=True is the default, so quotes are already escaped for attribute context
safe_attr = html.escape(user_input, quote=True)  # Escapes " and '
print(f'<input value="{safe_attr}">')
# Encodes: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &#x27;
Why this works: html.escape() is Python's built-in function for HTML encoding, available in the standard library since Python 3.2. It converts HTML special characters into their entity equivalents, preventing browsers from interpreting user input as markup. With quote=True, which is the default, it also encodes double and single quotes, which is essential when the output goes inside a quoted HTML attribute; quote=False leaves quotes alone and is only safe for body text. Being part of the standard library means no external dependencies, which makes it a good choice for manual HTML encoding outside a framework; inside Django or Flask, prefer template auto-escaping or the framework's own escape helpers. It's simple, fast, and maintained as part of Python itself.
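Since quote=True is already the default, the real pitfall is passing quote=False, for example when reusing body-text code, and then placing the result in an attribute. A short demonstration with a hypothetical payload:

```python
import html

value = '" autofocus onfocus="alert(1)'

default = html.escape(value)              # quote=True is the default
unsafe = html.escape(value, quote=False)  # only &, < and > are escaped

print(f'<input value="{default}">')
# <input value="&quot; autofocus onfocus=&quot;alert(1)">
print(f'<input value="{unsafe}">')
# <input value="" autofocus onfocus="alert(1)">
```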
markupsafe.escape() (formerly re-exported as flask.escape()) - Framework Escaping
# SECURE - MarkupSafe library (used by Flask/Jinja2)
# In Flask/Jinja applications, import escape from MarkupSafe directly
from markupsafe import escape
user_comment = request.form.get('comment', '')  # inside a Flask view
safe_comment = escape(user_comment)
html_output = f"<p>{safe_comment}</p>"
Why this works: markupsafe.escape() is the escaping engine underlying Flask and Jinja templating, providing the same kind of HTML encoding as html.escape() but returning a Markup object that tracks whether content is already safe. This prevents double-escaping when combining escaped strings with template rendering. Import it from markupsafe directly so the example works with current Flask versions. The Markup type system enables safer composition of HTML fragments in template contexts.
Template Auto-Escaping (Django/Flask/Jinja2)
# SECURE - Django templates auto-escape by default
from django.shortcuts import render
def view(request):
    return render(request, 'template.html', {
        'user_input': request.GET.get('data')  # Auto-escaped in template
    })
# Template: {{ user_input }} <!-- Automatically HTML-escaped -->
# SECURE - Flask/Jinja2 auto-escapes .html templates
from flask import render_template, request
@app.route('/page')  # app is the application's Flask instance
def page():
    return render_template('page.html',
                           data=request.args.get('input'))  # Auto-escaped
Why this works: Django templates and Flask's default Jinja integration treat normal {{ user_input }} output as text in HTML templates. The framework HTML-encodes variable output before sending it to the browser, while dangerous raw-output paths require explicit opt-out with |safe, mark_safe(), Markup(), or {% autoescape false %}. This separation of logic and presentation reduces accidental XSS in HTML body and attribute contexts. JavaScript, CSS, URL, and JSON-in-HTML contexts still require context-specific helpers.
JSON Data in HTML Templates
# SECURE - Django: store JSON as inert application/json data
from django.shortcuts import render
def search_page(request):
    return render(request, 'search.html', {
        'search_data': {'query': request.GET.get('search', '')}
    })
# Template:
# {{ search_data|json_script:"search-data" }}
# <script src="/static/search.js"></script>
# JavaScript:
# const data = JSON.parse(document.getElementById('search-data').textContent);
# SECURE - Flask/Jinja: use tojson in a script block
# <script>
# const searchData = {{ search_data|tojson }};
# </script>
# Never concatenate into JavaScript strings:
# var x = '{user_data}'; // Can break out with '
# var x = {json.dumps(user_data)}; // Not enough for inline HTML by itself
Why this works: Plain json.dumps() creates valid JSON, but it does not by itself make the result safe to paste into an inline <script> block because sequences such as </script> can still terminate the element. Django's json_script escapes <, >, and & and stores the data in an inert application/json script tag that application JavaScript reads with textContent. Jinja's tojson filter serializes and marks JSON safe for HTML documents and <script> tags. Prefer these framework helpers over hand-built inline JavaScript.
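The core of what tojson and json_script do can be sketched with the standard library: serialize, then replace <, >, & (and here ') with \uXXXX escapes that are still valid JSON. json_for_script is a hypothetical helper, not either framework's code; in Django and Flask, use the built-in filters.

```python
import json

def json_for_script(data) -> str:
    """Serialize data for an inline <script>, in the spirit of Jinja's tojson
    and Django's json_script (a sketch, not either library's code)."""
    text = json.dumps(data)
    # These characters never occur inside a JSON escape sequence, so replacing
    # them with \uXXXX keeps the JSON valid and decodes to the same string.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026")
                .replace("'", "\\u0027"))

data = {"query": "</script><script>alert(1)</script>"}
raw = json.dumps(data)
safe = json_for_script(data)
print("</script>" in raw)        # True: plain json.dumps is not enough
print("</script>" in safe)       # False
print(json.loads(safe) == data)  # True: same data once parsed
```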
flask.jsonify() - JSON API Responses
# SECURE - Flask JSON responses (sets proper Content-Type)
from flask import Flask, jsonify, request
app = Flask(__name__)
@app.route('/api/user')
def get_user():
    user_input = request.args.get('query')
    return jsonify({
        'query': user_input,  # Automatically JSON-encoded
        'results': get_results(user_input)  # get_results() is application code
    })
# Content-Type: application/json (prevents HTML interpretation)
# Can also use jsonify() directly for single values
@app.route('/api/echo')
def echo():
    return jsonify(request.args.get('text'))
Why this works: Flask's jsonify() function does two things that matter for security: (1) it serializes Python objects to valid JSON (via json.dumps()), escaping quotes, backslashes and control characters, and (2) it sets the Content-Type: application/json header. The content type header prevents browsers from interpreting the response as HTML, blocking content-type confusion attacks where an attacker tricks the browser into rendering JSON as an HTML page. Together these keep the API response from being rendered as HTML: even if user input contains <script> tags, they sit inside a JSON string and the browser does not parse them as HTML because of the content type. The values are still untrusted wherever client code later renders them.
bleach.clean() - Rich HTML Sanitization (When HTML Input Needed)
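# SECURE - Allow safe HTML tags while removing XSS
import bleach
user_html = request.form.get('content', '')  # inside a Flask view
clean_html = bleach.clean(
    user_html,
    tags=['p', 'b', 'i', 'a', 'ul', 'ol', 'li', 'strong', 'em'],
    attributes={'a': ['href', 'title']},
    protocols=['http', 'https', 'mailto']
)
# Disallowed tags such as <script> are escaped (or removed with strip=True);
# disallowed attributes like onclick= and javascript: links are dropped
# For Markdown-to-HTML, run bleach after conversion with an allowlist that
# covers the tags Markdown emits (bleach.ALLOWED_TAGS has no <p>, for example):
import markdown
html_content = markdown.markdown(user_markdown)
safe_content = bleach.clean(
    html_content,
    tags=['p', 'pre', 'code', 'blockquote', 'ul', 'ol', 'li', 'strong', 'em', 'a'],
    attributes={'a': ['href', 'title']},
    protocols=['http', 'https', 'mailto']
)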
Why this works: When you need to allow rich HTML content (for example, from a WYSIWYG editor or Markdown converter), HTML encoding would destroy the formatting. Bleach is an allowlist-based HTML sanitizer: it parses the user's HTML, escapes or strips elements that are not allowed (like <script> and <iframe>), drops disallowed attributes (like onclick and onerror), and reconstructs clean HTML containing only approved tags. The protocols parameter prevents javascript: URLs in links. The allowlist approach is more secure than denylisting because it blocks unknown attack vectors by default: a newly discovered dangerous tag is already blocked unless you explicitly allowed it. Note that the bleach project announced its deprecation in January 2023; it still works, but consider a maintained alternative such as nh3 for new code.
Framework-Specific Guidance
Django
# SECURE - Default behavior is safe
from django.shortcuts import render
from django.utils.html import format_html
def comment_view(request):
    author = request.GET.get('author', '')
    text = request.GET.get('text', '')
    # Template auto-escapes these
    return render(request, 'comment.html', {
        'author': author,
        'text': text
    })
# Template (comment.html):
# <div class="comment">
#   <strong>{{ author }}</strong>: {{ text }}
# </div>
# For building HTML in Python code:
def build_message(username, msg):
    return format_html(
        '<div class="msg"><b>{}</b>: {}</div>',
        username,
        msg
    )  # format_html auto-escapes arguments
Django Settings:
# settings.py - Ensure templates auto-escape
TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'autoescape': True,  # Default, don't disable!
    },
}]
Flask / Jinja2
# SECURE - Jinja2 auto-escapes .html templates
from flask import Flask, render_template, request
from markupsafe import escape
app = Flask(__name__)
@app.route('/profile/<username>')
def profile(username):
    bio = request.args.get('bio', '')
    # Auto-escaped in template
    return render_template('profile.html',
                           username=username,
                           bio=bio)
# profile.html:
# <h1>{{ username }}'s Profile</h1>
# <p>{{ bio }}</p>
# For manual HTML building:
@app.route('/message')
def message():
    text = request.args.get('text', '')
    escaped_text = escape(text)
    return f'<div>{escaped_text}</div>'
FastAPI
# SECURE - Jinja2 templates with FastAPI
from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates
app = FastAPI()
templates = Jinja2Templates(directory="templates")
@app.get("/profile/{user_id}")
async def profile(request: Request, user_id: int, bio: str = ""):
    return templates.TemplateResponse("profile.html", {
        "request": request,
        "user_id": user_id,
        "bio": bio  # Auto-escaped
    })
# JSON responses are sent as application/json, so browsers don't render them as HTML:
@app.get("/api/user/{user_id}")
async def get_user(user_id: int):
    return {"name": "John", "bio": "<script>alert('xss')</script>"}
# FastAPI serializes this to JSON; the <script> text stays in the string,
# so clients must still insert it into pages as text, not HTML
Rich HTML Sanitization
When you need to allow safe HTML (e.g., WYSIWYG editor):
# Use bleach library for HTML sanitization
import bleach
ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}
def sanitize_html(dirty_html):
    clean = bleach.clean(
        dirty_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )
    return clean
# Django view (Article is the application's model):
from django.shortcuts import redirect
def save_article(request):
    content = request.POST.get('content', '')
    sanitized = sanitize_html(content)
    article = Article.objects.create(
        title=request.POST.get('title'),
        content=sanitized
    )
    return redirect('article_detail', pk=article.pk)
# Template (use |safe only on content that was sanitized before saving):
# <div class="article-content">
#   {{ article.content|safe }}
# </div>
Installation:
pip install bleach
Input Validation (Defense in Depth)
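# Django forms with validation
from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator
from django.shortcuts import redirect, render
def reject_script_tags(value):
    # Crude denylist: defense in depth only, never a substitute for output escaping.
    # Django validators must raise ValidationError; returning False does nothing.
    if '<script' in value.lower():
        raise ValidationError('Script tags are not allowed')
class CommentForm(forms.Form):
    author = forms.CharField(
        max_length=100,
        required=True,
        validators=[
            RegexValidator(
                regex=r'^[a-zA-Z0-9\s]+$',
                message='Only letters, digits and spaces allowed'
            )
        ]
    )
    text = forms.CharField(
        max_length=1000,
        widget=forms.Textarea,
        validators=[reject_script_tags]
    )
# View (Comment is the application's model; comment_form.html is illustrative):
def post_comment(request):
    form = CommentForm(request.POST)
    if form.is_valid():
        # Even with validation, template still auto-escapes
        comment = form.cleaned_data['text']
        Comment.objects.create(text=comment)
        return redirect('comments')
    return render(request, 'comment_form.html', {'form': form})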
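Why the <script> check is only a speed bump, compared with the allowlist regex, can be shown in a few lines of stdlib Python. Both functions are hypothetical stand-ins for the form validators above, and the payload is illustrative.

```python
import re

def naive_script_check(value: str) -> bool:
    # Denylist in the style of the text field validator: rejects only '<script'.
    return "<script" not in value.lower()

def author_allowlist(value: str) -> bool:
    # Allowlist in the style of the RegexValidator: letters, digits, whitespace.
    return re.fullmatch(r"[a-zA-Z0-9\s]+", value) is not None

payload = '<img src=x onerror="alert(1)">'
print(naive_script_check(payload))  # True: the denylist lets it through
print(author_allowlist(payload))    # False: the allowlist rejects it
```

The denylist misses every vector that does not spell out a script tag, which is why output escaping remains the primary defence and validation only narrows the input.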
Content Security Policy
# Django middleware for CSP
from django.utils.deprecation import MiddlewareMixin
class SecurityHeadersMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        response['Content-Security-Policy'] = (
            "default-src 'self'; "
            "script-src 'self' https://trusted-cdn.com; "
            "style-src 'self' 'unsafe-inline'; "
            "img-src 'self' data: https:; "
            "frame-ancestors 'none';"
        )
        response['X-Content-Type-Options'] = 'nosniff'
        response['X-Frame-Options'] = 'DENY'
        # The legacy XSS auditor is gone from modern browsers; OWASP recommends
        # disabling it explicitly (0) rather than sending "1; mode=block"
        response['X-XSS-Protection'] = '0'
        return response
# settings.py
MIDDLEWARE = [
    'myapp.middleware.SecurityHeadersMiddleware',
    # ... other middleware
]
# Flask:
from flask import Flask
app = Flask(__name__)
@app.after_request
def set_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    return response
Security Checklist
Manually verify:
- Django templates use {{ }}, not {{ var|safe }}, for user data
- Flask/Jinja2 templates use {{ }}, not {{ var|safe }} or {% autoescape false %}
- No mark_safe() or Markup() on user input
- All render() calls use template auto-escaping
- JSON API responses use JsonResponse or jsonify() with Content-Type: application/json
- JSON embedded in HTML uses Django json_script or Jinja tojson, not raw json.dumps() pasted into <script>
- Manual HTML construction uses html.escape() or markupsafe.escape()
- All input sources identified (request.args, request.form, request.cookies, headers)