CWE-80: Cross-Site Scripting (XSS) - Python

Overview

XSS occurs when untrusted data is included in web output without proper encoding. Python web frameworks like Django and Flask provide built-in protection, but you must use them correctly.

Primary Defence: Use framework auto-escaping (Django templates, Flask/Jinja2), or explicit encoding with html.escape() for manual HTML construction.

Common Vulnerable Patterns

Django mark_safe() Misuse

# VULNERABLE - Marking user input as safe
from django.shortcuts import render
from django.utils.safestring import mark_safe

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    safe_bio = mark_safe(user_bio)  # DANGEROUS! Disables auto-escaping for attacker input
    return render(request, 'profile.html', {'bio': safe_bio})

Flask Without Auto-Escaping

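# VULNERABLE - Markup() switches off auto-escaping for user input
from flask import Flask, render_template_string, request
from markupsafe import Markup

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    # Jinja2 trusts Markup objects and inserts them without escaping
    return render_template_string('<div>{{ comment }}</div>',
                                  comment=Markup(comment))  # DANGEROUS!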

Manual HTML Construction

# VULNERABLE - String concatenation
from flask import Flask, request

app = Flask(__name__)

@app.route('/greeting')
def greet():
    name = request.args.get('name', 'Guest')
    html = '<h1>Hello, ' + name + '</h1>'
    return html  # No escaping! Flask sends a str return value as text/html

JavaScript Context Without Escaping

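# VULNERABLE - Injecting into JavaScript
from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#     var searchTerm = '{{ query }}';  // HTML-escaped, not JS-escaped: \ and line breaks pass through
# </script>
# <button onclick="search('{{ query }}')">  <!-- &#x27; is decoded back to ' before the JS runs -->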

Secure Patterns

Django Auto-Escaping (Default)

# SECURE - Django templates auto-escape by default

from django.shortcuts import render

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    # Django automatically HTML-escapes user_bio in template
    return render(request, 'profile.html', {'bio': user_bio})

# Template (profile.html):
# <div class="bio">
#     {{ bio }}  <!-- Automatically escaped -->
# </div>

Why this works: Django's template engine automatically HTML-encodes variable output using the {{ }} syntax by default. When you render {{ bio }} in a template, Django converts dangerous HTML characters (<, >, &, ", ') into their entity equivalents before sending the response. This makes normal HTML body and quoted attribute output safer by default. Developers must explicitly use the |safe filter or mark_safe() function to bypass escaping, making dangerous usages easy to spot in code reviews. Use context-specific helpers such as escapejs, json_script, and urlencode for JavaScript, JSON-in-HTML, and URL contexts.
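The escape-by-default, opt-out-explicitly model described above can be sketched in standard-library Python. This is a toy, not Django's template engine: `SafeString`, `mark_safe_sketch()` and `render_sketch()` are hypothetical names, and `html.escape()` stands in for Django's own escaping.

```python
import html
import re


class SafeString(str):
    """Hypothetical marker type playing the role of a mark_safe() result."""


def mark_safe_sketch(value):
    # Explicit opt-out: the caller vouches that value is already safe HTML
    return SafeString(value)


def render_sketch(template, context):
    """Replace each {{ name }} with an HTML-escaped context value (hypothetical)."""
    def substitute(match):
        value = context.get(match.group(1), '')
        if isinstance(value, SafeString):
            return str(value)  # opted out: inserted verbatim
        return html.escape(str(value))  # default: escaped
    return re.sub(r'\{\{\s*(\w+)\s*\}\}', substitute, template)


print(render_sketch('<div>{{ bio }}</div>', {'bio': '<script>x</script>'}))
# <div>&lt;script&gt;x&lt;/script&gt;</div>
print(render_sketch('<div>{{ bio }}</div>', {'bio': mark_safe_sketch('<b>hi</b>')}))
# <div><b>hi</b></div>
```

The only way raw markup gets through is the explicit marker, which is why a search for `mark_safe` and `|safe` finds every risky spot.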

Flask/Jinja2 Auto-Escaping

# SECURE - Flask enables auto-escaping for .html templates

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return render_template('comment.html', comment=comment)

# Template (comment.html):
# <div class="comment">
#     {{ comment }}  <!-- Jinja2 auto-escapes -->
# </div>

Why this works: Flask configures Jinja to automatically escape templates ending in .html, .htm, .xml, .xhtml, and .svg when using render_template(). When you use {{ comment }} in those templates, Jinja applies HTML entity encoding before rendering. Flask also enables escaping for strings rendered with render_template_string(). Developers must explicitly use the |safe filter, Markup(), or {% autoescape false %} to bypass escaping, and those escape hatches should only be used for trusted or sanitized HTML. For standalone Jinja environments, configure auto-escaping explicitly instead of assuming it is enabled by default.
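The extension rule above comes down to a suffix check on the template name. The sketch restates it as a hypothetical `should_autoescape()` function; in Flask the real decision is made by the application's `select_jinja_autoescape()` method, and standalone Jinja code would normally pass `jinja2.select_autoescape()` to `Environment(autoescape=...)` rather than hand-rolling the check.

```python
# Extensions that get auto-escaping under render_template(), per the rule above
AUTOESCAPE_EXTENSIONS = ('.html', '.htm', '.xml', '.xhtml', '.svg')


def should_autoescape(template_name):
    """Sketch of the per-template auto-escape decision (not Flask's own code)."""
    # render_template_string() has no file name, and those strings are escaped too
    if template_name is None:
        return True
    # A plain suffix check: 'email.txt' or 'page.jinja' would NOT be escaped
    return template_name.endswith(AUTOESCAPE_EXTENSIONS)


for name in ['comment.html', 'icon.svg', 'email.txt', None]:
    print(name, should_autoescape(name))
```

The practical consequence: a template named with an unlisted extension renders user data raw unless you enable escaping for it.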

Explicit Escaping with MarkupSafe

# SECURE - Explicit HTML escaping

from markupsafe import escape

def build_html(user_input):
    escaped = escape(user_input)
    return f'<div>{escaped}</div>'

# Example:
# user_input = '<script>alert("xss")</script>'
# result = '<div>&lt;script&gt;alert(&#34;xss&#34;)&lt;/script&gt;</div>'

Why this works: MarkupSafe is the library that powers Jinja2's auto-escaping, and its escape() function performs the same HTML encoding manually. escape(user_input) converts the HTML special characters (<, >, &, ", ') to entities, so browsers display the content as text instead of interpreting it as markup. It returns a Markup object, which Jinja2 treats as already safe, so the value is not escaped a second time if it is later passed to a template. This is useful when you build HTML strings in Python code rather than in templates, for example with Flask's make_response() or other response builders. Using MarkupSafe keeps manual escaping consistent with Flask and Jinja2 template auto-escaping, because both rely on the same encoding logic. Django does not use MarkupSafe; it has its own django.utils.html.escape().
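Double-escaping is easy to reproduce with `html.escape()`, which cannot know its input was already escaped. The sketch contrasts it with a hypothetical marker type, `Escaped`, that implements the `__html__()` method MarkupSafe's `escape()` checks for; `escape_once()` is likewise an illustrative name, not a library function.

```python
import html


class Escaped(str):
    """Hypothetical marker type: text that has already been HTML-escaped."""

    def __html__(self):
        return str(self)


def escape_once(value):
    # Honor the __html__ protocol (the hook MarkupSafe's escape() checks), so an
    # already-escaped value passes through instead of being escaped again
    if hasattr(value, '__html__'):
        return Escaped(value.__html__())
    return Escaped(html.escape(str(value)))


# Plain html.escape() has no memory: escaping twice corrupts the output
twice_plain = html.escape(html.escape('<b>'))   # '&amp;lt;b&amp;gt;'
twice_marked = escape_once(escape_once('<b>'))  # '&lt;b&gt;'
print(twice_plain, twice_marked)
```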

Context-Specific Encoding

HTML Context

# SECURE - HTML body content

from django.http import HttpResponse
from django.utils.html import escape

def display_message(request):
    msg = request.GET.get('msg', '')
    safe_msg = escape(msg)
    return HttpResponse(f'<p>{safe_msg}</p>')

Why this works: Django's escape() function provides HTML entity encoding for use in Python code when you're not using templates. It converts <, >, &, ", and ' to HTML entities, the same characters template auto-escaping handles, so user input is displayed as text rather than interpreted as markup. This matters when you build HTTP responses directly with HttpResponse() instead of using the template system. Like auto-escaping, it is only correct for HTML body text and quoted attribute values, not for JavaScript or URL contexts. Note that escape() always escapes, even values that were already escaped; use conditional_escape() or format_html() when the input may already be marked safe.
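When a response combines markup with several user values, escaping each one by hand is easy to forget. Django's `format_html()` handles this by escaping every argument before formatting. The sketch below imitates that idea with the standard library; `format_html_sketch()` is a hypothetical name, and unlike Django's version (which uses `conditional_escape()` and leaves values already marked safe alone) it escapes everything.

```python
import html


def format_html_sketch(template, *args, **kwargs):
    """Hypothetical format_html() look-alike: escape every value, not the template."""
    # The template is developer-written markup; only the values are untrusted
    safe_args = [html.escape(str(a)) for a in args]
    safe_kwargs = {k: html.escape(str(v)) for k, v in kwargs.items()}
    return template.format(*safe_args, **safe_kwargs)


print(format_html_sketch('<div class="msg"><b>{}</b>: {}</div>', '<i>eve</i>', 'a & b'))
# <div class="msg"><b>&lt;i&gt;eve&lt;/i&gt;</b>: a &amp; b</div>
```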

JavaScript Context

# SECURE - JavaScript string context

from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    # Pass the raw value and escape it once, in the template, for the JS context
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#     var searchTerm = '{{ query|escapejs }}';
#     console.log(searchTerm);
# </script>

Why this works: JavaScript context requires different encoding than HTML. Inside a <script> block the browser does not decode HTML entities, so HTML-escaped text is simply the wrong value there, and backslashes and line breaks pass through untouched. Inside an event-handler attribute such as onclick, the browser decodes entities before running the code, so an encoded quote becomes a real quote again. Django's escapejs filter instead replaces quotes, backslashes, <, >, &, line terminators, and other control characters with \uXXXX escapes, so the value cannot close the string literal or produce a </script> tag. Escape exactly once: applying escapejs in the view and again in the template double-escapes the value. escapejs is only meant for values placed inside a quoted JavaScript string; for structured data, json_script is usually the simpler choice.
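The backslash-escaping approach can be sketched with `str.translate()`. The character table below is modeled on the one Django's `escapejs` uses, but treat it as an assumption for illustration: the exact set may differ between Django versions, and `escape_js_string()` is a hypothetical name. Like `escapejs`, it is only meant for values placed inside a quoted JavaScript string.

```python
# Characters that could end the string literal, the <script> element, or the line
_JS_ESCAPES = {
    ord('\\'): '\\u005C',
    ord("'"): '\\u0027',
    ord('"'): '\\u0022',
    ord('>'): '\\u003E',
    ord('<'): '\\u003C',
    ord('&'): '\\u0026',
    ord('='): '\\u003D',
    ord('-'): '\\u002D',
    ord(';'): '\\u003B',
    ord('`'): '\\u0060',
    ord('\u2028'): '\\u2028',  # JS line separator
    ord('\u2029'): '\\u2029',  # JS paragraph separator
}
# ASCII control characters (including \n and \r) are escaped as well
_JS_ESCAPES.update((c, '\\u%04X' % c) for c in range(32))


def escape_js_string(value):
    """Escape value for use inside a quoted JavaScript string literal (sketch)."""
    return str(value).translate(_JS_ESCAPES)


print("var searchTerm = '" + escape_js_string("'; alert(1); //") + "';")
# var searchTerm = '\u0027\u003B alert(1)\u003B //';
```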

URL Context

# SECURE - URL encoding

from urllib.parse import quote

def build_search_url(query):
    # safe='' also encodes '/', which quote() leaves alone by default
    encoded_query = quote(query, safe='')
    return f'/search?q={encoded_query}'

# Django template filter:
# <a href="/search?q={{ query|urlencode }}">Search</a>

Why this works: URL encoding (percent-encoding) is necessary when embedding user data in URLs to prevent injection of additional query parameters or other URL manipulation. Python's quote() function from urllib.parse percent-encodes special characters such as &, =, ?, #, and spaces into %XX form, where XX is the hexadecimal byte value, so the input is treated as data rather than URL syntax. For example, &admin=true becomes %26admin%3Dtrue and cannot be read as an extra query parameter. quote() leaves / unencoded by default, so pass safe='' when encoding a single parameter value; urlencode() builds a whole query string from a dict. Django's urlencode template filter provides the same encoding in templates. Percent-encoding protects a value placed inside a URL; it does not make a fully user-supplied URL safe in an href, where you must also check that the scheme is http or https to reject javascript: URLs.
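Two separate checks are involved: encoding a value that goes into a URL, and vetting a URL the user supplied in full. A minimal standard-library sketch of both follows; `build_search_url()` and `is_safe_link()` are hypothetical helpers, and the http/https allowlist is an assumption you may need to widen (for example to `mailto`).

```python
from urllib.parse import urlencode, urlsplit


def build_search_url(query):
    # urlencode() percent-encodes the value; spaces become '+'
    return '/search?' + urlencode({'q': query})


SAFE_SCHEMES = {'http', 'https'}  # assumption: widen if you need mailto: etc.


def is_safe_link(url):
    """Accept a fully user-supplied URL only if its scheme is allowlisted."""
    try:
        scheme = urlsplit(url.strip()).scheme.lower()
    except ValueError:
        return False  # unparseable input is rejected
    # Relative URLs have no scheme and are rejected too, to keep the rule simple
    return scheme in SAFE_SCHEMES


print(build_search_url('a&admin=true'))     # /search?q=a%26admin%3Dtrue
print(is_safe_link('javascript:alert(1)'))  # False
```

The encoded link still goes through normal HTML attribute escaping when it is written into a template.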

JSON Responses

# SECURE - JSON responses are served as application/json

# Django:
from django.http import JsonResponse
from django.shortcuts import get_object_or_404

def api_user(request, user_id):
    user = get_object_or_404(User, id=user_id)  # User is the app's own model
    return JsonResponse({
        'name': user.name,  # Serialized as a JSON string value
        'bio': user.bio
    })

# Flask:
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/api/user/<int:user_id>')
def api_user(user_id):
    user = get_user(user_id)  # get_user() is the app's own lookup helper
    return jsonify({
        'name': user.name,
        'bio': user.bio
    })

Why this works: Both JsonResponse() in Django and jsonify() in Flask serialize Python objects to JSON and set the Content-Type: application/json header. JSON encoding escapes quotes, backslashes, and control characters so each value stays a well-formed string, but it does not turn < or > into entities. The protection comes from the content type: a browser told that the response is application/json does not parse it as HTML, so <script>alert(1)</script> inside a JSON string is never executed when the response is loaded. Send X-Content-Type-Options: nosniff as well so browsers do not override the declared type. The response is only half the story: client-side code that inserts these values into the page must use textContent or a framework binding rather than innerHTML, or the XSS simply moves into the browser.
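The point that the media type, not the JSON encoding, does the protective work is easiest to see without a framework. This sketch is a hypothetical standard-library WSGI endpoint, `json_echo_app`, that echoes a query parameter back as JSON; note that `json.dumps()` leaves `<script>` unescaped in the body.

```python
import json
from urllib.parse import parse_qs


def json_echo_app(environ, start_response):
    """Minimal WSGI app: echoes ?q= back as JSON (hypothetical endpoint)."""
    query = parse_qs(environ.get('QUERY_STRING', '')).get('q', [''])[0]
    body = json.dumps({'query': query}).encode('utf-8')
    start_response('200 OK', [
        # The media type, not the JSON encoding, is what stops HTML rendering
        ('Content-Type', 'application/json'),
        # Ask browsers not to second-guess the declared type
        ('X-Content-Type-Options', 'nosniff'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

It can be served locally with `wsgiref.simple_server.make_server('', 8000, json_echo_app).serve_forever()` to inspect the headers.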

html.escape() - Standard Library HTML Encoding

# SECURE - Built-in HTML entity encoding (Python 3.2+)
import html
from flask import request

user_input = request.args.get('name', '')
safe_output = html.escape(user_input)
print(f"<h1>Hello, {safe_output}</h1>")

# quote=True is the default, so " and ' are already escaped for attribute values
safe_attr = html.escape(user_input, quote=True)
print(f'<input value="{safe_attr}">')

# Encodes: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &#x27;

Why this works: html.escape() is Python's built-in function for HTML encoding, available in the standard library since Python 3.2. It converts &, <, and > into entities, and with quote=True (the default) it also converts " and ', so the same call is safe for HTML text and for quoted attribute values. Never pass quote=False for output that ends up inside an attribute, and always quote attribute values, because escaping cannot protect an unquoted attribute. Being part of the standard library, it needs no extra dependencies, which makes it a good choice when you build HTML in Python code outside a template engine.
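The difference the `quote` parameter makes is easy to demonstrate with an attribute-injection payload. The payload and variable names below are illustrative.

```python
import html

payload = '" autofocus onfocus="alert(1)'

# quote=False leaves " alone, so the payload closes value= and adds a handler
unsafe = f'<input value="{html.escape(payload, quote=False)}">'
print(unsafe)  # <input value="" autofocus onfocus="alert(1)">

# The default quote=True encodes " as &quot;, keeping everything inside value=
safe = f'<input value="{html.escape(payload)}">'
print(safe)  # <input value="&quot; autofocus onfocus=&quot;alert(1)">
```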

markupsafe.escape() - Framework Escaping

# SECURE - MarkupSafe library (used by Flask/Jinja2)
# In Flask/Jinja applications, import escape from markupsafe, not from flask
from flask import request
from markupsafe import escape

user_comment = request.form.get('comment', '')
safe_comment = escape(user_comment)
html_output = f"<p>{safe_comment}</p>"

Why this works: markupsafe.escape() is the escaping engine underlying Flask and Jinja templating, providing the same kind of HTML encoding as html.escape() but returning a Markup object that tracks whether content is already safe. This prevents double-escaping when combining escaped strings with template rendering. Import it from markupsafe directly so the example works with current Flask versions. The Markup type system enables safer composition of HTML fragments in template contexts.
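The composition behaviour can be sketched with a small `str` subclass. `SafeHTML` and `escape()` here are hypothetical stand-ins, not MarkupSafe's code: concatenating or formatting a safe string escapes the untrusted operand, while values that implement `__html__()` pass through. The real `markupsafe.Markup` covers more operations, such as `%` formatting and `join()`.

```python
import html


class SafeHTML(str):
    """Hypothetical stand-in for markupsafe.Markup: text already known to be safe."""

    def __html__(self):
        return str(self)

    def __add__(self, other):
        # Concatenation escapes the other operand unless it is already safe
        return SafeHTML(str(self) + escape(other))

    def format(self, *args, **kwargs):
        # The format string is trusted; every interpolated value is escaped
        return SafeHTML(str.format(
            str(self),
            *(escape(a) for a in args),
            **{k: escape(v) for k, v in kwargs.items()}
        ))


def escape(value):
    # Values that implement __html__ are trusted as-is; everything else is escaped
    if hasattr(value, '__html__'):
        return SafeHTML(value.__html__())
    return SafeHTML(html.escape(str(value)))


print(SafeHTML('<b>') + '<i>')                 # <b>&lt;i&gt;
print(SafeHTML('<p>{}</p>').format('<x>'))     # <p>&lt;x&gt;</p>
```

A plain `str` on the left (`'x' + SafeHTML(...)`) or an f-string drops the marker, which is one reason the real class implements more methods.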

Template Auto-Escaping (Django/Flask/Jinja2)

# SECURE - Django templates auto-escape by default
from django.shortcuts import render

def view(request):
    return render(request, 'template.html', {
        'user_input': request.GET.get('data')  # Auto-escaped in template
    })

# Template: {{ user_input }}  <!-- Automatically HTML-escaped -->

# SECURE - Flask/Jinja2 auto-escapes .html templates
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/page')
def page():
    return render_template('page.html',
                           data=request.args.get('input'))  # Auto-escaped

Why this works: Django templates and Flask's default Jinja integration treat normal {{ user_input }} output as text in HTML templates. The framework HTML-encodes variable output before sending it to the browser, while dangerous raw-output paths require explicit opt-out with |safe, mark_safe(), Markup(), or {% autoescape false %}. This separation of logic and presentation reduces accidental XSS in HTML body and attribute contexts. JavaScript, CSS, URL, and JSON-in-HTML contexts still require context-specific helpers.
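Why HTML escaping is not enough for JavaScript contexts can be shown in a few lines. Here `html.unescape()` stands in for what a browser does with an event-handler attribute value before running it as JavaScript; the payload and the `search()` call are illustrative.

```python
import html

payload = "'); alert(1); ('"

# HTML auto-escaping turns ' into an entity...
escaped = html.escape(payload)  # "&#x27;); alert(1); (&#x27;"
button = f'<button onclick="search(\'{escaped}\')">Go</button>'

# ...but the browser decodes entities in attribute values before the
# JavaScript engine sees the code, so what actually runs is:
executed = html.unescape(f"search('{escaped}')")
print(executed)  # search(''); alert(1); ('')
```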

JSON Data in HTML Templates

# SECURE - Django: store JSON as inert application/json data
from django.shortcuts import render

def search_page(request):
    return render(request, 'search.html', {
        'search_data': {'query': request.GET.get('search', '')}
    })

# Template:
# {{ search_data|json_script:"search-data" }}
# <script src="/static/search.js"></script>

# JavaScript:
# const data = JSON.parse(document.getElementById('search-data').textContent);

# SECURE - Flask/Jinja: use tojson in a script block
# <script>
#   const searchData = {{ search_data|tojson }};
# </script>

# Never concatenate into JavaScript strings:
# var x = '{user_data}';  // Can break out with '
# var x = {json.dumps(user_data)};  // Not enough for inline HTML by itself

Why this works: Plain json.dumps() creates valid JSON, but it does not by itself make the result safe to paste into an inline <script> block because sequences such as </script> can still terminate the element. Django's json_script escapes <, >, and & and stores the data in an inert application/json script tag that application JavaScript reads with textContent. Jinja's tojson filter serializes and marks JSON safe for HTML documents and <script> tags. Prefer these framework helpers over hand-built inline JavaScript.
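The `json_script` approach can be approximated with the standard library: serialize with `json.dumps()`, replace `<`, `>` and `&` with JSON `\uXXXX` escapes, and wrap the result in an inert `application/json` script element. `json_script_sketch()` is a hypothetical name and the escape table is modeled on the behaviour described above; in a Django project use the real filter.

```python
import html
import json

# Replace the characters that could end the <script> element or start markup
_JSON_SCRIPT_ESCAPES = {ord('<'): '\\u003C', ord('>'): '\\u003E', ord('&'): '\\u0026'}


def json_script_sketch(value, element_id):
    """Hypothetical approximation of Django's json_script filter."""
    payload = json.dumps(value).translate(_JSON_SCRIPT_ESCAPES)
    return '<script id="{}" type="application/json">{}</script>'.format(
        html.escape(element_id), payload)


print(json_script_sketch({'query': '</script><script>alert(1)</script>'}, 'search-data'))
```

Because `\u003C` is an ordinary JSON escape, `JSON.parse()` in the browser (or `json.loads()` in Python) restores the original `<` characters.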

flask.jsonify() - JSON API Responses

# SECURE - Flask JSON responses (sets proper Content-Type)
from flask import jsonify, request

@app.route('/api/user')
def get_user():
    user_input = request.args.get('query')
    return jsonify({
        'query': user_input,  # Automatically JSON-encoded
        'results': get_results(user_input)
    })
    # Content-Type: application/json (prevents HTML interpretation)

# Can also use jsonify() directly for single values
@app.route('/api/echo')
def echo():
    return jsonify(request.args.get('text'))

Why this works: Flask's jsonify() function does two things that matter for security: (1) it serializes Python objects to JSON (through Flask's JSON provider, which uses json.dumps() by default), and (2) it sets the Content-Type: application/json header. The JSON encoding keeps each value a well-formed string. The content type header prevents browsers from interpreting the response as HTML, blocking content-type confusion attacks where an attacker tricks the browser into rendering JSON as an HTML page. Even if user input contains <script> tags, they stay inside a JSON string and the browser never parses them as HTML; the remaining risk is client-side code that later inserts the data with innerHTML.

bleach.clean() - Rich HTML Sanitization (When HTML Input Needed)

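# SECURE - Allow safe HTML tags while neutralizing XSS
import bleach
import markdown
from flask import request

user_html = request.form.get('content', '')
clean_html = bleach.clean(
    user_html,
    tags=['p', 'b', 'i', 'a', 'ul', 'ol', 'li', 'strong', 'em'],
    attributes={'a': ['href', 'title']},
    protocols=['http', 'https', 'mailto']
)

# Disallowed markup is escaped (or removed with strip=True):
# <script>, onclick=, javascript: links, etc.

# For Markdown-to-HTML, sanitize after conversion. bleach.ALLOWED_TAGS has no
# <p> or headings, so list the tags your Markdown output actually needs.
MARKDOWN_TAGS = ['p', 'h1', 'h2', 'h3', 'pre', 'code', 'blockquote',
                 'ul', 'ol', 'li', 'strong', 'em', 'a']
html_content = markdown.markdown(user_markdown)
safe_content = bleach.clean(html_content, tags=MARKDOWN_TAGS,
                            attributes={'a': ['href', 'title']})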

Why this works: When you need to allow rich HTML content (such as output from a WYSIWYG editor or a Markdown converter), plain HTML encoding would destroy the formatting. Bleach is an allowlist-based HTML sanitizer: it parses the user's HTML, neutralizes elements (like <script> and <iframe>) and attributes (like onclick and onerror) that are not on the allowlist, and serializes clean HTML containing only approved tags. By default disallowed tags are escaped so they show up as text; pass strip=True to remove them instead. The protocols parameter restricts link schemes, which blocks javascript: URLs in href. An allowlist is safer than a denylist because anything you did not explicitly permit, including tags or attributes you have never heard of, is blocked by default. Note that Bleach was deprecated by its maintainers in 2023; for new code, nh3 (Python bindings to the Rust ammonia sanitizer) is a commonly recommended alternative with a similar allowlist model.
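To make the allowlist idea concrete, here is a teaching sketch built on the standard library's `html.parser`. It is not a substitute for a maintained sanitizer: it does not balance unclosed tags, it drops unknown tags while keeping their text, and it drops relative links rather than trying to classify them. `AllowlistSanitizer`, `sanitize()` and the constants are hypothetical names chosen to mirror the `bleach.clean()` call above.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists mirroring the bleach.clean() example
ALLOWED_TAGS = {'p', 'b', 'i', 'a', 'ul', 'ol', 'li', 'strong', 'em'}
ALLOWED_ATTRIBUTES = {'a': {'href', 'title'}}
ALLOWED_PROTOCOLS = {'http', 'https', 'mailto'}
DROP_WITH_CONTENT = {'script', 'style'}  # remove these tags and their text


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities decoded before we see them
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped; their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRIBUTES.get(tag, set()):
                continue
            if name == 'href':
                try:
                    scheme = urlsplit(value.strip()).scheme.lower()
                except ValueError:
                    continue  # unparseable URL: drop the attribute
                if scheme not in ALLOWED_PROTOCOLS:
                    continue  # blocks javascript: and relative links alike
            kept.append(f' {name}="{html.escape(value)}"')
        self.out.append(f'<{tag}{"".join(kept)}>')

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data))  # text is re-escaped on output


def sanitize(dirty_html):
    parser = AllowlistSanitizer()
    parser.feed(dirty_html)
    parser.close()
    return ''.join(parser.out)
```

Comments and declarations are silently discarded because the parser's default handlers for them do nothing.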

Framework-Specific Guidance

Django

# SECURE - Default behavior is safe

from django.shortcuts import render

def comment_view(request):
    author = request.GET.get('author', '')
    text = request.GET.get('text', '')

    # Template auto-escapes these
    return render(request, 'comment.html', {
        'author': author,
        'text': text
    })

# Template (comment.html):
# <div class="comment">
#     <strong>{{ author }}</strong>: {{ text }}
# </div>

# For building HTML in Python code:

from django.utils.html import format_html

def build_message(username, msg):
    return format_html(
        '<div class="msg"><b>{}</b>: {}</div>',
        username,
        msg
    )  # format_html auto-escapes arguments

Django Settings:

# settings.py - Ensure templates auto-escape

TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'autoescape': True,  # Default, don't disable!
    },
}]

Flask / Jinja2

# SECURE - Jinja2 auto-escapes .html templates

from flask import Flask, render_template, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/profile/<username>')
def profile(username):
    bio = request.args.get('bio', '')
    # Auto-escaped in template
    return render_template('profile.html',
                           username=username,
                           bio=bio)

# profile.html:
# <h1>{{ username }}'s Profile</h1>
# <p>{{ bio }}</p>

# For manual HTML building:

@app.route('/message')
def message():
    text = request.args.get('text', '')
    escaped_text = escape(text)
    return f'<div>{escaped_text}</div>'

FastAPI

# SECURE - Jinja2 templates with FastAPI

from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/profile/{user_id}")
async def profile(request: Request, user_id: int, bio: str = ""):
    return templates.TemplateResponse("profile.html", {
        "request": request,
        "user_id": user_id,
        "bio": bio  # Auto-escaped
    })

# JSON responses are automatically safe:

@app.get("/api/user/{user_id}")
async def get_user(user_id: int):
    return {"name": "John", "bio": "<script>alert('xss')</script>"}
    # Sent as Content-Type: application/json, so browsers do not render it as HTML

Rich HTML Sanitization

When you need to allow safe HTML (e.g., WYSIWYG editor):

# Use bleach library for HTML sanitization

import bleach

ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}

def sanitize_html(dirty_html):
    clean = bleach.clean(
        dirty_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )
    return clean

# Django view:

from django.shortcuts import redirect

def save_article(request):
    content = request.POST.get('content', '')
    sanitized = sanitize_html(content)

    # Article is the app's own model
    article = Article.objects.create(
        title=request.POST.get('title'),
        content=sanitized
    )
    return redirect('article_detail', pk=article.pk)

# Template (use |safe only on content that was sanitized before it was stored):
# <div class="article-content">
#     {{ article.content|safe }}
# </div>

Installation:

pip install bleach

Input Validation (Defense in Depth)

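# Django forms with validation

from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator
from django.shortcuts import redirect

def reject_script_tags(value):
    # Django validators must raise ValidationError; a returned False is ignored
    if '<script' in value.lower():
        raise ValidationError('Script tags are not allowed')

class CommentForm(forms.Form):
    author = forms.CharField(
        max_length=100,
        required=True,
        validators=[
            RegexValidator(
                regex=r'^[a-zA-Z0-9\s]+$',
                message='Only alphanumeric characters allowed'
            )
        ]
    )

    text = forms.CharField(
        max_length=1000,
        widget=forms.Textarea,
        validators=[reject_script_tags]  # Easy to bypass; output escaping still required
    )

# View:

def post_comment(request):
    form = CommentForm(request.POST)
    if form.is_valid():
        # Even with validation, template still auto-escapes
        comment = form.cleaned_data['text']
        Comment.objects.create(text=comment)  # Comment is the app's own model
    return redirect('comments')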
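A keyword denylist like a `<script` check is easy to bypass, which is why the template still has to escape output. The payloads below are illustrative; `naive_check()` reimplements that kind of check as a hypothetical helper.

```python
def naive_check(value):
    """Hypothetical denylist validator: True means the value is accepted."""
    return '<script' not in value.lower()


# Each of these can run script in a browser, yet none contains "<script"
BYPASSES = [
    '<img src=x onerror=alert(1)>',
    '<svg onload=alert(1)>',
    '<a href="javascript:alert(1)">click</a>',
]

for payload in BYPASSES:
    print(naive_check(payload), payload)  # prints True for every payload
```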

Content Security Policy

# Django middleware for CSP

from django.utils.deprecation import MiddlewareMixin

class SecurityHeadersMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        response['Content-Security-Policy'] = (
            "default-src 'self'; "
            "script-src 'self' https://trusted-cdn.com; "
            "style-src 'self' 'unsafe-inline'; "
            "img-src 'self' data: https:; "
            "frame-ancestors 'none';"
        )
        response['X-Content-Type-Options'] = 'nosniff'
        response['X-Frame-Options'] = 'DENY'
        response['X-XSS-Protection'] = '0'  # Disable the legacy browser XSS filter; CSP replaces it
        return response

# settings.py

MIDDLEWARE = [
    'myapp.middleware.SecurityHeadersMiddleware',
    # ... other middleware
]

# Flask:

from flask import Flask

app = Flask(__name__)

@app.after_request
def set_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    return response

Security Checklist

Manually verify:

Django templates use {{ }}, not {{ value|safe }}, for user data
Flask/Jinja2 templates use {{ }}, not {{ value|safe }} or {% autoescape false %}
No mark_safe() or Markup() on user input
No templates rendered with auto-escaping disabled (Django autoescape option, Jinja environments without autoescape)
JSON API responses use JsonResponse or jsonify() with Content-Type: application/json
JSON embedded in HTML uses Django json_script or Jinja tojson, not raw json.dumps() pasted into <script>
Manually constructed HTML uses html.escape(), markupsafe.escape(), or Django's escape()/format_html()
All input sources identified (Flask: request.args, request.form, request.cookies, headers; Django: request.GET, request.POST, request.COOKIES, request.META; stored data)
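A grep-style pass can shortlist the lines that deserve the manual review above. `find_risky_lines()` and its patterns are a hypothetical helper, not an existing tool; it only finds escape hatches (`|safe`, `mark_safe()`, `Markup()`, auto-escaping switched off) and cannot detect context errors such as HTML-escaped data inside a `<script>` block.

```python
import re

# Patterns that bypass auto-escaping and deserve a manual look
RISKY_PATTERNS = {
    'safe filter': re.compile(r'\|\s*safe\b'),
    'mark_safe()': re.compile(r'\bmark_safe\s*\('),
    'Markup()': re.compile(r'\bMarkup\s*\('),
    'autoescape off': re.compile(r'\{%\s*autoescape\s+(false|off)\s*%\}'),
}


def find_risky_lines(text):
    """Return (line_number, label, line) for each line matching a risky pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits
```

Feeding it the contents of each template and view file (for example via `pathlib.Path.rglob()`) gives a starting list, not a verdict.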