CWE-79: Cross-Site Scripting (XSS) - Python

Overview

XSS occurs when untrusted data is included in web output without proper encoding. Python web frameworks like Django and Flask provide built-in protection, but you must use them correctly.

Primary Defence: Use framework auto-escaping (Django templates with {{ }}, Flask/Jinja2 templates for .html files) for automatic HTML encoding, or html.escape() for manual output encoding. For rich HTML content, use bleach.clean() with allowlist-based sanitization.
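
As a minimal sketch of the manual option mentioned above, the standard library's html.escape() can encode a value before it is placed in markup. The render_greeting helper and the payload value are illustrative names, not part of any framework.

```python
import html


def render_greeting(name: str) -> str:
    """Build a small HTML fragment, encoding the untrusted value first."""
    # quote=True (the default) also encodes " and ', which matters when
    # the value ends up inside a quoted attribute.
    return f"<h1>Hello, {html.escape(name, quote=True)}</h1>"


payload = "<script>alert('xss')</script>"
print(render_greeting(payload))
# <h1>Hello, &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</h1>
```

The browser displays the payload as text because no literal < or > from the input survives encoding.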

Common Vulnerable Patterns

Django mark_safe() Misuse

# VULNERABLE - Marking user input as safe

from django.utils.safestring import mark_safe

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    safe_bio = mark_safe(user_bio)  # DANGEROUS!
    return render(request, 'profile.html', {'bio': safe_bio})

Why this is vulnerable: mark_safe() tells Django to skip HTML escaping, so malicious input like <script>alert(document.cookie)</script> renders as executable JavaScript instead of safe text, bypassing Django's automatic XSS protection.

Flask Without Auto-Escaping

# VULNERABLE - Disabling auto-escape

from flask import Flask, request
from markupsafe import Markup  # flask.Markup is no longer available in recent Flask releases

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return f'<div>{Markup(comment)}</div>'  # DANGEROUS!

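Why this is vulnerable: Markup() marks a string as already-safe HTML, so nothing encodes it afterwards. The response here is built with an f-string, so Jinja2's auto-escaping is never involved in the first place, and input like <img src=x onerror=alert(1)> reaches the browser as live markup instead of being encoded to safe text entities.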

Manual HTML Construction

# VULNERABLE - String concatenation

from flask import Flask, request

app = Flask(__name__)

@app.route('/greeting')
def greet():
    name = request.args.get('name', 'Guest')
    html = '<h1>Hello, ' + name + '</h1>'
    return html  # No escaping!

Why this is vulnerable: Returning raw HTML strings bypasses Jinja2's auto-escaping entirely, so malicious input like <script>alert(1)</script> or <img src=x onerror=alert(1)> executes directly when rendered by the browser.

JavaScript Context Without Escaping

# VULNERABLE - Injecting into JavaScript

from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    return render(request, 'search.html', {'query': query})

# Template (search.html):
# <script>
#     var searchTerm = '{{ query }}';  // HTML escaping is the wrong encoding here
# </script>

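Why this is vulnerable: Auto-escaping applies HTML entity encoding, which is the wrong encoding for a JavaScript string literal. Inside a <script> block the browser does not decode entities, so the value is corrupted (a quote shows up as &#x27;), and characters that matter to JavaScript, such as backslashes and newlines, pass through unchanged; a trailing \ escapes the closing quote and breaks the statement. Inside an inline event handler attribute (onclick="...") entities are decoded before the script runs, so a payload like '; alert(1); // closes the string and executes. With auto-escaping turned off, the same payload works in the <script> block too.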
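
The mismatch between HTML encoding and a JavaScript string context can be sketched with the standard library alone. Here html.escape() stands in for template auto-escaping (an assumption; the frameworks encode a similar character set), and render_script_naively is a hypothetical helper.

```python
import html


def render_script_naively(value: str) -> str:
    """Mimic a template that HTML-escapes a value inside a JS string literal."""
    return "var searchTerm = '" + html.escape(value) + "';"


# A trailing backslash survives HTML escaping and escapes the closing quote.
print(render_script_naively("abc\\"))
# var searchTerm = 'abc\';

# A raw newline also survives, which is a syntax error inside a quoted JS string.
print(repr(render_script_naively("line1\nline2")))
```

Neither character is significant to HTML, so an HTML encoder has no reason to touch it; a JavaScript-aware encoder does.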

Secure Patterns

Django Auto-Escaping (Default)

# SECURE - Django templates auto-escape by default

from django.shortcuts import render

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    # Django automatically HTML-escapes user_bio in template
    return render(request, 'profile.html', {'bio': user_bio})

# Template (profile.html):

# <div class="bio">

#     {{ bio }}  <!-- Automatically escaped -->

# </div>

Why this works: Django's template engine HTML-encodes every variable rendered with the {{ }} syntax by default. When you render {{ bio }}, Django converts the five HTML-significant characters (<, >, &, ", ') into their entity equivalents (&lt;, &gt;, &amp;, &quot;, &#x27;) before the output reaches the browser. This happens at render time, after your view passes data to the template and before the HTTP response is generated. Auto-escaping is on by default (the autoescape option of the DjangoTemplates backend defaults to True) and applies to all templates unless explicitly disabled. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly use the |safe filter, mark_safe(), or {% autoescape off %} to bypass escaping, which makes dangerous usages easy to spot in code reviews. The escaping is not context-aware, however. It is the right encoding for HTML element content and quoted attribute values, but JavaScript contexts still need the escapejs filter (or json_script) and URL components need urlencode.

Flask/Jinja2 Auto-Escaping

# SECURE - Flask enables auto-escaping for .html templates
from flask import Flask, render_template, request
app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return render_template('comment.html', comment=comment)

# Template (comment.html):
# <div class="comment">
#     {{ comment }}  <!-- Jinja2 auto-escapes -->
# </div>

Why this works: Flask uses Jinja2 as its template engine and enables HTML auto-escaping for templates whose names end in .html, .htm, .xml, or .xhtml. When you render {{ comment }}, Jinja2 applies HTML entity encoding (converting <, >, &, ", and ' to &lt;, &gt;, &amp;, &#34;, and &#39;) at render time, after render_template() passes data to the template and before the HTTP response is sent. The file extension controls the behaviour: .html templates are escaped by default, while .txt templates are not. To output trusted HTML unescaped, use the |safe filter or wrap the content in Markup(), but only for content that has already been sanitized. The behaviour can be changed through app.jinja_env.autoescape, but the secure default should rarely be touched. As with Django, this protects HTML contexts only; JavaScript, CSS, and URL contexts need their own encoding.

Explicit Escaping with MarkupSafe

# SECURE - Explicit HTML escaping

from markupsafe import escape

def build_html(user_input):
    escaped = escape(user_input)
    return f'<div>{escaped}</div>'

# Example:
# user_input = '<script>alert("xss")</script>'
# result = '<div>&lt;script&gt;alert(&#34;xss&#34;)&lt;/script&gt;</div>'

Why this works: MarkupSafe is the library behind Jinja2's auto-escaping, and its escape() function provides the same HTML encoding outside templates. escape(user_input) converts the HTML-significant characters (<, >, &, ", ') into entities and returns a Markup object, which Jinja2 treats as already safe, so the value is not escaped a second time if it is later rendered in a template. Note that formatting the result into an f-string, as build_html() does, produces a plain str again; wrap the finished fragment in Markup(), or build it with Markup('<div>{}</div>').format(user_input), if it must stay marked as safe. Use escape() when building HTML in Python code, especially when combining user input with HTML fragments. It is preferable to hand-written string replacement, which tends to miss a character or apply the replacements in the wrong order.
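
The idea of tracking which strings are already encoded can be illustrated with a toy class. This is a sketch of the concept only, not MarkupSafe's implementation; SafeString and escape_once are hypothetical names, and the only real convention borrowed is the __html__ method that marks a value as safe markup.

```python
import html


class SafeString(str):
    """Toy marker type for text that has already been HTML-encoded."""

    def __html__(self) -> str:
        return self


def escape_once(value) -> SafeString:
    """Encode plain strings; pass through values already marked as safe."""
    if hasattr(value, "__html__"):
        return SafeString(value.__html__())
    return SafeString(html.escape(str(value)))


first = escape_once("<b>Tom & Jerry</b>")
second = escape_once(first)          # already marked safe, left unchanged
print(first)                         # &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
print(second == first)               # True
print(html.escape(first) == first)   # False: untracked escaping double-encodes
```

Without the marker, a second pass turns &lt; into &amp;lt;, which is the double-escaping problem the paragraph above describes.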

Context-Specific Encoding

HTML Context

# SECURE - HTML body content

from django.http import HttpResponse
from django.utils.html import escape

def display_message(request):
    msg = request.GET.get('msg', '')
    safe_msg = escape(msg)
    return HttpResponse(f'<p>{safe_msg}</p>')

Why this works: Django's escape() function (from django.utils.html) provides HTML entity encoding for use in Python code when building HTTP responses manually. It converts the standard HTML special characters into entities, preventing XSS when you're not using template rendering. This is the Python-code equivalent of template auto-escaping. Use escape() when constructing HttpResponse objects directly, building error messages, or any scenario where you're bypassing the template layer. The function handles Unicode correctly and returns a string that's safe to embed in HTML. However, template-based rendering with auto-escaping is still preferred over manual HTML construction because it's harder to accidentally forget to encode a value.

JavaScript Context

# SECURE - JavaScript string context

from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    # Pass the raw value and escape exactly once, in the template.
    # Calling escapejs() here as well would double-escape the value.
    return render(request, 'search.html', {'query': query})

# Template (search.html):
# <script>
#     var searchTerm = '{{ query|escapejs }}';
#     console.log(searchTerm);
# </script>
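
Outside a template, one common way to place a value inside a <script> block is to serialize it as JSON and then replace the characters the HTML parser cares about with \u escapes. The sketch below assumes that approach (similar in spirit to Django's json_script filter); js_literal is a hypothetical helper name.

```python
import json

# Harmless in JSON, but significant to the HTML parser inside <script>.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def js_literal(value) -> str:
    """Serialize a value as a JavaScript literal usable inside <script>."""
    # json.dumps handles quotes, backslashes, and control characters;
    # with the default ensure_ascii=True it also escapes non-ASCII characters.
    return json.dumps(value).translate(_SCRIPT_ESCAPES)


payload = "</script><script>alert(1)</script>"
print("var searchTerm = " + js_literal(payload) + ";")
```

The output contains no literal < or >, so the payload cannot close the surrounding script element, and JSON.parse or a plain assignment recovers the original string.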

URL Context

# SECURE - URL encoding

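from urllib.parse import quote

def build_search_url(query):
    # safe='' also encodes '/', which quote() leaves alone by default
    encoded_query = quote(query, safe='')
    return f'/search?q={encoded_query}'

# Django template filter (the "" argument also encodes '/'):
# <a href="/search?q={{ query|urlencode:"" }}">Search</a>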
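
The difference between the standard library's URL-encoding helpers is easy to miss, so here is a short comparison. The sample value and the build_search_url_with_page helper are illustrative.

```python
from urllib.parse import quote, urlencode

value = "a/b&c=d e"

print(quote(value))             # a/b%26c%3Dd%20e   ('/' is left as-is)
print(quote(value, safe=""))    # a%2Fb%26c%3Dd%20e
print(urlencode({"q": value}))  # q=a%2Fb%26c%3Dd+e


def build_search_url_with_page(query: str, page: int = 1) -> str:
    """Let urlencode() handle each key and value of the query string."""
    return "/search?" + urlencode({"q": query, "page": page})


print(build_search_url_with_page(value))
# /search?q=a%2Fb%26c%3Dd+e&page=1
```

For whole query strings, urlencode() is usually the least error-prone choice because it encodes every key and value and joins them with the correct separators.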

JSON Responses

# SECURE - JSON is automatically escaped

from django.http import JsonResponse
from flask import jsonify

# Django:
def api_user(request, user_id):
    user = User.objects.get(id=user_id)
    return JsonResponse({
        'name': user.name,  # Automatically JSON-escaped
        'bio': user.bio
    })

# Flask:
@app.route('/api/user/<int:user_id>')
def api_user(user_id):
    user = get_user(user_id)
    return jsonify({
        'name': user.name,
        'bio': user.bio
    })

Why this works: Both JsonResponse() in Django and jsonify() in Flask serialize Python objects to JSON and set the Content-Type: application/json header. JSON encoding escapes quotes, backslashes, and control characters as the JSON specification requires, but it typically leaves < and > unchanged. The critical security feature is therefore the content type: it tells the browser to treat the response as data rather than HTML, so markup inside the JSON is never parsed or executed. Sending X-Content-Type-Options: nosniff alongside it stops browsers from second-guessing the declared type. If you later insert values from this JSON into a page (for example with innerHTML), you need HTML encoding, or a safe DOM API such as textContent, at that point. JSON encoding is the right tool for API responses but does not replace HTML encoding for web pages.
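
What JSON encoding does and does not escape can be checked with the standard json module, used here as a stand-in for the framework serializers (an assumption; their defaults may differ in detail). The record contents are made up for illustration.

```python
import json

record = {
    "name": 'Eve "the tester"',
    "bio": "<script>alert('xss')</script>\n",
}

body = json.dumps(record)
print(body)
# {"name": "Eve \"the tester\"", "bio": "<script>alert('xss')</script>\n"}

# Quotes and the newline are escaped, but the tag is still there verbatim.
print("<script>" in body)  # True
```

The markup survives serialization intact, which is why the response must be served as application/json and why values need HTML encoding if they are later written into a page.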

Framework-Specific Guidance

Django

# SECURE - Default behavior is safe
from django.shortcuts import render
from django.utils.html import escape, format_html

def comment_view(request):
    author = request.GET.get('author', '')
    text = request.GET.get('text', '')

    # Template auto-escapes these
    return render(request, 'comment.html', {
        'author': author,
        'text': text
    })

# Template (comment.html):
# <div class="comment">
#     <strong>{{ author }}</strong>: {{ text }}
# </div>

# For building HTML in Python code:

from django.utils.html import format_html

def build_message(username, msg):
    return format_html(
        '<div class="msg"><b>{}</b>: {}</div>',
        username,
        msg
    )  # format_html auto-escapes arguments

Django Settings:

# settings.py - Ensure templates auto-escape

TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'autoescape': True,  # Default, don't disable!
    },
}]

Flask / Jinja2

# SECURE - Jinja2 auto-escapes .html templates

from flask import Flask, render_template, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/profile/<username>')
def profile(username):
    bio = request.args.get('bio', '')
    # Auto-escaped in template
    return render_template('profile.html', 
                          username=username, 
                          bio=bio)

# profile.html:
# <h1>{{ username }}'s Profile</h1>
# <p>{{ bio }}</p>

# For manual HTML building:

@app.route('/message')
def message():
    text = request.args.get('text', '')
    escaped_text = escape(text)
    return f'<div>{escaped_text}</div>'

FastAPI

# SECURE - Jinja2 templates with FastAPI

from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/profile/{user_id}")
async def profile(request: Request, user_id: int, bio: str = ""):
    return templates.TemplateResponse("profile.html", {
        "request": request,
        "user_id": user_id,
        "bio": bio  # Auto-escaped
    })

# JSON responses are not rendered as HTML by the browser:

@app.get("/api/user/{user_id}")
async def get_user(user_id: int):
    # FastAPI serializes the dict to JSON and sends it as application/json.
    # JSON encoding leaves < and > as-is; the content type keeps the markup inert.
    return {"name": "John", "bio": "<script>alert('xss')</script>"}

Rich HTML Sanitization

When you need to allow safe HTML (e.g., WYSIWYG editor):

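# Use bleach library for HTML sanitization
# (bleach was marked deprecated by its maintainers in 2023; for new projects,
# consider evaluating an actively maintained allowlist sanitizer)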

import bleach

ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}

def sanitize_html(dirty_html):
    clean = bleach.clean(
        dirty_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )
    return clean

# Django view:

from django.shortcuts import redirect

from .models import Article  # assumed location of the Article model

def save_article(request):
    content = request.POST.get('content', '')
    sanitized = sanitize_html(content)

    article = Article.objects.create(
        title=request.POST.get('title'),
        content=sanitized
    )
    return redirect('article_detail', pk=article.pk)

# Template (only use |safe AFTER sanitization):
# <div class="article-content">
#     {{ article.content|safe }}
# </div>

Installation:

pip install bleach

Input Validation (Defense in Depth)

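# Django forms with validation

from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator

def reject_script_tags(value):
    # Validators signal failure by raising ValidationError; a return value is ignored.
    # This denylist check is a coarse extra layer, not an XSS defence on its own.
    if '<script' in value.lower():
        raise ValidationError('Script tags are not allowed')

class CommentForm(forms.Form):
    author = forms.CharField(
        max_length=100,
        required=True,
        validators=[
            RegexValidator(
                regex=r'^[a-zA-Z0-9\s]+$',
                message='Only letters, digits, and whitespace allowed'
            )
        ]
    )

    text = forms.CharField(
        max_length=1000,
        widget=forms.Textarea,
        validators=[reject_script_tags]
    )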

# View:

def post_comment(request):
    form = CommentForm(request.POST)
    if form.is_valid():
        # Even with validation, template still auto-escapes
        comment = form.cleaned_data['text']
        Comment.objects.create(text=comment)
    return redirect('comments')
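
Outside Django forms, the same allowlist idea can be sketched with the standard re module. The pattern, the length limit, and the helper names below are illustrative assumptions; the point is that validation narrows what is accepted while output encoding still happens at render time.

```python
import html
import re

# Allowlist: letters, digits, and spaces only, 1 to 100 characters.
AUTHOR_RE = re.compile(r"[A-Za-z0-9 ]{1,100}")


def is_valid_author(value: str) -> bool:
    """fullmatch() anchors both ends, so nothing may trail the allowed text."""
    return AUTHOR_RE.fullmatch(value) is not None


def render_author(value: str) -> str:
    """Validate first, then still encode on output (defense in depth)."""
    if not is_valid_author(value):
        raise ValueError("invalid author name")
    return f"<strong>{html.escape(value)}</strong>"


print(is_valid_author("Alice 42"))       # True
print(is_valid_author("<script>"))       # False
print(render_author("Alice 42"))         # <strong>Alice 42</strong>
```

Validation rejects obviously malformed input early, but the html.escape() call remains the actual XSS control.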

Content Security Policy

# Django middleware for CSP

from django.utils.deprecation import MiddlewareMixin

class SecurityHeadersMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        response['Content-Security-Policy'] = (
            "default-src 'self'; "
            "script-src 'self' https://trusted-cdn.com; "
            "style-src 'self' 'unsafe-inline'; "
            "img-src 'self' data: https:; "
            "frame-ancestors 'none';"
        )
        response['X-Content-Type-Options'] = 'nosniff'
        response['X-Frame-Options'] = 'DENY'
        # Legacy XSS auditor header: current guidance is to disable it and rely on CSP
        response['X-XSS-Protection'] = '0'
        return response

# settings.py

MIDDLEWARE = [
    'myapp.middleware.SecurityHeadersMiddleware',
    # ... other middleware
]

# Flask:

from flask import Flask

app = Flask(__name__)

@app.after_request
def set_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    return response
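
Long policies are easier to review when they are kept as data and joined into the header value in one place. The sketch below assumes a hypothetical build_csp helper; the directive values mirror the Django example above.

```python
def build_csp(directives: dict[str, list[str]]) -> str:
    """Join directives into a Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


policy = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://trusted-cdn.com"],
    "frame-ancestors": ["'none'"],
})

print(policy)
# default-src 'self'; script-src 'self' https://trusted-cdn.com; frame-ancestors 'none'
```

The resulting string can be assigned to the Content-Security-Policy header in either the Django middleware or the Flask after_request hook.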

Verification

To verify XSS protection is working:

Test with XSS payloads: Submit common XSS patterns (<script>alert('xss')</script>, <img src=x onerror=alert('xss')>, etc.) and verify they appear encoded in the response
Check HTML source: View the page source to confirm user input is HTML-encoded (< appears as &lt;, > as &gt;)
Test JavaScript context: If data appears in <script> tags, verify quotes and special characters are properly JavaScript-encoded using escapejs filter
Review templates: Search for |safe filter, mark_safe(), Markup(), or {% autoescape off %} and verify they're only used with sanitized content
Check framework settings: Confirm Django's autoescape is True in template settings, or Flask templates use .html extension
Test URL context: Verify user data in URLs is URL-encoded using urlencode filter or quote()
Check Content-Security-Policy: Verify CSP headers are present and properly configured to restrict script sources
Use browser DevTools: Inspect rendered HTML to ensure no unencoded user input appears
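
The payload check in the first step can be automated. The sketch below assumes a render callable that takes untrusted input and returns the response body; in a real project that would be a call through the framework's test client. unsafe_view and safe_view are hypothetical stand-ins.

```python
import html

XSS_PAYLOADS = [
    "<script>alert('xss')</script>",
    "<img src=x onerror=alert('xss')>",
    "\"><svg onload=alert(1)>",
]


def find_unencoded(render, payloads=XSS_PAYLOADS):
    """Return the payloads that come back verbatim from render(payload)."""
    return [p for p in payloads if p in render(p)]


def unsafe_view(value: str) -> str:
    # Hypothetical vulnerable renderer: no encoding at all.
    return f"<p>{value}</p>"


def safe_view(value: str) -> str:
    # Hypothetical fixed renderer: HTML-encodes before output.
    return f"<p>{html.escape(value)}</p>"


print(len(find_unencoded(unsafe_view)))  # 3
print(find_unencoded(safe_view))         # []
```

A check like this only covers HTML body context; JavaScript and URL contexts need payloads and assertions of their own.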