CWE-80: Cross-Site Scripting (XSS) - Python

Overview

XSS occurs when untrusted data is included in web output without proper encoding. Python web frameworks like Django and Flask provide built-in protection, but you must use them correctly.

Primary Defence: Use framework auto-escaping (Django templates, Flask/Jinja2), or explicit encoding with html.escape() for manual HTML construction.

Common Vulnerable Patterns

Django mark_safe() Misuse

# VULNERABLE - Marking user input as safe
from django.shortcuts import render
from django.utils.safestring import mark_safe

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    safe_bio = mark_safe(user_bio)  # DANGEROUS!
    return render(request, 'profile.html', {'bio': safe_bio})

Flask Without Auto-Escaping

# VULNERABLE - Disabling auto-escape
from flask import Flask, request
from markupsafe import Markup  # flask.Markup was removed in Flask 3.0

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return f'<div>{Markup(comment)}</div>'  # DANGEROUS!

Manual HTML Construction

# VULNERABLE - String concatenation
from flask import Flask, request

@app.route('/greeting')
def greet():
    name = request.args.get('name', 'Guest')
    html = '<h1>Hello, ' + name + '</h1>'
    return html  # No escaping!

JavaScript Context Without Escaping

# VULNERABLE - Injecting into JavaScript

def search_view(request):
    query = request.GET.get('q', '')
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#     var searchTerm = '{{ query }}';  // HTML-escaped, not JS-escaped: backslashes and newlines still pass through
# </script>

Secure Patterns

Django Auto-Escaping (Default)

# SECURE - Django templates auto-escape by default

from django.shortcuts import render

def profile_view(request):
    user_bio = request.GET.get('bio', '')
    # Django automatically HTML-escapes user_bio in template
    return render(request, 'profile.html', {'bio': user_bio})

# Template (profile.html):
# <div class="bio">
#     {{ bio }}  <!-- Automatically escaped -->
# </div>

Why this works: Django's template engine HTML-encodes all variable output written with the {{ }} syntax by default. When you render {{ bio }} in a template, Django converts the HTML special characters (<, >, &, ", ') into their entity equivalents (&lt;, &gt;, &amp;, &quot;, &#x27;) before the response is sent to the browser. This happens during the template rendering phase, after your view passes data to the template but before the HTTP response is generated. Auto-escaping is enabled by default (the autoescape option of the DjangoTemplates backend defaults to True) and applies to all templates unless explicitly disabled. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly use the |safe filter or the mark_safe() function to bypass escaping, which makes dangerous usages easy to spot in code reviews. The escaping is HTML-specific. It protects element content and quoted attribute values, but not unquoted attributes, <script> blocks, or URL schemes in href/src, which need the context-specific encoding covered under Context-Specific Encoding.
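The character-to-entity mapping can be reproduced with the standard library. The sketch below uses html.escape() as a stand-in for the template engine's escaping; that substitution is an assumption made for illustration, and the sample bio value is hypothetical.

```python
import html

# Hypothetical untrusted value, as it might arrive in request.GET['bio']
bio = '<img src=x onerror="alert(1)"> Tom & Jerry\'s'

# Stand-in for what {{ bio }} emits when auto-escaping is on
rendered = html.escape(bio)

print(rendered)
```

Every character that could open a tag or close an attribute value comes out as an entity, so the browser shows the text instead of parsing it.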

Flask/Jinja2 Auto-Escaping

# SECURE - Flask enables auto-escaping for .html templates

from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    return render_template('comment.html', comment=comment)

# Template (comment.html):
# <div class="comment">
#     {{ comment }}  <!-- Jinja2 auto-escapes -->
# </div>

Why this works: Flask uses the Jinja2 templating engine and enables auto-escaping for templates with .html, .htm, .xml, and .xhtml extensions. When you use {{ comment }} in such a template, Jinja2 applies HTML entity encoding at render time, converting special characters to entities. Even if malicious content like <script>alert(1)</script> is in the variable, it displays as plain text instead of executing. The escaping is HTML-specific rather than context-aware: values placed inside <script> blocks, URLs, or unquoted attributes still need context-specific encoding. Flask does not enable escaping for plain text templates (.txt), where it would be inappropriate. Developers must explicitly use the |safe filter or Markup() to bypass escaping, making it obvious when raw HTML is being rendered. This matches Django's philosophy of secure-by-default design.
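The extension rule can be pictured as a simple suffix check. The helper below is hypothetical and only encodes the four extensions named in the text; it is not Flask's own function, and Flask's real rule may differ in details.

```python
# Extensions listed in the text; treated here as the whole rule
AUTOESCAPE_EXTENSIONS = ('.html', '.htm', '.xml', '.xhtml')

def should_autoescape(template_name):
    """Return True when a template with this name would be auto-escaped."""
    return template_name.endswith(AUTOESCAPE_EXTENSIONS)

results = {name: should_autoescape(name)
           for name in ('comment.html', 'feed.xml', 'email.txt')}
print(results)
```

The practical consequence is that a .txt template rendered with user data must never be served with an HTML content type, because nothing in it was escaped.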

Explicit Escaping with MarkupSafe

# SECURE - Explicit HTML escaping

from markupsafe import escape

def build_html(user_input):
    escaped = escape(user_input)
    return f'<div>{escaped}</div>'

# Example:
# user_input = '<script>alert("xss")</script>'
# result = '<div>&lt;script&gt;alert(&#34;xss&#34;)&lt;/script&gt;</div>'

Why this works: MarkupSafe is the library that powers Jinja2's auto-escaping, providing the escape() function for manual HTML encoding. When you call escape(user_input), it converts HTML special characters (<, >, &, ", ') to their entity equivalents, preventing browsers from interpreting the content as markup. The function returns a Markup object that Jinja2 recognizes as already safe, preventing double-escaping if the value is later used in a template. This explicit escaping is useful when you need to build HTML strings in Python code rather than templates, or when working with Flask's make_response() or other response builders. Using MarkupSafe keeps manual escaping consistent with template auto-escaping in Flask and Jinja2, since it is the same encoding logic. Django does not use MarkupSafe and ships its own equivalent in django.utils.html.
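The "already safe" marker idea can be illustrated with a toy type. SafeHtml and escape_once below are hypothetical names for this sketch, built on html.escape(); they are not MarkupSafe's implementation, only a minimal picture of why a marker type prevents double-escaping.

```python
import html

class SafeHtml(str):
    """Toy marker type: a str that is known to be escaped already."""

def escape_once(value):
    """Escape plain strings; pass SafeHtml values through untouched."""
    if isinstance(value, SafeHtml):
        return value
    return SafeHtml(html.escape(str(value)))

once = escape_once('<b>Tom & Jerry</b>')
twice = escape_once(once)  # recognised as safe, so not escaped again

# Without a marker type, a second pass mangles the entities
naive = html.escape(html.escape('<b>Tom & Jerry</b>'))

print(once)
print(naive)
```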

Context-Specific Encoding

HTML Context

# SECURE - HTML body content

from django.http import HttpResponse
from django.utils.html import escape

def display_message(request):
    msg = request.GET.get('msg', '')
    safe_msg = escape(msg)
    return HttpResponse(f'<p>{safe_msg}</p>')

Why this works: Django's escape() function provides HTML entity encoding for use in Python code when you're not using templates. It converts <, >, &, " and ' to HTML entities, ensuring user input is displayed as text rather than interpreted as markup. This is essential when building HTTP responses directly with HttpResponse() instead of using the template system. The function gives the same protection as template auto-escaping for element content and quoted attribute values; it does not cover JavaScript or URL contexts.

JavaScript Context

# SECURE - JavaScript string context

from django.shortcuts import render

def search_view(request):
    query = request.GET.get('q', '')
    # Pass the raw value and escape it exactly once, in the template
    return render(request, 'search.html', {'query': query})

# Template:
# <script>
#     var searchTerm = '{{ query|escapejs }}';
#     console.log(searchTerm);
# </script>

Why this works: JavaScript context requires different encoding than HTML. Django's escapejs filter replaces characters that could terminate a JavaScript string literal or the surrounding <script> element (quotes, backslashes, <, >, &, newlines, and other control characters) with \uXXXX Unicode escapes. HTML encoding alone is insufficient: the browser does not decode entities inside <script>, so the data arrives corrupted, and backslashes and line breaks pass through unescaped and can still break the string literal. Escape exactly once; applying escapejs in both the view and the template double-escapes the value. The filter only protects values placed inside a quoted JavaScript string literal. For passing structured data to scripts, Django's json_script filter avoids inline string building altogether. Always use JavaScript-specific encoding when embedding data in <script> tags or event handlers.
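A JavaScript string escaper based on Unicode escapes can be sketched in a few lines. The function name and the exact character set below are assumptions of this sketch, not a copy of Django's table; the round trip through json.loads() is only a convenient way to confirm that the escaped text decodes back to the original.

```python
import json

# Characters replaced with \uXXXX escapes in this sketch (an assumed set
# covering string delimiters and HTML-significant characters)
_RISKY = '\\\'"<>&=-;`\u2028\u2029'

def js_string_escape(value):
    """Escape text for use inside a quoted JavaScript string literal."""
    out = []
    for ch in value:
        if ch in _RISKY or ord(ch) < 32:
            out.append('\\u%04X' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)

payload = "';alert(1);//</script>"
escaped = js_string_escape(payload)

# The same escapes are valid JSON, so decoding recovers the payload
decoded = json.loads('"' + escaped + '"')

print(escaped)
```

No quote, angle bracket, or backslash from the input survives, so the value can neither close the string literal nor end the script element.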

URL Context

# SECURE - URL encoding

from urllib.parse import quote

def build_search_url(query):
    # safe='' also encodes '/', which quote() leaves alone by default
    encoded_query = quote(query, safe='')
    return f'/search?q={encoded_query}'

# Django template filter:
# <a href="/search?q={{ query|urlencode }}">Search</a>

Why this works: URL encoding (percent-encoding) is necessary when embedding user data in URLs to prevent injection of additional query parameters or URL manipulation. Python's quote() function from urllib.parse percent-encodes special characters like &, =, ?, and spaces into %XX format, where XX is the hexadecimal byte value. By default quote() leaves / unencoded; passing safe='' encodes it as well, which is the safer choice for a query parameter value. This ensures user input is treated as data, not URL syntax. For example, &admin=true becomes %26admin%3Dtrue, preventing it from being interpreted as an additional query parameter. Django's urlencode template filter provides equivalent encoding in templates. Always use URL encoding for query parameters and path components that contain user data. Percent-encoding protects a single URL component; it does not make an entire user-supplied URL safe, so check the scheme separately before using untrusted input as a whole href.
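The difference between the encoding helpers in urllib.parse is easiest to see side by side. The sample value is hypothetical.

```python
from urllib.parse import quote, urlencode

value = 'cats/dogs&admin=true'

default_quote = quote(value)            # '/' is left as-is by default
strict_quote = quote(value, safe='')    # every reserved character is encoded
query_string = urlencode({'q': value})  # builds the whole key=value pair

print(default_quote)
print(strict_quote)
print(query_string)
```

For building complete query strings, urlencode() is usually the least error-prone option because it encodes keys and values and joins the pairs itself.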

JSON Responses

# SECURE - JSON is automatically escaped

from django.http import JsonResponse
from flask import jsonify

# Django:

def api_user(request, user_id):
    user = User.objects.get(id=user_id)
    return JsonResponse({
        'name': user.name,  # Automatically JSON-escaped
        'bio': user.bio
    })

# Flask:

@app.route('/api/user/<int:user_id>')
def api_user(user_id):
    user = get_user(user_id)
    return jsonify({
        'name': user.name,
        'bio': user.bio
    })

Why this works: Both JsonResponse() in Django and jsonify() in Flask automatically serialize Python objects to JSON with proper escaping and set the Content-Type: application/json header. JSON encoding handles special characters according to JSON specification, escaping quotes, backslashes, and control characters. The critical security feature is the content type header - it tells browsers to treat the response as JSON data, not HTML, preventing browsers from parsing and executing any embedded script tags. Even if user input contains <script>alert(1)</script>, it's JSON-encoded to a string literal and the browser never interprets it as HTML because of the content type. This makes REST APIs secure by default without manual HTML encoding, as long as the proper content type is set.
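The role of the content type becomes clear when looking at what JSON serialization actually emits. The sketch below uses the standard json module; the headers dictionary is a hypothetical illustration of what a JSON endpoint would send, not framework code.

```python
import json

body = json.dumps({'bio': '<script>alert(1)</script>'})
print(body)

# Hypothetical header set a JSON endpoint would send alongside the body
headers = {
    'Content-Type': 'application/json',
    'X-Content-Type-Options': 'nosniff',
}
```

The angle brackets are still present in the body, so the same bytes served as text/html would execute. The protection comes from the declared content type, not from JSON escaping.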

html.escape() - Standard Library HTML Encoding

# SECURE - Built-in HTML entity encoding (Python 3.2+)
import html

user_input = request.args.get('name', '')  # default avoids passing None to html.escape()
safe_output = html.escape(user_input)      # quote=True is the default
print(f"<h1>Hello, {safe_output}</h1>")

# Quotes must stay escaped in attribute context; never pass quote=False here
safe_attr = html.escape(user_input, quote=True)  # Escapes " and '
print(f'<input value="{safe_attr}">')

# Encodes: & → &amp;  < → &lt;  > → &gt;  " → &quot;  ' → &#x27;

Why this works: html.escape() is Python's built-in function for HTML encoding, available in the standard library since Python 3.2. It converts HTML special characters into their entity equivalents, preventing browsers from interpreting user input as markup. The quote parameter defaults to True, which also encodes double and single quotes. Keep it enabled when outputting to HTML attributes: quote=False leaves quotes intact and allows attribute injection. Being part of the standard library means no external dependencies are needed, making it a natural first choice for HTML encoding outside a template engine. It's simple, fast, and maintained as part of Python itself.
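The effect of the quote parameter can be seen with an attribute-breaking payload. The payload is a hypothetical example.

```python
import html

payload = '" autofocus onfocus="alert(1)'

unsafe_attr = html.escape(payload, quote=False)  # quotes survive
safe_attr = html.escape(payload)                 # quote=True by default

unsafe_tag = f'<input value="{unsafe_attr}">'
safe_tag = f'<input value="{safe_attr}">'

print(unsafe_tag)
print(safe_tag)
```

In the first tag the payload closes the value attribute and adds its own event handler; in the second it stays inside the attribute value as inert text.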

markupsafe.escape() - Framework Escaping

# SECURE - MarkupSafe library (used by Flask/Jinja2)
from markupsafe import escape

user_comment = request.form.get('comment')
safe_comment = escape(user_comment)
html_output = f"<p>{safe_comment}</p>"

# flask.escape was deprecated in Flask 2.3 and removed in Flask 3.0;
# import escape from markupsafe directly, as above
safe_text = escape(user_input)

Why this works: markupsafe.escape() is the escaping engine underlying Flask and Jinja2 templating. It applies HTML entity encoding comparable to html.escape() (the exact entities differ slightly, for example &#34; instead of &quot;) but returns a Markup object that marks the content as already safe. This prevents double-escaping when combining escaped strings with template rendering. Older Flask releases re-exported the function as flask.escape; that alias was deprecated in Flask 2.3 and removed in Flask 3.0, so import from markupsafe instead. The library is widely deployed through Flask and Jinja2, and its Markup type enables safe composition of HTML fragments in template contexts.

Template Auto-Escaping (Django/Flask/Jinja2)

# SECURE - Django templates auto-escape by default
from django.shortcuts import render

def view(request):
    return render(request, 'template.html', {
        'user_input': request.GET.get('data')  # Auto-escaped in template
    })

# Template: {{ user_input }}  <!-- Automatically HTML-escaped -->

# SECURE - Flask/Jinja2 auto-escapes .html templates
from flask import render_template

@app.route('/page')
def page():
    return render_template('page.html', 
        data=request.args.get('input')  # Auto-escaped
    )

Why this works: Both Django and Flask (via Jinja2) use auto-escaping templates that treat all variable output as plain text by default. When you render {{ user_input }} in a template, the framework automatically HTML-encodes it before sending to the browser. This secure-by-default design makes XSS much harder to introduce accidentally: developers must explicitly opt out of escaping (using |safe, or {% autoescape off %} in Django and {% autoescape false %} in Jinja2) to render raw HTML. The template engines parse your template, identify all variable interpolations, and apply HTML encoding to each one. The encoding is the same wherever the variable appears, so values placed in <script> blocks, URLs, or unquoted attributes need context-specific handling. This separation of logic and presentation, combined with automatic encoding, is why modern Python web frameworks are resistant to XSS when used correctly.
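One placement worth checking by hand is an unquoted attribute, because HTML escaping leaves spaces and equals signs untouched. The sketch below uses html.escape() as a stand-in for template escaping (an assumption for illustration) with a hypothetical payload.

```python
import html

payload = 'x onmouseover=alert(1)'
escaped = html.escape(payload)  # unchanged: no <, >, &, " or ' to encode

unquoted = f'<div class={escaped}>hi</div>'   # payload adds its own attribute
quoted = f'<div class="{escaped}">hi</div>'   # payload stays inside the value

print(unquoted)
print(quoted)
```

Quoting every attribute value in templates is enough to close this gap, since quotes in the data are escaped.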

json.dumps() - JSON/JavaScript Context Encoding

# SECURE - Safely embed data in JavaScript
import json

def json_for_script(value):
    # json.dumps() alone leaves <, > and & intact, so "</script>" in the
    # data would end the script element; emit them as \uXXXX escapes instead
    return (json.dumps(value)
            .replace('<', '\\u003c')
            .replace('>', '\\u003e')
            .replace('&', '\\u0026'))

user_data = request.args.get('search', '')
safe_json = json_for_script(user_data)

page = f"""
<script>
    var searchTerm = {safe_json};  // Safe: quotes, backslashes and < > & escaped
    console.log(searchTerm);
</script>
"""

# Never concatenate into JavaScript strings:
# var x = '{user_data}';  // Can break out with '
# var x = {json.dumps(user_data)};  // Still breakable with </script>
# var x = {json_for_script(user_data)};  // Safe

Why this works: JavaScript context requires different encoding than HTML. json.dumps() produces a valid JSON literal with quotes, backslashes, and control characters escaped, so the value cannot break out of the JavaScript string. It does not escape <, >, or &, however, and the HTML parser ends a script element at the first </script> regardless of JavaScript quoting. Replacing those characters with \u003c, \u003e, and \u0026 keeps the JSON valid while closing that gap. HTML entity encoding is not a substitute, because entities are not decoded inside <script>. Django's json_script filter and Jinja2's tojson filter apply the same kind of escaping in templates. The approach works for strings, numbers, arrays, and objects, providing type-safe data transfer to JavaScript.
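A quick check shows how plain json.dumps() output behaves when the data contains a closing script tag, and that adding Unicode escapes for the HTML-significant characters does not change the decoded value. The helper name script_safe_json is an assumption of this sketch.

```python
import json

def script_safe_json(value):
    """Serialize to JSON, then hide <, > and & behind Unicode escapes."""
    return (json.dumps(value)
            .replace('<', '\\u003c')
            .replace('>', '\\u003e')
            .replace('&', '\\u0026'))

payload = '</script><script>alert(1)</script>'

plain = json.dumps(payload)           # still contains a literal </script>
hardened = script_safe_json(payload)  # no angle brackets remain
round_trip = json.loads(hardened)     # decodes back to the original text

print(plain)
print(hardened)
```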

flask.jsonify() - JSON API Responses

# SECURE - Flask JSON responses (sets proper Content-Type)
from flask import jsonify, request

@app.route('/api/user')
def get_user():
    user_input = request.args.get('query')
    return jsonify({
        'query': user_input,  # Automatically JSON-encoded
        'results': get_results(user_input)
    })
    # Content-Type: application/json (prevents HTML interpretation)

# Can also use jsonify() directly for single values
@app.route('/api/echo')
def echo():
    return jsonify(request.args.get('text'))

Why this works: Flask's jsonify() function does two critical things for security: (1) it serializes Python objects to JSON using proper encoding (via json.dumps()), and (2) it sets the Content-Type: application/json header. The JSON encoding ensures special characters are properly escaped. The content type header prevents browsers from interpreting the response as HTML, blocking content-type confusion attacks where an attacker tricks the browser into rendering JSON as an HTML page. This combination makes REST APIs secure by default - even if user input contains <script> tags, they're JSON-encoded and the browser never parses them as HTML because of the content type.

bleach.clean() - Rich HTML Sanitization (When HTML Input Needed)

# SECURE - Allow safe HTML tags while removing XSS
import bleach

user_html = request.form.get('content')
clean_html = bleach.clean(
    user_html,
    tags=['p', 'b', 'i', 'a', 'ul', 'ol', 'li', 'strong', 'em'],
    attributes={'a': ['href', 'title']},
    protocols=['http', 'https', 'mailto']
)

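# Neutralizes dangerous tags/attributes (<script>, onclick=, javascript:, etc.):
# disallowed tags are escaped by default (strip=True removes them instead),
# and disallowed attributes and URL schemes are dropped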

# For Markdown-to-HTML, use bleach after conversion:
import markdown
html_content = markdown.markdown(user_markdown)
safe_content = bleach.clean(html_content, tags=bleach.ALLOWED_TAGS)

Why this works: When you need to allow rich HTML content (like from a WYSIWYG editor or Markdown converter), simple HTML encoding would destroy the formatting. Bleach provides an allowlist-based HTML sanitizer: it parses the user's HTML, escapes or strips disallowed elements (like <script>, <iframe>), drops disallowed attributes (like onclick, onerror), and reconstructs clean HTML with only approved tags. The protocols parameter prevents javascript: URLs in links. The allowlist approach is more secure than denylisting because it blocks unknown attack vectors by default: if a new dangerous tag is discovered, it's already blocked unless you explicitly allowed it. Style attributes are only sanitized when a CSS sanitizer is configured, so leave style out of the allowlist unless you set one up. Note that the Bleach project was marked deprecated by its maintainers in January 2023; for new projects, evaluate a maintained allowlist sanitizer (nh3 is one option) that follows the same model.
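The idea behind a protocol allowlist can be sketched with urllib.parse. This is an illustration of the allowlist principle only, with hypothetical names; it is not how Bleach implements its protocols check and is not a replacement for a real sanitizer.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {'http', 'https', 'mailto'}

def has_allowed_scheme(url):
    """Return True only for absolute URLs whose scheme is allowlisted."""
    # Browsers ignore tabs, newlines and surrounding spaces in URLs,
    # so drop whitespace and control characters before parsing
    cleaned = ''.join(ch for ch in url if ord(ch) > 32)
    try:
        scheme = urlsplit(cleaned).scheme
    except ValueError:
        return False
    return scheme in ALLOWED_SCHEMES

checks = {
    'https://example.com/page': has_allowed_scheme('https://example.com/page'),
    'javascript:alert(1)': has_allowed_scheme('javascript:alert(1)'),
    'JaVa\tScRiPt:alert(1)': has_allowed_scheme('JaVa\tScRiPt:alert(1)'),
}
print(checks)
```

Because the check names what is permitted rather than what is forbidden, schemes such as data: or vbscript: are rejected without ever being listed.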

Framework-Specific Guidance

Django

# SECURE - Default behavior is safe

from django.shortcuts import render
from django.utils.html import escape, format_html

def comment_view(request):
    author = request.GET.get('author', '')
    text = request.GET.get('text', '')

    # Template auto-escapes these
    return render(request, 'comment.html', {
        'author': author,
        'text': text
    })

# Template (comment.html):
# <div class="comment">
#     <strong>{{ author }}</strong>: {{ text }}
# </div>

# For building HTML in Python code:

from django.utils.html import format_html

def build_message(username, msg):
    return format_html(
        '<div class="msg"><b>{}</b>: {}</div>',
        username,
        msg
    )  # format_html auto-escapes arguments
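The escape-then-interpolate pattern behind a helper like format_html can be sketched with the standard library. format_escaped below is a hypothetical name and a simplified stand-in, not Django's implementation: the template string is trusted, and every argument is escaped before it is inserted.

```python
import html

def format_escaped(template, *args, **kwargs):
    """Escape every argument, then interpolate into the trusted template."""
    safe_args = [html.escape(str(a)) for a in args]
    safe_kwargs = {k: html.escape(str(v)) for k, v in kwargs.items()}
    return template.format(*safe_args, **safe_kwargs)

message = format_escaped(
    '<div class="msg"><b>{}</b>: {}</div>',
    'alice<script>',
    'Tom & Jerry',
)
print(message)
```

The key design point is that only the arguments are untrusted. Building the template string itself from user input would defeat the helper.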

Django Settings:

# settings.py - Ensure templates auto-escape

TEMPLATES = [{
    'BACKEND': 'django.template.backends.django.DjangoTemplates',
    'OPTIONS': {
        'autoescape': True,  # Default, don't disable!
    },
}]

Flask / Jinja2

# SECURE - Jinja2 auto-escapes .html templates

from flask import Flask, render_template, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/profile/<username>')
def profile(username):
    bio = request.args.get('bio', '')
    # Auto-escaped in template
    return render_template('profile.html', 
                          username=username, 
                          bio=bio)

# profile.html:
# <h1>{{ username }}'s Profile</h1>
# <p>{{ bio }}</p>

# For manual HTML building:

@app.route('/message')
def message():
    text = request.args.get('text', '')
    escaped_text = escape(text)
    return f'<div>{escaped_text}</div>'

FastAPI

# SECURE - Jinja2 templates with FastAPI

from fastapi import FastAPI, Request
from fastapi.templating import Jinja2Templates

app = FastAPI()
templates = Jinja2Templates(directory="templates")

@app.get("/profile/{user_id}")
async def profile(request: Request, user_id: int, bio: str = ""):
    return templates.TemplateResponse("profile.html", {
        "request": request,
        "user_id": user_id,
        "bio": bio  # Auto-escaped
    })

# JSON responses are automatically safe:

@app.get("/api/user/{user_id}")
async def get_user(user_id: int):
    return {"name": "John", "bio": "<script>alert('xss')</script>"}
    # FastAPI serializes this to JSON and serves it as application/json, so the
    # browser does not render the markup (JSON itself leaves < and > as-is)

Rich HTML Sanitization

When you need to allow safe HTML (e.g., WYSIWYG editor):

# Use bleach library for HTML sanitization

import bleach

ALLOWED_TAGS = ['p', 'br', 'strong', 'em', 'ul', 'ol', 'li', 'a']
ALLOWED_ATTRIBUTES = {'a': ['href', 'title']}

def sanitize_html(dirty_html):
    clean = bleach.clean(
        dirty_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        strip=True
    )
    return clean

# Django view:

from django.shortcuts import redirect

def save_article(request):
    content = request.POST.get('content', '')
    sanitized = sanitize_html(content)

    article = Article.objects.create(
        title=request.POST.get('title'),
        content=sanitized
    )
    return redirect('article_detail', pk=article.pk)

# Template (apply |safe only AFTER sanitization):
# <div class="article-content">
#     {{ article.content|safe }}
# </div>

Installation:

pip install bleach

Input Validation (Defense in Depth)

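# Django forms with validation

from django import forms
from django.core.exceptions import ValidationError
from django.core.validators import RegexValidator

def reject_script_tags(value):
    # Validators must raise ValidationError; a bare True/False return is ignored.
    # This denylist check is only a coarse extra layer, not an XSS defence.
    if '<script' in value.lower():
        raise ValidationError('Script tags are not allowed')

class CommentForm(forms.Form):
    author = forms.CharField(
        max_length=100,
        required=True,
        validators=[
            RegexValidator(
                regex=r'^[a-zA-Z0-9\s]+$',
                message='Only alphanumeric characters allowed'
            )
        ]
    )

    text = forms.CharField(
        max_length=1000,
        widget=forms.Textarea,
        validators=[reject_script_tags]
    )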

# View:

def post_comment(request):
    form = CommentForm(request.POST)
    if form.is_valid():
        # Even with validation, template still auto-escapes
        comment = form.cleaned_data['text']
        Comment.objects.create(text=comment)
    return redirect('comments')
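Outside a forms framework, the same allowlist idea can be sketched with the re module. The pattern, length limit, and function name are assumptions chosen to mirror the author field; output escaping remains the actual XSS defence.

```python
import re

# Letters, digits and spaces only, 1 to 100 characters
AUTHOR_PATTERN = re.compile(r'[A-Za-z0-9 ]{1,100}')

def validate_author(value):
    """Return the value if it matches the allowlist, else raise ValueError."""
    # fullmatch() requires the entire string to match, so a trailing
    # newline or appended markup cannot slip past the pattern
    if not AUTHOR_PATTERN.fullmatch(value):
        raise ValueError('Only alphanumeric characters and spaces allowed')
    return value

print(validate_author('Alice 42'))
```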

Content Security Policy

# Django middleware for CSP

from django.utils.deprecation import MiddlewareMixin

class SecurityHeadersMiddleware(MiddlewareMixin):
    def process_response(self, request, response):
        response['Content-Security-Policy'] = (
            "default-src 'self'; "
            "script-src 'self' https://trusted-cdn.com; "
            "style-src 'self' 'unsafe-inline'; "
            "img-src 'self' data: https:; "
            "frame-ancestors 'none';"
        )
        response['X-Content-Type-Options'] = 'nosniff'
        response['X-Frame-Options'] = 'DENY'
        response['X-XSS-Protection'] = '0'  # Legacy browser XSS filter is deprecated and can itself be abused; rely on CSP
        return response

# settings.py

MIDDLEWARE = [
    'myapp.middleware.SecurityHeadersMiddleware',
    # ... other middleware
]

# Flask:

from flask import Flask

app = Flask(__name__)

@app.after_request
def set_security_headers(response):
    response.headers['Content-Security-Policy'] = (
        "default-src 'self'; "
        "script-src 'self'"
    )
    response.headers['X-Content-Type-Options'] = 'nosniff'
    return response
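A stricter policy can allow inline scripts selectively through a per-response nonce instead of 'unsafe-inline'. The sketch below only assembles the header value and a matching script tag; the build_csp helper and the chosen directives are assumptions, and wiring the nonce into a framework's response and templates is left out.

```python
import secrets

def build_csp(nonce):
    """Assemble a Content-Security-Policy value with a script nonce."""
    directives = {
        'default-src': ["'self'"],
        'script-src': ["'self'", f"'nonce-{nonce}'"],
        'frame-ancestors': ["'none'"],
    }
    return '; '.join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )

nonce = secrets.token_urlsafe(16)  # fresh, unpredictable value per response
header_value = build_csp(nonce)
script_tag = f'<script nonce="{nonce}">console.log("ok")</script>'

print(header_value)
```

An injected script tag does not know the nonce for the current response, so the browser refuses to run it even if output encoding was missed somewhere.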

Verification and Detection

Security testing requires multiple approaches - unit tests alone are insufficient.

Static Application Security Testing (SAST)

Use automated tools to detect XSS in Python code:

Commercial Tools:

Checkmarx - Python web framework security scanning
Fortify - Data flow analysis for Django/Flask
Snyk Code - Real-time Python security scanning
Veracode - Python application security testing

Open Source Tools:

Bandit - Python security linter

pip install bandit
bandit -r . -f json -o bandit-report.json

Semgrep - Pattern-based security scanning

pip install semgrep
semgrep --config=p/python .
semgrep --config=p/django .
semgrep --config=p/flask .

Pylint with framework plugins (general linting for Django/Flask code, not security-specific)

pip install pylint pylint-django pylint-flask
pylint --load-plugins pylint_django,pylint_flask your_app/

SonarQube - Continuous security analysis
```
pip install sonar-scanner
```

Dynamic Application Security Testing (DAST)

Test running Python applications:

OWASP ZAP - Automated web vulnerability scanner
Burp Suite Professional - Comprehensive testing
Arachni - Web application security scanner
w3af - Python-based web attack framework

Framework-Specific Tools

Django:

# Django security check
python manage.py check --deploy

# Check for security issues
python manage.py check --tag security

# Bandit with Django configuration
bandit -r . -ll -i -x '*/tests/*,*/migrations/*'

Flask:

# Flask-Talisman for security headers
pip install flask-talisman

# Safety check for vulnerable dependencies
pip install safety
safety check --json

Limited Role of Unit Tests

Unit tests can verify encoding functions work but not comprehensive security:

import html
from markupsafe import escape

# Tests verify encoding - NOT comprehensive security
def test_html_escape():
    malicious = '<script>alert("xss")</script>'
    safe = html.escape(malicious)

    assert '<script>' not in safe
    assert '&lt;script&gt;' in safe

def test_markupsafe_escape():
    payload = '<img src=x onerror=alert(1)>'
    safe = escape(payload)

    assert '<img' not in safe
    assert '&lt;img' in safe
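The same kind of check can be run over a small payload list so that a weak encoder fails loudly. The payloads and the escapes_all_markup helper are illustrative assumptions, and the list is nowhere near exhaustive.

```python
import html

PAYLOADS = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '"><svg onload=alert(1)>',
    "' onmouseover='alert(1)",
]

def escapes_all_markup(encode):
    """Return True if no payload keeps a raw <, >, double or single quote."""
    return all(
        not any(ch in encode(payload) for ch in '<>"\'')
        for payload in PAYLOADS
    )

assert escapes_all_markup(html.escape)
assert not escapes_all_markup(lambda s: s)  # identity encoder must fail
```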

Important: Passing these tests does NOT mean your application is secure. Use SAST/DAST tools to find actual vulnerabilities.

Integration Testing

from django.test import TestCase, Client

class XssIntegrationTest(TestCase):
    def test_xss_payload_encoded(self):
        """Verify XSS payloads are encoded in responses"""
        client = Client()
        xss_payload = '<script>alert("XSS")</script>'

        response = client.get('/profile/', {'bio': xss_payload})

        # Should not contain unencoded script tag
        self.assertNotContains(response, '<script>')
        # Should contain encoded version
        self.assertContains(response, '&lt;script&gt;')

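# pytest with Flask (client is a fixture wrapping app.test_client())
def test_flask_xss_protection(client):
    payload = '<script>alert(1)</script>'
    # query_string URL-encodes the payload instead of embedding it raw in the path
    response = client.get('/comment', query_string={'text': payload})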

    assert b'<script>' not in response.data
    assert b'&lt;script&gt;' in response.data

Security Headers Testing

# Check security headers with curl
curl -I http://localhost:5000/ | grep -i "content-security-policy\|x-frame-options\|x-content-type"

# Use Django check command
python manage.py check --deploy

Continuous Security

CI/CD Integration - Run Bandit and Semgrep in GitHub Actions/GitLab CI

.github/workflows/security.yml

- name: Run Bandit
  run: |
    pip install bandit
    bandit -r . -f json -o bandit-report.json

- name: Run Semgrep
  run: |
    pip install semgrep
    semgrep --config=p/python --config=p/django .

Pre-commit Hooks - Scan code before commits

pip install pre-commit
# Add bandit and semgrep to .pre-commit-config.yaml

Dependency Scanning - Check for vulnerable packages

pip install safety
safety check
pip-audit  # Alternative tool

Security Champions - Train developers on secure coding
Penetration Testing - Regular professional assessments

Security Checklist

Manually verify:

Django templates use {{ }} not {{ | safe }} for user data
Flask/Jinja2 templates use {{ }} not {{ | safe }} or {% autoescape false %}
No mark_safe() or Markup() on user input
All render() calls use template auto-escaping
JSON responses use JsonResponse, jsonify(), or json.dumps() with an application/json content type
HTML manual construction uses html.escape() or markupsafe.escape()
All input sources identified (request.args, request.form, request.cookies, headers)
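Part of this manual review can be pre-screened with a small script that flags lines bypassing auto-escaping. The patterns and function name below are assumptions of this sketch; every hit still needs a human to decide whether the value is user-controlled.

```python
import re

# Patterns derived from the checklist; matches are review candidates only
RISKY_PATTERNS = {
    'safe filter': re.compile(r'\|\s*safe\b'),
    'mark_safe()': re.compile(r'\bmark_safe\s*\('),
    'Markup()': re.compile(r'\bMarkup\s*\('),
    'autoescape off': re.compile(r'\{%\s*autoescape\s+(false|off)\s*%\}'),
}

def find_risky_lines(text):
    """Return (line_number, label) pairs for lines that bypass escaping."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label))
    return hits

sample = '''<p>{{ bio|safe }}</p>
<p>{{ name }}</p>
{% autoescape false %}{{ html }}{% endautoescape %}
bio = mark_safe(user_bio)'''

hits = find_risky_lines(sample)
print(hits)
```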