CWE-35: Path Equivalence - Python

Overview

Path equivalence vulnerabilities in Python occur when different representations of file paths (e.g., /app/files vs /app//files vs /app/files/.) bypass validation due to string comparison without canonicalization. Python's pathlib module provides robust path handling, but legacy code using os.path string operations remains vulnerable.

Primary Defence: Use pathlib.Path.resolve() to canonicalize all user-supplied paths to absolute form (resolving ., .., and symlinks), then verify the resolved path is within the allowed directory using is_relative_to() (Python 3.9+) or relative_to(). Apply Unicode normalization with unicodedata.normalize('NFC') before path operations, reject filenames containing path separators, and validate file type with is_file() before access.

Common Vulnerable Patterns

Weak String Check for Path Traversal

import os

@app.route('/download')
def download_file():
    filename = request.args.get('file')
    filepath = f'/app/public/{filename}'

    # Weak check - can be bypassed with encoding or equivalents
    if '..' not in filename:
        return send_file(filepath)
    # Attack: file=....//....//etc/passwd (bypasses '..' check)
    # Attack: file=public/../../../etc/passwd (bypasses check)

Why this is vulnerable:

String check '..' not in filename doesn't catch encoded traversal sequences like %2e%2e or variations like ....//
No path normalization means equivalent paths like /app/public//file vs /app/public/file bypass validation
Symbolic links pointing outside allowed directory aren't resolved
Path construction via string concatenation doesn't prevent directory escapes

Missing Unicode Normalization

import os

ALLOWED_DIR = '/app/files'

@app.route('/get')
def get_file():
    filename = request.args.get('name')
    filepath = os.path.join(ALLOWED_DIR, filename)

    # Doesn't handle Unicode equivalents
    # Attack: Use Unicode combining characters or normalization forms
    # Example: 'file' vs 'file' (with combining diacritical marks)
    # Example: NFC vs NFD normalization differences

    if os.path.exists(filepath):
        return send_file(filepath)

Why this is vulnerable:

Unicode normalization forms (NFC, NFD, NFKC, NFKD) can represent the same filename differently
Combining diacritical marks can create visually identical but byte-different filenames
No Unicode normalization allows bypassing string-based allowlists or denylists
File system may normalize differently than application, creating mismatches

Secure Patterns

Path Canonicalization with Containment Check

import os
from pathlib import Path

ALLOWED_DIR = Path('/app/public').resolve()

@app.route('/download')
def download_file():
    filename = request.args.get('file', '')

    if not filename:
        abort(400, "Filename required")

    # Construct path
    requested_path = ALLOWED_DIR / filename

    try:
        # Resolve to canonical absolute path (resolves .., ., symlinks)
        resolved_path = requested_path.resolve()
    except FileNotFoundError:
        abort(404, "File not found")
    except (OSError, RuntimeError):
        abort(400, "Invalid path")

    # Verify resolved path is within allowed directory
    # Python 3.9+: use resolved_path.is_relative_to(ALLOWED_DIR)
    try:
        resolved_path.relative_to(ALLOWED_DIR)
    except ValueError:
        abort(403, "Access denied")

    # Verify file exists and is a file (not directory)
    if not resolved_path.is_file():
        abort(404, "File not found")

    return send_file(resolved_path)

Why this works:

Uses resolve() to canonicalize path, eliminating ., .., symlinks, and path equivalents
relative_to() (or is_relative_to() in Python 3.9+) verifies canonical path remains within allowed directory after resolution
Prevents all path traversal and equivalence attacks by comparing canonical forms
Validates file existence and type after canonicalization
Rejects paths outside allowed directory regardless of representation

Unicode Normalization with Path Validation

import os
import unicodedata
from pathlib import Path

ALLOWED_DIR = Path('/app/files').resolve()

@app.route('/get')
def get_file():
    filename = request.args.get('name', '')

    if not filename:
        abort(400, "Filename required")

    # Normalize Unicode to NFC (canonical composition)
    filename = unicodedata.normalize('NFC', filename)

    # Remove any null bytes
    filename = filename.replace('\x00', '')

    # Prevent path separators
    if '/' in filename or '\\' in filename:
        abort(400, "Invalid filename")

    # Construct and resolve path
    try:
        filepath = (ALLOWED_DIR / filename).resolve()
    except FileNotFoundError:
        abort(404, "File not found")
    except (OSError, RuntimeError):
        abort(400, "Invalid path")

    # Verify within allowed directory
    # Python 3.9+: use filepath.is_relative_to(ALLOWED_DIR)
    try:
        filepath.relative_to(ALLOWED_DIR)
    except ValueError:
        abort(403, "Access denied")

    # Verify file exists and is a file
    if not filepath.is_file():
        abort(404, "File not found")

    return send_file(filepath)

Why this works:

unicodedata.normalize('NFC') converts Unicode to canonical composed form, preventing normalization bypasses
Removes null bytes that could truncate paths in some contexts
resolve() canonicalizes path, eliminating ., .., symlinks, and path equivalents
relative_to() (or is_relative_to() in Python 3.9+) verifies canonical path is within allowed directory
Handles both path equivalence and Unicode normalization attacks

Additional Resources

CWE-35: Path Equivalence
OWASP Path Traversal
Python pathlib Documentation - Path.resolve(), is_relative_to()
Python os.path Documentation - realpath(), normpath()
Unicode Normalization in Python