Skip to content

CWE-35: Path Equivalence - Python

Overview

Path equivalence vulnerabilities in Python occur when different representations of file paths (e.g., /app/files vs /app//files vs /app/files/.) bypass validation due to string comparison without canonicalization. Python's pathlib module provides robust path handling, but legacy code using os.path string operations remains vulnerable.

Primary Defence: Use pathlib.Path.resolve() to canonicalize all user-supplied paths to absolute form (resolving ., .., and symlinks), then verify the resolved path is within the allowed directory using is_relative_to() (Python 3.9+) or relative_to(). Apply Unicode normalization with unicodedata.normalize('NFC') before path operations, reject filenames containing path separators, and validate file type with is_file() before access.

Common Vulnerable Patterns

Weak String Check for Path Traversal

import os

@app.route('/download')
def download_file():
    filename = request.args.get('file')
    filepath = f'/app/public/{filename}'

    # Weak check - can be bypassed with encoding or equivalents
    if '..' not in filename:
        return send_file(filepath)
    # Attack: file=....//....//etc/passwd (bypasses '..' check)
    # Attack: file=public/../../../etc/passwd (bypasses check)

Why this is vulnerable:

  • String check '..' not in filename doesn't catch encoded traversal sequences like %2e%2e or variations like ....//
  • No path normalization means equivalent paths like /app/public//file vs /app/public/file bypass validation
  • Symbolic links pointing outside allowed directory aren't resolved
  • Path construction via string concatenation doesn't prevent directory escapes

Missing Unicode Normalization

import os

ALLOWED_DIR = '/app/files'

@app.route('/get')
def get_file():
    filename = request.args.get('name')
    filepath = os.path.join(ALLOWED_DIR, filename)

    # Doesn't handle Unicode equivalents
    # Attack: Use Unicode combining characters or normalization forms
    # Example: 'file' vs 'file' (with combining diacritical marks)
    # Example: NFC vs NFD normalization differences

    if os.path.exists(filepath):
        return send_file(filepath)

Why this is vulnerable:

  • Unicode normalization forms (NFC, NFD, NFKC, NFKD) can represent the same filename differently
  • Combining diacritical marks can create visually identical but byte-different filenames
  • No Unicode normalization allows bypassing string-based allowlists or denylists
  • File system may normalize differently than application, creating mismatches

Secure Patterns

Path Canonicalization with Containment Check

import os
from pathlib import Path

ALLOWED_DIR = Path('/app/public').resolve()

@app.route('/download')
def download_file():
    filename = request.args.get('file', '')

    if not filename:
        abort(400, "Filename required")

    # Construct path
    requested_path = ALLOWED_DIR / filename

    try:
        # Resolve to canonical absolute path (resolves .., ., symlinks)
        resolved_path = requested_path.resolve()
    except FileNotFoundError:
        abort(404, "File not found")
    except (OSError, RuntimeError):
        abort(400, "Invalid path")

    # Verify resolved path is within allowed directory
    # Python 3.9+: use resolved_path.is_relative_to(ALLOWED_DIR)
    try:
        resolved_path.relative_to(ALLOWED_DIR)
    except ValueError:
        abort(403, "Access denied")

    # Verify file exists and is a file (not directory)
    if not resolved_path.is_file():
        abort(404, "File not found")

    return send_file(resolved_path)

Why this works:

  • Uses resolve() to canonicalize path, eliminating ., .., symlinks, and path equivalents
  • relative_to() (or is_relative_to() in Python 3.9+) verifies canonical path remains within allowed directory after resolution
  • Prevents all path traversal and equivalence attacks by comparing canonical forms
  • Validates file existence and type after canonicalization
  • Rejects paths outside allowed directory regardless of representation

Unicode Normalization with Path Validation

import os
import unicodedata
from pathlib import Path

ALLOWED_DIR = Path('/app/files').resolve()

@app.route('/get')
def get_file():
    filename = request.args.get('name', '')

    if not filename:
        abort(400, "Filename required")

    # Normalize Unicode to NFC (canonical composition)
    filename = unicodedata.normalize('NFC', filename)

    # Remove any null bytes
    filename = filename.replace('\x00', '')

    # Prevent path separators
    if '/' in filename or '\\' in filename:
        abort(400, "Invalid filename")

    # Construct and resolve path
    try:
        filepath = (ALLOWED_DIR / filename).resolve()
    except FileNotFoundError:
        abort(404, "File not found")
    except (OSError, RuntimeError):
        abort(400, "Invalid path")

    # Verify within allowed directory
    # Python 3.9+: use filepath.is_relative_to(ALLOWED_DIR)
    try:
        filepath.relative_to(ALLOWED_DIR)
    except ValueError:
        abort(403, "Access denied")

    # Verify file exists and is a file
    if not filepath.is_file():
        abort(404, "File not found")

    return send_file(filepath)

Why this works:

  • unicodedata.normalize('NFC') converts Unicode to canonical composed form, preventing normalization bypasses
  • Removes null bytes that could truncate paths in some contexts
  • resolve() canonicalizes path, eliminating ., .., symlinks, and path equivalents
  • relative_to() (or is_relative_to() in Python 3.9+) verifies canonical path is within allowed directory
  • Handles both path equivalence and Unicode normalization attacks

Additional Resources