CWE-35: Path Equivalence - Python
Overview
Path equivalence vulnerabilities in Python occur when different representations of file paths (e.g., /app/files vs /app//files vs /app/files/.) bypass validation due to string comparison without canonicalization. Python's pathlib module provides robust path handling, but legacy code using os.path string operations remains vulnerable.
Primary Defence: Use pathlib.Path.resolve() to canonicalize all user-supplied paths to absolute form (resolving ., .., and symlinks), then verify the resolved path is within the allowed directory using is_relative_to() (Python 3.9+) or relative_to(). Apply Unicode normalization with unicodedata.normalize('NFC') before path operations, reject filenames containing path separators, and validate file type with is_file() before access.
Common Vulnerable Patterns
Weak String Check for Path Traversal
import os
@app.route('/download')
def download_file():
filename = request.args.get('file')
filepath = f'/app/public/{filename}'
# Weak check - can be bypassed with encoding or equivalents
if '..' not in filename:
return send_file(filepath)
# Attack: file=....//....//etc/passwd (bypasses '..' check)
# Attack: file=public/../../../etc/passwd (bypasses check)
Why this is vulnerable:
- String check
'..' not in filenamedoesn't catch encoded traversal sequences like%2e%2eor variations like....// - No path normalization means equivalent paths like
/app/public//filevs/app/public/filebypass validation - Symbolic links pointing outside allowed directory aren't resolved
- Path construction via string concatenation doesn't prevent directory escapes
Missing Unicode Normalization
import os
ALLOWED_DIR = '/app/files'
@app.route('/get')
def get_file():
filename = request.args.get('name')
filepath = os.path.join(ALLOWED_DIR, filename)
# Doesn't handle Unicode equivalents
# Attack: Use Unicode combining characters or normalization forms
# Example: 'file' vs 'file' (with combining diacritical marks)
# Example: NFC vs NFD normalization differences
if os.path.exists(filepath):
return send_file(filepath)
Why this is vulnerable:
- Unicode normalization forms (NFC, NFD, NFKC, NFKD) can represent the same filename differently
- Combining diacritical marks can create visually identical but byte-different filenames
- No Unicode normalization allows bypassing string-based allowlists or denylists
- File system may normalize differently than application, creating mismatches
Secure Patterns
Path Canonicalization with Containment Check
import os
from pathlib import Path
ALLOWED_DIR = Path('/app/public').resolve()
@app.route('/download')
def download_file():
filename = request.args.get('file', '')
if not filename:
abort(400, "Filename required")
# Construct path
requested_path = ALLOWED_DIR / filename
try:
# Resolve to canonical absolute path (resolves .., ., symlinks)
resolved_path = requested_path.resolve()
except FileNotFoundError:
abort(404, "File not found")
except (OSError, RuntimeError):
abort(400, "Invalid path")
# Verify resolved path is within allowed directory
# Python 3.9+: use resolved_path.is_relative_to(ALLOWED_DIR)
try:
resolved_path.relative_to(ALLOWED_DIR)
except ValueError:
abort(403, "Access denied")
# Verify file exists and is a file (not directory)
if not resolved_path.is_file():
abort(404, "File not found")
return send_file(resolved_path)
Why this works:
- Uses
resolve()to canonicalize path, eliminating.,.., symlinks, and path equivalents relative_to()(oris_relative_to()in Python 3.9+) verifies canonical path remains within allowed directory after resolution- Prevents all path traversal and equivalence attacks by comparing canonical forms
- Validates file existence and type after canonicalization
- Rejects paths outside allowed directory regardless of representation
Unicode Normalization with Path Validation
import os
import unicodedata
from pathlib import Path
ALLOWED_DIR = Path('/app/files').resolve()
@app.route('/get')
def get_file():
filename = request.args.get('name', '')
if not filename:
abort(400, "Filename required")
# Normalize Unicode to NFC (canonical composition)
filename = unicodedata.normalize('NFC', filename)
# Remove any null bytes
filename = filename.replace('\x00', '')
# Prevent path separators
if '/' in filename or '\\' in filename:
abort(400, "Invalid filename")
# Construct and resolve path
try:
filepath = (ALLOWED_DIR / filename).resolve()
except FileNotFoundError:
abort(404, "File not found")
except (OSError, RuntimeError):
abort(400, "Invalid path")
# Verify within allowed directory
# Python 3.9+: use filepath.is_relative_to(ALLOWED_DIR)
try:
filepath.relative_to(ALLOWED_DIR)
except ValueError:
abort(403, "Access denied")
# Verify file exists and is a file
if not filepath.is_file():
abort(404, "File not found")
return send_file(filepath)
Why this works:
unicodedata.normalize('NFC')converts Unicode to canonical composed form, preventing normalization bypasses- Removes null bytes that could truncate paths in some contexts
resolve()canonicalizes path, eliminating.,.., symlinks, and path equivalentsrelative_to()(oris_relative_to()in Python 3.9+) verifies canonical path is within allowed directory- Handles both path equivalence and Unicode normalization attacks
Additional Resources
- CWE-35: Path Equivalence
- OWASP Path Traversal
- Python pathlib Documentation - Path.resolve(), is_relative_to()
- Python os.path Documentation - realpath(), normpath()
- Unicode Normalization in Python