CWE-73: External Control of File Name or Path - Python
Overview
External control of file names or paths is a critical vulnerability in Python applications that occurs when user-supplied input is used to construct file system paths without proper validation. Python's flexible file handling through open(), os.path, and pathlib provides powerful capabilities but minimal built-in protection against path traversal.
Primary Defence: Canonicalize paths with Path.resolve() (which also resolves symlinks) and validate them with relative_to() so they stay within the intended base directory. Where possible, use allowlists for known file sets and UUID-based indirect reference maps for sensitive file access. In Flask, serve files with send_from_directory(); for uploads, sanitize filenames with Path(filename).name combined with extension validation. Together these measures guard against path traversal, absolute path injection, and symlink attacks.
Remediation Steps
Locate the Finding
Identify the Source (User Input)
- Look for where untrusted data enters the application:
  - Flask: request.args.get(), request.form[], request.json[]
  - Django: request.GET[], request.POST[], request.FILES[]
  - FastAPI: Function parameters with Query(), Path(), Form()
  - Form fields: File upload filename attributes
  - URL path parameters
  - HTTP headers
  - JSON/XML API payloads
Identify the Sink (File Operation)
- Trace to where the data reaches file system operations:
  - open(path), open(filename, 'r')
  - Path(user_input).read_text(), Path().write_bytes()
  - os.remove(), os.unlink()
  - shutil.copy(), shutil.move()
  - os.path.join(), os.path.exists()
  - send_file(), send_from_directory() (Flask)
  - FileResponse() (FastAPI)
Example from Security Scan Finding:
# Source: User-controlled filename from query parameter
@app.route('/download')
def download_file():
    filename = request.args.get('file')  # ← SOURCE
    # Sink: File operation using untrusted input
    with open(filename, 'rb') as f:  # ← SINK
        return f.read()
Understand the Data Flow
Review the data flow in the security finding:
- Entry Point: Where user input enters (route handler, view function)
- Intermediate Processing: Any transformations or validations (often insufficient)
- File Operation: The vulnerable sink where file access occurs
Key Questions:
- Is there any validation between source and sink?
- Are there string operations that might be bypassable? (if '..' in path, .replace())
- Does the path go through os.path.join() without validation?
- Is there any path resolution with Path.resolve() or os.path.abspath()?
- Are legacy os.path functions used instead of modern pathlib?
Common Data Flow Pattern:
@app.route('/read')  # ← Entry point
def read_file():
    filename = request.args['filename']  # ← Source
    # Insufficient validation (bypassable)
    if '..' in filename:
        abort(400, 'Invalid filename')
    # String concatenation for path
    path = '/app/data/' + filename  # ← Transformation
    with open(path) as f:  # ← Sink
        return f.read()
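To see why the substring check in the pattern above is fragile, consider input that is percent-encoded twice. The following is a minimal sketch using only the standard library. `naive_check` is a hypothetical stand-in for the `'..' in filename` test, and the second `unquote()` call models an assumed downstream layer (a proxy, a storage client, a helper function) that decodes the value again.

```python
from urllib.parse import unquote


def naive_check(filename: str) -> bool:
    """Hypothetical stand-in for the denylist: True means 'looks safe'."""
    return '..' not in filename


# The value as it appears in the URL, percent-encoded twice
raw = '%252e%252e%252fsecret.txt'

# A web framework typically decodes query values once
once = unquote(raw)

# Assumption: some later layer decodes the value a second time
twice = unquote(once)

print(naive_check(once))  # the check passes on the once-decoded value
print(twice)              # the traversal sequence only appears now
```

Whether a second decode happens depends on the application, which is why validating the resolved filesystem path is more dependable than inspecting the raw string.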
Identify the Pattern
Match the code to vulnerable patterns:
- Absolute path injection → Pattern: open(user_input) where user_input = /etc/passwd
- Directory traversal → Pattern: open(f'/uploads/{filename}') where filename = ../../etc/passwd
- os.path.join with absolute path → Pattern: os.path.join('/data', '/etc/passwd') = /etc/passwd
- Unsafe uploaded filename → Pattern: file.save(file.filename) without sanitization
- Denylist bypass → Pattern: if '..' in path (misses absolute paths, symlinks, and double-encoded input such as %252e%252e%252f that is decoded after the check)
- String concatenation → Pattern: '/data/' + category + '/' + filename instead of proper joining
- Not using Path.resolve() → Pattern: File operations without canonicalization
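Several of the patterns above hinge on how the standard library joins and normalizes paths. The sketch below uses `posixpath` and `PurePosixPath` so the results are the same on any platform; the paths are illustrative only.

```python
import posixpath
from pathlib import PurePosixPath

# An absolute second argument makes join() discard everything before it
joined = posixpath.join('/data', '/etc/passwd')

# pathlib's "/" operator follows the same rule
pure = PurePosixPath('/data') / '/etc/passwd'

# String concatenation keeps the base, but ".." still walks out of it
concatenated = posixpath.normpath('/uploads/' + '../../etc/passwd')

print(joined)        # /etc/passwd
print(pure)          # /etc/passwd
print(concatenated)  # /etc/passwd
```

In all three cases the base directory offers no protection, so containment has to be checked after the path is built.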
Verify the Fix
Test with malicious inputs:
import pytest
from pathlib import Path

# SecurePathValidator is the class shown under
# "Path Resolution with Ancestor Validation"

def test_path_traversal_prevention(tmp_path):
    """Test that malicious inputs are rejected."""
    validator = SecurePathValidator(str(tmp_path))
    attacks = [
        '../../../../etc/passwd',  # Unix traversal
        '..\\..\\..\\Windows\\win.ini',  # Windows traversal
        '/etc/passwd',  # Absolute Unix path
        'C:\\Windows\\System32\\config\\SAM',  # Absolute Windows path
        '..%2f..%2fetc%2fpasswd',  # URL-encoded
        '....//....//etc/passwd',  # Double-dot attempt
        'file.txt\x00.jpg',  # Null byte injection
    ]
    for attack in attacks:
        # Escapes raise ValueError (from relative_to()); names that do not
        # resolve to an existing file raise FileNotFoundError (an OSError)
        with pytest.raises((ValueError, OSError)):
            validator.validate_existing_path(attack)

def test_valid_paths(tmp_path):
    """Test that valid paths are allowed."""
    # Create test files
    (tmp_path / 'allowed.txt').write_text('content')
    (tmp_path / 'subdir').mkdir()
    (tmp_path / 'subdir' / 'nested.txt').write_text('nested')
    validator = SecurePathValidator(str(tmp_path))
    valid_paths = ['allowed.txt', 'subdir/nested.txt']
    for path in valid_paths:
        result = validator.validate_existing_path(path)
        assert result.exists()
        # Compare against the resolved base, since tmp_path may contain symlinks
        assert result.is_relative_to(tmp_path.resolve())
Manual Testing:
- Try accessing: /download?file=../../../../etc/passwd
- Try accessing: /download?file=/etc/shadow
- Try uploading file named: ../../app.py
- Try accessing: /download?file=..%2F..%2Fsettings.py (URL-encoded)
- Test with symlinks pointing outside base directory
- Verify legitimate files are still accessible
- Test on both Windows and Linux if cross-platform
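The symlink item in the list above can be checked with a throwaway directory layout. This is a minimal sketch, not part of the validator classes in this guide: `stays_within` and `run_symlink_check` are hypothetical helper names, and creating symlinks may require extra privileges on Windows.

```python
import os
import tempfile
from pathlib import Path


def stays_within(base_dir: Path, user_path: str) -> bool:
    """Return True if user_path resolves to a location under base_dir."""
    base = base_dir.resolve(strict=True)
    try:
        (base / user_path).resolve(strict=True).relative_to(base)
    except (ValueError, OSError):
        # ValueError: outside the base; OSError: target does not exist
        return False
    return True


def run_symlink_check() -> dict:
    """Build a temporary layout and report which names are accepted."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        base = root / 'base'
        base.mkdir()
        (base / 'ok.txt').write_text('public')
        (root / 'secret.txt').write_text('private')
        # Symlink inside the base directory that points outside it
        os.symlink(root / 'secret.txt', base / 'link.txt')
        return {
            'ok.txt': stays_within(base, 'ok.txt'),
            'link.txt': stays_within(base, 'link.txt'),
        }


results = run_symlink_check()
print(results)
```

The link lives inside the base directory, so a purely string-based check would accept it; only resolving the real target reveals the escape.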
Check for Similar Issues
Search your codebase for other vulnerable patterns:
VS Code / PyCharm Search Patterns (Regex):
open\(
Path\(.*\)\.read
Path\(.*\)\.write
os\.path\.join
send_file\(
send_from_directory\(
request\.args\.get.*file
request\.form.*file
request\.GET.*file
request\.POST.*path
file\.filename
@app\.route.*download
@app\.route.*file
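The same patterns can be applied outside an IDE. The sketch below runs a subset of them over source text line by line; `scan_source` is a hypothetical helper, and every hit is only a candidate for manual review, not a confirmed vulnerability.

```python
import re
from typing import List, Tuple

# Subset of the search patterns listed above
PATTERNS = [
    r"open\(",
    r"os\.path\.join",
    r"send_file\(",
    r"request\.args\.get.*file",
    r"file\.filename",
]
COMPILED = [re.compile(p) for p in PATTERNS]


def scan_source(source: str) -> List[Tuple[int, str]]:
    """Return (line number, pattern) pairs for every match in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern in COMPILED:
            if pattern.search(line):
                hits.append((lineno, pattern.pattern))
    return hits


sample = (
    "filename = request.args.get('file')\n"
    "with open(filename, 'rb') as f:\n"
    "    data = f.read()\n"
)
for lineno, pattern in scan_source(sample):
    print(lineno, pattern)
```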
Common locations to check:
- All Flask routes with file-related parameters
- Django views handling file operations
- FastAPI endpoints with file parameters
- File upload handlers
- Static file serving configuration
- Document download endpoints
- Report generation functionality
- Backup/restore features
- Log file access views
- Template file handling
- Configuration file loading from user input
Review code patterns like:
# Pattern 1: Direct file parameter in open()
@app.route('/read')
def read():
    return open(request.args['file']).read()

# Pattern 2: String concatenation for paths
path = '/data/' + user_input

# Pattern 3: Insufficient validation
if '..' in filename:
    abort(400)

# Pattern 4: Using uploaded filename directly
file.save(file.filename)

# Pattern 5: os.path.join without validation
path = os.path.join(base_dir, user_input)
open(path)
Flask specific areas:
- Routes using send_file() or send_from_directory()
- Static folder configuration
- Custom file serving middleware
- Template rendering with file paths
- app.config['UPLOAD_FOLDER'] usage
Django specific areas:
- Views using FileResponse or HttpResponse with files
- MEDIA_ROOT and MEDIA_URL configuration
- File upload handlers in forms
- Static file serving in urls.py
- Template tags loading files
- Management commands accessing files
FastAPI specific areas:
- Endpoints returning FileResponse
- UploadFile parameter handling
- Static file mounting with StaticFiles
- Background tasks accessing files
Common Vulnerable Patterns
Direct Use of User Input in open()
# VULNERABLE - No validation of user-supplied filename
from flask import request

@app.route('/download')
def download_file():
    filename = request.args.get('file')
    with open(filename, 'rb') as f:
        return f.read()
# Attack example:
# GET /download?file=../../../../etc/passwd
# Result: Reads /etc/passwd from the server
Insufficient String-Based Validation
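# VULNERABLE - Denylist can be bypassed
@app.route('/read')
def read_file():
    filename = request.args['filename']
    # Incomplete validation
    if '..' in filename:
        abort(400, 'Invalid filename')
    with open(f'/app/data/{filename}') as f:
        return f.read()
# Attack example:
# GET /read?filename=%252e%252e%252f%252e%252e%252fetc%252fpasswd
# Result: The framework decodes once to "%2e%2e%2f%2e%2e%2fetc%2fpasswd",
# which passes the check; if any later layer decodes again, the value
# becomes "../../etc/passwd". The check also ignores symlinks inside
# /app/data that point elsewhere.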
Using os.path.join with Absolute Paths
# VULNERABLE - os.path.join allows absolute paths
import os

@app.route('/file/<path:filename>')
def get_file(filename):
    # os.path.join ignores first arg if second is absolute
    full_path = os.path.join('/app/data/', filename)
    with open(full_path) as f:
        return f.read()
# Attack example:
# GET /file//etc/passwd
# Result: os.path.join('/app/data/', '/etc/passwd') = '/etc/passwd'
# (whether a leading slash reaches the view depends on URL routing and
# slash normalization; the same join is exploitable when the value
# comes from a query or form parameter)
Trusting Uploaded Filenames
# VULNERABLE - Using the client-supplied filename without sanitization
import os
from flask import request

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    # file.filename comes from the multipart request and may contain
    # path components such as "../../app.py"
    file.save(os.path.join('/uploads', file.filename))
    return 'Uploaded'
# Attack example:
# Upload a file whose multipart filename is ../../app.py
# Result: The file is written outside /uploads
# Note: werkzeug's secure_filename() strips path components, but it can
# return an empty string and does not check extensions or prevent
# overwriting existing files, so it is not sufficient on its own.
String Concatenation for Paths
# VULNERABLE - String concat allows injection
import os

@app.route('/delete')
def delete_file():
    filename = request.args['file']
    filepath = '/app/temp/' + filename
    os.remove(filepath)
    return 'Deleted'
# Attack example:
# GET /delete?file=../app.py
# Result: Deletes /app/app.py
Secure Patterns
Allowlist with pathlib
from pathlib import Path

class SecureFileService:
    def __init__(self, base_dir="/app/data"):
        self.base_dir = Path(base_dir).resolve()
        self.allowed_files = {"report.pdf", "summary.txt", "data.csv"}

    def read_file(self, filename: str) -> bytes:
        if filename not in self.allowed_files:
            raise PermissionError("File not allowed")
        candidate = (self.base_dir / filename)
        # Resolve symlinks and require existence
        real = candidate.resolve(strict=True)
        # Prevent symlink escapes (Python 3.9+)
        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")
        if not real.is_file():
            raise FileNotFoundError("Not a file")
        return real.read_bytes()
Why this works:
- Only exact allowlisted basenames are permitted (no user-controlled paths).
- The file is resolved to its real path and verified to remain under the trusted base directory.
- Only regular files are read.
Path Resolution with Ancestor Validation (Flexible and Secure)
from pathlib import Path

class SecurePathValidator:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve(strict=True)

    def validate_existing_path(self, user_path: str) -> Path:
        if not user_path:
            raise ValueError("Missing path")
        candidate = (self.base_dir / user_path)
        # Require existence so symlink resolution is meaningful
        real = candidate.resolve(strict=True)
        # Ensure the real target is within the base directory
        # (relative_to() raises ValueError if it is not)
        real.relative_to(self.base_dir)
        if not real.is_file():
            raise FileNotFoundError("Not a file")
        return real

# Usage:
validator = SecurePathValidator('/app/data')
safe_path = validator.validate_existing_path(user_input)
content = safe_path.read_text()
Why this works:
- The base directory is resolved to a real, canonical path before any validation.
- User input is resolved to a real filesystem path, collapsing . and .. components and resolving symlinks (for existing paths).
- relative_to() enforces that the resolved target is a descendant of the trusted base directory.
- Validation is performed on the resolved filesystem path rather than raw user input, preventing traversal via path syntax.
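The validator above requires the target to exist, which does not fit write destinations such as new uploads or generated reports. One possible variant, sketched here under the assumption that non-strict resolution is acceptable for your use case, resolves whatever part of the path already exists and then applies the same containment check. `validate_new_path` is a hypothetical helper, not part of the class above.

```python
import tempfile
from pathlib import Path


def validate_new_path(base_dir: Path, user_path: str) -> Path:
    """Containment check for a file that may not exist yet."""
    if not user_path:
        raise ValueError("Missing path")
    base = base_dir.resolve(strict=True)
    # strict=False resolves existing components (including symlinks)
    # and normalizes the remainder, so ".." cannot survive in the result
    candidate = (base / user_path).resolve(strict=False)
    # Raises ValueError if the candidate is outside the base directory
    candidate.relative_to(base)
    if candidate == base:
        raise ValueError("Path must name a file, not the base directory")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    inside = validate_new_path(base, 'reports/new.txt')
    try:
        validate_new_path(base, '../outside.txt')
        escaped = True
    except ValueError:
        escaped = False

print(inside.name, escaped)
```

The check and the later write are separate steps, so a directory that an attacker can modify between them remains a risk; restrictive filesystem permissions on the base directory help narrow that window.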
UUID-Based Indirect References
# SECURE - Maps tokens to actual file paths
import uuid
from pathlib import Path
from typing import Dict, Optional

class SecureFileRegistry:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve()
        self.registry: Dict[uuid.UUID, Path] = {}

    def register_file(self, internal_path: str) -> uuid.UUID:
        """Register a file and return access token."""
        file_path = (self.base_dir / internal_path).resolve()
        # Validate file is within base directory
        try:
            file_path.relative_to(self.base_dir)
        except ValueError:
            raise ValueError(f'Invalid file path: {internal_path}')
        if not file_path.is_file():
            raise FileNotFoundError(f'File not found: {internal_path}')
        # Generate unique token
        token = uuid.uuid4()
        self.registry[token] = file_path
        return token

    def get_file(self, token: uuid.UUID) -> bytes:
        file_path = self.registry.get(token)
        if not file_path:
            raise FileNotFoundError("Invalid file token")
        real = file_path.resolve(strict=True)
        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")
        if not real.is_file():
            raise FileNotFoundError("Not a file")
        return real.read_bytes()

# Usage:
registry = SecureFileRegistry('/app/data')
token = registry.register_file('reports/2024/q1.pdf')
# Return token to user, they can only access via this token
content = registry.get_file(token)
Why this works:
- Users interact only with opaque tokens, not filesystem paths.
- Tokens map to server-validated, canonical file paths under a trusted base directory.
- Path traversal is prevented by enforcing containment during registration.
- Because access uses server-controlled mappings, user input cannot influence path resolution at read time.
- When combined with proper filesystem permissions and token scoping, this significantly reduces file access risk.
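In a web handler the token arrives as a string, so it has to be parsed before the registry lookup. A minimal sketch of that step follows; `parse_token` is a hypothetical helper, and malformed input is mapped to None so the caller can respond with a generic "not found".

```python
import uuid
from typing import Optional


def parse_token(raw: str) -> Optional[uuid.UUID]:
    """Turn request input into a UUID, or None if it is malformed."""
    try:
        return uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        # Anything that is not a well-formed UUID string is rejected
        return None


good = uuid.uuid4()
print(parse_token(str(good)) == good)
print(parse_token('../../etc/passwd'))
```

Because only well-formed UUIDs ever reach the lookup, path-like input is rejected before any filesystem call is made.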
Filename Sanitization
# Filename sanitization is an input cleanup step, not a security boundary.
import re
from pathlib import Path
from typing import Set

class SecureFilenameHandler:
    ALLOWED_EXTENSIONS: Set[str] = {'.pdf', '.txt', '.csv', '.xlsx'}

    @staticmethod
    def sanitize_filename(filename: str) -> str:
        """
        Sanitize filename to prevent path traversal.

        Raises:
            ValueError: If filename is invalid or has forbidden extension
        """
        if not filename or not filename.strip():
            raise ValueError('Filename cannot be empty')
        # Get just the filename, removing any path components
        clean_name = Path(filename).name
        # Remove dangerous characters
        clean_name = re.sub(r'[^a-zA-Z0-9._-]', '_', clean_name)
        # Validate extension
        extension = Path(clean_name).suffix.lower()
        if extension not in SecureFilenameHandler.ALLOWED_EXTENSIONS:
            raise ValueError(f'File type not allowed: {extension}')
        return clean_name

# Usage:
safe_filename = SecureFilenameHandler.sanitize_filename(user_input)
file_path = Path('/uploads') / safe_filename
Why this works:
- Path(filename).name discards all directory components, ensuring the result is a simple filename.
- Character allowlisting reduces problematic characters for filesystem use and logging.
- Extension allowlisting enforces a restricted set of accepted file types.
- This provides filename hygiene only; real-path containment and safe file handling must still be enforced when accessing the filesystem.
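One detail worth knowing is that `Path(filename).name` follows the separator rules of the platform the code runs on. On POSIX a backslash is an ordinary character, so a Windows-style path is treated as one long name (the sanitizer above then replaces the backslashes with underscores). The sketch below illustrates the difference with pure paths; `basename_any_platform` is a hypothetical helper, not part of the handler above.

```python
from pathlib import PurePosixPath, PureWindowsPath


def basename_any_platform(filename: str) -> str:
    """Strip directories using Windows rules, which treat both
    forward slashes and backslashes as separators."""
    return PureWindowsPath(filename).name


windows_style = '..\\..\\evil.txt'

# POSIX rules: backslashes are kept, the whole string is the name
print(PurePosixPath(windows_style).name)
# Windows rules: only the final component survives
print(basename_any_platform(windows_style))
print(basename_any_platform('../../etc/passwd'))
```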
Framework-Specific Guidance
Django - Secure File Handling
# SECURE - Django file upload and serving
from django.core.files.storage import FileSystemStorage
from django.core.exceptions import SuspiciousFileOperation
from django.utils.text import get_valid_filename
from pathlib import PurePosixPath

class SecureFileStorage(FileSystemStorage):
    def generate_filename(self, filename):
        # Django expects POSIX-style paths for storage names (even on Windows)
        p = PurePosixPath(filename)
        if p.is_absolute() or ".." in p.parts:
            raise SuspiciousFileOperation("Invalid upload path")
        # Clean only the final component; keep upload_to subdirs
        safe_name = get_valid_filename(p.name)
        return str(p.with_name(safe_name))

# In models.py:
from django.db import models

class Document(models.Model):
    file = models.FileField(
        upload_to='documents/%Y/%m/',
        storage=SecureFileStorage()
    )
    uploaded_at = models.DateTimeField(auto_now_add=True)

# Secure file serving view:
from django.http import FileResponse, Http404
from django.views import View
from pathlib import Path

class SecureFileDownloadView(View):
    def get(self, request, file_id):
        try:
            doc = Document.objects.get(id=file_id)
        except Document.DoesNotExist:
            raise Http404("File not found")
        # TODO: enforce authorization (owner/role/tenant checks) here.
        fh = doc.file.open("rb")
        return FileResponse(
            fh,
            as_attachment=True,
            filename=Path(doc.file.name).name,  # storage name, not local path
        )

# settings.py:
MEDIA_ROOT = '/var/app/media/'
MEDIA_URL = '/media/'
Why this works:
- Upload names are validated during Django’s filename generation, rejecting absolute paths and .. segments.
- Filenames are cleaned using Django utilities, reducing unsafe characters in stored names.
- Files are accessed through Django’s storage API (doc.file.open()), avoiding direct filesystem path handling in views.
- Downloads should enforce authorization on the model object to prevent IDOR (streaming alone is not access control).
Flask - Secure File Operations
import uuid
from pathlib import Path
from flask import Flask, request, abort, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_FOLDER = Path("/var/app/uploads").resolve()
ALLOWED_EXTENSIONS = {"pdf", "txt", "csv", "xlsx"}

def allowed_file(filename: str) -> bool:
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

def save_upload(file) -> str:
    if not file or not file.filename:
        raise ValueError("No file provided")
    original = secure_filename(file.filename)
    if not original or not allowed_file(original):
        raise ValueError("Invalid file type")
    ext = original.rsplit(".", 1)[1].lower()
    stored = f"{uuid.uuid4().hex}.{ext}"  # server-controlled name
    UPLOAD_FOLDER.mkdir(parents=True, exist_ok=True)
    dest = UPLOAD_FOLDER / stored
    # Avoid overwrites; fail if exists
    if dest.exists():
        raise ValueError("Upload collision")
    file.save(str(dest))
    return stored

@app.route("/upload", methods=["POST"])
def upload_file():
    f = request.files.get("file")
    if not f:
        abort(400, "No file provided")
    try:
        stored_name = save_upload(f)
        return {"file": stored_name}, 200
    except ValueError as e:
        abort(400, str(e))

@app.route("/download/<path:filename>")
def download_file(filename):
    # TODO: enforce authorization here (token/user/tenant checks)
    return send_from_directory(
        UPLOAD_FOLDER,
        filename,
        as_attachment=True,
        download_name=filename,  # Flask>=2.0; optional
    )
Why this works:
- Uploads are stored under a fixed, server-controlled directory with server-generated filenames (no user-controlled paths).
- Filenames are sanitized and restricted to an extension allowlist (policy enforcement).
- Downloads use send_from_directory(), which applies Werkzeug’s safe_join() to prevent path traversal outside the upload directory.
- Authorization should be enforced on download requests; safe path handling alone does not prevent IDOR.
FastAPI - Async Secure File Handling
import re
import uuid
from pathlib import Path
from fastapi import FastAPI, UploadFile, HTTPException, File
from fastapi.responses import FileResponse

app = FastAPI()

class FastAPISecureFileHandler:
    UPLOAD_DIR = Path("/var/app/uploads").resolve()
    MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB
    # Keep both; content_type is advisory, extension is policy
    ALLOWED = {
        "application/pdf": ".pdf",
        "text/plain": ".txt",
        "text/csv": ".csv",
    }

    @classmethod
    async def save_upload(cls, file: UploadFile) -> str:
        ext = cls.ALLOWED.get(file.content_type)
        if not ext:
            raise HTTPException(400, f"File type not allowed: {file.content_type}")
        cls.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
        safe_filename = f"{uuid.uuid4().hex}{ext}"
        file_path = cls.UPLOAD_DIR / safe_filename
        try:
            size = 0
            with file_path.open("wb") as f:
                while True:
                    chunk = await file.read(8192)
                    if not chunk:
                        break
                    size += len(chunk)
                    if size > cls.MAX_FILE_SIZE:
                        raise HTTPException(413, "File too large")
                    f.write(chunk)
        except Exception:
            # Remove the partial file before re-raising
            if file_path.exists():
                file_path.unlink()
            raise
        finally:
            await file.close()
        return safe_filename

@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    return {"filename": await FastAPISecureFileHandler.save_upload(file)}

@app.get("/download/{filename}")
async def download_file(filename: str):
    # UUID hex + approved extension
    if not re.fullmatch(r"[0-9a-f]{32}\.(pdf|txt|csv)", filename):
        raise HTTPException(400, "Invalid filename")
    file_path = FastAPISecureFileHandler.UPLOAD_DIR / filename
    if not file_path.is_file():
        raise HTTPException(404, "File not found")
    # TODO: enforce authorization (user/tenant ownership) here.
    return FileResponse(path=file_path, filename=filename, media_type="application/octet-stream")
Why this works:
- Files are stored under a fixed server-controlled directory using server-generated UUID filenames (no user-controlled paths).
- Uploads are constrained by an allowlist (size limit plus approved types/extensions), reducing abuse and DoS risk.
- Downloads only serve filenames matching a strict UUID+extension pattern, preventing path traversal via crafted names.
- Partial files are removed on error, avoiding accumulation of incomplete uploads.
- Authorization should still be enforced on downloads; unguessable names reduce risk but are not access control.
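The strict name pattern used by the download endpoint can be exercised on its own, without a running server. The following is a small standard-library sketch; `is_stored_name` is a hypothetical helper that wraps the same regular expression.

```python
import re
import uuid

# Same shape as the stored names: 32 hex characters plus an approved extension
STORED_NAME = re.compile(r"[0-9a-f]{32}\.(pdf|txt|csv)")


def is_stored_name(name: str) -> bool:
    """Return True only for names the upload handler could have generated."""
    return STORED_NAME.fullmatch(name) is not None


generated = f"{uuid.uuid4().hex}.pdf"
print(is_stored_name(generated))            # True
print(is_stored_name("../../etc/passwd"))   # False
print(is_stored_name(generated + "/../x"))  # False
```

Using fullmatch() rather than match() or search() matters here: the whole string must fit the pattern, so nothing can be appended after a valid-looking prefix.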
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
- Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
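- Code review: Confirm every file operation that handles user input uses one of the secure patterns (canonicalization with a containment check, allowlists, indirect references, or framework-safe APIs), with no string concatenation of paths or unchecked os.path.join() calls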
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced