CWE-73: External Control of File Name or Path - Python

Overview

External control of file names or paths is a critical vulnerability in Python applications. It occurs when user-supplied input is used to construct file system paths without proper validation. Python's file handling through open(), os.path, and pathlib is powerful but offers minimal built-in protection against path traversal.

Primary Defence: Canonicalize the path with Path.resolve() (which also resolves symlinks) and confirm with relative_to() that the result stays within the intended base directory. Where possible, go further: implement allowlists for known file sets, use UUID-based indirect reference maps for sensitive file access, and use Flask's send_from_directory() or sanitize filenames with Path(filename).name combined with extension validation. Together these prevent path traversal, absolute path injection, and symlink attacks.

Remediation Steps

Locate the Finding

Identify the Source (User Input)

Look for where untrusted data enters the application:
- Flask: request.args.get(), request.form[], request.json[]
- Django: request.GET[], request.POST[], request.FILES[]
- FastAPI: Function parameters with Query(), Path(), Form()
- Form fields: File upload filename attributes
- URL path parameters
- HTTP headers
- JSON/XML API payloads

Identify the Sink (File Operation)

Trace to where the data reaches file system operations:
- open(path), open(filename, 'r')
- Path(user_input).read_text(), Path().write_bytes()
- os.remove(), os.unlink()
- shutil.copy(), shutil.move()
- os.path.join(), os.path.exists()
- send_file(), send_from_directory() (Flask)
- FileResponse() (FastAPI)

Example from Security Scan Finding:

# Source: User-controlled filename from query parameter

@app.route('/download')
def download_file():
    filename = request.args.get('file')  # ← SOURCE

    # Sink: File operation using untrusted input
    with open(filename, 'rb') as f:  # ← SINK
        return f.read()

Understand the Data Flow

Review the data flow in the security finding:

Entry Point: Where user input enters (route handler, view function)
Intermediate Processing: Any transformations or validations (often insufficient)
File Operation: The vulnerable sink where file access occurs

Key Questions:

Is there any validation between source and sink?
Are there string operations that might be bypassable? (if '..' in path, .replace())
Does the path go through os.path.join() without validation?
Is there any path resolution with Path.resolve() or os.path.abspath()?
Are legacy os.path functions used instead of modern pathlib?
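
Two of the weak checks named in the questions above can be demonstrated with the standard library alone. The following is a minimal sketch; the helper names strip_traversal and naive_check are hypothetical and stand in for whatever string handling sits between source and sink.

```python
from urllib.parse import unquote


def strip_traversal(value: str) -> str:
    """Hypothetical 'sanitizer' that removes ../ in a single pass."""
    return value.replace("../", "")


def naive_check(value: str) -> bool:
    """Hypothetical denylist check: True means the value is accepted."""
    return ".." not in value


# Removing the inner "../" leaves a fresh "../" behind.
stripped = strip_traversal("....//etc/passwd")

# After one round of decoding, a double-encoded payload has no literal "..",
# so the check accepts it; a second decode later on restores "../".
once_decoded = unquote("%252e%252e%252fetc%252fpasswd")
accepted = naive_check(once_decoded)
twice_decoded = unquote(once_decoded)
```

Both results show why validation belongs on the resolved filesystem path rather than on the raw string.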

Common Data Flow Pattern:

@app.route('/read')  # ← Entry point
def read_file():
    filename = request.args['filename']  # ← Source

    # Insufficient validation (bypassable)
    if '..' in filename:
        abort(400, 'Invalid filename')

    # String concatenation for path
    path = '/app/data/' + filename  # ← Transformation

    with open(path) as f:  # ← Sink
        return f.read()

Identify the Pattern

Match the code to vulnerable patterns:

Absolute path injection → Pattern: open(user_input) where user_input = /etc/passwd
Directory traversal → Pattern: open(f'/uploads/{filename}') where filename = ../../etc/passwd
os.path.join with absolute path → Pattern: os.path.join('/data', '/etc/passwd') = /etc/passwd
Unsafe uploaded filename → Pattern: file.save(file.filename) without sanitization
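Denylist bypass → Pattern: if '..' in path (fails when the value is decoded again after the check, e.g. double-encoded %252e%252e%252f, and never inspects absolute paths or symlinks)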
String concatenation → Pattern: '/data/' + category + '/' + filename instead of proper joining
Not using Path.resolve() → Pattern: File operations without canonicalization
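
The joining behaviour behind the first three patterns can be checked directly. This sketch uses posixpath (the POSIX flavour of os.path) so the results do not depend on the host OS; the literal paths are illustrative only.

```python
import posixpath

# An absolute second argument discards everything before it.
joined_absolute = posixpath.join("/data", "/etc/passwd")

# Joining keeps ".." segments verbatim; the OS resolves them when the file is opened.
joined_traversal = posixpath.join("/uploads", "../../etc/passwd")

# normpath shows where that path points lexically (symlinks are not considered).
normalized = posixpath.normpath(joined_traversal)
```

Here joined_absolute is '/etc/passwd' and normalized is also '/etc/passwd', even though both calls started from a fixed base directory.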

Verify the Fix

Test with malicious inputs:

import pytest
from pathlib import Path

def test_path_traversal_prevention(tmp_path):
    """Test that malicious inputs are rejected."""
    # SecurePathValidator is the class from "Path Resolution with Ancestor Validation"
    validator = SecurePathValidator(str(tmp_path))

    attacks = [
        '../../../../etc/passwd',           # Unix traversal
        '..\\..\\..\\Windows\\win.ini',   # Windows traversal
        '/etc/passwd',                      # Absolute Unix path
        'C:\\Windows\\System32\\config\\SAM', # Absolute Windows path
        '..%2f..%2fetc%2fpasswd',          # URL-encoded
        '....//....//etc/passwd',          # Double-dot attempt
        'file.txt\x00.jpg',                # Null byte injection
    ]

    for attack in attacks:
        # Escapes raise ValueError (from relative_to() or an embedded null byte);
        # targets that do not exist raise FileNotFoundError, a subclass of OSError.
        with pytest.raises((ValueError, OSError)):
            validator.validate_existing_path(attack)

def test_valid_paths(tmp_path):
    """Test that valid paths are allowed."""
    # Create test files
    (tmp_path / 'allowed.txt').write_text('content')
    (tmp_path / 'subdir').mkdir()
    (tmp_path / 'subdir' / 'nested.txt').write_text('nested')

    validator = SecurePathValidator(str(tmp_path))

    valid_paths = ['allowed.txt', 'subdir/nested.txt']

    for path in valid_paths:
        result = validator.validate_existing_path(path)
        assert result.exists()
        # Compare against the resolved base; tmp_path itself may contain symlinks
        assert result.is_relative_to(tmp_path.resolve())

Manual Testing:

Try accessing: /download?file=../../../../etc/passwd
Try accessing: /download?file=/etc/shadow
Try uploading file named: ../../app.py
Try accessing: /download?file=..%2F..%2Fsettings.py (URL-encoded)
Test with symlinks pointing outside base directory
Verify legitimate files are still accessible
Test on both Windows and Linux if cross-platform
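
The symlink case in the list above is easy to miss by hand, so it can help to script it. The following is a sketch under stated assumptions: is_contained and run_symlink_probe are hypothetical helpers, the directory layout is invented for the probe, and the platform must allow creating symlinks.

```python
import os
import tempfile
from pathlib import Path


def is_contained(base_dir: Path, user_path: str) -> bool:
    """Return True only if user_path resolves to an existing path under base_dir."""
    base = base_dir.resolve(strict=True)
    try:
        real = (base / user_path).resolve(strict=True)
        real.relative_to(base)
    except (ValueError, OSError, RuntimeError):
        # ValueError: outside base; OSError: missing target; RuntimeError: symlink loop
        return False
    return True


def run_symlink_probe() -> dict:
    """Build a throwaway directory tree and probe it with a symlink escape."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        base = root / "base"
        outside = root / "outside"
        base.mkdir()
        outside.mkdir()
        (base / "ok.txt").write_text("public")
        (outside / "secret.txt").write_text("private")
        # Symlink that lives inside the base directory but points outside it.
        os.symlink(outside / "secret.txt", base / "link.txt")
        return {
            "regular": is_contained(base, "ok.txt"),
            "symlink": is_contained(base, "link.txt"),
            "traversal": is_contained(base, "../outside/secret.txt"),
        }
```

Only the regular file should be reported as contained; the symlink and the traversal path both resolve outside the base directory.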

Check for Similar Issues

Search your codebase for other vulnerable patterns:

VS Code / PyCharm Search Patterns (Regex):

open\(
Path\(.*\)\.read
Path\(.*\)\.write
os\.path\.join
send_file\(
send_from_directory\(
request\.args\.get.*file
request\.form.*file
request\.GET.*file
request\.POST.*path
file\.filename
@app\.route.*download
@app\.route.*file
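
The same patterns can be run outside an editor, for example in a pre-commit check. This is a rough sketch: find_risky_lines and SAMPLE are hypothetical names, and every hit still needs manual review because patterns such as open\( also match harmless calls.

```python
import re

# The search patterns from the list above, compiled once.
RISKY_PATTERNS = [re.compile(p) for p in (
    r"open\(",
    r"Path\(.*\)\.read",
    r"Path\(.*\)\.write",
    r"os\.path\.join",
    r"send_file\(",
    r"send_from_directory\(",
    r"request\.args\.get.*file",
    r"request\.form.*file",
    r"request\.GET.*file",
    r"request\.POST.*path",
    r"file\.filename",
    r"@app\.route.*download",
    r"@app\.route.*file",
)]


def find_risky_lines(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line matching any pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in RISKY_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


SAMPLE = """\
@app.route('/download')
def download():
    name = request.args.get('file')
    return open(name).read()

def add(a, b):
    return a + b
"""

hits = find_risky_lines(SAMPLE)
```

For the sample text, the route decorator, the request.args.get() call, and the open() call are flagged; the unrelated add() function is not.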

Common locations to check:

All Flask routes with file-related parameters
Django views handling file operations
FastAPI endpoints with file parameters
File upload handlers
Static file serving configuration
Document download endpoints
Report generation functionality
Backup/restore features
Log file access views
Template file handling
Configuration file loading from user input

Review code patterns like:

# Pattern 1: Direct file parameter in open()

@app.route('/read')
def read():
    return open(request.args['file']).read()

# Pattern 2: String concatenation for paths

path = '/data/' + user_input

# Pattern 3: Insufficient validation

if '..' in filename:
    abort(400)

# Pattern 4: Using uploaded filename directly

file.save(file.filename)

# Pattern 5: os.path.join without validation

path = os.path.join(base_dir, user_input)
open(path)

Flask specific areas:

Routes using send_file() or send_from_directory()
Static folder configuration
Custom file serving middleware
Template rendering with file paths
app.config['UPLOAD_FOLDER'] usage

Django specific areas:

Views using FileResponse or HttpResponse with files
MEDIA_ROOT and MEDIA_URL configuration
File upload handlers in forms
Static file serving in urls.py
Template tags loading files
Management commands accessing files

FastAPI specific areas:

Endpoints returning FileResponse
UploadFile parameter handling
Static file mounting with StaticFiles
Background tasks accessing files

Common Vulnerable Patterns

Direct Use of User Input in open()

# VULNERABLE - No validation of user-supplied filename

from flask import request

@app.route('/download')
def download_file():
    filename = request.args.get('file')
    with open(filename, 'rb') as f:
        return f.read()

# Attack example:

# GET /download?file=../../../../etc/passwd

# Result: Reads /etc/passwd from the server

Insufficient String-Based Validation

# VULNERABLE - Denylist can be bypassed
@app.route('/read')
def read_file():
    filename = request.args['filename']

    # Incomplete validation
    if '..' in filename:
        abort(400, 'Invalid filename')

    with open(f'/app/data/{filename}') as f:
        return f.read()

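# Attack examples:
# GET /read?filename=%252e%252e%252fetc%252fpasswd
# Result: if any later layer decodes the value again, the double-encoded ".." passes the check
# A symlink under /app/data that points elsewhere also passes, because nothing is resolved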

Using os.path.join with Absolute Paths

# VULNERABLE - os.path.join allows absolute paths
import os

@app.route('/file')
def get_file():
    # Taken from the query string; Flask's <path:> converter would not match a leading slash
    filename = request.args['name']
    # os.path.join discards earlier components when a later one is absolute
    full_path = os.path.join('/app/data/', filename)
    with open(full_path) as f:
        return f.read()

# Attack example:
# GET /file?name=/etc/passwd
# Result: os.path.join('/app/data/', '/etc/passwd') = '/etc/passwd'

Trusting Uploaded Filenames

# VULNERABLE - Using the client-supplied filename without sanitization
import os

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    # file.filename comes straight from the multipart request
    file.save(os.path.join('/uploads', file.filename))
    return 'Uploaded'

# Attack example:
# Upload a file whose multipart filename is ../app/app.py
# Result: os.path.join('/uploads', '../app/app.py') writes to /app/app.py
# Note: werkzeug's secure_filename() strips path components, but on its own it
# does not validate extensions or prevent overwriting an existing file

String Concatenation for Paths

# VULNERABLE - String concat allows injection
@app.route('/delete')
def delete_file():
    filename = request.args['file']
    filepath = '/app/temp/' + filename
    os.remove(filepath)
    return 'Deleted'

# Attack example:
# GET /delete?file=../../app.py
# Result: Deletes /app/app.py

Secure Patterns

Allowlist with pathlib

from pathlib import Path

class SecureFileService:
    def __init__(self, base_dir="/app/data"):
        self.base_dir = Path(base_dir).resolve()
        self.allowed_files = {"report.pdf", "summary.txt", "data.csv"}

    def read_file(self, filename: str) -> bytes:
        if filename not in self.allowed_files:
            raise PermissionError("File not allowed")

        candidate = (self.base_dir / filename)

        # Resolve symlinks and require existence
        real = candidate.resolve(strict=True)

        # Prevent symlink escapes (Python 3.9+)
        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real.read_bytes()

Why this works:

Only exact allowlisted basenames are permitted (no user-controlled paths).
The file is resolved to its real path and verified to remain under the trusted base directory.
Only regular files are read.

Path Resolution with Ancestor Validation (Flexible and Secure)

from pathlib import Path

class SecurePathValidator:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve(strict=True)

    def validate_existing_path(self, user_path: str) -> Path:
        if not user_path:
            raise ValueError("Missing path")

        candidate = (self.base_dir / user_path)

        # Require existence so symlink resolution is meaningful
        real = candidate.resolve(strict=True)

        # Ensure the real target is within the base directory
        real.relative_to(self.base_dir)

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real


# Usage:

validator = SecurePathValidator('/app/data')
safe_path = validator.validate_existing_path(user_input)
content = safe_path.read_text()

Why this works:

The base directory is resolved to a real, canonical path before any validation.
User input is resolved to a real filesystem path, collapsing . and .. components and resolving symlinks (for existing paths).
relative_to() enforces that the resolved target is a descendant of the trusted base directory.
Validation is performed on the resolved filesystem path rather than raw user input, preventing traversal via path syntax.
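
The validator above requires the target to exist, which suits reads but not writes to a new file. One possible extension, sketched here under stated assumptions, resolves the parent directory instead. validate_new_path is a hypothetical helper that is not part of the class above, and the check is still subject to a race between validation and use.

```python
from pathlib import Path


def validate_new_path(base_dir: Path, user_path: str) -> Path:
    """Return a destination under base_dir for a file that may not exist yet."""
    base = base_dir.resolve(strict=True)
    candidate = base / user_path

    name = candidate.name
    if not name or name in {".", ".."}:
        raise ValueError("Missing or invalid file name")

    # The parent directory must already exist so its symlinks can be resolved.
    parent = candidate.parent.resolve(strict=True)
    parent.relative_to(base)  # raises ValueError if the parent escapes base

    target = parent / name
    # Refuse to write through an existing symlink at the final component.
    if target.is_symlink():
        raise ValueError("Destination is a symlink")
    return target
```

A caller would create the file only after this returns, ideally with an exclusive-create mode such as open(target, 'xb') so an existing file is never overwritten.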

UUID-Based Indirect References

# SECURE - Maps tokens to actual file paths

import uuid
from pathlib import Path
from typing import Dict, Optional

class SecureFileRegistry:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve()
        self.registry: Dict[uuid.UUID, Path] = {}

    def register_file(self, internal_path: str) -> uuid.UUID:
        """Register a file and return access token."""
        file_path = (self.base_dir / internal_path).resolve()

        # Validate file is within base directory
        try:
            file_path.relative_to(self.base_dir)
        except ValueError:
            raise ValueError(f'Invalid file path: {internal_path}')

        if not file_path.is_file():
            raise FileNotFoundError(f'File not found: {internal_path}')

        # Generate unique token
        token = uuid.uuid4()
        self.registry[token] = file_path
        return token

    def get_file(self, token: uuid.UUID) -> bytes:
        file_path = self.registry.get(token)
        if not file_path:
            raise FileNotFoundError("Invalid file token")

        real = file_path.resolve(strict=True)

        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real.read_bytes()


# Usage:

registry = SecureFileRegistry('/app/data')
token = registry.register_file('reports/2024/q1.pdf')

# Return token to user, they can only access via this token

content = registry.get_file(token)

Why this works:

Users interact only with opaque tokens, not filesystem paths.
Tokens map to server-validated, canonical file paths under a trusted base directory.
Path traversal is prevented by enforcing containment during registration.
Because access uses server-controlled mappings, user input cannot influence path resolution at read time.
When combined with proper filesystem permissions and token scoping, this significantly reduces file access risk.
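
In a web handler the token arrives as a string, while the registry above is keyed by uuid.UUID objects. A minimal sketch of the conversion step follows; parse_token is a hypothetical helper, not part of the registry class.

```python
import uuid
from typing import Optional


def parse_token(raw: str) -> Optional[uuid.UUID]:
    """Return a UUID for well-formed input, or None for anything else."""
    try:
        return uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        # Malformed strings raise ValueError; non-string input raises the others.
        return None
```

Anything that is not a well-formed UUID, including a path-like string, is rejected before the registry is consulted, so a handler can respond with 404 without touching the filesystem.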

Filename Sanitization

# Filename sanitization is an input cleanup step, not a security boundary.

import re
from pathlib import Path
from typing import Set

class SecureFilenameHandler:
    ALLOWED_EXTENSIONS: Set[str] = {'.pdf', '.txt', '.csv', '.xlsx'}

    @staticmethod
    def sanitize_filename(filename: str) -> str:
        """
        Sanitize filename to prevent path traversal.

        Raises:
            ValueError: If filename is invalid or has forbidden extension
        """
        if not filename or not filename.strip():
            raise ValueError('Filename cannot be empty')

        # Get just the filename, removing any path components
        clean_name = Path(filename).name

        # Remove dangerous characters
        clean_name = re.sub(r'[^a-zA-Z0-9._-]', '_', clean_name)

        # Validate extension
        extension = Path(clean_name).suffix.lower()
        if extension not in SecureFilenameHandler.ALLOWED_EXTENSIONS:
            raise ValueError(f'File type not allowed: {extension}')

        return clean_name

# Usage:

safe_filename = SecureFilenameHandler.sanitize_filename(user_input)
file_path = Path('/uploads') / safe_filename

Why this works:

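Path(filename).name discards directory components as defined by the host OS, so the result is a simple filename. On POSIX, backslashes are not separators; the character allowlist then replaces them with underscores.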
Character allowlisting reduces problematic characters for filesystem use and logging.
Extension allowlisting enforces a restricted set of accepted file types.
This provides filename hygiene only; real-path containment and safe file handling must still be enforced when accessing the filesystem.
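
How Path(filename).name treats separators depends on the platform, which the pure path classes make visible on any OS. In this sketch, basename_any_os is a hypothetical helper, shown as one way to strip both separator styles before the character allowlist runs.

```python
import re
from pathlib import PurePosixPath, PureWindowsPath

payload = "..\\..\\evil.txt"

# On POSIX, a backslash is an ordinary character, so the whole string is the name.
posix_name = PurePosixPath(payload).name

# On Windows, backslashes are separators and only the last component remains.
windows_name = PureWindowsPath(payload).name


def basename_any_os(filename: str) -> str:
    """Keep only the text after the last forward slash or backslash."""
    return re.split(r"[\\/]", filename)[-1]
```

The helper can still return names such as "..", so the extension check and the real-path containment check remain necessary.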

Framework-Specific Guidance

Django - Secure File Handling

# SECURE - Django file upload and serving

from django.core.files.storage import FileSystemStorage
from django.core.exceptions import SuspiciousFileOperation
from django.utils.text import get_valid_filename
from pathlib import PurePosixPath

class SecureFileStorage(FileSystemStorage):
    def generate_filename(self, filename):
        # Django expects POSIX-style paths for storage names (even on Windows)
        p = PurePosixPath(filename)

        if p.is_absolute() or ".." in p.parts:
            raise SuspiciousFileOperation("Invalid upload path")

        # Clean only the final component; keep upload_to subdirs
        safe_name = get_valid_filename(p.name)
        return str(p.with_name(safe_name))

# In models.py:

from django.db import models

class Document(models.Model):
    file = models.FileField(
        upload_to='documents/%Y/%m/',
        storage=SecureFileStorage()
    )
    uploaded_at = models.DateTimeField(auto_now_add=True)

# Secure file serving view:

from django.http import FileResponse, Http404
from django.views import View
from pathlib import Path

class SecureFileDownloadView(View):
    def get(self, request, file_id):
        try:
            doc = Document.objects.get(id=file_id)
        except Document.DoesNotExist:
            raise Http404("File not found")

        # TODO: enforce authorization (owner/role/tenant checks) here.

        fh = doc.file.open("rb")
        return FileResponse(
            fh,
            as_attachment=True,
            filename=Path(doc.file.name).name,  # storage name, not local path
        )


# settings.py:

MEDIA_ROOT = '/var/app/media/'
MEDIA_URL = '/media/'

Why this works:

Upload names are validated during Django’s filename generation, rejecting absolute paths and .. segments.
Filenames are cleaned using Django utilities, reducing unsafe characters in stored names.
Files are accessed through Django’s storage API (doc.file.open()), avoiding direct filesystem path handling in views.
Downloads should enforce authorization on the model object to prevent IDOR (streaming alone is not access control).

Flask - Secure File Operations

import uuid
from pathlib import Path
from flask import Flask, request, abort, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)

UPLOAD_FOLDER = Path("/var/app/uploads").resolve()
ALLOWED_EXTENSIONS = {"pdf", "txt", "csv", "xlsx"}

def allowed_file(filename: str) -> bool:
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

def save_upload(file) -> str:
    if not file or not file.filename:
        raise ValueError("No file provided")

    original = secure_filename(file.filename)
    if not original or not allowed_file(original):
        raise ValueError("Invalid file type")

    ext = original.rsplit(".", 1)[1].lower()
    stored = f"{uuid.uuid4().hex}.{ext}"   # server-controlled name

    UPLOAD_FOLDER.mkdir(parents=True, exist_ok=True)
    dest = UPLOAD_FOLDER / stored

    # Avoid overwrites; fail if exists
    if dest.exists():
        raise ValueError("Upload collision")

    file.save(str(dest))
    return stored

@app.route("/upload", methods=["POST"])
def upload_file():
    f = request.files.get("file")
    if not f:
        abort(400, "No file provided")
    try:
        stored_name = save_upload(f)
        return {"file": stored_name}, 200
    except ValueError as e:
        abort(400, str(e))

@app.route("/download/<path:filename>")
def download_file(filename):
    # TODO: enforce authorization here (token/user/tenant checks)

    return send_from_directory(
        UPLOAD_FOLDER,
        filename,
        as_attachment=True,
        download_name=filename,  # Flask>=2.0; optional
    )

Why this works:

Uploads are stored under a fixed, server-controlled directory with server-generated filenames (no user-controlled paths).
Filenames are sanitized and restricted to an extension allowlist (policy enforcement).
Downloads use send_from_directory(), which applies Werkzeug’s safe_join() to prevent path traversal outside the upload directory.
Authorization should be enforced on download requests; safe path handling alone does not prevent IDOR.

FastAPI - Async Secure File Handling

import re
import uuid
from pathlib import Path
from fastapi import FastAPI, UploadFile, HTTPException, File
from fastapi.responses import FileResponse

app = FastAPI()

class FastAPISecureFileHandler:
    UPLOAD_DIR = Path("/var/app/uploads").resolve()
    MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB

    # Keep both; content_type is advisory, extension is policy
    ALLOWED = {
        "application/pdf": ".pdf",
        "text/plain": ".txt",
        "text/csv": ".csv",
    }

    @classmethod
    async def save_upload(cls, file: UploadFile) -> str:
        ext = cls.ALLOWED.get(file.content_type)
        if not ext:
            raise HTTPException(400, f"File type not allowed: {file.content_type}")

        cls.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)

        safe_filename = f"{uuid.uuid4().hex}{ext}"
        file_path = cls.UPLOAD_DIR / safe_filename

        try:
            size = 0
            with file_path.open("wb") as f:
                while True:
                    chunk = await file.read(8192)
                    if not chunk:
                        break
                    size += len(chunk)
                    if size > cls.MAX_FILE_SIZE:
                        raise HTTPException(413, "File too large")
                    f.write(chunk)
        except Exception:
            if file_path.exists():
                file_path.unlink()
            raise
        finally:
            await file.close()

        return safe_filename

@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    return {"filename": await FastAPISecureFileHandler.save_upload(file)}

@app.get("/download/{filename}")
async def download_file(filename: str):
    # UUID hex + approved extension
    if not re.fullmatch(r"[0-9a-f]{32}\.(pdf|txt|csv)", filename):
        raise HTTPException(400, "Invalid filename")

    file_path = FastAPISecureFileHandler.UPLOAD_DIR / filename
    if not file_path.is_file():
        raise HTTPException(404, "File not found")

    # TODO: enforce authorization (user/tenant ownership) here.

    return FileResponse(path=file_path, filename=filename, media_type="application/octet-stream")

Why this works:

Files are stored under a fixed server-controlled directory using server-generated UUID filenames (no user-controlled paths).
Uploads are constrained by a size limit and an allowlist of approved types/extensions, reducing abuse and DoS risk.
Downloads only serve filenames matching a strict UUID+extension pattern, preventing path traversal via crafted names.
Partial files are removed on error, avoiding accumulation of incomplete uploads.
Authorization should still be enforced on downloads; unguessable names reduce risk but are not access control.

Verification

After implementing the recommended secure patterns, verify the fix through multiple approaches:

Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
Code review: Confirm every file operation that receives user input uses a secure pattern (canonicalization with a containment check, an allowlist, or an indirect reference), with no string concatenation of paths or other unsafe operations
Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
