Skip to content

CWE-73: External Control of File Name or Path - Python

Overview

External control of file names or paths is a critical vulnerability in Python applications that occurs when user-supplied input is used to construct file system paths without proper validation. Python's flexible file handling with open(), os.path, and pathlib modules provides powerful capabilities but minimal built-in protection against path traversal.

Primary Defence: Use Path.resolve() with relative_to() validation to ensure canonicalized paths (with symlinks resolved) stay within the intended base directory, implement allowlists for known file sets, use UUID-based indirect reference maps for sensitive file access, and use Flask's send_from_directory() or sanitize filenames with Path(filename).name combined with extension validation to prevent path traversal, absolute path injection, and symlink attacks.

Remediation Steps

Locate the Finding

Identify the Source (User Input)

  • Look for where untrusted data enters the application:
    • Flask: request.args.get(), request.form[], request.json[]
    • Django: request.GET[], request.POST[], request.FILES[]
    • FastAPI: Function parameters with Query(), Path(), Form()
    • Form fields: File upload filename attributes
    • URL path parameters
    • HTTP headers
    • JSON/XML API payloads

Identify the Sink (File Operation)

  • Trace to where the data reaches file system operations:
    • open(path), open(filename, 'r')
    • Path(user_input).read_text(), Path().write_bytes()
    • os.remove(), os.unlink()
    • shutil.copy(), shutil.move()
    • os.path.join(), os.path.exists()
    • send_file(), send_from_directory() (Flask)
    • FileResponse() (FastAPI)

Example from Security Scan Finding:

# Source: User-controlled filename from query parameter

@app.route('/download')
def download_file():
    filename = request.args.get('file')  # ← SOURCE

    # Sink: File operation using untrusted input
    with open(filename, 'rb') as f:  # ← SINK
        return f.read()

Understand the Data Flow

Review the data flow in the security finding:

  1. Entry Point: Where user input enters (route handler, view function)
  2. Intermediate Processing: Any transformations or validations (often insufficient)
  3. File Operation: The vulnerable sink where file access occurs

Key Questions:

  • Is there any validation between source and sink?
  • Are there string operations that might be bypassable? (if '..' in path, .replace())
  • Does the path go through os.path.join() without validation?
  • Is there any path resolution with Path.resolve() or os.path.abspath()?
  • Are legacy os.path functions used instead of modern pathlib?

Common Data Flow Pattern:

@app.route('/read')  # ← Entry point
def read_file():
    filename = request.args['filename']  # ← Source

    # Insufficient validation (bypassable)
    if '..' in filename:
        abort(400, 'Invalid filename')

    # String concatenation for path
    path = '/app/data/' + filename  # ← Transformation

    with open(path) as f:  # ← Sink
        return f.read()

Identify the Pattern

Match the code to vulnerable patterns:

  • Absolute path injection → Pattern: open(user_input) where user_input = /etc/passwd
  • Directory traversal → Pattern: open(f'/uploads/{filename}') where filename = ../../etc/passwd
  • os.path.join with absolute path → Pattern: os.path.join('/data', '/etc/passwd') = /etc/passwd
  • Unsafe uploaded filename → Pattern: file.save(file.filename) without sanitization
  • Denylist bypass → Pattern: if '..' in path (bypassable with URL encoding: ..%2F)
  • String concatenation → Pattern: '/data/' + category + '/' + filename instead of proper joining
  • Not using Path.resolve() → Pattern: File operations without canonicalization

Verify the Fix

Test with malicious inputs:

import pytest
from pathlib import Path

def test_path_traversal_prevention(tmp_path):
    """Test that malicious inputs are rejected."""
    validator = SecurePathValidator(str(tmp_path))

    attacks = [
        '../../../../etc/passwd',           # Unix traversal
        '..\\..\\..\\Windows\\win.ini',   # Windows traversal
        '/etc/passwd',                      # Absolute Unix path
        'C:\\Windows\\System32\\config\\SAM', # Absolute Windows path
        '..%2f..%2fetc%2fpasswd',          # URL-encoded
        '....//....//etc/passwd',          # Double-dot attempt
        'file.txt\x00.jpg',                # Null byte injection
    ]

    for attack in attacks:
        with pytest.raises(ValueError, match='Path traversal'):
            validator.validate_path(attack)

def test_valid_paths(tmp_path):
    """Test that valid paths are allowed."""
    # Create test files
    (tmp_path / 'allowed.txt').write_text('content')
    (tmp_path / 'subdir').mkdir()
    (tmp_path / 'subdir' / 'nested.txt').write_text('nested')

    validator = SecurePathValidator(str(tmp_path))

    valid_paths = ['allowed.txt', 'subdir/nested.txt']

    for path in valid_paths:
        result = validator.validate_path(path)
        assert result.exists()
        assert result.is_relative_to(tmp_path)

Manual Testing:

  1. Try accessing: /download?file=../../../../etc/passwd
  2. Try accessing: /download?file=/etc/shadow
  3. Try uploading file named: ../../app.py
  4. Try accessing: /download?file=..%2F..%2Fsettings.py (URL-encoded)
  5. Test with symlinks pointing outside base directory
  6. Verify legitimate files are still accessible
  7. Test on both Windows and Linux if cross-platform

Check for Similar Issues

Search your codebase for other vulnerable patterns:

VS Code / PyCharm Search Patterns (Regex):

open\(
Path\(.*\)\.read
Path\(.*\)\.write
os\.path\.join
send_file\(
send_from_directory\(
request\.args\.get.*file
request\.form.*file
request\.GET.*file
request\.POST.*path
file\.filename
@app\.route.*download
@app\.route.*file

Common locations to check:

  • All Flask routes with file-related parameters
  • Django views handling file operations
  • FastAPI endpoints with file parameters
  • File upload handlers
  • Static file serving configuration
  • Document download endpoints
  • Report generation functionality
  • Backup/restore features
  • Log file access views
  • Template file handling
  • Configuration file loading from user input

Review code patterns like:

# Pattern 1: Direct file parameter in open()

@app.route('/read')
def read():
    return open(request.args['file']).read()

# Pattern 2: String concatenation for paths

path = '/data/' + user_input

# Pattern 3: Insufficient validation

if '..' in filename:
    abort(400)

# Pattern 4: Using uploaded filename directly

file.save(file.filename)

# Pattern 5: os.path.join without validation

path = os.path.join(base_dir, user_input)
open(path)

Flask specific areas:

  • Routes using send_file() or send_from_directory()
  • Static folder configuration
  • Custom file serving middleware
  • Template rendering with file paths
  • app.config['UPLOAD_FOLDER'] usage

Django specific areas:

  • Views using FileResponse or HttpResponse with files
  • MEDIA_ROOT and MEDIA_URL configuration
  • File upload handlers in forms
  • Static file serving in urls.py
  • Template tags loading files
  • Management commands accessing files

FastAPI specific areas:

  • Endpoints returning FileResponse
  • UploadFile parameter handling
  • Static file mounting with StaticFiles
  • Background tasks accessing files

Common Vulnerable Patterns

Direct Use of User Input in open()

# VULNERABLE - No validation of user-supplied filename

from flask import request

@app.route('/download')
def download_file():
    filename = request.args.get('file')
    with open(filename, 'rb') as f:
        return f.read()

# Attack example:

# GET /download?file=../../../../etc/passwd

# Result: Reads /etc/passwd from the server

Insufficient String-Based Validation

# VULNERABLE - Denylist can be bypassed
@app.route('/read')
def read_file():
    filename = request.args['filename']

    # Incomplete validation
    if '..' in filename:
        abort(400, 'Invalid filename')

    with open(f'/app/data/{filename}') as f:
        return f.read()

# Attack example:
# GET /read?filename=..%2F..%2Fetc%2Fpasswd
# Result: URL-encoded ".." bypasses the check

Using os.path.join with Absolute Paths

# VULNERABLE - os.path.join allows absolute paths
import os

@app.route('/file/<path:filename>')
def get_file(filename):
    # os.path.join ignores first arg if second is absolute
    full_path = os.path.join('/app/data/', filename)
    with open(full_path) as f:
        return f.read()

# Attack example:
# GET /file//etc/passwd
# Result: os.path.join('/app/data/', '/etc/passwd') = '/etc/passwd'

Trusting Uploaded Filenames

# VULNERABLE - Using original filename without sanitization
from werkzeug.utils import secure_filename

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    # secure_filename helps but doesn't prevent all attacks
    filename = secure_filename(file.filename)
    file.save(os.path.join('/uploads', filename))
    return 'Uploaded'

# Attack example:
# Upload file with path components in multipart name
# Still vulnerable to various encoding attacks

String Concatenation for Paths

# VULNERABLE - String concat allows injection
@app.route('/delete')
def delete_file():
    filename = request.args['file']
    filepath = '/app/temp/' + filename
    os.remove(filepath)
    return 'Deleted'

# Attack example:
# GET /delete?file=../../app.py
# Result: Deletes /app/app.py

Secure Patterns

Allowlist with pathlib

from pathlib import Path

class SecureFileService:
    def __init__(self, base_dir="/app/data"):
        self.base_dir = Path(base_dir).resolve()
        self.allowed_files = {"report.pdf", "summary.txt", "data.csv"}

    def read_file(self, filename: str) -> bytes:
        if filename not in self.allowed_files:
            raise PermissionError("File not allowed")

        candidate = (self.base_dir / filename)

        # Resolve symlinks and require existence
        real = candidate.resolve(strict=True)

        # Prevent symlink escapes (Python 3.9+)
        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real.read_bytes()

Why this works:

  • Only exact allowlisted basenames are permitted (no user-controlled paths).
  • The file is resolved to its real path and verified to remain under the trusted base directory.
  • Only regular files are read.

Path Resolution with Ancestor Validation (Flexible and Secure)

from pathlib import Path

class SecurePathValidator:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve(strict=True)

    def validate_existing_path(self, user_path: str) -> Path:
        if not user_path:
            raise ValueError("Missing path")

        candidate = (self.base_dir / user_path)

        # Require existence so symlink resolution is meaningful
        real = candidate.resolve(strict=True)

        # Ensure the real target is within the base directory
        real.relative_to(self.base_dir)

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real


# Usage:

validator = SecurePathValidator('/app/data')
safe_path = validator.validate_path(user_input)
content = safe_path.read_text()

Why this works:

  • The base directory is resolved to a real, canonical path before any validation.
  • User input is resolved to a real filesystem path, collapsing ./.. components and resolving symlinks (for existing paths).
  • relative_to() enforces that the resolved target is a descendant of the trusted base directory.
  • Validation is performed on the resolved filesystem path rather than raw user input, preventing traversal via path syntax.

UUID-Based Indirect References

# SECURE - Maps tokens to actual file paths

import uuid
from pathlib import Path
from typing import Dict, Optional

class SecureFileRegistry:
    def __init__(self, base_directory: str):
        self.base_dir = Path(base_directory).resolve()
        self.registry: Dict[uuid.UUID, Path] = {}

    def register_file(self, internal_path: str) -> uuid.UUID:
        """Register a file and return access token."""
        file_path = (self.base_dir / internal_path).resolve()

        # Validate file is within base directory
        try:
            file_path.relative_to(self.base_dir)
        except ValueError:
            raise ValueError(f'Invalid file path: {internal_path}')

        if not file_path.is_file():
            raise FileNotFoundError(f'File not found: {internal_path}')

        # Generate unique token
        token = uuid.uuid4()
        self.registry[token] = file_path
        return token

    def get_file(self, token: uuid.UUID) -> bytes:
        file_path = self.registry.get(token)
        if not file_path:
            raise FileNotFoundError("Invalid file token")

        real = file_path.resolve(strict=True)

        if not real.is_relative_to(self.base_dir):
            raise PermissionError("Invalid file location")

        if not real.is_file():
            raise FileNotFoundError("Not a file")

        return real.read_bytes()


# Usage:

registry = SecureFileRegistry('/app/data')
token = registry.register_file('reports/2024/q1.pdf')

# Return token to user, they can only access via this token

content = registry.get_file(token)

Why this works:

  • Users interact only with opaque tokens, not filesystem paths.
  • Tokens map to server-validated, canonical file paths under a trusted base directory.
  • Path traversal is prevented by enforcing containment during registration.
  • Because access uses server-controlled mappings, user input cannot influence path resolution at read time.
  • When combined with proper filesystem permissions and token scoping, this significantly reduces file access risk.

Filename Sanitization

# Filename sanitization is an input cleanup step, not a security boundary.

import re
from pathlib import Path
from typing import Set

class SecureFilenameHandler:
    ALLOWED_EXTENSIONS: Set[str] = {'.pdf', '.txt', '.csv', '.xlsx'}

    @staticmethod
    def sanitize_filename(filename: str) -> str:
        """
        Sanitize filename to prevent path traversal.

        Raises:
            ValueError: If filename is invalid or has forbidden extension
        """
        if not filename or not filename.strip():
            raise ValueError('Filename cannot be empty')

        # Get just the filename, removing any path components
        clean_name = Path(filename).name

        # Remove dangerous characters
        clean_name = re.sub(r'[^a-zA-Z0-9._-]', '_', clean_name)

        # Validate extension
        extension = Path(clean_name).suffix.lower()
        if extension not in SecureFilenameHandler.ALLOWED_EXTENSIONS:
            raise ValueError(f'File type not allowed: {extension}')

        return clean_name

# Usage:

safe_filename = SecureFilenameHandler.sanitize_filename(user_input)
file_path = Path('/uploads') / safe_filename

Why this works:

  • Path(filename).name discards all directory components, ensuring the result is a simple filename.
  • Character allowlisting reduces problematic characters for filesystem use and logging.
  • Extension allowlisting enforces a restricted set of accepted file types.
  • This provides filename hygiene only; real-path containment and safe file handling must still be enforced when accessing the filesystem.

Framework-Specific Guidance

Django - Secure File Handling

# SECURE - Django file upload and serving

from django.core.files.storage import FileSystemStorage
from django.core.exceptions import SuspiciousFileOperation
from django.utils.text import get_valid_filename
from pathlib import PurePosixPath

class SecureFileStorage(FileSystemStorage):
    def generate_filename(self, filename):
        # Django expects POSIX-style paths for storage names (even on Windows)
        p = PurePosixPath(filename)

        if p.is_absolute() or ".." in p.parts:
            raise SuspiciousFileOperation("Invalid upload path")

        # Clean only the final component; keep upload_to subdirs
        safe_name = get_valid_filename(p.name)
        return str(p.with_name(safe_name))

# In models.py:

from django.db import models

class Document(models.Model):
    file = models.FileField(
        upload_to='documents/%Y/%m/',
        storage=SecureFileStorage()
    )
    uploaded_at = models.DateTimeField(auto_now_add=True)

# Secure file serving view:

from django.http import FileResponse, Http404
from django.views import View
from pathlib import Path

class SecureFileDownloadView(View):
    def get(self, request, file_id):
        try:
            doc = Document.objects.get(id=file_id)
        except Document.DoesNotExist:
            raise Http404("File not found")

        # TODO: enforce authorization (owner/role/tenant checks) here.

        fh = doc.file.open("rb")
        return FileResponse(
            fh,
            as_attachment=True,
            filename=Path(doc.file.name).name,  # storage name, not local path
        )


# settings.py:

MEDIA_ROOT = '/var/app/media/'
MEDIA_URL = '/media/'

Why this works:

  • Upload names are validated during Django’s filename generation, rejecting absolute paths and .. segments.
  • Filenames are cleaned using Django utilities, reducing unsafe characters in stored names.
  • Files are accessed through Django’s storage API (doc.file.open()), avoiding direct filesystem path handling in views.
  • Downloads should enforce authorization on the model object to prevent IDOR (streaming alone is not access control).

Flask - Secure File Operations

import uuid
from pathlib import Path
from flask import Flask, request, abort, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)

UPLOAD_FOLDER = Path("/var/app/uploads").resolve()
ALLOWED_EXTENSIONS = {"pdf", "txt", "csv", "xlsx"}

def allowed_file(filename: str) -> bool:
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS

def save_upload(file) -> str:
    if not file or not file.filename:
        raise ValueError("No file provided")

    original = secure_filename(file.filename)
    if not original or not allowed_file(original):
        raise ValueError("Invalid file type")

    ext = original.rsplit(".", 1)[1].lower()
    stored = f"{uuid.uuid4().hex}.{ext}"   # server-controlled name

    UPLOAD_FOLDER.mkdir(parents=True, exist_ok=True)
    dest = UPLOAD_FOLDER / stored

    # Avoid overwrites; fail if exists
    if dest.exists():
        raise ValueError("Upload collision")

    file.save(str(dest))
    return stored

@app.route("/upload", methods=["POST"])
def upload_file():
    f = request.files.get("file")
    if not f:
        abort(400, "No file provided")
    try:
        stored_name = save_upload(f)
        return {"file": stored_name}, 200
    except ValueError as e:
        abort(400, str(e))

@app.route("/download/<path:filename>")
def download_file(filename):
    # TODO: enforce authorization here (token/user/tenant checks)

    return send_from_directory(
        UPLOAD_FOLDER,
        filename,
        as_attachment=True,
        download_name=filename,  # Flask>=2.0; optional
    )

Why this works:

  • Uploads are stored under a fixed, server-controlled directory with server-generated filenames (no user-controlled paths).
  • Filenames are sanitized and restricted to an extension allowlist (policy enforcement).
  • Downloads use send_from_directory(), which applies Werkzeug’s safe_join() to prevent path traversal outside the upload directory.
  • Authorization should be enforced on download requests; safe path handling alone does not prevent IDOR.

FastAPI - Async Secure File Handling

import re
import uuid
from pathlib import Path
from fastapi import FastAPI, UploadFile, HTTPException, File
from fastapi.responses import FileResponse

app = FastAPI()

class FastAPISecureFileHandler:
    UPLOAD_DIR = Path("/var/app/uploads").resolve()
    MAX_FILE_SIZE = 10 * 1024 * 1024  # 10MB

    # Keep both; content_type is advisory, extension is policy
    ALLOWED = {
        "application/pdf": ".pdf",
        "text/plain": ".txt",
        "text/csv": ".csv",
    }

    @classmethod
    async def save_upload(cls, file: UploadFile) -> str:
        ext = cls.ALLOWED.get(file.content_type)
        if not ext:
            raise HTTPException(400, f"File type not allowed: {file.content_type}")

        cls.UPLOAD_DIR.mkdir(parents=True, exist_ok=True)

        safe_filename = f"{uuid.uuid4().hex}{ext}"
        file_path = cls.UPLOAD_DIR / safe_filename

        try:
            size = 0
            with file_path.open("wb") as f:
                while True:
                    chunk = await file.read(8192)
                    if not chunk:
                        break
                    size += len(chunk)
                    if size > cls.MAX_FILE_SIZE:
                        raise HTTPException(413, "File too large")
                    f.write(chunk)
        except Exception:
            if file_path.exists():
                file_path.unlink()
            raise
        finally:
            await file.close()

        return safe_filename

@app.post("/upload")
async def upload_file(file: UploadFile = File(...)):
    return {"filename": await FastAPISecureFileHandler.save_upload(file)}

@app.get("/download/{filename}")
async def download_file(filename: str):
    # UUID hex + approved extension
    if not re.fullmatch(r"[0-9a-f]{32}\.(pdf|txt|csv)", filename):
        raise HTTPException(400, "Invalid filename")

    file_path = FastAPISecureFileHandler.UPLOAD_DIR / filename
    if not file_path.is_file():
        raise HTTPException(404, "File not found")

    # TODO: enforce authorization (user/tenant ownership) here.

    return FileResponse(path=file_path, filename=filename, media_type="application/octet-stream")

Why this works:

  • Files are stored under a fixed server-controlled directory using server-generated UUID filenames (no user-controlled paths).
  • Uploads are constrained by an allowlist (size limit plus approved types/extensions), reducing abuse and DoS risk.
  • Downloads only serve filenames matching a strict UUID+extension pattern, preventing path traversal via crafted names.
  • Partial files are removed on error, avoiding accumulation of incomplete uploads.
  • Authorization should still be enforced on downloads; unguessable names reduce risk but are not access control.

Verification

After implementing the recommended secure patterns, verify the fix through multiple approaches:

  • Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
  • Code review: Confirm all instances use the secure pattern (parameterized queries, safe APIs, proper encoding) with no string concatenation or unsafe operations
  • Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
  • Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
  • Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
  • Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
  • Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
  • Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced

Additional Resources