CWE-134: Use of Externally-Controlled Format String

Overview

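Format string vulnerabilities occur when untrusted input is used as the format string in functions like printf, sprintf, or String.format. In C and C++, this lets attackers read from or write to arbitrary memory locations, leak sensitive data, or crash the application. In memory-safe languages such as Java and Python, the impact is typically exceptions (denial of service) or disclosure of data reachable through the format arguments.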

Risk

Critical: In C and C++, format string attacks enable reading arbitrary memory (passwords, keys), writing arbitrary memory (leading to code execution), crashing applications (DoS), and bypassing security controls. Attackers use specifiers such as %x and %s to read memory and %n to write to it.

Remediation Steps

Core principle: Never use untrusted input as a format string; keep format templates constant.

Locate the Format String Vulnerability

Review the security findings to identify where untrusted data is used as a format string:

Find the vulnerable call: Identify the format function (printf, sprintf, String.format, logger.log)
Identify the source: Determine where the format string comes from (user input, external file, database, network request)
Trace the data flow: Review how untrusted data reaches the format function
Check if user input is the format: Verify if untrusted data is used as the format string itself

Use Literal Format Strings

Replace user-controlled format strings with literal, static formats:

Always use string literals: The format string should be a compile-time constant, never user input
User data as arguments only: Pass user input as arguments to the format function, not as the format itself
Correct pattern: printf("%s", user_input), not printf(user_input)
Log frameworks: Use parameterized logging: logger.info("User: {}", username), not logger.info(username)
Never concatenate: Don't build format strings by concatenating user input
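
To illustrate why user data must stay in the argument list, consider Python's str.format, where a user-controlled template can reach attributes of the objects passed to it. In this sketch, API_KEY, User, greet_unsafe, and greet_safe are hypothetical names. The probe {user.__init__.__globals__[API_KEY]} works because a bound method exposes its function's module globals; the same risk applies when user text is concatenated into a template before .format() is called.

```python
# Hypothetical secret held in module globals.
API_KEY = "sk-example-123"


class User:
    def __init__(self, name: str):
        self.name = name


def greet_unsafe(template: str, user: User) -> str:
    # VULNERABLE: the caller-supplied template is used as the format string.
    return template.format(user=user)


def greet_safe(user_text: str, user: User) -> str:
    # Safe: the template is a literal; user text is only an argument.
    return "Hello {name}: {text}".format(name=user.name, text=user_text)


if __name__ == "__main__":
    probe = "{user.__init__.__globals__[API_KEY]}"
    print(greet_unsafe(probe, User("bob")))  # leaks the secret
    print(greet_safe(probe, User("bob")))    # prints the probe literally
```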

Use Safe Formatting APIs

Leverage language-specific safe formatting methods:

C/C++: Use static format strings: printf("%s %d", user_str, value) - format is literal
Java: String.format("%s", userInput) - format is literal string
Python: Use f-strings or % formatting with literal format: f"User: {user_input}" or "User: %s" % user_input
C#: String.Format("{0}", userInput) - format is literal
Structured logging: Use JSON or key-value logging to separate data from structure
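
A minimal sketch of structured logging using only the Python standard library: a custom logging.Formatter subclass emits each record as one JSON object, and user-supplied text travels in its own field through the extra argument instead of inside the message. The class name JsonFormatter and the field names level, event, and user_input are assumptions for illustration, not a standard schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so user data stays in its own field."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "event": record.getMessage(),  # developer-chosen literal message
            # user_input arrives via `extra`; it is stored, never used as a format
            "user_input": getattr(record, "user_input", None),
        }
        return json.dumps(payload)


if __name__ == "__main__":
    logger = logging.getLogger("app")
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    # Format-like text is recorded verbatim in the "user_input" field.
    logger.warning("login failed", extra={"user_input": "%x %n {0}"})
```

Because the message stays a constant and the data is serialized by json.dumps, format characters in user input are escaped as ordinary JSON string content rather than interpreted.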

Validate If Format Must Be Dynamic

If the format truly must be dynamic (rare), apply strict validation:

Use allowlist: Only allow known-good format strings from a predefined list
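Remove or escape format characters: Strip %, {, } and other format characters from user input, or escape them (% as %% for printf-style formats; { and } as {{ and }} for Python str.format and C# String.Format)
Validate against a pattern: Use a regex to ensure the format matches the expected pattern
Reject dangerous specifiers: Block %n, %s, %x, and other format specifiers; blocklists are easy to bypass (for example %hn or positional forms such as %1$n), so prefer an allowlist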
Consider alternatives: Question if dynamic formatting is really necessary
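
One way to apply the allowlist idea is to let users pick a template by key, so the template text itself never comes from input. The sketch below assumes Python str.format templates; ALLOWED_TEMPLATES, render, and escape_braces are hypothetical names. escape_braces shows escaping (doubling braces) as an alternative to stripping when user text must be embedded in a template.

```python
# Hypothetical allowlist: users choose a key, never supply template text.
ALLOWED_TEMPLATES = {
    "short": "User {name} scored {score}",
    "long": "Player {name} finished with {score} points",
}


def render(template_key: str, name: str, score: int) -> str:
    try:
        template = ALLOWED_TEMPLATES[template_key]
    except KeyError:
        # Unknown keys are rejected instead of being used as a template.
        raise ValueError(f"unknown template: {template_key!r}") from None
    # User data is passed only as arguments to the allowlisted template.
    return template.format(name=name, score=score)


def escape_braces(text: str) -> str:
    """Make user text inert inside a str.format template by doubling braces."""
    return text.replace("{", "{{").replace("}", "}}")
```

For example, render("long", "{0.__class__}", 3) returns "Player {0.__class__} finished with 3 points": the probe is treated as a name, not as a field.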

Enable Compiler Protections and Static Analysis

Use tools to detect format string vulnerabilities:

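Compiler warnings: Enable -Wformat -Wformat-security (GCC/Clang) to flag format issues, including calls whose format is not a string literal and has no arguments; add -Werror=format-security to turn these into build failures
Fortify Source: Compile with -O2 -D_FORTIFY_SOURCE=2 (fortification requires optimization); glibc then adds runtime checks and aborts if %n appears in a format string stored in writable memory
Static analyzers: Run static analysis tools regularly as part of your SDLC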
Code review: Have security-aware developers review format string usage
Linters: Use language-specific linters that detect format string issues

Test with Format String Payloads

Verify the fix prevents format string attacks:

Test with %x specifiers: Try %x %x %x %x (should print literally, not leak stack contents)
Test with %s specifier: Try %s (should print literally, not dereference an invalid pointer and crash)
Test with %n specifier: Try %n (should print literally, not write to memory)
Test with positional specifiers: Try %7$x (should print literally, not read the seventh argument)
Verify normal formatting: Ensure legitimate format operations still work
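
A minimal Python harness for this checklist might look like the following. Python's % operator cannot leak or write memory the way C's printf can, so the check here is only whether each payload comes back literally; format_message, format_message_unsafe, and check_payloads are hypothetical names, and the two brace payloads are extra probes for str.format-style code.

```python
from typing import Callable

# Payloads from the checklist above, plus two str.format-style probes.
PAYLOADS = ["%x %x %x %x", "%s", "%n", "%7$x", "{0.__class__}", "{}"]


def format_message(user_input: str) -> str:
    # Hypothetical function under test: literal template, user data as argument.
    return "User message: %s" % (user_input,)


def format_message_unsafe(user_input: str) -> str:
    # Deliberately vulnerable counterpart: user text becomes part of the format.
    return ("User message: " + user_input) % ()


def check_payloads(fmt: Callable[[str], str]) -> list[str]:
    """Return the payloads that were NOT echoed back literally (empty = pass)."""
    failures = []
    for payload in PAYLOADS:
        try:
            out = fmt(payload)
        except (TypeError, ValueError, KeyError, IndexError):
            failures.append(payload)  # the payload was parsed as a directive
            continue
        if out != "User message: " + payload:
            failures.append(payload)
    return failures
```

Running check_payloads against both functions shows the difference: the safe version passes every payload, while the unsafe one fails on each %-directive. A plain input such as "hello" covers the last item, confirming normal formatting still works.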

Common Vulnerable Patterns

Using user input as format string in C

#include <stdio.h>
#include <syslog.h>

void log_message(const char *user_input) {
    // CRITICAL VULNERABILITY: user input as format string
    printf(user_input);  // Attacker can use %x, %s, %n

    // Also vulnerable: user input is the sprintf format (and can overflow buffer)
    char buffer[100];
    sprintf(buffer, user_input);

    // Vulnerable logging: syslog's second argument is a format string
    syslog(LOG_INFO, user_input);
}

// Attack examples:
// user_input = "%x %x %x %x"  -> Leaks stack memory
// user_input = "%s"            -> Treats a stack value as a pointer; may crash
// user_input = "%n"            -> Writes the character count to memory; can enable code execution

Other vulnerable patterns to look for:
sprintf(buffer, user_input) - user controls the format
logger.log(user_message) - unvalidated user text used as the log format
String.format(user_template, args) - user controls the template

Secure Patterns

Use literal format strings in C

#include <stdio.h>
#include <syslog.h>

void log_message(const char *user_input) {
    // Safe: literal format string, user input as argument
    printf("%s", user_input);

    char buffer[100];
    // Safe: snprintf with literal format
    snprintf(buffer, sizeof(buffer), "User message: %s", user_input);

    // Safe: user input is argument, not format
    syslog(LOG_INFO, "User message: %s", user_input);
}

void formatted_output(const char *username, int score) {
    // Safe: all format specifiers are literal
    printf("User %s scored %d points\n", username, score);
}

Why this works: The format string is a compile-time constant that cannot be influenced by user input. User data is passed as arguments to be formatted, not as the format string itself. This prevents attackers from injecting format specifiers (%x, %s, %n) that could leak memory or achieve code execution.

Use parameterized formatting in Java

import java.util.logging.Level;
import java.util.logging.Logger;

public class SecureLogging {
    private static final Logger logger = Logger.getLogger("App");

    public void logUserAction(String userMessage, String username) {
        // Safe: literal format string
        String message = String.format("User action: %s", userMessage);

        // Safe: parameterized java.util.logging (MessageFormat-style {0}, {1})
        logger.log(Level.INFO, "User {0} performed action: {1}",
                   new Object[]{username, userMessage});

        // With SLF4J (org.slf4j.Logger), use {} placeholders instead:
        // slf4jLogger.info("User {} performed action: {}", username, userMessage);
    }

    public void formatOutput(String username, int score) {
        // Safe: literal format with user data as arguments
        String output = String.format(
            "User %s scored %d points", username, score);
        System.out.println(output);
    }
}

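Why this works: The format template is a string literal controlled by the developer, and user data is passed as separate arguments that are substituted into placeholders. String.format itself offers no protection when the template comes from the user; the safety comes from keeping the template constant. Java format strings cannot write memory (%n in Java is a line separator), but a user-controlled template can still throw exceptions such as MissingFormatArgumentException or distort the output.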

Use safe formatting in Python

import logging

def log_message(user_input):
    # VULNERABLE - user input in format string
    # logging.info(user_input)  # DON'T DO THIS

    # Safe: user input as argument
    logging.info("User message: %s", user_input)

    # Safe: f-string with user input (Python 3.6+)
    logging.info(f"User message: {user_input}")

    # Safe: format() with literal template
    logging.info("User message: {}".format(user_input))

def format_output(username, score):
    # Safe: literal format string
    message = "User %s scored %d points" % (username, score)
    print(message)

    # Safe: f-string
    print(f"User {username} scored {score} points")

Why this works: Using %s with user input as an argument (not the format string) prevents format string injection. F-strings and .format() with literal templates ensure user data is safely interpolated without interpretation as format specifiers.

Use safe formatting in PHP

<?php
function log_message($user_input) {
    // VULNERABLE
    // printf($user_input);  // DON'T DO THIS

    // Safe: literal format, user input as argument
    printf("User message: %s", $user_input);

    // Safe: use sprintf with literal format
    $message = sprintf("User message: %s", $user_input);
    error_log($message);
}

function format_output($username, $score) {
    // Safe: literal format string
    printf("User %s scored %d points\n", $username, $score);
}

Why this works: The format string is always a literal string constant. User data is passed as arguments to be safely formatted. PHP's printf/sprintf functions safely handle user data when passed as arguments rather than as the format specification.