CWE-134: Use of Externally-Controlled Format String
Overview
Format string vulnerabilities occur when untrusted user input is used as a format string in functions like printf, sprintf, or String.format, allowing attackers to read from or write to arbitrary memory locations, leak sensitive data, or crash the application.
Risk
Critical: Format string attacks enable reading arbitrary memory (passwords, keys), writing arbitrary memory (code execution), crashing applications (DoS), and bypassing security controls. In C and C++, %x leaks values from the stack, %s dereferences a stack value as a pointer, and %n writes the number of characters printed so far to a memory address.
Remediation Steps
Core principle: Never use untrusted input as a format string; keep format templates constant.
Locate the Format String Vulnerability
Review the security findings to identify where untrusted data is used as a format string:
- Find the vulnerable call: Identify the format function (printf, sprintf, String.format, logger.log)
- Identify the source: Determine where the format string comes from (user input, external file, database, network request)
- Trace the data flow: Review how untrusted data reaches the format function
- Check if user input is the format: Verify if untrusted data is used as the format string itself
Use Literal Format Strings
Replace user-controlled format strings with literal, static formats:
- Always use string literals: Format string should be a compile-time constant, never user input
- User data as arguments only: Pass user input as arguments to the format function, not as the format itself
- Correct pattern: printf("%s", user_input), not printf(user_input)
- Log frameworks: Use parameterized logging: logger.info("User: {}", username), not logger.info(username)
- Never concatenate: Don't build format strings by concatenating user input
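The literal-template rule above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation; the function names render_unsafe, render_safe, and try_render are hypothetical. Python's % operator cannot read or write raw memory the way C's printf can, but a user-controlled template still causes formatting errors, which is enough to show the difference between the two patterns.

```python
def render_unsafe(user_input: str) -> str:
    # Anti-pattern (hypothetical example): the user input IS the format string.
    return user_input % ()


def render_safe(user_input: str) -> str:
    # Literal template; the user input is only ever passed as an argument.
    return "User message: %s" % (user_input,)


def try_render(render, payload: str) -> str:
    # Helper for the demonstration: report formatting errors instead of raising.
    try:
        return render(payload)
    except (TypeError, ValueError) as exc:
        return f"error: {type(exc).__name__}"
```

With the safe variant, a payload such as "%x %x %n" is rendered as ordinary text; with the unsafe variant, the same payload is interpreted as a template and fails.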
Use Safe Formatting APIs
Leverage language-specific safe formatting methods:
- C/C++: Use static format strings: printf("%s %d", user_str, value) - format is literal
- Java: String.format("%s", userInput) - format is literal string
- Python: Use f-strings or % formatting with literal format: f"User: {user_input}" or "User: %s" % user_input
- C#: String.Format("{0}", userInput) - format is literal
- Structured logging: Use JSON or key-value logging to separate data from structure
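As one possible illustration of the structured-logging item, the sketch below serializes an event to JSON and passes it to the logger as an argument. The function names build_event and log_event and the field names "event" and "user" are assumptions made for this example only.

```python
import json
import logging


def build_event(action: str, username: str) -> str:
    # Field names are fixed by the developer; user data only fills the values.
    return json.dumps({"event": action, "user": username}, sort_keys=True)


def log_event(logger: logging.Logger, action: str, username: str) -> None:
    # "%s" is the literal format; the JSON document is passed as an argument,
    # so format characters inside the user data are never interpreted.
    logger.info("%s", build_event(action, username))
```

Because the user data sits inside a JSON value, characters such as % or { arrive in the log unchanged and can be parsed back out by log tooling.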
Validate If Format Must Be Dynamic
If the format truly must be dynamic (rare), apply strict validation:
- Use allowlist: Only allow known-good format strings from a predefined list
- Remove format specifiers: Strip %, {, } and other format characters from user input
- Validate against pattern: Use regex to ensure format matches expected pattern
- Reject dangerous characters: Block %n, %s, %x, and other format specifiers
- Consider alternatives: Question if dynamic formatting is really necessary
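The allowlist approach can be sketched as below, assuming the caller selects a template by identifier rather than supplying the template text. The names ALLOWED_TEMPLATES and format_with_allowlist, and the two sample templates, are hypothetical.

```python
# Hypothetical allowlist: every template is written by the developer.
ALLOWED_TEMPLATES = {
    "greeting": "Hello, %s!",
    "score": "User %s scored %d points",
}


def format_with_allowlist(template_id: str, *args) -> str:
    # The caller chooses WHICH template to use, never WHAT the template is.
    try:
        template = ALLOWED_TEMPLATES[template_id]
    except KeyError:
        raise ValueError(f"unknown template id: {template_id!r}") from None
    return template % args
```

An unknown identifier is rejected outright, and any format characters in the arguments are treated as data.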
Enable Compiler Protections and Static Analysis
Use tools to detect format string vulnerabilities:
- Compiler warnings: Enable -Wformat -Wformat-security (GCC/Clang) to detect format issues
- Fortify Source: Compile with -D_FORTIFY_SOURCE=2 for runtime format checking (glibc applies it only when optimization is enabled, for example -O2)
- Static analyzers: Run static analysis tools regularly as part of your SDLC
- Code review: Have security-aware developers review format string usage
- Linters: Use language-specific linters that detect format string issues
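To give a feel for what such a linter checks, here is a rough heuristic sketch for Python source, built on the standard ast module (Python 3.8 or later). It is not a substitute for a real analyzer: the function names are hypothetical, and it over-reports, for example by flagging arithmetic modulo on variables.

```python
import ast


def _is_str_literal(node: ast.AST) -> bool:
    # True only for a plain string constant such as "Hello %s".
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def find_dynamic_formats(source: str) -> list:
    """Return sorted line numbers where the format template is not a literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # template.format(...) where template is not a string literal
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "format"
                and not _is_str_literal(node.func.value)):
            findings.append(node.lineno)
        # template % args where the left operand is not a constant
        elif (isinstance(node, ast.BinOp)
                and isinstance(node.op, ast.Mod)
                and not isinstance(node.left, ast.Constant)):
            findings.append(node.lineno)
    return sorted(findings)
```

Each reported line still needs manual review to decide whether the template can actually be influenced by untrusted input.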
Test with Format String Payloads
Verify the fix prevents format string attacks:
- Test with %x specifiers: Try %x %x %x %x (should print as literal, not leak stack)
- Test with %s specifier: Try %s (should print literal, not crash on invalid pointer)
- Test with %n specifier: Try %n (should print literal, not write to memory)
- Test with position specifiers: Try %7$x (should be harmless)
- Verify normal formatting: Ensure legitimate format operations still work
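The checks above could be automated along these lines. This is a sketch under stated assumptions: format_user_message stands in for whatever function is under test, and PAYLOADS and check_payloads are hypothetical names. The brace-style payloads are an addition for languages that use {} placeholders.

```python
# Payloads from the checklist, plus two brace-style ones (an assumption).
PAYLOADS = ["%x %x %x %x", "%s", "%n", "%7$x", "{0.__class__}", "{}"]


def format_user_message(user_input: str) -> str:
    # Stand-in for the function under test: literal template, input as argument.
    return "User message: %s" % (user_input,)


def check_payloads(formatter) -> list:
    """Return the payloads that were NOT rendered as literal text."""
    failures = []
    for payload in PAYLOADS:
        try:
            rendered = formatter(payload)
        except Exception:
            # A crash on attacker input counts as a failure.
            failures.append(payload)
            continue
        if payload not in rendered:
            failures.append(payload)
    return failures
```

An empty result from check_payloads(format_user_message) indicates every payload was printed literally; a separate assertion on ordinary input confirms normal formatting still works.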
Common Vulnerable Patterns
Using user input as format string in C
#include <stdio.h>
#include <syslog.h>

void log_message(const char *user_input) {
    // CRITICAL VULNERABILITY: user input as format string
    printf(user_input);  // Attacker can use %x, %s, %n

    // Also vulnerable: user controls the format, and sprintf does not bound the write
    char buffer[100];
    sprintf(buffer, user_input);

    // Vulnerable logging
    syslog(LOG_INFO, user_input);
}

// Attack examples:
// user_input = "%x %x %x %x" -> Leaks stack memory
// user_input = "%s"          -> Dereferences a stack value as a pointer; may crash
// user_input = "%n"          -> Writes to memory; can lead to code execution
- sprintf(buffer, user_input) - user controls format
- logger.log(user_message) - unvalidated log format
- String.format(user_template, args) - user controls template
Secure Patterns
Use literal format strings in C
#include <stdio.h>
#include <syslog.h>

void log_message(const char *user_input) {
    // Safe: literal format string, user input as argument
    printf("%s", user_input);

    char buffer[100];
    // Safe: snprintf with literal format
    snprintf(buffer, sizeof(buffer), "User message: %s", user_input);

    // Safe: user input is argument, not format
    syslog(LOG_INFO, "User message: %s", user_input);
}

void formatted_output(const char *username, int score) {
    // Safe: all format specifiers are literal
    printf("User %s scored %d points\n", username, score);
}
Why this works: The format string is a compile-time constant that cannot be influenced by user input. User data is passed as arguments to be formatted, not as the format string itself. This prevents attackers from injecting format specifiers (%x, %s, %n) that could leak memory or achieve code execution.
Use parameterized formatting in Java
import java.util.logging.Level;
import java.util.logging.Logger;

public class SecureLogging {
    private static final Logger logger = Logger.getLogger("App");

    public void logUserAction(String userMessage, String username) {
        // Safe: literal format string
        String message = String.format("User action: %s", userMessage);

        // Safe: parameterized logging (java.util.logging uses {0}, {1} placeholders)
        logger.log(Level.INFO, "User {0} performed action: {1}",
                new Object[]{username, userMessage});

        // With an SLF4J-style logger (a different Logger type; slf4jLogger is a
        // hypothetical variable), the equivalent uses {} placeholders:
        // slf4jLogger.info("User {} performed action: {}", username, userMessage);
    }

    public void formatOutput(String username, int score) {
        // Safe: literal format with user data as arguments
        String output = String.format(
                "User %s scored %d points", username, score);
        System.out.println(output);
    }
}
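Why this works: The format template is a string literal controlled by the developer. User data is passed as separate arguments that get substituted into placeholders. Java format strings cannot read or write raw memory, but a user-controlled template can still throw exceptions such as MissingFormatArgumentException or expose other arguments, so the template must stay constant.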
Use safe formatting in Python
import logging

def log_message(user_input):
    # RISKY - user input used as the log format string; it is interpreted
    # with % as soon as any arguments are passed alongside it
    # logging.info(user_input)  # DON'T DO THIS

    # Safe: user input as argument
    logging.info("User message: %s", user_input)

    # Safe: f-string with user input (Python 3.6+); no arguments are passed,
    # so logging does not re-interpret the result
    logging.info(f"User message: {user_input}")

    # Safe: format() with literal template
    logging.info("User message: {}".format(user_input))

def format_output(username, score):
    # Safe: literal format string
    message = "User %s scored %d points" % (username, score)
    print(message)

    # Safe: f-string
    print(f"User {username} scored {score} points")
Why this works: Using %s with user input as an argument (not the format string) prevents format string injection. F-strings and .format() with literal templates ensure user data is safely interpolated without interpretation as format specifiers.
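The reason the .format() template must stay literal can be shown with a small sketch. If the template itself comes from the user, replacement fields can walk attributes of the arguments. The Account class and both function names below are hypothetical.

```python
class Account:
    def __init__(self, name: str, api_key: str):
        self.name = name
        self.api_key = api_key  # sensitive field, never meant for display


def render_user_template(user_template: str, account: Account) -> str:
    # Anti-pattern: the user controls the template and can reference
    # any attribute of the objects passed to format().
    return user_template.format(account=account)


def render_fixed_template(user_text: str, account: Account) -> str:
    # Literal template; user text is only a value, so braces in it are inert.
    return "{name} says: {text}".format(name=account.name, text=user_text)
```

Passing "{account.api_key}" to render_user_template returns the secret, while render_fixed_template prints the same text unchanged.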
Use safe formatting in PHP
<?php
function log_message($user_input) {
    // VULNERABLE
    // printf($user_input);  // DON'T DO THIS

    // Safe: literal format, user input as argument
    printf("User message: %s", $user_input);

    // Safe: use sprintf with literal format
    $message = sprintf("User message: %s", $user_input);
    error_log($message);
}

function format_output($username, $score) {
    // Safe: literal format string
    printf("User %s scored %d points\n", $username, $score);
}
Why this works: The format string is always a literal string constant. User data is passed as arguments to be safely formatted. PHP's printf/sprintf functions safely handle user data when passed as arguments rather than as the format specification.