CWE-532: Insertion of Sensitive Information into Log File

Overview

Sensitive information in log files occurs when applications write confidential data (passwords, tokens, PII, session IDs) to application logs, system logs, or debug output. Logs are often stored with weak access controls, retained for long periods, backed up to multiple locations, and aggregated to centralized logging systems, all of which multiplies exposure risk.

Common examples include authentication failures logging passwords, request/response logging containing sensitive data, error messages with credentials, debug output, database query logs, and API interaction logs containing tokens or payment information.

OWASP Classification

A09:2025 - Security Logging and Alerting Failures

Risk

Sensitive data in logs creates multiple attack vectors:

  • Credential exposure: Passwords, API keys, tokens accessible in logs
  • Session hijacking: Session IDs in logs enable account takeover
  • Privacy violations: PII, PHI exposed in log files violating GDPR, HIPAA
  • Insider threats: Authorized personnel access sensitive customer data
  • Long-term exposure: Logs retained for months/years extend breach window
  • Backup exposure: Log backups distributed widely, often with weaker security
  • Third-party access: Log aggregation services become data breach vectors
  • Compliance violations: Logging credit cards violates PCI-DSS
  • Legal discovery: Logs subpoenaed in lawsuits may contain sensitive data

Logs are often overlooked in security assessments despite being accessible to a broad set of people and systems.

Common Log Leakage Sources and Access Points

Common sources of log leakage:

  • Authentication failures: Logging failed login attempts with passwords
  • Request/response logging: Full HTTP bodies containing sensitive data
  • Error messages: Stack traces with database credentials, API keys
  • Debug logging: Verbose output left enabled in production
  • Database queries: SQL logs showing sensitive WHERE clauses or INSERTs
  • API interactions: Logging requests to payment gateways, external APIs
  • Session data: Session IDs, tokens, cookies in access logs

Logs are accessible to:

  • System administrators
  • Security teams
  • Support personnel
  • Log aggregation services (Splunk, ELK, Datadog)
  • Backup systems and archives
  • Developers with server access
  • Attackers who compromise servers

Remediation Steps

Core principle: Never log sensitive information; redact at source and treat logs as untrusted output.

Locate sensitive information in log files

  • Review the flaw details to identify what sensitive data is being logged
  • Identify log sources: authentication logging, request/response logging, error logging, debug output
  • Check what's being logged: passwords, tokens, API keys, PII, session IDs, credit cards, health data
  • Trace logging calls: logger.info(), console.log(), syslog(), framework logging

Sanitize all log output (Primary Defense)

  • Filter sensitive data before logging: Never log passwords, tokens, API keys, session IDs, PII, credit cards
  • Log generic messages instead: "Login attempt: user=X" not "Login attempt: user=X, password=Y"
  • Sanitize request/response bodies: Don't log full HTTP bodies containing sensitive data
  • Use sanitization functions: sanitizeForLogging(object) that redacts sensitive fields
  • Filter error messages: Don't log exceptions containing credentials ("Access denied for user 'admin' using password: YES")
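The sanitizeForLogging helper mentioned above could be implemented along these lines; this is a minimal Python sketch, and the field names and redaction marker are illustrative, not a prescribed API:

```python
# Illustrative set of field names to redact; extend for your data model.
SENSITIVE_KEYS = {"password", "token", "api_key", "apikey", "session_id", "ssn"}

def sanitize_for_logging(obj):
    """Return a copy of obj with sensitive fields replaced by a marker."""
    if isinstance(obj, dict):
        return {
            key: "***REDACTED***" if key.lower() in SENSITIVE_KEYS
            else sanitize_for_logging(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [sanitize_for_logging(item) for item in obj]
    return obj
```

A call site would then log the sanitized copy, e.g. logger.info("Login attempt: %s", sanitize_for_logging(payload)), instead of the raw object.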

Implement log filtering and redaction

  • Pattern-based redaction: Replace credit cards (\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}) with ****-****-****-XXXX
  • Field-based filtering: Automatically redact fields named 'password', 'token', 'apiKey', 'ssn', 'creditCard'
  • Framework-level filtering: Use Logback <replace>, Log4j2 PatternLayout, Winston custom format, Python logging Filter
  • Regex scrubbing: Remove API keys matching /api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}/
  • Email masking: Replace user@domain.com with u***@d***.com
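The patterns above can be combined into a single scrubbing function. This sketch uses the same illustrative regexes as the bullets; real deployments should tune the patterns to their own data formats:

```python
import re

# (pattern, replacement) pairs mirroring the redaction rules above.
PATTERNS = [
    # Credit cards: keep only the last four digits.
    (re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?(\d{4})\b"), r"****-****-****-\1"),
    # API keys: replace the entire key material.
    (re.compile(r"api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}", re.IGNORECASE), "api_key=***REDACTED***"),
    # Emails: keep the first letter of the local part and domain, plus the TLD.
    (re.compile(r"\b([A-Za-z])[A-Za-z0-9._-]*@([A-Za-z])[A-Za-z0-9.-]*(\.[A-Za-z]{2,})\b"), r"\1***@\2***\3"),
]

def scrub(message: str) -> str:
    """Apply every redaction pattern to a log message before it is written."""
    for pattern, replacement in PATTERNS:
        message = pattern.sub(replacement, message)
    return message
```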

Configure appropriate log levels for production

  • Disable DEBUG in production: DEBUG logs are too verbose and often contain sensitive data
  • Use INFO minimally: Only log business events, not detailed request/response data
  • WARN for security events: Failed authentication, authorization failures
  • ERROR for exceptions: Log exception class and generic message, not full details with sensitive data
  • Environment-based configuration: Separate log levels for dev (DEBUG) vs production (INFO/WARN/ERROR only)
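In Python, the environment-based configuration described above can be as simple as the following sketch (the APP_ENV variable name is an assumption; use whatever your deployment already provides):

```python
import logging
import os

# Map deployment environments to log levels; default to the safe choice.
LEVELS = {"development": logging.DEBUG, "production": logging.WARNING}

env = os.environ.get("APP_ENV", "production")
logging.basicConfig(level=LEVELS.get(env, logging.WARNING))
```

Defaulting to WARNING means a missing or misspelled environment variable fails closed rather than silently enabling DEBUG in production.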

Monitor and audit log file security

  • Review logs regularly for sensitive data exposure (scan for passwords, tokens, credit cards)
  • Restrict log file access to authorized personnel only (chmod 600 or 640)
  • Implement log retention policies (delete old logs, don't retain indefinitely)
  • Encrypt log files at rest and in transit to centralized logging
  • Audit log aggregation service access (who can view logs in Splunk, ELK, Datadog)
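If log files are created by application code, the restrictive permissions above can be applied at creation time. A minimal sketch (the path is illustrative):

```python
import os
import stat

def restrict_log_permissions(path):
    """Make a log file readable and writable by its owner only (chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```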

Test the log sanitization fix thoroughly

  • Test with authentication containing password (verify password not logged)
  • Test with request bodies containing credit cards (verify redacted)
  • Review actual log files for sensitive data
  • Test error scenarios (verify sensitive data not in exception logs)
  • Verify log filtering regex patterns work correctly
  • Re-scan with security scanner to confirm the issue is resolved
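The first test above can be automated. This sketch captures log output in memory and asserts the credential never appears (logger name and messages are illustrative):

```python
import io
import logging

def test_password_not_logged():
    """Capture log output and verify the password never appears in it."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("auth-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    username, password = "alice", "hunter2"
    # Correct behavior: log the username only, never the credential.
    logger.info("Login failed for user=%s", username)

    output = stream.getvalue()
    assert username in output
    assert password not in output
```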

Secure Patterns

Structured Logging with Field Filtering

import logging
import json

class SensitiveDataFilter(logging.Filter):
    """Filter to redact sensitive fields from log records"""
    SENSITIVE_FIELDS = {'password', 'token', 'apiKey', 'api_key', 'ssn', 'creditCard', 'secret'}

    def filter(self, record):
        if hasattr(record, 'msg') and isinstance(record.msg, dict):
            record.msg = self._redact_dict(record.msg)
        return True

    def _redact_dict(self, data):
        """Recursively redact sensitive fields"""
        if not isinstance(data, dict):
            return data

        redacted = {}
        for key, value in data.items():
            if key.lower() in self.SENSITIVE_FIELDS:
                redacted[key] = '***REDACTED***'
            elif isinstance(value, dict):
                redacted[key] = self._redact_dict(value)
            else:
                redacted[key] = value
        return redacted

# Set up a handler/level and attach the filter to the logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.addFilter(SensitiveDataFilter())

# Safe logging
user_data = {'username': 'alice', 'password': 'secret123', 'email': 'alice@example.com'}
logger.info(user_data)
# Logged: {'username': 'alice', 'password': '***REDACTED***', 'email': 'alice@example.com'}

Why this works: The custom SensitiveDataFilter intercepts log records before they're written and recursively scans for sensitive field names, replacing their values with a redaction marker. This provides defense-in-depth by sanitizing data at the logging layer rather than relying on developers to manually filter each log call. The filter is reusable across the application and automatically handles nested dictionaries. By operating on the logging framework level, it catches accidental logging of sensitive data even when developers aren't aware of the risk.

Pattern-Based Redaction

// Node.js with Winston
const winston = require('winston');
const { format } = winston;

// Custom format to redact sensitive patterns
const redactSensitiveData = format((info) => {
    let message = typeof info.message === 'string' ? info.message : JSON.stringify(info.message);

    // Redact credit cards: 4111-1111-1111-1111 -> ****-****-****-1111
    message = message.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?(\d{4})\b/g, '****-****-****-$1');

    // Redact SSN: 123-45-6789 -> ***-**-6789
    message = message.replace(/\b\d{3}-\d{2}-(\d{4})\b/g, '***-**-$1');

    // Redact API keys: api_key=abc123xyz -> api_key=***REDACTED***
    message = message.replace(/api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}/gi, 'api_key=***REDACTED***');

    // Redact email addresses: user@example.com -> u***@e***.com
    message = message.replace(/\b([a-zA-Z])[a-zA-Z0-9._-]*@([a-zA-Z])[a-zA-Z0-9.-]*(\.[a-zA-Z]{2,})\b/g, '$1***@$2***$3');

    info.message = message;
    return info;
});

const logger = winston.createLogger({
    format: winston.format.combine(
        redactSensitiveData(),
        winston.format.json()
    ),
    transports: [new winston.transports.File({ filename: 'app.log' })]
});

// Sensitive data automatically redacted
logger.info('Payment processed: card 4111-1111-1111-1111, email alice@example.com');
// Logged: "Payment processed: card ****-****-****-1111, email a***@e***.com"

Why this works: Pattern-based redaction uses regular expressions to identify and mask sensitive data formats (credit cards, SSNs, API keys, emails) regardless of field names or structure. This catches sensitive data that might appear in free-form text fields, error messages, or concatenated strings where field-based filtering wouldn't work. The redaction preserves partial information (last 4 digits of credit card, domain of email) for debugging purposes while removing the sensitive portions. By implementing this at the logger format level, it applies to all log messages automatically without requiring code changes throughout the application.

Environment-Based Log Levels

// Java with Logback - logback-spring.xml (the springProfile tag requires Spring Boot's -spring variant)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Production profile: only WARN and ERROR -->
    <springProfile name="production">
        <root level="WARN">
            <appender-ref ref="FILE"/>
        </root>
    </springProfile>

    <!-- Development profile: DEBUG allowed -->
    <springProfile name="development">
        <root level="DEBUG">
            <appender-ref ref="CONSOLE"/>
        </root>
    </springProfile>

    <!-- File appender with size limits -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logs/application.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>logs/application-%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>10MB</maxFileSize>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
</configuration>

Why this works: Separating log levels by environment prevents verbose DEBUG and INFO logs from running in production where they're more likely to contain sensitive data. Production uses WARN/ERROR only, capturing security-relevant events and failures without detailed request/response data. DEBUG logs in development help troubleshooting but are disabled before deployment. The rolling policy automatically purges old logs after 30 days, limiting retention of potentially sensitive data. File size limits prevent log files from growing unbounded and consuming disk space, reducing the attack surface by limiting the total volume of logged data.

Verification and Testing

Log File Inspection

# Search production logs for sensitive patterns
grep -i "password" /var/log/app/*.log
grep -i "api[_-]key" /var/log/app/*.log
grep -E "[0-9]{3}-[0-9]{2}-[0-9]{4}" /var/log/app/*.log  # SSN
grep -E "[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}" /var/log/app/*.log  # Credit card (\s is not valid in POSIX ERE)

# Expected: No matches, or only redacted versions
# Acceptable: "User login successful" 
# Unacceptable: "User login: password=secret123"

Test Sensitive Operations

Perform operations that might log sensitive data:

  1. Failed login with wrong password - Check logs for password appearance
  2. API call with auth token - Check logs for token leakage
  3. Payment processing - Check logs for credit card numbers
  4. Error conditions - Trigger errors and check stack traces for credentials

Expected: Generic log entries without sensitive data

Code Review for Logging

# Search codebase for dangerous logging
grep -r "logger.*password" --include="*.java" --include="*.py" --include="*.js"
grep -r "log.*token" --include="*.java" --include="*.py" --include="*.js"
grep -r "console.log.*req.body" --include="*.js"
grep -r "logger.debug.*body" --include="*.py"

# Expected: No direct logging of sensitive variables

Security Checklist

  • No passwords, tokens, or API keys logged in authentication flows
  • Request/response logging filters or redacts sensitive fields
  • Error messages and stack traces don't contain credentials
  • DEBUG and TRACE log levels disabled in production
  • Sensitive field names (password, token, apiKey, ssn, creditCard) automatically redacted
  • Pattern-based redaction for credit cards, SSNs, email addresses
  • Production log level set to WARN or ERROR only
  • Log files have restrictive permissions (chmod 600 or 640)
  • Log retention policy limits how long logs are kept
  • Logs encrypted at rest and in transit to centralized systems
  • Access to log aggregation services restricted to authorized personnel
  • Searched production logs for sensitive patterns (none found)
  • Tested failed authentication scenarios (passwords not logged)
  • Tested error conditions (no sensitive data in exceptions)
  • Verified log filtering with real sensitive data samples
  • Code review confirms no direct logging of sensitive variables
  • Centralized logging configuration includes field masking

Additional Resources