CWE-532: Insertion of Sensitive Information into Log File

Overview

Sensitive information in log files occurs when applications write confidential data (passwords, tokens, PII, session IDs) to application logs, system logs, or debug output. Logs are often stored with weak access controls, retained for long periods, backed up to multiple locations, and aggregated to centralized logging systems, all of which multiplies exposure risk.

Common examples include authentication failures logging passwords, request/response logging containing sensitive data, error messages with credentials, debug output, database query logs, and API interaction logs containing tokens or payment information.

OWASP Classification

A09:2025 - Security Logging and Alerting Failures

Risk

Sensitive data in logs creates multiple attack vectors:

  • Credential exposure: Passwords, API keys, tokens accessible in logs
  • Session hijacking: Session IDs in logs enable account takeover
  • Privacy violations: PII, PHI exposed in log files violating GDPR, HIPAA
  • Insider threats: Authorized personnel access sensitive customer data
  • Long-term exposure: Logs retained for months/years extend breach window
  • Backup exposure: Log backups distributed widely, often with weaker security
  • Third-party access: Log aggregation services become data breach vectors
  • Compliance violations: Logging credit cards violates PCI-DSS
  • Legal discovery: Logs subpoenaed in lawsuits may contain sensitive data

Logs are often overlooked in security assessments despite being accessible to a broad set of people and systems.

Common Log Leakage Sources and Access Points

Common sources of log leakage:

  • Authentication failures: Logging failed login attempts with passwords
  • Request/response logging: Full HTTP bodies containing sensitive data
  • Error messages: Stack traces with database credentials, API keys
  • Debug logging: Verbose output left enabled in production
  • Database queries: SQL logs showing sensitive WHERE clauses or INSERTs
  • API interactions: Logging requests to payment gateways, external APIs
  • Session data: Session IDs, tokens, cookies in access logs

Logs are accessible to:

  • System administrators
  • Security teams
  • Support personnel
  • Log aggregation services (Splunk, ELK, Datadog)
  • Backup systems and archives
  • Developers with server access
  • Attackers who compromise servers

Remediation Steps

Core principle: Never log sensitive information; redact at source and treat logs as untrusted output.

Locate sensitive information in log files

  • Review the flaw details to identify what sensitive data is being logged
  • Identify log sources: authentication logging, request/response logging, error logging, debug output
  • Check what's being logged: passwords, tokens, API keys, PII, session IDs, credit cards, health data
  • Trace logging calls: logger.info(), console.log(), syslog(), framework logging

Sanitize all log output (Primary Defense)

  • Filter sensitive data before logging: Never log passwords, tokens, API keys, session IDs, PII, credit cards
  • Log generic messages instead: "Login attempt: user=X" not "Login attempt: user=X, password=Y"
  • Sanitize request/response bodies: Don't log full HTTP bodies containing sensitive data
  • Use sanitization functions: sanitizeForLogging(object) that redacts sensitive fields
  • Filter error messages: Don't log exceptions containing credentials ("Access denied for user 'admin' using password: YES")
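The sanitizeForLogging helper mentioned above could be implemented along these lines; this is a minimal Python sketch, and the field names and redaction marker are illustrative, not a prescribed API:

```python
# Illustrative set of field names to redact; extend for your data model.
SENSITIVE_KEYS = {"password", "token", "api_key", "apikey", "session_id", "ssn"}

def sanitize_for_logging(obj):
    """Return a copy of obj with sensitive fields replaced by a marker."""
    if isinstance(obj, dict):
        return {
            key: "***REDACTED***" if key.lower() in SENSITIVE_KEYS
            else sanitize_for_logging(value)
            for key, value in obj.items()
        }
    if isinstance(obj, list):
        return [sanitize_for_logging(item) for item in obj]
    return obj
```

A call site would then log the sanitized copy, e.g. logger.info("Login attempt: %s", sanitize_for_logging(payload)), instead of the raw object.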

Implement log filtering and redaction

  • Pattern-based redaction: Replace credit cards (\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}) with ****-****-****-XXXX
  • Field-based filtering: Automatically redact fields named 'password', 'token', 'apiKey', 'ssn', 'creditCard'
  • Framework-level filtering: Use Logback <replace>, Log4j2 PatternLayout, Winston custom format, Python logging Filter
  • Regex scrubbing: Remove API keys matching /api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}/
  • Email masking: Replace user@domain.com with u***@d***.com
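The patterns above can be combined into a single scrubbing function. This sketch uses the same illustrative regexes as the bullets; real deployments should tune the patterns to their own data formats:

```python
import re

# (pattern, replacement) pairs mirroring the redaction rules above.
PATTERNS = [
    # Credit cards: keep only the last four digits.
    (re.compile(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?(\d{4})\b"), r"****-****-****-\1"),
    # API keys: replace the entire key material.
    (re.compile(r"api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}", re.IGNORECASE), "api_key=***REDACTED***"),
    # Emails: keep the first letter of the local part and domain, plus the TLD.
    (re.compile(r"\b([A-Za-z])[A-Za-z0-9._-]*@([A-Za-z])[A-Za-z0-9.-]*(\.[A-Za-z]{2,})\b"), r"\1***@\2***\3"),
]

def scrub(message: str) -> str:
    """Apply every redaction pattern to a log message before it is written."""
    for pattern, replacement in PATTERNS:
        message = pattern.sub(replacement, message)
    return message
```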

Configure appropriate log levels for production

  • Disable DEBUG in production: DEBUG logs are too verbose and often contain sensitive data
  • Use INFO minimally: Only log business events, not detailed request/response data
  • WARN for security events: Failed authentication, authorization failures
  • ERROR for exceptions: Log exception class and generic message, not full details with sensitive data
  • Environment-based configuration: Separate log levels for dev (DEBUG) vs production (INFO/WARN/ERROR only)
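In Python, the environment-based configuration described above can be as simple as the following sketch (the APP_ENV variable name is an assumption; use whatever your deployment already provides):

```python
import logging
import os

# Map deployment environments to log levels; default to the safe choice.
LEVELS = {"development": logging.DEBUG, "production": logging.WARNING}

env = os.environ.get("APP_ENV", "production")
logging.basicConfig(level=LEVELS.get(env, logging.WARNING))
```

Defaulting to WARNING means a missing or misspelled environment variable fails closed rather than silently enabling DEBUG in production.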

Monitor and audit log file security

  • Review logs regularly for sensitive data exposure (scan for passwords, tokens, credit cards)
  • Restrict log file access to authorized personnel only (chmod 600 or 640)
  • Implement log retention policies (delete old logs, don't retain indefinitely)
  • Encrypt log files at rest and in transit to centralized logging
  • Audit log aggregation service access (who can view logs in Splunk, ELK, Datadog)
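If log files are created by application code, the restrictive permissions above can be applied at creation time. A minimal sketch (the path is illustrative):

```python
import os
import stat

def restrict_log_permissions(path):
    """Make a log file readable and writable by its owner only (chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```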

Test the log sanitization fix thoroughly

  • Test with authentication containing password (verify password not logged)
  • Test with request bodies containing credit cards (verify redacted)
  • Review actual log files for sensitive data
  • Test error scenarios (verify sensitive data not in exception logs)
  • Verify log filtering regex patterns work correctly
  • Re-scan with security scanner to confirm the issue is resolved
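The first test above can be automated. This sketch captures log output in memory and asserts the credential never appears (logger name and messages are illustrative):

```python
import io
import logging

def test_password_not_logged():
    """Capture log output and verify the password never appears in it."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("auth-test")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    username, password = "alice", "hunter2"
    # Correct behavior: log the username only, never the credential.
    logger.info("Login failed for user=%s", username)

    output = stream.getvalue()
    assert username in output
    assert password not in output
```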

Secure Patterns

Structured Logging with Field Filtering

import logging
import json

class SensitiveDataFilter(logging.Filter):
    """Filter to redact sensitive fields from log records"""
    SENSITIVE_FIELDS = {'password', 'token', 'apiKey', 'api_key', 'ssn', 'creditCard', 'secret'}

    def filter(self, record):
        if hasattr(record, 'msg') and isinstance(record.msg, dict):
            record.msg = self._redact_dict(record.msg)
        return True

    def _redact_dict(self, data):
        """Recursively redact sensitive fields"""
        if not isinstance(data, dict):
            return data

        redacted = {}
        for key, value in data.items():
            if key.lower() in self.SENSITIVE_FIELDS:
                redacted[key] = '***REDACTED***'
            elif isinstance(value, dict):
                redacted[key] = self._redact_dict(value)
            else:
                redacted[key] = value
        return redacted

# Set up a handler/level and attach the filter to the logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.addFilter(SensitiveDataFilter())

# Safe logging
user_data = {'username': 'alice', 'password': 'secret123', 'email': 'alice@example.com'}
logger.info(user_data)
# Logged: {'username': 'alice', 'password': '***REDACTED***', 'email': 'alice@example.com'}

Why this works: The custom SensitiveDataFilter intercepts log records before they're written and recursively scans for sensitive field names, replacing their values with a redaction marker. This provides defense-in-depth by sanitizing data at the logging layer rather than relying on developers to manually filter each log call. The filter is reusable across the application and automatically handles nested dictionaries. By operating on the logging framework level, it catches accidental logging of sensitive data even when developers aren't aware of the risk.

Pattern-Based Redaction

// Node.js with Winston
const winston = require('winston');
const { format } = winston;

// Custom format to redact sensitive patterns
const redactSensitiveData = format((info) => {
    let message = typeof info.message === 'string' ? info.message : JSON.stringify(info.message);

    // Redact credit cards: 4111-1111-1111-1111 -> ****-****-****-1111
    message = message.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?(\d{4})\b/g, '****-****-****-$1');

    // Redact SSN: 123-45-6789 -> ***-**-6789
    message = message.replace(/\b\d{3}-\d{2}-(\d{4})\b/g, '***-**-$1');

    // Redact API keys: api_key=abc123xyz -> api_key=***REDACTED***
    message = message.replace(/api[_-]?key[\s:=]+[A-Za-z0-9+/=]{20,}/gi, 'api_key=***REDACTED***');

    // Redact email addresses: user@example.com -> u***@e***.com
    message = message.replace(/\b([a-zA-Z])[a-zA-Z0-9._-]*@([a-zA-Z])[a-zA-Z0-9.-]*(\.[a-zA-Z]{2,})\b/g, '$1***@$2***$3');

    info.message = message;
    return info;
});

const logger = winston.createLogger({
    format: winston.format.combine(
        redactSensitiveData(),
        winston.format.json()
    ),
    transports: [new winston.transports.File({ filename: 'app.log' })]
});

// Sensitive data automatically redacted
logger.info('Payment processed: card 4111-1111-1111-1111, email alice@example.com');
// Logged: "Payment processed: card ****-****-****-1111, email a***@e***.com"

Why this works: Pattern-based redaction uses regular expressions to identify and mask sensitive data formats (credit cards, SSNs, API keys, emails) regardless of field names or structure. This catches sensitive data that might appear in free-form text fields, error messages, or concatenated strings where field-based filtering wouldn't work. The redaction preserves partial information (last 4 digits of credit card, domain of email) for debugging purposes while removing the sensitive portions. By implementing this at the logger format level, it applies to all log messages automatically without requiring code changes throughout the application.

Environment-Based Log Levels

// Java with Logback - logback-spring.xml (the springProfile tag requires Spring Boot's -spring variant)
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Production profile: only WARN and ERROR -->
    <springProfile name="production">
        <root level="WARN">
            <appender-ref ref="FILE"/>
        </root>
    </springProfile>

    <!-- Development profile: DEBUG allowed -->
    <springProfile name="development">
        <root level="DEBUG">
            <appender-ref ref="CONSOLE"/>
        </root>
    </springProfile>

    <!-- File appender with size limits -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logs/application.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <fileNamePattern>logs/application-%d{yyyy-MM-dd}.%i.log</fileNamePattern>
            <maxFileSize>10MB</maxFileSize>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
</configuration>

Why this works: Separating log levels by environment prevents verbose DEBUG and INFO logs from running in production where they're more likely to contain sensitive data. Production uses WARN/ERROR only, capturing security-relevant events and failures without detailed request/response data. DEBUG logs in development help troubleshooting but are disabled before deployment. The rolling policy automatically purges old logs after 30 days, limiting retention of potentially sensitive data. File size limits prevent log files from growing unbounded and consuming disk space, reducing the attack surface by limiting the total volume of logged data.

Verification and Testing

Log File Inspection

# Search production logs for sensitive patterns
grep -i "password" /var/log/app/*.log
grep -i "api[_-]key" /var/log/app/*.log
grep -E "[0-9]{3}-[0-9]{2}-[0-9]{4}" /var/log/app/*.log  # SSN
grep -E "[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}[[:space:]-]?[0-9]{4}" /var/log/app/*.log  # Credit card (\s is not valid in POSIX ERE)

# Expected: No matches, or only redacted versions
# Acceptable: "User login successful" 
# Unacceptable: "User login: password=secret123"

Test Sensitive Operations

Perform operations that might log sensitive data:

  1. Failed login with wrong password - Check logs for password appearance
  2. API call with auth token - Check logs for token leakage
  3. Payment processing - Check logs for credit card numbers
  4. Error conditions - Trigger errors and check stack traces for credentials

Expected: Generic log entries without sensitive data

Code Review for Logging

# Search codebase for dangerous logging
grep -r "logger.*password" --include="*.java" --include="*.py" --include="*.js"
grep -r "log.*token" --include="*.java" --include="*.py" --include="*.js"
grep -r "console.log.*req.body" --include="*.js"
grep -r "logger.debug.*body" --include="*.py"

# Expected: No direct logging of sensitive variables

Security Checklist

  • No passwords, tokens, or API keys logged in authentication flows
  • Request/response logging filters or redacts sensitive fields
  • Error messages and stack traces don't contain credentials
  • DEBUG and TRACE log levels disabled in production
  • Sensitive field names (password, token, apiKey, ssn, creditCard) automatically redacted
  • Pattern-based redaction for credit cards, SSNs, email addresses
  • Production log level set to WARN or ERROR only
  • Log files have restrictive permissions (chmod 600 or 640)
  • Log retention policy limits how long logs are kept
  • Logs encrypted at rest and in transit to centralized systems
  • Access to log aggregation services restricted to authorized personnel
  • Searched production logs for sensitive patterns (none found)
  • Tested failed authentication scenarios (passwords not logged)
  • Tested error conditions (no sensitive data in exceptions)
  • Verified log filtering with real sensitive data samples
  • Code review confirms no direct logging of sensitive variables
  • Centralized logging configuration includes field masking

Additional Resources