CWE-80: Cross-Site Scripting (XSS) - PHP

Overview

XSS occurs when untrusted data is included in web output without proper encoding. PHP provides multiple encoding functions, and frameworks like Laravel and Symfony offer additional protection.

Primary Defence: Use htmlspecialchars() with ENT_QUOTES | ENT_HTML5 flags and UTF-8 encoding for HTML contexts, json_encode() for JavaScript contexts, or framework auto-escaping (Laravel Blade, Twig).

Common Vulnerable Patterns

Direct Echo of User Input

<?php
// VULNERABLE - No encoding
$name = $_GET['name'];
echo "<h1>Welcome, $name</h1>";

// VULNERABLE - Raw echo inside an HTML template
$comment = $_POST['comment'];
?>
<div class="comment"><?php echo $comment; ?></div>

Building HTML Without Escaping

<?php
// VULNERABLE - String concatenation
function displayUser($userId) {
    $user = getUser($userId);
    $html = '<div class="profile">';
    $html .= '<h2>' . $user['name'] . '</h2>';
    $html .= '<p>' . $user['bio'] . '</p>';
    $html .= '</div>';
    return $html;  // No escaping!
}

JavaScript Context Without Escaping

<script>
    // VULNERABLE - A quote or </script> in msg breaks out of the string
    var message = '<?php echo $_GET['msg']; ?>';
    alert(message);
</script>

Using print_r or var_dump on User Data

<?php
// VULNERABLE - Dumps request data into the page without encoding
print_r($_POST);
var_dump($_GET);

Secure Patterns

htmlspecialchars() with Correct Flags

<?php
// SECURE - Proper HTML encoding
$name = $_GET['name'] ?? 'Guest';
$safeName = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
echo "<h1>Welcome, $safeName</h1>";

// SECURE - Function wrapper
function h($string) {
    return htmlspecialchars($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$userBio = $_POST['bio'] ?? '';
?>
<div class="bio"><?= h($userBio) ?></div>

Important Flags:

ENT_QUOTES: Encode both single and double quotes
ENT_HTML5: Use HTML5 encoding rules
UTF-8: Specify character encoding

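Why this works: htmlspecialchars() is PHP's standard function for HTML encoding. It converts the five characters with special meaning in HTML (<, >, &, ", ') into entities, so the browser renders user input as text instead of parsing it as markup. The ENT_QUOTES flag is critical: before PHP 8.1 the default was ENT_COMPAT, which leaves single quotes unencoded and allows attribute injection in single-quoted attributes. Passing the flag explicitly keeps behaviour identical across PHP versions. The ENT_HTML5 flag selects the HTML5 entity table. Specifying UTF-8 makes the function read the input in the same character set the page is served in, which closes encoding-mismatch attacks where malformed multi-byte sequences slip past the encoder. A helper function like h() reduces typing and keeps usage consistent. This is the primary defense against XSS in PHP applications, but it is correct for HTML body text and quoted attribute values only; JavaScript, URL, and CSS contexts need their own encoders.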
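The effect of ENT_QUOTES can be seen by encoding the same payload with and without it. This is a minimal sketch: the helper names hCompat() and hQuotes() and the payload are illustrative assumptions, not part of PHP or any framework.

```php
<?php
// Hypothetical helpers, defined only to compare the two flag sets.
function hCompat(string $value): string
{
    // ENT_COMPAT encodes double quotes but leaves single quotes alone
    // (this was the default flag set before PHP 8.1).
    return htmlspecialchars($value, ENT_COMPAT | ENT_HTML5, 'UTF-8');
}

function hQuotes(string $value): string
{
    // ENT_QUOTES encodes both quote styles; ENT_SUBSTITUTE replaces
    // invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

// Payload aimed at a single-quoted attribute such as <input value='...'>.
$payload = "x' onmouseover='alert(1)";

$unsafe = hCompat($payload); // single quotes survive unchanged
$safe   = hQuotes($payload); // single quotes become &apos;

echo "<input value='" . $safe . "'>\n";
// prints: <input value='x&apos; onmouseover=&apos;alert(1)'>
```

With hCompat() the attribute value would end at the first quote and the rest of the payload would become a new attribute; with hQuotes() the whole payload stays inside the value.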

htmlentities() for More Comprehensive Encoding

<?php
// SECURE - Encodes all applicable characters
function escape($string) {
    return htmlentities($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$userComment = $_POST['comment'];
?>
<p><?= escape($userComment) ?></p>

Why this works: htmlentities() converts every character that has a named HTML entity, not just the five special characters, so accented letters and symbols are encoded as well. This adds nothing to XSS protection: both functions neutralise the same markup characters when called with the same flags. htmlentities() is mainly useful when the output may be stored or served in a character set other than UTF-8, or when a legacy system expects entity-encoded text. For UTF-8 pages htmlspecialchars() is usually sufficient and keeps the output shorter and more readable.
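The difference between the two functions is easiest to see on a string that mixes markup with an accented character. The sample text below is an assumption chosen for illustration.

```php
<?php
// "café <b>" written with an escape so the file encoding does not matter.
$text = "caf\u{00E9} <b>";

$flags = ENT_QUOTES | ENT_HTML5;

// Only & " ' < > are converted.
$special = htmlspecialchars($text, $flags, 'UTF-8');

// Every character with a named entity is converted, including é.
$entities = htmlentities($text, $flags, 'UTF-8');

echo $special, "\n";  // café &lt;b&gt;
echo $entities, "\n"; // caf&eacute; &lt;b&gt;
```

Both results neutralise the <b> tag in the same way; only the treatment of the accented letter differs.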

Context-Specific Encoding

HTML Context

<?php
// SECURE - HTML body content
function htmlEncode($text) {
    return htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$message = $_GET['msg'];
echo '<div>' . htmlEncode($message) . '</div>';

Why this works: This pattern uses htmlspecialchars() with proper flags to encode HTML special characters, ensuring user input is displayed as text rather than interpreted as markup. The function wrapper provides a consistent API for HTML encoding throughout your application.

JavaScript Context

<?php
// SECURE - JavaScript string context
function jsEncode($string) {
    // JSON encode provides proper JavaScript escaping
    return json_encode($string, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
}

$userName = $_SESSION['username'];
?>
<script>
    var currentUser = <?= jsEncode($userName) ?>;
    console.log(currentUser);
</script>

Why this works: JavaScript context requires different encoding than HTML. json_encode() produces a valid JavaScript string literal, including the surrounding quotes, and escapes quotes, backslashes, and control characters. HTML encoding is the wrong tool here: the browser does not decode entities inside a <script> block, so the data would be corrupted, and htmlspecialchars() leaves backslashes and line breaks untouched, which can still break the string literal. The JSON_HEX_* flags encode <, >, &, and both quote characters as Unicode escape sequences, so a payload containing </script> cannot terminate the script element.
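To make the escaping visible, the sketch below encodes a payload that tries to close the script element. The helper name jsLiteral() is hypothetical, and JSON_THROW_ON_ERROR (available from PHP 7.3) is added here as an assumption so that invalid UTF-8 raises an exception instead of silently producing false.

```php
<?php
// Hypothetical helper: returns a JavaScript-safe literal for any PHP value.
function jsLiteral($value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$payload = "</script><script>alert('xss')</script>";
$literal = jsLiteral($payload);

// The literal contains no <, >, & or ' characters, only \uXXXX escapes,
// so the HTML parser never sees a closing </script> tag.
echo "<script>var msg = {$literal};</script>\n";

// Decoding gives back the original string unchanged.
$roundTrip = json_decode($literal);
```

Note that the literal already includes its own double quotes, so it is written into the script without adding quotes around it.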

URL Context

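<?php
// SECURE - URL parameter encoding
// Named encodeQueryValue() because PHP function names are case-insensitive:
// declaring urlEncode() would collide with the built-in urlencode().
function encodeQueryValue($string) {
    return urlencode($string);
}

$query = $_GET['search'] ?? '';
$searchUrl = '/search?q=' . encodeQueryValue($query);
?>
<a href="<?= htmlspecialchars($searchUrl, ENT_QUOTES, 'UTF-8') ?>">Search</a>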

Why this works: URL encoding is necessary when embedding user data in URLs to prevent injection of additional query parameters or URL manipulation. urlencode() percent-encodes special characters like &, =, and ?, ensuring user input is treated as data rather than URL syntax. Note that you still need to HTML-encode the entire URL when placing it in an HTML attribute (as shown with the outer htmlspecialchars() call), demonstrating proper defense-in-depth with context-appropriate encoding at each layer.
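When a link carries more than one parameter, the two layers described above (URL encoding, then HTML encoding) can be sketched with http_build_query(). The helper name searchHref() and the parameter names are assumptions for illustration; the /search path is taken from the example above.

```php
<?php
// Hypothetical helper: builds an href value for the /search route.
function searchHref(array $params): string
{
    // http_build_query() percent-encodes every key and value;
    // PHP_QUERY_RFC3986 encodes spaces as %20 rather than +.
    $query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

    // The finished URL is then HTML-encoded for the attribute context.
    return htmlspecialchars('/search?' . $query, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$href = searchHref(['q' => 'a&b=c "x"', 'page' => 2]);

echo '<a href="' . $href . '">Search</a>' . "\n";
// prints: <a href="/search?q=a%26b%3Dc%20%22x%22&amp;page=2">Search</a>
```

The & inside the search term is percent-encoded as data, while the & that separates the two parameters is HTML-encoded as &amp; for the attribute.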

JSON Responses

<?php
// SECURE - JSON encoding handles escaping
header('Content-Type: application/json; charset=utf-8');

$user = [
    'name' => $_GET['name'],
    'bio' => $_POST['bio']
];

echo json_encode($user, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

Why this works: json_encode() escapes special characters according to the JSON specification, and the Content-Type: application/json header tells the browser not to parse the response as HTML. The JSON_HEX_* flags additionally encode HTML special characters as Unicode escape sequences, which protects responses that are later embedded in an HTML page or sniffed as HTML. Even if user input contains <script> tags, they arrive as inert string data. The response itself is therefore safe without manual HTML encoding, but the client must still insert the values with text-safe APIs (textContent rather than innerHTML); otherwise the XSS simply moves to the browser side.

urlencode() / rawurlencode() - URL Parameter Encoding

<?php
// SECURE - URL-encode for query parameters
$search = $_GET['query'];
$safe_url = urlencode($search);
echo "<a href='/search?q=$safe_url'>Search</a>";

// urlencode: Spaces become +, RFC 1738
// rawurlencode: Spaces become %20, RFC 3986 (preferred for URLs)
$redirect = rawurlencode($_GET['return_url']);
echo "<a href='/login?redirect=$redirect'>Login</a>";

Why this works: urlencode() and rawurlencode() percent-encode URL metacharacters, turning &, =, and ? into %26, %3D, and %3F, so user input is treated as data rather than URL syntax. They differ in how they encode spaces: urlencode() produces + (the form-encoding convention), while rawurlencode() produces %20 as defined by RFC 3986. A + is only read as a space inside query strings, so rawurlencode() is required for path segments and is a safe default for query values as well. Both functions encode a single component. They do not make a complete user-supplied URL safe, so a value used as an entire href must also be checked for an allowed scheme such as http or https.
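The space-handling difference can be checked directly. The sample value is an assumption chosen for illustration.

```php
<?php
$value = 'red shoes & socks';

$form = urlencode($value);    // form-encoding style: space becomes +
$raw  = rawurlencode($value); // RFC 3986 style: space becomes %20

echo $form, "\n"; // red+shoes+%26+socks
echo $raw, "\n";  // red%20shoes%20%26%20socks

// Each encoder has a matching decoder that restores the original value.
$formDecoded = urldecode($form);
$rawDecoded  = rawurldecode($raw);
```

In both cases the & is encoded as %26, so it cannot start a new query parameter.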

HTMLPurifier - Rich HTML Sanitization

<?php
// SECURE - Allow safe HTML tags while removing XSS
require_once 'HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);

$userHtml = $_POST['content'];  // May contain HTML tags
$clean = $purifier->purify($userHtml);
echo $clean;  // Safe HTML output

// Allows: <p>, <b>, <i>, <a>, etc. (configurable)
// Removes: <script>, javascript:, onclick=, etc.

Why this works: When you need to accept rich HTML (for example from a WYSIWYG editor), plain HTML encoding would destroy the formatting. HTMLPurifier uses an allowlist-based approach: it parses the user's HTML, removes dangerous tags and attributes, and reconstructs safe HTML. It handles complex attack vectors including CSS expression injection, JavaScript protocol handlers, and DOM clobbering. The library is highly configurable, so you can specify exactly which tags and attributes are allowed.

Laravel Blade Auto-Escaping

{{-- SECURE - Laravel Blade auto-escapes {{ }} --}}
{{-- resources/views/profile.blade.php --}}
<h1>Welcome, {{ $user->name }}</h1>  <!-- Auto-escaped -->
<p>{{ request()->input('comment') }}</p>  <!-- Auto-escaped -->

{{-- Use {!! !!} only for trusted HTML (dangerous!) --}}
<div>{!! $trustedHtml !!}</div>  <!-- NOT escaped - use with caution -->

Why this works: Laravel's Blade templating engine automatically HTML-encodes all output within {{ }} brackets. This is a secure-by-default design that makes XSS much harder to introduce accidentally. The deliberate naming of {!! !!} (with exclamation marks suggesting danger) serves as a clear warning that you're bypassing protections.
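As a rough illustration of what the {{ }} syntax does to a string, the sketch below reproduces the escaping in plain PHP. It is an approximation under stated assumptions: the function name bladeEscapeSketch() is hypothetical, the flags are assumed to match those used by Laravel's e() helper, and the real helper also handles objects that this sketch ignores.

```php
<?php
// Plain-PHP approximation of Blade's {{ $value }} for string values.
// Hypothetical name; in a Laravel application the helper is e().
function bladeEscapeSketch(?string $value): string
{
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$name = '<img src=x onerror=alert(1)>';

// Roughly what <h1>Welcome, {{ $name }}</h1> renders.
echo '<h1>Welcome, ' . bladeEscapeSketch($name) . "</h1>\n";
// prints: <h1>Welcome, &lt;img src=x onerror=alert(1)&gt;</h1>
```

The {!! !!} syntax skips this step entirely, which is why it should only receive content that has already been sanitized.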

Mews Purifier (Laravel Package)

<?php
// SECURE - Laravel HTMLPurifier wrapper
use Mews\Purifier\Facades\Purifier;

$userContent = $request->input('content');
$clean = Purifier::clean($userContent);
echo $clean;

// Or in a Blade template. Use {!! !!} here: {{ }} would HTML-encode the
// sanitized markup and display the tags as text.
// <div>{!! Purifier::clean($userContent) !!}</div>

// Configure allowed tags in config/purifier.php
// 'HTML.Allowed' => 'p,b,a[href],i,strong,em'

Why this works: Mews\Purifier is a Laravel wrapper around HTMLPurifier, providing the same allowlist-based HTML sanitization with Laravel-friendly syntax. It integrates seamlessly with Blade templates and can be configured via Laravel's config files. The package provides a facade for easy access and supports different purifier profiles for different use cases (e.g., stricter rules for comments, more permissive for admin-authored content). When you call Purifier::clean(), the HTML is parsed, dangerous elements are removed according to your allowlist configuration, and safe HTML is returned. This makes it safe to use with {!! !!} in Blade or when outputting to the page.

Framework-Specific Guidance

Laravel Blade Templates

{{-- SECURE - Blade auto-escapes {{ }} --}}
<div class="user-profile">
    <h1>{{ $user->name }}</h1>
    <p>{{ $user->bio }}</p>
    <small>Joined: {{ $user->created_at }}</small>
</div>

{{-- DANGEROUS - Unescaped output --}}
<div>{!! $user->bio !!}</div>

{{-- SECURE - Sanitize before using {!! !!} --}}
@php
    $sanitized = Purifier::clean($user->richBio);
@endphp
<div class="rich-content">{!! $sanitized !!}</div>

{{-- Controller: app/Http/Controllers/ProfileController.php --}}
<?php
namespace App\Http\Controllers;

use App\Models\User;          // assumes the default Laravel 8+ model namespace
use Illuminate\Http\Request;

class ProfileController extends Controller
{
    public function show(Request $request, $id)
    {
        $user = User::findOrFail($id);

        // Blade automatically escapes these in {{ }}
        return view('profile', [
            'user' => $user,
            'message' => $request->query('msg', '')
        ]);
    }
}

Why this works: Blade compiles every {{ }} expression into an HTML-encoded echo, so escaping happens by default and developers must explicitly opt out with {!! !!} to render raw HTML. The controller can therefore pass user input to the view unmodified, as long as the template prints it inside {{ }} in an HTML body or a quoted attribute. Values placed inside a <script> block or used as a whole URL still need context-specific handling. Combined with Laravel's validation and middleware, this gives multiple layers of protection against XSS.

Laravel HTMLPurifier Package:

composer require mews/purifier

<?php
use Mews\Purifier\Facades\Purifier;

// Clean HTML before storing
$cleanHtml = Purifier::clean($request->input('content'));

$article = Article::create([
    'title' => $request->input('title'),
    'content' => $cleanHtml
]);

// config/purifier.php
return [
    'encoding' => 'UTF-8',
    'finalize' => true,
    'cachePath' => storage_path('app/purifier'),
    'settings' => [
        'default' => [
            'HTML.Doctype' => 'HTML 4.01 Transitional',
            'HTML.Allowed' => 'p,br,strong,em,ul,ol,li,a[href|title]',
            'AutoFormat.AutoParagraph' => true,
            'AutoFormat.RemoveEmpty' => true,
        ],
    ],
];

Why this works: The Laravel HTMLPurifier package configuration defines exactly which HTML tags and attributes are allowed in user content. By sanitizing before storage (as shown in the example), you ensure malicious content never enters your database. The configuration is centralized in config/purifier.php, making it easy to maintain consistent security policies across your application. The allowlist approach (HTML.Allowed) means only explicitly permitted tags are kept, and dangerous elements are automatically removed.

Symfony Twig Templates

{# SECURE - Twig auto-escapes {{ }} #}
<div class="profile">
    <h1>{{ user.name }}</h1>
    <p>{{ user.bio }}</p>
</div>

{# DANGEROUS - Raw output #}
<div>{{ user.content|raw }}</div>

{# SECURE - Sanitize first #}
<div>{{ user.content|sanitize_html|raw }}</div>

{# Controller #}
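<?php
namespace App\Controller;

use App\Entity\User;
use Doctrine\ORM\EntityManagerInterface;
use Symfony\Bundle\FrameworkBundle\Controller\AbstractController;
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;

class ProfileController extends AbstractController
{
    // getDoctrine() was removed in Symfony 6, so the entity manager is injected
    public function show(Request $request, EntityManagerInterface $entityManager, int $id): Response
    {
        $user = $entityManager->getRepository(User::class)->find($id);

        // Twig auto-escapes in templates
        return $this->render('profile.html.twig', [
            'user' => $user,
            'message' => $request->query->get('msg', '')
        ]);
    }
}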

Why this works: Symfony's Twig templating engine automatically escapes all {{ }} expressions using HTML encoding, similar to Laravel Blade. The |raw filter bypasses this protection and should only be used with sanitized content. The escaping strategy is chosen per template (in Symfony's default configuration, from the file extension), not per position inside the template, so Twig does not detect that a variable sits inside a <script> block, a style attribute, or a URL. For those contexts apply an explicit strategy such as |e('js'), |e('css'), |e('url'), or |e('html_attr'). The controller example shows that user input can be passed directly to templates, because Twig handles the encoding at output time. This separation of concerns (controllers handle logic, templates handle presentation with automatic escaping) reduces the risk of XSS.

HTML Sanitizer Component (Symfony 6.1+):

composer require symfony/html-sanitizer

<?php
use Symfony\Component\HtmlSanitizer\HtmlSanitizer;
use Symfony\Component\HtmlSanitizer\HtmlSanitizerConfig;

$config = (new HtmlSanitizerConfig())
    ->allowSafeElements()
    ->allowElement('a', ['href', 'title']);

$sanitizer = new HtmlSanitizer($config);
$cleanHtml = $sanitizer->sanitize($userInput);

Why this works: Symfony's HTML Sanitizer component (introduced in Symfony 6.1) is a modern alternative to HTMLPurifier. It uses an allowlist-based approach in which you explicitly define which HTML elements and attributes are allowed. The allowSafeElements() method provides a preset of commonly safe tags, and allowElement() customizes the list further. The component is maintained by the Symfony team and integrated with Symfony's ecosystem, making it a natural choice for Symfony applications that need to accept rich HTML content.

Rich HTML Sanitization

For allowing safe HTML (e.g., WYSIWYG editors):

<?php
// Use HTML Purifier library
// HTMLPurifier classes live in the global namespace, so no use statements
// are needed (importing them here would only trigger a warning)

require_once 'vendor/autoload.php';

function sanitizeHtml($dirtyHtml) {
    $config = HTMLPurifier_Config::createDefault();

    // Set cache path
    $config->set('Cache.SerializerPath', '/tmp');

    // Define allowed elements and attributes
    $config->set('HTML.Allowed', 'p,br,strong,em,ul,ol,li,a[href|title]');

    // Encoding
    $config->set('Core.Encoding', 'UTF-8');

    // Remove empty paragraphs
    $config->set('AutoFormat.RemoveEmpty', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}

// Usage:
$userContent = $_POST['article_content'];
$cleanContent = sanitizeHtml($userContent);

// Sanitized HTML is output as-is; encoding it again would show the tags as text
echo $cleanContent;

Installation:

composer require ezyang/htmlpurifier

Why this works: This custom sanitization function shows how to configure HTMLPurifier for a specific use case. Setting HTML.Allowed creates an allowlist of permitted tags and attributes; anything not listed is removed. The cache path lets the library store its parsed definitions, which improves performance. Setting Core.Encoding to UTF-8 ensures proper handling of international characters. The AutoFormat.RemoveEmpty option cleans up empty elements that sanitization can leave behind. Once content is sanitized it is output directly. Do not HTML-encode it afterwards, because that would display the permitted tags as literal text; instead, keep sanitized HTML in its own variable or field so it is never confused with unsanitized input.

Input Validation (Defense in Depth)

<?php
// Validation before storage

class CommentValidator {
    public static function validate($data) {
        $errors = [];

        // Validate author name
        if (!isset($data['author']) || empty(trim($data['author']))) {
            $errors[] = 'Author name is required';
        } elseif (strlen($data['author']) > 100) {
            $errors[] = 'Author name too long';
        } elseif (!preg_match('/^[a-zA-Z0-9\s]+$/', $data['author'])) {
            $errors[] = 'Author name contains invalid characters';
        }

        // Validate comment text
        if (!isset($data['text']) || empty(trim($data['text']))) {
            $errors[] = 'Comment text is required';
        } elseif (strlen($data['text']) > 1000) {
            $errors[] = 'Comment too long';
        }

        return $errors;
    }
}

// Controller:
$errors = CommentValidator::validate($_POST);
if (empty($errors)) {
    // Store the validated values unmodified; encoding belongs at output time
    $comment = [
        'author' => trim($_POST['author']),
        'text' => trim($_POST['text'])
    ];
    saveComment($comment);
}

// View: still encode output even after validation!
// echo htmlspecialchars($comment['text'], ENT_QUOTES, 'UTF-8');

Why this works: Input validation is a defense-in-depth measure that complements output encoding. By validating input format (alphanumeric characters only for author names, length limits), you reduce the attack surface and provide better user feedback. However, validation alone is NOT sufficient for XSS prevention; as the example shows, you must still encode output even after validation. This is because: (1) validation requirements might change over time, (2) data might come from other sources (like databases or APIs) that bypass validation, and (3) defense-in-depth requires multiple layers of protection. Always encode output regardless of input validation.
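The rule that output is encoded regardless of validation can be sketched as a small renderer. The function name renderComment() and the sample record are assumptions for illustration; the record stands in for data that reached storage without passing the validator, for example through an import.

```php
<?php
// Hypothetical renderer: comments are stored as submitted and encoded
// only at the moment they are written into HTML.
function renderComment(array $comment): string
{
    $encode = static fn (string $v): string =>
        htmlspecialchars($v, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');

    return '<div class="comment"><strong>' . $encode($comment['author']) . '</strong>'
         . '<p>' . $encode($comment['text']) . '</p></div>';
}

// A record that bypassed CommentValidator entirely.
$stored = ['author' => 'Mallory', 'text' => '<script>alert(1)</script>'];

$html = renderComment($stored);
echo $html, "\n";
```

Because the encoding sits in the renderer, it protects every record, whichever path it took into storage.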

Content Security Policy

<?php
// Set CSP headers

header("Content-Security-Policy: " .
    "default-src 'self'; " .
    "script-src 'self' https://trusted-cdn.com; " .
    "style-src 'self' 'unsafe-inline'; " .
    "img-src 'self' data: https:; " .
    "frame-ancestors 'none';"
);

header("X-Content-Type-Options: nosniff");
header("X-Frame-Options: DENY");
// The legacy XSS auditor is gone from modern browsers and could itself be abused; switch it off
header("X-XSS-Protection: 0");

// Or in a middleware/bootstrap file:
class SecurityHeaders {
    public static function apply() {
        if (!headers_sent()) {
            header("Content-Security-Policy: default-src 'self'");
            header("X-Content-Type-Options: nosniff");
            header("X-Frame-Options: SAMEORIGIN");
        }
    }
}

// Call early in application bootstrap
SecurityHeaders::apply();

Why this works: Content Security Policy (CSP) is a browser security feature that adds a layer of XSS protection. Even if an attacker manages to inject a <script> tag into your page because an encoding step was missed, CSP can prevent the browser from executing it: with the policies shown, inline scripts are blocked because script-src does not include 'unsafe-inline', and external scripts load only from the listed origins. The default-src 'self' directive restricts content to your own origin, and script-src controls where scripts may be loaded from. X-Content-Type-Options: nosniff prevents MIME-type confusion attacks, and X-Frame-Options prevents clickjacking. CSP is defense-in-depth: it does not replace proper output encoding but provides a safety net when encoding is missed.
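Pages that need inline scripts can keep 'unsafe-inline' out of the policy by using a per-request nonce. The following is a sketch under stated assumptions: the helper name buildCsp() and the exact directive set are illustrative, not a recommended policy for every application.

```php
<?php
// Hypothetical helper: builds a CSP header value that only allows
// same-origin scripts and inline scripts carrying the given nonce.
function buildCsp(string $nonce): string
{
    return "default-src 'self'; "
         . "script-src 'self' 'nonce-{$nonce}'; "
         . "object-src 'none'; "
         . "base-uri 'self'; "
         . "frame-ancestors 'none'";
}

// A fresh, unpredictable nonce must be generated for every response.
$nonce = base64_encode(random_bytes(16));
$csp   = buildCsp($nonce);

// In a web request this is sent before any output:
// header('Content-Security-Policy: ' . $csp);

// Only script tags with the matching nonce attribute are executed.
$attr = htmlspecialchars($nonce, ENT_QUOTES, 'UTF-8');
echo "<script nonce=\"{$attr}\">console.log('allowed');</script>\n";
```

An injected <script> tag does not know the nonce for the current response, so the browser refuses to run it.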

Verification and Detection

Security testing requires multiple approaches - unit tests alone are insufficient.

Static Application Security Testing (SAST)

Use automated tools to detect XSS in PHP code:

Commercial Tools:

Checkmarx - Comprehensive PHP security scanning
Fortify - Deep data flow analysis
Veracode - Cloud-based PHP analysis
Snyk Code - Real-time security scanning

Open Source Tools:

Psalm with taint analysis

composer require --dev vimeo/psalm
vendor/bin/psalm --init
vendor/bin/psalm --taint-analysis

PHPStan - General static analysis (no taint tracking of its own)

composer require --dev phpstan/phpstan
vendor/bin/phpstan analyse src

SonarQube - Continuous security analysis

# Run the standalone SonarScanner CLI from the project root
# (it is not installed through Composer)
sonar-scanner

Semgrep - Pattern-based scanning

semgrep --config=p/php .
semgrep --config=p/xss .

Dynamic Application Security Testing (DAST)

Test running PHP applications:

OWASP ZAP - Automated web scanner
Burp Suite Professional - Comprehensive testing
Acunetix - PHP-aware XSS detection
Nikto - Web server vulnerability scanner

Code Review Checklist

Manually verify:

All echo statements use htmlspecialchars() with ENT_QUOTES and UTF-8
No raw <?= $variable ?> without encoding
Laravel Blade uses {{ }} not {!! !!} for user data
Twig templates don't use |raw filter on user input
JSON responses use json_encode() with security flags
All input sources identified (GET, POST, COOKIE, SERVER variables)
CSP headers configured

Framework-Specific Tools

Laravel:

# Laravel security scanner
composer require --dev enlightn/enlightn
php artisan enlightn

# Check for security issues
composer audit

Symfony:

# Symfony security checker
symfony check:security

# Test tooling (PHPUnit bridge for running the test suite)
composer require --dev symfony/phpunit-bridge

WordPress:

# WordPress coding standards with security sniffs
composer require --dev wp-coding-standards/wpcs
phpcs --standard=WordPress-Extra .

Limited Role of Unit Tests

Tests verify encoding functions work but not comprehensive security:

<?php
use PHPUnit\Framework\TestCase;

// Tests verify encoding - NOT comprehensive security
class EncodingTest extends TestCase
{
    public function testHtmlspecialcharsWorks()
    {
        $malicious = '<script>alert("xss")</script>';
        $encoded = htmlspecialchars($malicious, ENT_QUOTES, 'UTF-8');

        $this->assertStringNotContainsString('<script>', $encoded);
        $this->assertStringContainsString('&lt;script&gt;', $encoded);
    }
}

Important: Passing tests does NOT mean your app is secure. Use SAST/DAST to find actual vulnerabilities.

Integration Testing

<?php
// Laravel feature test: Tests\TestCase provides HTTP helpers such as get()
use Tests\TestCase;

class XssIntegrationTest extends TestCase
{
    public function testXssPayloadIsEncoded()
    {
        $payload = '<script>alert(1)</script>';
        $response = $this->get('/profile?bio=' . urlencode($payload));

        $this->assertStringNotContainsString($payload, $response->getContent());
        $this->assertStringContainsString('&lt;script&gt;', $response->getContent());
    }
}

Security Headers Testing

# Check security headers with curl
curl -I https://yoursite.com | grep -i "content-security-policy\|x-frame-options\|x-content-type"

# Use securityheaders.com (quote the URL so the shell does not treat & as a background operator)
curl -s "https://securityheaders.com/?q=yoursite.com&hide=on"

Continuous Security

CI/CD Integration - Run SAST in GitHub Actions/GitLab CI

.github/workflows/security.yml

- name: Security scan
  run: |
    composer install
    vendor/bin/psalm --show-info=true
    vendor/bin/psalm --taint-analysis
    vendor/bin/phpstan analyse

Pre-commit Hooks - Prevent insecure code commits
Dependency Scanning - Monitor for vulnerable packages (composer audit)
Security Champions - Train developers on secure coding
Penetration Testing - Regular professional assessments

PHP Configuration

; php.ini security settings

; Don't expose PHP version
expose_php = Off

; Disable dangerous functions
disable_functions = exec,passthru,shell_exec,system,proc_open,popen,curl_exec,curl_multi_exec,parse_ini_file,show_source

; Session security
session.cookie_httponly = 1
session.cookie_secure = 1
session.cookie_samesite = "Strict"